From: Minchan Kim <minchan@kernel.org>
To: Mel Gorman <mgorman@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
Rik van Riel <riel@redhat.com>,
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Subject: Re: [PATCH] mm: Warn about costly page allocation
Date: Mon, 9 Jul 2012 17:46:57 +0900 [thread overview]
Message-ID: <20120709084657.GA7915@bbox> (raw)
In-Reply-To: <20120709082200.GX14154@suse.de>
Hi Mel,
On Mon, Jul 09, 2012 at 09:22:00AM +0100, Mel Gorman wrote:
> On Mon, Jul 09, 2012 at 11:38:20AM +0900, Minchan Kim wrote:
> > Since lumpy reclaim was introduced at 2.6.23, it helped higher
> > order allocation.
> > Recently, we removed it at 3.4 and we didn't enable compaction
> > forcingly[1]. The reason makes sense that compaction.o + migration.o
> > isn't trivial for system doesn't use higher order allocation.
> > But the problem is that we have to enable compaction explicitly
> > while lumpy reclaim enabled unconditionally.
> >
> > Normally, admin doesn't know his system have used higher order
> > allocation and even lumpy reclaim have helped it.
> > Admin in embdded system have a tendency to minimise code size so that
> > they can disable compaction. In this case, we can see page allocation
> > failure we can never see in the past. It's critical on embedded side
> > because...
> >
> > Let's think this scenario.
> >
> > There is QA team in embedded company and they have tested their product.
> > In test scenario, they can allocate 100 high order allocation.
> > (they don't matter how many high order allocations in kernel are needed
> > during test. their concern is just only working well or fail of their
> > middleware/application) High order allocation will be serviced well
> > by natural buddy allocation without lumpy's help. So they released
> > the product and sold out all over the world.
> > Unfortunately, in real practice, sometime, 105 high order allocation was
> > needed rarely and fortunately, lumpy reclaim could help it so the product
> > doesn't have a problem until now.
> >
> > If they use latest kernel, they will see the new config CONFIG_COMPACTION
> > which is very poor documentation, and they can't know it's replacement of
> > lumpy reclaim(even, they don't know lumpy reclaim) so they simply disable
>
> Depending on lumpy reclaim or compaction for high-order kernel allocations
> is dangerous. Both depend on being able to move MIGRATE_MOVABLE allocations
> to satisy the high-order allocation. If used regularly for high-order kernel
> allocations and they are long-lived, the system will eventually be unable
> to grant these allocations, with or without compaction or lumpy reclaim.
Indeed.
>
> Be also aware that lumpy reclaim was very aggressive when reclaiming pages
> to satisfy an allocation. Compaction is not and compaction can be temporarily
> disabled if an allocation attempt fails. If lumpy reclaim was being depended
> upon to satisfy high-order allocations, there is no guarantee, particularly
> with 3.4, that compaction will succeed as it does not reclaim aggressively.
It's good explanation and let's add it in description.
>
> > that option for size optimization. Of course, QA team still test it but they
> > can't find the problem if they don't do test stronger than old.
> > It ends up release the product and sold out all over the world, again.
> > But in this time, we don't have both lumpy and compaction so the problem
> > would happen in real practice. A poor enginner from Korea have to flight
> > to the USA for the fix a ton of products. Otherwise, should recall products
> > from all over the world. Maybe he can lose a job. :(
> >
> > This patch adds warning for notice. If the system try to allocate
> > PAGE_ALLOC_COSTLY_ORDER above page and system enters reclaim path,
> > it emits the warning. At least, it gives a chance to look into their
> > system before the relase.
> >
> > This patch avoids false positive by alloc_large_system_hash which
> > allocates with GFP_ATOMIC and a fallback mechanism so it can make
> > this warning useless.
> >
> > [1] c53919ad(mm: vmscan: remove lumpy reclaim)
> >
> > Signed-off-by: Minchan Kim <minchan@kernel.org>
> > ---
> > mm/page_alloc.c | 16 ++++++++++++++++
> > 1 file changed, 16 insertions(+)
> >
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index a4d3a19..1155e00 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -2276,6 +2276,20 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
> > return alloc_flags;
> > }
> >
> > +#if defined(CONFIG_DEBUG_VM) && !defined(CONFIG_COMPACTION)
> > +static inline void check_page_alloc_costly_order(unsigned int order)
> > +{
> > + if (unlikely(order > PAGE_ALLOC_COSTLY_ORDER)) {
> > + printk_once("WARNING: You are tring to allocate %d-order page."
> > + " You might need to turn on CONFIG_COMPACTION\n", order);
> > + }
>
> WARN_ON_ONCE would tell you what is trying to satisfy the allocation.
Do you mean that it would be better to use WARN_ON_ONCE rather than raw printk?
If so, I would like to insist raw printk because WARN_ON_ONCE could be disabled
by !CONFIG_BUG.
If I miss something, could you elaborate it more?
>
> It should further check if this is a GFP_MOVABLE allocation or not and if
> not, then it should either be documented that compaction may only delay
> allocation failures and that they may need to consider reserving the memory
> in advance or doing something like forcing MIGRATE_RESERVE to only be used
> for high-order allocations.
Okay. but I got confused you want to add above description in code directly
like below or write it down in comment of check_page_alloc_costly_order?
static inline void check_page_alloc_costly_order(unsigned int order, gfp_t gfp_flags)
{
if (unlikely(order > PAGE_ALLOC_COSTLY_ORDER)) {
printk_once("WARNING: You are tring to allocate %d-order page."
" You might need to turn on CONFIG_COMPACTION\n", order);
if (gfp_flags is not GFP_MOVABLE)
printk_once("Compaction doesn't make sure .....\n");
}
}
Thanks for the comment, Mel.
>
> > +}
> > +#else
> > +static inline void check_page_alloc_costly_order(unsigned int order)
> > +{
> > +}
> > +#endif
> > +
> > static inline struct page *
> > __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> > struct zonelist *zonelist, enum zone_type high_zoneidx,
> > @@ -2353,6 +2367,8 @@ rebalance:
> > if (!wait)
> > goto nopage;
> >
> > + check_page_alloc_costly_order(order);
> > +
> > /* Avoid recursion of direct reclaim */
> > if (current->flags & PF_MEMALLOC)
> > goto nopage;
> > --
> > 1.7.9.5
> >
>
> --
> Mel Gorman
> SUSE Labs
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2012-07-09 8:47 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-07-09 2:38 [PATCH] mm: Warn about costly page allocation Minchan Kim
2012-07-09 8:22 ` Mel Gorman
2012-07-09 8:46 ` Minchan Kim [this message]
2012-07-09 9:12 ` Mel Gorman
2012-07-09 12:50 ` Minchan Kim
2012-07-09 13:05 ` Mel Gorman
2012-07-09 13:19 ` Minchan Kim
2012-07-09 20:53 ` Andrew Morton
2012-07-09 12:53 ` Cong Wang
2012-07-09 14:12 ` Minchan Kim
2012-07-11 2:45 ` Cong Wang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20120709084657.GA7915@bbox \
--to=minchan@kernel.org \
--cc=akpm@linux-foundation.org \
--cc=hannes@cmpxchg.org \
--cc=kamezawa.hiroyu@jp.fujitsu.com \
--cc=kosaki.motohiro@jp.fujitsu.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mgorman@suse.de \
--cc=riel@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).