From: Mel Gorman <mel@csn.ul.ie>
To: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Linux Memory Management List <linux-mm@kvack.org>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Christoph Lameter <cl@linux-foundation.org>,
	Nick Piggin <npiggin@suse.de>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Lin Ming <ming.m.lin@intel.com>,
	Zhang Yanmin <yanmin_zhang@linux.intel.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Pekka Enberg <penberg@cs.helsinki.fi>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH 02/22] Do not sanity check order in the fast path
Date: Fri, 24 Apr 2009 11:34:05 +0100
Message-ID: <20090424103405.GC14283@csn.ul.ie>
In-Reply-To: <1240508211.10627.139.camel@nimitz>

On Thu, Apr 23, 2009 at 10:36:50AM -0700, Dave Hansen wrote:
> On Thu, 2009-04-23 at 10:58 +0100, Mel Gorman wrote:
> > > How about this:  I'll go and audit the use of order in page_alloc.c to
> > > make sure that having an order>MAX_ORDER-1 floating around is OK and
> > > won't break anything. 
> > 
> > Great. Right now, I think it's ok but I haven't audited for this
> > explicitly and a second set of eyes never hurts.
> 
> OK, after looking through this, I have a couple of ideas.  One is that
> we do the MAX_ORDER check in __alloc_pages_internal(), but *after* the
> first call to get_page_from_freelist().  That's because I'm worried about
> what would happen if we ever got into the reclaim code with an 'order' of
> MAX_ORDER or more.  Such as:
> 
> void wakeup_kswapd(struct zone *zone, int order)
> {
> ...
>         if (pgdat->kswapd_max_order < order)
>                 pgdat->kswapd_max_order = order;
>         if (!cpuset_zone_allowed_hardwall(zone, GFP_KERNEL))
>                 return;
>         if (!waitqueue_active(&pgdat->kswapd_wait))
>                 return;
>         wake_up_interruptible(&pgdat->kswapd_wait);
> }
> 
> unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
>                                 gfp_t gfp_mask, nodemask_t *nodemask)
> {
>         struct scan_control sc = {
> ...
>                 .order = order,
>                 .mem_cgroup = NULL,
>                 .isolate_pages = isolate_pages_global,
>                 .nodemask = nodemask,
>         };
> 
>         return do_try_to_free_pages(zonelist, &sc);
> }
> 

That is perfectly rational.

> This will keep us only checking 'order' once for each
> __alloc_pages_internal() call.  It is an extra branch, but it is out of
> the really, really hot path since we're about to start reclaim here
> anyway.
> 

I combined yours and Andrew's suggestions into a patch that applies on
top of the series. Dave, as it's basically your patch it needs your
sign-off if you agree it's ok.

I tested this with a bodge that allocates with increasing orders up to a
very large number.  As you'd expect, it worked for every order below order-11
on an x86 machine and failed from there upwards by returning NULL.
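
For anyone who wants to see the expected behaviour without booting a kernel,
the test can be approximated in user space. The following is only a sketch
and not the bodge I actually ran: every model_* name is hypothetical, malloc()
stands in for the real allocator, and MODEL_MAX_ORDER is assumed to be 11 to
match the x86 machine above. It models nothing but the order check and the
warn-once behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Assumption: MAX_ORDER is 11, as on the x86 test machine. */
#define MODEL_MAX_ORDER 11
#define MODEL_PAGE_SIZE 4096UL

/* Number of times the one-shot warning has actually been printed. */
static int model_warnings;

/* Hypothetical stand-in for WARN_ON_ONCE(): reports only the first hit. */
static bool model_warn_on_once(bool cond)
{
	static bool warned;

	if (cond && !warned) {
		warned = true;
		model_warnings++;
		fprintf(stderr, "model: order >= MAX_ORDER requested\n");
	}
	return cond;
}

/* Models only the slow-path order check, not the real allocator. */
static void *model_alloc_pages(unsigned int order)
{
	if (model_warn_on_once(order >= MODEL_MAX_ORDER))
		return NULL;
	return malloc(MODEL_PAGE_SIZE << order);
}

/* Try every order in [0, limit); return the first order that fails. */
static unsigned int model_first_failing_order(unsigned int limit)
{
	unsigned int order, first_fail = limit;

	for (order = 0; order < limit; order++) {
		void *p = model_alloc_pages(order);

		if (!p) {
			if (first_fail == limit)
				first_fail = order;
			continue;
		}
		free(p);
	}
	return first_fail;
}
```

Calling model_first_failing_order() with a limit well above MODEL_MAX_ORDER
should report the first failure at MODEL_MAX_ORDER, with the warning printed
a single time however many oversized requests follow.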

It reports a warning once. We'll probably drop the warning in time but this
will be a chance to check if there are callers that really are being stupid
and are not just callers that are trying to get the best buffer for the job.
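
To illustrate the kind of caller I mean by "trying to get the best buffer
for the job", here is a user-space sketch of a caller that prefers the page
allocator and falls back to something else when the order is too large. It
is an illustration under assumptions, not code from any real caller: the
model_* names are hypothetical, model_get_order() only loosely mirrors
get_order(), and plain malloc() stands in for whatever fallback (vmalloc(),
for example) a real caller would pick.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumptions: MAX_ORDER of 11 and 4K pages, as on a typical x86 box. */
#define MODEL_MAX_ORDER 11
#define MODEL_PAGE_SHIFT 12
#define MODEL_PAGE_SIZE ((size_t)1 << MODEL_PAGE_SHIFT)

enum model_source { MODEL_SRC_NONE, MODEL_SRC_PAGES, MODEL_SRC_FALLBACK };

struct model_buf {
	void *mem;
	enum model_source src;	/* which allocator satisfied the request */
};

/* Smallest order whose block of (1 << order) pages covers size bytes. */
static unsigned int model_get_order(size_t size)
{
	size_t pages = (size >> MODEL_PAGE_SHIFT) +
		       ((size & (MODEL_PAGE_SIZE - 1)) != 0);
	unsigned int order = 0;

	while (((size_t)1 << order) < pages)
		order++;
	return order;
}

/* Preferred allocator: refuses any order the buddy lists cannot hold. */
static void *model_page_alloc(unsigned int order)
{
	if (order >= MODEL_MAX_ORDER)
		return NULL;
	return malloc(MODEL_PAGE_SIZE << order);
}

/* Try the preferred allocator first, then the hypothetical fallback. */
static struct model_buf model_alloc_buffer(size_t size)
{
	struct model_buf buf = { NULL, MODEL_SRC_NONE };

	buf.mem = model_page_alloc(model_get_order(size));
	if (buf.mem) {
		buf.src = MODEL_SRC_PAGES;
		return buf;
	}

	buf.mem = malloc(size);	/* stand-in for the second-choice allocator */
	if (buf.mem)
		buf.src = MODEL_SRC_FALLBACK;
	return buf;
}
```

A caller like this is behaving sensibly when it asks for an oversized order,
which is why the check returns NULL quietly after the first warning rather
than treating the request as a bug.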

=====
Sanity check order in the page allocator slow path

Callers may speculatively call different allocators in order of preference
trying to allocate a buffer of a given size. The order needed to allocate
this may be larger than what the page allocator can normally handle. While
the allocator mostly does the right thing, it should not direct reclaim or
wake up kswapd with a bogus order. This patch sanity checks the order in the
slow path and returns NULL if it is too large.

Needs-signed-off-by from Dave Hansen here before merging. Based on his
not-signed-off-by patch.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
--- 
 mm/page_alloc.c |   10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1464aca..1c60141 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1434,7 +1434,6 @@ get_page_from_freelist(gfp_t gfp_mask, nodemask_t *nodemask, unsigned int order,
 	int did_zlc_setup = 0;		/* just call zlc_setup() one time */
 
 	classzone_idx = zone_idx(preferred_zone);
-	VM_BUG_ON(order >= MAX_ORDER);
 
 zonelist_scan:
 	/*
@@ -1692,6 +1691,15 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	struct task_struct *p = current;
 
 	/*
+	 * In the slowpath, we sanity check order to avoid ever trying to
+	 * reclaim >= MAX_ORDER areas which will never succeed. Callers may
+	 * be using allocators in order of preference for an area that is
+	 * too large. 
+	 */
+	if (WARN_ON_ONCE(order >= MAX_ORDER))
+		return NULL;
+
+	/*
 	 * GFP_THISNODE (meaning __GFP_THISNODE, __GFP_NORETRY and
 	 * __GFP_NOWARN set) should not cause reclaim since the subsystem
 	 * (f.e. slab) using GFP_THISNODE may choose to trigger reclaim

