All of lore.kernel.org
 help / color / mirror / Atom feed
From: Nishanth Aravamudan <nacc@us.ibm.com>
To: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Cc: clameter@sgi.com, akpm@linux-foundation.org, linux-mm@kvack.org,
	ak@suse.de
Subject: Re: [patch 2/3] Fix GFP_THISNODE behavior for memoryless nodes
Date: Wed, 13 Jun 2007 14:57:50 -0700	[thread overview]
Message-ID: <20070613215750.GU3798@us.ibm.com> (raw)
In-Reply-To: <1181769033.6148.116.camel@localhost>

On 13.06.2007 [17:10:32 -0400], Lee Schermerhorn wrote:
> On Tue, 2007-06-12 at 13:48 -0700, clameter@sgi.com wrote:
> > GFP_THISNODE checks that the zone selected is within the pgdat (node) of the
> > first zone of a nodelist. That only works if the node has memory. A
> > memoryless node will have its first zone on another pgdat (node).
> > 
> > Thus GFP_THISNODE may be returning memory on other nodes.
> > GFP_THISNODE should fail if there is no local memory on a node.
> > 
> > So we add a check to verify that the node specified has memory in
> > alloc_pages_node(). If the node has no memory then return NULL.
> > 
> > The case of alloc_pages(GFP_THISNODE) is not changed. alloc_pages() (with
> > no memory policies in effect) is understood to prefer the current node.
> > If a process is running on a node with no memory then its default allocations
> > come from the next neighboring node. GFP_THISNODE will then force the memory
> > to come from that node.
> > 
> > Signed-off-by: Christoph Lameter <clameter@sgi.com>
> > Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
> > 
> > Index: linux-2.6.22-rc4-mm2/include/linux/gfp.h
> > ===================================================================
> > --- linux-2.6.22-rc4-mm2.orig/include/linux/gfp.h	2007-06-12 12:33:37.000000000 -0700
> > +++ linux-2.6.22-rc4-mm2/include/linux/gfp.h	2007-06-12 12:38:37.000000000 -0700
> > @@ -175,6 +175,13 @@ static inline struct page *alloc_pages_n
> >  	if (nid < 0)
> >  		nid = numa_node_id();
> >  
> > +	/*
> > +	 * Check for the special case that GFP_THISNODE is used on a
> > +	 * memoryless node
> > +	 */
> > +	if ((gfp_mask & __GFP_THISNODE) && !node_memory(nid))
> > +		return NULL;
> > +
> >  	return __alloc_pages(gfp_mask, order,
> >  		NODE_DATA(nid)->node_zonelists + gfp_zone(gfp_mask));
> >  }
> > 
> 
> Attached patch fixes alloc_pages_node() so that it never returns an
> off-node page when GFP_THISNODE is specified by.  This requires a fix to
> SLUB early allocation, included in the patch.  Works on HP ia64 platform
> with small DMA only node and "zone order" zonelists.  Will test on
> x86_64 real soon now...
> 
> ---
> 
> PATCH  fix GFP_THISNODE for DMA only nodes and zone-order zonelists
> 
> The map of nodes with memory may include nodes with just
> DMA/DMA32 memory.  Using this map/mask together with
> GFP_THISNODE will not guarantee on-node allocations at higher
> zones.  Modify checks in alloc_pages_node() to ensure that the
> first zone in the selected zonelist is "on-node".
> 
> This change will result in alloc_pages_node() returning NULL
> when GFP_THISNODE is specified and the first zone in the zonelist
> selected by (nid, gfp_zone(gfp_mask) is not on node 'nid'.  This,
> in turn, BUGs out in slub.c:early_kmem_cache_node_alloc() which
> apparently can't handle a NULL page from new_slab().  Fix SLUB
> to handle NULL page in early allocation.
> 
> Signed-off-by:  Lee Schermerhorn <lee.schermerhorn@hp.com>
> 
>  include/linux/gfp.h |   11 ++++++++---
>  mm/slub.c           |   22 ++++++++++++----------
>  2 files changed, 20 insertions(+), 13 deletions(-)
> 
> Index: Linux/include/linux/gfp.h
> ===================================================================
> --- Linux.orig/include/linux/gfp.h	2007-06-13 16:36:02.000000000 -0400
> +++ Linux/include/linux/gfp.h	2007-06-13 16:38:41.000000000 -0400
> @@ -168,6 +168,9 @@ FASTCALL(__alloc_pages(gfp_t, unsigned i
>  static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
>  						unsigned int order)
>  {
> +	pg_data_t *pgdat;
> +	struct zonelist *zonelist;
> +
>  	if (unlikely(order >= MAX_ORDER))
>  		return NULL;
> 
> @@ -179,11 +182,13 @@ static inline struct page *alloc_pages_n
>  	 * Check for the special case that GFP_THISNODE is used on a
>  	 * memoryless node
>  	 */
> -	if ((gfp_mask & __GFP_THISNODE) && !node_memory(nid))
> +	pgdat = NODE_DATA(nid);
> +	zonelist = pgdat->node_zonelists + gfp_zone(gfp_mask);
> +	if ((gfp_mask & __GFP_THISNODE) &&
> +		pgdat != zonelist->zones[0]->zone_pgdat)

This hunk won't apply on top of the unlikely() change... Rediffed below,
adding into the set to test now...

From: Lee Schermerhorn <lee.schermerhorn@hp.com>

PATCH  fix GFP_THISNODE for DMA only nodes and zone-order zonelists

The map of nodes with memory may include nodes with just
DMA/DMA32 memory.  Using this map/mask together with
GFP_THISNODE will not guarantee on-node allocations at higher
zones.  Modify checks in alloc_pages_node() to ensure that the
first zone in the selected zonelist is "on-node".

This change will result in alloc_pages_node() returning NULL when
GFP_THISNODE is specified and the first zone in the zonelist selected by
(nid, gfp_zone(gfp_mask) is not on node 'nid'.  This, in turn, BUGs out
in slub.c:early_kmem_cache_node_alloc() which apparently can't handle a
NULL page from new_slab().  Fix SLUB to handle NULL page in early
allocation.

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index ecd8adb..98a76c9 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -168,6 +168,9 @@ FASTCALL(__alloc_pages(gfp_t, unsigned int, struct zonelist *));
 static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
 						unsigned int order)
 {
+	pg_data_t *pgdat;
+	struct zonelist *zonelist;
+
 	if (unlikely(order >= MAX_ORDER))
 		return NULL;
 
@@ -175,15 +178,17 @@ static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
 	if (nid < 0)
 		nid = numa_node_id();
 
+	pgdat = NODE_DATA(nid);
+	zonelist = pgdat->node_zonelists + gfp_zone(gfp_mask);
 	/*
 	 * Check for the special case that GFP_THISNODE is used on a
 	 * memoryless node
 	 */
-	if (unlikely((gfp_mask & __GFP_THISNODE) && !node_memory(nid)))
+	if (unlikely((gfp_mask & __GFP_THISNODE) &&
+		pgdat != zonelist->zones[0]->zone_pgdat))
 		return NULL;
 
-	return __alloc_pages(gfp_mask, order,
-		NODE_DATA(nid)->node_zonelists + gfp_zone(gfp_mask));
+	return __alloc_pages(gfp_mask, order, zonelist);
 }
 
 #ifdef CONFIG_NUMA
diff --git a/mm/slub.c b/mm/slub.c
index c16b75d..43a9270 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1870,16 +1870,18 @@ static struct kmem_cache_node * __init early_kmem_cache_node_alloc(gfp_t gfpflag
 	/* new_slab() disables interupts */
 	local_irq_enable();
 
-	BUG_ON(!page);
-	n = page->freelist;
-	BUG_ON(!n);
-	page->freelist = get_freepointer(kmalloc_caches, n);
-	page->inuse++;
-	kmalloc_caches->node[node] = n;
-	setup_object_debug(kmalloc_caches, page, n);
-	init_kmem_cache_node(n);
-	atomic_long_inc(&n->nr_slabs);
-	add_partial(n, page);
+	if (page) {
+		n = page->freelist;
+		BUG_ON(!n);
+		page->freelist = get_freepointer(kmalloc_caches, n);
+		page->inuse++;
+		kmalloc_caches->node[node] = n;
+		setup_object_debug(kmalloc_caches, page, n);
+		init_kmem_cache_node(n);
+		atomic_long_inc(&n->nr_slabs);
+		add_partial(n, page);
+	} else
+		kmalloc_caches->node[node] = NULL;
 	return n;
 }
 

-- 
Nishanth Aravamudan <nacc@us.ibm.com>
IBM Linux Technology Center

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2007-06-13 20:55 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-06-12 20:48 [patch 0/3] Fixes for NUMA allocations on memoryless nodes clameter
2007-06-12 20:48 ` [patch 1/3] NUMA: introduce node_memory_map clameter
2007-06-12 21:03   ` David Rientjes
2007-06-12 21:08     ` Christoph Lameter
2007-06-12 21:10       ` David Rientjes
2007-06-12 21:27         ` Christoph Lameter
2007-06-12 21:34           ` David Rientjes
2007-06-12 21:37             ` Christoph Lameter
2007-06-12 21:38               ` David Rientjes
2007-06-12 21:36         ` Nishanth Aravamudan
2007-06-12 21:39           ` Christoph Lameter
2007-06-12 21:42             ` Nishanth Aravamudan
2007-06-12 21:45               ` David Rientjes
2007-06-12 22:26               ` Christoph Lameter
2007-06-12 22:32                 ` Nishanth Aravamudan
2007-06-13  9:14           ` Andy Whitcroft
2007-06-12 20:48 ` [patch 2/3] Fix GFP_THISNODE behavior for memoryless nodes clameter
2007-06-12 21:03   ` David Rientjes
2007-06-12 21:07     ` Christoph Lameter
2007-06-12 21:08       ` David Rientjes
2007-06-12 21:25         ` Christoph Lameter
2007-06-12 21:34           ` David Rientjes
2007-06-12 21:51             ` Nishanth Aravamudan
2007-06-13 21:10   ` Lee Schermerhorn
2007-06-13 21:57     ` Nishanth Aravamudan [this message]
2007-06-13 22:46     ` Christoph Lameter
2007-06-13 23:11       ` Nishanth Aravamudan
2007-06-13 23:15         ` Christoph Lameter
2007-06-13 23:20           ` Nishanth Aravamudan
2007-06-13 23:26             ` Christoph Lameter
2007-06-13 23:32               ` Nishanth Aravamudan
2007-06-13 23:53                 ` Christoph Lameter
2007-06-14  0:04                   ` Nishanth Aravamudan
2007-06-14 14:18       ` Lee Schermerhorn
2007-06-14 14:24         ` Christoph Lameter
2007-06-14 14:55           ` Lee Schermerhorn
2007-06-14 15:51             ` Christoph Lameter
2007-06-14  7:07     ` Christoph Lameter
2007-06-14 14:23       ` Nishanth Aravamudan
2007-06-12 20:48 ` [patch 3/3] Fix MPOL_INTERLEAVE " clameter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20070613215750.GU3798@us.ibm.com \
    --to=nacc@us.ibm.com \
    --cc=Lee.Schermerhorn@hp.com \
    --cc=ak@suse.de \
    --cc=akpm@linux-foundation.org \
    --cc=clameter@sgi.com \
    --cc=linux-mm@kvack.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.