From: Nishanth Aravamudan <nacc@us.ibm.com>
To: Christoph Lameter <clameter@sgi.com>
Cc: Lee.Schermerhorn@hp.com, anton@samba.org, apw@shadowen.org,
mel@csn.ul.ie, akpm@linux-foundation.org, linux-mm@kvack.org
Subject: [PATCH v2] gfp.h: GFP_THISNODE can go to other nodes if some are unpopulated
Date: Thu, 7 Jun 2007 15:01:49 -0700 [thread overview]
Message-ID: <20070607220149.GC15776@us.ibm.com> (raw)
In-Reply-To: <Pine.LNX.4.64.0706071103240.24988@schroedinger.engr.sgi.com>
On 07.06.2007 [11:11:02 -0700], Christoph Lameter wrote:
> On Thu, 7 Jun 2007, Nishanth Aravamudan wrote:
>
> > While testing my sysfs per-node hugepage allocator
> > (http://marc.info/?l=linux-mm&m=117935849517122&w=2), I found that
> > an alloc_pages_node(nid, GFP_THISNODE) request would sometimes
> > return a struct page such that page_to_nid(page) != nid. This was
> > because, on that particular machine, nodes 0 and 1 are populated and
> > nodes 2 and 3 are not. When a page is requested
> > get_page_from_freelist() relies on zonelist->zones[0]->zone_pgdat
> > indicating when THISNODE stops. But, because, say, node 2 has no
> > memory, the first zone_pgdat in the fallback list points to a
> > different node. Add a comment indicating that THISNODE may not
> > return pages on THISNODE if the node is unpopulated.
>
> Hmmm.... Bad semantics are developing as a result of allowing empty
> nodes with no zones. This is not correct and can have bad
> consequences.
I won't argue with you.
> As I expected: We may need more hacks to deal with it. Sigh.
Yes.
> > Am working on testing Lee/Anton's patch to add a node_populated_mask
> > and use that in the hugepage allocator path. But I think this may be
> > a problem anywhere THISNODE is used and memory is expected to come
> > from the requested node and nowhere else.
>
> What GFP_THISNODE effectively does now is to require the allocation on
> the nearest available zone to the indicated node because it does not
> allow access outside of the first encountered pgdat. But the first
> pgdat is not the node we selected if the node has no memory.
Right, that's the fallback list, right? And for unpopulated zones,
zonelist->zone[0]->zone_pgdat can refer to a different node.
> > diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> > index 0d2ef0b..ed826e9 100644
> > --- a/include/linux/gfp.h
> > +++ b/include/linux/gfp.h
> > @@ -67,6 +67,10 @@ struct vm_area_struct;
> > __GFP_HIGHMEM)
> >
> > #ifdef CONFIG_NUMA
> > +/*
> > + * NOTE: if the requested node is unpopulated (no memory), a THISNODE
> > + * request can go to other nodes due to the fallback list
>
> Change to
>
> Note: GFP_THISNODE allocates from the first available pgdat (== node
> structure) from the zonelist of a given node. The first pgdat may be the
> pgdat of another node if the node has no memory on its own.
Changed below.
gfp.h: GFP_THISNODE can go to other nodes if some are unpopulated
While testing my sysfs per-node hugepage allocator
(http://marc.info/?l=linux-mm&m=117935849517122&w=2), I found that an
alloc_pages_node(nid, GFP_THISNODE) request would sometimes return a
struct page such that page_to_nid(page) != nid. This was because, on
that particular machine, nodes 0 and 1 are populated and nodes 2 and 3
are not. When a page is requested get_page_from_freelist() relies on
zonelist->zones[0]->zone_pgdat indicating when THISNODE stops. But,
because, say, node 2 has no memory, the first zone_pgdat in the fallback
list points to a different node. Add a comment indicating that THISNODE
may not return pages on THISNODE if the node is unpopulated.
Am working on testing Lee/Anton's patch to add a node_populated_mask and
use that in the hugepage allocator path. But I think this may be a
problem anywhere THISNODE is used and memory is expected to come from
the requested node and nowhere else.
Reworked the comment based on feedback from Christoph Lameter.
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index ed826e9..996cf08 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -68,8 +68,10 @@ struct vm_area_struct;
#ifdef CONFIG_NUMA
/*
- * NOTE: if the requested node is unpopulated (no memory), a THISNODE
- * request can go to other nodes due to the fallback list
+ * NOTE: GFP_THISNODE allocates from the first available pgdat (== node
+ * structure) from the zonelist of the requested node. The first pgdat
+ * may be the pgdat of another node if the requested node has no memory
+ * on its own.
*/
#define GFP_THISNODE (__GFP_THISNODE | __GFP_NOWARN | __GFP_NORETRY)
#else
--
Nishanth Aravamudan <nacc@us.ibm.com>
IBM Linux Technology Center
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2007-06-07 20:59 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-06-07 15:04 [PATCH] gfp.h: GFP_THISNODE can go to other nodes if some are unpopulated Nishanth Aravamudan
2007-06-07 18:11 ` Christoph Lameter
2007-06-07 22:01 ` Nishanth Aravamudan [this message]
2007-06-07 22:05 ` [PATCH v2] " Christoph Lameter
2007-06-07 22:16 ` Nishanth Aravamudan
2007-06-11 12:49 ` Andy Whitcroft
2007-06-11 16:12 ` Christoph Lameter
2007-06-11 16:42 ` Christoph Lameter
2007-06-11 17:12 ` Nishanth Aravamudan
2007-06-11 18:29 ` Christoph Lameter
2007-06-11 18:46 ` Nishanth Aravamudan
2007-06-11 18:54 ` Christoph Lameter
2007-06-11 19:36 ` Nishanth Aravamudan
2007-06-11 19:43 ` Christoph Lameter
2007-06-11 20:18 ` Nishanth Aravamudan
2007-06-11 18:23 ` Lee Schermerhorn
2007-06-11 18:40 ` Christoph Lameter
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20070607220149.GC15776@us.ibm.com \
--to=nacc@us.ibm.com \
--cc=Lee.Schermerhorn@hp.com \
--cc=akpm@linux-foundation.org \
--cc=anton@samba.org \
--cc=apw@shadowen.org \
--cc=clameter@sgi.com \
--cc=linux-mm@kvack.org \
--cc=mel@csn.ul.ie \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.