From: Nishanth Aravamudan <nacc@us.ibm.com>
To: Christoph Lameter <clameter@sgi.com>
Cc: Andy Whitcroft <apw@shadowen.org>,
Lee.Schermerhorn@hp.com, ak@suse.de, anton@samba.org,
mel@csn.ul.ie, akpm@linux-foundation.org, linux-mm@kvack.org
Subject: Re: [PATCH v2] gfp.h: GFP_THISNODE can go to other nodes if some are unpopulated
Date: Mon, 11 Jun 2007 11:46:56 -0700 [thread overview]
Message-ID: <20070611184656.GA9920@us.ibm.com> (raw)
In-Reply-To: <Pine.LNX.4.64.0706111122010.18327@schroedinger.engr.sgi.com>
On 11.06.2007 [11:29:14 -0700], Christoph Lameter wrote:
> On Mon, 11 Jun 2007, Nishanth Aravamudan wrote:
>
> > These are the exact semantics, I expected. so I'll be happy to test/work
> > on these fixes.
> >
> > This would also make it unnecessary to add the populated checks in
> > various places, I think, as THISNODE will mean ONLYTHISNODE (and perhaps
> > should be renamed in the series).
>
> Here is a draft on how this could work:
>
> It seems that MPOL_BIND already has the fixes needed. Adding empty zones
> does not look that easy.
> So I added a check interleave_nodes to verify that the selected node has
> memory. If not skip it.
>
> This does not work for the address based interleaving for anonymous vmas.
> I am not sure what to do there. We could change the calculation of the
> node to be based only on nodes with memory and then skip the memoryless
> ones. I have only added a comment to describe its brokennes for now.
>
> Then there is the original issue of GFP_THISNODE. I added a check in
> alloc_pages_node to verify that the node has memory. If not then fail.
> This will fix the alloc_pages_node case but not the alloc_pages() case.
>
> In the alloc_pages() case we do not specify a node. Implicitly it is
> understood that we (in the case of no memory policy / cpuset options)
> allocate from the nearest node. So it may be argued there that the
> GFP_THISNODE behavior of taking the first node from the zonelist is okay.
>
> There is some reason to worry about the performance impact from adding the
> check in alloc_pages_node and interleave. If the code is left in
> alloc_pages_node then alloc_pages_node should be uninlined and moved to
> mempolicy.c
>
> Index: linux-2.6/mm/mempolicy.c
> ===================================================================
> --- linux-2.6.orig/mm/mempolicy.c 2007-06-11 11:13:09.000000000 -0700
> +++ linux-2.6/mm/mempolicy.c 2007-06-11 11:19:03.000000000 -0700
> @@ -1125,9 +1125,11 @@ static unsigned interleave_nodes(struct
> struct task_struct *me = current;
>
> nid = me->il_next;
> - next = next_node(nid, policy->v.nodes);
> - if (next >= MAX_NUMNODES)
> - next = first_node(policy->v.nodes);
> + do {
> + next = next_node(nid, policy->v.nodes);
> + if (next >= MAX_NUMNODES)
> + next = first_node(policy->v.nodes);
> + } while (!NODE_DATA(node)->present_pages);
If something like Lee/Anton's patch were to go in (which, as Lee pointed
out, I refreshed as Patch 1/3 in the series I posted a few days ago),
this would be
while(!node_populated(nid))
> me->il_next = next;
> return nid;
> }
> @@ -1191,6 +1193,11 @@ static inline unsigned interleave_nid(st
> * for huge pages, since vm_pgoff is in units of small
> * pages, we need to shift off the always 0 bits to get
> * a useful offset.
> + *
> + * For configurations with memoryless nodes this is broken
> + * since the allocation attempts on that node will fall
> + * back to other nodes and thus one neighboring node
> + * will be overallocated from.
> */
> BUG_ON(shift < PAGE_SHIFT);
> off = vma->vm_pgoff >> (shift - PAGE_SHIFT);
> Index: linux-2.6/include/linux/gfp.h
> ===================================================================
> --- linux-2.6.orig/include/linux/gfp.h 2007-06-11 11:19:32.000000000 -0700
> +++ linux-2.6/include/linux/gfp.h 2007-06-11 11:21:33.000000000 -0700
> @@ -134,6 +134,9 @@ static inline struct page *alloc_pages_n
> if (nid < 0)
> nid = numa_node_id();
>
> + if ((gfp_mask & __GFP_THISNODE) && !NODE_DATA(nid)->present_pages)
> + return NULL;
> +
similarly,
if ((gfp_mask & __GFP_THISNODE) && !node_populated(nid))
Presuming I understand everything correctly. Not sure which would be
preferred, or if perhaps node_populated, rather than using a nodemask
should just use NODE_DATA(nid)->present_pages?
Thanks,
Nish
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2007-06-11 18:48 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-06-07 15:04 [PATCH] gfp.h: GFP_THISNODE can go to other nodes if some are unpopulated Nishanth Aravamudan
2007-06-07 18:11 ` Christoph Lameter
2007-06-07 22:01 ` [PATCH v2] " Nishanth Aravamudan
2007-06-07 22:05 ` Christoph Lameter
2007-06-07 22:16 ` Nishanth Aravamudan
2007-06-11 12:49 ` Andy Whitcroft
2007-06-11 16:12 ` Christoph Lameter
2007-06-11 16:42 ` Christoph Lameter
2007-06-11 17:12 ` Nishanth Aravamudan
2007-06-11 18:29 ` Christoph Lameter
2007-06-11 18:46 ` Nishanth Aravamudan [this message]
2007-06-11 18:54 ` Christoph Lameter
2007-06-11 19:36 ` Nishanth Aravamudan
2007-06-11 19:43 ` Christoph Lameter
2007-06-11 20:18 ` Nishanth Aravamudan
2007-06-11 18:23 ` Lee Schermerhorn
2007-06-11 18:40 ` Christoph Lameter
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20070611184656.GA9920@us.ibm.com \
--to=nacc@us.ibm.com \
--cc=Lee.Schermerhorn@hp.com \
--cc=ak@suse.de \
--cc=akpm@linux-foundation.org \
--cc=anton@samba.org \
--cc=apw@shadowen.org \
--cc=clameter@sgi.com \
--cc=linux-mm@kvack.org \
--cc=mel@csn.ul.ie \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.