From: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
To: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: Mel Gorman <mel@skynet.ie>, Christoph Lameter <clameter@sgi.com>,
	akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, rientjes@google.com,
	kamezawa.hiroyu@jp.fujitsu.com
Subject: Re: [PATCH 6/6] Use one zonelist that is filtered by nodemask
Date: Fri, 09 Nov 2007 12:18:52 -0500
Message-ID: <1194628732.5296.14.camel@localhost>
In-Reply-To: <20071109164537.GG7507@us.ibm.com>

On Fri, 2007-11-09 at 08:45 -0800, Nishanth Aravamudan wrote:
> On 09.11.2007 [16:14:55 +0000], Mel Gorman wrote:
> > On (09/11/07 07:45), Christoph Lameter didst pronounce:
> > > On Fri, 9 Nov 2007, Mel Gorman wrote:
> > > 
> > > >  struct page * fastcall
> > > >  __alloc_pages(gfp_t gfp_mask, unsigned int order,
> > > >  		struct zonelist *zonelist)
> > > >  {
> > > > +	/*
> > > > +	 * Use a temporary nodemask for __GFP_THISNODE allocations. If the
> > > > +	 * cost of allocating on the stack or the stack usage becomes
> > > > +	 * noticeable, allocate the nodemasks per node at boot or compile time
> > > > +	 */
> > > > +	if (unlikely(gfp_mask & __GFP_THISNODE)) {
> > > > +		nodemask_t nodemask;
> > > 
> > > Hmmm.. This places a potentially big structure on the stack. nodemask can 
> > > contain up to 1024 bits which means 128 bytes. Maybe keep an array of 
> > > gfp_thisnode nodemasks (node_nodemask?) and use node_nodemask[nid]?
> > > 
> > 
> > That is what I was hinting at in the comment as a possible solution.
> > 
> > > > +
> > > > +		return __alloc_pages_internal(gfp_mask, order,
> > > > +			zonelist, nodemask_thisnode(numa_node_id(), &nodemask));
> > > 
> > > Argh.... GFP_THISNODE must use the nid passed to alloc_pages_node()
> > > and *not* the local numa node id. This will work only if the node
> > > specified to alloc_pages_node() is -1.
> > > 
> > 
> > alloc_pages_node() calls __alloc_pages_nodemask(), though, whereas this
> > function, if I'm reading it right, is called without a node id. Given no
> > other details on the nid, the current one seemed a logical choice.
> 
> Yeah, I guess the context here matters (and is a little hard to follow
> because there are a few places that change in different ways here):
> 
> For allocating pages from a particular node (GFP_THISNODE with nid),
> the nid clearly must be specified. This only happens with
> alloc_pages_node(), AFAICT. So, in that interface, the right thing is
> done and the appropriate nodemask will be built.

I agree.  In an earlier patch, Mel was ignoring nid and using
numa_node_id() here.  This was causing your [Nish's] hugetlb pool
allocation patches to fail.  Mel fixed that around 9 Oct 2007.
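
To make the nid point concrete, here is a minimal user-space sketch in C, not
kernel code.  It models the per-node mask table Christoph suggested
(node_nodemask[nid]) and picks the mask from the caller's nid, falling back
to the local node only when nid is -1.  Everything prefixed model_ or MODEL_
is a hypothetical name invented for this illustration, and the 1024-node
limit is an assumption taken from the figure quoted above.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Assumed limit, chosen to match the 1024-bit figure quoted above. */
#define MODEL_MAX_NUMNODES 1024
#define MODEL_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define MODEL_NODEMASK_LONGS (MODEL_MAX_NUMNODES / MODEL_BITS_PER_LONG)

/* User-space stand-in for nodemask_t: one bit per possible node. */
typedef struct {
	unsigned long bits[MODEL_NODEMASK_LONGS];
} model_nodemask_t;

/* Hypothetical per-node table: entry nid holds a mask with only bit nid set. */
static model_nodemask_t model_node_nodemask[MODEL_MAX_NUMNODES];

/* Return 1 if node nid is set in mask, 0 otherwise. */
static int model_node_isset(int nid, const model_nodemask_t *mask)
{
	assert(nid >= 0 && nid < MODEL_MAX_NUMNODES);
	return (mask->bits[nid / MODEL_BITS_PER_LONG] >>
		(nid % MODEL_BITS_PER_LONG)) & 1UL;
}

/* Fill the table once, so that no caller needs a mask on its stack. */
static void model_init_node_nodemasks(void)
{
	memset(model_node_nodemask, 0, sizeof(model_node_nodemask));
	for (int nid = 0; nid < MODEL_MAX_NUMNODES; nid++)
		model_node_nodemask[nid].bits[nid / MODEL_BITS_PER_LONG] =
			1UL << (nid % MODEL_BITS_PER_LONG);
}

/* Use the caller's nid; fall back to the local node only when nid is -1. */
static const model_nodemask_t *model_thisnode_mask(int nid, int local_node)
{
	if (nid == -1)
		nid = local_node;
	assert(nid >= 0 && nid < MODEL_MAX_NUMNODES);
	return &model_node_nodemask[nid];
}
```

With this shape, a request for node 3 made while running on node 0 gets a
mask containing only node 3, which is what a per-node pool allocation needs.
The cost is a static table of one 128-byte mask per possible node in place of
a 128-byte mask on the stack of each call.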

> 
> On the other hand, if we call alloc_pages() with GFP_THISNODE set, there
> is no nid to base the allocation on, so we "fall back" to numa_node_id()
> [ almost like the nid had been specified as -1 ].
> 
> So I guess this is logical -- but I wonder, do we have any callers of
> alloc_pages(GFP_THISNODE) ? It seems like an odd thing to do, when
> alloc_pages_node() exists?

I don't know whether any current callers do this, but absent documentation
specifying otherwise, Mel's implementation matches the behavior I'd
expect if I DID call alloc_pages() with __GFP_THISNODE.  However, we could
specify that __GFP_THISNODE is ignored in __alloc_pages() and recommend
calling alloc_pages_node() with numa_node_id() as the nid parameter to
get that behavior.  This would eliminate the check for __GFP_THISNODE in
__alloc_pages(): just mask it off before calling down to
__alloc_pages_internal().
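
A rough user-space model of that alternative, in C, may help when weighing
it.  It is only a sketch under stated assumptions: the flag values, the
model_request struct, and every model_ name are hypothetical and exist only
for illustration; they are not the kernel's definitions.  The node-less entry
point strips the THISNODE bit, while the node-aware entry point keeps it and
applies it to the nid it was given.

```c
#include <assert.h>

typedef unsigned int model_gfp_t;

/* Hypothetical flag bits; the real __GFP_* values differ. */
#define MODEL_GFP_THISNODE 0x1u
#define MODEL_GFP_OTHER    0x2u

/* What the internal allocator would be asked to do. */
struct model_request {
	model_gfp_t gfp_mask;	/* flags as seen by the internal allocator */
	int preferred_nid;	/* node whose zonelist is walked first */
	int strict;		/* 1: allocate from preferred_nid only */
};

/* Stands in for numa_node_id(); settable so the model can be exercised. */
static int model_local_node;

/* Analogue of __alloc_pages_internal(): records the request it receives. */
static struct model_request
model_alloc_pages_internal(model_gfp_t gfp_mask, int nid)
{
	struct model_request req;

	assert(nid >= 0);
	req.gfp_mask = gfp_mask;
	req.preferred_nid = nid;
	req.strict = (gfp_mask & MODEL_GFP_THISNODE) != 0;
	return req;
}

/* Analogue of __alloc_pages(): no nid available, so THISNODE is masked off. */
static struct model_request model_alloc_pages(model_gfp_t gfp_mask)
{
	return model_alloc_pages_internal(gfp_mask & ~MODEL_GFP_THISNODE,
					  model_local_node);
}

/* Analogue of alloc_pages_node(): honors THISNODE against the given nid. */
static struct model_request
model_alloc_pages_node(int nid, model_gfp_t gfp_mask)
{
	if (nid < 0)
		nid = model_local_node;
	return model_alloc_pages_internal(gfp_mask, nid);
}
```

In this model, a caller that wants a strict local allocation writes
model_alloc_pages_node(model_local_node, MODEL_GFP_THISNODE), and the
node-less path needs no THISNODE branch at all.  The trade-off is that
passing the flag to the node-less entry point is dropped silently, so the
rule would have to be documented.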

Does this make sense?

Lee




