linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Mel Gorman <mel@csn.ul.ie>
To: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: linux-mm@kvack.org, akpm@linux-foundation.org,
	Nishanth Aravamudan <nacc@us.ibm.com>,
	Adam Litke <agl@us.ibm.com>, Andy Whitcroft <apw@canonical.com>,
	eric.whitney@hp.com
Subject: Re: [PATCH 0/5] Huge Pages Nodes Allowed
Date: Wed, 17 Jun 2009 14:02:16 +0100	[thread overview]
Message-ID: <20090617130216.GF28529@csn.ul.ie> (raw)
In-Reply-To: <20090616135228.25248.22018.sendpatchset@lts-notebook>

On Tue, Jun 16, 2009 at 09:52:28AM -0400, Lee Schermerhorn wrote:
> Because of assymmetries in some NUMA platforms, and "interesting"
> topologies emerging in the "scale up x86" world, we have need for
> better control over the placement of "fresh huge pages".  A while
> back Nish Aravamundan floated a series of patches to add per node
> controls for allocating pages to the hugepage pool and removing
> them.  Nish apparently moved on to other tasks before those patches
> were accepted.  I have kept a copy of Nish's patches and have
> intended to rebase and test them and resubmit.
> 
> In an [off-list] exchange with Mel Gorman, who admits to knowledge
> in the huge pages area, I asked his opinion of per node controls
> for huge pages and he suggested another approach:  using the mempolicy
> of the task that changes nr_hugepages to constrain the fresh huge
> page allocations.  I considered this approach but it seemed to me
> to be a misuse of mempolicy for populating the huge pages free
> pool. 

Why would it be a misuse? Fundamentally, the huge page pools are being
filled by the current process when nr_hugepages is being used. Or are
you concerned about the specification of hugepages on the kernel command
line?

> Interleave policy doesn't have same "this node" semantics
> that we want

By "this node" semantics, do you mean allocating from one specific node?
In that case, why would specifying a nodemask of just one node not be
sufficient?

> and bind policy would require constructing a custom
> node mask for node as well as addressing OOM, which we don't want
> during fresh huge page allocation. 

Would the required mask not already be setup when the process set the
policy? OOM is not a major concern, it doesn't trigger for failed
hugepage allocations.

> One could derive a node mask
> of allowed nodes for huge pages from the mempolicy of the task
> that is modifying nr_hugepages and use that for fresh huge pages
> with GFP_THISNODE.  However, if we're not going to use mempolicy
> directly--e.g., via alloc_page_current() or alloc_page_vma() [with
> yet another on-stack pseudo-vma :(]--I thought it cleaner to
> define a "nodes allowed" nodemask for populating the [persistent]
> huge pages free pool.
> 

How about adding alloc_page_mempolicy() that takes the explicit mempolicy
you need?

> This patch series introduces a [per hugepage size] "sysctl",
> hugepages_nodes_allowed, that specifies a nodemask to constrain
> the allocation of persistent, fresh huge pages.   The nodemask
> may be specified by a sysctl, a sysfs huge pages attribute and
> on the kernel boot command line.  
> 
> The series includes a patch to free hugepages from the pool in a
> "round robin" fashion, interleaved across all on-line nodes to
> balance the hugepage pool across nodes.  Nish had a patch to do
> this, too.
> 
> Together, these changes don't provide the fine grain of control
> that per node attributes would. 

I'm failing to understand at the moment why mem policies set by numactl
would not do the job for allocation at least. Freeing is a different problem.

> Specifically, there is no easy
> way to reduce the persistent huge page count for a specific node.
> I think the degree of control provided by these patches is the
> minimal necessary and sufficient for managing the persistent the
> huge page pool.  However, with a bit more reorganization,  we
> could implement per node controls if others would find that
> useful.
> 
> For more info, see the patch descriptions and the updated kernel
> hugepages documentation.
> 


-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  parent reply	other threads:[~2009-06-17 13:02 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-06-16 13:52 [PATCH 0/5] Huge Pages Nodes Allowed Lee Schermerhorn
2009-06-16 13:52 ` [PATCH 1/5] Free huge pages round robin to balance across nodes Lee Schermerhorn
2009-06-17 13:18   ` Mel Gorman
2009-06-17 17:16     ` Lee Schermerhorn
2009-06-18 19:08       ` David Rientjes
2009-06-16 13:52 ` [PATCH 2/5] Add nodes_allowed members to hugepages hstate struct Lee Schermerhorn
2009-06-17 13:35   ` Mel Gorman
2009-06-17 17:38     ` Lee Schermerhorn
2009-06-18  9:17       ` Mel Gorman
2009-06-16 13:53 ` [PATCH 3/5] Use per hstate nodes_allowed to constrain huge page allocation Lee Schermerhorn
2009-06-17 13:39   ` Mel Gorman
2009-06-17 17:47     ` Lee Schermerhorn
2009-06-18  9:18       ` Mel Gorman
2009-06-16 13:53 ` [PATCH 4/5] Add sysctl for default hstate nodes_allowed Lee Schermerhorn
2009-06-17 13:41   ` Mel Gorman
2009-06-17 17:52     ` Lee Schermerhorn
2009-06-18  9:19       ` Mel Gorman
2009-06-16 13:53 ` [PATCH 5/5] Update huge pages kernel documentation Lee Schermerhorn
2009-06-18 18:49   ` David Rientjes
2009-06-18 19:06     ` Lee Schermerhorn
2009-06-17 13:02 ` Mel Gorman [this message]
2009-06-17 17:15   ` [PATCH 0/5] Huge Pages Nodes Allowed Lee Schermerhorn
2009-06-18  9:33     ` Mel Gorman
2009-06-18 14:46       ` Lee Schermerhorn
2009-06-18 15:00         ` Mel Gorman
2009-06-18 19:08     ` David Rientjes
2009-06-24  7:11       ` David Rientjes
2009-06-24 11:25         ` Lee Schermerhorn
2009-06-24 22:26           ` David Rientjes
2009-06-25  2:14             ` Lee Schermerhorn
2009-06-25 19:22               ` David Rientjes

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090617130216.GF28529@csn.ul.ie \
    --to=mel@csn.ul.ie \
    --cc=agl@us.ibm.com \
    --cc=akpm@linux-foundation.org \
    --cc=apw@canonical.com \
    --cc=eric.whitney@hp.com \
    --cc=lee.schermerhorn@hp.com \
    --cc=linux-mm@kvack.org \
    --cc=nacc@us.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).