From: Mel Gorman <mel@csn.ul.ie>
To: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: linux-mm@kvack.org, akpm@linux-foundation.org,
	Nishanth Aravamudan <nacc@us.ibm.com>,
	Adam Litke <agl@us.ibm.com>, Andy Whitcroft <apw@canonical.com>,
	eric.whitney@hp.com
Subject: Re: [PATCH 0/5] Huge Pages Nodes Allowed
Date: Wed, 17 Jun 2009 14:02:16 +0100
Message-ID: <20090617130216.GF28529@csn.ul.ie>
In-Reply-To: <20090616135228.25248.22018.sendpatchset@lts-notebook>

On Tue, Jun 16, 2009 at 09:52:28AM -0400, Lee Schermerhorn wrote:
> Because of asymmetries in some NUMA platforms, and "interesting"
> topologies emerging in the "scale up x86" world, we have need for
> better control over the placement of "fresh huge pages".  A while
> back Nish Aravamudan floated a series of patches to add per node
> controls for allocating pages to the hugepage pool and removing
> them.  Nish apparently moved on to other tasks before those patches
> were accepted.  I have kept a copy of Nish's patches and have
> intended to rebase and test them and resubmit.
> 
> In an [off-list] exchange with Mel Gorman, who admits to knowledge
> in the huge pages area, I asked his opinion of per node controls
> for huge pages and he suggested another approach:  using the mempolicy
> of the task that changes nr_hugepages to constrain the fresh huge
> page allocations.  I considered this approach but it seemed to me
> to be a misuse of mempolicy for populating the huge pages free
> pool. 

Why would it be a misuse? Fundamentally, the huge page pools are being
filled by the current process when nr_hugepages is being used. Or are
you concerned about the specification of hugepages on the kernel command
line?

> Interleave policy doesn't have same "this node" semantics
> that we want

By "this node" semantics, do you mean allocating from one specific node?
In that case, why would specifying a nodemask of just one node not be
sufficient?
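
To make the one-node nodemask concrete, here is a minimal userspace sketch,
assuming the mask is handed to set_mempolicy(2) with a bind policy. The
helper names (single_node_mask, bind_task_to_node), the 4-word mask size and
the locally defined SKETCH_MPOL_BIND constant are assumptions made for
illustration. Whether the hugepage pool allocation then honours the task's
policy is exactly what is being discussed in this thread, so treat it as an
illustration of the mask only.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

#if defined(__linux__)
#include <sys/syscall.h>
#include <unistd.h>
#endif

/* Assumption: mirrors the value of MPOL_BIND in <linux/mempolicy.h>. */
#define SKETCH_MPOL_BIND 2
#define BITS_PER_ULONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Clear 'mask' (an array of 'nlongs' words) and set only the bit for 'node'.
 * Returns 0 on success, -1 if 'node' does not fit in the mask.
 */
int single_node_mask(unsigned long *mask, size_t nlongs, unsigned int node)
{
	assert(mask != NULL);
	if (node >= nlongs * BITS_PER_ULONG)
		return -1;
	memset(mask, 0, nlongs * sizeof(*mask));
	mask[node / BITS_PER_ULONG] = 1UL << (node % BITS_PER_ULONG);
	return 0;
}

/*
 * Hypothetical helper: bind the calling task's memory policy to one node.
 * Returns the syscall result, or -1 where the syscall is unavailable.
 */
long bind_task_to_node(unsigned int node)
{
#if defined(__linux__) && defined(SYS_set_mempolicy)
	unsigned long mask[4];

	if (single_node_mask(mask, 4, node) != 0)
		return -1;
	/* The last argument is the number of bits in the mask. */
	return syscall(SYS_set_mempolicy, SKETCH_MPOL_BIND, mask,
		       (unsigned long)(4 * BITS_PER_ULONG));
#else
	(void)node;
	return -1;
#endif
}
```

A task would call bind_task_to_node(1) before writing to nr_hugepages if it
wanted the fresh pages taken from node 1 only, under the assumption above.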

> and bind policy would require constructing a custom
> node mask for node as well as addressing OOM, which we don't want
> during fresh huge page allocation. 

Would the required mask not already be set up when the process set the
policy? OOM is not a major concern; it doesn't trigger for failed
hugepage allocations.

> One could derive a node mask
> of allowed nodes for huge pages from the mempolicy of the task
> that is modifying nr_hugepages and use that for fresh huge pages
> with GFP_THISNODE.  However, if we're not going to use mempolicy
> directly--e.g., via alloc_page_current() or alloc_page_vma() [with
> yet another on-stack pseudo-vma :(]--I thought it cleaner to
> define a "nodes allowed" nodemask for populating the [persistent]
> huge pages free pool.
> 

How about adding alloc_page_mempolicy() that takes the explicit mempolicy
you need?

> This patch series introduces a [per hugepage size] "sysctl",
> hugepages_nodes_allowed, that specifies a nodemask to constrain
> the allocation of persistent, fresh huge pages.   The nodemask
> may be specified by a sysctl, a sysfs huge pages attribute and
> on the kernel boot command line.  
> 
> The series includes a patch to free hugepages from the pool in a
> "round robin" fashion, interleaved across all on-line nodes to
> balance the hugepage pool across nodes.  Nish had a patch to do
> this, too.
> 
> Together, these changes don't provide the fine grain of control
> that per node attributes would. 

I'm failing to understand at the moment why mem policies set by numactl
would not do the job for allocation at least. Freeing is a different problem.
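
For the freeing side, the "round robin" behaviour described in the quoted
text can be modelled in userspace roughly as follows. This is a sketch under
stated assumptions, not the code from the series: struct pool_model,
MAX_NODES and both function names are hypothetical, and the per-node free
counts are plain counters rather than real free lists.

```c
#include <stddef.h>

#define MAX_NODES 8	/* hypothetical upper bound for this sketch */

/* Hypothetical model of the pool: free huge pages held on each node. */
struct pool_model {
	unsigned int free_pages[MAX_NODES];
	unsigned int online_mask;	/* bit n set => node n is online */
	int next_node;			/* where the next free attempt starts */
};

/* Return the first online node at or after 'start', wrapping; -1 if none. */
int next_online_node(const struct pool_model *p, int start)
{
	for (int i = 0; i < MAX_NODES; i++) {
		int nid = (start + i) % MAX_NODES;

		if (p->online_mask & (1u << nid))
			return nid;
	}
	return -1;
}

/*
 * Free up to 'count' pages, taking at most one page per node visit so the
 * pool shrinks evenly across online nodes. Returns the number freed.
 */
unsigned int free_round_robin(struct pool_model *p, unsigned int count)
{
	unsigned int freed = 0;
	unsigned int empty_visits = 0;	/* consecutive visits freeing nothing */

	while (freed < count && empty_visits < MAX_NODES) {
		int nid = next_online_node(p, p->next_node);

		if (nid < 0)
			break;		/* no online nodes at all */
		p->next_node = (nid + 1) % MAX_NODES;
		if (p->free_pages[nid] > 0) {
			p->free_pages[nid]--;
			freed++;
			empty_visits = 0;
		} else {
			empty_visits++;
		}
	}
	return freed;
}
```

The loop stops early once every online node has been visited without
finding a free page, so a request larger than the pool simply frees what is
there.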

> Specifically, there is no easy
> way to reduce the persistent huge page count for a specific node.
> I think the degree of control provided by these patches is the
> minimal necessary and sufficient for managing the persistent
> huge page pool.  However, with a bit more reorganization, we
> could implement per node controls if others would find that
> useful.
> 
> For more info, see the patch descriptions and the updated kernel
> hugepages documentation.
> 


-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

