linux-numa.vger.kernel.org archive mirror
From: David Rientjes <rientjes@google.com>
To: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: linux-mm@kvack.org, linux-numa@vger.kernel.org,
	akpm@linux-foundation.org, Mel Gorman <mel@csn.ul.ie>,
	Greg KH <gregkh@suse.de>, Nishanth Aravamudan <nacc@us.ibm.com>,
	andi@firstfloor.org, Adam Litke <agl@us.ibm.com>,
	Andy Whitcroft <apw@canonical.com>,
	eric.whitney@hp.com
Subject: Re: [PATCH 4/4] hugetlb: add per node hstate attributes
Date: Thu, 30 Jul 2009 12:39:09 -0700
Message-ID: <9ec263480907301239i4f6a6973m494f4b44770660dc@mail.gmail.com>
In-Reply-To: <20090729181205.23716.25002.sendpatchset@localhost.localdomain>

On Wed, Jul 29, 2009 at 11:12 AM, Lee
Schermerhorn<lee.schermerhorn@hp.com> wrote:
> PATCH/RFC 4/4 hugetlb:  register per node hugepages attributes
>
> Against: 2.6.31-rc3-mmotm-090716-1432
> atop the previously posted alloc_bootmem_hugepages fix.
> [http://marc.info/?l=linux-mm&m=124775468226290&w=4]
>
> This patch adds the per huge page size control/query attributes
> to the per node sysdevs:
>
> /sys/devices/system/node/node<ID>/hugepages/hugepages-<size>/
>        nr_hugepages       - r/w
>        free_huge_pages    - r/o
>        surplus_huge_pages - r/o
>
> The patch attempts to re-use/share as much of the existing
> global hstate attribute initialization and handling as possible.
> Throughout, a node id < 0 indicates global hstate parameters.
>
> Note:  computation of "min_count" in set_max_huge_pages() for a
> specified node needs careful review.
>
> Issue:  dependency of base driver [node] on hugetlbfs module.
> We want to keep all of the hstate attribute registration and handling
> in the hugetlb module.  However, we need to call into this code to
> register the per node hstate attributes on node hot plug.
>
> With this patch:
>
> (me):ls /sys/devices/system/node/node0/hugepages/hugepages-2048kB
> ./  ../  free_hugepages  nr_hugepages  surplus_hugepages
>
> Starting from:
> Node 0 HugePages_Total:     0
> Node 0 HugePages_Free:      0
> Node 0 HugePages_Surp:      0
> Node 1 HugePages_Total:     0
> Node 1 HugePages_Free:      0
> Node 1 HugePages_Surp:      0
> Node 2 HugePages_Total:     0
> Node 2 HugePages_Free:      0
> Node 2 HugePages_Surp:      0
> Node 3 HugePages_Total:     0
> Node 3 HugePages_Free:      0
> Node 3 HugePages_Surp:      0
> vm.nr_hugepages = 0
>
> Allocate 16 persistent huge pages on node 2:
> (me):echo 16 >/sys/devices/system/node/node2/hugepages/hugepages-2048kB/nr_hugepages
>
> Yields:
> Node 0 HugePages_Total:     0
> Node 0 HugePages_Free:      0
> Node 0 HugePages_Surp:      0
> Node 1 HugePages_Total:     0
> Node 1 HugePages_Free:      0
> Node 1 HugePages_Surp:      0
> Node 2 HugePages_Total:    16
> Node 2 HugePages_Free:     16
> Node 2 HugePages_Surp:      0
> Node 3 HugePages_Total:     0
> Node 3 HugePages_Free:      0
> Node 3 HugePages_Surp:      0
> vm.nr_hugepages = 16
>
> Global controls work as expected--reduce pool to 8 persistent huge pages:
> (me):echo 8 >/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
>
> Node 0 HugePages_Total:     0
> Node 0 HugePages_Free:      0
> Node 0 HugePages_Surp:      0
> Node 1 HugePages_Total:     0
> Node 1 HugePages_Free:      0
> Node 1 HugePages_Surp:      0
> Node 2 HugePages_Total:     8
> Node 2 HugePages_Free:      8
> Node 2 HugePages_Surp:      0
> Node 3 HugePages_Total:     0
> Node 3 HugePages_Free:      0
> Node 3 HugePages_Surp:      0
>
> Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
>

Thank you very much for doing this.

Google is going to need this support regardless of what finally gets
merged into mainline, so I'm thrilled you've implemented this version.

I hugely (get it? hugely :) favor this approach because it's much
simpler to reserve hugepages through this interface than through a
mempolicy-based approach once hugepages have already been allocated.
For cpusets users in particular, jobs typically get allocated on the
subset of nodes that the application requires, and they don't last
for the duration of the machine's uptime.  When a job exits and the
nodes need to be reallocated to a new cpuset, it may be a very
different set of mems based on the memory requirements or interleave
optimizations of the new job.  Allocating resources such as hugepages
is possible in this scenario via mempolicies, but it would require a
temporary mempolicy from which to allocate the additional hugepages.
That seems like an unnecessary requirement, especially if the job
scheduler that is governing hugepage allocations already has a
mempolicy of its own.

So it's my opinion that the mempolicy-based approach is very
appropriate for tasks that allocate hugepages themselves.  Other
users, particularly cpusets users, however, would require
preallocation of hugepages before a job is scheduled, in which case
the job scheduler would need a temporary mempolicy.  That seems like
an inconvenience when the entire state of the system's hugepages could
easily be governed with the per-node hstate attributes and a slightly
modified user library.
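
To make the job scheduler case concrete, here is a rough sketch of what
such a helper might look like.  It only assumes the sysfs layout quoted
above from the patch description; the function names, the `root`
parameter, and the read-back step are my own illustration, not part of
the patch or of any existing library.

```python
import os

# Per-node layout as given in the patch description. The root is a
# parameter (an assumption of this sketch) so the helper can be pointed
# at a scratch directory instead of the real sysfs tree.
SYSFS_NODE_ROOT = "/sys/devices/system/node"


def node_hstate_path(node, size_kb, attr, root=SYSFS_NODE_ROOT):
    """Build the path of one per-node hstate attribute (hypothetical helper)."""
    return os.path.join(
        root, f"node{node}", "hugepages", f"hugepages-{size_kb}kB", attr
    )


def set_node_hugepages(node, count, size_kb=2048, root=SYSFS_NODE_ROOT):
    """Request `count` persistent huge pages on `node` and return the
    value read back afterwards, which may differ from the request."""
    path = node_hstate_path(node, size_kb, "nr_hugepages", root)
    with open(path, "w") as f:
        f.write(f"{count}\n")
    with open(path) as f:
        return int(f.read().strip())


def reserve_for_job(per_node_counts, size_kb=2048, root=SYSFS_NODE_ROOT):
    """Apply a {node: count} plan; no mempolicy is involved at any point."""
    return {
        node: set_node_hugepages(node, count, size_kb, root)
        for node, count in sorted(per_node_counts.items())
    }
```

With something like this, a scheduler preparing node 2 for a new job
would call reserve_for_job({2: 16}), which corresponds to the
"echo 16 > .../node2/.../nr_hugepages" step in the quoted example, and
compare the returned counts against what it asked for.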

Thread overview: 18+ messages
2009-07-29 18:11 [PATCH 0/4] hugetlb: V1 Per Node Hugepages attributes Lee Schermerhorn
2009-07-29 18:11 ` [PATCH 1/4] hugetlb: rework hstate_next_node_* functions Lee Schermerhorn
2009-07-29 18:11 ` [PATCH 2/4] hugetlb: numafy several functions Lee Schermerhorn
2009-07-29 18:11 ` [PATCH 3/4] hugetlb: add private bit-field to kobject structure Lee Schermerhorn
2009-07-29 18:25   ` Greg KH
2009-07-31 18:59     ` Lee Schermerhorn
2009-07-29 18:12 ` [PATCH 4/4] hugetlb: add per node hstate attributes Lee Schermerhorn
2009-07-30 19:39   ` David Rientjes [this message]
2009-07-31 10:36     ` Mel Gorman
2009-07-31 19:10       ` Lee Schermerhorn
2009-08-14 22:38         ` David Rientjes
2009-08-14 23:08           ` Andrew Morton
2009-08-14 23:19             ` Greg KH
2009-08-14 23:53             ` David Rientjes
2009-08-17  1:10               ` Lee Schermerhorn
2009-08-17 10:07                 ` David Rientjes
2009-08-15 10:08           ` Mel Gorman
2009-07-31 19:55       ` David Rientjes
