From: Mel Gorman <mel@csn.ul.ie>
To: David Rientjes <rientjes@google.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>,
linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
Nishanth Aravamudan <nacc@us.ibm.com>,
linux-numa@vger.kernel.org, Adam Litke <agl@us.ibm.com>,
Andy Whitcroft <apw@canonical.com>,
Eric Whitney <eric.whitney@hp.com>,
Randy Dunlap <randy.dunlap@oracle.com>
Subject: Re: [PATCH 6/6] hugetlb: update hugetlb documentation for mempolicy based management.
Date: Mon, 14 Sep 2009 14:33:30 +0100 [thread overview]
Message-ID: <20090914133329.GC11778@csn.ul.ie> (raw)
In-Reply-To: <alpine.DEB.1.00.0909111507540.22083@chino.kir.corp.google.com>
On Fri, Sep 11, 2009 at 03:27:30PM -0700, David Rientjes wrote:
> On Thu, 10 Sep 2009, Mel Gorman wrote:
>
> > > Would you explain why introducing a new mempolicy flag, MPOL_F_HUGEPAGES,
> > > and only using the new behavior when this is set would be inconsistent or
> > > inadvisible?
> >
> > I already explained this. The interface in numactl would look weird. There
> > would be an --interleave switch and a --hugepages-interleave that only
> > applies to nr_hugepages. The smarts could be in hugeadm to apply the mask
> > when --pool-pages-min is specified but that wouldn't help scripts that are
> > still using echo.
> >
>
> I don't think we need to address the scripts that are currently using echo
> since they're (hopefully) written to the kernel implementation, i.e. no
> mempolicy restriction on writing to nr_hugepages.
>
Ok.
> > I hate to have to do this, but how about nr_hugepages which acts
> > system-wide as it did traditionally and nr_hugepages_mempolicy that obeys
> > policies? Something like the following untested patch. It would be fairly
> > trivial for me to implement a --obey-mempolicies switch for hugeadm which
> > works in conjunction with --pool--pages-min and less likely to cause confusion
> > than --hugepages-interleave in numactl.
> >
>
> I like it.
>
Ok, when I get this tested, I'll sent it as a follow-on patch to Lee's
for proper incorporation.
> > Sorry the patch is untested. I can't hold of a NUMA machine at the moment
> > and fake NUMA support sucks far worse than I expected it to.
> >
>
> Hmm, I rewrote most of fake NUMA a couple years ago. What problems are
> you having with it?
>
On PPC64, the parameters behave differently. I couldn't convince it to
create more than one NUMA node. On x86-64, the NUMA nodes appeared to
exist and would be visible on /proc/buddyinfo for example but the sysfs
directories for the fake nodes were not created so nr_hugepages couldn't
be examined on a per-node basis for example.
> > ==== BEGIN PATCH ====
> >
> > [PATCH] Optionally use a memory policy when tuning the size of the static hugepage pool
> >
> > Patch "derive huge pages nodes allowed from task mempolicy" brought
> > huge page support more in line with the core VM in that tuning the size
> > of the static huge page pool would obey memory policies. Using this,
> > administrators could interleave allocation of huge pages from a subset
> > of nodes. This is consistent with how dynamic hugepage pool resizing
> > works and how hugepages get allocated to applications at run-time.
> >
> > However, it was pointed out that scripts may exist that depend on being
> > able to drain all hugepages via /proc/sys/vm/nr_hugepages from processes
> > that are running within a memory policy. This patch adds
> > /proc/sys/vm/nr_hugepages_mempolicy which when written to will obey
> > memory policies. /proc/sys/vm/nr_hugepages continues then to be a
> > system-wide tunable regardless of memory policy.
> >
> > Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> > ---
> > include/linux/hugetlb.h | 1 +
> > kernel/sysctl.c | 11 +++++++++++
> > mm/hugetlb.c | 35 ++++++++++++++++++++++++++++++++---
> > 3 files changed, 44 insertions(+), 3 deletions(-)
> >
>
> It'll need an update to Documentation/vm/hugetlb.txt, but this can
> probably be done in one of Lee's patches that edits the same file when he
> reposts.
>
Agreed.
> > diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> > index fcb1677..fc3a659 100644
> > --- a/include/linux/hugetlb.h
> > +++ b/include/linux/hugetlb.h
> > @@ -21,6 +21,7 @@ static inline int is_vm_hugetlb_page(struct vm_area_struct *vma)
> >
> > void reset_vma_resv_huge_pages(struct vm_area_struct *vma);
> > int hugetlb_sysctl_handler(struct ctl_table *, int, void __user *, size_t *, loff_t *);
> > +int hugetlb_mempolicy_sysctl_handler(struct ctl_table *, int, void __user *, size_t *, loff_t *);
> > int hugetlb_overcommit_handler(struct ctl_table *, int, void __user *, size_t *, loff_t *);
> > int hugetlb_treat_movable_handler(struct ctl_table *, int, void __user *, size_t *, loff_t *);
> > int copy_hugetlb_page_range(struct mm_struct *, struct mm_struct *, struct vm_area_struct *);
> > diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> > index 8bac3f5..0637655 100644
> > --- a/kernel/sysctl.c
> > +++ b/kernel/sysctl.c
> > @@ -1171,6 +1171,17 @@ static struct ctl_table vm_table[] = {
> > .extra1 = (void *)&hugetlb_zero,
> > .extra2 = (void *)&hugetlb_infinity,
> > },
> > +#ifdef CONFIG_NUMA
> > + {
> > + .procname = "nr_hugepages_mempolicy",
> > + .data = NULL,
> > + .maxlen = sizeof(unsigned long),
> > + .mode = 0644,
> > + .proc_handler = &hugetlb_mempolicy_sysctl_handler,
> > + .extra1 = (void *)&hugetlb_zero,
> > + .extra2 = (void *)&hugetlb_infinity,
> > + },
> > +#endif
> > {
> > .ctl_name = VM_HUGETLB_GROUP,
> > .procname = "hugetlb_shm_group",
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index 83decd6..68abef0 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -1244,6 +1244,7 @@ static int adjust_pool_surplus(struct hstate *h, nodemask_t *nodes_allowed,
> > return ret;
> > }
> >
> > +#define NUMA_NO_NODE_OBEY_MEMPOLICY (-2)
> > #define persistent_huge_pages(h) (h->nr_huge_pages - h->surplus_huge_pages)
> > static unsigned long set_max_huge_pages(struct hstate *h, unsigned long count,
> > int nid)
>
> I think it would be possible to avoid adding NUMA_NO_NODE_OBEY_MEMPOLICY
> if the nodemask was allocated in the sysctl handler instead and passing it
> into set_max_huge_pages() instead of a nid. Lee, what do you think?
>
> Other than that, I like this approach because it avoids the potential for
> userspace breakage while adding the new feature in way that avoids
> confusion.
>
Indeed. While the addition of another proc tunable sucks, it seems like
the only available compromise.
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2009-09-14 13:33 UTC|newest]
Thread overview: 46+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-08-28 16:03 [PATCH 0/6] hugetlb: V5 constrain allocation/free based on task mempolicy Lee Schermerhorn
2009-08-28 16:03 ` [PATCH 1/6] hugetlb: rework hstate_next_node_* functions Lee Schermerhorn
2009-08-28 16:03 ` [PATCH 2/6] hugetlb: add nodemask arg to huge page alloc, free and surplus adjust fcns Lee Schermerhorn
2009-09-03 18:39 ` David Rientjes
2009-08-28 16:03 ` [PATCH 3/6] hugetlb: derive huge pages nodes allowed from task mempolicy Lee Schermerhorn
2009-09-01 14:47 ` Mel Gorman
2009-09-03 19:22 ` David Rientjes
2009-09-03 20:15 ` Lee Schermerhorn
2009-09-03 20:49 ` David Rientjes
2009-08-28 16:03 ` [PATCH 4/6] hugetlb: introduce alloc_nodemask_of_node Lee Schermerhorn
2009-09-01 14:49 ` Mel Gorman
2009-09-01 16:42 ` Lee Schermerhorn
2009-09-03 18:34 ` David Rientjes
2009-09-03 20:49 ` Lee Schermerhorn
2009-09-03 21:03 ` David Rientjes
2009-08-28 16:03 ` [PATCH 5/6] hugetlb: add per node hstate attributes Lee Schermerhorn
2009-09-01 15:20 ` Mel Gorman
2009-09-03 19:52 ` David Rientjes
2009-09-03 20:41 ` Lee Schermerhorn
2009-09-03 21:02 ` David Rientjes
2009-09-04 14:30 ` Lee Schermerhorn
2009-08-28 16:03 ` [PATCH 6/6] hugetlb: update hugetlb documentation for mempolicy based management Lee Schermerhorn
2009-09-03 20:07 ` David Rientjes
2009-09-03 21:09 ` Lee Schermerhorn
2009-09-03 21:25 ` David Rientjes
2009-09-08 10:44 ` Mel Gorman
2009-09-08 19:51 ` David Rientjes
2009-09-08 20:04 ` Mel Gorman
2009-09-08 20:18 ` David Rientjes
2009-09-08 21:41 ` Mel Gorman
2009-09-08 22:54 ` David Rientjes
2009-09-09 8:16 ` Mel Gorman
2009-09-09 20:44 ` David Rientjes
2009-09-10 12:26 ` Mel Gorman
2009-09-11 22:27 ` David Rientjes
2009-09-14 13:33 ` Mel Gorman [this message]
2009-09-14 14:15 ` Lee Schermerhorn
2009-09-14 15:41 ` Mel Gorman
2009-09-14 19:15 ` David Rientjes
2009-09-15 11:48 ` Mel Gorman
2009-09-14 19:14 ` David Rientjes
2009-09-14 21:28 ` David Rientjes
2009-09-16 10:21 ` Mel Gorman
2009-09-03 20:42 ` Randy Dunlap
2009-09-04 15:23 ` Lee Schermerhorn
-- strict thread matches above, loose matches on Subject: below --
2009-09-09 16:31 [PATCH 0/6] hugetlb: V6 constrain allocation/free based on task mempolicy Lee Schermerhorn
2009-09-09 16:32 ` [PATCH 6/6] hugetlb: update hugetlb documentation for mempolicy based management Lee Schermerhorn
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20090914133329.GC11778@csn.ul.ie \
--to=mel@csn.ul.ie \
--cc=Lee.Schermerhorn@hp.com \
--cc=agl@us.ibm.com \
--cc=akpm@linux-foundation.org \
--cc=apw@canonical.com \
--cc=eric.whitney@hp.com \
--cc=linux-mm@kvack.org \
--cc=linux-numa@vger.kernel.org \
--cc=nacc@us.ibm.com \
--cc=randy.dunlap@oracle.com \
--cc=rientjes@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).