Re: [PATCH 2/5] Add nodes_allowed members to hugepages hstate struct

public inbox for linux-mm@kvack.org
 help / color / mirror / Atom feed

From: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
To: Mel Gorman <mel@csn.ul.ie>
Cc: linux-mm@kvack.org, akpm@linux-foundation.org,
	Nishanth Aravamudan <nacc@us.ibm.com>,
	Adam Litke <agl@us.ibm.com>, Andy Whitcroft <apw@canonical.com>,
	eric.whitney@hp.com
Subject: Re: [PATCH 2/5] Add nodes_allowed members to hugepages hstate struct
Date: Wed, 17 Jun 2009 13:38:16 -0400	[thread overview]
Message-ID: <1245260296.6235.81.camel@lts-notebook> (raw)
In-Reply-To: <20090617133541.GH28529@csn.ul.ie>

On Wed, 2009-06-17 at 14:35 +0100, Mel Gorman wrote:
> On Tue, Jun 16, 2009 at 09:52:53AM -0400, Lee Schermerhorn wrote:
> > [PATCH 2/5] add nodes_allowed members to hugepages hstate struct
> > 
> > Against:  17may09 mmotm
> > 
> > This patch adds a nodes_allowed nodemask_t pointer and a
> > __nodes_allowed nodemask_t to the hstate struct for constraining
> > fresh hugepage allocations.  It then adds sysfs attributes and
> > boot command line options to set and [for sysfs attributes] query
> > the allowed nodes mask.
> > 
> > A separate patch will hook up this nodes_allowed mask/pointer to
> > fresh huge page allocation and promoting of surplus pages to
> > persistent.  Another patch will add a 'sysctl' hugepages_nodes_allowed
> > to /proc/sys/vm.
> > 
> > Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
> > 
> >  include/linux/hugetlb.h |    2 +
> >  mm/hugetlb.c            |   86 ++++++++++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 88 insertions(+)
> > 
> > Index: linux-2.6.30-rc8-mmotm-090603-1633/include/linux/hugetlb.h
> > ===================================================================
> > --- linux-2.6.30-rc8-mmotm-090603-1633.orig/include/linux/hugetlb.h	2009-06-04 12:59:31.000000000 -0400
> > +++ linux-2.6.30-rc8-mmotm-090603-1633/include/linux/hugetlb.h	2009-06-04 12:59:32.000000000 -0400
> > @@ -193,6 +193,8 @@ struct hstate {
> >  	unsigned long resv_huge_pages;
> >  	unsigned long surplus_huge_pages;
> >  	unsigned long nr_overcommit_huge_pages;
> > +	nodemask_t *nodes_allowed;
> > +	nodemask_t __nodes_allowed;
> 
> Can you add a comment as to why a nodemask pointer and a nodemask itself
> with very similar field names are both needed?

Hmmm, thought I did that.  Guess I just thought about doing it :(.

It's sort of tied up with why I want to use "all" rather than a literal
node mask to specify allocation [attempts] across all nodes.  The
default and current behavior is to use the node_online_mask.  My
understanding is that this tracks node hot add/remove, and I wanted to
preserve that behavior by pointing at node_online_mask directly, in the
default and nodes_allowed=all cases.


> 
> >  	struct list_head hugepage_freelists[MAX_NUMNODES];
> >  	unsigned int nr_huge_pages_node[MAX_NUMNODES];
> >  	unsigned int free_huge_pages_node[MAX_NUMNODES];
> > Index: linux-2.6.30-rc8-mmotm-090603-1633/mm/hugetlb.c
> > ===================================================================
> > --- linux-2.6.30-rc8-mmotm-090603-1633.orig/mm/hugetlb.c	2009-06-04 12:59:31.000000000 -0400
> > +++ linux-2.6.30-rc8-mmotm-090603-1633/mm/hugetlb.c	2009-06-04 12:59:32.000000000 -0400
> > @@ -40,6 +40,9 @@ __initdata LIST_HEAD(huge_boot_pages);
> >  static struct hstate * __initdata parsed_hstate;
> >  static unsigned long __initdata default_hstate_max_huge_pages;
> >  static unsigned long __initdata default_hstate_size;
> > +static struct hstate __initdata default_boot_hstate = {
> > +	.nodes_allowed = &node_online_map,
> > +};
> >  
> >  #define for_each_hstate(h) \
> >  	for ((h) = hstates; (h) < &hstates[max_hstate]; (h)++)
> > @@ -1102,6 +1105,9 @@ static void __init hugetlb_init_hstates(
> >  	struct hstate *h;
> >  
> >  	for_each_hstate(h) {
> > +		if (!h->nodes_allowed)
> > +			h->nodes_allowed = &node_online_map;
> > +
> 
> The reasoning for two nodes_allowed would appear to be to allow nodes_allowed
> to be some predefined mash or a hstate-private mask. As the mask has to exist
> anyway, can you not just copy it in rather than having a pointer and a mask?
> 
> It would make the check below as to whether all nodes allowed or not a bit
> more expensive but it's not a big deal.

but, it wouldn't track node hot add/remove.  I suppose we could add
handlers to fix up the hstates, but that seemed unnecessary work.

My thinking was that allocation of persistent hugepages [as opposed to
overcommitted/surplus h.p.] is an administrative action and, thus, not a
fast path.  Also, I didn't envision [my myopia, perhaps] a large number
of hstates, so I didn't think the extra pointer per hstate was a big
burden.

> 
> >  		/* oversize hugepages were init'ed in early boot */
> >  		if (h->order < MAX_ORDER)
> >  			hugetlb_hstate_alloc_pages(h);
> > @@ -1335,6 +1341,62 @@ static ssize_t nr_overcommit_hugepages_s
> >  }
> >  HSTATE_ATTR(nr_overcommit_hugepages);
> >  
> > +static ssize_t nodes_allowed_show(struct kobject *kobj,
> > +					struct kobj_attribute *attr, char *buf)
> > +{
> > +	struct hstate *h = kobj_to_hstate(kobj);
> > +	int len = 3;
> > +
> > +	if (h->nodes_allowed == &node_online_map)
> > +		strcpy(buf, "all");
> > +	else
> > +		len = nodelist_scnprintf(buf, PAGE_SIZE,
> > +					*h->nodes_allowed);
> > +
> > +	if (len)
> > +		buf[len++] = '\n';
> > +
> 
> buf doesn't get NULL terminated.

Ah, OK.  Not sure what I was using as an example here.  Maybe I thunk
that up all on my own...

> 
> > +	return len;
> > +}
> 
> Can print_nodes_state() be extended a little to print "all" and then shared
> with here?

Well, I don't know that we'd want to see "all" for the sysfs "online"
nodes attribute.  Some might not find this too informative.  Of course,
even for huge pages nodes allowed, it's not all that informative, but
IMO fits the usage.  One can always look at the sysfs online nodes
attribute to see what "all" means for nodes_allowed.  Couldn't do that
if we changed print_nodes_state().  I guess we could add a flag to
indicate whether we wanted "all" or the actual nodemask.  And, we have
to make print_nodes_state() visible.  Do you think that would be worth
while?  

> 
> > +
> > +static int set_hstate_nodes_allowed(struct hstate *h, const char *buf,
> > +					bool lock)
> > +{
> > +	nodemask_t nodes_allowed;
> > +	int ret = 1;
> > +	bool all = !strncasecmp(buf, "all", 3);
> > +
> > +	if (!all)
> > +		ret = !nodelist_parse(buf, nodes_allowed);
> > +	if (ret) {
> > +		if (lock)
> > +			spin_lock(&hugetlb_lock);
> > +
> 
> ick.
> 
> Convention for something like this is to have set_hstate_nodes_allowed
> that takes a spinlock and __set_hstate_nodes_allowed that is the
> lock-free version and call as appropriate. Can the same be done here?

Sorry.  Yeah, I can do that. 

> 
> For that matter, does taking spinlocks from __setup() break that you
> avoid taking it in that case?

I wasn't sure, but noticed that we weren't locking in the boot path for,
e.g., [nr_]hugepages, so I avoided it here.  I suppose I can just test
whether or not it breaks and, if not, just lock.

> 
> > +		if (all) {
> > +			h->nodes_allowed = &node_online_map;
> > +		} else {
> > +			h->__nodes_allowed = nodes_allowed;
> > +			h->nodes_allowed = &h->__nodes_allowed;
> > +		}
> > +
> > +		if (lock)
> > +			spin_unlock(&hugetlb_lock);
> > +	}
> > +	return ret;
> > +}
> > +
> > +static ssize_t nodes_allowed_store(struct kobject *kobj,
> > +		struct kobj_attribute *attr, const char *buf, size_t count)
> > +{
> > +	struct hstate *h = kobj_to_hstate(kobj);
> > +
> > +	if (set_hstate_nodes_allowed(h, buf, 1))
> > +		return count;
> > +	else
> > +		return 0;
> > +}
> > +HSTATE_ATTR(nodes_allowed);
> > +
> >  static ssize_t free_hugepages_show(struct kobject *kobj,
> >  					struct kobj_attribute *attr, char *buf)
> >  {
> > @@ -1362,6 +1424,7 @@ HSTATE_ATTR_RO(surplus_hugepages);
> >  static struct attribute *hstate_attrs[] = {
> >  	&nr_hugepages_attr.attr,
> >  	&nr_overcommit_hugepages_attr.attr,
> > +	&nodes_allowed_attr.attr,
> >  	&free_hugepages_attr.attr,
> >  	&resv_hugepages_attr.attr,
> >  	&surplus_hugepages_attr.attr,
> > @@ -1436,6 +1499,13 @@ static int __init hugetlb_init(void)
> >  	if (default_hstate_max_huge_pages)
> >  		default_hstate.max_huge_pages = default_hstate_max_huge_pages;
> >  
> > +	if (default_boot_hstate.nodes_allowed != &node_online_map) {
> > +		default_hstate.__nodes_allowed =
> > +					default_boot_hstate.__nodes_allowed;
> > +		default_hstate.nodes_allowed =
> > +					&default_hstate.__nodes_allowed;
> > +	}
> > +
> >  	hugetlb_init_hstates();
> >  
> >  	gather_bootmem_prealloc();
> > @@ -1471,6 +1541,7 @@ void __init hugetlb_add_hstate(unsigned 
> >  	snprintf(h->name, HSTATE_NAME_LEN, "hugepages-%lukB",
> >  					huge_page_size(h)/1024);
> >  
> > +	h->nodes_allowed = &node_online_map;
> >  	parsed_hstate = h;
> >  }
> >  
> > @@ -1518,6 +1589,21 @@ static int __init hugetlb_default_setup(
> >  }
> >  __setup("default_hugepagesz=", hugetlb_default_setup);
> >  
> > +static int __init hugetlb_nodes_allowed_setup(char *s)
> > +{
> > +	struct hstate *h = &default_boot_hstate;
> > +
> > +	/*
> > +	 * max_hstate means we've parsed a hugepagesz= parameter, so
> > +	 * use the [most recently] parsed_hstate.  Else use default.
> > +	 */
> > +	if (max_hstate)
> > +		h = parsed_hstate;
> > +
> > +	return set_hstate_nodes_allowed(h, s, 0);
> > +}
> > +__setup("hugepages_nodes_allowed=", hugetlb_nodes_allowed_setup);
> > +
> >  static unsigned int cpuset_mems_nr(unsigned int *array)
> >  {
> >  	int node;
> > 
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2009-06-17 17:37 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-06-16 13:52 [PATCH 0/5] Huge Pages Nodes Allowed Lee Schermerhorn
2009-06-16 13:52 ` [PATCH 1/5] Free huge pages round robin to balance across nodes Lee Schermerhorn
2009-06-17 13:18   ` Mel Gorman
2009-06-17 17:16     ` Lee Schermerhorn
2009-06-18 19:08       ` David Rientjes
2009-06-16 13:52 ` [PATCH 2/5] Add nodes_allowed members to hugepages hstate struct Lee Schermerhorn
2009-06-17 13:35   ` Mel Gorman
2009-06-17 17:38     ` Lee Schermerhorn [this message]
2009-06-18  9:17       ` Mel Gorman
2009-06-16 13:53 ` [PATCH 3/5] Use per hstate nodes_allowed to constrain huge page allocation Lee Schermerhorn
2009-06-17 13:39   ` Mel Gorman
2009-06-17 17:47     ` Lee Schermerhorn
2009-06-18  9:18       ` Mel Gorman
2009-06-16 13:53 ` [PATCH 4/5] Add sysctl for default hstate nodes_allowed Lee Schermerhorn
2009-06-17 13:41   ` Mel Gorman
2009-06-17 17:52     ` Lee Schermerhorn
2009-06-18  9:19       ` Mel Gorman
2009-06-16 13:53 ` [PATCH 5/5] Update huge pages kernel documentation Lee Schermerhorn
2009-06-18 18:49   ` David Rientjes
2009-06-18 19:06     ` Lee Schermerhorn
2009-06-17 13:02 ` [PATCH 0/5] Huge Pages Nodes Allowed Mel Gorman
2009-06-17 17:15   ` Lee Schermerhorn
2009-06-18  9:33     ` Mel Gorman
2009-06-18 14:46       ` Lee Schermerhorn
2009-06-18 15:00         ` Mel Gorman
2009-06-18 19:08     ` David Rientjes
2009-06-24  7:11       ` David Rientjes
2009-06-24 11:25         ` Lee Schermerhorn
2009-06-24 22:26           ` David Rientjes
2009-06-25  2:14             ` Lee Schermerhorn
2009-06-25 19:22               ` David Rientjes

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1245260296.6235.81.camel@lts-notebook \
    --to=lee.schermerhorn@hp.com \
    --cc=agl@us.ibm.com \
    --cc=akpm@linux-foundation.org \
    --cc=apw@canonical.com \
    --cc=eric.whitney@hp.com \
    --cc=linux-mm@kvack.org \
    --cc=mel@csn.ul.ie \
    --cc=nacc@us.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox