All of lore.kernel.org
 help / color / mirror / Atom feed
From: Nishanth Aravamudan <nacc@us.ibm.com>
To: clameter@sgi.com
Cc: lee.schermerhorn@hp.com, wli@holomorphy.com, melgor@ie.ibm.com,
	akpm@linux-foundation.org, linux-mm@kvack.org, agl@us.ibm.com,
	pj@sgi.com
Subject: [RFC][PATCH 4/5] hugetlb: fix cpuset-constrained pool resizing
Date: Mon, 6 Aug 2007 09:44:10 -0700	[thread overview]
Message-ID: <20070806164410.GO15714@us.ibm.com> (raw)
In-Reply-To: <20070806164055.GN15714@us.ibm.com>

With the previous 3 patches in this series applied, if a process is in a
constrained cpuset, and tries to grow the hugetlb pool, hugepages may be
allocated on nodes outside of the process' cpuset. More concretely,
growing the pool via

echo some_value > /proc/sys/vm/nr_hugepages

interleaves across all nodes with memory such that hugepage allocations
occur on nodes outside the cpuset. Similarly, this process is able to
change the values in values in
/sys/devices/system/node/nodeX/nr_hugepages, even when X is not in the
cpuset. This directly violates the isolation that cpusets is supposed to
guarantee.

For pool growth: fix the sysctl case by only interleaving across the
nodes in current's cpuset; fix the sysfs attribute case by verifying the
requested node is in current's cpuset. For pool shrinking: both cases
are mostly already covered by the cpuset_zone_allowed_softwall() check
in dequeue_huge_page_node(), but make sure that we only iterate over the
cpusets's nodes in try_to_free_low().

Before:

Trying to resize the pool back to     100 from the top cpuset
Node 3 HugePages_Free:      0
Node 2 HugePages_Free:      0
Node 1 HugePages_Free:    100
Node 0 HugePages_Free:      0
Done.     100 free
/cpuset/set1 /cpuset ~
Trying to resize the pool to     200 from a cpuset restricted to node 1
Node 3 HugePages_Free:      0
Node 2 HugePages_Free:      0
Node 1 HugePages_Free:    150
Node 0 HugePages_Free:     50
Done.     200 free
Trying to shrink the pool on node 0 down to 0 from a cpuset restricted
to node 1
Node 3 HugePages_Free:      0
Node 2 HugePages_Free:      0
Node 1 HugePages_Free:    150
Node 0 HugePages_Free:      0
Done.     150 free

After:

Trying to resize the pool back to     100 from the top cpuset
Node 3 HugePages_Free:      0
Node 2 HugePages_Free:      0
Node 1 HugePages_Free:    100
Node 0 HugePages_Free:      0
Done.     100 free
/cpuset/set1 /cpuset ~
Trying to resize the pool to     200 from a cpuset restricted to node 1
Node 3 HugePages_Free:      0
Node 2 HugePages_Free:      0
Node 1 HugePages_Free:    200
Node 0 HugePages_Free:      0
Done.     200 free
Trying to grow the pool on node 0 up to 50 from a cpuset restricted to
node 1
Node 3 HugePages_Free:      0
Node 2 HugePages_Free:      0
Node 1 HugePages_Free:    200
Node 0 HugePages_Free:      0
Done.     200 free

Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 09ad639..af07a0b 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -181,6 +181,10 @@ static int __init hugetlb_init(void)
 	for_each_node_state(i, N_HIGH_MEMORY)
 		INIT_LIST_HEAD(&hugepage_freelists[i]);
 
+	/*
+	 * at boot-time, interleave across all available nodes as there
+	 * is not any corresponding cpuset/process
+	 */
 	pol = mpol_new(MPOL_INTERLEAVE, &node_states[N_HIGH_MEMORY]);
 	if (IS_ERR(pol))
 		goto quit;
@@ -258,7 +262,7 @@ static void try_to_free_low(unsigned long count)
 {
 	int i;
 
-	for_each_node_state(i, N_HIGH_MEMORY) {
+	for_each_node_mask(i, cpuset_current_mems_allowed) {
 		try_to_free_low_node(i, count);
 		if (count >= nr_huge_pages)
 			return;
@@ -278,7 +282,7 @@ static unsigned long set_max_huge_pages(unsigned long count)
 {
 	struct mempolicy *pol;
 
-	pol = mpol_new(MPOL_INTERLEAVE, &node_states[N_HIGH_MEMORY]);
+	pol = mpol_new(MPOL_INTERLEAVE, &cpuset_current_mems_allowed);
 	if (IS_ERR(pol))
 		return nr_huge_pages;
 	/*
@@ -286,7 +290,7 @@ static unsigned long set_max_huge_pages(unsigned long count)
 	 * process, we need to make sure il_next has a good starting
 	 * value
 	 */
-	set_first_interleave_node(node_states[N_HIGH_MEMORY]);
+	set_first_interleave_node(cpuset_current_mems_allowed);
 	while (count > nr_huge_pages) {
 		if (!alloc_fresh_huge_page(pol))
 			break;
@@ -368,6 +372,10 @@ static ssize_t hugetlb_write_nr_hugepages_node(struct sys_device *dev,
 	unsigned long free_on_other_nodes;
 	unsigned long nr_huge_pages_req = simple_strtoul(buf, NULL, 10);
 
+	/* prevent per-node allocations from outside the allowed cpuset */
+	if (!node_isset(nid, cpuset_current_mems_allowed))
+		return count;
+
 	while (nr_huge_pages_req > nr_huge_pages_node[nid]) {
 		if (!alloc_fresh_huge_page_node(nid))
 			return count;

-- 
Nishanth Aravamudan <nacc@us.ibm.com>
IBM Linux Technology Center

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2007-08-06 16:44 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-08-06 16:32 [RFC][PATCH 0/5] hugetlb NUMA improvements Nishanth Aravamudan
2007-08-06 16:37 ` [RFC][PATCH 1/5] Fix hugetlb pool allocation with empty nodes V9 Nishanth Aravamudan
2007-08-06 16:38   ` [RFC][PATCH 2/5] hugetlb: numafy several functions Nishanth Aravamudan
2007-08-06 16:40     ` [RFC][PATCH 3/5] hugetlb: add per-node nr_hugepages sysfs attribute Nishanth Aravamudan
2007-08-06 16:44       ` Nishanth Aravamudan [this message]
2007-08-06 16:45         ` [RFC][PATCH 4/5] hugetlb: fix cpuset-constrained pool resizing Nishanth Aravamudan
2007-08-06 16:48         ` [RFC][PATCH 5/5] hugetlb: interleave dequeueing of huge pages Nishanth Aravamudan
2007-08-06 18:04         ` [RFC][PATCH 4/5] hugetlb: fix cpuset-constrained pool resizing Christoph Lameter
2007-08-06 18:26           ` Nishanth Aravamudan
2007-08-06 18:41             ` Christoph Lameter
2007-08-07  0:03               ` Nishanth Aravamudan
2007-08-06 19:37           ` Lee Schermerhorn
2007-08-08  1:50           ` Nishanth Aravamudan
2007-08-08 13:26             ` Lee Schermerhorn
2007-08-06 17:59     ` [RFC][PATCH 2/5] hugetlb: numafy several functions Christoph Lameter
2007-08-06 18:15       ` Nishanth Aravamudan
2007-08-07  0:34         ` Nishanth Aravamudan
2007-08-06 18:00   ` [RFC][PATCH 1/5] Fix hugetlb pool allocation with empty nodes V9 Christoph Lameter
2007-08-06 18:19     ` Nishanth Aravamudan
2007-08-06 18:37       ` Christoph Lameter
2007-08-06 19:52         ` Lee Schermerhorn
2007-08-06 20:15           ` Christoph Lameter
2007-08-07  0:04             ` Nishanth Aravamudan
2007-08-06 16:39 ` [RFC][PATCH 0/5] hugetlb NUMA improvements Nishanth Aravamudan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20070806164410.GO15714@us.ibm.com \
    --to=nacc@us.ibm.com \
    --cc=agl@us.ibm.com \
    --cc=akpm@linux-foundation.org \
    --cc=clameter@sgi.com \
    --cc=lee.schermerhorn@hp.com \
    --cc=linux-mm@kvack.org \
    --cc=melgor@ie.ibm.com \
    --cc=pj@sgi.com \
    --cc=wli@holomorphy.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.