linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] mm/mempolicy.c: make sys_mbind & sys_set_mempolicy aware of task_struct->mems_allowed
@ 2011-08-03 12:37 Rafael Aquini
  2011-08-03 14:19 ` Christoph Lameter
  2011-08-04  1:59 ` Andi Kleen
  0 siblings, 2 replies; 4+ messages in thread
From: Rafael Aquini @ 2011-08-03 12:37 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, KOSAKI Motohiro, Stephen Wilson, Andrea Arcangeli,
	Christoph Lameter, Rik van Riel, linux-kernel

Among several other features enabled when CONFIG_CPUSETS is defined, task_struct is enhanced with the nodemask_t mems_allowed element that serves to register/report on which memory nodes the task may obtain memory. Also, two new lines that reflect the value registered at task_struct->mems_allowed are added to the '/proc/[pid]/status' file:
	  Mems_allowed:   ...,00000000,0000000f
	  Mems_allowed_list:      0-3

The system calls sys_mbind and sys_set_mempolicy, which serve to cope with NUMA memory policies, and receive a nodemask_t parameter, do not set task_struct->mems_allowed accordingly to their received nodemask, when CONFIG_CPUSETS is defined. This unawareness causes unexpected values being reported at '/proc/[pid]/status' Mems_allowed fields, for applications relying on those syscalls, or spawned by numactl.

Despite not affecting the memory policy operation itself, the aforementioned unawareness is source of confusion and annoyance when one is trying to figure out which resources are bound to a given task.

Reproduceability:
a. in a NUMA box, spaw a bash shell with numactl setting a nodelist to --membind, or --interleave:
[root@localhost ~]# numactl --membind=1,2 /bin/bash

b. in the spawned bash shell, verify '/proc/$$/status' Mems_allowed field:
[root@localhost ~]# grep Mems_allowed /proc/$$/status
Mems_allowed:     ...,00000000,0000000f
Mems_allowed_list:	0-3

As we can check, the expected reported list would be "1,2", instead of "0-3".

The attached patch is a proposal to make sys_mbind and sys_set_mempolicy system calls properly set task_struct->mems_allowed, in order to avoid the scenario shown above.

Signed-off-by: Rafael Aquini <aquini@redhat.com>
---
 mm/mempolicy.c |   18 ++++++++++++++++--
 1 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 8b57173..bc966da 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -95,6 +95,14 @@
 #include <asm/uaccess.h>
 #include <linux/random.h>
 
+#ifdef CONFIG_CPUSETS
+#include <linux/cpuset.h>
+#else
+static inline void set_mems_allowed(nodemask_t nodemask)
+{
+}
+#endif
+
 #include "internal.h"
 
 /* Internal flags */
@@ -1256,7 +1264,10 @@ SYSCALL_DEFINE6(mbind, unsigned long, start, unsigned long, len,
 	err = get_nodes(&nodes, nmask, maxnode);
 	if (err)
 		return err;
-	return do_mbind(start, len, mode, mode_flags, &nodes, flags);
+	err = do_mbind(start, len, mode, mode_flags, &nodes, flags);
+	if (!err)
+		set_mems_allowed(nodes);
+	return err;
 }
 
 /* Set the process memory policy */
@@ -1276,7 +1287,10 @@ SYSCALL_DEFINE3(set_mempolicy, int, mode, unsigned long __user *, nmask,
 	err = get_nodes(&nodes, nmask, maxnode);
 	if (err)
 		return err;
-	return do_set_mempolicy(mode, flags, &nodes);
+	err = do_set_mempolicy(mode, flags, &nodes);
+	if (!err)
+		set_mems_allowed(nodes);
+	return err;
 }
 
 SYSCALL_DEFINE4(migrate_pages, pid_t, pid, unsigned long, maxnode,
-- 
1.7.6

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] mm/mempolicy.c: make sys_mbind & sys_set_mempolicy aware of task_struct->mems_allowed
  2011-08-03 12:37 [PATCH] mm/mempolicy.c: make sys_mbind & sys_set_mempolicy aware of task_struct->mems_allowed Rafael Aquini
@ 2011-08-03 14:19 ` Christoph Lameter
  2011-08-04  1:59 ` Andi Kleen
  1 sibling, 0 replies; 4+ messages in thread
From: Christoph Lameter @ 2011-08-03 14:19 UTC (permalink / raw)
  To: Rafael Aquini
  Cc: linux-mm, Andrew Morton, KOSAKI Motohiro, Stephen Wilson,
	Andrea Arcangeli, Rik van Riel, linux-kernel

On Wed, 3 Aug 2011, Rafael Aquini wrote:

> Among several other features enabled when CONFIG_CPUSETS is defined, task_struct is enhanced with the nodemask_t mems_allowed element that serves to register/report on which memory nodes the task may obtain memory. Also, two new lines that reflect the value registered at task_struct->mems_allowed are added to the '/proc/[pid]/status' file:
> 	  Mems_allowed:   ...,00000000,0000000f
> 	  Mems_allowed_list:      0-3
>
> The system calls sys_mbind and sys_set_mempolicy, which serve to cope
> with NUMA memory policies, and receive a nodemask_t parameter, do not
> set task_struct->mems_allowed accordingly to their received nodemask,
> when CONFIG_CPUSETS is defined. This unawareness causes unexpected
> values being reported at '/proc/[pid]/status' Mems_allowed fields, for
> applications relying on those syscalls, or spawned by numactl.

That is intentionally so since mbind does not restrict the memory nodes
allowed by the process. mbind means that process is directing its
allocation to a specific set of nodes. The process can still specify a
memory policy for allocation from any other node in mems_allowed.

> Despite not affecting the memory policy operation itself, the
> aforementioned unawareness is source of confusion and annoyance when one
> is trying to figure out which resources are bound to a given task.

Nope this is so for a reason.

> As we can check, the expected reported list would be "1,2", instead of "0-3".

Wrong. The process is allowed to allocate from nodes 0-3. Reporting
anything else would be misleading.

> @@ -1256,7 +1264,10 @@ SYSCALL_DEFINE6(mbind, unsigned long, start, unsigned long, len,
>  	err = get_nodes(&nodes, nmask, maxnode);
>  	if (err)
>  		return err;
> -	return do_mbind(start, len, mode, mode_flags, &nodes, flags);
> +	err = do_mbind(start, len, mode, mode_flags, &nodes, flags);
> +	if (!err)
> +		set_mems_allowed(nodes);
> +	return err;
>  }

Uhhh. set_mems_allowed() suffers from various races and cannot easiy be
used in random locations. Special serialization is required. See
cpuset_mems_allowed() and cpuset_change_task_nodemask

> @@ -1276,7 +1287,10 @@ SYSCALL_DEFINE3(set_mempolicy, int, mode, unsigned long __user *, nmask,
>  	err = get_nodes(&nodes, nmask, maxnode);
>  	if (err)
>  		return err;
> -	return do_set_mempolicy(mode, flags, &nodes);
> +	err = do_set_mempolicy(mode, flags, &nodes);
> +	if (!err)
> +		set_mems_allowed(nodes);
> +	return err;
>  }

Same issue.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] mm/mempolicy.c: make sys_mbind & sys_set_mempolicy aware of task_struct->mems_allowed
  2011-08-03 12:37 [PATCH] mm/mempolicy.c: make sys_mbind & sys_set_mempolicy aware of task_struct->mems_allowed Rafael Aquini
  2011-08-03 14:19 ` Christoph Lameter
@ 2011-08-04  1:59 ` Andi Kleen
  2011-08-04 22:07   ` Rafael Aquini
  1 sibling, 1 reply; 4+ messages in thread
From: Andi Kleen @ 2011-08-04  1:59 UTC (permalink / raw)
  To: Rafael Aquini
  Cc: linux-mm, Andrew Morton, KOSAKI Motohiro, Stephen Wilson,
	Andrea Arcangeli, Christoph Lameter, Rik van Riel, linux-kernel

Rafael Aquini <aquini@redhat.com> writes:

> Among several other features enabled when CONFIG_CPUSETS is defined,
> task_struct is enhanced with the nodemask_t mems_allowed element that
> serves to register/report on which memory nodes the task may obtain
> memory. Also, two new lines that reflect the value registered at
> task_struct->mems_allowed are added to the '/proc/[pid]/status' file:

As Christoph said this was intentionally designed this way. Originally
there was some consideration of "relative policies", but that is not
implemented and had various issues. 

They're orthogonal mechanisms.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] mm/mempolicy.c: make sys_mbind & sys_set_mempolicy aware of task_struct->mems_allowed
  2011-08-04  1:59 ` Andi Kleen
@ 2011-08-04 22:07   ` Rafael Aquini
  0 siblings, 0 replies; 4+ messages in thread
From: Rafael Aquini @ 2011-08-04 22:07 UTC (permalink / raw)
  To: Andi Kleen
  Cc: linux-mm, Andrew Morton, KOSAKI Motohiro, Stephen Wilson,
	Andrea Arcangeli, Christoph Lameter, Rik van Riel, linux-kernel

Howdy folks,

On Wed, Aug 03, 2011 at 06:59:33PM -0700, Andi Kleen wrote:
> Rafael Aquini <aquini@redhat.com> writes:
> 
> > Among several other features enabled when CONFIG_CPUSETS is defined,
> > task_struct is enhanced with the nodemask_t mems_allowed element that
> > serves to register/report on which memory nodes the task may obtain
> > memory. Also, two new lines that reflect the value registered at
> > task_struct->mems_allowed are added to the '/proc/[pid]/status' file:
> 
> As Christoph said this was intentionally designed this way. Originally
> there was some consideration of "relative policies", but that is not
> implemented and had various issues. 
> 
> They're orthogonal mechanisms.

I'd like to thank you all for taking time to look at my proposal, and
providing such a good fix for my misconceptions.

I really appreciate all your feedback.

Cheers!
-- 
Rafael Azenha Aquini <aquini@redhat.com>
Software Maintenance Engineer
Red Hat, Inc.
+55 51 3392.6288 / +55 51 9979.8008

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2011-08-04 22:07 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-08-03 12:37 [PATCH] mm/mempolicy.c: make sys_mbind & sys_set_mempolicy aware of task_struct->mems_allowed Rafael Aquini
2011-08-03 14:19 ` Christoph Lameter
2011-08-04  1:59 ` Andi Kleen
2011-08-04 22:07   ` Rafael Aquini

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).