linux-numa.vger.kernel.org archive mirror
* [PATCH] Limit initial tasks' and top level cpuset's mems_allowed to nodes with memory
@ 2009-05-04  3:06 Lee Schermerhorn
  2009-05-04 10:02 ` Miao Xie
  0 siblings, 1 reply; 3+ messages in thread
From: Lee Schermerhorn @ 2009-05-04  3:06 UTC
  To: Andrew Morton
  Cc: linux-mm, linux-numa, miaox, Mel Gorman, Doug Chapman,
	Eric Whitney, Bjorn Helgaas


Against:  2.6.30-rc3-mmotm-090428-1631

Since cpusetmm-update-tasks-mems_allowed-in-time.patch removed the callouts
to cpuset_update_task_memory_state(), tasks in the top cpuset no longer get
their mems_allowed updated to just the nodes with memory.  cpuset_init()
initializes the top cpuset's mems_allowed with nodes_setall(), while
cpuset_init_current_mems_allowed() and kernel_init() initialize the kernel
initialization tasks' mems_allowed to all possible nodes.  Tasks in the top
cpuset that inherit the init task's mems_allowed without modification will
therefore have all possible nodes set.  This can be seen by examining the
Mems_allowed field in /proc/<pid>/status for such a task.
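
For illustration only (not part of the patch), a minimal userspace sketch of
how one might pull the Mems_allowed field out of a status file.  The helper
names mems_allowed_field() and mems_allowed_from_file() are hypothetical, and
the sketch assumes the field appears on its own line as "Mems_allowed:"
followed by whitespace and the mask.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: copy the value of the "Mems_allowed:" field from
 * the text of a /proc/<pid>/status file into out.
 * Returns 0 on success, -1 if the field is absent or does not fit.
 */
int mems_allowed_field(const char *status_text, char *out, size_t outlen)
{
	static const char key[] = "Mems_allowed:";
	const char *p = status_text;
	size_t n;

	assert(status_text && out && outlen > 0);

	while (p && *p) {
		/* The trailing ':' keeps "Mems_allowed_list:" from matching. */
		if (strncmp(p, key, sizeof(key) - 1) == 0) {
			p += sizeof(key) - 1;
			while (*p == ' ' || *p == '\t')
				p++;
			n = strcspn(p, "\n");
			if (n >= outlen)
				return -1;
			memcpy(out, p, n);
			out[n] = '\0';
			return 0;
		}
		/* Advance to the start of the next line, if any. */
		p = strchr(p, '\n');
		if (p)
			p++;
	}
	return -1;
}

/*
 * Hypothetical wrapper: read a status file (for example
 * "/proc/self/status") and extract the field from its contents.
 */
int mems_allowed_from_file(const char *path, char *out, size_t outlen)
{
	char buf[8192];
	size_t n;
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	n = fread(buf, 1, sizeof(buf) - 1, f);
	fclose(f);
	buf[n] = '\0';
	return mems_allowed_field(buf, out, outlen);
}
```

On an affected kernel, a task in the top cpuset would be expected to show
bits set for every possible node here, rather than only for nodes with memory.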

"numactl --interleave=all" also initializes the interleave node mask to all
ones, relying on masking with mems_allowed to eliminate non-existent
nodes and nodes without memory.  As this masking was not happening, the
interleave policy was attempting to dereference non-existent nodes.
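
The effect can be modelled in userspace with plain bitmasks.  This is only a
sketch under stated assumptions: nodemask_model_t is a 64-bit stand-in for
the kernel's nodemask_t, and the function names are hypothetical, not the
kernel's mempolicy code.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_MAX_NODES 64

/* Stand-in for nodemask_t: bit N set means node N is in the mask. */
typedef uint64_t nodemask_model_t;

/* Effective interleave set: requested nodes restricted to mems_allowed. */
nodemask_model_t effective_interleave(nodemask_model_t requested,
				      nodemask_model_t mems_allowed)
{
	return requested & mems_allowed;
}

/* Next node after prev in mask, wrapping around; -1 if mask is empty. */
int next_interleave_node(nodemask_model_t mask, int prev)
{
	int i;

	assert(prev >= -1 && prev < MODEL_MAX_NODES);

	for (i = 1; i <= MODEL_MAX_NODES; i++) {
		int nid = (prev + i) % MODEL_MAX_NODES;

		if (mask & ((nodemask_model_t)1 << nid))
			return nid;
	}
	return -1;
}

/*
 * Count nodes in the effective set that have no memory.  Each one is a
 * node the interleave walk could select even though it cannot be used.
 */
int count_bogus_nodes(nodemask_model_t effective,
		      nodemask_model_t nodes_with_memory)
{
	nodemask_model_t bogus = effective & ~nodes_with_memory;
	int n = 0;

	while (bogus) {
		bogus &= bogus - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}
```

With memory only on nodes 0 and 2 (mask 0x5), an all-ones mems_allowed leaves
the all-ones interleave request unfiltered, so the walk from node 0 lands on
node 1.  With mems_allowed equal to 0x5, the walk goes from node 0 to node 2.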

This patch modifies the nodes_setall() calls in two cpuset init functions and
the initialization of task #1's mems_allowed to use node_states[N_HIGH_MEMORY]. 
This mask has been initialized to contain only existing nodes with memory by
the time the respective init functions are called.

This fixes the bogus pointer deref [Nat Consumption fault on ia64] reported
in:

	[BUG] 2.6.30-rc3-mmotm-090428-1814 -- bogus pointer deref

[The time--1814--was incorrect in that subject line, but the date was correct.]

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>

 init/main.c     |    4 ++--
 kernel/cpuset.c |    4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

Index: linux-2.6.30-rc3-mmotm-090428-1631/kernel/cpuset.c
===================================================================
--- linux-2.6.30-rc3-mmotm-090428-1631.orig/kernel/cpuset.c	2009-05-03 18:26:24.000000000 -0400
+++ linux-2.6.30-rc3-mmotm-090428-1631/kernel/cpuset.c	2009-05-03 20:46:04.000000000 -0400
@@ -1846,7 +1846,7 @@ int __init cpuset_init(void)
 		BUG();
 
 	cpumask_setall(top_cpuset.cpus_allowed);
-	nodes_setall(top_cpuset.mems_allowed);
+	top_cpuset.mems_allowed = node_states[N_HIGH_MEMORY];
 
 	fmeter_init(&top_cpuset.fmeter);
 	set_bit(CS_SCHED_LOAD_BALANCE, &top_cpuset.flags);
@@ -2118,7 +2118,7 @@ void cpuset_cpus_allowed_locked(struct t
 
 void cpuset_init_current_mems_allowed(void)
 {
-	nodes_setall(current->mems_allowed);
+	current->mems_allowed = node_states[N_HIGH_MEMORY];
 }
 
 /**
Index: linux-2.6.30-rc3-mmotm-090428-1631/init/main.c
===================================================================
--- linux-2.6.30-rc3-mmotm-090428-1631.orig/init/main.c	2009-05-03 20:46:04.000000000 -0400
+++ linux-2.6.30-rc3-mmotm-090428-1631/init/main.c	2009-05-03 20:54:03.000000000 -0400
@@ -849,9 +849,9 @@ static int __init kernel_init(void * unu
 	lock_kernel();
 
 	/*
-	 * init can allocate pages on any node
+	 * init can allocate pages on any node with memory
 	 */
-	set_mems_allowed(node_possible_map);
+	set_mems_allowed(node_states[N_HIGH_MEMORY]);
 	/*
 	 * init can run on any cpu.
 	 */




* Re: [PATCH] Limit initial tasks' and top level cpuset's mems_allowed to nodes with memory
  2009-05-04  3:06 [PATCH] Limit initial tasks' and top level cpuset's mems_allowed to nodes with memory Lee Schermerhorn
@ 2009-05-04 10:02 ` Miao Xie
  2009-05-04 15:17   ` Lee Schermerhorn
  0 siblings, 1 reply; 3+ messages in thread
From: Miao Xie @ 2009-05-04 10:02 UTC
  To: lts
  Cc: Andrew Morton, linux-mm, linux-numa, Mel Gorman, Doug Chapman,
	Eric Whitney, Bjorn Helgaas

on 2009-5-4 11:06 Lee Schermerhorn wrote:
> Against:  2.6.30-rc3-mmotm-090428-1631
> 
> Since cpusetmm-update-tasks-mems_allowed-in-time.patch removed the call outs
> to cpuset_update_task_memory_state(), tasks in the top cpuset don't get their
> mems_allowed updated to just nodes with memory.  cpuset_init() initializes
> the top cpuset's mems_allowed with nodes_setall() and
> cpuset_init_current_mems_allowed() and kernel_init() initialize the kernel
> initialization tasks' mems_allowed to all possible nodes.  Tasks in the top
> cpuset that inherit the init task's mems_allowed without modification will
> have all possible nodes set.  This can be seen by examining the Mems_allowed
> field in /proc/<pid>/status in such a task.
> 
> "numactl --interleave=all" also initializes the interleave node mask to all
> ones, depending on the masking with mems_allowed to eliminate non-existent
> nodes and nodes without memory.  As this was not happening, the interleave
> policy was attempting to dereference non-existent nodes.
> 
> This patch modifies the nodes_setall() calls in two cpuset init functions and
> the initialization of task #1's mems_allowed to use node_states[N_HIGH_MEMORY]. 
> This mask has been initialized to contain only existing nodes with memory by
> the time the respective init functions are called.

You forgot to modify cpuset_attach(). That function initializes the
mems_allowed of a task being moved into the top cpuset from node_possible_map.

Besides that, if you use node_states[N_HIGH_MEMORY] to initialize the mems_allowed
of the tasks in the top cpuset, you must update it when a node with memory is added to
the system. So you must also modify cpuset_track_online_nodes().

Thanks
Miao



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .


* Re: [PATCH] Limit initial tasks' and top level cpuset's mems_allowed to nodes with memory
  2009-05-04 10:02 ` Miao Xie
@ 2009-05-04 15:17   ` Lee Schermerhorn
  0 siblings, 0 replies; 3+ messages in thread
From: Lee Schermerhorn @ 2009-05-04 15:17 UTC
  To: miaox
  Cc: lts, Andrew Morton, linux-mm, linux-numa, Mel Gorman,
	Doug Chapman, Eric Whitney, Bjorn Helgaas

On Mon, 2009-05-04 at 18:02 +0800, Miao Xie wrote: 
> on 2009-5-4 11:06 Lee Schermerhorn wrote:
> > Against:  2.6.30-rc3-mmotm-090428-1631
> > 
> > Since cpusetmm-update-tasks-mems_allowed-in-time.patch removed the call outs
> > to cpuset_update_task_memory_state(), tasks in the top cpuset don't get their
> > mems_allowed updated to just nodes with memory.  cpuset_init() initializes
> > the top cpuset's mems_allowed with nodes_setall() and
> > cpuset_init_current_mems_allowed() and kernel_init() initialize the kernel
> > initialization tasks' mems_allowed to all possible nodes.  Tasks in the top
> > cpuset that inherit the init task's mems_allowed without modification will
> > have all possible nodes set.  This can be seen by examining the Mems_allowed
> > field in /proc/<pid>/status in such a task.
> > 
> > "numactl --interleave=all" also initializes the interleave node mask to all
> > ones, depending on the masking with mems_allowed to eliminate non-existent
> > nodes and nodes without memory.  As this was not happening, the interleave
> > policy was attempting to dereference non-existent nodes.
> > 
> > This patch modifies the nodes_setall() calls in two cpuset init functions and
> > the initialization of task #1's mems_allowed to use node_states[N_HIGH_MEMORY]. 
> > This mask has been initialized to contain only existing nodes with memory by
> > the time the respective init functions are called.
> 
> You forgot to modify cpuset_attach(). That function initializes the
> mems_allowed of a task being moved into the top cpuset from node_possible_map.

Thanks, I'll look at that.  I had tested moving tasks between cpusets
and thought that it was working, but I'd been looking at this for a
while and could have been imagining it.  I'll look for all uses of
node_possible_map, etc.

> 
> Besides that, if you use node_states[N_HIGH_MEMORY] to initialize the mems_allowed
> of the tasks in the top cpuset, you must update it when a node with memory is added to
> the system. So you must also modify cpuset_track_online_nodes().

So, we'll need to walk the tasks in the top-level cpuset and update
their mems_allowed on node on/off-line.  I'd have thought we already did
that, but must admit I didn't check.  I'll take a look at how
cpuset_track_online_nodes() interacts with mems_allowed, ...
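
As a rough userspace model of that walk (a sketch only, not the kernel's
cpuset_track_online_nodes(); the struct and function names are hypothetical,
and it assumes every task in the top cpuset should simply mirror the set of
nodes with memory):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for nodemask_t: bit N set means node N is in the mask. */
typedef uint64_t nodemask_model_t;

/* Hypothetical model of a task: only its mems_allowed matters here. */
struct task_model {
	nodemask_model_t mems_allowed;
};

/* Hypothetical model of the top cpuset and the tasks attached to it. */
struct cpuset_model {
	nodemask_model_t mems_allowed;
	struct task_model *tasks;
	size_t ntasks;
};

/*
 * Called when the set of nodes with memory changes (node on/off-line).
 * Updates the top cpuset and walks its tasks, bringing each task's
 * mems_allowed in line.  Returns the number of tasks that were changed.
 */
size_t track_online_nodes_model(struct cpuset_model *top,
				nodemask_model_t nodes_with_memory)
{
	size_t i, changed = 0;

	assert(top != NULL);

	top->mems_allowed = nodes_with_memory;
	for (i = 0; i < top->ntasks; i++) {
		if (top->tasks[i].mems_allowed != nodes_with_memory) {
			top->tasks[i].mems_allowed = nodes_with_memory;
			changed++;
		}
	}
	return changed;
}
```

Whether the real update can be this simple depends on locking and on how
per-task memory policies are rebound, which the model deliberately ignores.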

Lee



