Re: Fw: 2.6.16 crashes when running numastat on p575

linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed

* Re: Fw: 2.6.16 crashes when running numastat on p575
       [not found] <20060402213216.2e61b74e.akpm@osdl.org>
@ 2006-04-03  5:12 ` Christoph Lameter
  2006-04-03  5:15   ` Paul Jackson
  2006-04-03  8:09   ` Sonny Rao
  0 siblings, 2 replies; 13+ messages in thread
From: Christoph Lameter @ 2006-04-03  5:12 UTC (permalink / raw)
  To: Sonny Rao; +Cc: Andrew Morton, linuxppc-dev, ak, Paul Jackson, linux-kernel

On Sun, 2 Apr 2006, Andrew Morton wrote:

> I have a vague feeling that you guys worked on numastat?

This is mostly Andi's code although I added the zone_pcp() stuff. This is 
failing because zone_pcp() only returns valid information for online 
processors.

Initially all zone_pcps() point to the boot_cpuset (see zone_pcp_init)
and therefore zone_pcp) is always valid and we do not see this bug. But 
if someone downs a processor or a processor dies then free_zone_pageset() 
is called which will set zone_pcp() = NULL.

Fix NULL pointer dereference in node_read_numastat()

zone_pcp() only returns valid values if the processor is online.

Change node_read_numastat() to only scan online processors.

Signed-off-by: Christoph Lameter <clameter@sgi.com>

Index: linux-2.6.16/drivers/base/node.c
===================================================================
--- linux-2.6.16.orig/drivers/base/node.c	2006-03-19 21:53:29.000000000 -0800
+++ linux-2.6.16/drivers/base/node.c	2006-04-02 21:59:49.000000000 -0700
@@ -106,7 +106,7 @@ static ssize_t node_read_numastat(struct
 	other_node = 0;
 	for (i = 0; i < MAX_NR_ZONES; i++) {
 		struct zone *z = &pg->node_zones[i];
-		for (cpu = 0; cpu < NR_CPUS; cpu++) {
+		for_each_online_cpu(cpu) {
 			struct per_cpu_pageset *ps = zone_pcp(z,cpu);
 			numa_hit += ps->numa_hit;
 			numa_miss += ps->numa_miss;

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Fw: 2.6.16 crashes when running numastat on p575
  2006-04-03  5:12 ` Fw: 2.6.16 crashes when running numastat on p575 Christoph Lameter
@ 2006-04-03  5:15   ` Paul Jackson
  2006-04-03  5:25     ` Christoph Lameter
  2006-04-03 11:49     ` Andi Kleen
  2006-04-03  8:09   ` Sonny Rao
  1 sibling, 2 replies; 13+ messages in thread
From: Paul Jackson @ 2006-04-03  5:15 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: akpm, linuxppc-dev, ak, linux-kernel

-		for (cpu = 0; cpu < NR_CPUS; cpu++) {
+		for_each_online_cpu(cpu) {

Idle curiosity -- what keeps a cpu from going offline during
this scan, and leaving us with the same crash as before?

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.925.600.0401

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Fw: 2.6.16 crashes when running numastat on p575
  2006-04-03  5:15   ` Paul Jackson
@ 2006-04-03  5:25     ` Christoph Lameter
  2006-04-03  6:43       ` Paul Jackson
  2006-04-03 14:10       ` Nathan Lynch
  2006-04-03 11:49     ` Andi Kleen
  1 sibling, 2 replies; 13+ messages in thread
From: Christoph Lameter @ 2006-04-03  5:25 UTC (permalink / raw)
  To: Paul Jackson; +Cc: akpm, linuxppc-dev, ak, linux-kernel

On Sun, 2 Apr 2006, Paul Jackson wrote:

> -		for (cpu = 0; cpu < NR_CPUS; cpu++) {
> +		for_each_online_cpu(cpu) {
> 
> Idle curiosity -- what keeps a cpu from going offline during
> this scan, and leaving us with the same crash as before?

Nothing keeps a processor from going offline. We could take the hotplug 
lock for every for_each_online_cpu() in the kernel. Could you take that 
up with the hotplug folks?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Fw: 2.6.16 crashes when running numastat on p575
  2006-04-03  5:25     ` Christoph Lameter
@ 2006-04-03  6:43       ` Paul Jackson
  2006-04-03 14:10       ` Nathan Lynch
  1 sibling, 0 replies; 13+ messages in thread
From: Paul Jackson @ 2006-04-03  6:43 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: akpm, linuxppc-dev, ak, linux-kernel

Christoph wrote:
>  Could you take that up with the hotplug folks?

Hmmm ... I suddenly notice that I'm on vacation
for the next two weeks.  Sorry <grin>.

In any case, I'll have to leave that one for
the hotplug folks to sort out.  I have other
priorities and limited talents.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.925.600.0401

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Fw: 2.6.16 crashes when running numastat on p575
  2006-04-03  5:12 ` Fw: 2.6.16 crashes when running numastat on p575 Christoph Lameter
  2006-04-03  5:15   ` Paul Jackson
@ 2006-04-03  8:09   ` Sonny Rao
  1 sibling, 0 replies; 13+ messages in thread
From: Sonny Rao @ 2006-04-03  8:09 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Andrew Morton, Paul Jackson, linux-kernel, linuxppc-dev, ak

On Sun, Apr 02, 2006 at 10:12:29PM -0700, Christoph Lameter wrote:
> On Sun, 2 Apr 2006, Andrew Morton wrote:
> 
> > I have a vague feeling that you guys worked on numastat?
> 
> This is mostly Andi's code although I added the zone_pcp() stuff. This is 
> failing because zone_pcp() only returns valid information for online 
> processors.
> 
> Initially all zone_pcps() point to the boot_cpuset (see zone_pcp_init)
> and therefore zone_pcp) is always valid and we do not see this bug. But 
> if someone downs a processor or a processor dies then free_zone_pageset() 
> is called which will set zone_pcp() = NULL.
> 
> 
> Fix NULL pointer dereference in node_read_numastat()
> 
> zone_pcp() only returns valid values if the processor is online.
> 
> Change node_read_numastat() to only scan online processors.
> 
> Signed-off-by: Christoph Lameter <clameter@sgi.com>
> 
> Index: linux-2.6.16/drivers/base/node.c
> ===================================================================
> --- linux-2.6.16.orig/drivers/base/node.c	2006-03-19 21:53:29.000000000 -0800
> +++ linux-2.6.16/drivers/base/node.c	2006-04-02 21:59:49.000000000 -0700
> @@ -106,7 +106,7 @@ static ssize_t node_read_numastat(struct
>  	other_node = 0;
>  	for (i = 0; i < MAX_NR_ZONES; i++) {
>  		struct zone *z = &pg->node_zones[i];
> -		for (cpu = 0; cpu < NR_CPUS; cpu++) {
> +		for_each_online_cpu(cpu) {
>  			struct per_cpu_pageset *ps = zone_pcp(z,cpu);
>  			numa_hit += ps->numa_hit;
>  			numa_miss += ps->numa_miss;


Works, thanks.

Sonny

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Fw: 2.6.16 crashes when running numastat on p575
  2006-04-03  5:15   ` Paul Jackson
  2006-04-03  5:25     ` Christoph Lameter
@ 2006-04-03 11:49     ` Andi Kleen
  2006-04-03 14:18       ` Nathan Lynch
  1 sibling, 1 reply; 13+ messages in thread
From: Andi Kleen @ 2006-04-03 11:49 UTC (permalink / raw)
  To: Paul Jackson; +Cc: akpm, linux-kernel, linuxppc-dev, ak, Christoph Lameter

On Monday 03 April 2006 07:15, Paul Jackson wrote:
> -		for (cpu = 0; cpu < NR_CPUS; cpu++) {
> +		for_each_online_cpu(cpu) {
> 
> Idle curiosity -- what keeps a cpu from going offline during
> this scan, and leaving us with the same crash as before?


CPU hotdown uses RCU like techniques to avoid this. Only potential
problem could be on a preemptive kernel, but I hope nobody tries
cpu unplug on such a beast. The later could be probably fixed
with a rcu_read_lock() or somesuch.

-Andi

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Fw: 2.6.16 crashes when running numastat on p575
  2006-04-03  5:25     ` Christoph Lameter
  2006-04-03  6:43       ` Paul Jackson
@ 2006-04-03 14:10       ` Nathan Lynch
  2006-04-03 17:42         ` Christoph Lameter
  1 sibling, 1 reply; 13+ messages in thread
From: Nathan Lynch @ 2006-04-03 14:10 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: akpm, linuxppc-dev, Paul Jackson, ak, linux-kernel

Christoph Lameter wrote:
> On Sun, 2 Apr 2006, Paul Jackson wrote:
> 
> > -		for (cpu = 0; cpu < NR_CPUS; cpu++) {
> > +		for_each_online_cpu(cpu) {
> > 
> > Idle curiosity -- what keeps a cpu from going offline during
> > this scan, and leaving us with the same crash as before?
> 
> Nothing keeps a processor from going offline. We could take the hotplug 
> lock for every for_each_online_cpu() in the kernel.

In this case, disabling preempt around the for_each_online_cpu loop
would prevent any cpu from going down in the meantime.  But since this
function doesn't look like it's a hot path, and we're potentially
traversing lots of zones and cpus, lock_cpu_hotplug might be preferable.

As Paul noted, the fix as it stands isn't adequate.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Fw: 2.6.16 crashes when running numastat on p575
  2006-04-03 11:49     ` Andi Kleen
@ 2006-04-03 14:18       ` Nathan Lynch
  0 siblings, 0 replies; 13+ messages in thread
From: Nathan Lynch @ 2006-04-03 14:18 UTC (permalink / raw)
  To: Andi Kleen
  Cc: akpm, ak, linux-kernel, linuxppc-dev, Paul Jackson,
	Christoph Lameter

Andi Kleen wrote:
> On Monday 03 April 2006 07:15, Paul Jackson wrote:
> > -		for (cpu = 0; cpu < NR_CPUS; cpu++) {
> > +		for_each_online_cpu(cpu) {
> > 
> > Idle curiosity -- what keeps a cpu from going offline during
> > this scan, and leaving us with the same crash as before?
> 
> 
> CPU hotdown uses RCU like techniques to avoid this. Only potential
> problem could be on a preemptive kernel, but I hope nobody tries
> cpu unplug on such a beast.

I always turn on preempt when testing cpu hotplug (usually on ppc64).
There were several preempt vs cpu hotplug issues until around 2.6.10
or so, iirc, but I think the situation has been stable for a while
now.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Fw: 2.6.16 crashes when running numastat on p575
  2006-04-03 14:10       ` Nathan Lynch
@ 2006-04-03 17:42         ` Christoph Lameter
  2006-04-03 18:01           ` Nathan Lynch
  0 siblings, 1 reply; 13+ messages in thread
From: Christoph Lameter @ 2006-04-03 17:42 UTC (permalink / raw)
  To: Nathan Lynch; +Cc: akpm, linuxppc-dev, Paul Jackson, ak, linux-kernel


On Mon, 3 Apr 2006, Nathan Lynch wrote:

> In this case, disabling preempt around the for_each_online_cpu loop
> would prevent any cpu from going down in the meantime.  But since this
> function doesn't look like it's a hot path, and we're potentially
> traversing lots of zones and cpus, lock_cpu_hotplug might be preferable.
> 
> As Paul noted, the fix as it stands isn't adequate.

There are many other for_each_*_cpu loops in the kernel that do not have 
any of the instrumentation you suggest. I suggest you come up with a 
general solution and then go through all of them and fix this. Please be 
aware that many of these loops are performance critical.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Fw: 2.6.16 crashes when running numastat on p575
  2006-04-03 17:42         ` Christoph Lameter
@ 2006-04-03 18:01           ` Nathan Lynch
  2006-04-03 18:08             ` Christoph Lameter
  0 siblings, 1 reply; 13+ messages in thread
From: Nathan Lynch @ 2006-04-03 18:01 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: akpm, linuxppc-dev, Paul Jackson, ak, linux-kernel

Christoph Lameter wrote:
> 
> On Mon, 3 Apr 2006, Nathan Lynch wrote:
> 
> > In this case, disabling preempt around the for_each_online_cpu loop
> > would prevent any cpu from going down in the meantime.  But since this
> > function doesn't look like it's a hot path, and we're potentially
> > traversing lots of zones and cpus, lock_cpu_hotplug might be preferable.
> > 
> > As Paul noted, the fix as it stands isn't adequate.
> 
> There are many other for_each_*_cpu loops in the kernel that do not have 
> any of the instrumentation you suggest. I suggest you come up with a 
> general solution and then go through all of them and fix this. Please be 
> aware that many of these loops are performance critical.

But this one isn't, right?

And I'm afraid there's a misunderstanding here -- only
for_each_online_cpu (or accessing the cpu online map in general) has
such restrictions -- for_each_possible_cpu doesn't require any locking
or preempt tricks since cpu_possible_map must not change after boot.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Fw: 2.6.16 crashes when running numastat on p575
  2006-04-03 18:01           ` Nathan Lynch
@ 2006-04-03 18:08             ` Christoph Lameter
  2006-04-03 21:18               ` Andrew Morton
  2006-04-04  0:25               ` Paul Jackson
  0 siblings, 2 replies; 13+ messages in thread
From: Christoph Lameter @ 2006-04-03 18:08 UTC (permalink / raw)
  To: Nathan Lynch; +Cc: akpm, linuxppc-dev, pj, ak, linux-kernel

On Mon, 3 Apr 2006, Nathan Lynch wrote:

> > There are many other for_each_*_cpu loops in the kernel that do not have 
> > any of the instrumentation you suggest. I suggest you come up with a 
> > general solution and then go through all of them and fix this. Please be 
> > aware that many of these loops are performance critical.
> 
> But this one isn't, right?

Right. One could use more expensive processing here.

> And I'm afraid there's a misunderstanding here -- only
> for_each_online_cpu (or accessing the cpu online map in general) has
> such restrictions -- for_each_possible_cpu doesn't require any locking
> or preempt tricks since cpu_possible_map must not change after boot.

Correct. We may want to audit the kernel and check that each 
for_each_possible_cpu or for_each_cpu is really correct. Developers 
frequently assume that all processors are up. There may be some 
complicated interactions with cpusets. Adding Paul to this.

However, note that I am not interested hotplug functionality. It is going 
to be a difficult task to make the kernel shutdown processors correctly. I 
can give you feedback but I am not going to do this work.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Fw: 2.6.16 crashes when running numastat on p575
  2006-04-03 18:08             ` Christoph Lameter
@ 2006-04-03 21:18               ` Andrew Morton
  2006-04-04  0:25               ` Paul Jackson
  1 sibling, 0 replies; 13+ messages in thread
From: Andrew Morton @ 2006-04-03 21:18 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: linuxppc-dev, pj, ntl, ak, linux-kernel

Christoph Lameter <clameter@sgi.com> wrote:
>
> On Mon, 3 Apr 2006, Nathan Lynch wrote:
> 
> > > There are many other for_each_*_cpu loops in the kernel that do not have 
> > > any of the instrumentation you suggest. I suggest you come up with a 
> > > general solution and then go through all of them and fix this. Please be 
> > > aware that many of these loops are performance critical.
> > 
> > But this one isn't, right?
> 
> Right. One could use more expensive processing here.

Hopefully none of the for_each_foo() loops are performance-critical - those
things are expensive.

> > And I'm afraid there's a misunderstanding here -- only
> > for_each_online_cpu (or accessing the cpu online map in general) has
> > such restrictions -- for_each_possible_cpu doesn't require any locking
> > or preempt tricks since cpu_possible_map must not change after boot.

for_each_present_cpu() presumably has the same problems.

> Correct. We may want to audit the kernel and check that each 
> for_each_possible_cpu or for_each_cpu is really correct.

A fair bit of that has been happening in recent weeks.

But yes, we should be protecting these things with rcu_read_lock() if
possible, lock_cpu_hotplug() otherwise.

(rcu_read_lock() might not be the appropriate name for this operation -
maybe it should be an open-coded preempt_disable().  Or some other suitably
named alias; dunno).

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Fw: 2.6.16 crashes when running numastat on p575
  2006-04-03 18:08             ` Christoph Lameter
  2006-04-03 21:18               ` Andrew Morton
@ 2006-04-04  0:25               ` Paul Jackson
  1 sibling, 0 replies; 13+ messages in thread
From: Paul Jackson @ 2006-04-04  0:25 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: akpm, linuxppc-dev, ak, ntl, linux-kernel

Christoph wrote:
> There may be some complicated interactions with cpusets.

Hopefully not, but it's possible.  I did my best not to assume that
cpus or memory nodes stayed online, and to build in fallbacks if one
of them disappeared out from under me.

But the code has never been tested to these assumptions, nor even
carefully reviewed to them.

So something will likely break somewhere, though quite possibly not
in the cpuset code itself, but in the scheduler or allocator code
that depends on selecting an online cpu or memory for some purpose.

Someone from the hotplug side will have to look at this, with an
understanding of how the coming and going of cpus and memory nodes
can affect the scheduler and allocators.

Look at the callers of cpuset_cpus_allowed(), cpuset_mems_allowed(),
cpuset_update_task_memory_state(), and cpuset_zone_allowed() to see
the likely hot spots for interaction between cpusets and hotplug.

The callers of these routines are important hooks into the scheduler
and allocator, and the question needs to be asked what happens if
a cpu or memory node goes offline after one of these cpuset_*()
routines is called before the scheduler or allocator finishes doing
what it is doing with what it thought was a usable cpu or node.

Also, there is currently no hotplug aware code that I know of in
the kernel/cpuset.c code, so each of the references to the globals
cpu_online_map and node_online_map therein are suspect.  Presumably as
anywhere else in the kernel these appear, or the for_each_online_*()
macros appear, the code needs to be examined for perhaps needing some
sort of guard.

So, in sum, the kernel needs to be inspected at the location of at
least each of the following calls or references, for possible problems
if the cpus or nodes online change unexpectedly:

 any_online_node()
 cpu_online_map
 cpu_present()
 cpu_present_map
 cpuset_cpus_allowed()
 cpuset_mems_allowed()
 cpuset_update_task_memory_state()
 cpuset_zone_allowed()
 for_each_online_cpu()
 for_each_online_node()
 for_each_present_cpu()
 node_online_map
 num_online_cpus()
 num_online_nodes()
 num_present_cpus()

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.925.600.0401

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2006-04-04  0:25 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20060402213216.2e61b74e.akpm@osdl.org>
2006-04-03  5:12 ` Fw: 2.6.16 crashes when running numastat on p575 Christoph Lameter
2006-04-03  5:15   ` Paul Jackson
2006-04-03  5:25     ` Christoph Lameter
2006-04-03  6:43       ` Paul Jackson
2006-04-03 14:10       ` Nathan Lynch
2006-04-03 17:42         ` Christoph Lameter
2006-04-03 18:01           ` Nathan Lynch
2006-04-03 18:08             ` Christoph Lameter
2006-04-03 21:18               ` Andrew Morton
2006-04-04  0:25               ` Paul Jackson
2006-04-03 11:49     ` Andi Kleen
2006-04-03 14:18       ` Nathan Lynch
2006-04-03  8:09   ` Sonny Rao

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).