* Re: Fw: 2.6.16 crashes when running numastat on p575
[not found] <20060402213216.2e61b74e.akpm@osdl.org>
@ 2006-04-03 5:12 ` Christoph Lameter
2006-04-03 5:15 ` Paul Jackson
2006-04-03 8:09 ` Sonny Rao
0 siblings, 2 replies; 13+ messages in thread
From: Christoph Lameter @ 2006-04-03 5:12 UTC (permalink / raw)
To: Sonny Rao; +Cc: Andrew Morton, linuxppc-dev, ak, Paul Jackson, linux-kernel
On Sun, 2 Apr 2006, Andrew Morton wrote:
> I have a vague feeling that you guys worked on numastat?
This is mostly Andi's code although I added the zone_pcp() stuff. This is
failing because zone_pcp() only returns valid information for online
processors.
Initially all zone_pcps() point to the boot_cpuset (see zone_pcp_init)
and therefore zone_pcp) is always valid and we do not see this bug. But
if someone downs a processor or a processor dies then free_zone_pageset()
is called which will set zone_pcp() = NULL.
Fix NULL pointer dereference in node_read_numastat()
zone_pcp() only returns valid values if the processor is online.
Change node_read_numastat() to only scan online processors.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Index: linux-2.6.16/drivers/base/node.c
===================================================================
--- linux-2.6.16.orig/drivers/base/node.c 2006-03-19 21:53:29.000000000 -0800
+++ linux-2.6.16/drivers/base/node.c 2006-04-02 21:59:49.000000000 -0700
@@ -106,7 +106,7 @@ static ssize_t node_read_numastat(struct
other_node = 0;
for (i = 0; i < MAX_NR_ZONES; i++) {
struct zone *z = &pg->node_zones[i];
- for (cpu = 0; cpu < NR_CPUS; cpu++) {
+ for_each_online_cpu(cpu) {
struct per_cpu_pageset *ps = zone_pcp(z,cpu);
numa_hit += ps->numa_hit;
numa_miss += ps->numa_miss;
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Fw: 2.6.16 crashes when running numastat on p575
2006-04-03 5:12 ` Fw: 2.6.16 crashes when running numastat on p575 Christoph Lameter
@ 2006-04-03 5:15 ` Paul Jackson
2006-04-03 5:25 ` Christoph Lameter
2006-04-03 11:49 ` Andi Kleen
2006-04-03 8:09 ` Sonny Rao
1 sibling, 2 replies; 13+ messages in thread
From: Paul Jackson @ 2006-04-03 5:15 UTC (permalink / raw)
To: Christoph Lameter; +Cc: akpm, linuxppc-dev, ak, linux-kernel
- for (cpu = 0; cpu < NR_CPUS; cpu++) {
+ for_each_online_cpu(cpu) {
Idle curiosity -- what keeps a cpu from going offline during
this scan, and leaving us with the same crash as before?
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.925.600.0401
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Fw: 2.6.16 crashes when running numastat on p575
2006-04-03 5:15 ` Paul Jackson
@ 2006-04-03 5:25 ` Christoph Lameter
2006-04-03 6:43 ` Paul Jackson
2006-04-03 14:10 ` Nathan Lynch
2006-04-03 11:49 ` Andi Kleen
1 sibling, 2 replies; 13+ messages in thread
From: Christoph Lameter @ 2006-04-03 5:25 UTC (permalink / raw)
To: Paul Jackson; +Cc: akpm, linuxppc-dev, ak, linux-kernel
On Sun, 2 Apr 2006, Paul Jackson wrote:
> - for (cpu = 0; cpu < NR_CPUS; cpu++) {
> + for_each_online_cpu(cpu) {
>
> Idle curiosity -- what keeps a cpu from going offline during
> this scan, and leaving us with the same crash as before?
Nothing keeps a processor from going offline. We could take the hotplug
lock for every for_each_online_cpu() in the kernel. Could you take that
up with the hotplug folks?
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Fw: 2.6.16 crashes when running numastat on p575
2006-04-03 5:25 ` Christoph Lameter
@ 2006-04-03 6:43 ` Paul Jackson
2006-04-03 14:10 ` Nathan Lynch
1 sibling, 0 replies; 13+ messages in thread
From: Paul Jackson @ 2006-04-03 6:43 UTC (permalink / raw)
To: Christoph Lameter; +Cc: akpm, linuxppc-dev, ak, linux-kernel
Christoph wrote:
> Could you take that up with the hotplug folks?
Hmmm ... I suddenly notice that I'm on vacation
for the next two weeks. Sorry <grin>.
In any case, I'll have to leave that one for
the hotplug folks to sort out. I have other
priorities and limited talents.
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.925.600.0401
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Fw: 2.6.16 crashes when running numastat on p575
2006-04-03 5:12 ` Fw: 2.6.16 crashes when running numastat on p575 Christoph Lameter
2006-04-03 5:15 ` Paul Jackson
@ 2006-04-03 8:09 ` Sonny Rao
1 sibling, 0 replies; 13+ messages in thread
From: Sonny Rao @ 2006-04-03 8:09 UTC (permalink / raw)
To: Christoph Lameter
Cc: Andrew Morton, Paul Jackson, linux-kernel, linuxppc-dev, ak
On Sun, Apr 02, 2006 at 10:12:29PM -0700, Christoph Lameter wrote:
> On Sun, 2 Apr 2006, Andrew Morton wrote:
>
> > I have a vague feeling that you guys worked on numastat?
>
> This is mostly Andi's code although I added the zone_pcp() stuff. This is
> failing because zone_pcp() only returns valid information for online
> processors.
>
> Initially all zone_pcps() point to the boot_cpuset (see zone_pcp_init)
> and therefore zone_pcp) is always valid and we do not see this bug. But
> if someone downs a processor or a processor dies then free_zone_pageset()
> is called which will set zone_pcp() = NULL.
>
>
> Fix NULL pointer dereference in node_read_numastat()
>
> zone_pcp() only returns valid values if the processor is online.
>
> Change node_read_numastat() to only scan online processors.
>
> Signed-off-by: Christoph Lameter <clameter@sgi.com>
>
> Index: linux-2.6.16/drivers/base/node.c
> ===================================================================
> --- linux-2.6.16.orig/drivers/base/node.c 2006-03-19 21:53:29.000000000 -0800
> +++ linux-2.6.16/drivers/base/node.c 2006-04-02 21:59:49.000000000 -0700
> @@ -106,7 +106,7 @@ static ssize_t node_read_numastat(struct
> other_node = 0;
> for (i = 0; i < MAX_NR_ZONES; i++) {
> struct zone *z = &pg->node_zones[i];
> - for (cpu = 0; cpu < NR_CPUS; cpu++) {
> + for_each_online_cpu(cpu) {
> struct per_cpu_pageset *ps = zone_pcp(z,cpu);
> numa_hit += ps->numa_hit;
> numa_miss += ps->numa_miss;
Works, thanks.
Sonny
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Fw: 2.6.16 crashes when running numastat on p575
2006-04-03 5:15 ` Paul Jackson
2006-04-03 5:25 ` Christoph Lameter
@ 2006-04-03 11:49 ` Andi Kleen
2006-04-03 14:18 ` Nathan Lynch
1 sibling, 1 reply; 13+ messages in thread
From: Andi Kleen @ 2006-04-03 11:49 UTC (permalink / raw)
To: Paul Jackson; +Cc: akpm, linux-kernel, linuxppc-dev, ak, Christoph Lameter
On Monday 03 April 2006 07:15, Paul Jackson wrote:
> - for (cpu = 0; cpu < NR_CPUS; cpu++) {
> + for_each_online_cpu(cpu) {
>
> Idle curiosity -- what keeps a cpu from going offline during
> this scan, and leaving us with the same crash as before?
CPU hotdown uses RCU like techniques to avoid this. Only potential
problem could be on a preemptive kernel, but I hope nobody tries
cpu unplug on such a beast. The later could be probably fixed
with a rcu_read_lock() or somesuch.
-Andi
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Fw: 2.6.16 crashes when running numastat on p575
2006-04-03 5:25 ` Christoph Lameter
2006-04-03 6:43 ` Paul Jackson
@ 2006-04-03 14:10 ` Nathan Lynch
2006-04-03 17:42 ` Christoph Lameter
1 sibling, 1 reply; 13+ messages in thread
From: Nathan Lynch @ 2006-04-03 14:10 UTC (permalink / raw)
To: Christoph Lameter; +Cc: akpm, linuxppc-dev, Paul Jackson, ak, linux-kernel
Christoph Lameter wrote:
> On Sun, 2 Apr 2006, Paul Jackson wrote:
>
> > - for (cpu = 0; cpu < NR_CPUS; cpu++) {
> > + for_each_online_cpu(cpu) {
> >
> > Idle curiosity -- what keeps a cpu from going offline during
> > this scan, and leaving us with the same crash as before?
>
> Nothing keeps a processor from going offline. We could take the hotplug
> lock for every for_each_online_cpu() in the kernel.
In this case, disabling preempt around the for_each_online_cpu loop
would prevent any cpu from going down in the meantime. But since this
function doesn't look like it's a hot path, and we're potentially
traversing lots of zones and cpus, lock_cpu_hotplug might be preferable.
As Paul noted, the fix as it stands isn't adequate.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Fw: 2.6.16 crashes when running numastat on p575
2006-04-03 11:49 ` Andi Kleen
@ 2006-04-03 14:18 ` Nathan Lynch
0 siblings, 0 replies; 13+ messages in thread
From: Nathan Lynch @ 2006-04-03 14:18 UTC (permalink / raw)
To: Andi Kleen
Cc: akpm, ak, linux-kernel, linuxppc-dev, Paul Jackson,
Christoph Lameter
Andi Kleen wrote:
> On Monday 03 April 2006 07:15, Paul Jackson wrote:
> > - for (cpu = 0; cpu < NR_CPUS; cpu++) {
> > + for_each_online_cpu(cpu) {
> >
> > Idle curiosity -- what keeps a cpu from going offline during
> > this scan, and leaving us with the same crash as before?
>
>
> CPU hotdown uses RCU like techniques to avoid this. Only potential
> problem could be on a preemptive kernel, but I hope nobody tries
> cpu unplug on such a beast.
I always turn on preempt when testing cpu hotplug (usually on ppc64).
There were several preempt vs cpu hotplug issues until around 2.6.10
or so, iirc, but I think the situation has been stable for a while
now.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Fw: 2.6.16 crashes when running numastat on p575
2006-04-03 14:10 ` Nathan Lynch
@ 2006-04-03 17:42 ` Christoph Lameter
2006-04-03 18:01 ` Nathan Lynch
0 siblings, 1 reply; 13+ messages in thread
From: Christoph Lameter @ 2006-04-03 17:42 UTC (permalink / raw)
To: Nathan Lynch; +Cc: akpm, linuxppc-dev, Paul Jackson, ak, linux-kernel
On Mon, 3 Apr 2006, Nathan Lynch wrote:
> In this case, disabling preempt around the for_each_online_cpu loop
> would prevent any cpu from going down in the meantime. But since this
> function doesn't look like it's a hot path, and we're potentially
> traversing lots of zones and cpus, lock_cpu_hotplug might be preferable.
>
> As Paul noted, the fix as it stands isn't adequate.
There are many other for_each_*_cpu loops in the kernel that do not have
any of the instrumentation you suggest. I suggest you come up with a
general solution and then go through all of them and fix this. Please be
aware that many of these loops are performance critical.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Fw: 2.6.16 crashes when running numastat on p575
2006-04-03 17:42 ` Christoph Lameter
@ 2006-04-03 18:01 ` Nathan Lynch
2006-04-03 18:08 ` Christoph Lameter
0 siblings, 1 reply; 13+ messages in thread
From: Nathan Lynch @ 2006-04-03 18:01 UTC (permalink / raw)
To: Christoph Lameter; +Cc: akpm, linuxppc-dev, Paul Jackson, ak, linux-kernel
Christoph Lameter wrote:
>
> On Mon, 3 Apr 2006, Nathan Lynch wrote:
>
> > In this case, disabling preempt around the for_each_online_cpu loop
> > would prevent any cpu from going down in the meantime. But since this
> > function doesn't look like it's a hot path, and we're potentially
> > traversing lots of zones and cpus, lock_cpu_hotplug might be preferable.
> >
> > As Paul noted, the fix as it stands isn't adequate.
>
> There are many other for_each_*_cpu loops in the kernel that do not have
> any of the instrumentation you suggest. I suggest you come up with a
> general solution and then go through all of them and fix this. Please be
> aware that many of these loops are performance critical.
But this one isn't, right?
And I'm afraid there's a misunderstanding here -- only
for_each_online_cpu (or accessing the cpu online map in general) has
such restrictions -- for_each_possible_cpu doesn't require any locking
or preempt tricks since cpu_possible_map must not change after boot.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Fw: 2.6.16 crashes when running numastat on p575
2006-04-03 18:01 ` Nathan Lynch
@ 2006-04-03 18:08 ` Christoph Lameter
2006-04-03 21:18 ` Andrew Morton
2006-04-04 0:25 ` Paul Jackson
0 siblings, 2 replies; 13+ messages in thread
From: Christoph Lameter @ 2006-04-03 18:08 UTC (permalink / raw)
To: Nathan Lynch; +Cc: akpm, linuxppc-dev, pj, ak, linux-kernel
On Mon, 3 Apr 2006, Nathan Lynch wrote:
> > There are many other for_each_*_cpu loops in the kernel that do not have
> > any of the instrumentation you suggest. I suggest you come up with a
> > general solution and then go through all of them and fix this. Please be
> > aware that many of these loops are performance critical.
>
> But this one isn't, right?
Right. One could use more expensive processing here.
> And I'm afraid there's a misunderstanding here -- only
> for_each_online_cpu (or accessing the cpu online map in general) has
> such restrictions -- for_each_possible_cpu doesn't require any locking
> or preempt tricks since cpu_possible_map must not change after boot.
Correct. We may want to audit the kernel and check that each
for_each_possible_cpu or for_each_cpu is really correct. Developers
frequently assume that all processors are up. There may be some
complicated interactions with cpusets. Adding Paul to this.
However, note that I am not interested hotplug functionality. It is going
to be a difficult task to make the kernel shutdown processors correctly. I
can give you feedback but I am not going to do this work.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Fw: 2.6.16 crashes when running numastat on p575
2006-04-03 18:08 ` Christoph Lameter
@ 2006-04-03 21:18 ` Andrew Morton
2006-04-04 0:25 ` Paul Jackson
1 sibling, 0 replies; 13+ messages in thread
From: Andrew Morton @ 2006-04-03 21:18 UTC (permalink / raw)
To: Christoph Lameter; +Cc: linuxppc-dev, pj, ntl, ak, linux-kernel
Christoph Lameter <clameter@sgi.com> wrote:
>
> On Mon, 3 Apr 2006, Nathan Lynch wrote:
>
> > > There are many other for_each_*_cpu loops in the kernel that do not have
> > > any of the instrumentation you suggest. I suggest you come up with a
> > > general solution and then go through all of them and fix this. Please be
> > > aware that many of these loops are performance critical.
> >
> > But this one isn't, right?
>
> Right. One could use more expensive processing here.
Hopefully none of the for_each_foo() loops are performance-critical - those
things are expensive.
> > And I'm afraid there's a misunderstanding here -- only
> > for_each_online_cpu (or accessing the cpu online map in general) has
> > such restrictions -- for_each_possible_cpu doesn't require any locking
> > or preempt tricks since cpu_possible_map must not change after boot.
for_each_present_cpu() presumably has the same problems.
> Correct. We may want to audit the kernel and check that each
> for_each_possible_cpu or for_each_cpu is really correct.
A fair bit of that has been happening in recent weeks.
But yes, we should be protecting these things with rcu_read_lock() if
possible, lock_cpu_hotplug() otherwise.
(rcu_read_lock() might not be the appropriate name for this operation -
maybe it should be an open-coded preempt_disable(). Or some other suitably
named alias; dunno).
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Fw: 2.6.16 crashes when running numastat on p575
2006-04-03 18:08 ` Christoph Lameter
2006-04-03 21:18 ` Andrew Morton
@ 2006-04-04 0:25 ` Paul Jackson
1 sibling, 0 replies; 13+ messages in thread
From: Paul Jackson @ 2006-04-04 0:25 UTC (permalink / raw)
To: Christoph Lameter; +Cc: akpm, linuxppc-dev, ak, ntl, linux-kernel
Christoph wrote:
> There may be some complicated interactions with cpusets.
Hopefully not, but it's possible. I did my best not to assume that
cpus or memory nodes stayed online, and to build in fallbacks if one
of them disappeared out from under me.
But the code has never been tested to these assumptions, nor even
carefully reviewed to them.
So something will likely break somewhere, though quite possibly not
in the cpuset code itself, but in the scheduler or allocator code
that depends on selecting an online cpu or memory for some purpose.
Someone from the hotplug side will have to look at this, with an
understanding of how the coming and going of cpus and memory nodes
can affect the scheduler and allocators.
Look at the callers of cpuset_cpus_allowed(), cpuset_mems_allowed(),
cpuset_update_task_memory_state(), and cpuset_zone_allowed() to see
the likely hot spots for interaction between cpusets and hotplug.
The callers of these routines are important hooks into the scheduler
and allocator, and the question needs to be asked what happens if
a cpu or memory node goes offline after one of these cpuset_*()
routines is called before the scheduler or allocator finishes doing
what it is doing with what it thought was a usable cpu or node.
Also, there is currently no hotplug aware code that I know of in
the kernel/cpuset.c code, so each of the references to the globals
cpu_online_map and node_online_map therein are suspect. Presumably as
anywhere else in the kernel these appear, or the for_each_online_*()
macros appear, the code needs to be examined for perhaps needing some
sort of guard.
So, in sum, the kernel needs to be inspected at the location of at
least each of the following calls or references, for possible problems
if the cpus or nodes online change unexpectedly:
any_online_node()
cpu_online_map
cpu_present()
cpu_present_map
cpuset_cpus_allowed()
cpuset_mems_allowed()
cpuset_update_task_memory_state()
cpuset_zone_allowed()
for_each_online_cpu()
for_each_online_node()
for_each_present_cpu()
node_online_map
num_online_cpus()
num_online_nodes()
num_present_cpus()
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.925.600.0401
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2006-04-04 0:25 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20060402213216.2e61b74e.akpm@osdl.org>
2006-04-03 5:12 ` Fw: 2.6.16 crashes when running numastat on p575 Christoph Lameter
2006-04-03 5:15 ` Paul Jackson
2006-04-03 5:25 ` Christoph Lameter
2006-04-03 6:43 ` Paul Jackson
2006-04-03 14:10 ` Nathan Lynch
2006-04-03 17:42 ` Christoph Lameter
2006-04-03 18:01 ` Nathan Lynch
2006-04-03 18:08 ` Christoph Lameter
2006-04-03 21:18 ` Andrew Morton
2006-04-04 0:25 ` Paul Jackson
2006-04-03 11:49 ` Andi Kleen
2006-04-03 14:18 ` Nathan Lynch
2006-04-03 8:09 ` Sonny Rao
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).