From: Paul Jackson <pj@sgi.com>
To: Christoph Lameter <clameter@sgi.com>
Cc: akpm@osdl.org, linuxppc-dev@ozlabs.org, ak@suse.com,
	ntl@pobox.com, linux-kernel@vger.kernel.org
Subject: Re: Fw: 2.6.16 crashes when running numastat on p575
Date: Mon, 3 Apr 2006 17:25:33 -0700
Message-ID: <20060403172533.f03f1ba2.pj@sgi.com>
In-Reply-To: <Pine.LNX.4.64.0604031104110.20903@schroedinger.engr.sgi.com>

Christoph wrote:
> There may be some complicated interactions with cpusets.

Hopefully not, but it's possible.  I did my best not to assume that
cpus or memory nodes stayed online, and to build in fallbacks if one
of them disappeared out from under me.

But the code has never been tested against these assumptions, nor even
carefully reviewed against them.

So something will likely break somewhere, though quite possibly not
in the cpuset code itself, but in the scheduler or allocator code
that depends on selecting an online cpu or memory node for some purpose.

Someone from the hotplug side will have to look at this, with an
understanding of how the coming and going of cpus and memory nodes
can affect the scheduler and allocators.

Look at the callers of cpuset_cpus_allowed(), cpuset_mems_allowed(),
cpuset_update_task_memory_state(), and cpuset_zone_allowed() to see
the likely hot spots for interaction between cpusets and hotplug.

The callers of these routines are important hooks into the scheduler
and allocator.  The question that needs to be asked is what happens if
a cpu or memory node goes offline after one of these cpuset_*()
routines is called, but before the scheduler or allocator finishes
doing what it is doing with what it thought was a usable cpu or node.
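
To make that window concrete, here is a small userspace sketch of the
check-then-use pattern.  It is not kernel code: mask_t, online_mask,
allowed_snapshot(), take_offline() and pick_cpu() are all made-up
stand-ins, and a 64-bit word plays the role of a cpumask.  It only shows
how a result computed before a hotplug event goes stale, and one possible
fallback (recheck against what is online at the point of use).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; none of these are kernel symbols. */
typedef uint64_t mask_t;                /* one bit per cpu, up to 64 cpus */

static mask_t online_mask = 0x0f;       /* cpus 0-3 start out online */

/* Lowest cpu number set in m, or -1 if m is empty. */
static int first_cpu_in(mask_t m)
{
    for (int cpu = 0; cpu < 64; cpu++)
        if (m & ((mask_t)1 << cpu))
            return cpu;
    return -1;
}

/* Step 1: what the task may use, as of the moment of the call. */
static mask_t allowed_snapshot(mask_t task_mask)
{
    return task_mask & online_mask;
}

/* Simulated hotplug event: the cpu disappears after the snapshot was taken. */
static void take_offline(int cpu)
{
    assert(cpu >= 0 && cpu < 64);
    online_mask &= ~((mask_t)1 << cpu);
}

/*
 * Step 2: recheck the snapshot against the current online mask before
 * using it.  If nothing in the snapshot is still online, fall back to
 * any online cpu rather than returning an offline one.
 */
static int pick_cpu(mask_t snapshot)
{
    mask_t usable = snapshot & online_mask;

    if (!usable)
        usable = online_mask;
    return first_cpu_in(usable);
}
```

With a snapshot of cpus 2 and 3, taking cpu 2 offline makes
first_cpu_in(snapshot) still answer 2, while pick_cpu(snapshot) answers 3.
The sketch is single threaded; in a real kernel the recheck itself can
race with hotplug unless something holds hotplug off across the check
and the use.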

Also, there is currently no hotplug-aware code that I know of in
kernel/cpuset.c, so each of the references to the globals
cpu_online_map and node_online_map therein is suspect.  Presumably,
anywhere else in the kernel that these or the for_each_online_*()
macros appear, the code needs to be examined to see whether it needs
some sort of guard.
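
As one illustration of what "some sort of guard" could look like, here
is a userspace sketch in which a walker takes a hold that makes offline
requests fail until the hold is dropped.  Everything in it is
hypothetical (nodemask64_t, nodes_online, hotplug_hold(),
hotplug_release(), try_node_offline(), count_online_nodes()); it is a
plain counter, not the kernel's actual hotplug locking, and it ignores
concurrency entirely.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not kernel symbols. */
typedef uint64_t nodemask64_t;          /* one bit per node, up to 64 nodes */

static nodemask64_t nodes_online = 0x0f;    /* nodes 0-3 start out online */
static int hotplug_holders;                 /* walkers currently relying on the mask */

/* A walker announces that it depends on the online mask staying put. */
static void hotplug_hold(void)
{
    hotplug_holders++;
}

static void hotplug_release(void)
{
    assert(hotplug_holders > 0);
    hotplug_holders--;
}

/*
 * Offline request: refused (-1) while any walker holds the guard,
 * otherwise the node is cleared from the mask and 0 is returned.
 */
static int try_node_offline(int node)
{
    assert(node >= 0 && node < 64);
    if (hotplug_holders > 0)
        return -1;
    nodes_online &= ~((nodemask64_t)1 << node);
    return 0;
}

/* Guarded walk: the online mask cannot shrink between hold and release. */
static int count_online_nodes(void)
{
    int n = 0;

    hotplug_hold();
    for (int node = 0; node < 64; node++)
        if (nodes_online & ((nodemask64_t)1 << node))
            n++;
    hotplug_release();
    return n;
}
```

While a hold is outstanding, try_node_offline(3) returns -1 and the walk
still sees four nodes; after the release the same request succeeds and
the walk sees three.  Whether refusing, delaying, or notifying is the
right behaviour for a given call site is exactly the per-site question
raised above.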

So, in sum, the kernel needs to be inspected at the location of at
least each of the following calls or references, for possible problems
if the cpus or nodes online change unexpectedly:

 any_online_node()
 cpu_online_map
 cpu_present()
 cpu_present_map
 cpuset_cpus_allowed()
 cpuset_mems_allowed()
 cpuset_update_task_memory_state()
 cpuset_zone_allowed()
 for_each_online_cpu()
 for_each_online_node()
 for_each_present_cpu()
 node_online_map
 num_online_cpus()
 num_online_nodes()
 num_present_cpus()

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.925.600.0401
