From: James Bottomley <James.Bottomley@SteelEye.com>
To: Ingo Molnar <mingo@elte.hu>
Cc: William Lee Irwin III <wli@holomorphy.com>,
Andrew Morton <akpm@osdl.org>,
Jesse Barnes <jbarnes@engr.sgi.com>,
Linus Torvalds <torvalds@osdl.org>,
Nick Piggin <nickpiggin@yahoo.com.au>,
Linux Kernel <linux-kernel@vger.kernel.org>
Subject: Re: [sched] fix sched_domains hotplug bootstrap ordering vs. cpu_online_map issue
Date: 05 Sep 2004 18:35:16 -0400
Message-ID: <1094423718.10976.27.camel@mulgrave>
In-Reply-To: <20040905114645.GA11422@elte.hu>

On Sun, 2004-09-05 at 07:46, Ingo Molnar wrote:
> > cpu_online_map is not set up at the time of sched domain
> > initialization when hotplug cpu paths are used for SMP booting. At
> > this phase of bootstrapping, cpu_possible_map can be used by the
> > various architectures using cpu hotplugging for SMP bootstrap, but the
> > manipulations of cpu_online_map done on behalf of NUMA architectures,
> > done indirectly via node_to_cpumask(), can't, because cpu_online_map
> > starts depopulated and hasn't yet been populated. On true NUMA
> > architectures this is a distinct cpumask_t from cpu_online_map and so
> > the unpatched code works on NUMA; on non-NUMA architectures the
> > definition of node_to_cpumask() this way breaks and would require an
> > invasive sweeping of users of node_to_cpumask() to change it to e.g.
> > cpu_possible_map, as cpu_possible_map is not suitable for use at
> > runtime as a substitute for cpu_online_map.
> >
> > Signed-off-by: William Irwin <wli@holomorphy.com>
>
> Signed-off-by: Ingo Molnar <mingo@elte.hu>

Well, this patch got in, which is what I want, since it allows the
non-NUMA machines to work with hotplug CPUs again. However, is anyone
actually looking to fix this for real?

The fundamental problem is that NUMA or the scheduler (or both) are
broken with regard to hotplug.

The origin of the breakage is the difference between cpu_possible_map
and cpu_online_map. In hotplug CPU, there are two ways to do
initialisations: you can initialise from cpu_online_map, but then you
*must* have a cpu hotplug notify listener to add data structures for the
extra CPUs as they come on-line, or you can initialise from
cpu_possible_map and not bother with a notifier. The disadvantage of
the latter is that cpu_possible_map may be vastly larger than
cpu_online_map ever gets to, thus wasting valuable kernel memory.
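
To make the two options concrete, here is a minimal userspace sketch; it
is not kernel code. The bitmask variables, percpu_data[] and all function
names are made up for illustration and only model the roles played by
cpu_possible_map, cpu_online_map and a hotplug notifier.

```c
#include <assert.h>
#include <stdlib.h>

#define NR_CPUS 8

/* Hypothetical userspace stand-ins for the kernel's cpumasks. */
static unsigned long possible_map = 0xffUL; /* CPUs 0-7 could exist  */
static unsigned long online_map   = 0x01UL; /* only the boot CPU is up */

/* Per-CPU data; NULL means "nothing allocated for this CPU yet". */
static int *percpu_data[NR_CPUS];

static int cpu_isset(int cpu, unsigned long map)
{
	return (int)((map >> cpu) & 1UL);
}

/* Allocate the per-CPU structure once; repeated calls are harmless. */
static void alloc_cpu_data(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	if (!percpu_data[cpu]) {
		percpu_data[cpu] = calloc(1, sizeof(int));
		assert(percpu_data[cpu] != NULL);
	}
}

/* Option 1: initialise only what is online now. This is only correct
 * if cpu_up_notify() below is wired up for CPUs that arrive later. */
static void init_from_online(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (cpu_isset(cpu, online_map))
			alloc_cpu_data(cpu);
}

/* The hotplug listener that option 1 requires: set up the data
 * first, then mark the CPU online. */
static void cpu_up_notify(int cpu)
{
	alloc_cpu_data(cpu);
	online_map |= 1UL << cpu;
}

/* Option 2: initialise everything that could ever appear. No
 * listener is needed, at the cost of memory for absent CPUs. */
static void init_from_possible(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (cpu_isset(cpu, possible_map))
			alloc_cpu_data(cpu);
}

/* Helper to show how much each option ends up allocating. */
static int count_allocated(void)
{
	int n = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (percpu_data[cpu])
			n++;
	return n;
}
```

With one CPU online out of eight possible, init_from_online() allocates a
single structure and each later cpu_up_notify() adds one more, whereas
init_from_possible() allocates all eight up front.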

The scheduler code is schizophrenic in this regard in that it does both:
it initialises static data structures from cpu_possible_map, but it also
has a hotplug cpu listener for starting things like the migration
threads.

I suspect the NUMA people would like us all to go to the former method
(initialise only from cpu_online_map and have a proper hotplug listener)
since their possible maps are pretty huge. However, which is it to be:
fix NUMA (to have two cpu_to_node() maps for the possible and online
cpus per node) or fix the scheduler to do initialisation correctly?
Perhaps this should be phased: change NUMA first temporarily for phase
one and then fix the scheduler (and everyone else initialising from
cpu_possible_map) in the second.
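
As a rough illustration of the "two maps per node" idea for phase one,
here is a userspace sketch. The node layout, the cpus_online variable and
both function names are hypothetical; nothing here is an existing kernel
interface.

```c
#include <assert.h>

#define NR_NODES 2

/* Hypothetical layout: node 0 may hold CPUs 0-3, node 1 CPUs 4-7. */
static const unsigned long node_possible[NR_NODES] = { 0x0fUL, 0xf0UL };

/* Stand-in for cpu_online_map: only the boot CPU is up so far. */
static unsigned long cpus_online = 0x01UL;

/* Map for boot-time initialisation: every CPU the node could hold. */
static unsigned long node_to_possible_cpumask(int node)
{
	assert(node >= 0 && node < NR_NODES);
	return node_possible[node];
}

/* Map for runtime use: only the node's CPUs that are online now. */
static unsigned long node_to_online_cpumask(int node)
{
	return node_to_possible_cpumask(node) & cpus_online;
}
```

Code that runs before the online map is populated would use the possible
variant, and runtime users would keep seeing only online CPUs.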
James