From: Paul Jackson <pj@sgi.com>
To: Anton Blanchard <anton@samba.org>
Cc: akpm@osdl.org, Simon.Derr@bull.net, linux-kernel@vger.kernel.org,
Andi Kleen <ak@suse.de>,
IWAMOTO Toshihiro <iwamoto@valinux.co.jp>,
Dave Hansen <haveblue@us.ibm.com>
Subject: Re: [Patch 4/4] cpusets top mask just online, not all possible
Date: Sat, 11 Sep 2004 10:07:31 -0700
Message-ID: <20040911100731.2f400271.pj@sgi.com>
In-Reply-To: <20040911141001.GD32755@krispykreme>
I'm adding Andi Kleen, Iwamoto-san and Dave Hansen to the cc list, since
the numa code and other hotplug work might have similar considerations.
Anton asks:
> How does this change interact with CPU hotplug?
You beat my estimate ;). I figured it would be a day before someone
asked this question. It only took you six hours. Good.
Cpusets and hotplug (CPU or Memory) aren't friends, yet.
Cpusets builds up additional data structures, used to manage a task's CPU
and Memory placement. If more CPUs or Memory are added later on,
cpusets won't know of them nor let you use them. If CPUs or Memory are
removed later on, cpusets will still think it is ok to use them, and
potentially starve a task if that task's cpuset had been configured to
_only_ allow using the now departed CPU or Memory.
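To make the two failure modes concrete, here is a minimal userspace sketch, not kernel code. The names mask_t, toy_cpuset, cpu_bit(), usable_cpus() and task_is_starved() are all hypothetical stand-ins invented for illustration; the real code uses cpumask_t and the cpuset structures. It assumes at most 64 CPUs so that one bit per CPU fits in a single word.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in: one bit per CPU, not the kernel's cpumask_t. */
typedef uint64_t mask_t;

struct toy_cpuset {
	mask_t cpus_allowed;	/* snapshot taken when the cpuset was configured */
};

static mask_t cpu_bit(unsigned int cpu)
{
	assert(cpu < 64);
	return (mask_t)1 << cpu;
}

/* CPUs the task can really run on: configured CPUs that are still online. */
static mask_t usable_cpus(const struct toy_cpuset *cs, mask_t online)
{
	return cs->cpus_allowed & online;
}

/* A task is starved when its cpuset names no CPU that is still online. */
static bool task_is_starved(const struct toy_cpuset *cs, mask_t online)
{
	return usable_cpus(cs, online) == 0;
}
```

With a cpuset configured for CPU 2 only, clearing bit 2 in the online mask makes task_is_starved() return true, and setting a new bit 3 afterwards changes nothing, because the cpuset's snapshot never learns about the added CPU.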
When the move_task_off_dead_cpu() code in kernel/sched.c catches this,
the following code can break cpuset exclusive semantics if it decides
that a task has to be allowed to run anywhere because none of the places
it had been allowed are online anymore:
    kernel/sched.c: move_task_off_dead_cpu()

	/* No more Mr. Nice Guy. */
	if (dest_cpu == NR_CPUS) {
		tsk->cpus_allowed = cpuset_cpus_allowed(tsk);
		if (!cpus_intersects(tsk->cpus_allowed, cpu_online_map))
			cpus_setall(tsk->cpus_allowed);
		dest_cpu = any_online_cpu(tsk->cpus_allowed);
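The effect of that fallback on exclusive cpusets can be seen in a small userspace model. This is only a sketch under stated assumptions: mask_t, TOY_NR_CPUS, toy_first_cpu() and toy_fallback_cpu() are hypothetical names, a 64-bit word stands in for cpumask_t, and the logic only mirrors the shape of the excerpt above rather than reproducing the scheduler.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t mask_t;	/* hypothetical: one bit per CPU */
#define TOY_NR_CPUS 64

/* Index of the lowest set bit, or TOY_NR_CPUS if the mask is empty. */
static unsigned int toy_first_cpu(mask_t m)
{
	unsigned int cpu;

	for (cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		if (m & ((mask_t)1 << cpu))
			return cpu;
	return TOY_NR_CPUS;
}

/*
 * Mirrors the shape of the quoted fallback: start from the cpuset's
 * mask, and if none of it is online, allow every CPU.
 */
static unsigned int toy_fallback_cpu(mask_t cpuset_allowed, mask_t online,
				     mask_t *task_allowed)
{
	assert(task_allowed && online != 0);
	*task_allowed = cpuset_allowed;
	if ((*task_allowed & online) == 0)
		*task_allowed = ~(mask_t)0;	/* plays the role of cpus_setall() */
	return toy_first_cpu(*task_allowed & online);
}
```

Take two exclusive cpusets, A on CPUs 0-1 and B on CPUs 2-3. If both of B's CPUs go offline, a task from B passed through toy_fallback_cpu() ends up allowed everywhere and is sent to a CPU that A was supposed to own exclusively.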
It wouldn't surprise me if Andi Kleen's numa code, especially the
MPOL_BIND which builds up special restricted zonelists holding only the
bound Memory Nodes, has the same sorts of interactions with Memory
hotplug. However, I have not given this suspicion any careful thought,
so could easily be wrong.
The CPU placement code, prior to cpusets, had just been horsing the
task->cpus_allowed field around, which the CPU hotplug guys have been
able to deal with, adding or modifying code such as the above. But the
numa Memory placement code (I suspect), and with cpusets now the CPU
placement code, build up additional structures that think they know
what's available and who can use it. This assumption is violated when
stuff is plugged in and out.
Here's my current best shot at how to deal with this:
 1) For now, mark CONFIG_CPUSETS (and CONFIG_NUMA?) as incompatible
    with CONFIG_HOTPLUG.
 2) Someday soon, the cpuset (and numa?) placement code needs to add an
    internal kernel call that the hotplug code can call to inform the
    placement code that a CPU or Memory resource has come online or
    gone offline, so that the placement code can "deal with it", somehow.
 3) The way that I anticipate the cpuset code will "deal with it"
    will be:
    a] When a CPU or Memory is added, just add it to the top cpuset.
       User code can take it from there, adding the new resource
       to whatever lower level cpusets it wants to.
    b] When a CPU or Memory is about to be removed, walk the cpuset
       tree from the bottom up, removing the resource, and if
       that causes a particular cpuset to become empty (no more
       online CPUs or no more online Memory), then automatically
       reassign any task attached to that cpuset to its parent
       cpuset, and remove the soon to be empty cpuset. If the
       fine user doesn't like this default forced re-placement
       to the parent cpuset, they should have emptied that soon to be
       useless cpuset of tasks before unplugging the hardware, and
       attached those tasks to whatever cpuset met their fancy.
       Or, if the removal could not have been anticipated, the
       user code will just have to move tasks around after the
       fact.
    c] The hotplug code should never add a CPU or Memory to what
       a task can use, without co-ordinating with the cpuset code.
       So fallback code such as the above call to cpus_setall()
       should involve cpusets somehow - whether asking it for the
       fallback CPUs to use, or telling it that this task just got
       force migrated to CPU_MASK_ALL - not sure which.
 4) Andi and the memory hotplug folks will have to speak to the question
    of whether this matters to the numa placement code, and what that
    might mean.
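Steps 3a and 3b could look roughly like the userspace sketch below. Everything in it is an assumption made for illustration: toy_cpuset, toy_task, toy_cpu_added() and toy_cpu_removed() are hypothetical names, CPUs are bits in a 64-bit word, and the cpuset tree is a flat array ordered parents-before-children so that a backwards scan visits it bottom up. It does not attempt the locking, the Memory side, or the notification hook from step 2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t mask_t;	/* hypothetical: one bit per CPU */

struct toy_cpuset {
	struct toy_cpuset *parent;	/* NULL for the top cpuset */
	mask_t cpus;
	bool removed;
};

struct toy_task {
	struct toy_cpuset *cs;		/* cpuset the task is attached to */
};

/* 3a: a newly added CPU only goes into the top cpuset. */
static void toy_cpu_added(struct toy_cpuset *top, unsigned int cpu)
{
	assert(top->parent == NULL && cpu < 64);
	top->cpus |= (mask_t)1 << cpu;
}

/*
 * 3b: sets[] is assumed ordered parents-before-children, so walking it
 * backwards visits the tree from the bottom up.
 */
static void toy_cpu_removed(struct toy_cpuset *sets, size_t nsets,
			    struct toy_task *tasks, size_t ntasks,
			    unsigned int cpu)
{
	size_t i, t;

	assert(cpu < 64);
	for (i = nsets; i-- > 0; ) {
		struct toy_cpuset *cs = &sets[i];

		if (cs->removed)
			continue;
		cs->cpus &= ~((mask_t)1 << cpu);
		if (cs->cpus != 0 || cs->parent == NULL)
			continue;	/* still usable, or the top cpuset */
		/* Emptied: hand its tasks to the parent, then drop it. */
		for (t = 0; t < ntasks; t++)
			if (tasks[t].cs == cs)
				tasks[t].cs = cs->parent;
		cs->removed = true;
	}
}
```

Because the walk is bottom up, a task pushed into its parent is handled again if the parent is emptied by the same removal, so it keeps climbing until it reaches a cpuset that still has a CPU. The top cpuset is never removed in this sketch, even if it ends up empty.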
Suggestions?
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.650.933.1373