* cpusets not cpu hotplug aware
@ 2006-08-21 13:27 Anton Blanchard
2006-08-21 13:54 ` Simon Derr
2006-08-21 17:43 ` Paul Jackson
0 siblings, 2 replies; 28+ messages in thread
From: Anton Blanchard @ 2006-08-21 13:27 UTC (permalink / raw)
To: pj, simon.derr; +Cc: nathanl, linux-kernel
Hi,
We have a bug report where sched_setaffinity fails for cpus that are
hotplug added after boot. Nathan found this suspicious piece of code:
void __init cpuset_init_smp(void)
{
top_cpuset.cpus_allowed = cpu_online_map;
top_cpuset.mems_allowed = node_online_map;
}
Any reason we statically set the top level cpuset to the boot time cpu
online map?
Anton
^ permalink raw reply [flat|nested] 28+ messages in thread* Re: cpusets not cpu hotplug aware 2006-08-21 13:27 cpusets not cpu hotplug aware Anton Blanchard @ 2006-08-21 13:54 ` Simon Derr 2006-08-21 17:43 ` Paul Jackson 1 sibling, 0 replies; 28+ messages in thread From: Simon Derr @ 2006-08-21 13:54 UTC (permalink / raw) To: Anton Blanchard; +Cc: pj, simon.derr, nathanl, linux-kernel On Mon, 21 Aug 2006, Anton Blanchard wrote: > > Hi, > > We have a bug report where sched_setaffinity fails for cpus that are > hotplug added after boot. Nathan found this suspicious piece of code: > > void __init cpuset_init_smp(void) > { > top_cpuset.cpus_allowed = cpu_online_map; > top_cpuset.mems_allowed = node_online_map; > } > > Any reason we statically set the top level cpuset to the boot time cpu > online map? > Are you suggesting that all possible cpus should be in the top cpuset, instead of all existing cpus ? I'd prefer the hotplugged CPUs being added to the top cpuset afterwards. Simon. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: cpusets not cpu hotplug aware 2006-08-21 13:27 cpusets not cpu hotplug aware Anton Blanchard 2006-08-21 13:54 ` Simon Derr @ 2006-08-21 17:43 ` Paul Jackson 2006-08-21 19:21 ` Anton Blanchard 1 sibling, 1 reply; 28+ messages in thread From: Paul Jackson @ 2006-08-21 17:43 UTC (permalink / raw) To: Anton Blanchard; +Cc: simon.derr, nathanl, linux-kernel Anton wrote: > We have a bug report where sched_setaffinity fails for cpus that are > hotplug added after boot. Nathan found this suspicious piece of code: > > void __init cpuset_init_smp(void) > { > top_cpuset.cpus_allowed = cpu_online_map; > top_cpuset.mems_allowed = node_online_map; > } > > Any reason we statically set the top level cpuset to the boot time cpu > online map? Your query confuses me, about 4 different ways ... 1) What does sched_setaffinity have to do with this part of cpusets? 2) What did you mean by "statically assigned"? At boot, whatever cpus and memory nodes are online are copied to the top_cpuset's settings. As Simon suggests, it would be up to the hotplug/hotunplug folks to update these top_cpuset settings, as cpus and nodes come and go. 3) I don't understand what you thought was suspicious here. 4) I don't understand what you expected to see instead here. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <pj@sgi.com> 1.925.600.0401 ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: cpusets not cpu hotplug aware 2006-08-21 17:43 ` Paul Jackson @ 2006-08-21 19:21 ` Anton Blanchard 2006-08-21 20:42 ` [PATCH] cpuset code prevents binding tasks to new cpus Nathan Lynch 2006-08-21 21:01 ` cpusets not cpu hotplug aware Paul Jackson 0 siblings, 2 replies; 28+ messages in thread From: Anton Blanchard @ 2006-08-21 19:21 UTC (permalink / raw) To: Paul Jackson; +Cc: simon.derr, nathanl, linux-kernel Hi, > Your query confuses me, about 4 different ways ... > > 1) What does sched_setaffinity have to do with this part of cpusets? long sched_setaffinity(pid_t pid, cpumask_t new_mask) { ... cpus_allowed = cpuset_cpus_allowed(p); cpus_and(new_mask, new_mask, cpus_allowed); retval = set_cpus_allowed(p, new_mask); If cpuset_cpus_allowed doesnt return the current online mask and we want to schedule on a cpu that has been added since boot it looks like we will fail. > 2) What did you mean by "statically assigned"? At boot, whatever cpus > and memory nodes are online are copied to the top_cpuset's settings. > As Simon suggests, it would be up to the hotplug/hotunplug folks to > update these top_cpuset settings, as cpus and nodes come and go. Its up to the cpusets code to register a hotplug notifier to update the top_cpuset maps. > 3) I don't understand what you thought was suspicious here. > 4) I don't understand what you expected to see instead here. If the top level cpuset pointed to cpu_online_map instead of a boot time copy we wouldnt need any hotplug gunk in the cpusets code. Maybe the notifier is the right way to go, but it seems strange to create two copies of cpu_online_map (with the associated possibiliy of the two getting out of sync). Anton ^ permalink raw reply [flat|nested] 28+ messages in thread
* [PATCH] cpuset code prevents binding tasks to new cpus 2006-08-21 19:21 ` Anton Blanchard @ 2006-08-21 20:42 ` Nathan Lynch 2006-08-21 23:04 ` Paul Jackson 2006-08-21 21:01 ` cpusets not cpu hotplug aware Paul Jackson 1 sibling, 1 reply; 28+ messages in thread From: Nathan Lynch @ 2006-08-21 20:42 UTC (permalink / raw) To: Anton Blanchard; +Cc: Paul Jackson, simon.derr, linux-kernel, Andrew Morton Anton Blanchard wrote: > > Hi, > > > Your query confuses me, about 4 different ways ... > > > > 1) What does sched_setaffinity have to do with this part of cpusets? > > long sched_setaffinity(pid_t pid, cpumask_t new_mask) > { > ... > cpus_allowed = cpuset_cpus_allowed(p); > cpus_and(new_mask, new_mask, cpus_allowed); > retval = set_cpus_allowed(p, new_mask); > > If cpuset_cpus_allowed doesnt return the current online mask and we want > to schedule on a cpu that has been added since boot it looks like we > will fail. > > > 2) What did you mean by "statically assigned"? At boot, whatever cpus > > and memory nodes are online are copied to the top_cpuset's settings. > > As Simon suggests, it would be up to the hotplug/hotunplug folks to > > update these top_cpuset settings, as cpus and nodes come and go. > > Its up to the cpusets code to register a hotplug notifier to update the > top_cpuset maps. I think a notifier is overkill, the top_cpuset mask should just be a copy of cpu_possible_map, which doesn't change after boot so we don't have to worry about keeping it in sync. cpuset_cpus_allowed already guarantees online cpus in the returned mask. > > 3) I don't understand what you thought was suspicious here. > > 4) I don't understand what you expected to see instead here. > > If the top level cpuset pointed to cpu_online_map instead of a boot time > copy we wouldnt need any hotplug gunk in the cpusets code. > > Maybe the notifier is the right way to go, but it seems strange to > create two copies of cpu_online_map (with the associated possibiliy of > the two getting out of sync). Agreed. Here's a patch: We have this code in sched_setaffinity: cpus_allowed = cpuset_cpus_allowed(p); cpus_and(new_mask, new_mask, cpus_allowed); retval = set_cpus_allowed(p, new_mask); In the default case (the task belongs to the default toplevel cpuset), cpuset_cpus_allowed always returns a copy of whatever cpu_online_map was at boot. When a new cpu is added to the system, the user is unable to bind tasks to the new cpu -- in the code above we wind up passing an empty cpumask to set_cpus_allowed and get -EINVAL. Since cpuset_cpus_allowed already ensures that the cpumask it returns reflects only online cpus, setting the default cpuset's cpus_allowed mask to cpu_possible_map (which never changes) ensures correct behavior. Signed-off-by: Nathan Lynch <ntl@pobox.com> --- cpuhp-sched_setaffinity.orig/kernel/cpuset.c +++ cpuhp-sched_setaffinity/kernel/cpuset.c @@ -2041,7 +2041,7 @@ out: void __init cpuset_init_smp(void) { - top_cpuset.cpus_allowed = cpu_online_map; + top_cpuset.cpus_allowed = cpu_possible_map; top_cpuset.mems_allowed = node_online_map; } ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] cpuset code prevents binding tasks to new cpus 2006-08-21 20:42 ` [PATCH] cpuset code prevents binding tasks to new cpus Nathan Lynch @ 2006-08-21 23:04 ` Paul Jackson 2006-08-21 23:27 ` Nathan Lynch 0 siblings, 1 reply; 28+ messages in thread From: Paul Jackson @ 2006-08-21 23:04 UTC (permalink / raw) To: Nathan Lynch; +Cc: anton, simon.derr, linux-kernel, akpm Nathan wrote: > - top_cpuset.cpus_allowed = cpu_online_map; > + top_cpuset.cpus_allowed = cpu_possible_map; NAQ While this seems sensible on systems not using cpusets (but still using kernels configured for cpusets), it is surprising on systems using cpusets, on which one expects the cpuset that a task is in to reflect the cpus that a task is allowed to use. A long time ago, this code actually had cpu_possible_map here, not cpu_online_map. That was changed, in the patch: http://lkml.org/lkml/2004/9/12/104 [Patch] cpusets interoperate with hotplug online maps Lets discuss this further. See further my previous reply to Anton. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <pj@sgi.com> 1.925.600.0401 ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] cpuset code prevents binding tasks to new cpus 2006-08-21 23:04 ` Paul Jackson @ 2006-08-21 23:27 ` Nathan Lynch 2006-08-22 4:42 ` Paul Jackson 0 siblings, 1 reply; 28+ messages in thread From: Nathan Lynch @ 2006-08-21 23:27 UTC (permalink / raw) To: Paul Jackson; +Cc: anton, simon.derr, linux-kernel, akpm Paul Jackson wrote: > Nathan wrote: > > - top_cpuset.cpus_allowed = cpu_online_map; > > + top_cpuset.cpus_allowed = cpu_possible_map; > > NAQ > > While this seems sensible on systems not using cpusets (but still using > kernels configured for cpusets), it is surprising on systems using > cpusets, on which one expects the cpuset that a task is in to reflect > the cpus that a task is allowed to use. Will it actually break anything? ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] cpuset code prevents binding tasks to new cpus 2006-08-21 23:27 ` Nathan Lynch @ 2006-08-22 4:42 ` Paul Jackson 2006-08-22 5:07 ` Andrew Morton 0 siblings, 1 reply; 28+ messages in thread From: Paul Jackson @ 2006-08-22 4:42 UTC (permalink / raw) To: Nathan Lynch; +Cc: anton, simon.derr, linux-kernel, akpm > Will it actually break anything? It will break any user code that thought it could actually run on the CPUs listed in its cpuset, if it happens to be in a task in the top cpuset. I'm pretty sure I've got some cpuset users who would be harmed by this change. One degree of freedom for change we do have is that, as best I know, no one is doing anything serious with both cpu hotplug/unplug and with cpusets at the same time on the same system. For instance, I am willing to wager that no one is counting on the CPUs in the top cpuset remaining constant if a CPU comes on or off line. I say this because so far as I know, serious cpuset users aren't taking CPUs on and off line. I've rather been expecting cpusets and cpu hotplug to butt heads for a couple of years now. Looks like the time has come. So if I'm right, we could change the API to have the top_cpuset cpus_allowed track the cpus_online_map dynamically, rather than being a static copy of the cpus_online_map value at system boot, with little or no negative impact on users. For adding CPUs, that is easy enough, using a register_cpu_notifier callback to add new CPUs to top_cpuset.cpus_allowed. This may seem like overkill to Nathan, but I think it is necessary, to keep the top cpusets CPUs tracking what's online, rather than changing it to what's possible (which can be a -much- bigger set of CPUs on some configurations, and likely surprising to existing cpuset-aware code.) Automatically adding newly onlined CPUs to just the top cpuset (but not to other cpusets) does treat the top cpuset as a special case, but I doubt this will surprise existing cpuset aware code, and corresponds nicely to what hotplug aware, cpuset clueless code will expect. For removing CPUs, this is a bit harder, as one cannot remove a CPU from a cpuset without first removing it from any child cpusets of that cpuset. So I will need to scan the cpuset hierarchy, from the bottom up, removing the CPU about to be offline'd, and in the case that this was the last CPU in a cpuset, finding some fallback setting for that cpuset. I suspect that means moving any poor tasks left in that soon to be useless (no CPUs) cpuset into the parent cpuset, then empty'ing the cpus_allowed for that cpuset. If the user doesn't like this default action, they should have done what they wanted first, by adapting their cpuset configuration in anticipation of taking the CPU offline. Taking CPUs offline will likely surprise (as in break) existing cpuset aware code, but I don't know any way around that. In any event, as a workaround with existing kernels, I suspect you could make use of the existing /sbin/hotplug mechanism to run the (bash syntax) following commands to add a newly online CPU $cpu to the top cpuset's cpus: test -d /dev/cpuset || mkdir /dev/cpuset test -f /dev/cpuset/cpus || mount -t cpuset cpuset /dev/cpuset /bin/echo $(</dev/cpuset/cpus),$cpu > /dev/cpuset/cpus This workaround presumes that the number of the CPU being added, $cpu, is visible to the user level hotplug script - offhand I don't know if that is so or not. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <pj@sgi.com> 1.925.600.0401 ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] cpuset code prevents binding tasks to new cpus 2006-08-22 4:42 ` Paul Jackson @ 2006-08-22 5:07 ` Andrew Morton 0 siblings, 0 replies; 28+ messages in thread From: Andrew Morton @ 2006-08-22 5:07 UTC (permalink / raw) To: Paul Jackson; +Cc: Nathan Lynch, anton, simon.derr, linux-kernel On Mon, 21 Aug 2006 21:42:24 -0700 Paul Jackson <pj@sgi.com> wrote: > > Will it actually break anything? > > It will break any user code that thought it could actually run on the > CPUs listed in its cpuset, if it happens to be in a task in the top > cpuset. I'm pretty sure I've got some cpuset users who would be > harmed by this change. > > One degree of freedom for change we do have is that, as best I know, > no one is doing anything serious with both cpu hotplug/unplug and with > cpusets at the same time on the same system. For instance, I am > willing to wager that no one is counting on the CPUs in the top cpuset > remaining constant if a CPU comes on or off line. I say this because > so far as I know, serious cpuset users aren't taking CPUs on and > off line. I've rather been expecting cpusets and cpu hotplug to > butt heads for a couple of years now. Looks like the time has come. > > So if I'm right, we could change the API to have the top_cpuset > cpus_allowed track the cpus_online_map dynamically, rather than being a > static copy of the cpus_online_map value at system boot, with little > or no negative impact on users. > > For adding CPUs, that is easy enough, using a register_cpu_notifier > callback to add new CPUs to top_cpuset.cpus_allowed. register_hotcpu_notifier(), please. That exists so all the cpu-hotplug goop can be put inside #ifdef CONFIG_HOTPLUG_CPU, > This may seem > like overkill to Nathan, but I think it is necessary, to keep the > top cpusets CPUs tracking what's online, rather than changing it to > what's possible (which can be a -much- bigger set of CPUs on some > configurations, and likely surprising to existing cpuset-aware code.) > Automatically adding newly onlined CPUs to just the top cpuset (but not > to other cpusets) does treat the top cpuset as a special case, but I > doubt this will surprise existing cpuset aware code, and corresponds > nicely to what hotplug aware, cpuset clueless code will expect. > > For removing CPUs, this is a bit harder, as one cannot remove a CPU > from a cpuset without first removing it from any child cpusets of > that cpuset. So I will need to scan the cpuset hierarchy, from the > bottom up, removing the CPU about to be offline'd, and in the case that > this was the last CPU in a cpuset, finding some fallback setting for > that cpuset. I suspect that means moving any poor tasks left in that > soon to be useless (no CPUs) cpuset into the parent cpuset, then > empty'ing the cpus_allowed for that cpuset. If the user doesn't like > this default action, they should have done what they wanted first, by > adapting their cpuset configuration in anticipation of taking the CPU > offline. Taking CPUs offline will likely surprise (as in break) > existing cpuset aware code, but I don't know any way around that. > > In any event, as a workaround with existing kernels, I suspect you could > make use of the existing /sbin/hotplug mechanism to run the (bash syntax) > following commands to add a newly online CPU $cpu to the top cpuset's cpus: > > test -d /dev/cpuset || mkdir /dev/cpuset > test -f /dev/cpuset/cpus || mount -t cpuset cpuset /dev/cpuset > /bin/echo $(</dev/cpuset/cpus),$cpu > /dev/cpuset/cpus > > This workaround presumes that the number of the CPU being added, $cpu, > is visible to the user level hotplug script - offhand I don't know if > that is so or not. Please, let's get this into the udev tarball and let it trickle out. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: cpusets not cpu hotplug aware 2006-08-21 19:21 ` Anton Blanchard 2006-08-21 20:42 ` [PATCH] cpuset code prevents binding tasks to new cpus Nathan Lynch @ 2006-08-21 21:01 ` Paul Jackson 2006-08-22 4:51 ` Andrew Morton 1 sibling, 1 reply; 28+ messages in thread From: Paul Jackson @ 2006-08-21 21:01 UTC (permalink / raw) To: Anton Blanchard; +Cc: simon.derr, nathanl, linux-kernel Anton wrote: > Maybe the notifier is the right way to go, but it seems strange to > create two copies of cpu_online_map (with the associated possibiliy of > the two getting out of sync). Every cpuset in the system, of which this top_cpuset is just the first, has a set of cpus and memory nodes on which tasks in that cpuset are allowed to operate. It's not just top_cpuset that we need to understand how to relate to hotplug and unplug. I'll bet there is more hidden state in mm/mempolicy.c, for mbind() and set_mempolicy(), and in kernel/sched.c for the sched_setaffinity(), which was derived from what memory nodes or cpus were online. For example, I see several fields in 'struct mempolicy' that contain node numbers in some form, and the 'cpus_allowed' field in the task struct that sched_setaffinity sets. How does hotplug and unplug interact with these various saved states? > Its up to the cpusets code to register a hotplug notifier to update the > top_cpuset maps. That, or user level code, when it adds or removes a cpu or a memory node, needs to be responsible for adding or removing that cpu or node to or from whichever cpusets are affected. For example, if you just added cpu 31, to a system that had been running on cpus 0 to 30, you can add cpu 31 to the top cpuset by doing: mkdir /dev/cpuset # if not already done mount -t cpuset cpuset /dev/cpuset # if not already done /bin/echo 0-31 > /dev/cpsuet/cpus > If cpuset_cpus_allowed doesnt return the current online mask and we want > to schedule on a cpu that has been added since boot it looks like we > will fail. In general, on systems actually using cpusets, that -is- what should happen. Just because a cpu was brought online doesn't mean it was intended to be allowed in any given tasks current cpuset. Granted, I would guess users of systems not using cpusets (but still have cpusets configured in their kernel, as is common in some distro kernels), would expect the behaviour you expected - bringing a cpu (or memory node) on or offline would make it available (or not) for something like a sched_setaffinity (or mbind/set_mempolicy) immediately, without having to invoke some magic cpuset voodoo. Offhand, this sounds to me like a choice of two modes of operation. If you aren't actually using cpusets (so every task is in the original top_cpuset) then you'd expect that cpuset to "get out of the way", meaning top_cpuset (the only cpuset, in this case) tracked whatever cpus and nodes were online at the moment. If instead you start messing with cpusets (creating more than one of them and moving tasks between them) then you'd expect cpusets to be enforced, without automatically adding newly added cpus or memory nodes to existing cpusets. Only the user knows which cpusets should get the added cpus or memory nodes in this case. I don't jump for joy over yet another modal state flag. But I don't see a better alternative -- do you? -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <pj@sgi.com> 1.925.600.0401 ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: cpusets not cpu hotplug aware 2006-08-21 21:01 ` cpusets not cpu hotplug aware Paul Jackson @ 2006-08-22 4:51 ` Andrew Morton 2006-08-22 5:04 ` Nathan Lynch 2006-08-22 5:06 ` Paul Jackson 0 siblings, 2 replies; 28+ messages in thread From: Andrew Morton @ 2006-08-22 4:51 UTC (permalink / raw) To: Paul Jackson; +Cc: Anton Blanchard, simon.derr, nathanl, linux-kernel On Mon, 21 Aug 2006 14:01:48 -0700 Paul Jackson <pj@sgi.com> wrote: > Anton wrote: > > > Maybe the notifier is the right way to go, but it seems strange to > > create two copies of cpu_online_map (with the associated possibiliy of > > the two getting out of sync). > > Every cpuset in the system, of which this top_cpuset is just the first, > has a set of cpus and memory nodes on which tasks in that cpuset are > allowed to operate. It's not just top_cpuset that we need to understand > how to relate to hotplug and unplug. > > I'll bet there is more hidden state in mm/mempolicy.c, for mbind() > and set_mempolicy(), and in kernel/sched.c for the sched_setaffinity(), > which was derived from what memory nodes or cpus were online. For > example, I see several fields in 'struct mempolicy' that contain > node numbers in some form, and the 'cpus_allowed' field in the task > struct that sched_setaffinity sets. > > How does hotplug and unplug interact with these various saved states? > > > Its up to the cpusets code to register a hotplug notifier to update the > > top_cpuset maps. > > That, or user level code, when it adds or removes a cpu or a memory > node, needs to be responsible for adding or removing that cpu or node > to or from whichever cpusets are affected. > > For example, if you just added cpu 31, to a system that had been > running on cpus 0 to 30, you can add cpu 31 to the top cpuset by > doing: > > mkdir /dev/cpuset # if not already done > mount -t cpuset cpuset /dev/cpuset # if not already done > /bin/echo 0-31 > /dev/cpsuet/cpus > > > If cpuset_cpus_allowed doesnt return the current online mask and we want > > to schedule on a cpu that has been added since boot it looks like we > > will fail. > > In general, on systems actually using cpusets, that -is- what should > happen. Just because a cpu was brought online doesn't mean it was > intended to be allowed in any given tasks current cpuset. > > Granted, I would guess users of systems not using cpusets (but > still have cpusets configured in their kernel, as is common in some > distro kernels), would expect the behaviour you expected - bringing > a cpu (or memory node) on or offline would make it available (or > not) for something like a sched_setaffinity (or mbind/set_mempolicy) > immediately, without having to invoke some magic cpuset voodoo. > > Offhand, this sounds to me like a choice of two modes of operation. > > If you aren't actually using cpusets (so every task is in the > original top_cpuset) then you'd expect that cpuset to "get out > of the way", meaning top_cpuset (the only cpuset, in this case) > tracked whatever cpus and nodes were online at the moment. > > If instead you start messing with cpusets (creating more than one > of them and moving tasks between them) then you'd expect cpusets > to be enforced, without automatically adding newly added cpus or > memory nodes to existing cpusets. Only the user knows which > cpusets should get the added cpus or memory nodes in this case. > > I don't jump for joy over yet another modal state flag. But I don't see > a better alternative -- do you? > If the kernel provider (ie: distro) has enabled cpusets then it would be appropriate that they also provide a hotplug script which detects whether their user is actually using cpusets and if not, to take some sensible default action. ie: add the newly-added CPU to the system's single cpuset, no? iow: perhaps send a patch against the upstream udev package. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: cpusets not cpu hotplug aware 2006-08-22 4:51 ` Andrew Morton @ 2006-08-22 5:04 ` Nathan Lynch 2006-08-22 5:11 ` Andrew Morton 2006-08-22 5:14 ` Paul Jackson 2006-08-22 5:06 ` Paul Jackson 1 sibling, 2 replies; 28+ messages in thread From: Nathan Lynch @ 2006-08-22 5:04 UTC (permalink / raw) To: Andrew Morton Cc: Paul Jackson, Anton Blanchard, simon.derr, nathanl, linux-kernel Andrew Morton wrote: > On Mon, 21 Aug 2006 14:01:48 -0700 > Paul Jackson <pj@sgi.com> wrote: > > Anton wrote: > > > If cpuset_cpus_allowed doesnt return the current online mask and we want > > > to schedule on a cpu that has been added since boot it looks like we > > > will fail. > > > > In general, on systems actually using cpusets, that -is- what should > > happen. Just because a cpu was brought online doesn't mean it was > > intended to be allowed in any given tasks current cpuset. > > > > Granted, I would guess users of systems not using cpusets (but > > still have cpusets configured in their kernel, as is common in some > > distro kernels), would expect the behaviour you expected - bringing > > a cpu (or memory node) on or offline would make it available (or > > not) for something like a sched_setaffinity (or mbind/set_mempolicy) > > immediately, without having to invoke some magic cpuset voodoo. > > > > Offhand, this sounds to me like a choice of two modes of operation. > > > > If you aren't actually using cpusets (so every task is in the > > original top_cpuset) then you'd expect that cpuset to "get out > > of the way", meaning top_cpuset (the only cpuset, in this case) > > tracked whatever cpus and nodes were online at the moment. > > > > If instead you start messing with cpusets (creating more than one > > of them and moving tasks between them) then you'd expect cpusets > > to be enforced, without automatically adding newly added cpus or > > memory nodes to existing cpusets. Only the user knows which > > cpusets should get the added cpus or memory nodes in this case. > > > > I don't jump for joy over yet another modal state flag. But I don't see > > a better alternative -- do you? > > > > If the kernel provider (ie: distro) has enabled cpusets then it would be > appropriate that they also provide a hotplug script which detects whether their > user is actually using cpusets and if not, to take some sensible default action. > ie: add the newly-added CPU to the system's single cpuset, no? I think it would be more sensible for the default (i.e. user hasn't explicitly configured any cpusets) behavior on a CONFIG_CPUSETS=y kernel to match the behavior with a CONFIG_CPUSETS=n kernel. Right now we don't have that. Binding a task to a newly added cpu just fails if CONFIG_CPUSETS=y, but it works if CONFIG_CPUSETS=n. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: cpusets not cpu hotplug aware 2006-08-22 5:04 ` Nathan Lynch @ 2006-08-22 5:11 ` Andrew Morton 2006-08-22 5:14 ` Paul Jackson 1 sibling, 0 replies; 28+ messages in thread From: Andrew Morton @ 2006-08-22 5:11 UTC (permalink / raw) To: Nathan Lynch Cc: Paul Jackson, Anton Blanchard, simon.derr, nathanl, linux-kernel On Tue, 22 Aug 2006 00:04:01 -0500 Nathan Lynch <ntl@pobox.com> wrote: > > If the kernel provider (ie: distro) has enabled cpusets then it would be > > appropriate that they also provide a hotplug script which detects whether their > > user is actually using cpusets and if not, to take some sensible default action. > > ie: add the newly-added CPU to the system's single cpuset, no? > > I think it would be more sensible for the default (i.e. user hasn't > explicitly configured any cpusets) behavior on a CONFIG_CPUSETS=y > kernel to match the behavior with a CONFIG_CPUSETS=n kernel. Well... why? If he-who-configured the kernel chose cpusets then it's reasonable to expect he-who-configures to have appropriate userspace support for that kernel. > Right > now we don't have that. Binding a task to a newly added cpu just > fails if CONFIG_CPUSETS=y, but it works if CONFIG_CPUSETS=n. Sure. That's because we're not taking appropriate (ie any) action when a CPU is added. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: cpusets not cpu hotplug aware 2006-08-22 5:04 ` Nathan Lynch 2006-08-22 5:11 ` Andrew Morton @ 2006-08-22 5:14 ` Paul Jackson 2006-08-23 22:11 ` Nathan Lynch 1 sibling, 1 reply; 28+ messages in thread From: Paul Jackson @ 2006-08-22 5:14 UTC (permalink / raw) To: Nathan Lynch; +Cc: akpm, anton, simon.derr, nathanl, linux-kernel Nathan wrote: > I think it would be more sensible for the default (i.e. user hasn't > explicitly configured any cpusets) behavior on a CONFIG_CPUSETS=y > kernel to match the behavior with a CONFIG_CPUSETS=n kernel. Basically, in this situation, that would mean that the code that masks a requested sched_setaffinity cpumask with the tasks cpuset_cpus_allowed(): cpus_allowed = cpuset_cpus_allowed(p); cpus_and(new_mask, new_mask, cpus_allowed); would not be executed in the case of a kernel configured for cpusets, when running on a system that wasn't using cpusets. That's quite doable, and for systems not actually using cpusets, makes good sense. But it makes a bit of discontinuity in the system behaviour when someone starts using cpusets. If someone makes a cpuset, then suddenly tasks in the top cpuset start seeing failed sched_setaffinity calls for any CPU that was brought online after system boot. If there is some decent way I can get the cpus_allowed of the top cpuset to track the cpu_online_map, then we avoid this discontinuity in system behaviour. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <pj@sgi.com> 1.925.600.0401 ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: cpusets not cpu hotplug aware 2006-08-22 5:14 ` Paul Jackson @ 2006-08-23 22:11 ` Nathan Lynch 2006-08-23 22:39 ` Paul Jackson 2006-08-23 23:49 ` Nathan Lynch 0 siblings, 2 replies; 28+ messages in thread From: Nathan Lynch @ 2006-08-23 22:11 UTC (permalink / raw) To: Paul Jackson; +Cc: akpm, anton, simon.derr, nathanl, linux-kernel Paul Jackson wrote: > Nathan wrote: > > I think it would be more sensible for the default (i.e. user hasn't > > explicitly configured any cpusets) behavior on a CONFIG_CPUSETS=y > > kernel to match the behavior with a CONFIG_CPUSETS=n kernel. > > Basically, in this situation, that would mean that the code that > masks a requested sched_setaffinity cpumask with the tasks > cpuset_cpus_allowed(): > > cpus_allowed = cpuset_cpus_allowed(p); > cpus_and(new_mask, new_mask, cpus_allowed); > > would not be executed in the case of a kernel configured for cpusets, > when running on a system that wasn't using cpusets. > > That's quite doable, and for systems not actually using cpusets, makes > good sense. > > But it makes a bit of discontinuity in the system behaviour when > someone starts using cpusets. If someone makes a cpuset, then suddenly > tasks in the top cpuset start seeing failed sched_setaffinity calls > for any CPU that was brought online after system boot. > > If there is some decent way I can get the cpus_allowed of the top > cpuset to track the cpu_online_map, then we avoid this discontinuity > in system behaviour. How about this? I've verified it fixes the issue but I'm nervous about the locking. --- cpuhp-sched_setaffinity.orig/kernel/cpuset.c +++ cpuhp-sched_setaffinity/kernel/cpuset.c @@ -2033,6 +2033,31 @@ out: return err; } +static int cpuset_handle_cpuhp(struct notifier_block *nb, + unsigned long phase, void *_cpu) +{ + unsigned long cpu = (unsigned long)_cpu; + + mutex_lock(&manage_mutex); + lock_cpu_hotplug(); + mutex_lock(&callback_mutex); + + switch (phase) { + case CPU_ONLINE: + cpu_set(cpu, top_cpuset.cpus_allowed); + break; + case CPU_DEAD: + cpu_clear(cpu, top_cpuset.cpus_allowed); + break; + } + + mutex_unlock(&callback_mutex); + unlock_cpu_hotplug(); + mutex_unlock(&manage_mutex); + + return 0; +} + /** * cpuset_init_smp - initialize cpus_allowed * @@ -2043,6 +2068,8 @@ void __init cpuset_init_smp(void) { top_cpuset.cpus_allowed = cpu_online_map; top_cpuset.mems_allowed = node_online_map; + + hotcpu_notifier(cpuset_handle_cpuhp, 0); } /** ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: cpusets not cpu hotplug aware 2006-08-23 22:11 ` Nathan Lynch @ 2006-08-23 22:39 ` Paul Jackson 2006-08-23 23:39 ` Nathan Lynch 2006-08-23 23:49 ` Nathan Lynch 1 sibling, 1 reply; 28+ messages in thread From: Paul Jackson @ 2006-08-23 22:39 UTC (permalink / raw) To: Nathan Lynch; +Cc: akpm, anton, simon.derr, nathanl, linux-kernel Nathan wrote: > How about this? The code likely works, and the locking seems ok at first blush. And this patch seems to match just what I asked for ;). But the more I think about this, the less I like this direction. Your patch, and what I initially asked for, impose a policy and create a side affect. When you bring a cpu online, the top cpuset changes as a side affect, in order to impose a policy that the top cpuset tracks what is online. The kernel should avoid such side affects and avoid imposing policy. It should be user code that imposes the policy that the top cpuset tracks what is online. The kernel gets things going with reasonable basic defaults at system boot, then adapts to whatever user space mandates from then on. Kernels should provide generic, orthogonal mechanisms. Let user space figure out what it wants to do with them. It is not a kernel bug that the top cpuset doesn't track what is online. It would be a kernel bug if the top cpuset didn't allow just exactly whatever cpus the user space code told it to allow. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <pj@sgi.com> 1.925.600.0401 ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: cpusets not cpu hotplug aware 2006-08-23 22:39 ` Paul Jackson @ 2006-08-23 23:39 ` Nathan Lynch 2006-08-24 2:19 ` Paul Jackson 0 siblings, 1 reply; 28+ messages in thread From: Nathan Lynch @ 2006-08-23 23:39 UTC (permalink / raw) To: Paul Jackson; +Cc: akpm, anton, simon.derr, linux-kernel Paul Jackson wrote: > Nathan wrote: > > How about this? > > The code likely works, and the locking seems ok at first blush. > And this patch seems to match just what I asked for ;). > > But the more I think about this, the less I like this direction. > > Your patch, and what I initially asked for, impose a policy and create > a side affect. When you bring a cpu online, the top cpuset changes as > a side affect, in order to impose a policy that the top cpuset tracks > what is online. > > The kernel should avoid such side affects and avoid imposing policy. If you want to call that a policy that's fine, but it hardly seems to be a non-intuitive behavior to me, especially in the case that the administrator has not explicitly configured any cpusets. > It should be user code that imposes the policy that the top cpuset > tracks what is online. > > The kernel gets things going with reasonable basic defaults at system > boot, then adapts to whatever user space mandates from then on. > > Kernels should provide generic, orthogonal mechanisms. Maybe the cpuset code should just stay out of the way unless the admin has instantiated one? > Let user space figure out what it wants to do with them. The user who initially reported this probably has no idea what cpusets are or how to deal with the situation. > It is not a kernel bug that the top cpuset doesn't track what is > online. It's a bug that one's ability to bind a task to a new cpu is entirely dependent on whether you know your way around cpusets. Doesn't sound orthogonal to me. Try looking at it this way. You have an application that works on distro x, where cpu hotplug is supported but cpusets is not enabled in the kernel config. That application is cpu hotplug-aware, and binding a task to a new cpu just works. Now, with distro x+1, cpusets is enabled and that same scenario doesn't work anymore. That's generally viewed as a regression. And no, I don't like pushing the responsibility for fixing up the top-level cpuset out to userspace -- that would force apps to busy-wait (and for how long?) for the new cpu to be added to the top-level cpuset by the hotplug helper. It means something as simple as this: # echo 1 > /sys/devices/system/cpu/cpu3/online # taskset 0x8 foo has a race condition, depending on your kernel's configuration. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: cpusets not cpu hotplug aware 2006-08-23 23:39 ` Nathan Lynch @ 2006-08-24 2:19 ` Paul Jackson 0 siblings, 0 replies; 28+ messages in thread From: Paul Jackson @ 2006-08-24 2:19 UTC (permalink / raw) To: Nathan Lynch; +Cc: akpm, anton, simon.derr, linux-kernel Nathan wrote: > # echo 1 > /sys/devices/system/cpu/cpu3/online > # taskset 0x8 foo > > has a race condition, depending on your kernel's configuration. Good point. That's not a nice race to force on users. I relent -- not udev. > Maybe the cpuset code should just stay out of the way unless the admin > has instantiated one? I really don't like where such special case, modal behaviour leads us. Let's say we have two systems, side by side. Both have been up for many months. On one of these systems, one time, a few months ago, the sysadmin briefly made and removed a cpuset: mkdir /dev/cpuset mount -t cpuset cpuset /dev/cpuset mkdir /dev/cpuset/foo rmdir /dev/cpuset/foo umount /dev/cpuset rmdir /dev/cpuset On the other of these systems, the sysadmin never did any such thing. These two systems are now identical in every other aspect that might matter to this discussion. The sched_setaffinity call should behave the same on these two systems. If the kernel has to impose a trivial bit of policy (such as forcing the top cpuset to track what's online) in order to provide uniformally consistent (not modal) and race free behaviour, then it should. And besides, as someone else noted, it's alot easier than dealing with udev ;). Conclusion - the kernel should simply force the top_cpuset to track the online maps. See further my response to your patch. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <pj@sgi.com> 1.925.600.0401 ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: cpusets not cpu hotplug aware 2006-08-23 22:11 ` Nathan Lynch 2006-08-23 22:39 ` Paul Jackson @ 2006-08-23 23:49 ` Nathan Lynch 2006-08-24 0:12 ` Andrew Morton 1 sibling, 1 reply; 28+ messages in thread From: Nathan Lynch @ 2006-08-23 23:49 UTC (permalink / raw) To: Paul Jackson; +Cc: akpm, anton, simon.derr, nathanl, linux-kernel Nathan Lynch wrote: > Paul Jackson wrote: > > > > If there is some decent way I can get the cpus_allowed of the top > > cpuset to track the cpu_online_map, then we avoid this discontinuity > > in system behaviour. > > > How about this? I've verified it fixes the issue but I'm nervous > about the locking. > > > --- cpuhp-sched_setaffinity.orig/kernel/cpuset.c > +++ cpuhp-sched_setaffinity/kernel/cpuset.c > @@ -2033,6 +2033,31 @@ out: > return err; > } > > +static int cpuset_handle_cpuhp(struct notifier_block *nb, > + unsigned long phase, void *_cpu) > +{ > + unsigned long cpu = (unsigned long)_cpu; > + > + mutex_lock(&manage_mutex); > + lock_cpu_hotplug(); > + mutex_lock(&callback_mutex); > + > + switch (phase) { > + case CPU_ONLINE: > + cpu_set(cpu, top_cpuset.cpus_allowed); > + break; > + case CPU_DEAD: > + cpu_clear(cpu, top_cpuset.cpus_allowed); > + break; > + } > + > + mutex_unlock(&callback_mutex); > + unlock_cpu_hotplug(); > + mutex_unlock(&manage_mutex); > + > + return 0; > +} Actually the lock/unlock_cpu_hotplug aren't necessary, cpu_add_remove_lock is already held in this context. Update cpus_allowed in top_cpuset when cpus are hotplugged; otherwise binding a task to a newly hotplugged cpu fails since the toplevel cpuset has a static copy of whatever cpu_online_map was at boot time. Signed-off-by: Nathan Lynch <ntl@pobox.com> --- cpuhp-sched_setaffinity.orig/kernel/cpuset.c +++ cpuhp-sched_setaffinity/kernel/cpuset.c @@ -2033,6 +2033,29 @@ out: return err; } +static int cpuset_handle_cpuhp(struct notifier_block *nb, + unsigned long phase, void *_cpu) +{ + unsigned long cpu = (unsigned long)_cpu; + + mutex_lock(&manage_mutex); + mutex_lock(&callback_mutex); + + switch (phase) { + case CPU_ONLINE: + cpu_set(cpu, top_cpuset.cpus_allowed); + break; + case CPU_DEAD: + cpu_clear(cpu, top_cpuset.cpus_allowed); + break; + } + + mutex_unlock(&callback_mutex); + mutex_unlock(&manage_mutex); + + return 0; +} + /** * cpuset_init_smp - initialize cpus_allowed * @@ -2043,6 +2066,8 @@ void __init cpuset_init_smp(void) { top_cpuset.cpus_allowed = cpu_online_map; top_cpuset.mems_allowed = node_online_map; + + hotcpu_notifier(cpuset_handle_cpuhp, 0); } /** ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: cpusets not cpu hotplug aware 2006-08-23 23:49 ` Nathan Lynch @ 2006-08-24 0:12 ` Andrew Morton 2006-08-24 0:48 ` [PATCH] cpuset code prevents binding tasks to new cpus Nathan Lynch 2006-08-24 2:54 ` cpusets not cpu hotplug aware Paul Jackson 0 siblings, 2 replies; 28+ messages in thread From: Andrew Morton @ 2006-08-24 0:12 UTC (permalink / raw) To: Nathan Lynch; +Cc: Paul Jackson, anton, simon.derr, nathanl, linux-kernel On Wed, 23 Aug 2006 18:49:53 -0500 Nathan Lynch <ntl@pobox.com> wrote: > Update cpus_allowed in top_cpuset when cpus are hotplugged; > otherwise binding a task to a newly hotplugged cpu fails since the > toplevel cpuset has a static copy of whatever cpu_online_map was at > boot time. > I must say, that's a pretty simple patch. It beats dealing with udev developers ;) > > --- cpuhp-sched_setaffinity.orig/kernel/cpuset.c > +++ cpuhp-sched_setaffinity/kernel/cpuset.c > @@ -2033,6 +2033,29 @@ out: > return err; > } > > +static int cpuset_handle_cpuhp(struct notifier_block *nb, > + unsigned long phase, void *_cpu) > +{ > + unsigned long cpu = (unsigned long)_cpu; > + > + mutex_lock(&manage_mutex); > + mutex_lock(&callback_mutex); > + > + switch (phase) { > + case CPU_ONLINE: > + cpu_set(cpu, top_cpuset.cpus_allowed); > + break; > + case CPU_DEAD: > + cpu_clear(cpu, top_cpuset.cpus_allowed); > + break; > + } > + > + mutex_unlock(&callback_mutex); > + mutex_unlock(&manage_mutex); > + > + return 0; > +} > + The above needs #ifdef CONFIG_HOTPLUG_CPU wrappers. > /** > * cpuset_init_smp - initialize cpus_allowed > * > @@ -2043,6 +2066,8 @@ void __init cpuset_init_smp(void) > { > top_cpuset.cpus_allowed = cpu_online_map; > top_cpuset.mems_allowed = node_online_map; > + > + hotcpu_notifier(cpuset_handle_cpuhp, 0); > } > > /** ^ permalink raw reply [flat|nested] 28+ messages in thread
* [PATCH] cpuset code prevents binding tasks to new cpus 2006-08-24 0:12 ` Andrew Morton @ 2006-08-24 0:48 ` Nathan Lynch 2006-08-24 2:54 ` cpusets not cpu hotplug aware Paul Jackson 1 sibling, 0 replies; 28+ messages in thread From: Nathan Lynch @ 2006-08-24 0:48 UTC (permalink / raw) To: Andrew Morton; +Cc: Paul Jackson, anton, simon.derr, nathanl, linux-kernel Andrew Morton wrote: > On Wed, 23 Aug 2006 18:49:53 -0500 > Nathan Lynch <ntl@pobox.com> wrote: > > > > > +static int cpuset_handle_cpuhp(struct notifier_block *nb, > > + unsigned long phase, void *_cpu) > > +{ > > + unsigned long cpu = (unsigned long)_cpu; > > + > > + mutex_lock(&manage_mutex); > > + mutex_lock(&callback_mutex); > > + > > + switch (phase) { > > + case CPU_ONLINE: > > + cpu_set(cpu, top_cpuset.cpus_allowed); > > + break; > > + case CPU_DEAD: > > + cpu_clear(cpu, top_cpuset.cpus_allowed); > > + break; > > + } > > + > > + mutex_unlock(&callback_mutex); > > + mutex_unlock(&manage_mutex); > > + > > + return 0; > > +} > > + > > The above needs #ifdef CONFIG_HOTPLUG_CPU wrappers. Fixed, thanks. Update cpus_allowed in top_cpuset when cpus are hotplugged; otherwise binding a task to a newly hotplugged cpu fails since the toplevel cpuset has a static copy of whatever cpu_online_map was at boot time. Signed-off-by: Nathan Lynch <ntl@pobox.com> --- cpuhp-sched_setaffinity.orig/kernel/cpuset.c +++ cpuhp-sched_setaffinity/kernel/cpuset.c @@ -2033,6 +2033,31 @@ out: return err; } +#ifdef CONFIG_HOTPLUG_CPU +static int cpuset_handle_cpuhp(struct notifier_block *nb, + unsigned long phase, void *_cpu) +{ + unsigned long cpu = (unsigned long)_cpu; + + mutex_lock(&manage_mutex); + mutex_lock(&callback_mutex); + + switch (phase) { + case CPU_ONLINE: + cpu_set(cpu, top_cpuset.cpus_allowed); + break; + case CPU_DEAD: + cpu_clear(cpu, top_cpuset.cpus_allowed); + break; + } + + mutex_unlock(&callback_mutex); + mutex_unlock(&manage_mutex); + + return 0; +} +#endif + /** * cpuset_init_smp - initialize cpus_allowed * @@ -2043,6 +2068,8 @@ void __init cpuset_init_smp(void) { top_cpuset.cpus_allowed = cpu_online_map; top_cpuset.mems_allowed = node_online_map; + + hotcpu_notifier(cpuset_handle_cpuhp, 0); } /** ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: cpusets not cpu hotplug aware 2006-08-24 0:12 ` Andrew Morton 2006-08-24 0:48 ` [PATCH] cpuset code prevents binding tasks to new cpus Nathan Lynch @ 2006-08-24 2:54 ` Paul Jackson 1 sibling, 0 replies; 28+ messages in thread From: Paul Jackson @ 2006-08-24 2:54 UTC (permalink / raw) To: Andrew Morton; +Cc: ntl, anton, simon.derr, nathanl, linux-kernel Simon, Take note below of my proposal to make the 'cpus' and 'mems' of the top cpuset read-only. Holler if that hurts. Nathan wrote: > +static int cpuset_handle_cpuhp(struct notifier_block *nb, > + unsigned long phase, void *_cpu) > +{ > + unsigned long cpu = (unsigned long)_cpu; > + > + mutex_lock(&manage_mutex); > + mutex_lock(&callback_mutex); > + > + switch (phase) { > + case CPU_ONLINE: > + cpu_set(cpu, top_cpuset.cpus_allowed); > + break; > + case CPU_DEAD: > + cpu_clear(cpu, top_cpuset.cpus_allowed); > + break; > + } > + > + mutex_unlock(&callback_mutex); > + mutex_unlock(&manage_mutex); > + > + return 0; > +} > + Andrew commented: > I must say, that's a pretty simple patch. Not simple enough ;). How about (uncompiled, untested, unanything): ================================================================= /* * The top_cpuset tracks what CPUs and Memory Nodes are online, * period. This is necessary in order to make cpusets transparent * (of no affect) on systems that are actively using CPU hotplug * but making no active use of cpusets. * * This handles CPU hotplug (cpuhp) events. If someday Memory * Nodes can be hotplugged (dynamically changing node_online_map) * then we should handle that too, perhaps in a similar way. */ #ifdef CONFIG_HOTPLUG_CPU static int cpuset_handle_cpuhp(struct notifier_block *nb, unsigned long phase, void *cpu) { mutex_lock(&manage_mutex); mutex_lock(&callback_mutex); top_cpuset.cpus_allowed = cpu_online_map; mutex_unlock(&callback_mutex); mutex_unlock(&manage_mutex); return 0; } #endif ... plus the hotcpu_notifier() initializer. ================================================================= However I get to spend the few lines of code I saved here elsewhere, adding special case code so that the top_cpuset's 'cpus' and 'mems' files are read-only. The new rule is simple: The top cpuset's cpus and mems track what is online. The user is no longer in direct control of these two cpuset settings. And I should add a few lines to Documentation/cpusets.txt, describing this. When I return from my son's 18-th birthday party this evening, I will see what I can whip up. Thanks, Nathan. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <pj@sgi.com> 1.925.600.0401 ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: cpusets not cpu hotplug aware 2006-08-22 4:51 ` Andrew Morton 2006-08-22 5:04 ` Nathan Lynch @ 2006-08-22 5:06 ` Paul Jackson 2006-08-22 5:14 ` Andrew Morton 1 sibling, 1 reply; 28+ messages in thread From: Paul Jackson @ 2006-08-22 5:06 UTC (permalink / raw) To: Andrew Morton; +Cc: anton, simon.derr, nathanl, linux-kernel Andrew wrote: > If the kernel provider (ie: distro) has enabled cpusets then it would be > appropriate that they also provide a hotplug script which detects whether their > user is actually using cpusets and if not, to take some sensible default action Interesting point - whether the default action to fix up the cpuset configuration when a CPU goes on or offline should be: 1) coded in kernel/cpuset.c, or 2) coded in a hotplug script. I've got 25 cents that says Andrew votes for (2). At least so far as the cpuset aware portion of the coding goes, it would probably be easier for me to code in a hotplug script. I should learn enough about how hotplug scripts work to see if this will really work. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <pj@sgi.com> 1.925.600.0401 ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: cpusets not cpu hotplug aware 2006-08-22 5:06 ` Paul Jackson @ 2006-08-22 5:14 ` Andrew Morton 2006-08-22 5:21 ` Paul Jackson 2006-08-23 14:58 ` Anton Blanchard 0 siblings, 2 replies; 28+ messages in thread From: Andrew Morton @ 2006-08-22 5:14 UTC (permalink / raw) To: Paul Jackson; +Cc: anton, simon.derr, nathanl, linux-kernel On Mon, 21 Aug 2006 22:06:25 -0700 Paul Jackson <pj@sgi.com> wrote: > Andrew wrote: > > If the kernel provider (ie: distro) has enabled cpusets then it would be > > appropriate that they also provide a hotplug script which detects whether their > > user is actually using cpusets and if not, to take some sensible default action > > Interesting point - whether the default action to fix up the > cpuset configuration when a CPU goes on or offline should be: > 1) coded in kernel/cpuset.c, or > 2) coded in a hotplug script. > > I've got 25 cents that says Andrew votes for (2). > > At least so far as the cpuset aware portion of the coding goes, > it would probably be easier for me to code in a hotplug script. > > I should learn enough about how hotplug scripts work to see if > this will really work. > Well... let's suck it and see (please). If for some reason it proves inadequate and the default kernel behaviour is significantly wrong (it seems to be) then there's an argument for modifying (ie: adding complexity to) the kernel. But I don't think we yet know that. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: cpusets not cpu hotplug aware 2006-08-22 5:14 ` Andrew Morton @ 2006-08-22 5:21 ` Paul Jackson 2006-08-23 14:58 ` Anton Blanchard 1 sibling, 0 replies; 28+ messages in thread From: Paul Jackson @ 2006-08-22 5:21 UTC (permalink / raw) To: Andrew Morton; +Cc: anton, simon.derr, nathanl, linux-kernel Andrew wrote: > > I should learn enough about how hotplug scripts work to see if > > this will really work. > > > > Well... let's suck it and see (please). Yup - good plan - will do. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <pj@sgi.com> 1.925.600.0401 ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: cpusets not cpu hotplug aware 2006-08-22 5:14 ` Andrew Morton 2006-08-22 5:21 ` Paul Jackson @ 2006-08-23 14:58 ` Anton Blanchard 2006-08-23 18:59 ` Paul Jackson 2006-08-23 19:08 ` Paul Jackson 1 sibling, 2 replies; 28+ messages in thread From: Anton Blanchard @ 2006-08-23 14:58 UTC (permalink / raw) To: Andrew Morton; +Cc: Paul Jackson, simon.derr, nathanl, linux-kernel Hi, > Well... let's suck it and see (please). If for some reason it proves > inadequate and the default kernel behaviour is significantly wrong (it > seems to be) then there's an argument for modifying (ie: adding complexity > to) the kernel. I think there is. We have a userspace visible change to the sched affinity API when the cpusets option is enabled. Papering over it with a udev callback doesnt sound like the right solution. Im struggling to understand why we have this problem at all. If a task has not been touched by cpuset calls it should be allowed to use any cpu. I completely agree that once you have partitioned the task with cpusets then it should never spill onto more recently hotplug added cpus. Anton ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: cpusets not cpu hotplug aware 2006-08-23 14:58 ` Anton Blanchard @ 2006-08-23 18:59 ` Paul Jackson 2006-08-23 19:08 ` Paul Jackson 1 sibling, 0 replies; 28+ messages in thread From: Paul Jackson @ 2006-08-23 18:59 UTC (permalink / raw) To: Anton Blanchard; +Cc: akpm, simon.derr, nathanl, linux-kernel Anton wrote: > Im struggling to understand why we have this problem at all. If a task > has not been touched by cpuset calls it should be allowed to use any > cpu. Cpusets are not like sched_setaffinity. It's not something that is optional for a task. Rather, if your kernel config has CPUSETS enabled, then every task is in a cpuset, always, By default, every task is in the top_cpuset. Today, the top_cpuset has a static copy of the online cpus as of the the system boot. We need to make the cpus in the top_cpuset become a dynamic copy of the online cpus and nodes, not a static copy. I'm confident udev can do this. Right now I'm working through some local problems testing this, and unraveling some complexities with multiple versions of the udev user code, including at least: 1) one version uses the kernel default /sbin/hotplug to callback, 2) another version uses a udevd daemon and /etc/dev.d, and 3) a third version uses udevd, but no /etc/dev.d. The cpuset side of coding this is easy enough for me; the udev side more difficult. Any suggestions on the best place for me to go with more detailed udev questions? -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <pj@sgi.com> 1.925.600.0401 ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: cpusets not cpu hotplug aware 2006-08-23 14:58 ` Anton Blanchard 2006-08-23 18:59 ` Paul Jackson @ 2006-08-23 19:08 ` Paul Jackson 1 sibling, 0 replies; 28+ messages in thread From: Paul Jackson @ 2006-08-23 19:08 UTC (permalink / raw) To: Anton Blanchard; +Cc: akpm, simon.derr, nathanl, linux-kernel Anton wrote: > We have a userspace visible change to the sched > affinity API when the cpusets option is enabled. Yup - on cpuset kernels, sched_setaffinity is constrained to CPUs in a tasks cpuset. Also mbind and set_mempolicy are constrained to nodes in the tasks cpuset. This is a long standing, intentional part of the cpuset design. Using udev isn't papering over it. It's reflecting the reasonable preference and expectation on systems using hotplug/unplug that the default cpuset dynamically reflect the current online maps, not the static maps as they were at boot. And using udev is using just the sort of mechanism one might expect to be used to adjust the system when devices go on or off line. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <pj@sgi.com> 1.925.600.0401 ^ permalink raw reply [flat|nested] 28+ messages in thread
end of thread, other threads:[~2006-08-24 2:54 UTC | newest] Thread overview: 28+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2006-08-21 13:27 cpusets not cpu hotplug aware Anton Blanchard 2006-08-21 13:54 ` Simon Derr 2006-08-21 17:43 ` Paul Jackson 2006-08-21 19:21 ` Anton Blanchard 2006-08-21 20:42 ` [PATCH] cpuset code prevents binding tasks to new cpus Nathan Lynch 2006-08-21 23:04 ` Paul Jackson 2006-08-21 23:27 ` Nathan Lynch 2006-08-22 4:42 ` Paul Jackson 2006-08-22 5:07 ` Andrew Morton 2006-08-21 21:01 ` cpusets not cpu hotplug aware Paul Jackson 2006-08-22 4:51 ` Andrew Morton 2006-08-22 5:04 ` Nathan Lynch 2006-08-22 5:11 ` Andrew Morton 2006-08-22 5:14 ` Paul Jackson 2006-08-23 22:11 ` Nathan Lynch 2006-08-23 22:39 ` Paul Jackson 2006-08-23 23:39 ` Nathan Lynch 2006-08-24 2:19 ` Paul Jackson 2006-08-23 23:49 ` Nathan Lynch 2006-08-24 0:12 ` Andrew Morton 2006-08-24 0:48 ` [PATCH] cpuset code prevents binding tasks to new cpus Nathan Lynch 2006-08-24 2:54 ` cpusets not cpu hotplug aware Paul Jackson 2006-08-22 5:06 ` Paul Jackson 2006-08-22 5:14 ` Andrew Morton 2006-08-22 5:21 ` Paul Jackson 2006-08-23 14:58 ` Anton Blanchard 2006-08-23 18:59 ` Paul Jackson 2006-08-23 19:08 ` Paul Jackson
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.