* 2.6.16 - sys_sched_getaffinity & hotplug
@ 2006-01-27 23:06 Jack Steiner
2006-01-28 2:58 ` Nathan Lynch
2006-01-28 3:14 ` Paul Jackson
0 siblings, 2 replies; 20+ messages in thread
From: Jack Steiner @ 2006-01-27 23:06 UTC (permalink / raw)
To: mingo; +Cc: linux-kernel
It appears if CONFIG_HOTPLUG_CPU is enabled, then all possible
cpus (0 .. NR_CPUS-1) are set in the cpu_possible_map on IA64.
void __init
smp_build_cpu_map (void)
{
        ...
        for (cpu = 0; cpu < NR_CPUS; cpu++) {
                ia64_cpu_to_sapicid[cpu] = -1;
#ifdef CONFIG_HOTPLUG_CPU                               <<<<
                cpu_set(cpu, cpu_possible_map);         <<<<
#endif                                                  <<<<
        }
sched_getaffinity() returns the cpu_possible_map and'd with the current
task p->cpus_allowed. The default cpus_allowed is all ones.
This is causing problems for apps that use sched_getaffinity()
to determine which cpus they are allowed to run on.
The call to sched_getaffinity returns:
(from strace on a 2 cpu system with NR_CPUS = 512)
sched_getaffinity(0, 1024, { ffffffffffffffff, ffffff ...
The man page for sched_getaffinity() is ambiguous. It says:
- A set bit corresponds to a legally schedulable CPU
But it also says:
- Usually, all bits in the mask are set.
Should the following change be made to sched_getaffinity()?
Index: linux/kernel/sched.c
===================================================================
--- linux.orig/kernel/sched.c 2006-01-25 08:50:21.401747695 -0600
+++ linux/kernel/sched.c 2006-01-27 16:57:24.504871895 -0600
@@ -4031,7 +4031,7 @@ long sched_getaffinity(pid_t pid, cpumas
                 goto out_unlock;

         retval = 0;
-        cpus_and(*mask, p->cpus_allowed, cpu_possible_map);
+        cpus_and(*mask, p->cpus_allowed, cpu_online_map);

 out_unlock:
         read_unlock(&tasklist_lock);
--
Thanks
Jack Steiner (steiner@sgi.com)
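
The difference between the two candidate behaviours can be sketched in user space. This is a toy model only: `toy_cpumask` and the `toy_*` helpers are hypothetical names, a single 64-bit word stands in for the kernel's `cpumask_t`, and the machine is assumed to have 64 possible CPUs of which 2 are online.

```c
#include <stdint.h>

/* Toy stand-in for cpumask_t: one bit per CPU, at most 64 CPUs. */
typedef uint64_t toy_cpumask;

/* Mask with the lowest n bits set (n is clamped to 64). */
static toy_cpumask toy_first_n(unsigned n)
{
        return n >= 64 ? ~(toy_cpumask)0 : (((toy_cpumask)1 << n) - 1);
}

/* Number of set bits in the mask. */
static unsigned toy_weight(toy_cpumask m)
{
        unsigned n = 0;

        while (m) {
                m &= m - 1;     /* clear the lowest set bit */
                n++;
        }
        return n;
}

/* Models the existing code: cpus_allowed ANDed with cpu_possible_map. */
static toy_cpumask toy_getaffinity_possible(toy_cpumask allowed,
                                            toy_cpumask possible)
{
        return allowed & possible;
}

/* Models the proposed change: cpus_allowed ANDed with cpu_online_map. */
static toy_cpumask toy_getaffinity_online(toy_cpumask allowed,
                                          toy_cpumask online)
{
        return allowed & online;
}
```

With a default all-ones `cpus_allowed`, the first variant reports every possible CPU and the second reports only the two that are online.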

* Re: 2.6.16 - sys_sched_getaffinity & hotplug
2006-01-27 23:06 2.6.16 - sys_sched_getaffinity & hotplug Jack Steiner
@ 2006-01-28 2:58 ` Nathan Lynch
2006-01-29 13:06 ` Jack Steiner
2006-01-28 3:14 ` Paul Jackson
1 sibling, 1 reply; 20+ messages in thread
From: Nathan Lynch @ 2006-01-28 2:58 UTC (permalink / raw)
To: Jack Steiner; +Cc: mingo, linux-kernel

Jack Steiner wrote:
>
> It appears if CONFIG_HOTPLUG_CPU is enabled, then all possible
> cpus (0 .. NR_CPUS-1) are set in the cpu_possible_map on IA64.

That's too bad...

> sched_getaffinity() returns the cpu_possible_map and'd with the current
> task p->cpus_allowed. The default cpus_allowed is all ones.
>
> This is causing problems for apps that use sched_getaffinity()
> to determine which cpus they are allowed to run on.

How? Are these apps expecting all set bits to correspond to online
cpus?

> The call to sched_getaffinity returns:
>
> (from strace on a 2 cpu system with NR_CPUS = 512)
> sched_getaffinity(0, 1024, { ffffffffffffffff, ffffff ...
>
>
>
> The man page for sched_getaffinity() is ambiguous. It says:
> - A set bit corresponds to a legally schedulable CPU
>
> But it also says:
> - Usually, all bits in the mask are set.
>
>
> Should the following change be made to sched_getaffinity()?
>
> Index: linux/kernel/sched.c
> ===================================================================
> --- linux.orig/kernel/sched.c 2006-01-25 08:50:21.401747695 -0600
> +++ linux/kernel/sched.c 2006-01-27 16:57:24.504871895 -0600
> @@ -4031,7 +4031,7 @@ long sched_getaffinity(pid_t pid, cpumas
>                  goto out_unlock;
>
>          retval = 0;
> -        cpus_and(*mask, p->cpus_allowed, cpu_possible_map);
> +        cpus_and(*mask, p->cpus_allowed, cpu_online_map);

I don't think so.

For one, that would be mucking around with a kernel/userspace ABI, I
guess.

Additionally, it would mean that the result of sched_getaffinity would
vary with the number of online cpus in the system, which I don't think
is desirable.

* Re: 2.6.16 - sys_sched_getaffinity & hotplug
2006-01-28 2:58 ` Nathan Lynch
@ 2006-01-29 13:06 ` Jack Steiner
0 siblings, 0 replies; 20+ messages in thread
From: Jack Steiner @ 2006-01-29 13:06 UTC (permalink / raw)
To: Nathan Lynch; +Cc: mingo, linux-kernel

On Fri, Jan 27, 2006 at 08:58:55PM -0600, Nathan Lynch wrote:
> Jack Steiner wrote:
> >
> > It appears if CONFIG_HOTPLUG_CPU is enabled, then all possible
> > cpus (0 .. NR_CPUS-1) are set in the cpu_possible_map on IA64.
>
> That's too bad...

Yes it is! It breaks current applications that expect a set bit to
correspond to a valid cpu that a task can be scheduled on.

We have MPI applications that use sched_getaffinity() to determine
where to place their threads. Placing them on non-existent cpus is
problematic :-)

> > sched_getaffinity() returns the cpu_possible_map and'd with the current
> > task p->cpus_allowed. The default cpus_allowed is all ones.
> >
> > This is causing problems for apps that use sched_getaffinity()
> > to determine which cpus they are allowed to run on.
>
> How? Are these apps expecting all set bits to correspond to online
> cpus?

Yes. That is what the man page says. That is what sched_getaffinity()
returns if CONFIG_HOTPLUG_CPU is not enabled.

> > The call to sched_getaffinity returns:
> >
> > (from strace on a 2 cpu system with NR_CPUS = 512)
> > sched_getaffinity(0, 1024, { ffffffffffffffff, ffffff ...
> >
> >
> >
> > The man page for sched_getaffinity() is ambiguous. It says:
> > - A set bit corresponds to a legally schedulable CPU
> >
> > But it also says:
> > - Usually, all bits in the mask are set.
> >
> >
> > Should the following change be made to sched_getaffinity()?
> >
> > Index: linux/kernel/sched.c
> > ===================================================================
> > --- linux.orig/kernel/sched.c 2006-01-25 08:50:21.401747695 -0600
> > +++ linux/kernel/sched.c 2006-01-27 16:57:24.504871895 -0600
> > @@ -4031,7 +4031,7 @@ long sched_getaffinity(pid_t pid, cpumas
> >                  goto out_unlock;
> >
> >          retval = 0;
> > -        cpus_and(*mask, p->cpus_allowed, cpu_possible_map);
> > +        cpus_and(*mask, p->cpus_allowed, cpu_online_map);
>
>
> I don't think so.
>
> For one, that would be mucking around with a kernel/userspace ABI, I
> guess.

I would argue that CONFIG_HOTPLUG_CPU is what changed the API. The
hotplug code (at least on IA64) has changed the meaning of the bits.

In addition, it does not seem logical that an API should change on
IA64 based on whether or not the CONFIG_HOTPLUG_CPU config option is
enabled.

> Additionally, it would mean that the result of sched_getaffinity would
> vary with the number of online cpus in the system, which I don't think
> is desirable.

OTOH, if sched_getaffinity() does not reflect online cpus, then what
does it reflect? If CONFIG_HOTPLUG_CPU is enabled, sched_getaffinity()
unconditionally returns a mask with NR_CPUS bits set. This conveys no
useful information except for a kernel compile option.

--
Thanks
Jack Steiner (steiner@sgi.com) 651-683-5302
Principal Engineer SGI - Silicon Graphics, Inc.
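
The failure mode Jack describes for thread placement can be illustrated with a small sketch. It is not taken from any MPI implementation: `toy_place_workers` and `toy_count_offline` are hypothetical helpers, masks are a single 64-bit word, and the placement policy (worker i goes to the i-th set bit, wrapping around) is an assumption chosen for simplicity.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy affinity mask: one bit per CPU, at most 64 CPUs. */
typedef uint64_t toy_cpumask;

/*
 * Assign each worker to the next set bit of the mask, wrapping around
 * when the mask is exhausted. Returns the number of workers placed,
 * which is 0 if the mask is empty.
 */
static size_t toy_place_workers(toy_cpumask mask, int *cpu_of, size_t nworkers)
{
        size_t placed = 0;

        if (mask == 0)
                return 0;
        while (placed < nworkers) {
                for (int cpu = 0; cpu < 64 && placed < nworkers; cpu++) {
                        if (mask & ((toy_cpumask)1 << cpu))
                                cpu_of[placed++] = cpu;
                }
        }
        return placed;
}

/* Count workers that were assigned to a CPU which is not online. */
static size_t toy_count_offline(const int *cpu_of, size_t n, toy_cpumask online)
{
        size_t bad = 0;

        for (size_t i = 0; i < n; i++) {
                if (!(online & ((toy_cpumask)1 << cpu_of[i])))
                        bad++;
        }
        return bad;
}
```

Placing four workers from a mask of eight possible CPUs on a machine with two online CPUs puts two of them on CPUs that do not exist yet. Placing them from the online mask wraps around and keeps all four on real CPUs.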

* Re: 2.6.16 - sys_sched_getaffinity & hotplug
2006-01-27 23:06 2.6.16 - sys_sched_getaffinity & hotplug Jack Steiner
2006-01-28 2:58 ` Nathan Lynch
@ 2006-01-28 3:14 ` Paul Jackson
2006-01-28 3:42 ` Nathan Lynch
2006-01-28 13:32 ` Ingo Molnar
1 sibling, 2 replies; 20+ messages in thread
From: Paul Jackson @ 2006-01-28 3:14 UTC (permalink / raw)
To: Jack Steiner; +Cc: mingo, linux-kernel, Robert Love

Jack wrote:
> Should the following change be made to sched_getaffinity()?
>
> Index: linux/kernel/sched.c
> ===================================================================
> --- linux.orig/kernel/sched.c 2006-01-25 08:50:21.401747695 -0600
> +++ linux/kernel/sched.c 2006-01-27 16:57:24.504871895 -0600
> @@ -4031,7 +4031,7 @@ long sched_getaffinity(pid_t pid, cpumas
>                  goto out_unlock;
>
>          retval = 0;
> -        cpus_and(*mask, p->cpus_allowed, cpu_possible_map);
> +        cpus_and(*mask, p->cpus_allowed, cpu_online_map);

Adding Robert Love to the cc list, as he is Mr. sched_getaffinity,
I believe.

I ended up doing a similar change, to the cpus (and mems) masks
in the root (all encompassing) cpuset. These now show the values
of cpu_online_map and node_online_map, not *_MASK_ALL.

My hunches are:
* This change to cpu_online_map is a good one.
* The man page sentence "Usually, all bits in the mask are set."
  might have meant something when it was written, but it is not
  now clear what.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.925.600.0401

* Re: 2.6.16 - sys_sched_getaffinity & hotplug
2006-01-28 3:14 ` Paul Jackson
@ 2006-01-28 3:42 ` Nathan Lynch
2006-01-28 4:58 ` Paul Jackson
2006-01-28 13:32 ` Ingo Molnar
1 sibling, 1 reply; 20+ messages in thread
From: Nathan Lynch @ 2006-01-28 3:42 UTC (permalink / raw)
To: Paul Jackson; +Cc: Jack Steiner, mingo, linux-kernel, Robert Love

Paul Jackson wrote:
> Jack wrote:
> > Should the following change be made to sched_getaffinity()?
> >
> > Index: linux/kernel/sched.c
> > ===================================================================
> > --- linux.orig/kernel/sched.c 2006-01-25 08:50:21.401747695 -0600
> > +++ linux/kernel/sched.c 2006-01-27 16:57:24.504871895 -0600
> > @@ -4031,7 +4031,7 @@ long sched_getaffinity(pid_t pid, cpumas
> >                  goto out_unlock;
> >
> >          retval = 0;
> > -        cpus_and(*mask, p->cpus_allowed, cpu_possible_map);
> > +        cpus_and(*mask, p->cpus_allowed, cpu_online_map);
>
> Adding Robert Love to the cc list, as he is Mr. sched_getaffinity,
> I believe.
>
> I ended up doing a similar change, to the cpus (and mems) masks
> in the root (all encompassing) cpuset.

Which is problematic, because cpuset_cpus_allowed ->
guarantee_online_cpus restricts the task->cpus_allowed mask to cpus
which happen to be online at the time of the call to
sched_setaffinity. If more cpus come online later, that task can't be
migrated to them.

> These now show the values
> of cpu_online_map and node_online_map, not *_MASK_ALL.
>
> My hunches are:
> * This change to cpu_online_map is a good one.

It's not.

> * The man page sentence "Usually, all bits in the mask are set."
>   might have meant something when it was written, but it is not
>   now clear what.

I think it could reasonably be interpreted as all bits in the mask are
set unless the task's affinity has been modified.

* Re: 2.6.16 - sys_sched_getaffinity & hotplug
2006-01-28 3:42 ` Nathan Lynch
@ 2006-01-28 4:58 ` Paul Jackson
2006-01-28 5:23 ` Nathan Lynch
0 siblings, 1 reply; 20+ messages in thread
From: Paul Jackson @ 2006-01-28 4:58 UTC (permalink / raw)
To: Nathan Lynch; +Cc: steiner, mingo, linux-kernel, rml

Nathan wrote:
> Which is problematic, because cpuset_cpus_allowed ->
> guarantee_online_cpus restricts the task->cpus_allowed mask to cpus
> which happen to be online at the time of the call to
> sched_setaffinity. If more cpus come online later, that task can't be
> migrated to them.

Well, sort of.

A task could always migrate - just because a sched_getaffinity
the task did in the past doesn't show a CPU as valid, doesn't stop
the task from asking to pin to that CPU now.

One of three lessons could be taken from your example:
1) return all possible CPUS (CPU_MASK_ALL, likely), as you
   recommend
2) tell the task to not stash possibly stale returns from
   sched_getaffinity
3) virtualize app CPU numbers relative to their containing
   cpuset, using an additional layer of user code.

I don't think we (or at least I ;) have an adequate understanding
yet of how hotplug will interact with the CPU affinity and Memory
Node mempolicy system calls, both of which are easier to use if
things don't come and go. These calls are still, of course, usable,
but the possibilities for the task confusing itself with stale data
increase, and the simple system numbering of CPUs and Nodes by these
system calls makes (properly so) no effort to hide^Wvirtualize these
changes.

I tend to prefer lesson (3) above, but haven't yet delivered the
libraries or tools needed to support this as Open Source, so can't
really expect that preference to be very persuasive to others.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.925.600.0401

* Re: 2.6.16 - sys_sched_getaffinity & hotplug
2006-01-28 4:58 ` Paul Jackson
@ 2006-01-28 5:23 ` Nathan Lynch
2006-01-28 6:40 ` Paul Jackson
2006-01-28 7:04 ` Paul Jackson
0 siblings, 2 replies; 20+ messages in thread
From: Nathan Lynch @ 2006-01-28 5:23 UTC (permalink / raw)
To: Paul Jackson; +Cc: steiner, mingo, linux-kernel, rml

Paul Jackson wrote:
> Nathan wrote:
> > Which is problematic, because cpuset_cpus_allowed ->
> > guarantee_online_cpus restricts the task->cpus_allowed mask to cpus
> > which happen to be online at the time of the call to
> > sched_setaffinity. If more cpus come online later, that task can't be
> > migrated to them.
>
> Well, sort of.
>
> A task could always migrate - just because a sched_getaffinity
> the task did in the past doesn't show a CPU as valid, doesn't stop
> the task from asking to pin to that CPU now.

I was speaking of the setaffinity (not getaffinity) case -- I assumed
this was what you were referring to since I couldn't find any calls to
the cpuset code in the getaffinity path.

> One of three lessons could be taken from your example:
> 1) return all possible CPUS (CPU_MASK_ALL, likely), as you
>    recommend

I'm only recommending not changing the current behavior of
sched_getaffinity.

(BTW - cpu_possible_map can be a subset of CPU_MASK_ALL on some
platforms -- powerpc, at least, since we can discover the number of
truly possible cpus early in boot.)

* Re: 2.6.16 - sys_sched_getaffinity & hotplug
2006-01-28 5:23 ` Nathan Lynch
@ 2006-01-28 6:40 ` Paul Jackson
2006-01-28 7:04 ` Paul Jackson
1 sibling, 0 replies; 20+ messages in thread
From: Paul Jackson @ 2006-01-28 6:40 UTC (permalink / raw)
To: Nathan Lynch; +Cc: steiner, mingo, linux-kernel, rml

Nathan, responding to pj, responding to Nathan:
> > Nathan wrote:
> > > Which is problematic, because cpuset_cpus_allowed ->
> > > guarantee_online_cpus restricts the task->cpus_allowed mask to cpus
> > > which happen to be online at the time of the call to
> > > sched_setaffinity. If more cpus come online later, that task can't be
> > > migrated to them.
> >
> > Well, sort of.
> >
> > A task could always migrate - just because a sched_getaffinity
> > the task did in the past doesn't show a CPU as valid, doesn't stop
> > the task from asking to pin to that CPU now.
>
> I was speaking of the setaffinity (not getaffinity) case -- I assumed
> this was what you were referring to since I couldn't find any calls to
> the cpuset code in the getaffinity path.

Oh dear ... and you said 'setaffinity' quite clearly. Though Jack's
original post only dealt with getaffinity.

I think this discussion is getting quite confused, for which I can
take at least some of the credit.

You observe, correctly, that the call chain:

    sched_setaffinity
      cpuset_cpus_allowed
        guarantee_online_cpus

restricts a sched_setaffinity to CPUs online at the time of that
sched_setaffinity call.

However, I have no clue how you conclude from this that "If more
cpus come online later, that task can't be migrated to them."

At any time that some system service or batch scheduler wants to
migrate a task to some different CPUs (whether or not those CPUs
were once offline), it can either attach that task to a different
cpuset, or change the 'cpus' of its current cpuset. Then if it
wants to properly keep that task's placement relative to its new
cpuset, it can reissue a sched_setaffinity on that task's behalf,
to again set that task's cpus_allowed to the same CPUs as before,
relative to the containing cpuset.

Nothing in the behaviour of sched_getaffinity, that Jack was
considering, nor in the behaviour of sched_setaffinity, that you
thought I must be considering, has any impact on which CPUs a
task can be migrated to.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.925.600.0401
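
The call chain discussed above can be modelled roughly as follows. This is a simplified sketch under stated assumptions, not the kernel code: the `toy_*` names are hypothetical, a 64-bit word replaces `cpumask_t`, the cpuset hierarchy is collapsed to a single mask, and the fallback to all online CPUs stands in for walking up to parent cpusets.

```c
#include <stdint.h>

/* Toy stand-in for cpumask_t: one bit per CPU, at most 64 CPUs. */
typedef uint64_t toy_cpumask;

/*
 * Simplified guarantee_online_cpus(): the cpuset's CPUs that are online
 * right now, or every online CPU if none of the cpuset's CPUs are online.
 */
static toy_cpumask toy_guarantee_online_cpus(toy_cpumask cpuset_cpus,
                                             toy_cpumask online)
{
        toy_cpumask m = cpuset_cpus & online;

        return m ? m : online;
}

/*
 * Simplified sched_setaffinity(): clamp the requested mask to the CPUs the
 * cpuset allows at the time of the call. Returns 0 and stores the result
 * in *cpus_allowed, or -1 (leaving it untouched) if nothing usable remains.
 */
static int toy_sched_setaffinity(toy_cpumask *cpus_allowed,
                                 toy_cpumask requested,
                                 toy_cpumask cpuset_cpus,
                                 toy_cpumask online)
{
        toy_cpumask m = requested &
                        toy_guarantee_online_cpus(cpuset_cpus, online);

        if (m == 0)
                return -1;
        *cpus_allowed = m;
        return 0;
}
```

In this model both positions in the thread are visible: the stored mask is frozen at whatever was online when the call was made, and reissuing the call after more CPUs come online widens it again.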

* Re: 2.6.16 - sys_sched_getaffinity & hotplug
2006-01-28 5:23 ` Nathan Lynch
2006-01-28 6:40 ` Paul Jackson
@ 2006-01-28 7:04 ` Paul Jackson
1 sibling, 0 replies; 20+ messages in thread
From: Paul Jackson @ 2006-01-28 7:04 UTC (permalink / raw)
To: Nathan Lynch; +Cc: steiner, mingo, linux-kernel, rml

Nathan wrote:
> I'm only recommending not changing the current behavior of
> sched_getaffinity.

Jack is essentially recommending -unchanging- the behaviour of
sched_getaffinity. CONFIG_HOTPLUG_CPU changed it, as an unintended
side effect, and Jack is asking if we should revert that change.

Prior to CONFIG_HOTPLUG_CPU, on (for example) an ia64 SN2, which
is compiled with 512 or 1024 NR_CPUS, the sched_getaffinity call
returned a mask with at most as many bits set as there were CPUs
online.

For example, on an 8 CPU SN2 system (compiled NR_CPUS 512) that
is at hand to me right now, compiled without CONFIG_HOTPLUG_CPU,
the command:

    strace -etrace=sched_getaffinity taskset -p $$

produces the strace output:

    sched_getaffinity(13282, 128, { ff, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }) = 128

and produces the taskset output:

    pid 13282's current affinity mask: ff

(why the particular taskset binary I am invoking is compiled for
just 128 CPUs beats me ;).

This is the sort of behaviour that apps might have come to expect.
And it is not clear that apps would clearly distinguish between
CPUs online at the moment, and possibly online after some future
hotplug event. Given the paucity of hotpluggable CPUs, it is a
safe bet most apps doing this have not clearly distinguished these
two cases.

Now when we introduce CONFIG_HOTPLUG_CPU, as Jack reports, this
sched_getaffinity call is returning with all bits set (Jack was
apparently using a sched_getaffinity call from an app compiled
for 1024 CPUs, on a 512 NR_CPUS kernel):

    (from strace on a 2 cpu system with NR_CPUS = 512)
    sched_getaffinity(0, 1024, { ffffffffffffffff, ffffff ...

This will break code that thinks this return means that there are
actually available, right now, all those CPUs.

The addition of CONFIG_HOTPLUG_CPU has changed the apparent (what
-seems- to be happening) behaviour of sched_getaffinity. Without
it, on a small system running a big NR_CPUS kernel, just a small
number of bits were set. With it, all the bits are set.

We need to choose, with the advent of hotplug, whether the
sched_getaffinity means:
1) at most, the CPUs online now, or
2) at most, all possible online CPUs.

This choice did not exist before. I recommend choosing the way that
will be the "least surprising" to existing code. I believe that this
would be (1) the CPUs online now, as Jack's patch accomplishes.

We should not stumble blindly into changing the behaviour of a
system call in an effort to seem to avoid changing it.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.925.600.0401

* Re: 2.6.16 - sys_sched_getaffinity & hotplug
2006-01-28 3:14 ` Paul Jackson
2006-01-28 3:42 ` Nathan Lynch
@ 2006-01-28 13:32 ` Ingo Molnar
2006-01-28 16:08 ` Jack Steiner
` (2 more replies)
1 sibling, 3 replies; 20+ messages in thread
From: Ingo Molnar @ 2006-01-28 13:32 UTC (permalink / raw)
To: Paul Jackson; +Cc: Jack Steiner, linux-kernel, Robert Love, Andrew Morton

* Paul Jackson <pj@sgi.com> wrote:

> Jack wrote:
> > Should the following change be made to sched_getaffinity()?
> >
> > Index: linux/kernel/sched.c
> > ===================================================================
> > --- linux.orig/kernel/sched.c 2006-01-25 08:50:21.401747695 -0600
> > +++ linux/kernel/sched.c 2006-01-27 16:57:24.504871895 -0600
> > @@ -4031,7 +4031,7 @@ long sched_getaffinity(pid_t pid, cpumas
> >                  goto out_unlock;
> >
> >          retval = 0;
> > -        cpus_and(*mask, p->cpus_allowed, cpu_possible_map);
> > +        cpus_and(*mask, p->cpus_allowed, cpu_online_map);
>
> Adding Robert Love to the cc list, as he is Mr. sched_getaffinity, I
> believe.

i'm to blame for the syscall, Robert is to blame for the tool side
:-) In any case, Jack's change looks reasonable and obviously correct.

Acked-by: Ingo Molnar <mingo@elte.hu>

Ingo

* Re: 2.6.16 - sys_sched_getaffinity & hotplug
2006-01-28 13:32 ` Ingo Molnar
@ 2006-01-28 16:08 ` Jack Steiner
2006-01-28 19:27 ` Nathan Lynch
2006-01-28 20:09 ` 2.6.16 " Paul Jackson
2 siblings, 0 replies; 20+ messages in thread
From: Jack Steiner @ 2006-01-28 16:08 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Paul Jackson, linux-kernel, Robert Love, Andrew Morton

On Sat, Jan 28, 2006 at 02:32:44PM +0100, Ingo Molnar wrote:
>
> * Paul Jackson <pj@sgi.com> wrote:
>
> > Jack wrote:
> > > Should the following change be made to sched_getaffinity()?
> > >
> > > Index: linux/kernel/sched.c
> > > ===================================================================
> > > --- linux.orig/kernel/sched.c 2006-01-25 08:50:21.401747695 -0600
> > > +++ linux/kernel/sched.c 2006-01-27 16:57:24.504871895 -0600
> > > @@ -4031,7 +4031,7 @@ long sched_getaffinity(pid_t pid, cpumas
> > >                  goto out_unlock;
> > >
> > >          retval = 0;
> > > -        cpus_and(*mask, p->cpus_allowed, cpu_possible_map);
> > > +        cpus_and(*mask, p->cpus_allowed, cpu_online_map);
> >
> > Adding Robert Love to the cc list, as he is Mr. sched_getaffinity, I
> > believe.
>
> i'm to blame for the syscall, Robert is to blame for the tool side
> :-) In any case, Jack's change looks reasonable and obviously correct.
>
> Acked-by: Ingo Molnar <mingo@elte.hu>
>
> Ingo

Ok, thanks. I'll repost as a patch later today....

--
Thanks
Jack Steiner (steiner@sgi.com) 651-683-5302

* Re: 2.6.16 - sys_sched_getaffinity & hotplug
2006-01-28 13:32 ` Ingo Molnar
2006-01-28 16:08 ` Jack Steiner
@ 2006-01-28 19:27 ` Nathan Lynch
2006-01-28 20:06 ` Paul Jackson
2006-01-28 20:09 ` 2.6.16 " Paul Jackson
2 siblings, 1 reply; 20+ messages in thread
From: Nathan Lynch @ 2006-01-28 19:27 UTC (permalink / raw)
To: Ingo Molnar
Cc: Paul Jackson, Jack Steiner, linux-kernel, Robert Love, Andrew Morton

Ingo Molnar wrote:
>
> > Jack wrote:
> > > Should the following change be made to sched_getaffinity()?
> > >
> > > Index: linux/kernel/sched.c
> > > ===================================================================
> > > --- linux.orig/kernel/sched.c 2006-01-25 08:50:21.401747695 -0600
> > > +++ linux/kernel/sched.c 2006-01-27 16:57:24.504871895 -0600
> > > @@ -4031,7 +4031,7 @@ long sched_getaffinity(pid_t pid, cpumas
> > >                  goto out_unlock;
> > >
> > >          retval = 0;
> > > -        cpus_and(*mask, p->cpus_allowed, cpu_possible_map);
> > > +        cpus_and(*mask, p->cpus_allowed, cpu_online_map);
>
> In any case, Jack's change looks reasonable and obviously correct.

Are you sure?

Assuming this change is in effect, consider the following:

Task starts with default affinity.

Task does sched_getaffinity, stashes the result in saved_mask.

Task pins itself to one cpu and does some work.

Meanwhile, more cpus are brought online.

Task finishes work and does sched_setaffinity(saved_mask).

Task will never run on the new cpus.
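
Nathan's scenario can be walked through with a toy model. Everything here is illustrative: `struct toy_task` and the `toy_*` functions are hypothetical, masks are one 64-bit word, and `toy_setaffinity` stores the mask verbatim, ignoring the cpuset clamping the kernel would apply.

```c
#include <stdint.h>

/* Toy stand-in for cpumask_t: one bit per CPU, at most 64 CPUs. */
typedef uint64_t toy_cpumask;

/* Minimal task: only the affinity mask matters for this scenario. */
struct toy_task {
        toy_cpumask cpus_allowed;
};

/* Report cpus_allowed ANDed with whichever map the kernel chooses. */
static toy_cpumask toy_getaffinity(const struct toy_task *t,
                                   toy_cpumask report_map)
{
        return t->cpus_allowed & report_map;
}

/* Store the requested mask as-is (no cpuset clamping in this model). */
static void toy_setaffinity(struct toy_task *t, toy_cpumask mask)
{
        t->cpus_allowed = mask;
}

/* CPUs the task can actually run on: allowed and currently online. */
static toy_cpumask toy_runnable(const struct toy_task *t, toy_cpumask online)
{
        return t->cpus_allowed & online;
}

/*
 * Save the reported mask, pin to CPU 0, then restore the saved mask after
 * the online set has grown to online_after. Returns where the task can run.
 */
static toy_cpumask toy_save_pin_restore(toy_cpumask report_map,
                                        toy_cpumask online_after)
{
        struct toy_task t = { ~(toy_cpumask)0 };        /* default affinity */
        toy_cpumask saved = toy_getaffinity(&t, report_map);

        toy_setaffinity(&t, (toy_cpumask)1);    /* pin to CPU 0, do work */
        toy_setaffinity(&t, saved);             /* restore the stale mask */
        return toy_runnable(&t, online_after);
}
```

If the mask was saved while only CPUs 0 and 1 were online and the kernel reported the online map, the task stays confined to those two after CPUs 2 and 3 appear. If the kernel reported the possible map, the restored mask covers the new CPUs as well.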

* Re: 2.6.16 - sys_sched_getaffinity & hotplug
2006-01-28 19:27 ` Nathan Lynch
@ 2006-01-28 20:06 ` Paul Jackson
2006-01-29 13:51 ` [PATCH] " Jack Steiner
0 siblings, 1 reply; 20+ messages in thread
From: Paul Jackson @ 2006-01-28 20:06 UTC (permalink / raw)
To: Nathan Lynch; +Cc: mingo, steiner, linux-kernel, rml, akpm

Nathan wrote:
> Task finishes work and does sched_setaffinity(saved_mask).

Stupid task.

If task wants to run on -all- cpus on a hotplug system, task
should not pass a saved mask, but rather construct a mask
with all bits set and pass that:

    cpu_set_t mask;
    unsigned int i;

    /* set all bits in mask - code totally untested */
    for (i = 0; i < sizeof(cpu_set_t) / sizeof (__cpu_mask); i++)
        mask.__bits[i] = ~0;
    /* pid 0 means the calling task */
    sched_setaffinity(0, sizeof(mask), &mask);

Similar problems exist for a task running in a cpuset under
migration. Saved masks are useless in all but static systems,
having no migration, no hotplug.

That, or use a library on top of this that lets the task work with
relative (to whatever is available) CPU and (for the mbind/mempolicy
calls) Memory Node numbers and that handles the above details.

If all goes well, I should be releasing such a library in the not
distant future.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.925.600.0401

* [PATCH] - sys_sched_getaffinity & hotplug
2006-01-28 20:06 ` Paul Jackson
@ 2006-01-29 13:51 ` Jack Steiner
0 siblings, 0 replies; 20+ messages in thread
From: Jack Steiner @ 2006-01-29 13:51 UTC (permalink / raw)
To: akpm; +Cc: mingo, linux-kernel

Change sched_getaffinity() so that it returns a bitmap that indicates
the legally schedulable cpus that a task is allowed to run on.

Without this patch, if CONFIG_HOTPLUG_CPU is enabled, sched_getaffinity()
unconditionally returns (at least on IA64) a mask with NR_CPUS bits set.
This conveys no useful information except for a kernel compile option.

Signed-off-by: Jack Steiner <steiner@sgi.com>
Acked-by: Ingo Molnar <mingo@elte.hu>

---

This fixes a breakage we observed running recent kernels. We have
MPI jobs that use sched_getaffinity() to determine where to place their
threads. Placing them on non-existent cpus is problematic :-)

Index: linux/kernel/sched.c
===================================================================
--- linux.orig/kernel/sched.c 2006-01-28 10:13:01.834293691 -0600
+++ linux/kernel/sched.c 2006-01-29 07:15:11.217227453 -0600
@@ -4031,7 +4031,7 @@ long sched_getaffinity(pid_t pid, cpumas
                 goto out_unlock;

         retval = 0;
-        cpus_and(*mask, p->cpus_allowed, cpu_possible_map);
+        cpus_and(*mask, p->cpus_allowed, cpu_online_map);

 out_unlock:
         read_unlock(&tasklist_lock);

* Re: 2.6.16 - sys_sched_getaffinity & hotplug
2006-01-28 13:32 ` Ingo Molnar
2006-01-28 16:08 ` Jack Steiner
2006-01-28 19:27 ` Nathan Lynch
@ 2006-01-28 20:09 ` Paul Jackson
2006-01-28 20:50 ` Robert Love
2 siblings, 1 reply; 20+ messages in thread
From: Paul Jackson @ 2006-01-28 20:09 UTC (permalink / raw)
To: Ingo Molnar; +Cc: steiner, linux-kernel, rml, akpm

Ingo wrote:
> i'm to blame for the syscall, Robert is to blame for the tool side

And here I've been blaming Robert for that syscall all these years.

My humble apologies, Robert ;).

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.925.600.0401

* Re: 2.6.16 - sys_sched_getaffinity & hotplug
2006-01-28 20:09 ` 2.6.16 " Paul Jackson
@ 2006-01-28 20:50 ` Robert Love
2006-01-28 21:00 ` Paul Jackson
2006-01-29 13:01 ` Ingo Molnar
0 siblings, 2 replies; 20+ messages in thread
From: Robert Love @ 2006-01-28 20:50 UTC (permalink / raw)
To: Paul Jackson; +Cc: Ingo Molnar, rml, akpm, steiner, linux-kernel

On Sat, 2006-01-28 at 12:09 -0800, Paul Jackson wrote:

> And here I've been blaming Robert for that syscall all these years.
>
> My humble apologies, Robert ;).

Well, I actually did do the 2.5 version of the patch and sent it to
Linus, so I do find myself at confession on a monthly basis, begging
for some forgiveness.

Robert Love

* Re: 2.6.16 - sys_sched_getaffinity & hotplug
2006-01-28 20:50 ` Robert Love
@ 2006-01-28 21:00 ` Paul Jackson
2006-01-29 13:01 ` Ingo Molnar
1 sibling, 0 replies; 20+ messages in thread
From: Paul Jackson @ 2006-01-28 21:00 UTC (permalink / raw)
To: Robert Love; +Cc: mingo, rml, akpm, steiner, linux-kernel

Robert wrote:
> so I do find myself at confession ...

yeah ... don't we all ...

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.925.600.0401

* Re: 2.6.16 - sys_sched_getaffinity & hotplug
2006-01-28 20:50 ` Robert Love
2006-01-28 21:00 ` Paul Jackson
@ 2006-01-29 13:01 ` Ingo Molnar
2006-01-29 16:09 ` Robert Love
1 sibling, 1 reply; 20+ messages in thread
From: Ingo Molnar @ 2006-01-29 13:01 UTC (permalink / raw)
To: Robert Love; +Cc: Paul Jackson, rml, akpm, steiner, linux-kernel

* Robert Love <rml@novell.com> wrote:

> On Sat, 2006-01-28 at 12:09 -0800, Paul Jackson wrote:
>
> > And here I've been blaming Robert for that syscall all these years.
> >
> > My humble apologies, Robert ;).
>
> Well, I actually did do the 2.5 version of the patch and sent it to
> Linus, so I do find myself at confession on a monthly basis, begging
> for some forgiveness.

ah, indeed, so *you* are the one to be blamed for passing on a mortally
flawed hack, making you guilty of contributory enkludgement of the 2.6
kernel ;)

Ingo

* Re: 2.6.16 - sys_sched_getaffinity & hotplug
2006-01-29 13:01 ` Ingo Molnar
@ 2006-01-29 16:09 ` Robert Love
2006-01-29 17:26 ` Paul Jackson
0 siblings, 1 reply; 20+ messages in thread
From: Robert Love @ 2006-01-29 16:09 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Paul Jackson, akpm, steiner, linux-kernel

On Sun, 2006-01-29 at 14:01 +0100, Ingo Molnar wrote:

> ah, indeed, so *you* are the one to be blamed for passing on a mortally
> flawed hack, making you guilty of contributory enkludgement of the 2.6
> kernel ;)

To be fair, I should point out that my original patch made the second
argument a pointer, so that the kernel could return the actual length of
the mask if it were too small. This would move the interface from
"flawed hack" to "not too bad". ;-)

Anyhow, Linus said that the interface was stupid and the second
parameter should not be a pointer. So, if we are gonna blame
someone... :)

Robert Love

* Re: 2.6.16 - sys_sched_getaffinity & hotplug
2006-01-29 16:09 ` Robert Love
@ 2006-01-29 17:26 ` Paul Jackson
0 siblings, 0 replies; 20+ messages in thread
From: Paul Jackson @ 2006-01-29 17:26 UTC (permalink / raw)
To: Robert Love; +Cc: mingo, akpm, steiner, linux-kernel

Shush Ingo and Robert ... the glibc folks have graciously been
covering for us all these years. If you just keep quiet, no one
will notice your minor contributions to this botch.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.925.600.0401

end of thread, other threads: [~2006-01-29 17:26 UTC]

Thread overview: 20+ messages
2006-01-27 23:06 2.6.16 - sys_sched_getaffinity & hotplug Jack Steiner
2006-01-28 2:58 ` Nathan Lynch
2006-01-29 13:06   ` Jack Steiner
2006-01-28 3:14 ` Paul Jackson
2006-01-28 3:42   ` Nathan Lynch
2006-01-28 4:58     ` Paul Jackson
2006-01-28 5:23       ` Nathan Lynch
2006-01-28 6:40         ` Paul Jackson
2006-01-28 7:04         ` Paul Jackson
2006-01-28 13:32   ` Ingo Molnar
2006-01-28 16:08     ` Jack Steiner
2006-01-28 19:27     ` Nathan Lynch
2006-01-28 20:06       ` Paul Jackson
2006-01-29 13:51         ` [PATCH] " Jack Steiner
2006-01-28 20:09     ` 2.6.16 " Paul Jackson
2006-01-28 20:50       ` Robert Love
2006-01-28 21:00         ` Paul Jackson
2006-01-29 13:01         ` Ingo Molnar
2006-01-29 16:09           ` Robert Love
2006-01-29 17:26             ` Paul Jackson