* CPU Hotplug: Hotplug Script And SIGPWR
From: Rusty Russell @ 2004-01-20 5:44 UTC
To: vatsa; +Cc: lhcs-devel, linux-kernel, torvalds, akpm, rml

In message <20040116174446.A2820@in.ibm.com> you write:
> Would it make sense if we defer invoking hotplug script _after_
> the CPU is completely dead (i.e after issuing the CPU_DEAD
> notification)?

The original code wanted to block until the hotplug script
acknowledged the removal before completing it. Greg KH says hotplug
doesn't work this way, so now it could well be delivered after
everything is over. If it's simpler, we can just do it after.

The other issue I wanted to revisit: we currently send SIGPWR to all
processes which we have to undo the CPU affinity for (with a new
si_info field containing the cpu going down).

The main problem is that a process can call sched_setaffinity on
another (unrelated) task, which might not know about it. One option
would be to only deliver the signal if it's not SIG_DFL for that
process. Another would be not to signal, and expect hotplug scripts
to clean up.

Thoughts?
Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.
* Re: CPU Hotplug: Hotplug Script And SIGPWR
From: Tim Hockin @ 2004-01-20 6:33 UTC
To: Rusty Russell; +Cc: vatsa, lhcs-devel, linux-kernel, torvalds, akpm, rml

On Tue, Jan 20, 2004 at 04:44:45PM +1100, Rusty Russell wrote:
> The other issue I wanted to revisit: we currently send SIGPWR to all
> processes which we have to undo the CPU affinity for (with a new
> si_info field containing the cpu going down).
>
> The main problem is that a process can call sched_setaffinity on
> another (unrelated) task, which might not know about it. One option
> would be to only deliver the signal if it's not SIG_DFL for that
> process. Another would be not to signal, and expect hotplug scripts
> to clean up.

I had to deal with this in my procstate patch (was against RH 2.4 with O(1)
sched but not 2.6). What I chose to do (and what the people who were
wanting the code wanted) was to move tasks which had no CPU to run upon onto
an unrunnable list. Whenever a CPU's state is changed, scan the list.
Whenever a task's affinity mask is changed, check if it needs to go onto or
come off of the unrunnable_list.

I added a new TASK_UNRUNNABLE state for these tasks, too. By adding the
task's current (or most recent) CPU and the task's cpus_allowed and
cpus_allowed_mask to /proc/pid/status, we gave simple tools for finding
these unrunnable tasks.

I think the sanest thing for a CPU removal is to migrate everything off the
processor in question, move unrunnable tasks into TASK_UNRUNNABLE state,
then notify /sbin/hotplug. The hotplug script can then find and handle the
unrunnable tasks. No SIGPWR grossness needed.

Code against 2.4 at http://www.hockin.org/~thockin/procstate - it was
heavily tested and I *think* it is all correct (for that kernel snapshot).

Tim
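The bookkeeping described in this message can be sketched roughly as below. This is a toy model, not the procstate patch: it assumes at most 32 CPUs held in a plain bitmask, and struct toy_task, has_online_cpu and rescan_tasks are invented names. The only idea it carries over is that a task is unrunnable exactly when its allowed mask shares no bit with the online mask, and that the check is redone after any CPU state or affinity change.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for TASK_RUNNING vs. the proposed TASK_UNRUNNABLE. */
enum toy_state { TOY_RUNNABLE, TOY_UNRUNNABLE };

struct toy_task {
    int pid;
    uint32_t cpus_allowed;   /* one bit per CPU (assumption: <= 32 CPUs) */
    enum toy_state state;
};

/* A task can run only if at least one allowed CPU is online. */
static bool has_online_cpu(const struct toy_task *t, uint32_t online)
{
    return (t->cpus_allowed & online) != 0;
}

/* Re-evaluate every task after a CPU state or affinity change.
 * Returns how many tasks currently have nowhere to run. */
static int rescan_tasks(struct toy_task *tasks, int n, uint32_t online)
{
    int unrunnable = 0;

    for (int i = 0; i < n; i++) {
        if (has_online_cpu(&tasks[i], online)) {
            tasks[i].state = TOY_RUNNABLE;
        } else {
            tasks[i].state = TOY_UNRUNNABLE;
            unrunnable++;
        }
    }
    return unrunnable;
}
```

Because the affinity mask is never rewritten, a task parked this way becomes runnable again as soon as its CPU returns, which is the property the message argues for.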
* Re: CPU Hotplug: Hotplug Script And SIGPWR
From: Nick Piggin @ 2004-01-20 6:43 UTC
To: Tim Hockin
Cc: Rusty Russell, vatsa, lhcs-devel, linux-kernel, torvalds, akpm, rml

Tim Hockin wrote:

>On Tue, Jan 20, 2004 at 04:44:45PM +1100, Rusty Russell wrote:
>
>>The other issue I wanted to revisit: we currently send SIGPWR to all
>>processes which we have to undo the CPU affinity for (with a new
>>si_info field containing the cpu going down).
>>
>>The main problem is that a process can call sched_setaffinity on
>>another (unrelated) task, which might not know about it. One option
>>would be to only deliver the signal if it's not SIG_DFL for that
>>process. Another would be not to signal, and expect hotplug scripts
>>to clean up.
>>
>
>I had to deal with this in my procstate patch (was against RH 2.4 with O(1)
>sched but not 2.6). What I chose to do (and what the people who were
>wanting the code wanted) was to move tasks which had no CPU to run upon onto
>an unrunnable list. Whenever a CPU's state is changed, scan the list.
>Whenever a task's affinity mask is changed, check if it needs to go onto or
>come off of the unrunnable_list.
>
>I added a new TASK_UNRUNNABLE state for these tasks, too. By adding the
>task's current (or most recent) CPU and the task's cpus_allowed and
>cpus_allowed_mask to /proc/pid/status, we gave simple tools for finding
>these unrunnable tasks.
>
>I think the sanest thing for a CPU removal is to migrate everything off the
>processor in question, move unrunnable tasks into TASK_UNRUNNABLE state,
>then notify /sbin/hotplug. The hotplug script can then find and handle the
>unrunnable tasks. No SIGPWR grossness needed.
>
>Code against 2.4 at http://www.hockin.org/~thockin/procstate - it was
>heavily tested and I *think* it is all correct (for that kernel snapshot).
>

Seems less robust and more ad hoc than SIGPWR, however.
* Re: CPU Hotplug: Hotplug Script And SIGPWR
From: Tim Hockin @ 2004-01-20 6:52 UTC
To: Nick Piggin
Cc: Rusty Russell, vatsa, lhcs-devel, linux-kernel, torvalds, akpm, rml

On Tue, Jan 20, 2004 at 05:43:59PM +1100, Nick Piggin wrote:
> >I think the sanest thing for a CPU removal is to migrate everything off the
> >processor in question, move unrunnable tasks into TASK_UNRUNNABLE state,
> >then notify /sbin/hotplug. The hotplug script can then find and handle the
> >unrunnable tasks. No SIGPWR grossness needed.
> >
> >Code against 2.4 at http://www.hockin.org/~thockin/procstate - it was
> >heavily tested and I *think* it is all correct (for that kernel snapshot).
>
> Seems less robust and more ad hoc than SIGPWR, however.

Disagree. SIGPWR will kill any process that doesn't catch it. That's
policy. It seems more robust to let the hotplug script decide what to do.
If it wants to kill each unrunnable task with SIGPWR, it can. But if it
wants to let them live, it can.

Tim
* Re: CPU Hotplug: Hotplug Script And SIGPWR
From: Nick Piggin @ 2004-01-20 7:11 UTC
To: Tim Hockin; +Cc: Rusty Russell, vatsa, linux-kernel, torvalds, akpm, rml

Tim Hockin wrote:

>On Tue, Jan 20, 2004 at 05:43:59PM +1100, Nick Piggin wrote:
>
>>>I think the sanest thing for a CPU removal is to migrate everything off the
>>>processor in question, move unrunnable tasks into TASK_UNRUNNABLE state,
>>>then notify /sbin/hotplug. The hotplug script can then find and handle the
>>>unrunnable tasks. No SIGPWR grossness needed.
>>>
>>>Code against 2.4 at http://www.hockin.org/~thockin/procstate - it was
>>>heavily tested and I *think* it is all correct (for that kernel snapshot).
>>>
>>Seems less robust and more ad hoc than SIGPWR, however.
>>
>
>Disagree. SIGPWR will kill any process that doesn't catch it. That's
>policy. It seems more robust to let the hotplug script decide what to do.
>If it wants to kill each unrunnable task with SIGPWR, it can. But if it
>wants to let them live, it can.
>

I thought hotplug is allowed to fail? Thus you can have a hung system.
Or what if the hotplug script itself becomes TASK_UNRUNNABLE? What if the
process needs a guaranteed scheduling latency?

(I dropped lhcs-devel@lists.sourceforge.net because it's moderated)
* Re: CPU Hotplug: Hotplug Script And SIGPWR
From: Tim Hockin @ 2004-01-20 7:30 UTC
To: Nick Piggin; +Cc: Rusty Russell, vatsa, linux-kernel, torvalds, akpm, rml

On Tue, Jan 20, 2004 at 06:11:49PM +1100, Nick Piggin wrote:
> I thought hotplug is allowed to fail? Thus you can have a hung system.
> Or what if the hotplug script itself becomes TASK_UNRUNNABLE? What if the
> process needs a guaranteed scheduling latency?

I guess a hotplug script MAY fail. I don't think it's a good idea to make
your CPU hotplug script fail. May and Might are different. It's up to the
implementor whether the script can get into a failure condition.

The hotplug script can only become unrunnable if you yank out all the CPUs
on the system. I'd assume it would have an affinity of 0xffffffff.

What if <which> process needs guaranteed scheduling latency? Do we really
_guarantee_ scheduling latency *anywhere*?
* Re: CPU Hotplug: Hotplug Script And SIGPWR
From: Nick Piggin @ 2004-01-20 7:45 UTC
To: Tim Hockin; +Cc: Rusty Russell, vatsa, linux-kernel, torvalds, akpm, rml

Tim Hockin wrote:

>On Tue, Jan 20, 2004 at 06:11:49PM +1100, Nick Piggin wrote:
>
>>I thought hotplug is allowed to fail? Thus you can have a hung system.
>>Or what if the hotplug script itself becomes TASK_UNRUNNABLE? What if the
>>process needs a guaranteed scheduling latency?
>>
>
>I guess a hotplug script MAY fail. I don't think it's a good idea to make
>your CPU hotplug script fail. May and Might are different. It's up to the
>implementor whether the script can get into a failure condition.
>

Sorry bad wording. The script may fail to be executed.

>
>The hotplug script can only become unrunnable if you yank out all the CPUs
>on the system. I'd assume it would have an affinity of 0xffffffff.
>

OK I guess that's not such a valid concern

>
>What if <which> process needs guaranteed scheduling latency? Do we really
>_guarantee_ scheduling latency *anywhere*?
>
>

We do guarantee that a realtime task won't be blocked waiting for
a hotplug script to fault in and start it up again (which may not
happen). Not sure how important this issue is.
* Re: CPU Hotplug: Hotplug Script And SIGPWR
From: Tim Hockin @ 2004-01-20 7:54 UTC
To: Nick Piggin; +Cc: Rusty Russell, vatsa, linux-kernel, torvalds, akpm, rml

On Tue, Jan 20, 2004 at 06:45:37PM +1100, Nick Piggin wrote:
> >I guess a hotplug script MAY fail. I don't think it's a good idea to make
> >your CPU hotplug script fail. May and Might are different. It's up to the
> >implementor whether the script can get into a failure condition.
> >
>
> Sorry bad wording. The script may fail to be executed.

Under what conditions? Not arbitrary entropy, surely. If a hotplug script
is present and does not blow up, it should be safe to assume it will be run
upon an event being delivered. If not, we have a WAY bigger problem :)

> >What if <which> process needs guaranteed scheduling latency? Do we really
> >_guarantee_ scheduling latency *anywhere*?
>
> We do guarantee that a realtime task won't be blocked waiting for
> a hotplug script to fault in and start it up again (which may not
> happen). Not sure how important this issue is.

We have a conflict of priority here. If an RT task is affined to CPU A and
CPU A gets yanked out, what do we do?

Obviously the RT task can't keep running as it was. It was affined to A.
Maybe for a good reason. I see we have a few choices here:

* re-affine it automatically, thereby silently undoing the explicit
  affinity.
* violate its RT scheduling by not running it until it has been re-affined
  or CPU A returns to the pool.

Sending it a SIGPWR means you have to run it on a different CPU than it was
affined to, which is already a violation.

Basically, RT tasks + CPU affinity + hotplug CPUs do not play nicely
together. I don't see much that can be done to solve that.

With the procstate stuff I did, and with planned CPU unplugs we *do* have
time before the CPU really goes offline in which to act. With unplanned CPU
offlining, we don't.
* Re: CPU Hotplug: Hotplug Script And SIGPWR
From: Nick Piggin @ 2004-01-20 8:14 UTC
To: Tim Hockin; +Cc: Rusty Russell, vatsa, linux-kernel, torvalds, akpm, rml

Tim Hockin wrote:

>On Tue, Jan 20, 2004 at 06:45:37PM +1100, Nick Piggin wrote:
>
>>>I guess a hotplug script MAY fail. I don't think it's a good idea to make
>>>your CPU hotplug script fail. May and Might are different. It's up to the
>>>implementor whether the script can get into a failure condition.
>>>
>>>
>>Sorry bad wording. The script may fail to be executed.
>>
>
>Under what conditions? Not arbitrary entropy, surely. If a hotplug script
>is present and does not blow up, it should be safe to assume it will be run
>upon an event being delivered. If not, we have a WAY bigger problem :)
>

That assumption is not safe. The main problems are of course process limits
and memory allocation failure.

>
>>>What if <which> process needs guaranteed scheduling latency? Do we really
>>>_guarantee_ scheduling latency *anywhere*?
>>>
>>We do guarantee that a realtime task won't be blocked waiting for
>>a hotplug script to fault in and start it up again (which may not
>>happen). Not sure how important this issue is.
>>
>
>We have a conflict of priority here. If an RT task is affined to CPU A and
>CPU A gets yanked out, what do we do?
>
>Obviously the RT task can't keep running as it was. It was affined to A.
>Maybe for a good reason. I see we have a few choices here:
>
>* re-affine it automatically, thereby silently undoing the explicit
>  affinity.
>* violate its RT scheduling by not running it until it has been re-affined
>  or CPU A returns to the pool.
>
>Sending it a SIGPWR means you have to run it on a different CPU than it was
>affined to, which is already a violation.
>

At least the task has the option to handle the problem.
* Re: CPU Hotplug: Hotplug Script And SIGPWR
From: Tim Hockin @ 2004-01-20 8:29 UTC
To: Nick Piggin; +Cc: Rusty Russell, vatsa, linux-kernel, torvalds, akpm, rml

On Tue, Jan 20, 2004 at 07:14:12PM +1100, Nick Piggin wrote:
> >Under what conditions? Not arbitrary entropy, surely. If a hotplug script
> >is present and does not blow up, it should be safe to assume it will be run
> >upon an event being delivered. If not, we have a WAY bigger problem :)
> >
>
> That assumption is not safe. The main problems are of course process limits
> and memory allocation failure.

If root has a process limit that makes hotplug scripts fail to run, then
we're hosed in a lot of ways. And if we fail to allocate memory, there
really ought to be some retry or something. It seems to me that a failure
to run a hotplug script is a BAD THING.

> >Sending it a SIGPWR means you have to run it on a different CPU than it was
> >affined to, which is already a violation.
>
> At least the task has the option to handle the problem.

But it is a violation of the affinity. As the kernel we CAN NOT know what
the affinity really means. Maybe there is some way for a task to indicate
it would like to receive SIGPWR in that case. Or some other signal. Can we
invent new signals?

That way a task that KNOWS about the CPU disappearing underneath it can be
wise, while everything else will not just get killed.
* Re: CPU Hotplug: Hotplug Script And SIGPWR
From: Nick Piggin @ 2004-01-20 8:37 UTC
To: Tim Hockin; +Cc: Rusty Russell, vatsa, linux-kernel, torvalds, akpm, rml

Tim Hockin wrote:

>On Tue, Jan 20, 2004 at 07:14:12PM +1100, Nick Piggin wrote:
>
>>>Under what conditions? Not arbitrary entropy, surely. If a hotplug script
>>>is present and does not blow up, it should be safe to assume it will be run
>>>upon an event being delivered. If not, we have a WAY bigger problem :)
>>>
>>>
>>That assumption is not safe. The main problems are of course process limits
>>and memory allocation failure.
>>
>
>If root has a process limit that makes hotplug scripts fail to run, then
>we're hosed in a lot of ways. And if we fail to allocate memory, there
>really ought to be some retry or something. It seems to me that a failure
>to run a hotplug script is a BAD THING.
>

(or OOM killed being another that comes to mind)

It is sometimes inevitable. With that knowledge we should be designing
for graceful failure.

>
>>>Sending it a SIGPWR means you have to run it on a different CPU than it was
>>>affined to, which is already a violation.
>>>
>>At least the task has the option to handle the problem.
>>
>
>But it is a violation of the affinity. As the kernel we CAN NOT know what
>the affinity really means.
>

Not if the application is designed to handle it. How would hotplug
scripts make this any different, anyway?

> Maybe there is some way for a task to indicate
>it would like to receive SIGPWR in that case. Or some other signal. Can we
>invent new signals?
>
>That way a task that KNOWS about the CPU disappearing underneath it can be
>wise, while everything else will not just get killed.
>

Rusty thought you just wouldn't send it unless the process was handling
it.
* Re: CPU Hotplug: Hotplug Script And SIGPWR
From: Tim Hockin @ 2004-01-20 8:43 UTC
To: Nick Piggin; +Cc: Rusty Russell, vatsa, linux-kernel, torvalds, akpm, rml

On Tue, Jan 20, 2004 at 07:37:48PM +1100, Nick Piggin wrote:
> (or OOM killed being another that comes to mind)
>
> It is sometimes inevitable. With that knowledge we should be designing
> for graceful failure.

Don't get me started on OOM killer. If the OOM killer is killing hotplug
scripts, there's another problem. What's the chance of hotplug scripts
being the memory hog? :)

That said, I understand what you're saying. It's rough.

> >But it is a violation of the affinity. As the kernel we CAN NOT know what
> >the affinity really means.
>
> Not if the application is designed to handle it. How would hotplug
> scripts make this any different, anyway?

IFF the app is designed to handle it. The existence of a SIGPWR handler
does not necessarily imply that, though. A SIGCPU or something might
correlate 1:1 with this, but SIGPWR doesn't.

Solving it from hotplug scripts means the task's affinity is not
automatically violated. It means the decision to violate the affinity was
made in user-space, probably by the admin, who CAN know what the affinity
means.

> Rusty thought you just wouldn't send it unless the process was handling
> it.

I remembered that after I sent it, sorry. :)
* Re: CPU Hotplug: Hotplug Script And SIGPWR
From: Srivatsa Vaddagiri @ 2004-01-21 4:06 UTC
To: Tim Hockin; +Cc: Nick Piggin, Rusty Russell, linux-kernel, torvalds, akpm, rml

On Tue, Jan 20, 2004 at 12:43:52AM -0800, Tim Hockin wrote:
> IFF the app is designed to handle it. The existence of a SIGPWR handler
> does not necessarily imply that, though. A SIGCPU or something might
> correlate 1:1 with this, but SIGPWR doesn't.

I agree we should have a separate signal for CPU Hotplug. By default the signal
will be ignored, unless a task registers a signal handler for that special
signal.

That way, tasks which "knowingly" change their CPU affinity will be able to
tackle a CPU going down by handling the signal (probably change their CPU
affinity again), while tasks which have their CPU affinity changed "unknowingly"
(by other tasks) will just ignore the signal. The hotplug script interface
allows the admin to go and change the CPU affinity again for the second class
of tasks, if needed.

The only problem with a new signal is conformance to standards (if any).

--
Thanks and Regards,
Srivatsa Vaddagiri,
Linux Technology Center,
IBM Software Labs,
Bangalore, INDIA - 560017
* Re: CPU Hotplug: Hotplug Script And SIGPWR
From: Nick Piggin @ 2004-01-21 4:14 UTC
To: vatsa; +Cc: Tim Hockin, Rusty Russell, linux-kernel, torvalds, akpm, rml

Srivatsa Vaddagiri wrote:

>On Tue, Jan 20, 2004 at 12:43:52AM -0800, Tim Hockin wrote:
>
>>IFF the app is designed to handle it. The existence of a SIGPWR handler
>>does not necessarily imply that, though. A SIGCPU or something might
>>correlate 1:1 with this, but SIGPWR doesn't.
>>
>
>I agree we should have a separate signal for CPU Hotplug. By default the signal
>will be ignored, unless a task registers a signal handler for that special
>signal.
>

I'd be happy with that.

>
>That way, tasks which "knowingly" change their CPU affinity will be able to
>tackle a CPU going down by handling the signal (probably change their CPU
>affinity again), while tasks which have their CPU affinity changed "unknowingly"
>(by other tasks) will just ignore the signal. The hotplug script interface
>allows the admin to go and change the CPU affinity again for the second class
>of tasks, if needed.
>

Yes, that is with the cpu-is-down hotplug event, right?

*Before* that happens, tasks that don't handle the signal should just
have their affinity changed to all cpus.

Or doesn't anybody care to think about hotplug scripts failing?
(serious question)
* Re: CPU Hotplug: Hotplug Script And SIGPWR
From: Srivatsa Vaddagiri @ 2004-01-21 5:09 UTC
To: Nick Piggin; +Cc: Tim Hockin, Rusty Russell, linux-kernel, torvalds, akpm, rml

On Wed, Jan 21, 2004 at 03:14:03PM +1100, Nick Piggin wrote:
> Yes, that is with the cpu-is-down hotplug event, right?

right.

> *Before* that happens, tasks that don't handle the signal should just
> have their affinity changed to all cpus.

Currently, handle or not handle the signal, affinity is changed
to all cpus for tasks that are bound only to the dying CPU.

--
Thanks and Regards,
Srivatsa Vaddagiri,
Linux Technology Center,
IBM Software Labs,
Bangalore, INDIA - 560017
* Re: CPU Hotplug: Hotplug Script And SIGPWR
From: Tim Hockin @ 2004-01-21 7:08 UTC
To: Srivatsa Vaddagiri
Cc: Nick Piggin, Rusty Russell, linux-kernel, torvalds, akpm, rml

On Wed, Jan 21, 2004 at 10:39:33AM +0530, Srivatsa Vaddagiri wrote:
> > *Before* that happens, tasks that don't handle the signal should just
> > have their affinity changed to all cpus.
>
> Currently, handle or not handle the signal, affinity is changed
> to all cpus for tasks that are bound only to the dying CPU.

OK, so let's assume this scenario:

    process A affined to cpu1
    all other processes affined to 0xffffffff
    cpu1 goes down
     - process A affined to 0xffffffff
    hotplug "cpu1 removed" event
    cpu1 comes back
    hotplug "cpu1 inserted" event

Process A has now discarded useful potentially VALUABLE information, with no
way to retrieve it. The hot plug scripts do not have enough information to
put things the way they were before. I can't believe that anyone considers
this to be OK.

Userspace gave us EXPLICIT instructions, which we then violate. By granting
affinity, we have made a contract with userspace. Changing affinity without
userspace's direct instruction is wrong.

What about this:

We already can not handle unexpected CPU removals gracefully, correct? So
we expect some user-provided notification, right?

So force userland to handle it before we give the OK to remove a CPU.

    pid_t sys_proc_offline(int cpu)
    {
        pid_t p;

        /* flag cpu as not schedulable anymore */
        dont_add_tasks_to(cpu);

        p = find_first_unrunnable(cpu);
        if (p)
            return p;

        take_proc_offline(cpu);
        return 0;
    }

The userspace control can then loop on this until it returns 0. Each time
it returns a pid, userspace must try to handle that pid - kill it, re-affine
it, or provide some way to suspend it.

Simpler yet:

    int sys_proc_offline(int cpu, int reaffine)
    {
        pid_t p;

        /* flag cpu as not schedulable anymore */
        dont_add_tasks_to(cpu);

        while ((p = find_first_unrunnable(cpu))) {
            if (reaffine)
                reaffine_task(p);  /* helper renamed so it does not
                                      clash with the flag argument */
            else
                make_unrunnable(p);
        }

        take_proc_offline(cpu);
        return 0;
    }

Less flexible, but workable. I prefer the first. Yes it's racy, but the
worst case is that you receive a pid that you don't need to handle (died or
re-affined already).

Anything that violates affinity without permission just is so WRONG.
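The userspace side of the first proposal, "loop until it returns 0", might look something like the sketch below. No such system call exists, so fake_proc_offline is a stand-in that replays a small table of pinned pids, and handle_pid pretends that userspace re-affined or killed the task. Every name here is hypothetical. Only the shape of the loop is the point.

```c
#include <sys/types.h>

#define MAX_PINNED 8

/* Stand-in for kernel state: pids that can only run on the dying CPU. */
static pid_t pinned[MAX_PINNED];
static int npinned;
static int cpu_online = 1;

/* Stand-in for the proposed call: returns a pid that would become
 * unrunnable, or 0 once nothing is left and the CPU has gone offline. */
static pid_t fake_proc_offline(int cpu)
{
    (void)cpu;
    if (npinned > 0)
        return pinned[npinned - 1];
    cpu_online = 0;
    return 0;
}

/* Stand-in for the policy decision (kill, re-affine, suspend):
 * here the pid is simply dropped from the table. */
static void handle_pid(pid_t pid)
{
    for (int i = 0; i < npinned; i++) {
        if (pinned[i] == pid) {
            pinned[i] = pinned[--npinned];
            return;
        }
    }
}

/* Loop until the CPU really goes offline; returns how many tasks
 * had to be dealt with along the way. */
static int offline_cpu(int cpu)
{
    int handled = 0;
    pid_t pid;

    while ((pid = fake_proc_offline(cpu)) != 0) {
        handle_pid(pid);
        handled++;
    }
    return handled;
}
```

The race mentioned in the message shows up here as handle_pid being asked about a pid that is already gone, which the sketch treats as a harmless no-op.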
* Re: CPU Hotplug: Hotplug Script And SIGPWR
From: Matthias Urlichs @ 2004-01-21 15:07 UTC
To: linux-kernel

Hi, Tim Hockin wrote:

> We already can not handle unexpected CPU removals gracefully, correct? So
> we expect some user-provided notification, right?
>

Well, if the CPU is executing userland (or idling), we conceivably could.
That would kill off one userspace process (which might be able to recover
given a signal and longjmp(), but such is life. ;-)

> So force userland to handle it before we give the OK to remove a CPU.

I like the idea of an "unrunnable" queue, that way you have the option to
fix the problem afterwards -- or just ignore it, if you decide it's OK for
processes to wait a few minutes while you replace the failing CPU fan.

It's like mount(). Usually you unmount cleanly, but sometimes you use -f
and something becomes inaccessible. At least WRT CPUs, the inaccessibility
is (usually) fixable. (I wish it were so, WRT NFS mounts.)

--
Matthias Urlichs | {M:U} IT Design @ m-u-it.de | smurf@smurf.noris.de
Disclaimer: The quote was selected randomly. Really. | http://smurf.noris.de
- -
"Whenever the civil government forbids the practice of things that God
has commanded us to do, or tells us to do things He has commanded us
not to do then we are on solid ground in disobeying the government
and rebelling against it." [Pat Robertson]
* Re: CPU Hotplug: Hotplug Script And SIGPWR
From: Rusty Russell @ 2004-01-22 5:29 UTC
To: Tim Hockin; +Cc: Nick Piggin, Rusty Russell, linux-kernel, torvalds, akpm, rml

In message <20040121070844.GA31807@hockin.org> you write:
> Process A has now discarded useful potentially VALUABLE information, with no
> way to retrieve it. The hot plug scripts do not have enough information to
> put things the way they were before. I can't believe that anyone considers
> this to be OK.

We already established that the process which cares has to listen to
hotplug events. Userland should handle it *before* telling the kernel
to remove the CPU.

What we're dealing with here is merely a corner case, IMHO worth
neither hysteria nor a great deal of code.

Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.
* Re: CPU Hotplug: Hotplug Script And SIGPWR
From: Tim Hockin @ 2004-01-21 7:09 UTC
To: Nick Piggin; +Cc: vatsa, Rusty Russell, linux-kernel, torvalds, akpm, rml

On Wed, Jan 21, 2004 at 03:14:03PM +1100, Nick Piggin wrote:
> Or doesn't anybody care to think about hotplug scripts failing?
> (serious question)

If hotplug scripts are failing, you're in really deep trouble. I can't find
a single case where a hotplug script failing would not indicate some other
larger failure.
* Re: CPU Hotplug: Hotplug Script And SIGPWR
From: Nick Piggin @ 2004-01-21 7:31 UTC
To: Tim Hockin; +Cc: vatsa, Rusty Russell, linux-kernel, torvalds, akpm, rml

Tim Hockin wrote:

>On Wed, Jan 21, 2004 at 03:14:03PM +1100, Nick Piggin wrote:
>
>>Or doesn't anybody care to think about hotplug scripts failing?
>>(serious question)
>>
>
>If hotplug scripts are failing, you're in really deep trouble. I can't find
>a single case where a hotplug script failing would not indicate some other
>larger failure.
>

sigh. threads-max, pid_max, ulimit, -ENOMEM, oom.

In my opinion, you can be in fine shape after one of the above happening,
and if limits _are_ in place, it's reasonable to expect they're there
because they might get reached in rare cases.

I'd rather not add something that, by design, can hang any number of
processes including the entire system if a hotplug script fails. That's
just my honest opinion, I know it's rare enough it probably would never
happen to anyone.

Sorry I keep repeating this, it's not my call and it's never going to affect
me so I'll shut up now ;)
* Re: CPU Hotplug: Hotplug Script And SIGPWR
From: Tim Hockin @ 2004-01-21 7:42 UTC
To: Nick Piggin; +Cc: vatsa, Rusty Russell, linux-kernel, torvalds, akpm, rml

On Wed, Jan 21, 2004 at 06:31:06PM +1100, Nick Piggin wrote:
> >If hotplug scripts are failing, you're in really deep trouble. I can't find
> >a single case where a hotplug script failing would not indicate some other
> >larger failure.
> >
>
> sigh. threads-max, pid_max, ulimit, -ENOMEM, oom.

These affect ALL hotplug scripts. If you can't run a hotplug script because
you've exceeded root's ulimit, or the max # of tasks/threads in the system,
you're in trouble - regardless of what the hotplug event was - SOMETHING is
going to go wrong. If you get ENOMEM you have a bigger problem. If you get
OOM killed, then the OOM killer has gone haywire (not uncommon,
historically).

> I'd rather not add something that, by design, can hang any number of
> processes including the entire system if a hotplug script fails. That's
> just my honest opinion, I know it's rare enough it probably would never
> happen to anyone.
>
> Sorry I keep repeating this, it's not my call and it's never going to affect
> me so I'll shut up now ;)

I'd rather not add anything like that either. I'm not saying I advocate
fast-and-loose at all. On the contrary, I think any action taken in
response to a CPU removal needs to be accountable, and wantonly changing
affinity is NOT.

It'll probably not affect me either, nor is it my decision :)
* Re: CPU Hotplug: Hotplug Script And SIGPWR 2004-01-21 4:14 ` Nick Piggin 2004-01-21 5:09 ` Srivatsa Vaddagiri 2004-01-21 7:09 ` Tim Hockin @ 2004-01-21 8:11 ` Rusty Russell 2 siblings, 0 replies; 34+ messages in thread From: Rusty Russell @ 2004-01-21 8:11 UTC (permalink / raw) To: Nick Piggin; +Cc: Tim Hockin, Rusty Russell, linux-kernel, torvalds, akpm, rml In message <400DFC8B.7020906@cyberone.com.au> you write: > Or doesn't anybody care to think about hotplug scripts failing? > (serious question) It seems not. I don't necessarily agree with it, but we'll see how it goes. Guarantees are hard: if the script is supposed to fork something and you're out of memory, what do you do? Cheers, Rusty. -- Anyone who quotes me in their sig is an idiot. -- Rusty Russell. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: CPU Hotplug: Hotplug Script And SIGPWR 2004-01-21 4:06 ` Srivatsa Vaddagiri 2004-01-21 4:14 ` Nick Piggin @ 2004-01-21 5:07 ` Rusty Russell 1 sibling, 0 replies; 34+ messages in thread From: Rusty Russell @ 2004-01-21 5:07 UTC (permalink / raw) To: vatsa; +Cc: Nick Piggin, Rusty Russell, linux-kernel, torvalds, akpm, rml In message <20040121093633.A3169@in.ibm.com> you write: > On Tue, Jan 20, 2004 at 12:43:52AM -0800, Tim Hockin wrote: > > IFF the app is designed to handle it. The existence of a SIGPWR handler > > does not necessarily imply that, though. a SIGCPU or something might > > correlate 1:1 with this, but SIGPWR doesn't. > > I agree we should have a separate signal for CPU Hotplug. Can we add signals without breaking userspace? If we can, SIGRECONFIG makes sense. If not, I'd rather not have a signal, rely on hotplug, and look at adding a signal in 2.7. Cheers, Rusty. -- Anyone who quotes me in their sig is an idiot. -- Rusty Russell. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: CPU Hotplug: Hotplug Script And SIGPWR 2004-01-20 8:14 ` Nick Piggin 2004-01-20 8:29 ` Tim Hockin @ 2004-01-20 8:41 ` Stefan Smietanowski 2004-01-20 8:49 ` Nick Piggin 1 sibling, 1 reply; 34+ messages in thread From: Stefan Smietanowski @ 2004-01-20 8:41 UTC (permalink / raw) To: Nick Piggin Cc: Tim Hockin, Rusty Russell, vatsa, linux-kernel, torvalds, akpm, rml Hi. >> We have a conflict of priority here. If an RT task is affined to CPU >> A and >> CPU A gets yanked out, what do we do? >> >> Obviously the RT task can't keep running as it was. It was affined to A. >> Maybe for a good reason. I see we have a few choices here: >> >> * re-affine it automatically, thereby silently undoing the explicit >> affinity. >> * violate it's RT scheduling by not running it until it has been >> re-affined >> or CPU A returns to the pool/ >> >> Sending it a SIGPWR means you have to run it on a different CPU that >> it was >> affined to, which is already a violation. >> > > At least the task has the option to handle the problem. Why not make a flag that handles that choice explicitly. If the task sets the affinity itself the default is to re-affine it if the cpu gets yanked but if the task wants to be suspended until the CPU reappears it can set a flag for that to happen if the CPU is yanked. If we have a program that can start another program on a specific CPU then that program can dictate how the task should respond by setting the flag the same way as the task would if the task would be the one selecting a specific CPU. Doesn't that fix the problem? If the default was to re-affine to another CPU then we can optionally send it a SIGPWR as well to let it know it was re-affined. But the SIGPWR is in my eyes optional and the above scenario should handle the cases imo. // Stefan ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: CPU Hotplug: Hotplug Script And SIGPWR 2004-01-20 8:41 ` Stefan Smietanowski @ 2004-01-20 8:49 ` Nick Piggin 2004-01-20 9:12 ` Tim Hockin 0 siblings, 1 reply; 34+ messages in thread From: Nick Piggin @ 2004-01-20 8:49 UTC (permalink / raw) To: Stefan Smietanowski Cc: Tim Hockin, Rusty Russell, vatsa, linux-kernel, torvalds, akpm, rml Stefan Smietanowski wrote: > Hi. > >>> We have a conflict of priority here. If an RT task is affined to >>> CPU A and >>> CPU A gets yanked out, what do we do? >>> >>> Obviously the RT task can't keep running as it was. It was affined >>> to A. >>> Maybe for a good reason. I see we have a few choices here: >>> >>> * re-affine it automatically, thereby silently undoing the explicit >>> affinity. >>> * violate it's RT scheduling by not running it until it has been >>> re-affined >>> or CPU A returns to the pool/ >>> >>> Sending it a SIGPWR means you have to run it on a different CPU that >>> it was >>> affined to, which is already a violation. >>> >> >> At least the task has the option to handle the problem. > > > Why not make a flag that handles that choice explicitly. > > If the task sets the affinity itself the default is to > re-affine it if the cpu gets yanked but if the task wants to > be suspended until the CPU reappears it can set a flag for > that to happen if the CPU is yanked. > > If we have a program that can start another program on a > specific CPU then that program can dictate how the task > should respond by setting the flag the same way > as the task would if the task would be the one selecting > a specific CPU. Doesn't that fix the problem? Well I'll admit it would usually be more flexible if you freeze the process and run hotplug scripts to handle cpu affinity. Unfortunately it introduces unfixable robustness and realtime problems by design. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: CPU Hotplug: Hotplug Script And SIGPWR 2004-01-20 8:49 ` Nick Piggin @ 2004-01-20 9:12 ` Tim Hockin 0 siblings, 0 replies; 34+ messages in thread From: Tim Hockin @ 2004-01-20 9:12 UTC (permalink / raw) To: Nick Piggin Cc: Stefan Smietanowski, Rusty Russell, vatsa, linux-kernel, torvalds, akpm, rml On Tue, Jan 20, 2004 at 07:49:45PM +1100, Nick Piggin wrote: > Well I'll admit it would usually be more flexible if you freeze > the process and run hotplug scripts to handle cpu affinity. > > Unfortunately it introduces unfixable robustness and realtime > problems by design. And I submit that there is no clean way to handle the RT problem. The proposed flag gives the task a choice, which is good, but I am not sure that the choice is worth the effort. The robustness issues are real, but the same issue applies to all hotplug activity. The issues are severe corner cases which indicate OTHER faults in the system. My main concern is that affinity is not treated as a suggestion or preference. Affinity is an explicit request. Once granted, we can not arbitrarily decide to revoke affinity unless we have a sane way to alert *someone*. Freezing tasks and sending a hotplug event is a sane way. Sending SIGPWR is a sane way IFF you can guarantee that a task which receives SIGPWR will handle a CPU being yanked without violating affinity. This does not handle the case of tasks which do not handle SIGPWR. A flag to indicate 'my affinity is a preference' vs. 'my affinity is a requirement' is a possibly sane way. It still requires all the code to freeze a task, and forces affinity-aware apps to adapt to this new edge case. ^ permalink raw reply [flat|nested] 34+ messages in thread
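The "preference" vs. "requirement" flag discussed above can be illustrated with a small user-space model. This is only a sketch of the decision being debated, not kernel code: `AffinityMode` and `on_cpu_offline` are hypothetical names invented for the illustration, and CPU masks are modelled as plain Python sets.

```python
from enum import Enum


class AffinityMode(Enum):
    # Hypothetical per-task flag, named after the two cases in the thread.
    PREFERENCE = "preference"    # the task may be re-affined if its CPUs vanish
    REQUIREMENT = "requirement"  # the task must be frozen rather than moved


def on_cpu_offline(allowed, mode, online):
    """Decide what happens to one task after a CPU has gone offline.

    allowed: set of CPU ids the task is affined to
    mode:    the task's AffinityMode flag
    online:  set of CPU ids still online after the removal
    Returns a tuple (action, resulting_allowed_set).
    """
    if allowed & online:
        # At least one permitted CPU is still there: nothing to decide.
        return ("keep", set(allowed))
    if mode is AffinityMode.PREFERENCE:
        # Affinity was only a preference: silently widen it to every online CPU.
        return ("reaffine", set(online))
    # Affinity was a requirement: leave the mask alone and stop running the task.
    return ("freeze", set(allowed))
```

For example, a task affined only to CPU 1 yields `("reaffine", {0, 2})` under `PREFERENCE` and `("freeze", {1})` under `REQUIREMENT` once CPU 1 is gone. As noted in the message above, the "freeze" branch still needs all the machinery to freeze a task.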
* Re: CPU Hotplug: Hotplug Script And SIGPWR 2004-01-20 7:54 ` Tim Hockin 2004-01-20 8:14 ` Nick Piggin @ 2004-01-21 0:00 ` Rusty Russell 1 sibling, 0 replies; 34+ messages in thread From: Rusty Russell @ 2004-01-21 0:00 UTC (permalink / raw) To: Tim Hockin; +Cc: Nick Piggin, vatsa, linux-kernel, torvalds, akpm, rml In message <20040120075409.GA13897@hockin.org> you write: > Basically, RT tasks + CPU affinity + hotplug CPUs do not play nicely > together. I don't see much that can be done to solve that. With the > procstate stuff I did, and with planned CPU unplugs we *do* have time before > the CPU really goes offline in which to act. With unplanned CPU offlining, > we don't. This can't be done with the hotplug scripts. I originally ran hotplug synchronous before taking the CPU offline, and Greg KH said that constitutes abuse 8( Userspace can agree on a protocol *before* initiating the offline, of course, in which case it's not a kernel problem. You make an excellent point though: if you need 2 cpus on your system to meet requirements, and you go down to one cpu, you have a problem. But I think that's a "don't do that". Thanks, Rusty. -- Anyone who quotes me in their sig is an idiot. -- Rusty Russell. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: CPU Hotplug: Hotplug Script And SIGPWR 2004-01-20 6:52 ` Tim Hockin 2004-01-20 7:11 ` Nick Piggin @ 2004-01-20 23:51 ` Rusty Russell 1 sibling, 0 replies; 34+ messages in thread From: Rusty Russell @ 2004-01-20 23:51 UTC (permalink / raw) To: Tim Hockin Cc: Nick Piggin, vatsa, lhcs-devel, linux-kernel, torvalds, akpm, rml In message <20040120065207.GA10993@hockin.org> you write: > On Tue, Jan 20, 2004 at 05:43:59PM +1100, Nick Piggin wrote: > > Seems less robust and more ad hoc than SIGPWR, however. > > Disagree. SIGPWR will kill any process that doesn't catch it. That's > policy. It seems more robust to let the hotplug script decide what to do. > If it wants to kill each unrunnable task with SIGPWR, it can. But if it > wants to let them live, it can. The proposal was to send SIGPWR only if they don't have it set to the default, for this reason. I think that if your patch goes in, it will complement this solution nicely. Thanks! Rusty. -- Anyone who quotes me in their sig is an idiot. -- Rusty Russell. ^ permalink raw reply [flat|nested] 34+ messages in thread
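From the process side, "only signal tasks whose SIGPWR disposition is not the default" means a task opts in simply by installing a handler. The following user-space sketch shows that opt-in and the equivalent of the proposed check. The names `HOTPLUG_SIGNAL`, `on_hotplug`, `opt_in` and `wants_hotplug_signal` are made up for the example, the fallback to SIGUSR1 exists only so the sketch runs where SIGPWR is not defined, and the proposed siginfo field carrying the CPU number is not modelled.

```python
import os
import signal

# SIGPWR is Linux-specific; fall back to SIGUSR1 elsewhere (assumption for portability).
HOTPLUG_SIGNAL = getattr(signal, "SIGPWR", signal.SIGUSR1)

events = []  # one (signal number, affinity seen by the handler) entry per delivery


def current_affinity():
    # os.sched_getaffinity only exists on some platforms (Linux among them).
    if hasattr(os, "sched_getaffinity"):
        return os.sched_getaffinity(0)
    return None


def on_hotplug(signum, frame):
    # A real application would re-plan its CPU usage here; this just records the event.
    events.append((signum, current_affinity()))


def wants_hotplug_signal():
    """User-space mirror of the proposed test: signal only if not SIG_DFL."""
    return signal.getsignal(HOTPLUG_SIGNAL) != signal.SIG_DFL


def opt_in():
    """Install a handler, which changes the disposition away from SIG_DFL."""
    signal.signal(HOTPLUG_SIGNAL, on_hotplug)
```

A process that never calls `opt_in()` keeps the default disposition and, under the proposal, would never be sent the signal, so it could not be killed by it.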
* Re: CPU Hotplug: Hotplug Script And SIGPWR 2004-01-20 6:33 ` Tim Hockin 2004-01-20 6:43 ` Nick Piggin @ 2004-01-20 7:45 ` Rusty Russell 2004-01-20 8:37 ` Tim Hockin 1 sibling, 1 reply; 34+ messages in thread From: Rusty Russell @ 2004-01-20 7:45 UTC (permalink / raw) To: Tim Hockin; +Cc: vatsa, lhcs-devel, linux-kernel, torvalds, akpm, rml In message <20040120063316.GA9736@hockin.org> you write: > I added a new TASK_UNRUNNABLE state for these tasks, too. By adding the > task's current (or most recent) CPU and the task's cpus_allowed and > cpus_allowed_mask to /proc/pid/status, we gave simple tools for finding > these unrunnable tasks. > > I think the sanest thing for a CPU removal is to migrate everything off the > processor in question, move unrunnable tasks into TASK_UNRUNNABLE state, > then notify /sbin/hotplug. The hotplug script can then find and handle the > unrunnable tasks. No SIGPWR grossness needed. Interesting. The downside is that you now need some script needs to know what to do with the tasks (unless you have something like DBUS, but that's a ways off). There are no correctness concerns AFAICT with userspace not being on a particular CPU, just performance. The SIGPWR solution lets a random process deal appropriately without having to interface with /sbin/hotplug, if it wants to. And it's a lot less invasive. I'll take a look though. Thanks, Rusty. -- Anyone who quotes me in their sig is an idiot. -- Rusty Russell. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: CPU Hotplug: Hotplug Script And SIGPWR 2004-01-20 7:45 ` Rusty Russell @ 2004-01-20 8:37 ` Tim Hockin 2004-01-20 9:29 ` Srivatsa Vaddagiri 2004-01-21 0:12 ` Rusty Russell 0 siblings, 2 replies; 34+ messages in thread From: Tim Hockin @ 2004-01-20 8:37 UTC (permalink / raw) To: Rusty Russell; +Cc: vatsa, lhcs-devel, linux-kernel, torvalds, akpm, rml On Tue, Jan 20, 2004 at 06:45:41PM +1100, Rusty Russell wrote: > In message <20040120063316.GA9736@hockin.org> you write: > > I added a new TASK_UNRUNNABLE state for these tasks, too. By adding the > > task's current (or most recent) CPU and the task's cpus_allowed and > > cpus_allowed_mask to /proc/pid/status, we gave simple tools for finding > > these unrunnable tasks. > > > > I think the sanest thing for a CPU removal is to migrate everything off the > > processor in question, move unrunnable tasks into TASK_UNRUNNABLE state, > > then notify /sbin/hotplug. The hotplug script can then find and handle the > > unrunnable tasks. No SIGPWR grossness needed. > > Interesting. > > The downside is that you now need some script needs to know what to do > with the tasks (unless you have something like DBUS, but that's a ways Well, if we provide a sane example script, the rest is up to the distros or the people with this hardware to decide. > off). There are no correctness concerns AFAICT with userspace not > being on a particular CPU, just performance. Correctness does matter if an affined task violates that affinity. If we are going to provide explicit affinity, we need to honor it under all conditions, or at least provide an option to honor it. > The SIGPWR solution lets a random process deal appropriately without > having to interface with /sbin/hotplug, if it wants to. And it's a > lot less invasive. I agree about invasiveness. Maybe a combo? Send SIGPWR iff a task is actually handling it, otherwise mark it TASK_UNRUNNABLE and let hotplug handle it? 
A new signal would be much more polite, but SIGPWR can be made to work. What if a process catches SIGPWR, but does not handle CPU removal? Do we wait for its signal handler to finish before re-evaluating it for TASK_UNRUNNABLE? Yuck. If a CPU gets yanked with no warning, where do we run the signal handler? Violating affinity again. Tim ^ permalink raw reply [flat|nested] 34+ messages in thread
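The combination floated above (signal a task if it handles the signal, otherwise park it as unrunnable and rescan whenever CPU state or affinity changes) can be modelled in a few lines. This is a toy user-space sketch, not the procstate patch: `HotplugModel` and its methods are hypothetical, and it deliberately does not answer the open question of which CPU a signalled task's handler would run on.

```python
class HotplugModel:
    """Toy bookkeeping for the 'signal if handled, else unrunnable' idea."""

    def __init__(self, online):
        self.online = set(online)  # CPU ids currently online
        self.tasks = {}            # name -> {"allowed": set, "handles_signal": bool}
        self.unrunnable = set()    # names of tasks with no online CPU to run on
        self.signalled = []        # names of tasks that would be sent the signal

    def _recheck(self, name):
        # A task is runnable iff at least one CPU in its mask is online.
        if self.tasks[name]["allowed"] & self.online:
            self.unrunnable.discard(name)
        else:
            self.unrunnable.add(name)

    def add_task(self, name, allowed, handles_signal=False):
        self.tasks[name] = {"allowed": set(allowed), "handles_signal": handles_signal}
        self._recheck(name)

    def set_affinity(self, name, allowed):
        # Whenever a task's mask changes, re-evaluate it.
        self.tasks[name]["allowed"] = set(allowed)
        self._recheck(name)

    def cpu_up(self, cpu):
        # Whenever a CPU's state changes, scan every task.
        self.online.add(cpu)
        for name in self.tasks:
            self._recheck(name)

    def cpu_down(self, cpu):
        self.online.discard(cpu)
        for name, task in self.tasks.items():
            if task["allowed"] & self.online:
                continue  # still has somewhere to run
            if task["handles_signal"]:
                self.signalled.append(name)  # the task gets to react itself
            else:
                self.unrunnable.add(name)    # left for the hotplug script
```

After `cpu_down`, a hotplug script would look at `unrunnable` (in the thread, via the extra fields in /proc/pid/status) and decide per task whether to re-affine or kill it.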
* Re: CPU Hotplug: Hotplug Script And SIGPWR 2004-01-20 8:37 ` Tim Hockin @ 2004-01-20 9:29 ` Srivatsa Vaddagiri 2004-01-21 0:12 ` Rusty Russell 1 sibling, 0 replies; 34+ messages in thread From: Srivatsa Vaddagiri @ 2004-01-20 9:29 UTC (permalink / raw) To: Tim Hockin; +Cc: Rusty Russell, lhcs-devel, linux-kernel, torvalds, akpm, rml On Tue, Jan 20, 2004 at 12:37:01AM -0800, Tim Hockin wrote: > If a CPU gets yanked with no warning, where do we > run the signal handler? Violating affinity again. With the current CPU Hotplug design, I don't think this is allowed. A CPU has to be offlined first in software before it is yanked out from hardware. -- Thanks and Regards, Srivatsa Vaddagiri, Linux Technology Center, IBM Software Labs, Bangalore, INDIA - 560017 ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: CPU Hotplug: Hotplug Script And SIGPWR 2004-01-20 8:37 ` Tim Hockin 2004-01-20 9:29 ` Srivatsa Vaddagiri @ 2004-01-21 0:12 ` Rusty Russell 1 sibling, 0 replies; 34+ messages in thread From: Rusty Russell @ 2004-01-21 0:12 UTC (permalink / raw) To: Tim Hockin; +Cc: vatsa, lhcs-devel, linux-kernel, torvalds, akpm, rml In message <20040120083700.GB15733@hockin.org> you write: > > off). There are no correctness concerns AFAICT with userspace not > > being on a particular CPU, just performance. > > Correctness does matter if an affined task violates that affinity. If we > are going to provide explicit affinity, we need to honor it under all > conditions, or at least provide an option to honor it. WHY? Think of an example where this is actually a problem. "Under all conditions" is not something we can ever implement for anything. > I agree about invasiveness. Maybe a combo? Send SIGPWR iff a task is > actually handling it, otherwise mark it TASK_UNRUNNABLE and let hotplug > handle it? Well, I think that violating affinity given that (1) affinity in userspace is only a performance issue, and (2) we've been explicitly told to take the CPU down, is a valid solution. OTOH making tasks unrunnable until hotplug gets around to servicing them could equally be a disaster. Given that this requires infrastructure not in Linus' tree and the "simply unbind" solution doesn't, I'm leaning towards unbinding everything which would become unrunnable, SIGPWR if they handle it, and hotplug at the end. Thanks, Rusty. -- Anyone who quotes me in their sig is an idiot. -- Rusty Russell. ^ permalink raw reply [flat|nested] 34+ messages in thread
[parent not found: <fa.f37o48p.1io5q5@ifi.uio.no>]
[parent not found: <fa.frjqvfo.170g8hq@ifi.uio.no>]
* Re: CPU Hotplug: Hotplug Script And SIGPWR [not found] ` <fa.frjqvfo.170g8hq@ifi.uio.no> @ 2004-01-20 17:49 ` Andy Lutomirski 2004-01-21 4:33 ` Rusty Russell 0 siblings, 1 reply; 34+ messages in thread From: Andy Lutomirski @ 2004-01-20 17:49 UTC (permalink / raw) To: Tim Hockin; +Cc: Nick Piggin, Rusty Russell, vatsa, lhcs-devel, linux-kernel Tim Hockin wrote: > On Tue, Jan 20, 2004 at 05:43:59PM +1100, Nick Piggin wrote: > >>>I think the sanest thing for a CPU removal is to migrate everything off the >>>processor in question, move unrunnable tasks into TASK_UNRUNNABLE state, >>>then notify /sbin/hotplug. The hotplug script can then find and handle the >>>unrunnable tasks. No SIGPWR grossness needed. >>> >> >>Seems less robust and more ad hoc than SIGPWR, however. > > > Disagree. SIGPWR will kill any process that doesn't catch it. That's > policy. It seems more robust to let the hotplug script decide what to do. > If it wants to kill each unrunnable task with SIGPWR, it can. But if it > wants to let them live, it can. This seems like a problem that a lot of power-management issues have. (At some point, linux may want to suspend itself after inactivity. Both RT tasks and some interactive tasks may want to suppress that.) Why not add a SIGPM signal, which is only sent if handled, and which indicates that a PM event is happening. Give usermode some method of responding to it (e.g. handler returns a value, or a new syscall), and let /sbin/hotplug handle events for tasks that either ignore the signal or responded that they were uninterested. This seems to be close to optimal for every case I can think of. --Andy ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: CPU Hotplug: Hotplug Script And SIGPWR 2004-01-20 17:49 ` Andy Lutomirski @ 2004-01-21 4:33 ` Rusty Russell 0 siblings, 0 replies; 34+ messages in thread From: Rusty Russell @ 2004-01-21 4:33 UTC (permalink / raw) To: Andy Lutomirski Cc: Nick Piggin, Rusty Russell, vatsa, lhcs-devel, linux-kernel In message <400D6A33.6020108@myrealbox.com> you write: > (At some point, linux may want to suspend itself after inactivity. Both > RT tasks and some interactive tasks may want to suppress that.) Why not > add a SIGPM signal, which is only sent if handled, and which indicates > that a PM event is happening. Give usermode some method of responding to > it (e.g. handler returns a value, or a new syscall), and let > /sbin/hotplug handle events for tasks that either ignore the signal or > responded that they were uninterested. This seems to be close to optimal > for every case I can think of. This was my original idea too. AIX has this, but in reality the control ends up all in userspace for non-trivial uses. ie. some "workload manager" program consults with all the interested parties *before* telling the kernel what to do. The async and non-consultive nature of hotplug is policy for good reason. Giving someone 30 seconds to respond to a signal can always fail, and making it configurable is just a bandaid. I have nothing against SIGRECONFIG (think memory hotplug), but the AIX guys indicated from their experience it seems that non-toy users don't use it anyway (they have a hotplug-style script system, too). So: trying to cover every corner case isn't worthwhile in practice, it seems. I like the signal for RC5 challenge etc, but that's about it. Cheers, Rusty. -- Anyone who quotes me in their sig is an idiot. -- Rusty Russell. ^ permalink raw reply [flat|nested] 34+ messages in thread
Thread overview: 34+ messages
[not found] <20040116174446.A2820@in.ibm.com>
2004-01-20 5:44 ` CPU Hotplug: Hotplug Script And SIGPWR Rusty Russell
2004-01-20 6:33 ` Tim Hockin
2004-01-20 6:43 ` Nick Piggin
2004-01-20 6:52 ` Tim Hockin
2004-01-20 7:11 ` Nick Piggin
2004-01-20 7:30 ` Tim Hockin
2004-01-20 7:45 ` Nick Piggin
2004-01-20 7:54 ` Tim Hockin
2004-01-20 8:14 ` Nick Piggin
2004-01-20 8:29 ` Tim Hockin
2004-01-20 8:37 ` Nick Piggin
2004-01-20 8:43 ` Tim Hockin
2004-01-21 4:06 ` Srivatsa Vaddagiri
2004-01-21 4:14 ` Nick Piggin
2004-01-21 5:09 ` Srivatsa Vaddagiri
2004-01-21 7:08 ` Tim Hockin
2004-01-21 15:07 ` Matthias Urlichs
2004-01-22 5:29 ` Rusty Russell
2004-01-21 7:09 ` Tim Hockin
2004-01-21 7:31 ` Nick Piggin
2004-01-21 7:42 ` Tim Hockin
2004-01-21 8:11 ` Rusty Russell
2004-01-21 5:07 ` Rusty Russell
2004-01-20 8:41 ` Stefan Smietanowski
2004-01-20 8:49 ` Nick Piggin
2004-01-20 9:12 ` Tim Hockin
2004-01-21 0:00 ` Rusty Russell
2004-01-20 23:51 ` Rusty Russell
2004-01-20 7:45 ` Rusty Russell
2004-01-20 8:37 ` Tim Hockin
2004-01-20 9:29 ` Srivatsa Vaddagiri
2004-01-21 0:12 ` Rusty Russell
[not found] <fa.f37o48p.1io5q5@ifi.uio.no>
[not found] ` <fa.frjqvfo.170g8hq@ifi.uio.no>
2004-01-20 17:49 ` Andy Lutomirski
2004-01-21 4:33 ` Rusty Russell