* [PATCH] cpu hotplug fixes for dependent_sleeper and wake_sleeping_dependent
From: Nathan Lynch @ 2004-08-30 9:41 UTC (permalink / raw)
To: akpm; +Cc: lkml, Rusty Russell, Ingo Molnar
Hi-
I've reported this issue a couple of times and I think I've finally
tracked it down, though I don't know whether I've come up with the best
fix.
To recap, offlining a cpu with current bk results in the "Aiee, killing
interrupt handler!" panic from do_exit(). This seems to be triggered
only with CONFIG_PREEMPT and CONFIG_SCHED_SMT both enabled. I believe
the problem is that when do_stop() calls schedule(), dependent_sleeper()
drops the "offline" cpu's rq->lock and never reacquires it.
The following seems to work (tested on ppc64). Is there a better way?
Nathan
---
Return early from dependent_sleeper and wake_sleeping_dependent if
this_cpu is offline to avoid releasing this_cpu's rq->lock.
Signed-off-by: Nathan Lynch <nathanl@austin.ibm.com>
---
diff -puN kernel/sched.c~sched-smt-cpu-hotplug-fix kernel/sched.c
--- 2.6-bk/kernel/sched.c~sched-smt-cpu-hotplug-fix 2004-08-30 04:22:49.000000000 -0500
+++ 2.6-bk-nathanl/kernel/sched.c 2004-08-30 04:23:28.000000000 -0500
@@ -2502,7 +2502,7 @@ static inline void wake_sleeping_depende
cpumask_t sibling_map;
int i;
- if (!(sd->flags & SD_SHARE_CPUPOWER))
+ if (!(sd->flags & SD_SHARE_CPUPOWER) || cpu_is_offline(this_cpu))
return;
/*
@@ -2549,7 +2549,7 @@ static inline int dependent_sleeper(int
int ret = 0, i;
task_t *p;
- if (!(sd->flags & SD_SHARE_CPUPOWER))
+ if (!(sd->flags & SD_SHARE_CPUPOWER) || cpu_is_offline(this_cpu))
return 0;
/*
_
* Re: [PATCH] cpu hotplug fixes for dependent_sleeper and wake_sleeping_dependent
From: Ingo Molnar @ 2004-08-30 13:29 UTC (permalink / raw)
To: Nathan Lynch; +Cc: akpm, lkml, Rusty Russell
* Nathan Lynch <nathanl@austin.ibm.com> wrote:
> To recap, offlining a cpu with current bk results in the "Aiee,
> killing interrupt handler!" panic from do_exit(). This seems to be
> triggered only with CONFIG_PREEMPT and CONFIG_SCHED_SMT both enabled.
> I believe the problem is that when do_stop() calls schedule(),
> dependent_sleeper() drops the "offline" cpu's rq->lock and never
> reacquires it.
>
> The following seems to work (tested on ppc64). Is there a better way?
> - if (!(sd->flags & SD_SHARE_CPUPOWER))
> + if (!(sd->flags & SD_SHARE_CPUPOWER) || cpu_is_offline(this_cpu))
> + if (!(sd->flags & SD_SHARE_CPUPOWER) || cpu_is_offline(this_cpu))
it would really be nice to do this without any runtime overhead. Rusty?
Ingo
* Re: [PATCH] cpu hotplug fixes for dependent_sleeper and wake_sleeping_dependent
From: Rusty Russell @ 2004-09-01 23:25 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Nathan Lynch, Andrew Morton, lkml - Kernel Mailing List
On Mon, 2004-08-30 at 23:29, Ingo Molnar wrote:
> * Nathan Lynch <nathanl@austin.ibm.com> wrote:
>
> > To recap, offlining a cpu with current bk results in the "Aiee,
> > killing interrupt handler!" panic from do_exit(). This seems to be
> > triggered only with CONFIG_PREEMPT and CONFIG_SCHED_SMT both enabled.
> > I believe the problem is that when do_stop() calls schedule(),
> > dependent_sleeper() drops the "offline" cpu's rq->lock and never
> > reacquires it.
> >
> > The following seems to work (tested on ppc64). Is there a better way?
>
> > - if (!(sd->flags & SD_SHARE_CPUPOWER))
> > + if (!(sd->flags & SD_SHARE_CPUPOWER) || cpu_is_offline(this_cpu))
>
> > + if (!(sd->flags & SD_SHARE_CPUPOWER) || cpu_is_offline(this_cpu))
>
> it would really be nice to do this without any runtime overhead. Rusty?
If the scheduling topology is updated atomically (ie. inside
__cpu_disable), this would not happen; there are patches around to do
this and I think longer term this is the correct fix. However, I
believe a good current fix is to merely ensure that the current cpu is
always set in dependent_sleeper(), something like:
/*
* The same locking rules and details apply as for
* wake_sleeping_dependent():
*/
spin_unlock(&this_rq->lock);
cpus_and(sibling_map, sd->span, cpu_online_map);
+ /* Include this CPU, for the special case of us dying */
+ cpu_set(this_cpu, sibling_map);
for_each_cpu_mask(i, sibling_map)
spin_lock(&cpu_rq(i)->lock);
cpu_clear(this_cpu, sibling_map);
Not quite free, but very cheap...
Rusty.
--
Anyone who quotes me in their signature is an idiot -- Rusty Russell
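
[Editorial illustration, not part of the original thread.] The lock
imbalance discussed above can be reproduced in a small user-space model.
This is only a sketch under stated assumptions: the toy_* names, the
plain int array standing in for the per-cpu rq->lock, and the unsigned
int bitmasks standing in for cpumask_t are all hypothetical. The part
after the sibling map is cleared (releasing the sibling locks while
keeping this cpu's lock) is assumed from the description in the thread
rather than copied from kernel/sched.c. TOY_EARLY_RETURN models the
cpu_is_offline() check from the first patch, TOY_INCLUDE_SELF models
the cpu_set() suggestion.

```c
#include <assert.h>

#define TOY_NR_CPUS 4

/* Which of the two proposed fixes, if any, the model applies. */
enum toy_fix { TOY_NO_FIX, TOY_EARLY_RETURN, TOY_INCLUDE_SELF };

/* lock_held[i] is 1 while the toy "runqueue lock" of cpu i is held. */
static int lock_held[TOY_NR_CPUS];

static void toy_lock(int cpu)
{
	assert(!lock_held[cpu]);
	lock_held[cpu] = 1;
}

static void toy_unlock(int cpu)
{
	assert(lock_held[cpu]);
	lock_held[cpu] = 0;
}

/* Start state: the caller enters with only this_cpu's lock held. */
static void toy_reset(int this_cpu)
{
	int i;

	for (i = 0; i < TOY_NR_CPUS; i++)
		lock_held[i] = 0;
	toy_lock(this_cpu);
}

/*
 * Model of the locking sequence in dependent_sleeper().
 * Returns 1 if this_cpu's lock is still held on return, 0 if it was lost.
 */
static int toy_dependent_sleeper(int this_cpu, unsigned int span,
				 unsigned int online, enum toy_fix fix)
{
	unsigned int self = 1u << this_cpu;
	unsigned int sibling_map;
	int i;

	/* First patch: bail out before touching any lock if we are offline. */
	if (fix == TOY_EARLY_RETURN && !(online & self))
		return lock_held[this_cpu];

	toy_unlock(this_cpu);
	sibling_map = span & online;	/* drops this_cpu if it is offline */

	/* Suggested alternative: always include this cpu in the map. */
	if (fix == TOY_INCLUDE_SELF)
		sibling_map |= self;

	for (i = 0; i < TOY_NR_CPUS; i++)
		if (sibling_map & (1u << i))
			toy_lock(i);
	sibling_map &= ~self;

	/* (the real function would inspect the sibling runqueues here) */

	/* Assumed: only the siblings' locks are released on the way out. */
	for (i = 0; i < TOY_NR_CPUS; i++)
		if (sibling_map & (1u << i))
			toy_unlock(i);

	return lock_held[this_cpu];
}
```

With cpus 0 and 1 as siblings (span 0x3) and cpu 1 already cleared from
the online mask (online 0x1), calling toy_reset(1) followed by
toy_dependent_sleeper(1, 0x3, 0x1, TOY_NO_FIX) returns 0: the lock that
was dropped is never retaken. Either fix mode returns 1 for the same
inputs.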