public inbox for linux-kernel@vger.kernel.org
* [PATCH] cpu hotplug fixes for dependent_sleeper and wake_sleeping_dependent
@ 2004-08-30  9:41 Nathan Lynch
  2004-08-30 13:29 ` Ingo Molnar
  0 siblings, 1 reply; 3+ messages in thread
From: Nathan Lynch @ 2004-08-30  9:41 UTC
  To: akpm; +Cc: lkml, Rusty Russell, Ingo Molnar

Hi-

I've reported this issue a couple of times and I think I've finally
tracked it down, though I don't know whether I've come up with the best
fix.

To recap, offlining a cpu with current bk results in the "Aiee, killing
interrupt handler!" panic from do_exit().  This seems to be triggered
only with CONFIG_PREEMPT and CONFIG_SCHED_SMT both enabled.  I believe
the problem is that when do_stop() calls schedule(), dependent_sleeper()
drops the "offline" cpu's rq->lock and never reacquires it.

The following seems to work (tested on ppc64).  Is there a better way?


Nathan

---

Return early from dependent_sleeper and wake_sleeping_dependent if
this_cpu is offline to avoid releasing this_cpu's rq->lock.

Signed-off-by: Nathan Lynch <nathanl@austin.ibm.com>


---


diff -puN kernel/sched.c~sched-smt-cpu-hotplug-fix kernel/sched.c
--- 2.6-bk/kernel/sched.c~sched-smt-cpu-hotplug-fix	2004-08-30 04:22:49.000000000 -0500
+++ 2.6-bk-nathanl/kernel/sched.c	2004-08-30 04:23:28.000000000 -0500
@@ -2502,7 +2502,7 @@ static inline void wake_sleeping_depende
 	cpumask_t sibling_map;
 	int i;
 
-	if (!(sd->flags & SD_SHARE_CPUPOWER))
+	if (!(sd->flags & SD_SHARE_CPUPOWER) || cpu_is_offline(this_cpu))
 		return;
 
 	/*
@@ -2549,7 +2549,7 @@ static inline int dependent_sleeper(int 
 	int ret = 0, i;
 	task_t *p;
 
-	if (!(sd->flags & SD_SHARE_CPUPOWER))
+	if (!(sd->flags & SD_SHARE_CPUPOWER) || cpu_is_offline(this_cpu))
 		return 0;
 
 	/*

_
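
For readers following the lock flow, here is a minimal user-space sketch of
the failure described above and of the early-return fix in the patch. It is
a model under stated assumptions, not the kernel code: NR_CPUS,
rq_lock_held[], online_map and the model_* helpers are hypothetical
stand-ins, CPU masks are plain bit masks, and the final unlock loop over the
siblings is assumed from the description rather than copied from sched.c.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4

/* Hypothetical stand-ins for the per-CPU rq->lock and cpu_online_map.
 * rq_lock_held[i] == 1 means "held", 0 means "free". */
static int rq_lock_held[NR_CPUS];
static unsigned int online_map;

/* Set up the state schedule() would see: only this_rq->lock is held. */
static void model_reset(unsigned int online, int this_cpu)
{
	for (int i = 0; i < NR_CPUS; i++)
		rq_lock_held[i] = 0;
	online_map = online;
	rq_lock_held[this_cpu] = 1;
}

/* Models only the lock handling of dependent_sleeper(). */
static void model_dependent_sleeper(int this_cpu, unsigned int span,
				    bool early_return_if_offline)
{
	unsigned int sibling_map;

	/* The patch: bail out before touching any lock if we are offline. */
	if (early_return_if_offline && !(online_map & (1u << this_cpu)))
		return;

	rq_lock_held[this_cpu] = 0;		/* spin_unlock(&this_rq->lock) */
	sibling_map = span & online_map;	/* cpus_and(sibling_map, span, online) */

	/* Take every lock in the map; an offline this_cpu is not in it. */
	for (int i = 0; i < NR_CPUS; i++) {
		if (sibling_map & (1u << i)) {
			assert(!rq_lock_held[i]);
			rq_lock_held[i] = 1;
		}
	}

	sibling_map &= ~(1u << this_cpu);	/* cpu_clear(this_cpu, sibling_map) */

	/* ... sibling inspection would happen here ... */

	/* Assumed exit path: drop the siblings' locks, keep our own. */
	for (int i = 0; i < NR_CPUS; i++)
		if (sibling_map & (1u << i))
			rq_lock_held[i] = 0;
}

/* The caller expects to still hold this_rq->lock on return. */
static bool this_rq_lock_held(int this_cpu)
{
	return rq_lock_held[this_cpu] != 0;
}
```

Calling model_reset(0x1, 1) followed by model_dependent_sleeper(1, 0x3,
false) models CPU 1 going offline while its sibling CPU 0 stays online; in
this model this_rq_lock_held(1) is then false, which is the state the caller
does not expect. Passing true for the last argument models the early return.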




* Re: [PATCH] cpu hotplug fixes for dependent_sleeper and wake_sleeping_dependent
  2004-08-30  9:41 [PATCH] cpu hotplug fixes for dependent_sleeper and wake_sleeping_dependent Nathan Lynch
@ 2004-08-30 13:29 ` Ingo Molnar
  2004-09-01 23:25   ` Rusty Russell
  0 siblings, 1 reply; 3+ messages in thread
From: Ingo Molnar @ 2004-08-30 13:29 UTC
  To: Nathan Lynch; +Cc: akpm, lkml, Rusty Russell


* Nathan Lynch <nathanl@austin.ibm.com> wrote:

> To recap, offlining a cpu with current bk results in the "Aiee,
> killing interrupt handler!" panic from do_exit().  This seems to be
> triggered only with CONFIG_PREEMPT and CONFIG_SCHED_SMT both enabled. 
> I believe the problem is that when do_stop() calls schedule(),
> dependent_sleeper() drops the "offline" cpu's rq->lock and never
> reacquires it.
> 
> The following seems to work (tested on ppc64).  Is there a better way?

> -	if (!(sd->flags & SD_SHARE_CPUPOWER))
> +	if (!(sd->flags & SD_SHARE_CPUPOWER) || cpu_is_offline(this_cpu))

> +	if (!(sd->flags & SD_SHARE_CPUPOWER) || cpu_is_offline(this_cpu))

it would really be nice to do this without any runtime overhead. Rusty?

	Ingo


* Re: [PATCH] cpu hotplug fixes for dependent_sleeper and wake_sleeping_dependent
  2004-08-30 13:29 ` Ingo Molnar
@ 2004-09-01 23:25   ` Rusty Russell
  0 siblings, 0 replies; 3+ messages in thread
From: Rusty Russell @ 2004-09-01 23:25 UTC
  To: Ingo Molnar; +Cc: Nathan Lynch, Andrew Morton, lkml - Kernel Mailing List

On Mon, 2004-08-30 at 23:29, Ingo Molnar wrote:
> * Nathan Lynch <nathanl@austin.ibm.com> wrote:
> 
> > To recap, offlining a cpu with current bk results in the "Aiee,
> > killing interrupt handler!" panic from do_exit().  This seems to be
> > triggered only with CONFIG_PREEMPT and CONFIG_SCHED_SMT both enabled. 
> > I believe the problem is that when do_stop() calls schedule(),
> > dependent_sleeper() drops the "offline" cpu's rq->lock and never
> > reacquires it.
> > 
> > The following seems to work (tested on ppc64).  Is there a better way?
> 
> > -	if (!(sd->flags & SD_SHARE_CPUPOWER))
> > +	if (!(sd->flags & SD_SHARE_CPUPOWER) || cpu_is_offline(this_cpu))
> 
> > +	if (!(sd->flags & SD_SHARE_CPUPOWER) || cpu_is_offline(this_cpu))
> 
> it would really be nice to do this without any runtime overhead. Rusty?

If the scheduling topology is updated atomically (i.e. inside
__cpu_disable), this would not happen; there are patches around to do
this and I think longer term this is the correct fix.  However, I
believe a good current fix is to merely ensure that the current cpu is
always set in sibling_map in dependent_sleeper(), something like:

	/*
	 * The same locking rules and details apply as for
	 * wake_sleeping_dependent():
	 */
	spin_unlock(&this_rq->lock);
	cpus_and(sibling_map, sd->span, cpu_online_map);
+	/* Include this CPU, for the special case of us dying */
+	cpu_set(this_cpu, sibling_map);
	for_each_cpu_mask(i, sibling_map)
		spin_lock(&cpu_rq(i)->lock);
	cpu_clear(this_cpu, sibling_map);

Not quite free, but very cheap...
Rusty.
-- 
Anyone who quotes me in their signature is an idiot -- Rusty Russell
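
To make the effect of the cpu_set() suggestion concrete, here is a small
user-space sketch that computes which runqueue locks the locking loop would
take, with and without forcing this_cpu into the map. It is an illustration
only: cpumask_model_t, locks_acquired() and this_rq_relocked() are
hypothetical names, and CPU masks are modelled as plain bit masks rather
than the kernel's cpumask_t.

```c
#include <assert.h>

/* Hypothetical model of a CPU mask: bit i set means CPU i is in the mask. */
typedef unsigned int cpumask_model_t;

/*
 * Returns the set of CPUs whose rq->lock the loop would acquire.
 * include_self models the suggested cpu_set(this_cpu, sibling_map).
 */
static cpumask_model_t locks_acquired(cpumask_model_t span,
				      cpumask_model_t online,
				      int this_cpu, int include_self)
{
	cpumask_model_t sibling_map;

	assert(this_cpu >= 0 && this_cpu < 32);

	sibling_map = span & online;		/* cpus_and(...) */
	if (include_self)
		sibling_map |= 1u << this_cpu;	/* one extra bit set per call */
	return sibling_map;
}

/* Non-zero if this_cpu's own lock is among those re-acquired. */
static int this_rq_relocked(cpumask_model_t span, cpumask_model_t online,
			    int this_cpu, int include_self)
{
	return (locks_acquired(span, online, this_cpu, include_self)
		>> this_cpu) & 1u;
}
```

With span 0x3 and only CPU 0 online, this_rq_relocked(0x3, 0x1, 1, 0) is 0
and this_rq_relocked(0x3, 0x1, 1, 1) is 1. When this_cpu is online the extra
bit is already set, so in this model the change costs one bit operation and
alters nothing else.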


