xen-devel.lists.xenproject.org archive mirror
* Re: Regression in v3.4-rc0 " BUG: soft lockup - CPU#0 stuck for 29s! [migration/0:6]..[<ffffffff810d3b8b>] stop_machine_cpu_stop+0x7b/0xf
       [not found] <1332347541.18960.498.camel@twins>
@ 2012-03-22  3:04 ` Konrad Rzeszutek Wilk
  2012-03-22  3:04   ` [PATCH] xen/smp: Fix bringup bug in AP code Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 3+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-03-22  3:04 UTC (permalink / raw)
  To: peterz, linux-kernel, mingo, rjw, tglx; +Cc: xen-devel

On Wed, Mar 21, 2012 at 05:32:21PM +0100, Peter Zijlstra wrote:
> On Wed, 2012-03-21 at 17:30 +0100, Peter Zijlstra wrote:
> > On Wed, 2012-03-21 at 16:57 +0100, Peter Zijlstra wrote:
> > > On Wed, 2012-03-21 at 11:26 -0400, Konrad Rzeszutek Wilk wrote:
> > > > On Tue, Mar 20, 2012 at 07:53:22PM -0400, Konrad Rzeszutek Wilk wrote:
> > > > > Seeing this in v3.4-rc0 tree and didn't see that with v3.3:
> > > > 
> > > > Hey Peter,
> > > > 
> > > > Git bisection points this to the fault of
> > > > 5fbd036b552f633abb394a319f7c62a5c86a9cd7 " sched: Cleanup cpu_active madness"
> > > > 
> > > > thoughts? (also attaching the .config) 
> > > 
> > > Argh.. so when is this? boot? No that's somewhat unexpected. I have one
> > > report of funnies during a hotplug bash that I'm looking into, but I
> > > haven't actually been able to reproduce that report myself either.
> > 
> > is arch/x86/xen/smp.c:cpu_bringup() missing a call to
> > notify_cpu_starting() before doing set_cpu_online()?
> > 
> > Also, shouldn't that also take the ipi_call_lock() around setting the
> > cpu online?
> 
> 
> And before you ask, yes all that should live in generic code... somehow.
> This per-arch replication of the cpu hotplug logic is driving me insane.

Thanks to Peter, here is the patch that fixes the regression.


* [PATCH] xen/smp: Fix bringup bug in AP code.
  2012-03-22  3:04 ` Regression in v3.4-rc0 " BUG: soft lockup - CPU#0 stuck for 29s! [migration/0:6]..[<ffffffff810d3b8b>] stop_machine_cpu_stop+0x7b/0xf Konrad Rzeszutek Wilk
@ 2012-03-22  3:04   ` Konrad Rzeszutek Wilk
  2012-03-22  9:20     ` Peter Zijlstra
  0 siblings, 1 reply; 3+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-03-22  3:04 UTC (permalink / raw)
  To: peterz, linux-kernel, mingo, rjw, tglx; +Cc: xen-devel, Konrad Rzeszutek Wilk

The CPU hotplug code now has a callback to help bring up the CPU.
Without the call we end up getting:

 BUG: soft lockup - CPU#0 stuck for 29s! [migration/0:6]
Modules linked in:
CPU ] Pid: 6, comm: migration/0 Not tainted 3.3.0upstream-01180-ged378a5 #1 Dell Inc. PowerEdge T105 /0RR825
RIP: e030:[<ffffffff810d3b8b>]  [<ffffffff810d3b8b>] stop_machine_cpu_stop+0x7b/0xf0
RSP: e02b:ffff8800ceaabdb0  EFLAGS: 00000293
.. snip..
Call Trace:
 [<ffffffff810d3b10>] ? stop_one_cpu_nowait+0x50/0x50
 [<ffffffff810d3841>] cpu_stopper_thread+0xf1/0x1c0
 [<ffffffff815a9776>] ? __schedule+0x3c6/0x760
 [<ffffffff815aa749>] ? _raw_spin_unlock_irqrestore+0x19/0x30
 [<ffffffff810d3750>] ? res_counter_charge+0x150/0x150
 [<ffffffff8108dc76>] kthread+0x96/0xa0
 [<ffffffff815b27e4>] kernel_thread_helper+0x4/0x10
 [<ffffffff815aacbc>] ? retint_restore_ar

This fixes it.

Suggested-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/xen/smp.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index 315d8fa..02900e8 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -75,8 +75,14 @@ static void __cpuinit cpu_bringup(void)
 
 	xen_setup_cpu_clockevents();
 
+	notify_cpu_starting(cpu);
+
+	ipi_call_lock();
 	set_cpu_online(cpu, true);
+	ipi_call_unlock();
+
 	this_cpu_write(cpu_state, CPU_ONLINE);
+
 	wmb();
 
 	/* We can take interrupts now: we're officially "up". */
-- 
1.7.7.5
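
For readers who want to see the locking part of the hunk in isolation, here is
a minimal userspace sketch. It is not kernel code. It assumes the purpose of
taking ipi_call_lock() around set_cpu_online() is to keep a cross-CPU call
that is picking its targets from seeing the online mask change underneath it.
Every name ending in _model is hypothetical and only stands in for the kernel
function it is named after; the spinlock is a plain C11 atomic_flag.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_CPUS_MODEL 4

/* Hypothetical stand-in for the lock taken by ipi_call_lock(). */
static atomic_flag call_lock_model = ATOMIC_FLAG_INIT;

/* Bit n set means "CPU n is online". The boot CPU (0) starts online. */
static unsigned int online_mask_model = 1u;

static void model_ipi_call_lock(void)
{
	while (atomic_flag_test_and_set_explicit(&call_lock_model,
						 memory_order_acquire))
		; /* spin until the holder releases the lock */
}

static void model_ipi_call_unlock(void)
{
	atomic_flag_clear_explicit(&call_lock_model, memory_order_release);
}

/* Incoming CPU: publish itself in the online mask under the lock. */
static void model_set_cpu_online(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_MODEL);

	model_ipi_call_lock();
	online_mask_model |= 1u << cpu;
	model_ipi_call_unlock();
}

/* Sender: snapshot "every online CPU except me" under the same lock. */
static unsigned int model_ipi_targets(int self)
{
	unsigned int targets;

	assert(self >= 0 && self < NR_CPUS_MODEL);

	model_ipi_call_lock();
	targets = online_mask_model & ~(1u << self);
	model_ipi_call_unlock();

	return targets;
}
```

Because both sides take the same lock, a sender either sees the new CPU in
its target set or does not see it at all; it never observes a half-updated
view. Calling model_set_cpu_online(2) and then model_ipi_targets(0) yields a
mask with only bit 2 set.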


* Re: [PATCH] xen/smp: Fix bringup bug in AP code.
  2012-03-22  3:04   ` [PATCH] xen/smp: Fix bringup bug in AP code Konrad Rzeszutek Wilk
@ 2012-03-22  9:20     ` Peter Zijlstra
  0 siblings, 0 replies; 3+ messages in thread
From: Peter Zijlstra @ 2012-03-22  9:20 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: linux-kernel, mingo, rjw, tglx, xen-devel

On Wed, 2012-03-21 at 23:04 -0400, Konrad Rzeszutek Wilk wrote:
> The CPU hotplug code now has a callback to help bring up the CPU.
> Without the call we end up getting:

It's had this for a while now (since 2008, see e545a614). It's just that
the generic infrastructure started using it only now.
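
To make the ordering concrete, below is a small userspace sketch of a
"starting" callback chain. It is an illustration under assumptions, not the
kernel's notifier implementation: register_starting_cb(), the *_model names,
and the sched_starting() subscriber are all hypothetical. The point it shows
is only that a subscriber relying on the callback is left unprepared if the
arch bring-up path marks the CPU online without calling the notifier first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CALLBACKS 8
#define NR_CPUS_MODEL 4

/* Hypothetical stand-in for a CPU_STARTING notifier callback. */
typedef void (*starting_cb)(int cpu);

static starting_cb callbacks[MAX_CALLBACKS];
static size_t nr_callbacks;

static bool cpu_online_model[NR_CPUS_MODEL];
static bool subscriber_ready_model[NR_CPUS_MODEL];

/* Register a callback to run on the incoming CPU before it goes online. */
static int register_starting_cb(starting_cb cb)
{
	if (nr_callbacks == MAX_CALLBACKS)
		return -1;
	callbacks[nr_callbacks++] = cb;
	return 0;
}

/* Model of notify_cpu_starting(): run every registered callback. */
static void model_notify_cpu_starting(int cpu)
{
	for (size_t i = 0; i < nr_callbacks; i++)
		callbacks[i](cpu);
}

/* Example subscriber: generic code preparing its per-CPU state. */
static void sched_starting(int cpu)
{
	/* The callback must run before the CPU is visible as online. */
	assert(!cpu_online_model[cpu]);
	subscriber_ready_model[cpu] = true;
}

/* Model of an arch bring-up path, with or without the notifier call. */
static void model_cpu_bringup(int cpu, bool notify)
{
	assert(cpu >= 0 && cpu < NR_CPUS_MODEL);

	if (notify)
		model_notify_cpu_starting(cpu);
	cpu_online_model[cpu] = true;
}

/* True if the CPU is online but the subscriber never prepared it. */
static bool cpu_is_inconsistent(int cpu)
{
	return cpu_online_model[cpu] && !subscriber_ready_model[cpu];
}
```

With sched_starting registered, model_cpu_bringup(1, false) leaves CPU 1
online but unprepared, while model_cpu_bringup(2, true) does not. That is the
shape of the problem here: the hook existed, but the Xen bring-up path never
called it.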

>  BUG: soft lockup - CPU#0 stuck for 29s! [migration/0:6]
> Modules linked in:
> CPU ] Pid: 6, comm: migration/0 Not tainted 3.3.0upstream-01180-ged378a5 #1 Dell Inc. PowerEdge T105 /0RR825
> RIP: e030:[<ffffffff810d3b8b>]  [<ffffffff810d3b8b>] stop_machine_cpu_stop+0x7b/0xf0
> RSP: e02b:ffff8800ceaabdb0  EFLAGS: 00000293
> .. snip..
> Call Trace:
>  [<ffffffff810d3b10>] ? stop_one_cpu_nowait+0x50/0x50
>  [<ffffffff810d3841>] cpu_stopper_thread+0xf1/0x1c0
>  [<ffffffff815a9776>] ? __schedule+0x3c6/0x760
>  [<ffffffff815aa749>] ? _raw_spin_unlock_irqrestore+0x19/0x30
>  [<ffffffff810d3750>] ? res_counter_charge+0x150/0x150
>  [<ffffffff8108dc76>] kthread+0x96/0xa0
>  [<ffffffff815b27e4>] kernel_thread_helper+0x4/0x10
>  [<ffffffff815aacbc>] ? retint_restore_ar
> 
> This fixes it.
> 
> Suggested-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>

