linux-arm-kernel.lists.infradead.org archive mirror
* [next][bug] rcu_dyntick_kick_cpu() kills ARM SMP.
@ 2011-06-28 15:08 Marc Zyngier
  2011-06-28 17:25 ` Paul E. McKenney
  0 siblings, 1 reply; 4+ messages in thread
From: Marc Zyngier @ 2011-06-28 15:08 UTC (permalink / raw)
  To: linux-arm-kernel

Paul,

I've updated my -next tree (to next-20110628) today, and discovered 
that my favorite ARM board wouldn't boot anymore:

[...]
Hierarchical RCU implementation.
NR_IRQS:128 nr_irqs:128 128
Console: colour dummy device 80x30
Calibrating delay loop... 83.35 BogoMIPS (lpj=416768)
pid_max: default: 32768 minimum: 301
Mount-cache hash table entries: 512
CPU: Testing write buffer coherency: ok
Calibrating local timer... 104.04MHz.
CPU1: Booted secondary processor
CPU1: Unknown IPI message 0x1
CPU2: Booted secondary processor
CPU2: Unknown IPI message 0x1
CPU3: Booted secondary processor
CPU3: Unknown IPI message 0x1
Brought up 4 CPUs
SMP: Total of 4 processors activated (333.92 BogoMIPS).
------------[ cut here ]------------
WARNING: at kernel/smp.c:320 smp_call_function_single+0xe4/0x1c0()
NET: Registered protocol family 16
Modules linked in:
[<c00415d4>] (unwind_backtrace+0x0/0xf4) from [<c0056184>] (warn_slowpath_common+0x4c/0x64)
[<c0056184>] (warn_slowpath_common+0x4c/0x64) from [<c00561b8>] (warn_slowpath_null+0x1c/0x24)
[<c00561b8>] (warn_slowpath_null+0x1c/0x24) from [<c0088218>] (smp_call_function_single+0xe4/0x1c0)
[<c0088218>] (smp_call_function_single+0xe4/0x1c0) from [<c0094804>] (rcu_start_gp+0x184/0x310)
[<c0094804>] (rcu_start_gp+0x184/0x310) from [<c00955b0>] (__rcu_process_callbacks+0x274/0x398)
[<c00955b0>] (__rcu_process_callbacks+0x274/0x398) from [<c0095708>] (rcu_process_callbacks+0x34/0x5c)
[<c0095708>] (rcu_process_callbacks+0x34/0x5c) from [<c005c964>] (__do_softirq+0xa4/0x16c)
[<c005c964>] (__do_softirq+0xa4/0x16c) from [<c005cc0c>] (irq_exit+0x80/0x9c)
[<c005cc0c>] (irq_exit+0x80/0x9c) from [<c00353cc>] (do_local_timer+0x54/0x70)
[<c00353cc>] (do_local_timer+0x54/0x70) from [<c003b618>] (__irq_svc+0x38/0xc0)
Exception stack(0xdf467f90 to 0xdf467fd8)
7f80:                                     df466000 00000000 df467fd8 00000000
7fa0: df466000 c045dd24 c034f6cc 00000000 c0445514 410fb020 70409ddc 00000000
7fc0: 00000000 df467fd8 c003c4ac c003c4b0 60000013 ffffffff
[<c003b618>] (__irq_svc+0x38/0xc0) from [<c003c4b0>] (default_idle+0x24/0x28)
[<c003c4b0>] (default_idle+0x24/0x28) from [<c003ccd0>] (cpu_idle+0x9c/0xdc)
[<c003ccd0>] (cpu_idle+0x9c/0xdc) from [<70348734>] (0x70348734)
---[ end trace 1b75b31a2719ed1c ]---

... and here it dies.

The offending commit is b983032b7 (rcu: Avoid grace-period overflow for 
long dyntick-idle periods). rcu_dyntick_kick_cpu() tries to do a CPU 
cross-call with interrupts disabled, which kills the box. Reverting this
patch results in a working system.
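
To illustrate why a cross-call with interrupts disabled is dangerous,
here is a minimal userspace sketch in C. It is not kernel code:
toy_cpu, toy_cross_call_unsafe() and toy_would_deadlock() are
hypothetical names, and the model assumes a synchronous cross-call in
which the sender waits for the target to run the handler. Under that
assumption, two online CPUs that cross-call each other with interrupts
off can each wait forever, since neither can take the other's IPI.

```c
#include <stdbool.h>

/* Hypothetical userspace stand-in for a CPU; not a kernel structure. */
struct toy_cpu {
    bool online;        /* CPU is up and can be targeted by IPIs */
    bool irqs_disabled; /* CPU currently has interrupts masked */
};

/*
 * A synchronous cross-call is treated as unsafe when the sender is an
 * online CPU running with interrupts disabled: it cannot service an
 * incoming IPI while it waits for its own call to complete.
 */
static bool toy_cross_call_unsafe(const struct toy_cpu *self)
{
    return self->online && self->irqs_disabled;
}

/*
 * Two CPUs cross-calling each other deadlock in this model when both
 * senders are in the unsafe state: each waits on a handler that the
 * other cannot run.
 */
static bool toy_would_deadlock(const struct toy_cpu *a,
                               const struct toy_cpu *b)
{
    return toy_cross_call_unsafe(a) && toy_cross_call_unsafe(b);
}
```

In this sketch, re-enabling interrupts on either CPU is enough to make
toy_would_deadlock() return false, which is the property a fix needs.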

My RCU-foo being rather low, I haven't dug deeper into this. Please let
me know if you want me to test anything.

Cheers,

	M.
-- 
Jazz is not dead. It just smells funny...

* [next][bug] rcu_dyntick_kick_cpu() kills ARM SMP.
  2011-06-28 15:08 [next][bug] rcu_dyntick_kick_cpu() kills ARM SMP Marc Zyngier
@ 2011-06-28 17:25 ` Paul E. McKenney
  2011-06-29  8:58   ` Marc Zyngier
  0 siblings, 1 reply; 4+ messages in thread
From: Paul E. McKenney @ 2011-06-28 17:25 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jun 28, 2011 at 04:08:37PM +0100, Marc Zyngier wrote:
> The offending commit is b983032b7 (rcu: Avoid grace-period overflow for 
> long dyntick-idle periods). rcu_dyntick_kick_cpu() tries to do a CPU 
> cross-call with interrupts disabled, which kills the box. Reverting this
> patch results in a working system.

That does sound problematic...

> My RCU-foo being rather low, I haven't dug deeper into this. Please let
> me know if you want me to test anything.

I will put together a patch to defer the actual cross-call until irqs
are enabled.  The call would be from softirq -- that is OK, correct?
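
The deferral idea can be sketched as a userspace toy model in C. This
is only an illustration under stated assumptions, not the eventual
patch: toy_rcu_cpu, toy_request_kick(), toy_flush_deferred_kick() and
toy_send_cross_call() are hypothetical names, and the cross-call itself
is reduced to a counter. The irq-disabled path records the request, and
a later path that runs with interrupts enabled issues it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU state for the model; not a kernel structure. */
struct toy_rcu_cpu {
    bool irqs_disabled; /* interrupts currently masked on this CPU */
    bool kick_pending;  /* a cross-call was requested but not sent */
    int  target;        /* CPU to kick once it is safe to do so */
    int  kicks_sent;    /* number of cross-calls actually issued */
};

/* Stand-in for the real cross-call; only legal with irqs enabled. */
static void toy_send_cross_call(struct toy_rcu_cpu *self, int target)
{
    assert(!self->irqs_disabled);
    (void)target; /* the model does not simulate the remote CPU */
    self->kicks_sent++;
}

/* Called from the irq-disabled path: record the request, send nothing. */
static void toy_request_kick(struct toy_rcu_cpu *self, int target)
{
    self->target = target;
    self->kick_pending = true;
}

/* Called later, from a context where interrupts may be enabled again. */
static void toy_flush_deferred_kick(struct toy_rcu_cpu *self)
{
    if (self->kick_pending && !self->irqs_disabled) {
        self->kick_pending = false;
        toy_send_cross_call(self, self->target);
    }
}
```

A request made while irqs_disabled is true stays pending across calls
to toy_flush_deferred_kick() and is sent exactly once after the flag is
cleared.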

And thank you for testing this!

							Thanx, Paul

* [next][bug] rcu_dyntick_kick_cpu() kills ARM SMP.
  2011-06-28 17:25 ` Paul E. McKenney
@ 2011-06-29  8:58   ` Marc Zyngier
  2011-07-08  5:29     ` Paul E. McKenney
  0 siblings, 1 reply; 4+ messages in thread
From: Marc Zyngier @ 2011-06-29  8:58 UTC (permalink / raw)
  To: linux-arm-kernel

On 28/06/11 18:25, Paul E. McKenney wrote:
> On Tue, Jun 28, 2011 at 04:08:37PM +0100, Marc Zyngier wrote:
>> The offending commit is b983032b7 (rcu: Avoid grace-period overflow for 
>> long dyntick-idle periods). rcu_dyntick_kick_cpu() tries to do a CPU 
>> cross-call with interrupts disabled, which kills the box. Reverting this
>> patch results in a working system.
> 
> That does sound problematic...
> 
>> My RCU-foo being rather low, I haven't dug deeper into this. Please let
>> me know if you want me to test anything.
> 
> I will put together a patch to defer the actual cross-call until irqs
> are enabled.  The call would be from softirq -- that is OK, correct?

That should indeed fix the problem, as interrupts are normally enabled
in softirq.

Cheers,

	M.
-- 
Jazz is not dead. It just smells funny...

* [next][bug] rcu_dyntick_kick_cpu() kills ARM SMP.
  2011-06-29  8:58   ` Marc Zyngier
@ 2011-07-08  5:29     ` Paul E. McKenney
  0 siblings, 0 replies; 4+ messages in thread
From: Paul E. McKenney @ 2011-07-08  5:29 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jun 29, 2011 at 09:58:50AM +0100, Marc Zyngier wrote:
> On 28/06/11 18:25, Paul E. McKenney wrote:
> > On Tue, Jun 28, 2011 at 04:08:37PM +0100, Marc Zyngier wrote:
> >> The offending commit is b983032b7 (rcu: Avoid grace-period overflow for 
> >> long dyntick-idle periods). rcu_dyntick_kick_cpu() tries to do a CPU 
> >> cross-call with interrupts disabled, which kills the box. Reverting this
> >> patch results in a working system.
> > 
> > That does sound problematic...
> > 
> >> My RCU-foo being rather low, I haven't dug deeper into this. Please let
> >> me know if you want me to test anything.
> > 
> > I will put together a patch to defer the actual cross-call until irqs
> > are enabled.  The call would be from softirq -- that is OK, correct?
> 
> That should indeed fix the problem, as interrupts are normally enabled
> in softirq.

Hello, Marc,

I played with this some, but eventually chose to defer this commit to
3.1.  Thank you for testing -- and it will be back!

							Thanx, Paul
