From mboxrd@z Thu Jan 1 00:00:00 1970
From: paulmck@linux.vnet.ibm.com (Paul E. McKenney)
Date: Tue, 28 Jun 2011 10:25:13 -0700
Subject: [next][bug] rcu_dyntick_kick_cpu() kills ARM SMP.
In-Reply-To: <4E09EE75.9040204@arm.com>
References: <4E09EE75.9040204@arm.com>
Message-ID: <20110628172513.GB2294@linux.vnet.ibm.com>
To: linux-arm-kernel@lists.infradead.org
List-Id: linux-arm-kernel.lists.infradead.org

On Tue, Jun 28, 2011 at 04:08:37PM +0100, Marc Zyngier wrote:
> Paul,
>
> I've updated my -next tree (to next-20110628) today, and discovered
> that my favorite ARM board wouldn't boot anymore:
>
> [...]
> Hierarchical RCU implementation.
> NR_IRQS:128 nr_irqs:128 128
> Console: colour dummy device 80x30
> Calibrating delay loop... 83.35 BogoMIPS (lpj=416768)
> pid_max: default: 32768 minimum: 301
> Mount-cache hash table entries: 512
> CPU: Testing write buffer coherency: ok
> Calibrating local timer... 104.04MHz.
> CPU1: Booted secondary processor
> CPU1: Unknown IPI message 0x1
> CPU2: Booted secondary processor
> CPU2: Unknown IPI message 0x1
> CPU3: Booted secondary processor
> CPU3: Unknown IPI message 0x1
> Brought up 4 CPUs
> SMP: Total of 4 processors activated (333.92 BogoMIPS).
> ------------[ cut here ]------------
> WARNING: at kernel/smp.c:320 smp_call_function_single+0xe4/0x1c0()
> NET: Registered protocol family 16
> Modules linked in:
> [] (unwind_backtrace+0x0/0xf4) from [] (warn_slowpath_common+0x4c/0x64)
> [] (warn_slowpath_common+0x4c/0x64) from [] (warn_slowpath_null+0x1c/0x24)
> [] (warn_slowpath_null+0x1c/0x24) from [] (smp_call_function_single+0xe4/0x1c0)
> [] (smp_call_function_single+0xe4/0x1c0) from [] (rcu_start_gp+0x184/0x310)
> [] (rcu_start_gp+0x184/0x310) from [] (__rcu_process_callbacks+0x274/0x398)
> [] (__rcu_process_callbacks+0x274/0x398) from [] (rcu_process_callbacks+0x34/0x5c)
> [] (rcu_process_callbacks+0x34/0x5c) from [] (__do_softirq+0xa4/0x16c)
> [] (__do_softirq+0xa4/0x16c) from [] (irq_exit+0x80/0x9c)
> [] (irq_exit+0x80/0x9c) from [] (do_local_timer+0x54/0x70)
> [] (do_local_timer+0x54/0x70) from [] (__irq_svc+0x38/0xc0)
> Exception stack(0xdf467f90 to 0xdf467fd8)
> 7f80: df466000 00000000 df467fd8 00000000
> 7fa0: df466000 c045dd24 c034f6cc 00000000 c0445514 410fb020 70409ddc 00000000
> 7fc0: 00000000 df467fd8 c003c4ac c003c4b0 60000013 ffffffff
> [] (__irq_svc+0x38/0xc0) from [] (default_idle+0x24/0x28)
> [] (default_idle+0x24/0x28) from [] (cpu_idle+0x9c/0xdc)
> [] (cpu_idle+0x9c/0xdc) from [<70348734>] (0x70348734)
> ---[ end trace 1b75b31a2719ed1c ]---
>
> ... and here it dies.
>
> The offending commit is b983032b7 (rcu: Avoid grace-period overflow for
> long dyntick-idle periods). rcu_dyntick_kick_cpu() tries to do a CPU
> cross-call with interrupts disabled, which kills the box. Reverting this
> patch results in a working system.

That does sound problematic...

> My RCU-foo being rather low, I haven't dug deeper into this. Please let
> me know if you want me to test anything.

I will put together a patch to defer the actual cross-call until irqs
are enabled. The call would be from softirq -- that is OK, correct?

And thank you for testing this!

Thanx, Paul
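
For readers following the thread, the "defer the cross-call until irqs are enabled" idea can be sketched in userspace C. This is only an illustrative model, not the actual patch and not kernel code: irqs_off, kick_pending, cross_call(), request_kick() and flush_kicks() are all hypothetical names, and cross_call() merely stands in for smp_call_function_single(), counting a warning where the real function would trip the WARN at kernel/smp.c:320.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4                   /* matches the 4-CPU board in the report */

static bool irqs_off;               /* hypothetical stand-in for irqs_disabled() */
static bool kick_pending[NR_CPUS];  /* kicks requested while irqs were off */
static int  kicks_sent[NR_CPUS];    /* cross-calls actually issued per CPU */
static int  warnings;               /* times a cross-call was tried with irqs off */

/* Stand-in for smp_call_function_single(): unsafe with irqs disabled. */
static void cross_call(int cpu)
{
    if (irqs_off) {                 /* models the WARNING seen in the trace */
        warnings++;
        return;
    }
    kicks_sent[cpu]++;
}

/* Safe with irqs off: only records that the CPU needs a kick later. */
static void request_kick(int cpu)
{
    kick_pending[cpu] = true;
}

/*
 * Run later from a context where irqs are enabled: issue every deferred
 * cross-call and return how many were sent.
 */
static int flush_kicks(void)
{
    int sent = 0;

    assert(!irqs_off);              /* deferral is pointless otherwise */
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        if (kick_pending[cpu]) {
            kick_pending[cpu] = false;
            cross_call(cpu);
            sent++;
        }
    }
    return sent;
}
```

In this model the grace-period start path would call request_kick() while irqs are off, and a later irqs-enabled context would call flush_kicks(); whether softirq is a suitable context for that is exactly the open question in the reply above.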