From mboxrd@z Thu Jan 1 00:00:00 1970
From: paulmck@linux.vnet.ibm.com (Paul E. McKenney)
Date: Thu, 7 Jul 2011 22:29:54 -0700
Subject: [next][bug] rcu_dyntick_kick_cpu() kills ARM SMP.
In-Reply-To: <4E0AE94A.7000306@arm.com>
References: <4E09EE75.9040204@arm.com> <20110628172513.GB2294@linux.vnet.ibm.com> <4E0AE94A.7000306@arm.com>
Message-ID: <20110708052954.GL6014@linux.vnet.ibm.com>
To: linux-arm-kernel@lists.infradead.org
List-Id: linux-arm-kernel.lists.infradead.org

On Wed, Jun 29, 2011 at 09:58:50AM +0100, Marc Zyngier wrote:
> On 28/06/11 18:25, Paul E. McKenney wrote:
> > On Tue, Jun 28, 2011 at 04:08:37PM +0100, Marc Zyngier wrote:
> >> Paul,
> >>
> >> I've updated my -next tree (to next-20110628) today, and discovered
> >> that my favorite ARM board wouldn't boot anymore:
> >>
> >> [...]
> >> Hierarchical RCU implementation.
> >> NR_IRQS:128 nr_irqs:128 128
> >> Console: colour dummy device 80x30
> >> Calibrating delay loop... 83.35 BogoMIPS (lpj=416768)
> >> pid_max: default: 32768 minimum: 301
> >> Mount-cache hash table entries: 512
> >> CPU: Testing write buffer coherency: ok
> >> Calibrating local timer... 104.04MHz.
> >> CPU1: Booted secondary processor
> >> CPU1: Unknown IPI message 0x1
> >> CPU2: Booted secondary processor
> >> CPU2: Unknown IPI message 0x1
> >> CPU3: Booted secondary processor
> >> CPU3: Unknown IPI message 0x1
> >> Brought up 4 CPUs
> >> SMP: Total of 4 processors activated (333.92 BogoMIPS).
> >> ------------[ cut here ]------------
> >> WARNING: at kernel/smp.c:320 smp_call_function_single+0xe4/0x1c0()
> >> NET: Registered protocol family 16
> >> Modules linked in:
> >> [] (unwind_backtrace+0x0/0xf4) from [] (warn_slowpath_common+0x4c/0x64)
> >> [] (warn_slowpath_common+0x4c/0x64) from [] (warn_slowpath_null+0x1c/0x24)
> >> [] (warn_slowpath_null+0x1c/0x24) from [] (smp_call_function_single+0xe4/0x1c0)
> >> [] (smp_call_function_single+0xe4/0x1c0) from [] (rcu_start_gp+0x184/0x310)
> >> [] (rcu_start_gp+0x184/0x310) from [] (__rcu_process_callbacks+0x274/0x398)
> >> [] (__rcu_process_callbacks+0x274/0x398) from [] (rcu_process_callbacks+0x34/0x5c)
> >> [] (rcu_process_callbacks+0x34/0x5c) from [] (__do_softirq+0xa4/0x16c)
> >> [] (__do_softirq+0xa4/0x16c) from [] (irq_exit+0x80/0x9c)
> >> [] (irq_exit+0x80/0x9c) from [] (do_local_timer+0x54/0x70)
> >> [] (do_local_timer+0x54/0x70) from [] (__irq_svc+0x38/0xc0)
> >> Exception stack(0xdf467f90 to 0xdf467fd8)
> >> 7f80: df466000 00000000 df467fd8 00000000
> >> 7fa0: df466000 c045dd24 c034f6cc 00000000 c0445514 410fb020 70409ddc 00000000
> >> 7fc0: 00000000 df467fd8 c003c4ac c003c4b0 60000013 ffffffff
> >> [] (__irq_svc+0x38/0xc0) from [] (default_idle+0x24/0x28)
> >> [] (default_idle+0x24/0x28) from [] (cpu_idle+0x9c/0xdc)
> >> [] (cpu_idle+0x9c/0xdc) from [<70348734>] (0x70348734)
> >> ---[ end trace 1b75b31a2719ed1c ]---
> >>
> >> ... and here it dies.
> >>
> >> The offending commit is b983032b7 (rcu: Avoid grace-period overflow for
> >> long dyntick-idle periods). rcu_dyntick_kick_cpu() tries to do a CPU
> >> cross-call with interrupts disabled, which kills the box. Reverting this
> >> patch results in a working system.
> >
> > That does sound problematic...
> >
> >> My RCU-foo being rather low, I haven't dug deeper into this. Please let
> >> me know if you want me to test anything.
> >
> > I will put together a patch to defer the actual cross-call until irqs
> > are enabled.
> > The call would be from softirq -- that is OK, correct?
>
> That should indeed fix the problem, as interrupts are normally enabled
> in softirq.

Hello, Marc,

I played with this some, but eventually chose to defer this commit to
3.1.  Thank you for testing -- and it will be back!

							Thanx, Paul
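For readers unfamiliar with the fix Paul proposed in the thread (record that a cross-call is needed while interrupts are disabled, then issue it later from softirq, where interrupts are normally enabled), here is a minimal userspace sketch of that deferral pattern. It is not kernel code and not the patch under discussion, which was set aside in favor of deferring the commit to 3.1. The names irqs_enabled, kick_pending, send_cross_call(), request_kick(), process_deferred_kicks() and run_demo() are hypothetical stand-ins for the CPU's interrupt state, smp_call_function_single(), rcu_dyntick_kick_cpu() and the softirq path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of one CPU's interrupt state; not kernel code. */
static bool irqs_enabled = false;
/* Hypothetical flag: a kick was needed but could not be sent yet. */
static bool kick_pending = false;
static int kicks_sent = 0;

/* Hypothetical stand-in for a cross-call such as
 * smp_call_function_single(), which must not be invoked with
 * interrupts disabled (the kernel warns, as in the trace above). */
static int send_cross_call(int cpu)
{
    if (!irqs_enabled)
        return -1;              /* refuse instead of risking a hang */
    kicks_sent++;
    printf("kicked CPU %d\n", cpu);
    return 0;
}

/* Hypothetical grace-period hook that may run with irqs disabled. */
static void request_kick(void)
{
    if (irqs_enabled)
        send_cross_call(1);
    else
        kick_pending = true;    /* defer rather than cross-call now */
}

/* Hypothetical softirq-time handler, run once irqs are enabled. */
static void process_deferred_kicks(void)
{
    if (kick_pending && irqs_enabled) {
        kick_pending = false;
        send_cross_call(1);
    }
}

/* Walk through the deferral: the request is recorded while irqs are
 * off and only sent once the deferred path runs with irqs on. */
static int run_demo(void)
{
    irqs_enabled = false;
    request_kick();             /* recorded, not sent */
    assert(kick_pending && kicks_sent == 0);
    irqs_enabled = true;        /* e.g. now running in softirq */
    process_deferred_kicks();   /* the deferred kick goes out */
    return kicks_sent;
}
```

The design choice being illustrated is simply that the code which discovers the need for a cross-call never performs it directly; it only sets a flag, and a later context that is known to run with interrupts enabled does the actual call.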