From mboxrd@z Thu Jan 1 00:00:00 1970
From: marc.zyngier@arm.com (Marc Zyngier)
Date: Wed, 29 Jun 2011 09:58:50 +0100
Subject: [next][bug] rcu_dyntick_kick_cpu() kills ARM SMP.
In-Reply-To: <20110628172513.GB2294@linux.vnet.ibm.com>
References: <4E09EE75.9040204@arm.com> <20110628172513.GB2294@linux.vnet.ibm.com>
Message-ID: <4E0AE94A.7000306@arm.com>
To: linux-arm-kernel@lists.infradead.org
List-Id: linux-arm-kernel.lists.infradead.org

On 28/06/11 18:25, Paul E. McKenney wrote:
> On Tue, Jun 28, 2011 at 04:08:37PM +0100, Marc Zyngier wrote:
>> Paul,
>>
>> I've updated my -next tree (to next-20110628) today, and discovered
>> that my favorite ARM board wouldn't boot anymore:
>>
>> [...]
>> Hierarchical RCU implementation.
>> NR_IRQS:128 nr_irqs:128 128
>> Console: colour dummy device 80x30
>> Calibrating delay loop... 83.35 BogoMIPS (lpj=416768)
>> pid_max: default: 32768 minimum: 301
>> Mount-cache hash table entries: 512
>> CPU: Testing write buffer coherency: ok
>> Calibrating local timer... 104.04MHz.
>> CPU1: Booted secondary processor
>> CPU1: Unknown IPI message 0x1
>> CPU2: Booted secondary processor
>> CPU2: Unknown IPI message 0x1
>> CPU3: Booted secondary processor
>> CPU3: Unknown IPI message 0x1
>> Brought up 4 CPUs
>> SMP: Total of 4 processors activated (333.92 BogoMIPS).
>> ------------[ cut here ]------------
>> WARNING: at kernel/smp.c:320 smp_call_function_single+0xe4/0x1c0()
>> NET: Registered protocol family 16
>> Modules linked in:
>> [] (unwind_backtrace+0x0/0xf4) from [] (warn_slowpath_common+0x4c/0x64)
>> [] (warn_slowpath_common+0x4c/0x64) from [] (warn_slowpath_null+0x1c/0x24)
>> [] (warn_slowpath_null+0x1c/0x24) from [] (smp_call_function_single+0xe4/0x1c0)
>> [] (smp_call_function_single+0xe4/0x1c0) from [] (rcu_start_gp+0x184/0x310)
>> [] (rcu_start_gp+0x184/0x310) from [] (__rcu_process_callbacks+0x274/0x398)
>> [] (__rcu_process_callbacks+0x274/0x398) from [] (rcu_process_callbacks+0x34/0x5c)
>> [] (rcu_process_callbacks+0x34/0x5c) from [] (__do_softirq+0xa4/0x16c)
>> [] (__do_softirq+0xa4/0x16c) from [] (irq_exit+0x80/0x9c)
>> [] (irq_exit+0x80/0x9c) from [] (do_local_timer+0x54/0x70)
>> [] (do_local_timer+0x54/0x70) from [] (__irq_svc+0x38/0xc0)
>> Exception stack(0xdf467f90 to 0xdf467fd8)
>> 7f80: df466000 00000000 df467fd8 00000000
>> 7fa0: df466000 c045dd24 c034f6cc 00000000 c0445514 410fb020 70409ddc 00000000
>> 7fc0: 00000000 df467fd8 c003c4ac c003c4b0 60000013 ffffffff
>> [] (__irq_svc+0x38/0xc0) from [] (default_idle+0x24/0x28)
>> [] (default_idle+0x24/0x28) from [] (cpu_idle+0x9c/0xdc)
>> [] (cpu_idle+0x9c/0xdc) from [<70348734>] (0x70348734)
>> ---[ end trace 1b75b31a2719ed1c ]---
>>
>> ... and here it dies.
>>
>> The offending commit is b983032b7 (rcu: Avoid grace-period overflow for
>> long dyntick-idle periods). rcu_dyntick_kick_cpu() tries to do a CPU
>> cross-call with interrupts disabled, which kills the box. Reverting this
>> patch results in a working system.
>
> That does sound problematic...
>
>> My RCU-foo being rather low, I haven't dug deeper into this. Please let
>> me know if you want me to test anything.
>
> I will put together a patch to defer the actual cross-call until irqs
> are enabled. The call would be from softirq -- that is OK, correct?

That should indeed fix the problem, as interrupts are normally enabled in
softirq.

Cheers,

	M.
-- 
Jazz is not dead. It just smells funny...