From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1758906AbYJILpI (ORCPT ); Thu, 9 Oct 2008 07:45:08 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org
	id S1756615AbYJILox (ORCPT ); Thu, 9 Oct 2008 07:44:53 -0400
Received: from e4.ny.us.ibm.com ([32.97.182.144]:40211 "EHLO e4.ny.us.ibm.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1758716AbYJILow (ORCPT ); Thu, 9 Oct 2008 07:44:52 -0400
Date: Thu, 9 Oct 2008 04:44:49 -0700
From: "Paul E. McKenney"
To: Andi Kleen
Cc: mingo@elte.hu, linux-kernel@vger.kernel.org, rjw@sisk.pl,
	dipankar@in.ibm.com, tglx@linutronix.de
Subject: Re: RCU hang on cpu re-hotplug with 2.6.27rc8
Message-ID: <20081009114449.GA6628@linux.vnet.ibm.com>
Reply-To: paulmck@linux.vnet.ibm.com
References: <20081006232837.GA1157@basil.nowhere.org>
	<20081007030822.GC6820@linux.vnet.ibm.com>
	<20081007071544.GC20740@one.firstfloor.org>
	<20081007152629.GH6384@linux.vnet.ibm.com>
	<20081007154939.GN20740@one.firstfloor.org>
	<20081007163401.GJ6384@linux.vnet.ibm.com>
	<20081007210947.GP20740@one.firstfloor.org>
	<20081007212215.GN6384@linux.vnet.ibm.com>
	<20081009013321.GA11291@linux.vnet.ibm.com>
	<20081009045646.GB24560@one.firstfloor.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20081009045646.GB24560@one.firstfloor.org>
User-Agent: Mutt/1.5.15+20070412 (2007-04-11)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Oct 09, 2008 at 06:56:46AM +0200, Andi Kleen wrote:
> [fix up Thomas' address to not bounce]
>
> On Wed, Oct 08, 2008 at 06:33:21PM -0700, Paul E. McKenney wrote:
> > The attached patch (similar to one in -tip, but set up for mainline and
> > tweaked to make stall-checking on by default) should get you a stack
> > trace of any CPUs holding up RCU grace periods for more than about
> > three seconds.
> >
> > On the off-chance that this helps.
>
> It actually does. The stall detector makes the online echo return
> after three seconds, although it's not 100% clear to me why.

Interesting.  This behavior would be consistent with the CPU entering
dyntick-idle mode without RCU's being aware of this.  Except that your
earlier .config file says "# CONFIG_NO_HZ is not set".  And that would
mean that the CPU really should be invoking RCU's state machine every
scheduling tick.

I confess confusion.

							Thanx, Paul

> here's the backtrace
>
> RCU detected CPU 14 stall (t=4295149800/5928 jiffies)
> Pid: 0, comm: swapper Not tainted 2.6.27-rc9 #5
>
> Call Trace:
> [] __rcu_pending+0x6e/0x1d9
> [] rcu_pending+0x36/0x6e
> [] update_process_times+0x37/0x5b
> [] tick_periodic+0x68/0x74
> [] tick_handle_periodic+0x21/0x66
> [] smp_apic_timer_interrupt+0x8a/0xa8
> [] apic_timer_interrupt+0x66/0x70
> [] ? acpi_safe_halt+0x2b/0x3e
> [] ? acpi_idle_enter_c1+0xae/0x102
> [] ? cpuidle_idle_call+0x70/0xa2
> [] ? cpu_idle+0x7e/0x9c
> [] ? start_secondary+0x157/0x15c
>
> Timer issue?
>
>
> -Andi
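
For readers following the thread, here is a minimal user-space sketch of
the kind of check a stall detector performs on each scheduling tick:
compare the current tick count against the tick at which the grace
period started, and complain once "about three seconds" have passed.
This is not the code from the attached patch.  The names SKETCH_HZ,
SKETCH_STALL_TICKS, ticks_after(), gp_stalled() and report_stall() are
hypothetical, and the tick rate of 250 is an assumption made only for
illustration.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical constants, not taken from the patch discussed above. */
#define SKETCH_HZ          250UL               /* assumed ticks per second */
#define SKETCH_STALL_TICKS (3UL * SKETCH_HZ)   /* "about three seconds" */

/* Wrap-safe "a is later than b" for a free-running tick counter. */
bool ticks_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* True once a grace period that began at gp_start has run past the threshold. */
bool gp_stalled(unsigned long now, unsigned long gp_start)
{
	return ticks_after(now, gp_start + SKETCH_STALL_TICKS);
}

/* Print a report giving the current tick and the ticks elapsed since gp_start. */
void report_stall(int cpu, unsigned long now, unsigned long gp_start)
{
	if (gp_stalled(now, gp_start))
		printf("sketch: CPU %d stall (t=%lu/%lu ticks)\n",
		       cpu, now, now - gp_start);
}
```

The signed-difference comparison keeps the check correct when the tick
counter wraps around.  Note that a check like this only runs if the CPU
is actually taking scheduling ticks, which is why the CONFIG_NO_HZ
setting matters to the reasoning above.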