From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1755304AbZCUVCP (ORCPT ); Sat, 21 Mar 2009 17:02:15 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org
	id S1753750AbZCUVB6 (ORCPT ); Sat, 21 Mar 2009 17:01:58 -0400
Received: from e1.ny.us.ibm.com ([32.97.182.141]:56878 "EHLO e1.ny.us.ibm.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1753284AbZCUVB6 (ORCPT ); Sat, 21 Mar 2009 17:01:58 -0400
Date: Sat, 21 Mar 2009 14:01:54 -0700
From: "Paul E. McKenney"
To: Ingo Molnar
Cc: Steven Rostedt, Frederic Weisbecker, LKML, Thomas Gleixner,
	Peter Zijlstra
Subject: Re: [PATCH 0/5] [GIT PULL] updates for tip/tracing/ftrace
Message-ID: <20090321210154.GD7148@linux.vnet.ibm.com>
Reply-To: paulmck@linux.vnet.ibm.com
References: <20090320192721.GI6224@elte.hu>
	<20090320194617.GA5934@nowhere>
	<20090320195414.GA24129@elte.hu>
	<20090320204848.GA6044@nowhere>
	<20090321100129.GC7201@elte.hu>
	<20090321165804.GA21366@elte.hu>
	<20090321190746.GC7148@linux.vnet.ibm.com>
	<20090321200919.GA23992@elte.hu>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20090321200919.GA23992@elte.hu>
User-Agent: Mutt/1.5.15+20070412 (2007-04-11)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Sat, Mar 21, 2009 at 09:09:19PM +0100, Ingo Molnar wrote:
> * Paul E. McKenney wrote:
> > On Sat, Mar 21, 2009 at 01:25:23PM -0400, Steven Rostedt wrote:
> > > On Sat, 21 Mar 2009, Ingo Molnar wrote:
> > > > * Ingo Molnar wrote:

[ . . . ]

> > > > CONFIG_CLASSIC_RCU=y
> > >
> > > All the crashes you reported only happen with classic RCU.
> > >
> > > Paul,
> > >
> > > Did anything change recently that could cause this lockup?
> >
> > Arjan van de Ven is seeing a problem where a single
> > synchronize_rcu() during bootup is taking a full second, which is
> > currently thought to be due to some drivers spinning in the kernel
> > (Arjan is working on a bootgraph that will hopefully pinpoint the
> > problem: http://lkml.org/lkml/2009/3/21/7). If the drivers were
> > also instrumented with ftrace, they might (or might not) slow down
> > even further, depending on exactly why they are spinning.
>
> for one of the hung boxes in the past i waited 24 hours but it never
> unwedged itself. The box that hung today is still hanging and the
> RCU stall detector is still busy printing out those backtraces.

And on the last trace you emailed, the first and the last stall warning
are identical according to "diff". In fact, they are all identical.
That is a bit unusual: one would normally expect to see slight
differences in the stack based on the scheduling clock interrupt hitting
the "longer than average loop" in different places each time. That
would indicate either a very tight loop or a loop that has interrupts
enabled only in one spot.

							Thanx, Paul
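
The check described above (are all the stall warnings in a trace the same?) can also be done mechanically rather than by running "diff" on pairs of them. What follows is only a rough sketch under stated assumptions: the marker pattern that starts each stall report and the printk timestamp prefix are guesses about the console log format, not something taken from the trace in this thread, and the names stall_blocks() and all_identical() are made up for illustration.

```python
import re

# ASSUMPTION: hypothetical pattern for the line that starts each stall
# report; adjust it to match the actual console log being examined.
STALL_MARKER = re.compile(r"RCU detected CPU \d+ stall")

# ASSUMPTION: lines may carry a printk timestamp such as "[  123.456789] ".
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def stall_blocks(log_text, marker=STALL_MARKER):
    """Return the backtrace lines that follow each stall marker line."""
    blocks = []
    current = None
    for raw in log_text.splitlines():
        # Drop the timestamp so that only the stack contents are compared.
        line = TIMESTAMP.sub("", raw).rstrip()
        if marker.search(line):
            # A new report begins. The marker line itself is skipped,
            # since it may contain a changing jiffies count.
            if current is not None:
                blocks.append(current)
            current = []
        elif current is not None and line:
            current.append(line)
    if current is not None:
        blocks.append(current)
    return blocks


def all_identical(blocks):
    """True if there are at least two reports and all of them match."""
    return len(blocks) > 1 and all(b == blocks[0] for b in blocks[1:])
```

If all_identical(stall_blocks(text)) comes back true over many reports, that is the "every backtrace is the same" situation discussed above. Any differing block would instead suggest the interrupt is landing at different points in a longer loop.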