From: Frederic Weisbecker <fweisbec@gmail.com>
To: Jonathan Corbet <corbet@lwn.net>
Cc: LKML <linux-kernel@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Ingo Molnar <mingo@elte.hu>, Steven Rostedt <rostedt@goodmis.org>
Subject: Re: 2.6.30-rc kills my box hard - and lockdep chains
Date: Sun, 17 May 2009 03:23:01 +0200
Message-ID: <20090517012300.GA5512@nowhere>
In-Reply-To: <20090517003837.GA4640@nowhere>

On Sun, May 17, 2009 at 02:38:39AM +0200, Frederic Weisbecker wrote:
> On Sat, May 16, 2009 at 04:14:19PM -0700, Andrew Morton wrote:
> > On Thu, 14 May 2009 09:49:51 -0600 Jonathan Corbet <corbet@lwn.net> wrote:
> > 
> > > So...every now and then I return to my system (a dual-core 64-bit
> > > x86 box) only to find it totally dead.  Lights are on but there's no
> > > disk activity, no ping responses, no alternative to simply pulling the
> > > plug.  It happens fairly reliably about once a day with the 2.6.30-rc
> > > kernels; it does not happen with 2.6.29.
> > > 
> > > I'm at a bit of a loss for how to try to track this one down.  "System
> > > disappears without a trace" isn't much to go on.  I can't reproduce it
> > > at will; even the "maintain an unsaved editor buffer with hours' worth
> > > of work" trick doesn't seem to work this time.  
> > > 
> > > One clue might be found here, perhaps: I didn't have lockdep enabled but I do
> > > now.
> > 
> > So the lockup isn't due to lockdep.
> > 
> > Did you try all the usual sysrq-P, nmi-watchdog stuff?
> > 
> > Is netconsole enabled, to see if it squawked as it died?
> > 
> > > May 14 01:06:55 bike kernel: [38730.804833] BUG: MAX_LOCKDEP_CHAINS too low!
> > > May 14 01:06:55 bike kernel: [38730.804838] turning off the locking correctness validator.
> > > May 14 01:06:55 bike kernel: [38730.804843] Pid: 5321, comm: tar Tainted: G        W  2.6.30-rc5 #11
> > > May 14 01:06:55 bike kernel: [38730.804846] Call Trace:
> > > May 14 01:06:55 bike kernel: [38730.804854]  [<ffffffff8025df59>] __lock_acquire+0x57f/0xbc9
> > > May 14 01:06:55 bike kernel: [38730.804860]  [<ffffffff8020f3a9>] ? print_context_stack+0xfa/0x119
> > > May 14 01:06:55 bike kernel: [38730.804866]  [<ffffffff80394da9>] ? get_hash_bucket+0x28/0x34
> > >
> > > ...
> > >
> > > May 14 01:06:55 bike kernel: [38730.805340]  [<ffffffff802c2741>] ? filldir+0x0/0xc4
> > > May 14 01:06:55 bike kernel: [38730.805344]  [<ffffffff802c293d>] vfs_readdir+0x79/0xb6
> > > May 14 01:06:55 bike kernel: [38730.805348]  [<ffffffff802c2ac3>] sys_getdents+0x81/0xd1
> > > May 14 01:06:55 bike kernel: [38730.805353]  [<ffffffff8020bcdb>] system_call_fastpath+0x16/0x1b
> > > 
> > > That's quite the call stack...  and, evidently, a lot of lock chains...  
> > 
> > It is a deep stack trace.
> > 
> > And unfortunately
> > 
> > a) that diagnostic didn't print the stack pointer value, from which
> >    we can often work out if we're looking at a stack overflow.
> > 
> > b) I regularly think it would be useful if that stack backtrace were
> >    to print out the actual stack address, so we could see how much
> >    stack each function is using.
> > 
> >    I just went in to hack these things up, but the x86 stacktrace
> >    code which I used to understand has become stupidly complex so I
> >    gave up.
> > 
> > What tools do we have to diagnose a possible kernel stack overflow? 
> > There's CONFIG_DEBUG_STACK_USAGE but that's unlikely to be much use.
> 
> 
> I'm thinking of CONFIG_STACK_TRACER. Currently this tracer only
> exposes the maximum stack footprint backtrace through a file in
> debugfs, so it's not that useful for debugging a stack overflow.
> 
> I'm hacking up a printk dump of each new maximum stack footprint
> as it is encountered. Hopefully that will help debug this.
> 
> Frederic.
> 
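
For reference, the tracer's existing max-footprint report can already be
read back from debugfs.  A minimal userspace sketch only, assuming debugfs
is mounted at /sys/kernel/debug and CONFIG_STACK_TRACER is enabled:

	#include <stdio.h>

	int main(void)
	{
		char line[256];
		/* stack_trace holds the deepest backtrace seen so far */
		FILE *f = fopen("/sys/kernel/debug/tracing/stack_trace", "r");

		if (!f) {
			perror("stack_trace");
			return 1;
		}
		while (fgets(line, sizeof(line), f))
			fputs(line, stdout);
		fclose(f);
		return 0;
	}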

Jonathan, could you try the following patch?
It dumps a stack trace each time a new maximum stack usage is recorded.
If this turns out to be a stack overflow, it may help track the last
maximum stack usage seen before the crash.

You'll need CONFIG_STACK_TRACER, the "stacktrace" boot parameter,
and some luck...

diff --git a/kernel/trace/trace_stack.c b/kernel/trace/trace_stack.c
index c750f65..fbbe312 100644
--- a/kernel/trace/trace_stack.c
+++ b/kernel/trace/trace_stack.c
@@ -67,6 +67,9 @@ static inline void check_stack(void)
 
 	save_stack_trace(&max_stack_trace);
 
+	printk("New max stack usage:\n");
+	print_stack_trace(&max_stack_trace, 1);
+
 	/*
 	 * Now find where in the stack these are.
 	 */
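
For reference, this is roughly how the save_stack_trace()/print_stack_trace()
pair used above is driven.  A minimal sketch only: the buffer size and skip
value below are arbitrary illustrations, not what trace_stack.c actually uses.

	#include <linux/stacktrace.h>
	#include <linux/kernel.h>

	static unsigned long entries[32];

	static void dump_current_stack(void)
	{
		struct stack_trace trace = {
			.max_entries	= ARRAY_SIZE(entries),
			.entries	= entries,
			.skip		= 2,	/* drop the immediate callers */
		};

		/* record the current backtrace ... */
		save_stack_trace(&trace);
		printk(KERN_INFO "current stack:\n");
		/* ... and print it, indented by one space per entry */
		print_stack_trace(&trace, 1);
	}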


