From: Frederic Weisbecker <fweisbec@gmail.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Jonathan Corbet <corbet@lwn.net>, LKML <linux-kernel@vger.kernel.org>
Subject: Re: 2.6.30-rc kills my box hard - and lockdep chains
Date: Sun, 17 May 2009 02:38:39 +0200
Message-ID: <20090517003837.GA4640@nowhere>
In-Reply-To: <20090516161419.62c45c2b.akpm@linux-foundation.org>

On Sat, May 16, 2009 at 04:14:19PM -0700, Andrew Morton wrote:
> On Thu, 14 May 2009 09:49:51 -0600 Jonathan Corbet <corbet@lwn.net> wrote:
>
> > So...every now and then I return to my system (a dual-core 64-bit
> > x86 box) only to find it totally dead. Lights are on but there's no
> > disk activity, no ping responses, no alternative to simply pulling the
> > plug. It happens fairly reliably about once a day with the 2.6.30-rc
> > kernels; it does not happen with 2.6.29.
> >
> > I'm at a bit of a loss for how to try to track this one down. "System
> > disappears without a trace" isn't much to go on. I can't reproduce it
> > at will; even the "maintain an unsaved editor buffer with hours' worth
> > of work" trick doesn't seem to work this time.
> >
> > One clue might be found here, perhaps: I didn't have lockdep enabled but I do
> > now.
>
> So the lockup isn't due to lockdep.
>
> Did you try all the usual sysrq-P, nmi-watchdog stuff?
>
> Is netconsole enabled, to see if it squawked as it died?
>
> > May 14 01:06:55 bike kernel: [38730.804833] BUG: MAX_LOCKDEP_CHAINS too low!
> > May 14 01:06:55 bike kernel: [38730.804838] turning off the locking correctness validator.
> > May 14 01:06:55 bike kernel: [38730.804843] Pid: 5321, comm: tar Tainted: G W 2.6.30-rc5 #11
> > May 14 01:06:55 bike kernel: [38730.804846] Call Trace:
> > May 14 01:06:55 bike kernel: [38730.804854] [<ffffffff8025df59>] __lock_acquire+0x57f/0xbc9
> > May 14 01:06:55 bike kernel: [38730.804860] [<ffffffff8020f3a9>] ? print_context_stack+0xfa/0x119
> > May 14 01:06:55 bike kernel: [38730.804866] [<ffffffff80394da9>] ? get_hash_bucket+0x28/0x34
> >
> > ...
> >
> > May 14 01:06:55 bike kernel: [38730.805340] [<ffffffff802c2741>] ? filldir+0x0/0xc4
> > May 14 01:06:55 bike kernel: [38730.805344] [<ffffffff802c293d>] vfs_readdir+0x79/0xb6
> > May 14 01:06:55 bike kernel: [38730.805348] [<ffffffff802c2ac3>] sys_getdents+0x81/0xd1
> > May 14 01:06:55 bike kernel: [38730.805353] [<ffffffff8020bcdb>] system_call_fastpath+0x16/0x1b
> >
> > That's quite the call stack... and, evidently, a lot of lock chains...
>
> It is a deep stack trace.
>
> And unfortunately
>
> a) that diagnostic didn't print the stack pointer value, from which
> we can often work out if we're looking at a stack overflow.
>
> b) I regularly think it would be useful if that stack backtrace were
> to print out the actual stack address, so we could see how much
> stack each function is using.
>
> I just went in to hack these things up, but the x86 stacktrace
> code which I used to understand has become stupidly complex so I
> gave up.
>
> What tools do we have to diagnose a possible kernel stack overflow?
> There's CONFIG_DEBUG_STACK_USAGE but that's unlikely to be much use.

CONFIG_STACK_TRACER comes to mind. Currently this tracer only exposes
the backtrace of the maximum stack footprint through a file in debugfs,
so it is not much use for debugging a stack overflow that kills the box.

I'm trying to hack up a printk dump for each new maximum stack footprint
encountered. Hopefully that will help to debug this.

Frederic.
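
For reference, here is a minimal user-space sketch of reading what the stack tracer currently exposes. It assumes debugfs is mounted at /sys/kernel/debug and that the tracer has been switched on via /proc/sys/kernel/stack_tracer_enabled. The helper names (parse_stack_trace, biggest_frames) and the SAMPLE text are illustrative only and are not taken from this thread. The sketch also shows the limitation discussed above: it can only run while the machine is still alive.

```python
import re
from pathlib import Path

# Assumption: debugfs is mounted at the conventional location.
STACK_TRACE_FILE = Path("/sys/kernel/debug/tracing/stack_trace")

# One data row of the stack_trace file: "  0)   2928   224   func+0x.../0x..."
ROW = re.compile(r"^\s*\d+\)\s+(\d+)\s+(\d+)\s+(\S+)")

# Illustrative sample in the stack_trace layout; NOT output from this thread.
SAMPLE = """\
        Depth    Size   Location    (3 entries)
        -----    ----   --------
  0)     2928     224   update_sd_lb_stats+0xbc/0x4ac
  1)     2704     160   find_busiest_group+0x31/0x1f1
  2)     2544     256   load_balance+0xd9/0x662
"""


def parse_stack_trace(text):
    """Return a list of (depth, size, location) tuples, skipping header lines."""
    frames = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            depth, size, location = match.groups()
            frames.append((int(depth), int(size), location))
    return frames


def biggest_frames(frames, count=3):
    """Return the `count` frames that use the most stack, largest first."""
    return sorted(frames, key=lambda frame: frame[1], reverse=True)[:count]


if __name__ == "__main__":
    try:
        text = STACK_TRACE_FILE.read_text()
    except OSError:
        # Not root, tracer not built in, or debugfs not mounted: use the sample.
        text = SAMPLE
    for depth, size, location in biggest_frames(parse_stack_trace(text)):
        print(f"{size:6d} bytes at depth {depth:6d}  {location}")
```

Sorting by the Size column gives a rough view of which functions consume the most stack in the deepest path recorded so far.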