From: Ingo Molnar <mingo@elte.hu>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>,
linux-kernel@vger.kernel.org,
Christoph Lameter <clameter@sgi.com>, Mel Gorman <mel@csn.ul.ie>,
Nick Piggin <npiggin@suse.de>,
Andrew Morton <akpm@linux-foundation.org>,
"Rafael J. Wysocki" <rjw@sisk.pl>,
Yinghai.Lu@sun.com
Subject: Re: [bug] SLUB + mm/slab.c boot crash in -rc9
Date: Tue, 15 Apr 2008 18:15:32 +0200
Message-ID: <20080415161532.GA15088@elte.hu>
In-Reply-To: <alpine.LFD.1.00.0804150838161.2879@woody.linux-foundation.org>
* Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Tue, 15 Apr 2008, Ingo Molnar wrote:
> >
> > debug output is:
> >
> > http://redhat.com/~mingo/misc/log-Thu_Apr_10_10_41_16_CEST_2008.bad.rc9
> >
> > so it's probably the first few page allocations (setup_cpu_cache())
> > going wrong already - suggesting some fundamental borkage in SLAB?
>
> Well, I think it suggests some fundamental borkage in the page
> allocator.
>
> That first warn-on is from the "alloc_pages_node()" returning NULL at
> bootup. Sure, it could be that the arguments are bogus, but that
> sounds unlikely since none of that is dependent on any kconfig stuff.
>
> The fact that it happens with both SLUB/SLAB makes that even more
> obvious.
>
> Now, you don't have fault injection on, so it can't be that, and your
> debug entry for *z == NULL didn't trigger in alloc_pages, so it's not
> that one either.
>
> However, if __alloc_pages() failed, I would have expected to see the
> "memory allocation failed" printk. Why didn't it? Is
> printk_ratelimit() broken at boot (last_msg starts out as zero - maybe
> it should start out as a negative number)?
btw., now with a second full day spent on this regression, i have
figured out a workaround the hard way: increasing SECTION_SIZE_BITS in
include/asm-x86/sparsemem.h from 26 to 27 makes it go away. (i.e. we use
section chunks of 128 MB instead of the previous 64 MB.) I've given up on
analyzing the crash site - it seems rather random and uninformative and
just suggests page allocator borkage.
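
As a quick sanity check of the section sizes mentioned above, here is a
minimal Python sketch of the arithmetic. The helper names are made up for
illustration, and the 32-bit physical address width is an assumption for
the !PAE case, not something taken from the kernel headers:

```python
MB = 1 << 20


def section_size_mb(section_size_bits):
    """Size of one sparsemem section in MB (a section spans 2**bits bytes)."""
    return (1 << section_size_bits) // MB


def section_count(phys_addr_bits, section_size_bits):
    """Number of sections needed to cover the whole physical address space."""
    return 1 << (phys_addr_bits - section_size_bits)


if __name__ == "__main__":
    PHYS_ADDR_BITS = 32  # assumption: 32-bit !PAE physical address space
    for bits in (26, 27):
        print(
            f"SECTION_SIZE_BITS={bits}: "
            f"{section_size_mb(bits)} MB per section, "
            f"{section_count(PHYS_ADDR_BITS, bits)} sections"
        )
```

Under that assumption, going from 26 to 27 halves the number of sections
while doubling their size, so anything sized or indexed per section changes
with it.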
So this seems like general sparsemem borkage. PAE uses a shift of 30
due to the page->flags shortage (which masks this bug), and 64-bit uses
27, which probably masks it too.
Since this is a !NUMA config and !PAE as well, NODES_SHIFT is 0,
ZONES_SHIFT is 2, so the theory of running out of bits in page->flags is
wrong as well.
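
The bit budget behind that conclusion can be sketched as follows. This is
a rough illustration only: it assumes the section index stored in
page->flags is as wide as the physical address width minus
SECTION_SIZE_BITS, and that the physical address width is 32 bits on this
!PAE config. The function names are hypothetical.

```python
def page_flags_field_bits(phys_addr_bits, section_size_bits,
                          nodes_shift, zones_shift):
    """Bits of page->flags taken by the section, node and zone fields.

    Assumption: the section field is (phys_addr_bits - section_size_bits)
    bits wide.
    """
    sections_shift = phys_addr_bits - section_size_bits
    return sections_shift + nodes_shift + zones_shift


def bits_left_for_flags(word_bits, field_bits):
    """Bits of the flags word remaining for the actual page flags."""
    return word_bits - field_bits


if __name__ == "__main__":
    # Values from the mail: NODES_SHIFT = 0, ZONES_SHIFT = 2,
    # SECTION_SIZE_BITS = 26. The two 32s are assumptions.
    used = page_flags_field_bits(32, 26, nodes_shift=0, zones_shift=2)
    print("field bits:", used)
    print("bits left for flags:", bits_left_for_flags(32, used))
```

Whether the remaining bits are enough depends on how many page flags this
kernel defines, which is not stated here; the sketch only shows that the
section/node/zone fields themselves are small in this configuration.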
I also tried a hack to double the size of all sparsemem mem_map
allocations (on the theory of an overflow there) - but it didn't help.
So i think we need to go down further into the page allocator. Perhaps
the buddy bitmaps are wrongly sized somewhere. I'm grasping at straws.
Btw., Mel Gorman has reproduced crashes with my bzImage on his box (and
a hang with my config, using his build), so i think we can eliminate hw
and build environment specialities as a cause.
Ingo