From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754800AbYDOG0v (ORCPT ); Tue, 15 Apr 2008 02:26:51 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org
	id S1751904AbYDOG0n (ORCPT ); Tue, 15 Apr 2008 02:26:43 -0400
Received: from mx2.mail.elte.hu ([157.181.151.9]:49304 "EHLO mx2.mail.elte.hu"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751871AbYDOG0n (ORCPT ); Tue, 15 Apr 2008 02:26:43 -0400
Date: Tue, 15 Apr 2008 08:25:34 +0200
From: Ingo Molnar
To: Pekka Enberg
Cc: linux-kernel@vger.kernel.org, Christoph Lameter, Mel Gorman,
	Nick Piggin, Linus Torvalds, Andrew Morton, "Rafael J. Wysocki",
	Yinghai.Lu@sun.com
Subject: Re: [bug] SLUB + mm/slab.c boot crash in -rc9
Message-ID: <20080415062534.GA9172@elte.hu>
References: <20080411074145.GA4944@elte.hu>
	<84144f020804110121l8444aafl4631071b34c458fe@mail.gmail.com>
	<84144f020804110150q367260f6k473380a1309db878@mail.gmail.com>
	<20080411085411.GA10181@elte.hu>
	<84144f020804110205u3d073e76lbcdd36ec293a169b@mail.gmail.com>
	<84144f020804110208m41414c0h2ed71b85efbb426c@mail.gmail.com>
	<84144f020804110211w4ae41414od24cf2de72453e13@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <84144f020804110211w4ae41414od24cf2de72453e13@mail.gmail.com>
User-Agent: Mutt/1.5.17 (2007-11-01)
X-ELTE-VirusStatus: clean
X-ELTE-SpamScore: -1.5
X-ELTE-SpamLevel:
X-ELTE-SpamCheck: no
X-ELTE-SpamVersion: ELTE 2.0
X-ELTE-SpamCheck-Details: score=-1.5 required=5.9 tests=BAYES_00 autolearn=no
	SpamAssassin version=3.2.3
	-1.5 BAYES_00 BODY: Bayesian spam probability is 0 to 1%
	[score: 0.0000]
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

you asked me to run with the debug patch attached below.

I just tried vanilla -rc9 (head 120dd64cacd4fb7) and it still crashes
with this config:

  http://redhat.com/~mingo/misc/config-Thu_Apr_10_10_41_16_CEST_2008.bad.rc9

debug output is:

  http://redhat.com/~mingo/misc/log-Thu_Apr_10_10_41_16_CEST_2008.bad.rc9

so it's probably the first few page allocations (setup_cpu_cache())
going wrong already - suggesting some fundamental borkage in SLAB?

note, when i change SLAB to SLUB (and keep the config otherwise
unchanged), i get a similar early crash:

  http://redhat.com/~mingo/misc/log-Tue_Apr_15_07_24_59_CEST_2008.bad
  http://redhat.com/~mingo/misc/config-Tue_Apr_15_07_24_59_CEST_2008.bad

i've also uploaded a bzImage (SLUB, debug patch not applied) that you
can pick up and run on any 32-bit test-system:

  http://redhat.com/~mingo/misc/bzImage-Thu_Apr_10_10_41_16_CEST_2008.bad.rc9

it's a relatively generic bzImage that should boot on most whitebox PCs
on most distros as long as you use a pure ext3 setup, and it might even
give you networking (no modules or initrd are needed). It boots fine on
two other 32-bit PCs i have (an Intel laptop and an AMD desktop).

	Ingo

Index: linux/mm/page_alloc.c
===================================================================
--- linux.orig/mm/page_alloc.c
+++ linux/mm/page_alloc.c
@@ -1485,6 +1485,7 @@ restart:
 		 * Happens if we have an empty zonelist as a result of
 		 * GFP_THISNODE being used on a memoryless node
 		 */
+		WARN_ON(1);
 		return NULL;
 	}
 
Index: linux/mm/slab.c
===================================================================
--- linux.orig/mm/slab.c
+++ linux/mm/slab.c
@@ -1682,6 +1682,7 @@ static void *kmem_getpages(struct kmem_c
 		flags |= __GFP_RECLAIMABLE;
 
 	page = alloc_pages_node(nodeid, flags, cachep->gfporder);
+	WARN_ON(!page);
 	if (!page)
 		return NULL;
 
@@ -2620,6 +2621,7 @@ static struct slab *alloc_slabmgmt(struc
 		/* Slab management obj is off-slab. */
 		slabp = kmem_cache_alloc_node(cachep->slabp_cache,
 					      local_flags & ~GFP_THISNODE, nodeid);
+		WARN_ON(!slabp);
 		if (!slabp)
 			return NULL;
 	} else {
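
for anyone who wants to see the idea behind the debug patch outside the kernel, here is a rough user-space sketch. It is not kernel code: the WARN_ON() below is a simplified stand-in for the real macro (which also dumps a stack trace), and get_pages() and warn_count are hypothetical names made up for this illustration. The point is only that the warning is placed right before the existing NULL check, so the first failing allocation is reported at its origin while the error path stays unchanged.

```c
#include <stdio.h>
#include <stdlib.h>

/* Number of warnings emitted so far (hypothetical, for illustration only). */
static int warn_count;

/*
 * Simplified user-space stand-in for the kernel's WARN_ON(): report where
 * the condition fired and keep going. The real macro also prints a stack
 * trace, which this sketch does not attempt.
 */
#define WARN_ON(cond)                                               \
	do {                                                        \
		if (cond) {                                         \
			warn_count++;                               \
			fprintf(stderr, "WARNING: at %s:%d %s()\n", \
				__FILE__, __LINE__, __func__);      \
		}                                                   \
	} while (0)

/* Hypothetical allocator: a request for zero bytes is treated as a failure. */
static void *get_pages(size_t bytes)
{
	void *page = bytes ? malloc(bytes) : NULL;

	WARN_ON(!page);		/* flag the failure where it originates */
	if (!page)
		return NULL;	/* the existing error path is untouched */
	return page;
}
```

calling get_pages(0) prints one warning line to stderr and returns NULL, while a successful call stays silent, which is the same behaviour the three WARN_ON() lines in the patch are meant to add to the allocation paths.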