From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andi Kleen
Subject: Re: 2.6.18-mm2 boot failure on x86-64
Date: Thu, 5 Oct 2006 17:40:58 +0200
Message-ID: <200610051740.58511.ak@suse.de>
References: <20060928014623.ccc9b885.akpm@osdl.org>
	<1160061173.9569.43.camel@dyn9047017100.beaverton.ibm.com>
	<1160062332.29690.10.camel@flooterbu>
Mime-Version: 1.0
Content-Type: text/plain; charset="iso-8859-15"
Content-Transfer-Encoding: 7bit
Cc: Badari Pulavarty , Martin Bligh , vgoyal@in.ibm.com, Andrew Morton , lkml , netdev@vger.kernel.org, kmannth@us.ibm.com, Andy Whitcroft
Return-path:
To: Steve Fox
In-Reply-To: <1160062332.29690.10.camel@flooterbu>
Content-Disposition: inline
Sender: linux-kernel-owner@vger.kernel.org
List-Id: netdev.vger.kernel.org

On Thursday 05 October 2006 17:32, Steve Fox wrote:
> On Thu, 2006-10-05 at 08:12 -0700, Badari Pulavarty wrote:
>
> > Can you post the latest panic stack again (with CONFIG_DEBUG_KERNEL) ?
>
> CONFIG_DEBUG_KERNEL should be on
>
> > Last time I couldn't match your instruction dump to any code segment
> > in the routine. And also, can you post your .config file. I have
> > an amd64 and em64t machine and both work fine...
>
> Unable to handle kernel NULL pointer dereference at 0000000000000827 RIP:
> [] xfrm_register_mode+0x36/0x60
> PGD 0
> Oops: 0000 [1] SMP
> CPU 0
> Modules linked in:
> Pid: 1, comm: swapper Not tainted 2.6.18-git22 #1
> RIP: 0010:[] [] xfrm_register_mode+0x36/0x60
> RSP: 0000:ffff810bffcbded0 EFLAGS: 00010286
> RAX: 000000000000081f RBX: ffffffff805588a0 RCX: 0000000000000000
> RDX: ffffffffffffffff RSI: 0000000000000002 RDI: ffffffff80559550
> RBP: 00000000ffffffef R08: 000000003f924371 R09: 0000000000000000
> R10: ffff810bffcbdcb0 R11: 0000000000000154 R12: 0000000000000000
> R13: ffff810bffcbdef0 R14: 0000000000000000 R15: 0000000000000000
> FS: 0000000000000000(0000) GS:ffffffff805d2000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> CR2: 0000000000000827 CR3: 0000000000201000 CR4: 00000000000006e0
> Process swapper (pid: 1, threadinfo ffff810bffcbc000, task ffff810bffcbb4e0)
> Stack: 0000000000000000 ffffffff8061fb48 0000000000000000 ffffffff80207182
> 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> 0000000000000000 0000000000000000 0000000000000000 0000000000090000

Please don't snip the Code: line. It is fairly important.

>
> The base config file I'm using is at
> http://flooterbu.net/kernel/elm3b239-2.6.17.config

My guess is that something is wrong with the global variable it is accessing.
Can you post the output of grep -5 xfrm_policy_afinfo ? I wonder if that
variable overlaps something else.

And please add a

	printk("global %p\n", xfrm_policy_afinfo[family]);

at the beginning of net/xfrm/xfrm_policy.c:xfrm_policy_lock_afinfo and post
the output.

If not, then it's possible that some nearby variable is overflowing or
similar. Adding some padding around xfrm_policy_afinfo would show that.

If that global does turn out to be corrupted, another approach is to add
checks all over the boot process to track down where it gets corrupted.

-Andi