From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1764289AbZEHXZy (ORCPT ); Fri, 8 May 2009 19:25:54 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1754394AbZEHXZp (ORCPT ); Fri, 8 May 2009 19:25:45 -0400
Received: from smtp1.linux-foundation.org ([140.211.169.13]:45299 "EHLO smtp1.linux-foundation.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754162AbZEHXZo (ORCPT ); Fri, 8 May 2009 19:25:44 -0400
Date: Fri, 8 May 2009 16:23:35 -0700
From: Andrew Morton
To: Robert Schwebel
Cc: linux-kernel@vger.kernel.org, greg@kroah.com
Subject: Re: Possible boot race (seen on MX35)
Message-Id: <20090508162335.aca9841d.akpm@linux-foundation.org>
In-Reply-To: <20090508214718.GP15802@pengutronix.de>
References: <20090508214718.GP15802@pengutronix.de>
X-Mailer: Sylpheed version 2.2.4 (GTK+ 2.8.20; i486-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, 8 May 2009 23:47:18 +0200 Robert Schwebel wrote:

> Hi,
>
> While testing 2.6.30-rc4 on i.MX35 (with mxc-master ontop of the vanilla
> -rc4) I have seen the following oops. As it went away by booting the
> board again and didn't show up again even after several boots, I assume
> it could be a race coming from the recent fast boot activities? Does
> anyone have an idea?
>
> After the oops, the board continues booting as usual.
>
> rsc
>
> ----------8<----------
>
> Uncompressing Linux.................................................................................................................... done, booting the kernel.
> Linux version 2.6.30-rc4-ptx-mxc1 (jbe__octopus) (gcc version 4.3.2 (OSELAS.Toolchain-1.99.3) ) #1 PREEMPT Fri May 8 22:04:53 CEST 2009
> CPU: ARMv6-compatible processor __4117b363__ revision 3 (ARMv6TEJ), cr=00c5387f
> CPU: VIPT nonaliasing data cache, VIPT nonaliasing instruction cache
> Machine: Phytec Phycore pcm043
> Memory policy: ECC disabled, Data cache writeback
> On node 0 totalpages: 32768
> free_area_init_node: node 0, pgdat c038a0f0, node_mem_map c03a4000
> Normal zone: 256 pages used for memmap
> Normal zone: 0 pages reserved
> Normal zone: 32512 pages, LIFO batch:7
> Built 1 zonelists in Zone order, mobility grouping on. Total pages: 32512
> Kernel command line: console=ttymxc0,115200 video=mx3fb:Sharp-LQ035Q7 ip=192.168.24.47:192.168.23.2:192.168.23.1:255.255.0.0::: root=/dev/nfs nfsroot=192.168.23.2:/home/jbe/work/bsp/phytec/phyCORE/OSELAS.BSP-phyCORE-trunk/platform-phyCORE-i.MX35/root,v3,tcp mtdparts="physmap-flash.0:256k(uboot)ro,128k(ubootenv),2M(kernel),-(root)"
> NR_IRQS:180
> MXC GPIO hardware
> MXC IRQ initialized
> PID hash table entries: 512 (order: 9, 2048 bytes)
> Console: colour dummy device 80x30
> Dentry cache hash table entries: 16384 (order: 4, 65536 bytes)
> Inode-cache hash table entries: 8192 (order: 3, 32768 bytes)
> Memory: 128MB = 128MB total
> Memory: 126064KB available (3224K code, 258K data, 108K init, 0K highmem)
> Calibrating delay loop... 398.13 BogoMIPS (lpj=1990656)
> Mount-cache hash table entries: 512
> CPU: Testing write buffer coherency: ok
> net_namespace: 296 bytes
> regulator: core version 0.5
> NET: Registered protocol family 16
> Unable to handle kernel NULL pointer dereference at virtual address 000000e4
> pgd = c0004000
> __000000e4__ *pgd=00000000
> Internal error: Oops: 805 __#1__ PREEMPT
> Modules linked in:
> CPU: 0 Not tainted (2.6.30-rc4-ptx-mxc1 #1)
> PC is at call_usermodehelper_setup+0x44/0x78
> LR is at exit_notify+0x168/0x184
> pc : ____ lr : ____ psr: 00000013
> sp : c786dff8 ip : 00000000 fp : 00000000
> r10: 00000000 r9 : 00000000 r8 : 00000000
> r7 : 00000000 r6 : 00000000 r5 : 0000003c r4 : 000000cc
> r3 : c003d620 r2 : c004aa00 r1 : c781ca00 r0 : c781ca00
> Flags: nzcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment kernel
> Control: 00c5387f Table: 80004008 DAC: 00000017
> Process khelper (pid: 27, stack limit = 0xc786c260)
> Stack: (0xc786dff8 to 0xc786e000)
> dfe0: 00000000 00000000
> ____ (call_usermodehelper_setup+0x44/0x78) from ____ (0xc78c5c40)
> Code: e4823004 e59f3034 e5842008 e584300c (e5846018)
> ---__ end trace 1b75b31a2719ed1c __---

Hard.

At a guess I'd say it died somewhere down inside INIT_WORK(), perhaps
doing lockdep stuff.  Do you have CONFIG_LOCKDEP=n?

It would help if you could work out which field of struct
subprocess_info is at offset 0x000000e4 in your build.  One way of doing
that is

- put this into ~/.gdbinit

define offsetof
	set $off = &(((struct $arg0 *)0)->$arg1)
	printf "%d 0x%x\n", $off, $off
end

- set CONFIG_DEBUG_INFO=y

- make kernel/kmod.o

- gdb kernel/kmod.o

(gdb) offsetof subprocess_info cred
80 0x50