* new kernel oops in recent kernels
From: Giuseppe Sacco @ 2008-03-16 15:19 UTC
To: linux-scsi
Hi all,
testing the latest kernels on an SGI O2, I found this new kernel oops. It
was produced with a kernel from the linux-mips.org git tree as of last night.
A very similar oops has been reported by others[0] using 2.6.22.
As you can see, the oops happens while the machine is booting, when init
runs all the scripts via rc. One of those scripts runs hald-probe-storage,
the process that actually triggers the oops.
I am not able to identify the cause or propose a solution, but I am
willing to test any patch for this problem.
Thanks a lot,
Giuseppe
CPU 0 Unable to handle kernel paging request at virtual address 0000000000000000, epc == 0000000000000000, ra == 0000000000000000
Oops[#1]:
Cpu 0
$ 0 : 0000000000000000 ffffffff9001fce0 ffffffffffffff86 0000000000000028
$ 4 : 980000000fc01140 0000000000000080 0000000000024000 0000000000000000
$ 8 : 980000000fc54700 0000000000000001 0000000000008000 404000130a0808ff
$12 : 0000000000000008 ffffffff801b8db8 0000000000000000 ffffffff803f0000
$16 : 980000000ff2fa70 980000000c417bb8 980000000c417c20 980000000fdeb610
$20 : 000000007fffffff 980000000f9211a0 980000000fc26000 000000007fa51ecd
$24 : 0000000000000000 ffffffff80074290
$28 : 980000000c414000 980000000c417bb0 0000000000400000 0000000000000000
Hi : 0000000000000000
Lo : 003d08dbda057200
epc : 0000000000000000 0x0 Not tainted
ra : 0000000000000000 0x0
Status: 9001fce3 KX SX UX KERNEL EXL IE
Cause : 00000008
BadVA : 0000000000000000
PrId : 00002321 (R5000)
Modules linked in: parport_pc lp parport ipv6 deflate zlib_deflate ctr twofish twofish_common camellia serpent blowfish des_generic cbc aes_generic xcbc sha256_generic sha1_generic crypto_null crypto_blkcipher dm_snapshot dm_mirror dm_mod ehci_hcd ohci_hcd r8169 usbcore sg evdev
Process hald-probe-stor (pid: 1937, threadinfo=980000000c414000, task=980000000ebf47d8)
Stack : 980000000c417be0 980000000c417de0 0800000000000000 980000000c417bb0
00000008ffffff86 0000000000000000 0200000000000001 000006d600000000
0000000000000000 980000000c417de0 980000000fdeb610 0000000000000001
0000000000005326 ffffffff802460b0 0000000070023a00 000000000f9211a0
ffffffff80490000 ffffffff8024bb84 980000000fc10e80 980000000f80bb28
0000000000000000 980000000f9210e0 0000010100000001 00000000800d1618
0000000000000004 980000000fc8f850 000000007fffffff 980000000fde4000
0000000000005326 000000007fffffff 980000000c407540 980000000f9211a0
980000000fc26000 000000007fa51ecd ffffffff80245c6c 980000000c407540
0000000000000000 fffffffffffffdfd 0000000000005326 ffffffff801ad8bc
...
Call Trace:
[<ffffffff802460b0>] sr_drive_status+0x50/0xe8
[<ffffffff8024bb84>] cdrom_ioctl+0x5f4/0x1208
[<ffffffff80245c6c>] sr_block_ioctl+0x64/0xe8
[<ffffffff801ad8bc>] compat_blkdev_ioctl+0x7cc/0x18e0
[<ffffffff800d1870>] do_open+0x98/0x310
[<ffffffff800d1d60>] blkdev_open+0x0/0xc0
[<ffffffff800d1da8>] blkdev_open+0x48/0xc0
[<ffffffff8009c444>] __dentry_open+0x114/0x2e0
[<ffffffff8009c740>] do_filp_open+0x48/0x58
[<ffffffff8009c740>] do_filp_open+0x48/0x58
[<ffffffff800def8c>] compat_sys_ioctl+0xf4/0x440
[<ffffffff80019154>] handle_sys+0x114/0x130
[<ffffffff8001fcf3>] fpu_emulator_cop1Handler+0x362/0x2270
Code: (Bad address in epc)
[0] http://lists.debian.org/debian-mips/2008/03/msg00082.html

* Re: new kernel oops in recent kernels
From: James Bottomley @ 2008-03-16 16:39 UTC
To: Giuseppe Sacco; +Cc: linux-scsi

On Sun, 2008-03-16 at 16:19 +0100, Giuseppe Sacco wrote:
> Hi all,
[...]
> Call Trace:
> [<ffffffff802460b0>] sr_drive_status+0x50/0xe8
> [<ffffffff8024bb84>] cdrom_ioctl+0x5f4/0x1208
> [<ffffffff80245c6c>] sr_block_ioctl+0x64/0xe8
[...]

This is a bit strange. It's obviously O2 specific, which makes it a lot
harder. Can you compile the kernel with CONFIG_DEBUG_INFO and reproduce
(just in case this changes the symbol layout). Then ask gdb where
sr_drive_status+0x50 (or what it moves to) is:

    gdb vmlinux
    (gdb) b *(sr_drive_status+0x50)

should identify the file and line. The signature implies that
cdi->handle may be NULL, so you could put in a check for that as well.

Thanks,

James

* Re: new kernel oops in recent kernels
From: Giuseppe Sacco @ 2008-03-16 18:32 UTC
To: linux-scsi

Hi James,

On Sun, 16/03/2008 at 11.39 -0500, James Bottomley wrote:
> On Sun, 2008-03-16 at 16:19 +0100, Giuseppe Sacco wrote:
[...]
> This is a bit strange. It's obviously O2 specific, which makes it a lot
> harder. Can you compile the kernel with CONFIG_DEBUG_INFO and reproduce
> (just in case this changes the symbol layout). Then ask gdb where
[...]

I cannot find any CONFIG_DEBUG_INFO. Do you mean CONFIG_DEBUG_KERNEL?

Thanks,
Giuseppe

* Re: new kernel oops in recent kernels
From: James Bottomley @ 2008-03-16 18:47 UTC
To: Giuseppe Sacco; +Cc: linux-scsi

On Sun, 2008-03-16 at 19:32 +0100, Giuseppe Sacco wrote:
> Hi James,
>
> On Sun, 16/03/2008 at 11.39 -0500, James Bottomley wrote:
> > On Sun, 2008-03-16 at 16:19 +0100, Giuseppe Sacco wrote:
> [...]
> > This is a bit strange. It's obviously O2 specific, which makes it a lot
> > harder. Can you compile the kernel with CONFIG_DEBUG_INFO and reproduce
> > (just in case this changes the symbol layout). Then ask gdb where
> [...]
>
> I cannot find any CONFIG_DEBUG_INFO. Do you mean CONFIG_DEBUG_KERNEL?

This is from lib/Kconfig.debug:

config DEBUG_INFO
	bool "Compile the kernel with debug info"
	depends on DEBUG_KERNEL
	help
	  If you say Y here the resulting kernel image will include
	  debugging info resulting in a larger kernel image.
	  This adds debug symbols to the kernel and modules (gcc -g), and
	  is needed if you intend to use kernel crashdump or binary object
	  tools like crash, kgdb, LKCD, gdb, etc on the kernel.
	  Say Y here only if you plan to debug the kernel.

	  If unsure, say N.

It does depend on CONFIG_DEBUG_KERNEL, according to the depends clause.

James

* Re: new kernel oops in recent kernels
From: Matthew Wilcox @ 2008-03-16 16:42 UTC
To: Giuseppe Sacco; +Cc: linux-scsi

On Sun, Mar 16, 2008 at 04:19:08PM +0100, Giuseppe Sacco wrote:
> testing the latest kernels on an SGI O2, I found this new kernel oops. It
> was produced with a kernel from the linux-mips.org git tree as of last night.
> A very similar oops has been reported by others[0] using 2.6.22.

> CPU 0 Unable to handle kernel paging request at virtual address 0000000000000000, epc == 0000000000000000, ra == 0000000000000000

I'm not familiar with MIPS; is epc the program counter? If so, this
would be a branch to 0. That's somewhat confusing, as I don't see any
function pointers used within sr_drive_status(). How accurate are MIPS
backtraces?

> Call Trace:
> [<ffffffff802460b0>] sr_drive_status+0x50/0xe8
> [<ffffffff8024bb84>] cdrom_ioctl+0x5f4/0x1208
> [<ffffffff80245c6c>] sr_block_ioctl+0x64/0xe8

It would be interesting to see a disassembly (objdump -dr
drivers/scsi/sr_ioctl.o) of sr_drive_status from say 0x40 to 0x60.
And if that calls a function, it would be interesting to put in printks
to figure out where we're dereferencing a null pointer.

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."
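
Matthew's question about epc can be answered from the register dump in the report. On MIPS, EPC is the Exception Program Counter: the address of the instruction that took the exception (or of the preceding branch when the faulting instruction sits in a branch-delay slot, which the BD bit of the Cause register signals). The minimal userspace C sketch below assumes the standard MIPS Cause-register layout (ExcCode in bits 6..2, BD in bit 31) and decodes the reported "Cause : 00000008" as ExcCode 2, TLBL, a TLB exception on a load or instruction fetch. With EPC and BadVA both 0 and BD clear, that is consistent with the CPU trying to fetch an instruction at address 0, i.e. a jump to 0. The helper names are illustrative only and are not kernel APIs.

```c
#include <stdint.h>

/* ExcCode occupies bits 6..2 of the MIPS Cause register. */
static unsigned mips_exc_code(uint32_t cause)
{
	return (cause >> 2) & 0x1f;
}

/* BD (bit 31) is set when the faulting instruction was in a branch-delay
   slot; EPC then points at the branch rather than the faulting instruction. */
static int mips_in_delay_slot(uint32_t cause)
{
	return (cause >> 31) & 1;
}

/* Mnemonics for the common exception codes; anything else is "other". */
static const char *mips_exc_name(unsigned code)
{
	switch (code) {
	case 0:  return "Int";  /* interrupt */
	case 1:  return "Mod";  /* TLB modified */
	case 2:  return "TLBL"; /* TLB exception, load or instruction fetch */
	case 3:  return "TLBS"; /* TLB exception, store */
	case 4:  return "AdEL"; /* address error, load or instruction fetch */
	case 5:  return "AdES"; /* address error, store */
	case 8:  return "Sys";  /* syscall */
	case 9:  return "Bp";   /* breakpoint */
	case 10: return "RI";   /* reserved instruction */
	case 11: return "CpU";  /* coprocessor unusable */
	default: return "other";
	}
}

/* Heuristic: a load-side TLB or address fault whose bad address equals the
   faulting PC (and which is not in a delay slot) was the instruction fetch. */
static int is_instruction_fetch_fault(uint64_t epc, uint64_t badvaddr,
				      uint32_t cause)
{
	unsigned code = mips_exc_code(cause);

	return (code == 2 || code == 4) && epc == badvaddr &&
	       !mips_in_delay_slot(cause);
}
```

Feeding in the values from the oops (EPC 0, BadVA 0, Cause 0x8) makes is_instruction_fetch_fault() return 1, which supports reading the crash as a jump through a NULL pointer rather than a NULL data access.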

* Re: new kernel oops in recent kernels
From: Giuseppe Sacco @ 2008-03-16 18:29 UTC
To: linux-scsi

Hi all,

On Sun, 16/03/2008 at 10.42 -0600, Matthew Wilcox wrote:
> On Sun, Mar 16, 2008 at 04:19:08PM +0100, Giuseppe Sacco wrote:
[...]
> > Call Trace:
> > [<ffffffff802460b0>] sr_drive_status+0x50/0xe8
> > [<ffffffff8024bb84>] cdrom_ioctl+0x5f4/0x1208
> > [<ffffffff80245c6c>] sr_block_ioctl+0x64/0xe8
>
> It would be interesting to see a disassembly (objdump -dr
> drivers/scsi/sr_ioctl.o) of sr_drive_status from say 0x40 to 0x60.

here it is:

(gdb) disassemble sr_drive_status+0x50
Dump of assembler code for function sr_drive_status:
0xffffffff80246060 <sr_drive_status+0>: daddiu sp,sp,-32
0xffffffff80246064 <sr_drive_status+4>: lui v0,0x7fff
0xffffffff80246068 <sr_drive_status+8>: sd s0,16(sp)
0xffffffff8024606c <sr_drive_status+12>: sd ra,24(sp)
0xffffffff80246070 <sr_drive_status+16>: ori v0,v0,0xffff
0xffffffff80246074 <sr_drive_status+20>: move s0,a0
0xffffffff80246078 <sr_drive_status+24>: bne a1,v0,0xffffffff802460e8 <sr_drive_status+136>
0xffffffff8024607c <sr_drive_status+28>: ld v1,24(a0)
0xffffffff80246080 <sr_drive_status+32>: ld a0,16(v1)
0xffffffff80246084 <sr_drive_status+36>: jal 0xffffffff80244c70 <sr_test_unit_ready>
0xffffffff80246088 <sr_drive_status+40>: daddiu a1,sp,4
0xffffffff8024608c <sr_drive_status+44>: bnez v0,0xffffffff802460a8 <sr_drive_status+72>
0xffffffff80246090 <sr_drive_status+48>: move a0,s0
0xffffffff80246094 <sr_drive_status+52>: li v0,4
0xffffffff80246098 <sr_drive_status+56>: ld ra,24(sp)
0xffffffff8024609c <sr_drive_status+60>: ld s0,16(sp)
0xffffffff802460a0 <sr_drive_status+64>: jr ra
0xffffffff802460a4 <sr_drive_status+68>: daddiu sp,sp,32
0xffffffff802460a8 <sr_drive_status+72>: jal 0xffffffff8024c838 <cdrom_get_media_event>
0xffffffff802460ac <sr_drive_status+76>: move a1,sp
0xffffffff802460b0 <sr_drive_status+80>: bnez v0,0xffffffff802460fc <sr_drive_status+156>
0xffffffff802460b4 <sr_drive_status+84>: lhu v0,0(sp)
0xffffffff802460b8 <sr_drive_status+88>: sll v0,v0,0x0
0xffffffff802460bc <sr_drive_status+92>: andi v0,v0,0xff
0xffffffff802460c0 <sr_drive_status+96>: andi v1,v0,0x2
0xffffffff802460c4 <sr_drive_status+100>: bnez v1,0xffffffff80246094 <sr_drive_status+52>
0xffffffff802460c8 <sr_drive_status+104>: andi v0,v0,0x1
0xffffffff802460cc <sr_drive_status+108>: beqz v0,0xffffffff80246098 <sr_drive_status+56>
0xffffffff802460d0 <sr_drive_status+112>: li v0,1
0xffffffff802460d4 <sr_drive_status+116>: ld ra,24(sp)

> And if that calls a function, it would be interesting to put in printks
> to figure out where we're dereferencing a null pointer.
>
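
The offsets in this disassembly can be checked with a little arithmetic. The C sketch below assumes standard MIPS call semantics: jal leaves ra pointing 8 bytes past the call (the jal plus its branch-delay slot), so a return address shown in a backtrace sits two instructions after the call. With sr_drive_status starting at 0xffffffff80246060, the trace entry 0xffffffff802460b0 is offset 0x50 (80), and the call that produced that return address is at offset 0x48 (72), the jal to cdrom_get_media_event. The helper names are illustrative, not part of any existing tool.

```c
#include <stdint.h>

/* MIPS jal writes the address of the instruction after its delay slot
   into ra, i.e. call address + 8. */
#define MIPS_JAL_RA_SKEW 8u

/* Offset of an absolute address from the start of its function. */
static uint64_t func_offset(uint64_t addr, uint64_t func_start)
{
	return addr - func_start;
}

/* Offset of the jal that produced a given return address. */
static uint64_t call_site_offset(uint64_t ret_addr, uint64_t func_start)
{
	return func_offset(ret_addr, func_start) - MIPS_JAL_RA_SKEW;
}

/* True if an offset lies inside a function of the given size
   (the "/0xe8" part of a backtrace entry). */
static int within_function(uint64_t offset, uint64_t func_size)
{
	return offset < func_size;
}
```

Applied to the addresses above, call_site_offset() gives 0x48, and the bnez target offset 156 (0x9c) is below the function size 0xe8, so that branch stays inside sr_drive_status.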

* Re: new kernel oops in recent kernels
From: Matthew Wilcox @ 2008-03-17 3:58 UTC
To: Giuseppe Sacco; +Cc: linux-scsi

On Sun, Mar 16, 2008 at 07:29:07PM +0100, Giuseppe Sacco wrote:
> > It would be interesting to see a disassembly (objdump -dr
> > drivers/scsi/sr_ioctl.o) of sr_drive_status from say 0x40 to 0x60.
>
> here it is:
>
> (gdb) disassemble sr_drive_status+0x50
> 0xffffffff802460b0 <sr_drive_status+80>: bnez v0,0xffffffff802460fc <sr_drive_status+156>

The thing about objdump -dr is that it gives you the name of the function
that's being called. gdb apparently doesn't, or would need a different
command from "disassemble".

* Re: new kernel oops in recent kernels
From: Matthew Wilcox @ 2008-03-17 4:41 UTC
To: Giuseppe Sacco; +Cc: linux-scsi

On Sun, Mar 16, 2008 at 07:29:07PM +0100, Giuseppe Sacco wrote:
> > > [<ffffffff802460b0>] sr_drive_status+0x50/0xe8
> > > [<ffffffff8024bb84>] cdrom_ioctl+0x5f4/0x1208
> > > [<ffffffff80245c6c>] sr_block_ioctl+0x64/0xe8
>
> 0xffffffff802460a4 <sr_drive_status+68>: daddiu sp,sp,32
> 0xffffffff802460a8 <sr_drive_status+72>: jal 0xffffffff8024c838 <cdrom_get_media_event>
> 0xffffffff802460ac <sr_drive_status+76>: move a1,sp
> 0xffffffff802460b0 <sr_drive_status+80>: bnez v0,0xffffffff802460fc <sr_drive_status+156>

I think I was confused earlier. 156 is 0x9c, thus within the function.
The backtrace must be incorrect; this is really 0x48 and thus a call to
cdrom_get_media_event, which points the finger at cdi->ops->generic_packet
being NULL.

Put a BUG_ON(!cdi->ops->generic_packet) in drivers/cdrom/cdrom.c right
before the line that calls it (i.e. line 11 of cdrom_get_media_event).
That should trigger and give a better backtrace. Then it's a simple (*)
matter of figuring out why it's NULL.

(*) This is sarcasm.
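
The suggested check can be illustrated outside the kernel. The following is a userspace model only: struct cdi_model, struct cdrom_ops_model and both functions are hypothetical stand-ins reduced to the single ops->generic_packet hook discussed here, not the real struct cdrom_device_info or struct cdrom_device_ops definitions. Where the kernel's BUG_ON() would stop with a backtrace at the check itself, this sketch prints a message and returns -1 so that both paths can be exercised.

```c
#include <stdio.h>
#include <stddef.h>

struct cdi_model;

/* Hypothetical stand-in for the ops table; only generic_packet is kept. */
struct cdrom_ops_model {
	int (*generic_packet)(struct cdi_model *cdi, void *cgc);
};

/* Hypothetical stand-in for the per-device info structure. */
struct cdi_model {
	const struct cdrom_ops_model *ops;
};

/*
 * Check the hook before calling through it. In the kernel the suggested
 * check is BUG_ON(!cdi->ops->generic_packet) placed right before the call;
 * this model reports the problem and returns -1 instead of halting.
 */
static int call_generic_packet_checked(struct cdi_model *cdi, void *cgc)
{
	if (cdi == NULL || cdi->ops == NULL || cdi->ops->generic_packet == NULL) {
		fprintf(stderr, "BUG: generic_packet hook is NULL\n");
		return -1;
	}
	return cdi->ops->generic_packet(cdi, cgc);
}

/* Dummy hook modelling a driver that registered generic_packet correctly. */
static int dummy_generic_packet(struct cdi_model *cdi, void *cgc)
{
	(void)cdi;
	(void)cgc;
	return 0;
}
```

With a device whose ops table leaves generic_packet NULL, the checked call reports the problem at the call site instead of jumping to address 0, which is the kind of early, explicit failure the BUG_ON is meant to produce.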

* Re: new kernel oops in recent kernels
From: Giuseppe Sacco @ 2008-03-17 8:17 UTC
To: linux-scsi

Hi all,

I wrote to the linux-mips mailing list to investigate the assembly code
generated for sr_cdrom_status(), since adding the suggested printk()
made the oops go away. I suspected a problem with the compiler, but people
there are looking into a possible cache-coherency problem. You can follow
the thread in the public web archive at
http://www.linux-mips.org/archives/linux-mips/2008-03/msg00079.html

I'll report back to this list after checking for cache-coherency and
C compiler problems.

Thanks,
Giuseppe