* new kernel oops in recent kernels
@ 2008-03-16 15:19 Giuseppe Sacco
2008-03-16 16:39 ` James Bottomley
2008-03-16 16:42 ` Matthew Wilcox
0 siblings, 2 replies; 9+ messages in thread
From: Giuseppe Sacco @ 2008-03-16 15:19 UTC
To: linux-scsi
Hi all,
While testing the latest kernels on an SGI O2, I found this new kernel oops. It was
produced with a kernel built from last night's linux-mips.org git tree.
A very similar oops has been reported by others[0] using 2.6.22.
As you can see, the oops happens while booting the machine, when init
runs all scripts via rc. One of those scripts runs hald-probe-storage, the
process that actually triggers the oops.
I am not able to identify the cause or to propose a solution, but I am
willing to test any patch for this problem.
Thanks a lot,
Giuseppe
CPU 0 Unable to handle kernel paging request at virtual address 0000000000000000, epc == 0000000000000000, ra == 0000000000000000
Oops[#1]:
Cpu 0
$ 0 : 0000000000000000 ffffffff9001fce0 ffffffffffffff86 0000000000000028
$ 4 : 980000000fc01140 0000000000000080 0000000000024000 0000000000000000
$ 8 : 980000000fc54700 0000000000000001 0000000000008000 404000130a0808ff
$12 : 0000000000000008 ffffffff801b8db8 0000000000000000 ffffffff803f0000
$16 : 980000000ff2fa70 980000000c417bb8 980000000c417c20 980000000fdeb610
$20 : 000000007fffffff 980000000f9211a0 980000000fc26000 000000007fa51ecd
$24 : 0000000000000000 ffffffff80074290
$28 : 980000000c414000 980000000c417bb0 0000000000400000 0000000000000000
Hi : 0000000000000000
Lo : 003d08dbda057200
epc : 0000000000000000 0x0 Not tainted
ra : 0000000000000000 0x0
Status: 9001fce3 KX SX UX KERNEL EXL IE
Cause : 00000008
BadVA : 0000000000000000
PrId : 00002321 (R5000)
Modules linked in: parport_pc lp parport ipv6 deflate zlib_deflate ctr twofish twofish_common camellia serpent blowfish des_generic cbc aes_generic xcbc sha256_generic sha1_generic crypto_null crypto_blkcipher dm_snapshot dm_mirror dm_mod ehci_hcd ohci_hcd r8169 usbcore sg evdev
Process hald-probe-stor (pid: 1937, threadinfo=980000000c414000, task=980000000ebf47d8)
Stack : 980000000c417be0 980000000c417de0 0800000000000000 980000000c417bb0
00000008ffffff86 0000000000000000 0200000000000001 000006d600000000
0000000000000000 980000000c417de0 980000000fdeb610 0000000000000001
0000000000005326 ffffffff802460b0 0000000070023a00 000000000f9211a0
ffffffff80490000 ffffffff8024bb84 980000000fc10e80 980000000f80bb28
0000000000000000 980000000f9210e0 0000010100000001 00000000800d1618
0000000000000004 980000000fc8f850 000000007fffffff 980000000fde4000
0000000000005326 000000007fffffff 980000000c407540 980000000f9211a0
980000000fc26000 000000007fa51ecd ffffffff80245c6c 980000000c407540
0000000000000000 fffffffffffffdfd 0000000000005326 ffffffff801ad8bc
...
Call Trace:
[<ffffffff802460b0>] sr_drive_status+0x50/0xe8
[<ffffffff8024bb84>] cdrom_ioctl+0x5f4/0x1208
[<ffffffff80245c6c>] sr_block_ioctl+0x64/0xe8
[<ffffffff801ad8bc>] compat_blkdev_ioctl+0x7cc/0x18e0
[<ffffffff800d1870>] do_open+0x98/0x310
[<ffffffff800d1d60>] blkdev_open+0x0/0xc0
[<ffffffff800d1da8>] blkdev_open+0x48/0xc0
[<ffffffff8009c444>] __dentry_open+0x114/0x2e0
[<ffffffff8009c740>] do_filp_open+0x48/0x58
[<ffffffff8009c740>] do_filp_open+0x48/0x58
[<ffffffff800def8c>] compat_sys_ioctl+0xf4/0x440
[<ffffffff80019154>] handle_sys+0x114/0x130
[<ffffffff8001fcf3>] fpu_emulator_cop1Handler+0x362/0x2270
Code: (Bad address in epc)
[0]http://lists.debian.org/debian-mips/2008/03/msg00082.html
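Each entry in the call trace above has the form "[<address>] symbol+0xoffset/0xsize", so the start address of each function can be recovered by subtracting the offset from the address. The following is a minimal sketch of that arithmetic; the names TRACE_RE, parse_trace and SAMPLE are hypothetical helpers introduced here for illustration and are not part of any kernel tooling.

```python
import re

# One trace entry looks like: [<ffffffff802460b0>] sr_drive_status+0x50/0xe8
TRACE_RE = re.compile(
    r"\[<([0-9a-f]+)>\]\s+(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)"
)


def parse_trace(text):
    """Return one dict per call-trace entry found in text."""
    entries = []
    for match in TRACE_RE.finditer(text):
        addr = int(match.group(1), 16)
        offset = int(match.group(3), 16)
        entries.append({
            "addr": addr,                    # address recorded in the trace
            "symbol": match.group(2),        # function name
            "offset": offset,                # offset into the function
            "size": int(match.group(4), 16), # function size in bytes
            "base": addr - offset,           # start address of the function
        })
    return entries


# First three entries copied from the oops above.
SAMPLE = """
[<ffffffff802460b0>] sr_drive_status+0x50/0xe8
[<ffffffff8024bb84>] cdrom_ioctl+0x5f4/0x1208
[<ffffffff80245c6c>] sr_block_ioctl+0x64/0xe8
"""

for entry in parse_trace(SAMPLE):
    print(f"{entry['symbol']:<16} base={entry['base']:#x} "
          f"offset={entry['offset']:#x} size={entry['size']:#x}")
```

The computed base for sr_drive_status can be compared against the address a debugger reports for the start of that function in the same vmlinux.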
* Re: new kernel oops in recent kernels
2008-03-16 15:19 new kernel oops in recent kernels Giuseppe Sacco
@ 2008-03-16 16:39 ` James Bottomley
2008-03-16 18:32 ` Giuseppe Sacco
2008-03-16 16:42 ` Matthew Wilcox
1 sibling, 1 reply; 9+ messages in thread
From: James Bottomley @ 2008-03-16 16:39 UTC
To: Giuseppe Sacco; +Cc: linux-scsi
On Sun, 2008-03-16 at 16:19 +0100, Giuseppe Sacco wrote:
> Hi all,
> While testing the latest kernels on an SGI O2, I found this new kernel oops. It was
> produced with a kernel built from last night's linux-mips.org git tree.
> A very similar oops has been reported by others[0] using 2.6.22.
>
> As you can see, the oops happens while booting the machine, when init
> runs all scripts via rc. One of those scripts runs hald-probe-storage, the
> process that actually triggers the oops.
>
> I am not able to identify the cause or to propose a solution, but I am
> willing to test any patch for this problem.
>
> Thanks a lot,
> Giuseppe
>
> CPU 0 Unable to handle kernel paging request at virtual address 0000000000000000, epc == 0000000000000000, ra == 0000000000000000
> Oops[#1]:
> Cpu 0
> $ 0 : 0000000000000000 ffffffff9001fce0 ffffffffffffff86 0000000000000028
> $ 4 : 980000000fc01140 0000000000000080 0000000000024000 0000000000000000
> $ 8 : 980000000fc54700 0000000000000001 0000000000008000 404000130a0808ff
> $12 : 0000000000000008 ffffffff801b8db8 0000000000000000 ffffffff803f0000
> $16 : 980000000ff2fa70 980000000c417bb8 980000000c417c20 980000000fdeb610
> $20 : 000000007fffffff 980000000f9211a0 980000000fc26000 000000007fa51ecd
> $24 : 0000000000000000 ffffffff80074290
> $28 : 980000000c414000 980000000c417bb0 0000000000400000 0000000000000000
> Hi : 0000000000000000
> Lo : 003d08dbda057200
> epc : 0000000000000000 0x0 Not tainted
> ra : 0000000000000000 0x0
> Status: 9001fce3 KX SX UX KERNEL EXL IE
> Cause : 00000008
> BadVA : 0000000000000000
> PrId : 00002321 (R5000)
> Modules linked in: parport_pc lp parport ipv6 deflate zlib_deflate ctr twofish twofish_common camellia serpent blowfish des_generic cbc aes_generic xcbc sha256_generic sha1_generic crypto_null crypto_blkcipher dm_snapshot dm_mirror dm_mod ehci_hcd ohci_hcd r8169 usbcore sg evdev
> Process hald-probe-stor (pid: 1937, threadinfo=980000000c414000, task=980000000ebf47d8)
> Stack : 980000000c417be0 980000000c417de0 0800000000000000 980000000c417bb0
> 00000008ffffff86 0000000000000000 0200000000000001 000006d600000000
> 0000000000000000 980000000c417de0 980000000fdeb610 0000000000000001
> 0000000000005326 ffffffff802460b0 0000000070023a00 000000000f9211a0
> ffffffff80490000 ffffffff8024bb84 980000000fc10e80 980000000f80bb28
> 0000000000000000 980000000f9210e0 0000010100000001 00000000800d1618
> 0000000000000004 980000000fc8f850 000000007fffffff 980000000fde4000
> 0000000000005326 000000007fffffff 980000000c407540 980000000f9211a0
> 980000000fc26000 000000007fa51ecd ffffffff80245c6c 980000000c407540
> 0000000000000000 fffffffffffffdfd 0000000000005326 ffffffff801ad8bc
> ...
> Call Trace:
> [<ffffffff802460b0>] sr_drive_status+0x50/0xe8
> [<ffffffff8024bb84>] cdrom_ioctl+0x5f4/0x1208
> [<ffffffff80245c6c>] sr_block_ioctl+0x64/0xe8
> [<ffffffff801ad8bc>] compat_blkdev_ioctl+0x7cc/0x18e0
> [<ffffffff800d1870>] do_open+0x98/0x310
> [<ffffffff800d1d60>] blkdev_open+0x0/0xc0
> [<ffffffff800d1da8>] blkdev_open+0x48/0xc0
> [<ffffffff8009c444>] __dentry_open+0x114/0x2e0
> [<ffffffff8009c740>] do_filp_open+0x48/0x58
> [<ffffffff8009c740>] do_filp_open+0x48/0x58
> [<ffffffff800def8c>] compat_sys_ioctl+0xf4/0x440
> [<ffffffff80019154>] handle_sys+0x114/0x130
> [<ffffffff8001fcf3>] fpu_emulator_cop1Handler+0x362/0x2270
This is a bit strange. It's obviously O2-specific, which makes it a lot
harder. Can you compile the kernel with CONFIG_DEBUG_INFO and reproduce
(just in case this changes the symbol layout)? Then ask gdb where
sr_drive_status+0x50 (or whatever it moves to) is:
    gdb vmlinux
    b *(sr_drive_status+0x50)
That should identify the file and line.
The signature implies that cdi->handle may be NULL, so you could put in
a check for that as well.
Thanks,
James
* Re: new kernel oops in recent kernels
2008-03-16 15:19 new kernel oops in recent kernels Giuseppe Sacco
2008-03-16 16:39 ` James Bottomley
@ 2008-03-16 16:42 ` Matthew Wilcox
2008-03-16 18:29 ` Giuseppe Sacco
1 sibling, 1 reply; 9+ messages in thread
From: Matthew Wilcox @ 2008-03-16 16:42 UTC
To: Giuseppe Sacco; +Cc: linux-scsi
On Sun, Mar 16, 2008 at 04:19:08PM +0100, Giuseppe Sacco wrote:
> While testing the latest kernels on an SGI O2, I found this new kernel oops. It was
> produced with a kernel built from last night's linux-mips.org git tree.
> A very similar oops has been reported by others[0] using 2.6.22.
> CPU 0 Unable to handle kernel paging request at virtual address 0000000000000000, epc == 0000000000000000, ra == 0000000000000000
I'm not familiar with MIPS; is epc the program counter? If so, this
would be a branch to 0. That's somewhat confusing as I don't see any
function pointers used within sr_drive_status(). How accurate are MIPS
backtraces?
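On MIPS, epc is the Exception Program Counter, which holds the address of the instruction that took the exception. The Cause and Status values printed in the oops can be decoded as a cross-check. The sketch below assumes the R4000-style register layout (ExcCode in Cause bits 6:2, mode and enable bits in the low byte of Status); the function names and the abbreviated EXC_CODES table are illustrative only and list just a few codes.

```python
# A few ExcCode values from the MIPS architecture; the table is not complete.
EXC_CODES = {
    0: "Int (interrupt)",
    1: "Mod (TLB modification)",
    2: "TLBL (TLB exception on load or instruction fetch)",
    3: "TLBS (TLB exception on store)",
    4: "AdEL (address error on load or instruction fetch)",
    5: "AdES (address error on store)",
}


def exc_code(cause):
    """Extract the ExcCode field, bits 6:2 of the Cause register."""
    return (cause >> 2) & 0x1F


def status_flags(status):
    """Decode the low byte of Status into the flag names shown in the oops."""
    flags = []
    for bit, name in ((7, "KX"), (6, "SX"), (5, "UX")):
        if status & (1 << bit):
            flags.append(name)
    ksu = (status >> 3) & 0x3  # privilege mode field
    flags.append({0: "KERNEL", 1: "SUPERVISOR", 2: "USER"}.get(ksu, "BAD_MODE"))
    for bit, name in ((2, "ERL"), (1, "EXL"), (0, "IE")):
        if status & (1 << bit):
            flags.append(name)
    return flags


code = exc_code(0x00000008)           # "Cause : 00000008" from the oops
print(code, EXC_CODES[code])
print(" ".join(status_flags(0x9001fce3)))  # "Status: 9001fce3" from the oops
```

An ExcCode of 2 together with epc == 0 and BadVA == 0 would be consistent with an instruction fetch from address 0, that is, a jump through a NULL pointer, although the registers alone do not show which pointer was used.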
> Call Trace:
> [<ffffffff802460b0>] sr_drive_status+0x50/0xe8
> [<ffffffff8024bb84>] cdrom_ioctl+0x5f4/0x1208
> [<ffffffff80245c6c>] sr_block_ioctl+0x64/0xe8
It would be interesting to see a disassembly (objdump -dr
drivers/scsi/sr_ioctl.o) of sr_drive_status from say 0x40 to 0x60.
And if that calls a function, it would be interesting to put in printks
to figure out where we're dereferencing a null pointer.
--
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."
* Re: new kernel oops in recent kernels
2008-03-16 16:42 ` Matthew Wilcox
@ 2008-03-16 18:29 ` Giuseppe Sacco
2008-03-17 3:58 ` Matthew Wilcox
2008-03-17 4:41 ` Matthew Wilcox
0 siblings, 2 replies; 9+ messages in thread
From: Giuseppe Sacco @ 2008-03-16 18:29 UTC
To: linux-scsi
Hi all,
On Sun, 16/03/2008 at 10.42 -0600, Matthew Wilcox wrote:
> On Sun, Mar 16, 2008 at 04:19:08PM +0100, Giuseppe Sacco wrote:
[...]
> > Call Trace:
> > [<ffffffff802460b0>] sr_drive_status+0x50/0xe8
> > [<ffffffff8024bb84>] cdrom_ioctl+0x5f4/0x1208
> > [<ffffffff80245c6c>] sr_block_ioctl+0x64/0xe8
>
> It would be interesting to see a disassembly (objdump -dr
> drivers/scsi/sr_ioctl.o) of sr_drive_status from say 0x40 to 0x60.
here it is:
(gdb) disassemble sr_drive_status+0x50
Dump of assembler code for function sr_drive_status:
0xffffffff80246060 <sr_drive_status+0>: daddiu sp,sp,-32
0xffffffff80246064 <sr_drive_status+4>: lui v0,0x7fff
0xffffffff80246068 <sr_drive_status+8>: sd s0,16(sp)
0xffffffff8024606c <sr_drive_status+12>: sd ra,24(sp)
0xffffffff80246070 <sr_drive_status+16>: ori v0,v0,0xffff
0xffffffff80246074 <sr_drive_status+20>: move s0,a0
0xffffffff80246078 <sr_drive_status+24>: bne a1,v0,0xffffffff802460e8 <sr_drive_status+136>
0xffffffff8024607c <sr_drive_status+28>: ld v1,24(a0)
0xffffffff80246080 <sr_drive_status+32>: ld a0,16(v1)
0xffffffff80246084 <sr_drive_status+36>: jal 0xffffffff80244c70 <sr_test_unit_ready>
0xffffffff80246088 <sr_drive_status+40>: daddiu a1,sp,4
0xffffffff8024608c <sr_drive_status+44>: bnez v0,0xffffffff802460a8 <sr_drive_status+72>
0xffffffff80246090 <sr_drive_status+48>: move a0,s0
0xffffffff80246094 <sr_drive_status+52>: li v0,4
0xffffffff80246098 <sr_drive_status+56>: ld ra,24(sp)
0xffffffff8024609c <sr_drive_status+60>: ld s0,16(sp)
0xffffffff802460a0 <sr_drive_status+64>: jr ra
0xffffffff802460a4 <sr_drive_status+68>: daddiu sp,sp,32
0xffffffff802460a8 <sr_drive_status+72>: jal 0xffffffff8024c838 <cdrom_get_media_event>
0xffffffff802460ac <sr_drive_status+76>: move a1,sp
0xffffffff802460b0 <sr_drive_status+80>: bnez v0,0xffffffff802460fc <sr_drive_status+156>
0xffffffff802460b4 <sr_drive_status+84>: lhu v0,0(sp)
0xffffffff802460b8 <sr_drive_status+88>: sll v0,v0,0x0
0xffffffff802460bc <sr_drive_status+92>: andi v0,v0,0xff
0xffffffff802460c0 <sr_drive_status+96>: andi v1,v0,0x2
0xffffffff802460c4 <sr_drive_status+100>: bnez v1,0xffffffff80246094 <sr_drive_status+52>
0xffffffff802460c8 <sr_drive_status+104>: andi v0,v0,0x1
0xffffffff802460cc <sr_drive_status+108>: beqz v0,0xffffffff80246098 <sr_drive_status+56>
0xffffffff802460d0 <sr_drive_status+112>: li v0,1
0xffffffff802460d4 <sr_drive_status+116>: ld ra,24(sp)
> And if that calls a function, it would be interesting to put in printks
> to figure out where we're dereferencing a null pointer.
>
* Re: new kernel oops in recent kernels
2008-03-16 16:39 ` James Bottomley
@ 2008-03-16 18:32 ` Giuseppe Sacco
2008-03-16 18:47 ` James Bottomley
0 siblings, 1 reply; 9+ messages in thread
From: Giuseppe Sacco @ 2008-03-16 18:32 UTC
To: linux-scsi
Hi James,
On Sun, 16/03/2008 at 11.39 -0500, James Bottomley wrote:
> On Sun, 2008-03-16 at 16:19 +0100, Giuseppe Sacco wrote:
[...]
> This is a bit strange. It's obviously O2 specific, which makes it a lot
> harder. Can you compile the kernel with CONFIG_DEBUG_INFO and reproduce
> (just in case this changes the symbol layout). Then ask gdb where
[...]
I cannot find any CONFIG_DEBUG_INFO. Do you mean CONFIG_DEBUG_KERNEL?
Thanks,
Giuseppe
* Re: new kernel oops in recent kernels
2008-03-16 18:32 ` Giuseppe Sacco
@ 2008-03-16 18:47 ` James Bottomley
0 siblings, 0 replies; 9+ messages in thread
From: James Bottomley @ 2008-03-16 18:47 UTC
To: Giuseppe Sacco; +Cc: linux-scsi
On Sun, 2008-03-16 at 19:32 +0100, Giuseppe Sacco wrote:
> Hi James,
>
> On Sun, 16/03/2008 at 11.39 -0500, James Bottomley wrote:
> > On Sun, 2008-03-16 at 16:19 +0100, Giuseppe Sacco wrote:
> [...]
> > This is a bit strange. It's obviously O2 specific, which makes it a lot
> > harder. Can you compile the kernel with CONFIG_DEBUG_INFO and reproduce
> > (just in case this changes the symbol layout). Then ask gdb where
> [...]
>
> I cannot find any CONFIG_DEBUG_INFO. Do you mean CONFIG_DEBUG_KERNEL?
This is from lib/Kconfig.debug:
config DEBUG_INFO
    bool "Compile the kernel with debug info"
    depends on DEBUG_KERNEL
    help
      If you say Y here the resulting kernel image will include
      debugging info resulting in a larger kernel image.
      This adds debug symbols to the kernel and modules (gcc -g), and
      is needed if you intend to use kernel crashdump or binary object
      tools like crash, kgdb, LKCD, gdb, etc on the kernel.
      Say Y here only if you plan to debug the kernel.
      If unsure, say N.
It does depend on CONFIG_DEBUG_KERNEL according to the depends clause.
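Because of that "depends on" line, the DEBUG_INFO option is only offered once DEBUG_KERNEL is enabled, which may be why it could not be found. The sketch below checks a .config-style text for the two options; parse_config and debug_info_state are hypothetical helper names, and the sample text is made up for illustration rather than taken from the reporter's configuration.

```python
def parse_config(text):
    """Return the set of CONFIG_ symbols set to y or m in .config-style text."""
    enabled = set()
    for line in text.splitlines():
        line = line.strip()
        # Disabled options appear as comments: "# CONFIG_FOO is not set"
        if line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if value in ("y", "m"):
            enabled.add(name)
    return enabled


def debug_info_state(enabled):
    """Describe whether CONFIG_DEBUG_INFO is set, selectable, or hidden."""
    if "CONFIG_DEBUG_INFO" in enabled:
        return "enabled"
    if "CONFIG_DEBUG_KERNEL" not in enabled:
        return "hidden: enable CONFIG_DEBUG_KERNEL first"
    return "available but not set"


# Made-up sample, not the configuration used in this thread.
sample = """
# CONFIG_DEBUG_KERNEL is not set
CONFIG_BLK_DEV_SR=y
"""
print(debug_info_state(parse_config(sample)))
```

To run it against a real tree, the text of the .config file in the build directory can be read and passed to parse_config in place of the sample.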
James
* Re: new kernel oops in recent kernels
2008-03-16 18:29 ` Giuseppe Sacco
@ 2008-03-17 3:58 ` Matthew Wilcox
2008-03-17 4:41 ` Matthew Wilcox
1 sibling, 0 replies; 9+ messages in thread
From: Matthew Wilcox @ 2008-03-17 3:58 UTC
To: Giuseppe Sacco; +Cc: linux-scsi
On Sun, Mar 16, 2008 at 07:29:07PM +0100, Giuseppe Sacco wrote:
> > It would be interesting to see a disassembly (objdump -dr
> > drivers/scsi/sr_ioctl.o) of sr_drive_status from say 0x40 to 0x60.
>
> here it is:
>
> (gdb) disassemble sr_drive_status+0x50
> 0xffffffff802460b0 <sr_drive_status+80>: bnez v0,0xffffffff802460fc <sr_drive_status+156>
The thing about objdump -dr is that it gives you the name of the
function that's being called. gdb apparently doesn't, or would need a
different command from "disassemble".
--
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."
* Re: new kernel oops in recent kernels
2008-03-16 18:29 ` Giuseppe Sacco
2008-03-17 3:58 ` Matthew Wilcox
@ 2008-03-17 4:41 ` Matthew Wilcox
2008-03-17 8:17 ` Giuseppe Sacco
1 sibling, 1 reply; 9+ messages in thread
From: Matthew Wilcox @ 2008-03-17 4:41 UTC
To: Giuseppe Sacco; +Cc: linux-scsi
On Sun, Mar 16, 2008 at 07:29:07PM +0100, Giuseppe Sacco wrote:
> > > [<ffffffff802460b0>] sr_drive_status+0x50/0xe8
> > > [<ffffffff8024bb84>] cdrom_ioctl+0x5f4/0x1208
> > > [<ffffffff80245c6c>] sr_block_ioctl+0x64/0xe8
> >
> 0xffffffff802460a4 <sr_drive_status+68>: daddiu sp,sp,32
> 0xffffffff802460a8 <sr_drive_status+72>: jal 0xffffffff8024c838 <cdrom_get_media_event>
> 0xffffffff802460ac <sr_drive_status+76>: move a1,sp
> 0xffffffff802460b0 <sr_drive_status+80>: bnez v0,0xffffffff802460fc <sr_drive_status+156>
I think I was confused earlier. 156 is 0x9c, thus within the function.
The backtrace must be incorrect; this is really 0x48 and thus a call to
cdrom_get_media_event, which points the finger at
cdi->ops->generic_packet being NULL.
Put a BUG_ON(!cdi->ops->generic_packet) in drivers/cdrom/cdrom.c right
before the line that calls it (ie line 11 of cdrom_get_media_event).
That should trigger and give a better backtrace. Then it's a simple (*)
matter of figuring out why it's NULL.
* This is sarcasm.
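The offsets in this analysis can be checked with a little arithmetic. One possible reading, which is an assumption and not something confirmed in this thread, is that the trace entry is the saved return address: on MIPS a jal writes the address of the instruction after its delay slot (jal address + 8) into ra, so a call at +0x48 would show up as +0x50. The constant and function names below are made up for illustration; the numbers come from the trace and the disassembly.

```python
FUNC_SIZE = 0xe8      # from "sr_drive_status+0x50/0xe8"
TRACE_OFFSET = 0x50   # offset reported in the call trace
JAL_OFFSET = 72       # "jal cdrom_get_media_event" at <sr_drive_status+72>
BRANCH_TARGET = 156   # target of the bnez at <sr_drive_status+80>


def return_offset(jal_offset):
    """Offset saved in ra by a MIPS jal: the jal, its delay slot, then return."""
    return jal_offset + 8


print(hex(BRANCH_TARGET))              # branch target as a hex offset
print(BRANCH_TARGET < FUNC_SIZE)       # is the target inside the function?
print(hex(JAL_OFFSET))                 # offset of the jal itself
print(hex(return_offset(JAL_OFFSET)))  # return address offset after the jal
```

If that reading holds, the entry at +0x50 and the call at +0x48 describe the same call site, which would still point at the call to cdrom_get_media_event.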
--
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."
* Re: new kernel oops in recent kernels
2008-03-17 4:41 ` Matthew Wilcox
@ 2008-03-17 8:17 ` Giuseppe Sacco
0 siblings, 0 replies; 9+ messages in thread
From: Giuseppe Sacco @ 2008-03-17 8:17 UTC
To: linux-scsi
Hi all,
I wrote to the linux-mips mailing list to investigate the assembly
code generated for sr_cdrom_status(), since adding the suggested
printk() stopped the oops. I suspected a problem with the
compiler, but people there are investigating a possible cache
coherency problem.
You may follow the thread in the public web archive at
http://www.linux-mips.org/archives/linux-mips/2008-03/msg00079.html
I'll be back on this list after checking for problems with cache
coherency and the C compiler.
Thanks,
Giuseppe
Thread overview: 9+ messages
2008-03-16 15:19 new kernel oops in recent kernels Giuseppe Sacco
2008-03-16 16:39 ` James Bottomley
2008-03-16 18:32 ` Giuseppe Sacco
2008-03-16 18:47 ` James Bottomley
2008-03-16 16:42 ` Matthew Wilcox
2008-03-16 18:29 ` Giuseppe Sacco
2008-03-17 3:58 ` Matthew Wilcox
2008-03-17 4:41 ` Matthew Wilcox
2008-03-17 8:17 ` Giuseppe Sacco