[Qemu-devel] Intermittent Linux kernel panic on ARM

* [Qemu-devel] Intermittent Linux kernel panic on ARM
From: Quentin Barnes @ 2007-03-07 18:14 UTC
  To: qemu-devel

This is my first post to the list.  Hopefully, it will go well.

I've been using the ARM qemu for Linux development for some basic
work, but wanted to expand and do more with it.  I outgrew the
initrd limitation and needed a disk.  Since I started using a disk,
I've been getting intermittent panics during the udevd phase of
boot.  Once the system is up, though, it's been stable.

The panic occurs about 50%-75% of the time.  I see it on both qemu
0.9.0 and the 2007-03-07_05 snapshot, and with both my own 2.6.19-1
ARM kernel built directly from kernel.org sources and Aurelien
Jarno's 2.6.18 Debian ARM kernel at
http://people.debian.org/~aurel32/arm-versatile/vmlinuz-2.6.18-4-versatile

My standard invocation:
  $ qemu-system-arm -M versatilepb -k en-us -kernel zImage \
	-initrd initrd.img-2.6.18-4-versatile -hda hda.img \
	-monitor stdio -append "root=/dev/sda1"

My host system is Red Hat FC6 with a Linux 2.6.19-1 i686 kernel.  My
"disk" is a 20GB qcow image.

Based on the stack traceback, I thought the panic might have to do
with the SCSI chip emulation not doing residuals correctly, so I
built a kernel with SYM_SETUP_RESIDUAL_SUPPORT set to 0.  That didn't
change anything.

I've googled around and checked the mailing list archives and can't
find anything like this.  Why haven't other people seen this?
Am I doing something unusual or wrong?

Any thoughts or ideas to try?

Quentin


Here are the panic specifics:
=================================

Gdb attached to qemu in gdbserver mode with breakpoint on "sym_evaluate_dp":
=====
Breakpoint 1, sym_evaluate_dp (np=0xffd00000, cp=0xffd00c00, scr=1342180112, 
    ofs=0xc09dd9fc) at drivers/scsi/sym53c8xx_2/sym_hipd.c:3570
3570    in drivers/scsi/sym53c8xx_2/sym_hipd.c
(gdb) c
Continuing.

Breakpoint 1, sym_evaluate_dp (np=0xffd00000, cp=0xffd04c00, scr=1342180112, 
    ofs=0xc063b9fc) at drivers/scsi/sym53c8xx_2/sym_hipd.c:3570
3570    in drivers/scsi/sym53c8xx_2/sym_hipd.c
(gdb) c
=====
After this continue, the kernel panics with a fault at ffd05a98.
Note that cp=0xffd04c00 is different this time.  The other couple
of dozen times, the function is entered with the same value,
cp=0xffd00c00.


Boot console output:
=====
PCI: enabling device 0000:00:0c.0 (0140 -> 0143)
sym0: <895a> rev 0x0 at pci 0000:00:0c.0 irq 27
sym0: No NVRAM, ID 7, Fast-40, LVD, parity checking
sym0: SCSI BUS has been reset.
scsi0 : sym-2.2.3
scsi 0:0:0:0: Direct-Access     QEMU     QEMU HARDDISK    0.9. PQ: 0 ANSI: 3
 target0:0:0: tagged command queuing enabled, command queue depth 16.
 target0:0:0: Beginning Domain Validation
 target0:0:0: Domain Validation skipping write tests
 target0:0:0: Ending Domain Validation
scsi 0:0:2:0: CD-ROM            QEMU     QEMU CD-ROM      0.9. PQ: 0 ANSI: 3
 target0:0:2: tagged command queuing enabled, command queue depth 16.
 target0:0:2: Beginning Domain Validation
 target0:0:2: Domain Validation skipping write tests
 target0:0:2: Ending Domain Validation
SCSI device sda: 41943040 512-byte hdwr sectors (21475 MB)
sda: Write Protect is off
sda: Mode Sense: 13 00 00 00
SCSI device sda: drive cache: write back
SCSI device sda: 41943040 512-byte hdwr sectors (21475 MB)
sda: Write Protect is off
sda: Mode Sense: 13 00 00 00
SCSI device sda: drive cache: write back
 sda: sda1 sda2 < sda5 >
sd 0:0:0:0: Attached scsi disk sda
[...]
INIT: version 2.86 booting
Starting the hotplug events dispatcher: udevd.
Synthesizing the initial hotplug events...done.
Waiting for /dev to be fully populated...Unable to handle kernel paging request at virtual address ffd05a98
pgd = c05a8000
[ffd05a98] *pgd=00a6d011, *pte=00000000, *ppte=00000000
Internal error: Oops: 17 [#1]
Modules linked in:
CPU: 0
PC is at sym_evaluate_dp+0x9c/0x184
LR is at 0xff0000da
pc : [<c01a97a0>]    lr : [<ff0000da>]    Not tainted
sp : c0bc79e4  ip : ffd05aa0  fp : c0bc79f4
r10: 00000000  r9 : 00000000  r8 : 000009f8
r7 : 0000027e  r6 : c7da6180  r5 : ffd00000  r4 : c0bc79fc
r3 : 00000ea0  r2 : 0000005f  r1 : ffd04c00  r0 : 000001c9
Flags: nzCv  IRQs off  FIQs on  Mode SVC_32  Segment user
Control: 3137
Table: 005A8000  DAC: 00000015
Process scsi_id (pid: 976, stack limit = 0xc0bc6258)
[... Stack contents removed ...]
Backtrace: 
[<c01a9704>] (sym_evaluate_dp+0x0/0x184) from [<c01a99fc>] (sym_compute_residual+0x78/0xe4)
 r4 = FFD04C00 
[<c01a9984>] (sym_compute_residual+0x0/0xe4) from [<c01acfb4>] (sym_interrupt+0xf8/0x18e0)
 r4 = FFD04C00 
[<c01acebc>] (sym_interrupt+0x0/0x18e0) from [<c01a7844>] (sym53c8xx_intr+0x3c/0x6c)
[<c01a7808>] (sym53c8xx_intr+0x0/0x6c) from [<c005c458>] (handle_IRQ_event+0x44/0x84)
 r5 = 00000000  r4 = C7D79E60 
[<c005c414>] (handle_IRQ_event+0x0/0x84) from [<c005dba8>] (handle_level_irq+0xac/0x104)
 r7 = C7DA4800  r6 = 00000001  r5 = 0000001B  r4 = C029A6C0
[<c005dafc>] (handle_level_irq+0x0/0x104) from [<c0023780>] (asm_do_IRQ+0x4c/0x68)
 r5 = F1140000  r4 = 00000000 
[<c0023734>] (asm_do_IRQ+0x0/0x68) from [<c02443f0>] (__irq_svc+0x30/0xa0)
 r4 = FFFFFFFF 
[<c019a760>] (scsi_dispatch_cmd+0x0/0x25c) from [<c019fa40>] (scsi_request_fn+0x250/0x31c)
 r7 = C7DA03E8  r6 = C7DABC00  r5 = C7DA4800  r4 = C7DA4000
[<c019f7f0>] (scsi_request_fn+0x0/0x31c) from [<c013f974>] (elv_insert+0x80/0x1c4)
[<c013f8f4>] (elv_insert+0x0/0x1c4) from [<c013fb6c>] (__elv_add_request+0xb4/0xb8)
 r7 = C0BC7CB0  r6 = 00000002  r5 = C7DABC00  r4 = C7DA03E8
[<c013fab8>] (__elv_add_request+0x0/0xb8) from [<c0142914>] (blk_execute_rq_nowait+0x80/0xa8)
 r6 = 00000002  r5 = C7DABC00  r4 = C7DA03E8 
[<c0142894>] (blk_execute_rq_nowait+0x0/0xa8) from [<c01429c4>] (blk_execute_rq+0x88/0xa8)
 r6 = C7DA59E0  r5 = 00000000  r4 = C7DA03E8 
[<c014293c>] (blk_execute_rq+0x0/0xa8) from [<c0146204>] (sg_io+0x28c/0x3b4)
[<c0145f78>] (sg_io+0x0/0x3b4) from [<c0146850>] (scsi_cmd_ioctl+0x1e4/0x41c)
[<c014666c>] (scsi_cmd_ioctl+0x0/0x41c) from [<c01b1520>] (sd_ioctl+0x90/0xc0)
[<c01b1490>] (sd_ioctl+0x0/0xc0) from [<c0144708>] (blkdev_driver_ioctl+0x50/0x5c)
[<c01446b8>] (blkdev_driver_ioctl+0x0/0x5c) from [<c0144eac>] (blkdev_ioctl+0x754/0x7b0)
 r5 = BE992400  r4 = FFFFFDFD 
[<c0144758>] (blkdev_ioctl+0x0/0x7b0) from [<c00a2464>] (block_ioctl+0x2c/0x30)
[<c00a2438>] (block_ioctl+0x0/0x30) from [<c0089060>] (do_ioctl+0x34/0x74)
[<c008902c>] (do_ioctl+0x0/0x74) from [<c0089304>] (vfs_ioctl+0x264/0x294)
 r5 = BE992400  r4 = C0BA0D20 
[<c00890a0>] (vfs_ioctl+0x0/0x294) from [<c0089374>] (sys_ioctl+0x40/0x64)
 r7 = 00000036  r6 = 00002285  r5 = FFFFFFF7  r4 = C0BA0D20
[<c0089334>] (sys_ioctl+0x0/0x64) from [<c00228c0>] (ret_fast_syscall+0x0/0x2c)
 r6 = 00016A88  r5 = 00000003  r4 = 00000006 
Code: e35e0000 e2632060 aa00000f ea000008 (e53c3008) 
 <0>Kernel panic - not syncing: Aiee, killing interrupt handler!
=====
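
The faulting address can be cross-checked against the register dump and the
"Code:" line above.  The sketch below decodes the bracketed instruction word
e53c3008 as an A32 load/store with a 12-bit immediate offset and recomputes
the effective address.  It assumes the ip value shown in the dump is the base
register value before writeback; the helper name decode_ldr_str_imm is made up
for illustration and is not part of any tool used in this thread.

```python
def decode_ldr_str_imm(word):
    """Decode an ARM (A32) word/byte load/store with a 12-bit immediate offset."""
    if (word >> 25) & 0b111 != 0b010:
        raise ValueError("not an immediate-offset LDR/STR encoding")
    imm12 = word & 0xFFF
    return {
        "load": bool((word >> 20) & 1),         # L bit: 1 = load, 0 = store
        "writeback": bool((word >> 21) & 1),    # W bit
        "byte": bool((word >> 22) & 1),         # B bit: 1 = byte access
        "pre_indexed": bool((word >> 24) & 1),  # P bit
        "rn": (word >> 16) & 0xF,               # base register number
        "rd": (word >> 12) & 0xF,               # transfer register number
        # U bit selects whether the offset is added or subtracted
        "offset": imm12 if (word >> 23) & 1 else -imm12,
    }

insn = decode_ldr_str_imm(0xE53C3008)  # bracketed word from the "Code:" line
ip = 0xFFD05AA0                        # ip (r12) from the register dump
r1, r3 = 0xFFD04C00, 0x00000EA0        # r1 (same value as cp) and r3 from the dump

# Assumption: ip above is the base value before the pre-indexed writeback.
fault_addr = (ip + insn["offset"]) & 0xFFFFFFFF

print(insn)
print(hex(fault_addr))  # compare with the address in the oops message
print(hex(r1 + r3))     # compare with ip
```

With these inputs the word decodes as a pre-indexed load into r3 from
[ip, #-8] with writeback, the computed address is 0xffd05a98 (the faulting
address), and r1 + r3 equals ip.  That suggests the load reads 8 bytes below
cp + 0xea0; which structure member that corresponds to would need the
kernel's struct layout to confirm.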


Contents of "cp" on entry to sym_evaluate_dp() just before it panicked:
=====
(gdb) print *cp
$6 = {phys = {head = {go = {start = 0x50000058, restart = 0x500004c0}, 
      savep = 0x500007e8, lastp = 0x50000b10, status = "\000\204\000@"}, 
    pm0 = {sg = {size = 0xffffff26, addr = 0x5840da}, ret = 0x50001338}, 
    pm1 = {sg = {size = 0x0, addr = 0x0}, ret = 0x0}, select = {
      sel_scntl4 = 0x0, sel_sxfer = 0x0, sel_id = 0x0, sel_scntl3 = 0x7}, 
    smsg = {size = 0x8, addr = 0xa44f9c}, smsg_ext = {size = 0x0, 
      addr = 0x7d524de}, cmd = {size = 0x6, addr = 0xa44f5c}, sense = {
      size = 0x0, addr = 0x0}, wresid = {size = 0x0, addr = 0x0}, data = {{
        size = 0x0, addr = 0x0} <repeats 80 times>, {size = 0x2000, 
        addr = 0x7eb8000}, {size = 0x2000, addr = 0x7dce000}, {size = 0x2000, 
        addr = 0x7d6e000}, {size = 0x2000, addr = 0x7df0000}, {size = 0x2000, 
        addr = 0x7de4000}, {size = 0x2000, addr = 0x9d0000}, {size = 0x1000, 
        addr = 0x71f000}, {size = 0x1000, addr = 0x7c25000}, {size = 0x1000, 
        addr = 0xaff000}, {size = 0x1000, addr = 0xae2000}, {size = 0x1000, 
        addr = 0x73d000}, {size = 0x1000, addr = 0x722000}, {size = 0x1000, 
        addr = 0x6a2000}, {size = 0x1000, addr = 0x742000}, {size = 0x1000, 
        addr = 0x634000}, {size = 0xfe, addr = 0x584000}}}, cmd = 0xc7da6180, 
  cdb_buf = "\022\001\000\000\376\000\000\000\b\000\000\000\000\000\000", 
  sns_bbuf = '\0' <repeats 31 times>, data_len = 0xfe, segments = 0x1, 
  order = 0x20, odd_byte_adjustment = 0x0, nego_status = 0x1, 
  xerr_status = 0x0, extra_bytes = 0x0, 
  scsi_smsg = "\300 e\001\003\001\n\037\000\000\000", 
  scsi_smsg2 = '\0' <repeats 11 times>, sensecmd = "\000\000\000\000\000", 
  sv_scsi_status = 0x0, sv_xerr_status = 0x0, sv_resid = 0x0, 
  ccb_ba = 0xa44c00, tag = 0x32, target = 0x0, lun = 0x0, link_ccbh = 0x0, 
  link_ccbq = {flink = 0xffd004fc, blink = 0xffd004fc}, startp = 0x500007e8, 
  goalp = 0x500007f8, ext_sg = 0xffffffff, ext_ofs = 0x0, to_abort = 0x0, 
  tags_si = 0x1}
=====
=================================

* Re: [Qemu-devel] Intermittent Linux kernel panic on ARM
From: Rob Landley @ 2007-03-10  0:05 UTC
  To: qemu-devel; +Cc: Quentin Barnes

On Wednesday 07 March 2007 1:14 pm, Quentin Barnes wrote:
> This is my first post to the list.  Hopefully, it will go well.
> 
> I've been using the ARM qemu for Linux development for some basic
> work, but wanted to expand and do more with it.  I outgrew the
> initrd limitation and needed a disk.  Since I've been using a disk,
> I've been getting intermittent panics during the udevd phase of
> boot.  Once the system is up though, it's been stable.
> 
> The panic occurs about 50%-75% of the time.  I've had this problem on both
> 0.9.0 and the 2007-03-07_05 snapshot.  I've had this problem using both
> my own 2.6.19-1 ARM kernel made directly from kernel.org as well as
> Aurelien Jarno's 2.6.18 Debian ARM kernel at
> http://people.debian.org/~aurel32/arm-versatile/vmlinuz-2.6.18-4-versatile 

The first thing I'd do is try to figure out what udev is doing to trigger
this panic.

>  r7 = 00000036  r6 = 00002285  r5 = FFFFFFF7  r4 = C0BA0D20
> [<c0089334>] (sys_ioctl+0x0/0x64) from [<c00228c0>] (ret_fast_syscall+0x0/0x2c)

Is there any way you can figure out which ioctl this is?  Presumably udev read 
something from /sys that told it to mknod something in /dev.  I'm not quite 
sure where an ioctl comes into this...
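
One possible way to narrow this down from the backtrace alone is to look up
the value shown in r6 for the sys_ioctl frame (0x2285).  The sketch below
assumes that r6 still holds the ioctl request number at that point, which the
backtrace does not guarantee.  The table lists a few request numbers from the
Linux <scsi/sg.h> header; the helper name name_ioctl is made up for
illustration.

```python
# A few request numbers from the Linux <scsi/sg.h> header.
SG_IOCTLS = {
    0x2201: "SG_SET_TIMEOUT",
    0x2202: "SG_GET_TIMEOUT",
    0x2203: "SG_EMULATED_HOST",
    0x2282: "SG_GET_VERSION_NUM",
    0x2285: "SG_IO",
}

def name_ioctl(request):
    """Return the sg ioctl name for a request number, or a hex fallback."""
    return SG_IOCTLS.get(request, "unknown (0x%x)" % request)

# Assumption: r6 in the sys_ioctl frame is the ioctl request argument.
print(name_ioctl(0x2285))
```

A match on SG_IO would fit the sd_ioctl -> scsi_cmd_ioctl -> sg_io path in
the backtrace.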

If you could get a small C program that triggers the panic, and a 
kernel .config you built your kernel with, that would be helpful.

Rob
-- 
Vista: Windows Millenium Second Edition

* Re: [Qemu-devel] Intermittent Linux kernel panic on ARM
From: Quentin Barnes @ 2007-03-12  5:33 UTC
  To: Rob Landley; +Cc: qemu-devel

On Fri, Mar 09, 2007 at 07:05:27PM -0500, Rob Landley wrote:
>The first thing I'd do is try to figure out what udev is doing to trigger
>this panic.
>
>>  r7 = 00000036  r6 = 00002285  r5 = FFFFFFF7  r4 = C0BA0D20
>> [<c0089334>] (sys_ioctl+0x0/0x64) from [<c00228c0>] (ret_fast_syscall+0x0/0x2c)
>
>Is there any way you can figure out which ioctl this is?
>Presumably udev read something from /sys that told it to mknod
>something in /dev.  I'm not quite sure where an ioctl comes into
>this...
>
>If you could get a small C program that triggers the panic, and a
>kernel .config you built your kernel with, that would be helpful.

I don't know whether a small C program would trigger the panic.
The same ioctl is issued earlier in startup and doesn't panic there.
However, I could still give it a try at some point if we have
no other ideas.

The ioctl is an SG_IO that issues a SCSI INQUIRY command:
==============
Breakpoint 3, scsi_dispatch_cmd (cmd=0xc7db7180) at drivers/scsi/scsi.c:475
475             struct Scsi_Host *host = cmd->device->host;
1: x/i $pc  0xc019e7f4 <scsi_dispatch_cmd+12>:  ldr     r1, [r0]
(gdb) print /c cmd->cmnd
$8 = {0x12, 0x1, 0x0, 0x0, 0xfe, 0x0 <repeats 11 times>}
==============

The SCSI INQUIRY command is properly formed and dispatched with EVPD=1
to read VPD page 0x00.
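
For reference, the CDB bytes printed by gdb above can be decoded field by
field.  This is a minimal sketch rather than a full SCSI parser: it treats
bytes 3-4 as a big-endian allocation length (byte 3 is zero here, so this
agrees with the older single-byte layout), and the helper name
decode_inquiry_cdb is made up for illustration.

```python
def decode_inquiry_cdb(cdb):
    """Decode the first six bytes of a SCSI INQUIRY CDB (opcode 0x12)."""
    if len(cdb) < 6 or cdb[0] != 0x12:
        raise ValueError("not an INQUIRY CDB")
    return {
        "evpd": bool(cdb[1] & 0x01),          # bit 0 of byte 1
        "page_code": cdb[2],                  # VPD page when EVPD is set
        "alloc_len": (cdb[3] << 8) | cdb[4],  # byte 3 is zero in this CDB
        "control": cdb[5],
    }

# cmd->cmnd as printed by gdb: 16 bytes, only the first five non-trivial.
cdb = [0x12, 0x01, 0x00, 0x00, 0xFE, 0x00] + [0x00] * 10
fields = decode_inquiry_cdb(cdb)
print(fields)
```

The decoded allocation length of 0xfe (254 bytes) matches data_len = 0xfe in
the "cp" dump from the first message.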

The completion path is: sym_interrupt() -> sym_wakeup_done() -> sym_complete_ok().

In sym_complete_ok(), it executes:
	if (cp->phys.head.lastp != cp->goalp)
		resid = sym_compute_residual(np, cp);

cp->phys.head.lastp is 0x50000b10 and cp->goalp is 0x500007f8.  Since
they're not equal, the driver thinks there is a residual.

cp->startp is 0x500007e8, which seems to make sense to me.  I would
expect "lastp" to be between "startp" and "goalp", but it's not.
I'm just guessing here, though, since I don't know SCSI at all.
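
The expectation above can be written down as a quick check on the three
pointer values.  This only tests the naive "lastp lies between startp and
goalp" assumption stated in the text; it is not a model of what the
sym53c8xx_2 driver itself does when computing a residual.

```python
def between(lo, value, hi):
    """True if value lies in the closed interval [lo, hi]."""
    return lo <= value <= hi

startp = 0x500007E8  # cp->startp
goalp = 0x500007F8   # cp->goalp
lastp = 0x50000B10   # cp->phys.head.lastp

print(between(startp, lastp, goalp))  # the naive expectation described above
print(hex(goalp - startp))            # size of the startp..goalp window
print(hex(lastp - goalp))             # how far lastp lies beyond goalp
```

For these values the check fails: the startp..goalp window is 0x10 bytes and
lastp sits 0x318 bytes past goalp.  Whether that is ever a legal state (for
example a pointer into a different script region) depends on driver internals
not shown in this thread.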

Any ideas what might be wrong?


Partial contents of "cp" that leads up to panic:
==============
$17 = {phys = {head = {go = {start = 0x50000058, restart = 0x500004c0}, 
      savep = 0x500007e8, lastp = 0x50000b10, status = "\000\204\000@"}, 
[...]
  order = 0x20, odd_byte_adjustment = 0x0, nego_status = 0x1, 
  xerr_status = 0x0, extra_bytes = 0x0, 
  scsi_smsg = "\xc0 g\001\003\001\n\037\000\000\000", 
  scsi_smsg2 = '\0' <repeats 11 times>, sensecmd = "\000\000\000\000\000", 
  sv_scsi_status = 0x0, sv_xerr_status = 0x0, sv_resid = 0x0, 
  ccb_ba = 0x7db3c00, tag = 0x33, target = 0x0, lun = 0x0, link_ccbh = 0x0, 
  link_ccbq = {flink = 0xffd004fc, blink = 0xffd004fc}, startp = 0x500007e8, 
  goalp = 0x500007f8, ext_sg = 0xffffffff, ext_ofs = 0x0, to_abort = 0x0, 
  tags_si = 0x1}
==============

A strange thing to note is that this panic is only intermittent when
in graphics mode, but happens 100% of the time when qemu is in tty
console mode.  If the boot makes it past this point, this system is
really stable.  I've done hours of builds on it without it falling
over.

>Rob
>-- 
>Vista: Windows Millenium Second Edition

Quentin
