public inbox for linux-kernel@vger.kernel.org
* kernel 2.4.17 crashes on SCSI-errors
@ 2002-01-03 12:05 R.Oehler
  2002-01-03 12:09 ` Jens Axboe
  2002-01-03 12:43 ` Alan Cox
  0 siblings, 2 replies; 11+ messages in thread
From: R.Oehler @ 2002-01-03 12:05 UTC
  To: Scsi, linux-kernel

Hi, List

I have just tried the new kernel 2.4.17, hoping that
the SCSI subsystem would be usable again.
But no! It crashed immediately, like the previous few kernels.

Meanwhile this is causing real problems for
our product, because I expect SuSE to launch their next
release soon with an unstable "stable" kernel.

Does nobody recognize that this bug is serious?
3.5" MO drives report blank sectors as "SCSI-Hardware-Error".
This kind of sense code also appears for errors that
are much more common than blank sectors.
Any flaw in a SCSI disk will crash the kernel.
Please don't rely on modern hardware being so perfect that
errors will never occur. If that were true, you could just as well
remove the complete error-handling code.
That would at least prevent the crashes...


Here is a simple procedure to reliably trigger the BUG:

1) I compiled the SCSI subsystem as modules.
2) I put an erased MO medium into a SCSI MO drive.
3) I connected the drive to the computer.
4) I typed "modprobe sd_mod".
5) Crash! The serial console said:

Welcome to SuSE Linux 7.3 (i386) - Kernel 2.4.17 (ttyS0).

tick login: invalid operand: 0000
CPU:    0
EIP:    0010:[<d0851735>]    Not tainted
EFLAGS: 00010082
eax: 00000042   ebx: ce3dc070   ecx: c0224080   edx: 0000270d
esi: c009e018   edi: 00000018   ebp: c009e000   esp: c0237dd4
ds: 0018   es: 0018   ss: 0018
Process swapper (pid: 0, stackpage=c0237000)
Stack: d0867340 00000093 cf95b9ac cfb6de00 c0237e2c 00000000 66656400 00000006 
       cfb6de10 00000002 00000003 00000282 41000031 c0220002 ce434a00 d0851346 
       cfb6de00 ce468ecc 00000293 ce434ab8 ce434a00 cf4f416c 00000092 d083466a 
Call Trace: [<d0867340>] [<d0851346>] [<d083466a>] [<d0834df8>] [<d083baaf>] 
   [<d084e880>] [<d083b10e>] [<d083b2b3>] [<d083b318>] [<d083b7a0>] [<d084cce8>] 
   [<d08351f7>] [<d0835099>] [<c01176a2>] [<c01175d9>] [<c01173ca>] [<c0107f8d>] 
   [<c0105150>] [<c0105150>] [<c0105173>] [<c01051d7>] [<c0105000>] [<c0105027>] 

Code: 0f 0b 83 c4 08 83 3e 00 74 13 8b 06 05 00 00 00 40 89 46 0c 
 <0>Kernel panic: Aiee, killing interrupt handler!
In interrupt handler - not syncing
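
A note on reading the dump above: the "Code:" bytes begin with 0f 0b, which is the x86 UD2 instruction. Executing UD2 raises the invalid-opcode exception that the kernel reports as "invalid operand", so this pattern suggests a deliberate BUG() check fired rather than a stray pointer. The minimal sketch below only checks for that byte pattern; the helper name `starts_with_ud2` is made up for illustration.

```python
# Bytes copied from the "Code:" line of the oops above.
CODE_LINE = "0f 0b 83 c4 08 83 3e 00 74 13 8b 06 05 00 00 00 40 89 46 0c"

# x86 opcode for UD2, which raises #UD ("invalid operand" in the oops text).
UD2 = bytes([0x0F, 0x0B])


def starts_with_ud2(code_line: str) -> bool:
    """Return True if the dumped instruction bytes begin with UD2."""
    raw = bytes(int(tok, 16) for tok in code_line.split())
    return raw.startswith(UD2)


if __name__ == "__main__":
    print(starts_with_ud2(CODE_LINE))
```

A match narrows the search to explicit BUG() checks in the code path shown in the call trace; it does not say which check was hit.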



Again I offer my time and my hardware for testing purposes.
I cannot fix the bug in the kernel myself, but I can test patches
and provide resulting stack traces.

Regards,
        Ralf

 -----------------------------------------------------------------
|  Ralf Oehler
|  GDI - Gesellschaft fuer Digitale Informationstechnik mbH
|
|  E-Mail:      R.Oehler@GDImbH.com
|  Tel.:        +49 6182-9271-23 
|  Fax.:        +49 6182-25035           
|  Mail:        GDI, Bensbruchstraße 11, D-63533 Mainhausen
|  HTTP:        www.GDImbH.com
 -----------------------------------------------------------------

time is a funny concept


* Re: kernel 2.4.17 crashes on SCSI-errors
@ 2002-01-03 13:39 R.Oehler
  2002-01-03 19:33 ` Keith Owens
  0 siblings, 1 reply; 11+ messages in thread
From: R.Oehler @ 2002-01-03 13:39 UTC
  To: linux-kernel; +Cc: Alan Cox, Jens Axboe



Running ksymoops was not possible, because after rebooting the
memory/module layout had changed. (Or is there a trick
I don't know?) To get a usable stack trace I patched
the kernel with SGI's kdb and reproduced the crash.

I'm using the aic7xxx controller. (I can reproduce the crash
with both the old and the new one.)

Alan Cox reports that the -ac kernels behave. This
makes me believe that the bug sneaked into the Linus
kernel at 2.4.10, where heavy block layer changes
happened that were not applied to Alan's kernel.
Linus's 2.4.0 behaves, too.



Here are:
1) register dump
2) stack chain
3) last few lines dumped from syslog buffer


Is there anything more I can do?

Regards,
        Ralf





Welcome to SuSE Linux 7.3 (i386) - Kernel 2.4.17-Dbg (ttyS0).

tick login: 
Entering kdb (current=0xc0290000, pid 0) Oops: invalid operand
due to oops @ 0xd0853729
eax = 0x00000046 ebx = 0xce550070 ecx = 0xc027de00 edx = 0x0000276d 
esi = 0xc009e018 edi = 0x00000018 esp = 0xc0291d94 eip = 0xd0853729 
ebp = 0xc0291dd8 xss = 0x00000018 xcs = 0x00000010 eflags = 0x00010002 
xds = 0x00000018 xes = 0x00000018 origeax = 0xffffffff &regs = 0xc0291d60


kdb> bt
    EBP       EIP         Function(args)
0xc0291dd8 0xd0853729 [aic7xxx]ahc_linux_run_device_queue+0x39d (0xcfb6be00, 0xce62ed1c)
                               aic7xxx .text 0xd0852060 0xd085338c 0xd0853c90
0xc0291dfc 0xd0853356 [aic7xxx]ahc_linux_queue+0x172 (0xce5f6a00, 0xd0834de0)
                               aic7xxx .text 0xd0852060 0xd08531e4 0xd085338c
0xc0291e20 0xd0834674 [scsi_mod]scsi_dispatch_cmd+0x1a4 (0xce5f6a00, 0xce5f6a00)
                               scsi_mod .text 0xd0834060 0xd08344d0 0xd083481c
0xc0291e50 0xd083bc7d [scsi_mod]scsi_request_fn+0x2bd (0xcf9f77b4)
                               scsi_mod .text 0xd0834060 0xd083b9c0 0xd083bcb4
0xc0291e6c 0xd083b2d6 [scsi_mod]scsi_queue_next_request+0x46 (0xcf9f77b4, 0xce5f6a00)
                               scsi_mod .text 0xd0834060 0xd083b290 0xd083b39c
0xc0291e88 0xd083b489 [scsi_mod]__scsi_end_request+0xed (0xce5f6a00, 0x0, 0x0, 0x1, 0x1)
                               scsi_mod .text 0xd0834060 0xd083b39c 0xd083b4d4
0xc0291ea4 0xd083b4ec [scsi_mod]scsi_end_request+0x18 (0xce5f6a00, 0x0, 0x2)
                               scsi_mod .text 0xd0834060 0xd083b4d4 0xd083b4f0
0xc0291ee0 0xd083b96b [scsi_mod]scsi_io_completion+0x3ab (0xce5f6a00, 0x0, 0x1)
                               scsi_mod .text 0xd0834060 0xd083b5c0 0xd083b978
0xc0291f10 0xd084ecec [sd_mod]rw_intr+0x1e8 (0xce5f6a00)
                               sd_mod .text 0xd084e060 0xd084eb04 0xd084ecf8
0xc0291f28 0xd0835214 [scsi_mod]scsi_finish_command+0xdc (0xce5f6a00)
                               scsi_mod .text 0xd0834060 0xd0835138 0xd0835220
0xc0291f3c 0xd083508a [scsi_mod]scsi_bottom_half_handler+0x1f2
                               scsi_mod .text 0xd0834060 0xd0834e98 0xd08350ac
0xc0291f44 0xc0117dd0 bh_action+0x1c (0x8)
                               kernel .text 0xc0100000 0xc0117db4 0xc0117df8
0xc0291f5c 0xc0117ce9 tasklet_hi_action+0x59 (0xc02a85c0)
                               kernel .text 0xc0100000 0xc0117c90 0xc0117d10
0xc0291f78 0xc0117aac do_softirq+0x4c
                               kernel .text 0xc0100000 0xc0117a60 0xc0117b00
0xc0291f90 0xc01083ed do_IRQ+0xa1 (0xc0290000, 0xc14e4000, 0xc14e4270, 0xc0105170, 0xffffe000)
                               kernel .text 0xc0100000 0xc010834c 0xc0108400
0xc0291fcc 0xc01f33b8 call_do_IRQ+0x5
                               kernel .rodata 0xc01f1b00 0xc01f33b3 0xc01f33c0
           0xc0105207 cpu_idle+0x3f
                               kernel .text 0xc0100000 0xc01051c8 0xc010521c
0xc0291fe8 0xc010502a stext+0x2a
                               kernel .text 0xc0100000 0xc0105000 0xc0105030
0xc0291ff8 0xc0292931 start_kernel+0x101
                               kernel .text.init 0xc0292000 0xc0292830 0xc0292938
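
The symbol+offset column in the backtrace above can be reproduced from the addresses kdb prints on each second line, which appear to be section start, function start, and function end. The sketch below, which assumes that reading of the columns, does the lookup for four of the frames. It is roughly the lookup a symbol resolver performs against a symbol map, which is why a module layout that changed after a reboot makes the old addresses useless. The table and the `resolve` helper are illustrative, not part of any tool.

```python
from bisect import bisect_right

# (function start, function end, name) copied from the kdb backtrace above,
# sorted by start address. The end address is treated as exclusive.
SYMBOLS = [
    (0xD08344D0, 0xD083481C, "[scsi_mod]scsi_dispatch_cmd"),
    (0xD083B9C0, 0xD083BCB4, "[scsi_mod]scsi_request_fn"),
    (0xD08531E4, 0xD085338C, "[aic7xxx]ahc_linux_queue"),
    (0xD085338C, 0xD0853C90, "[aic7xxx]ahc_linux_run_device_queue"),
]
STARTS = [start for start, _end, _name in SYMBOLS]


def resolve(addr: int):
    """Map an address to 'symbol+0xoffset', or None if no symbol covers it."""
    idx = bisect_right(STARTS, addr) - 1
    if idx < 0:
        return None
    start, end, name = SYMBOLS[idx]
    if addr >= end:
        return None
    return f"{name}+0x{addr - start:x}"


if __name__ == "__main__":
    # EIP values of the top two frames in the backtrace.
    for eip in (0xD0853729, 0xD0853356):
        print(hex(eip), resolve(eip))
```

With the addresses from this trace, the faulting EIP 0xd0853729 resolves to ahc_linux_run_device_queue+0x39d, matching what kdb printed.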


From log_buf[]:
<4>SCSI device sda: 1273011 1024-byte hdwr sectors (1304 MB).
<4>sda: Write Protect is off.
<6> /dev/scsi/host0/bus0/target1/lun0:SCSI disk error : 
    host 0 channel 0 id 1 lun 0 return code = 8000002.
<4>Info fld=0x0, Current sd08:00: sense key Blank Check.
<4> I/O error: dev 08:00, sector 0.
<4>Incorrect number of segments after building list.
<4>kernel BUG at /usr/src/linux-SuSE73-2.4.17-Dbg/include/asm/pci.h:147!
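
For readers decoding the "return code = 8000002" line above: assuming the value is printed in hex and follows the usual Linux SCSI result layout (status byte in bits 0-7, message byte in 8-15, host byte in 16-23, driver byte in 24-31), it splits as sketched below. The function name `decode_scsi_result` is made up for illustration; the two constants are the standard raw CHECK CONDITION status and the mid-layer's DRIVER_SENSE flag.

```python
CHECK_CONDITION = 0x02  # raw SCSI status byte: sense data is available
DRIVER_SENSE = 0x08     # driver byte flag: sense buffer was filled in


def decode_scsi_result(result: int) -> dict:
    """Split a 32-bit SCSI result word into its four byte fields."""
    return {
        "status": result & 0xFF,
        "msg": (result >> 8) & 0xFF,
        "host": (result >> 16) & 0xFF,
        "driver": (result >> 24) & 0xFF,
    }


if __name__ == "__main__":
    fields = decode_scsi_result(0x8000002)  # value from the log above
    print(fields)
    print(fields["status"] == CHECK_CONDITION, fields["driver"] == DRIVER_SENSE)
```

Read this way, the drive returned CHECK CONDITION with valid sense data and no host or message error, which fits the "sense key Blank Check" line that follows it in the log.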


Regards,
        Ralf Oehler


Thread overview: 11+ messages
2002-01-03 12:05 kernel 2.4.17 crashes on SCSI-errors R.Oehler
2002-01-03 12:09 ` Jens Axboe
2002-01-03 12:34   ` Jens Axboe
2002-01-03 12:43 ` Alan Cox
2002-01-03 21:28   ` Andrew Morton
2002-01-04  9:43     ` Jens Axboe
  -- strict thread matches above, loose matches on Subject: below --
2002-01-03 13:39 R.Oehler
2002-01-03 19:33 ` Keith Owens
2002-01-04  8:31   ` R.Oehler
2002-01-04  8:45     ` Jens Axboe
2002-01-04  9:48       ` R.Oehler
