* death by ATA
From: Andrew Morton @ 2001-11-15 19:22 UTC (permalink / raw)
To: Andre Hedrick, lkml
Hi, Andre.
A shared mapping stress test tool was developed for ext3 testing.
It's called `bash-shared-mapping', and you run it with a script
called `run-bash-shared-mapping.sh'. These tools are in
http://www.zip.com.au/~akpm/ext3-tools.tar.gz
This program is the meanest stress tester I have ever seen. It
has the scalps of three or four core kernel bugs on its belt and
several ext3 ones too.
Its IPI rate kills my otherwise-fine-runs-cerberus-overnight BP6
in thirty seconds.
Its IDE load kills my otherwise-fine-runs-cerberus-overnight
P6DBE in a few hours.
We have another failure here on a uniprocessor VIA C3, running
2.4.15-pre4. The controller is a VT8231. Running at UDMA100.
The oops indicates a bug in the IDE driver's error recovery. Note
the "ide0: reset: success" followed by a null pointer deref in the
IDE interrupt handler. You'll see that local variable `rq' has
value zero.
What does "end_request: buffer-list destroyed" mean?
What does "ide_dmaproc: chipset supported ide_dma_timeout func only: 14"
mean?
Do you think this is caused by a hardware failure? If so, do you
expect that the reset recovery (once the oops is fixed) will bring
the disk back online?
Thanks.
end_request: buffer-list destroyed
hda8: bad access: block=5296, count=-2
end_request: I/O error, dev 03:08 (hda), sector 5296
hda8: bad access: block=5298, count=-4
end_request: I/O error, dev 03:08 (hda), sector 5298
hda8: bad access: block=5300, count=-6
end_request: I/O error, dev 03:08 (hda), sector 5300
hda: timeout waiting for DMA
ide_dmaproc: chipset supported ide_dma_timeout func only: 14
hda: status error: status=0x58 { DriveReady SeekComplete DataRequest }
hda: drive not ready for command
hda: status timeout: status=0xd0 { Busy }
hda: no DRQ after issuing WRITE
ide0: reset: success
Unable to handle kernel NULL pointer dereference at virtual address 00000020
printing eip:
c01dee53
*pde = 00000000
Oops: 0000
CPU: 0
EIP: 0010:[<c01dee53>] Not tainted
EFLAGS: 00010002
eax: c039a350 ebx: 00000000 ecx: c11e5160 edx: c03701f7
esi: 00000008 edi: c039a380 ebp: c039a340 esp: c02f7f30
ds: 0018 es: 0018 ss: 0018
Process swapper (pid: 0, stackpage=c02f7000)
Stack: c01d5df0 c039a380 c039a380 c11e5160 c039a380 c11e5160 00000202 c01d62b0
c039a380 c01dedf0 c118f640 04000001 0000000e c02f7fac c010833a 0000000e
c11e5160 c02f7fac c02f7fac 0000000e c036dac0 c118f640 c01084c8 0000000e
Call Trace: [<c01d5df0>] [<c01d62b0>] [<c01dedf0>] [<c010833a>] [<c01084c8>]
[<c01051a0>] [<c010a738>] [<c01051a0>] [<c01051c4>] [<c0105242>] [<c0105000>]
Code: 8b 43 20 74 08 48 75 0c e9 b0 00 00 00 48 0f 85 a9 00 00 00
Kernel panic: Aiee, killing interrupt handler!
In interrupt handler - not syncing
c01dee53
*pde = 00000000
Oops: 0000
CPU: 0
EIP: 0010:[<c01dee53>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010002
eax: c039a350 ebx: 00000000 ecx: c11e5160 edx: c03701f7
esi: 00000008 edi: c039a380 ebp: c039a340 esp: c02f7f30
ds: 0018 es: 0018 ss: 0018
Process swapper (pid: 0, stackpage=c02f7000)
Stack: c01d5df0 c039a380 c039a380 c11e5160 c039a380 c11e5160 00000202 c01d62b0
c039a380 c01dedf0 c118f640 04000001 0000000e c02f7fac c010833a 0000000e
c11e5160 c02f7fac c02f7fac 0000000e c036dac0 c118f640 c01084c8 0000000e
Call Trace: [<c01d5df0>] [<c01d62b0>] [<c01dedf0>] [<c010833a>] [<c01084c8>]
[<c01051a0>] [<c010a738>] [<c01051a0>] [<c01051c4>] [<c0105242>] [<c0105000>]
Code: 8b 43 20 74 08 48 75 0c e9 b0 00 00 00 48 0f 85 a9 00 00 00
>>EIP; c01dee53 <write_intr+63/150> <=====
Trace; c01d5df0 <ide_do_request+2b0/300>
Trace; c01d62b0 <ide_intr+100/160>
Trace; c01dedf0 <write_intr+0/150>
Trace; c010833a <handle_IRQ_event+3a/70>
Trace; c01084c8 <do_IRQ+78/c0>
Trace; c01051a0 <default_idle+0/40>
Trace; c010a738 <call_do_IRQ+5/d>
Trace; c01051a0 <default_idle+0/40>
Trace; c01051c4 <default_idle+24/40>
Trace; c0105242 <cpu_idle+42/60>
Trace; c0105000 <_stext+0/0>
Code; c01dee53 <write_intr+63/150>
0000000000000000 <_EIP>:
Code; c01dee53 <write_intr+63/150> <=====
0: 8b 43 20 mov 0x20(%ebx),%eax <=====
Code; c01dee56 <write_intr+66/150>
3: 74 08 je d <_EIP+0xd> c01dee60 <write_intr+70/150>
Code; c01dee58 <write_intr+68/150>
5: 48 dec %eax
Code; c01dee59 <write_intr+69/150>
6: 75 0c jne 14 <_EIP+0x14> c01dee67 <write_intr+77/150>
Code; c01dee5b <write_intr+6b/150>
8: e9 b0 00 00 00 jmp bd <_EIP+0xbd> c01def10 <write_intr+120/150>
Code; c01dee60 <write_intr+70/150>
d: 48 dec %eax
Code; c01dee61 <write_intr+71/150>
e: 0f 85 a9 00 00 00 jne bd <_EIP+0xbd> c01def10 <write_intr+120/150>
<0>Kernel panic: Aiee, killing interrupt handler!
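As a reading aid for the decode above: the faulting instruction is `mov 0x20(%ebx),%eax` and the register dump shows ebx as 00000000, so the effective address is just the displacement. That matches the reported fault address 00000020. A minimal Python sketch of the arithmetic follows. The helper name `effective_address` is hypothetical, and the idea that ebx holds `rq` at this point is taken from the report, not checked against the driver source.

```python
def effective_address(base: int, displacement: int) -> int:
    """Effective address of a base+displacement operand on 32-bit x86."""
    return (base + displacement) & 0xFFFFFFFF

# Values copied from the oops: ebx is the base register, 0x20 the displacement.
ebx = 0x00000000
fault = effective_address(ebx, 0x20)
print(f"{fault:08x}")  # prints 00000020
```

A small non-zero fault address like this one is the usual signature of reading a structure member through a NULL pointer.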
* Re: death by ATA
From: Jens Axboe @ 2001-11-16 22:45 UTC (permalink / raw)
To: Andrew Morton; +Cc: Andre Hedrick, lkml
On Thu, Nov 15 2001, Andrew Morton wrote:
> What does "end_request: buffer-list destroyed" mean?
It means that the request was not sane anymore, or specifically that
the clustered number of sectors was set to a lower value than the current
number of sectors (which isn't valid, of course). The buffer-list destroyed
comment tells you that this corruption is most likely due to the
buffer_head list on the request having been corrupted -- which in turn
probably means that someone seriously screwed this request.
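The check described above can be sketched roughly as follows. This is an illustrative Python model, not the kernel's C code: the `Request` class and the field names `nr_sectors` and `current_nr_sectors` are assumptions chosen to mirror the description, and the sample values are made up.

```python
from dataclasses import dataclass

@dataclass
class Request:
    nr_sectors: int          # sectors left in the whole (clustered) request
    current_nr_sectors: int  # sectors left in the buffer currently being transferred

def request_is_sane(req: Request) -> bool:
    """The total remaining count can never be smaller than the current buffer's count."""
    return req.nr_sectors >= req.current_nr_sectors

# Hypothetical sample requests, for illustration only.
healthy = Request(nr_sectors=8, current_nr_sectors=2)
damaged = Request(nr_sectors=0, current_nr_sectors=2)
for req in (healthy, damaged):
    if not request_is_sane(req):
        print("end_request: buffer-list destroyed")
```

Only the second request trips the check, since its total is lower than the count for its current buffer.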
hda8: bad access: block=5296, count=-2
end_request: I/O error, dev 03:08 (hda), sector 5296
hda8: bad access: block=5298, count=-4
end_request: I/O error, dev 03:08 (hda), sector 5298
hda8: bad access: block=5300, count=-6
end_request: I/O error, dev 03:08 (hda), sector 5300
These errors would seem to back up that theory :-)
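A sector count can never legitimately be negative, so the three lines above are consistent with a request whose bookkeeping has gone wrong. A small Python sketch that pulls the numbers out of those lines makes the pattern easier to see. The regular expression and the helper name `parse_bad_access` are illustrative assumptions, written only against the message format shown in this thread.

```python
import re

# The three "bad access" lines quoted in this thread.
LOG = """\
hda8: bad access: block=5296, count=-2
hda8: bad access: block=5298, count=-4
hda8: bad access: block=5300, count=-6
"""

PATTERN = re.compile(r"^(\w+): bad access: block=(\d+), count=(-?\d+)$", re.MULTILINE)

def parse_bad_access(text):
    """Return (device, block, count) tuples for every 'bad access' line."""
    return [(dev, int(block), int(count)) for dev, block, count in PATTERN.findall(text)]

entries = parse_bad_access(LOG)
for dev, block, count in entries:
    print(dev, block, count, "negative count" if count < 0 else "ok")
```

In these three lines the block number advances by 2 each time while the count falls by 2.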
Is this an SMP board? Also, is
end_request: buffer-list destroyed
the very first error message?
--
Jens Axboe
* Re: death by ATA
From: Andrew Morton @ 2001-11-17 1:21 UTC (permalink / raw)
To: Jens Axboe; +Cc: Andre Hedrick, lkml
Jens Axboe wrote:
>
> On Thu, Nov 15 2001, Andrew Morton wrote:
> > What does "end_request: buffer-list destroyed" mean?
>
> It means that the request was not sane anymore, or specifically that
> the clustered number of sectors was set to a lower value than the current
> number of sectors (which isn't valid, of course). The buffer-list destroyed
> comment tells you that this corruption is most likely due to the
> buffer_head list on the request having been corrupted -- which in turn
> probably means that someone seriously screwed this request.
>
> hda8: bad access: block=5296, count=-2
> end_request: I/O error, dev 03:08 (hda), sector 5296
> hda8: bad access: block=5298, count=-4
> end_request: I/O error, dev 03:08 (hda), sector 5298
> hda8: bad access: block=5300, count=-6
> end_request: I/O error, dev 03:08 (hda), sector 5300
>
> These errors would seem to back up that theory :-)
'k, thanks.
> Is this an SMP board? Also, is
Uniprocessor VIA C3, running 2.4.15-pre4. The controller is a
VT8231. Running at UDMA100.
> end_request: buffer-list destroyed
>
> the very first error message?
Yes, it is.
It is reproducible after around three hours. Exactly the
same.