death by ATA

public inbox for linux-kernel@vger.kernel.org

* death by ATA
@ 2001-11-15 19:22 Andrew Morton
From: Andrew Morton @ 2001-11-15 19:22 UTC
  To: Andre Hedrick, lkml

Hi, Andre.

A shared mapping stress test tool was developed for ext3 testing.
It's called `bash-shared-mapping', and you run it with a script
called `run-bash-shared-mapping.sh'.  These tools are in

	http://www.zip.com.au/~akpm/ext3-tools.tar.gz

This program is the meanest stress tester I have ever seen.  It
has the scalps of three or four core kernel bugs on its belt and
several ext3 ones too.

Its IPI rate kills my otherwise-fine-runs-cerberus-overnight BP6
in thirty seconds.

Its IDE load kills my otherwise-fine-runs-cerberus-overnight
P6DBE in a few hours.

We have another failure here on a uniprocessor VIA C3 running
2.4.15-pre4. The controller is a VT8231, running at UDMA100.

The oops indicates a bug in the IDE driver's error recovery. Note
the "ide0: reset: success" followed by a null pointer deref in the
IDE interrupt handler.  You'll see that local variable `rq' has
value zero.

What does "end_request: buffer-list destroyed" mean?

What does "ide_dmaproc: chipset supported ide_dma_timeout func only: 14"
mean?

Do you think this is caused by a hardware failure?  If so, do you
expect that the reset recovery (once the oops is fixed) will bring
the disk back online?

Thanks.


end_request: buffer-list destroyed
hda8: bad access: block=5296, count=-2                            
end_request: I/O error, dev 03:08 (hda), sector 5296
hda8: bad access: block=5298, count=-4
end_request: I/O error, dev 03:08 (hda), sector 5298
hda8: bad access: block=5300, count=-6              
end_request: I/O error, dev 03:08 (hda), sector 5300
hda: timeout waiting for DMA
ide_dmaproc: chipset supported ide_dma_timeout func only: 14
hda: status error: status=0x58 { DriveReady SeekComplete DataRequest }
hda: drive not ready for command                                      
hda: status timeout: status=0xd0 { Busy }
hda: no DRQ after issuing WRITE          
ide0: reset: success           
Unable to handle kernel NULL pointer dereference at virtual address 00000020
 printing eip:                                                              
c01dee53      
*pde = 00000000
Oops: 0000     
CPU:    0 
EIP:    0010:[<c01dee53>]    Not tainted
EFLAGS: 00010002                        
eax: c039a350   ebx: 00000000   ecx: c11e5160   edx: c03701f7
esi: 00000008   edi: c039a380   ebp: c039a340   esp: c02f7f30
ds: 0018   es: 0018   ss: 0018                               
Process swapper (pid: 0, stackpage=c02f7000)
Stack: c01d5df0 c039a380 c039a380 c11e5160 c039a380 c11e5160 00000202 c01d62b0 
       c039a380 c01dedf0 c118f640 04000001 0000000e c02f7fac c010833a 0000000e 
       c11e5160 c02f7fac c02f7fac 0000000e c036dac0 c118f640 c01084c8 0000000e 
Call Trace: [<c01d5df0>] [<c01d62b0>] [<c01dedf0>] [<c010833a>] [<c01084c8>]   
   [<c01051a0>] [<c010a738>] [<c01051a0>] [<c01051c4>] [<c0105242>] [<c0105000>] 
                                                                                 
Code: 8b 43 20 74 08 48 75 0c e9 b0 00 00 00 48 0f 85 a9 00 00 00 
Kernel panic: Aiee, killing interrupt handler!                
In interrupt handler - not syncing         
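The two status lines above can be decoded by hand against the ATA status register. The C sketch below only illustrates that decoding and is not the IDE driver's code: the bit values are the standard ATA status bits, decode_status() is a hypothetical helper, and the labels for bits that do not appear in this log (DeviceFault, CorrectedError, Index, Error) are my own. It shows that 0x58 is a ready drive still asserting DRQ (it wants to transfer data), while 0xd0 also has BSY set; the ATA specification says the other bits are not valid while BSY is set, which is consistent with the log printing only { Busy }.

```c
#include <assert.h>
#include <string.h>

/* ATA status register bits. */
#define ST_BUSY   0x80
#define ST_READY  0x40
#define ST_FAULT  0x20
#define ST_SEEK   0x10
#define ST_DRQ    0x08
#define ST_CORR   0x04
#define ST_INDEX  0x02
#define ST_ERR    0x01

/* Labels in the order the log prints them; names for bits absent from
 * this log are illustrative, not taken from the kernel. */
static const struct { unsigned char bit; const char *name; } st_names[] = {
	{ ST_READY, "DriveReady" },
	{ ST_FAULT, "DeviceFault" },
	{ ST_SEEK,  "SeekComplete" },
	{ ST_DRQ,   "DataRequest" },
	{ ST_CORR,  "CorrectedError" },
	{ ST_INDEX, "Index" },
	{ ST_ERR,   "Error" },
};

/* Writes a space-separated decode of `stat` into buf (capacity len).
 * While BSY is set the drive owns the register and the remaining bits
 * are meaningless, so only "Busy" is reported. */
static void decode_status(unsigned char stat, char *buf, size_t len)
{
	assert(len > 0);
	buf[0] = '\0';
	if (stat & ST_BUSY) {
		strncat(buf, "Busy", len - 1);
		return;
	}
	for (size_t i = 0; i < sizeof st_names / sizeof st_names[0]; i++) {
		if (!(stat & st_names[i].bit))
			continue;
		if (buf[0] != '\0')
			strncat(buf, " ", len - strlen(buf) - 1);
		strncat(buf, st_names[i].name, len - strlen(buf) - 1);
	}
}
```

Decoding 0x58 this way yields "DriveReady SeekComplete DataRequest", the same set of flags as the first status line in the log.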

Using defaults from ksymoops -t elf32-i386 -a i386

>>EIP; c01dee53 <write_intr+63/150>   <=====
Trace; c01d5df0 <ide_do_request+2b0/300>
Trace; c01d62b0 <ide_intr+100/160>
Trace; c01dedf0 <write_intr+0/150>
Trace; c010833a <handle_IRQ_event+3a/70>
Trace; c01084c8 <do_IRQ+78/c0>
Trace; c01051a0 <default_idle+0/40>
Trace; c010a738 <call_do_IRQ+5/d>
Trace; c01051a0 <default_idle+0/40>
Trace; c01051c4 <default_idle+24/40>
Trace; c0105242 <cpu_idle+42/60>
Trace; c0105000 <_stext+0/0>
Code;  c01dee53 <write_intr+63/150>
0000000000000000 <_EIP>:
Code;  c01dee53 <write_intr+63/150>   <=====
   0:   8b 43 20                  mov    0x20(%ebx),%eax   <=====
Code;  c01dee56 <write_intr+66/150>
   3:   74 08                     je     d <_EIP+0xd> c01dee60 <write_intr+70/150>
Code;  c01dee58 <write_intr+68/150>
   5:   48                        dec    %eax
Code;  c01dee59 <write_intr+69/150>
   6:   75 0c                     jne    14 <_EIP+0x14> c01dee67 <write_intr+77/150>
Code;  c01dee5b <write_intr+6b/150>
   8:   e9 b0 00 00 00            jmp    bd <_EIP+0xbd> c01def10 <write_intr+120/150>
Code;  c01dee60 <write_intr+70/150>
   d:   48                        dec    %eax
Code;  c01dee61 <write_intr+71/150>
   e:   0f 85 a9 00 00 00         jne    bd <_EIP+0xbd> c01def10 <write_intr+120/150>

 <0>Kernel panic: Aiee, killing interrupt handler!
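The decode ties the fault address to the zero `rq'. The faulting instruction is mov 0x20(%ebx),%eax with ebx = 00000000, so, assuming ebx holds rq as the note about rq being zero suggests, the load touches 0 + 0x20 = 0x20, exactly the virtual address 00000020 in the "Unable to handle kernel NULL pointer dereference" line. The C sketch below only illustrates that arithmetic: struct fake_request and field_address() are hypothetical, and the padding merely places a member at offset 0x20. It does not reproduce the real 2.4.15 struct request layout or say which field write_intr() was reading.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a request structure, NOT the real 2.4 layout.
 * The padding only puts `field` at byte offset 0x20, the displacement
 * used by the faulting "mov 0x20(%ebx),%eax". */
struct fake_request {
	char pad[0x20];
	int field;
};

/* Compile-time check that the member really sits at offset 0x20. */
static_assert(offsetof(struct fake_request, field) == 0x20,
	      "field must be at offset 0x20");

/* Address a load of rq->field would touch if rq held `base`.
 * Pure integer arithmetic: nothing is dereferenced here. */
static uintptr_t field_address(uintptr_t base)
{
	return base + offsetof(struct fake_request, field);
}
```

With base 0 (rq == NULL) the result is 0x20, the faulting address; with a valid pointer the same load would simply land 0x20 bytes into the object.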

* Re: death by ATA
@ 2001-11-16 22:45 Jens Axboe
From: Jens Axboe @ 2001-11-16 22:45 UTC
  To: Andrew Morton; +Cc: Andre Hedrick, lkml

On Thu, Nov 15 2001, Andrew Morton wrote:
> What does "end_request: buffer-list destroyed" mean?

It means that the request was not sane anymore; specifically, that the
clustered number of sectors was set to a lower value than the current
number of sectors (which isn't valid, of course). The "buffer-list
destroyed" comment tells you that this corruption is most likely due to
the buffer_head list on the request having been corrupted, which in turn
probably means that someone seriously screwed up this request.
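The invariant described here can be written down as a small check. The C sketch below is a hypothetical model, not the 2.4 block-layer source: struct req_model and request_is_sane() are illustrative names, and only the two counters involved are kept. nr_sectors stands for the sectors left in the whole (possibly clustered) request and current_nr_sectors for the size of the buffer_head at the head of its list; the former can never legitimately be smaller than the latter.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical, simplified request: only the two counters the check needs. */
struct req_model {
	long nr_sectors;		/* sectors left in the whole (clustered) request */
	long current_nr_sectors;	/* sectors in the buffer_head at the head of the list */
};

/* Returns 1 if the counters are consistent, 0 otherwise. The total that
 * remains must cover at least the first buffer; if it does not, the
 * buffer_head list and the counters have drifted apart. */
static int request_is_sane(const struct req_model *rq)
{
	assert(rq != NULL);
	if (rq->nr_sectors < rq->current_nr_sectors) {
		fprintf(stderr, "end_request: buffer-list destroyed\n");
		return 0;
	}
	return 1;
}
```

In this model a request claiming 0 sectors left while a 2-sector buffer is still queued fails the check and prints the same message as the log.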

hda8: bad access: block=5296, count=-2                            
end_request: I/O error, dev 03:08 (hda), sector 5296
hda8: bad access: block=5298, count=-4
end_request: I/O error, dev 03:08 (hda), sector 5298
hda8: bad access: block=5300, count=-6              
end_request: I/O error, dev 03:08 (hda), sector 5300

These errors would seem to back up that theory :-)
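The pattern in those lines is regular: each step advances the sector by 2 and lowers the count by 2. That is consistent with retiring one 2-sector (1 KiB, with 512-byte sectors) buffer_head at a time from a request whose remaining count had already gone negative. The C sketch below is a hypothetical model of that bookkeeping, not kernel code: struct progress, retire_one_bh(), bad_access() and check_log_sequence() are illustrative names, the 2-sector step is read off the log, and treating any count of zero or less as a bad access is this model's assumption (the real IDE check may be written differently).

```c
#include <assert.h>

/* Step size read off the log: 5296 -> 5298 -> 5300, i.e. 2 sectors
 * (1 KiB with 512-byte sectors) per buffer_head in this model. */
#define SECTORS_PER_BH 2

struct progress {
	unsigned long sector;	/* "block=" / "sector" in the log */
	long count;		/* "count=" in the log: sectors still to transfer */
};

/* Hypothetical model of completing the buffer_head at the front of a
 * request: the start sector moves forward and the remaining count
 * shrinks by the same amount. Nothing here stops count going negative. */
static struct progress retire_one_bh(struct progress p)
{
	p.sector += SECTORS_PER_BH;
	p.count -= SECTORS_PER_BH;
	return p;
}

/* In this model a request with no sectors left to transfer is a bad access. */
static int bad_access(struct progress p)
{
	return p.count <= 0;
}

/* Replays the three "bad access" lines from the log. */
void check_log_sequence(void)
{
	struct progress p = { 5296, -2 };

	assert(bad_access(p));
	p = retire_one_bh(p);
	assert(p.sector == 5298 && p.count == -4);
	p = retire_one_bh(p);
	assert(p.sector == 5300 && p.count == -6);
	assert(bad_access(p));
}
```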

Is this an SMP board? Also, is

end_request: buffer-list destroyed

the very first error message?

-- 
Jens Axboe

* Re: death by ATA
@ 2001-11-17  1:21 Andrew Morton
From: Andrew Morton @ 2001-11-17  1:21 UTC
  To: Jens Axboe; +Cc: Andre Hedrick, lkml

Jens Axboe wrote:
> 
> On Thu, Nov 15 2001, Andrew Morton wrote:
> > What does "end_request: buffer-list destroyed" mean?
> 
> It means that the request was not sane anymore; specifically, that the
> clustered number of sectors was set to a lower value than the current
> number of sectors (which isn't valid, of course). The "buffer-list
> destroyed" comment tells you that this corruption is most likely due to
> the buffer_head list on the request having been corrupted, which in turn
> probably means that someone seriously screwed up this request.
> 
> hda8: bad access: block=5296, count=-2
> end_request: I/O error, dev 03:08 (hda), sector 5296
> hda8: bad access: block=5298, count=-4
> end_request: I/O error, dev 03:08 (hda), sector 5298
> hda8: bad access: block=5300, count=-6
> end_request: I/O error, dev 03:08 (hda), sector 5300
> 
> These errors would seem to back up that theory :-)

'k, thanks.

> Is this an SMP board? Also, is

Uniprocessor VIA C3 running 2.4.15-pre4. The controller is a
VT8231, running at UDMA100.

> end_request: buffer-list destroyed
> 
> the very first error message?

Yes, it is.

It is reproducible after around three hours, and the failure is
exactly the same each time.

-
