Linux Kernel 2.6.13-rc7 (WORKS) (2.6.13, DRQ/System CRASH)

linux-ide.vger.kernel.org archive mirror

* Linux Kernel 2.6.13-rc7 (WORKS) (2.6.13, DRQ/System CRASH)
From: Justin Piszcz @ 2005-08-31 17:37 UTC
  To: linux-kernel, akpm, support; +Cc: B.Zolnierkiewicz, linux-ide, apiszcz

All,

I am trying to get everyone together on this to hopefully solve a serious 
bug that I have seen on multiple machines with:

a) A Promise ATA/133 controller (ATA/100 works OK)
b) Kernel 2.6.12 or 2.6.13 (2.6.13-rc7 appears to be OK)

The drive is a Seagate 7200.8 400GB 7200RPM 8MB cache disk.
hde: ST3400832A, ATA DISK drive

With older kernels, if I *DO NOT ENABLE DMA*, it does not crash.
If I *ENABLE DMA* and then do anything with the disk, it FREEZES the
box: no oops, nothing, just a *FREEZE*.

Either of these commands will freeze the box:

hdparm -t /dev/hde
mkfs.xfs -f /dev/hde1

-------

Linux Kernel 2.6.13 final experiences the same problems as 2.6.12.5.

I have e-mailed the list about this issue quite a few times; I am
surprised that so few people run into it.

Here is the error in the logs:

Aug 31 11:30:25 p34 kernel: hde: dma_timer_expiry: dma status == 0x20
Aug 31 11:30:25 p34 kernel: hde: DMA timeout retry
Aug 31 11:30:25 p34 kernel: PDC202XX: Primary channel reset.
Aug 31 11:30:25 p34 kernel: hde: timeout waiting for DMA
Aug 31 11:30:25 p34 kernel: hde: status error: status=0x58 { DriveReady SeekComplete DataRequest }
Aug 31 11:30:25 p34 kernel: hde: drive not ready for command
Aug 31 11:30:25 p34 kernel: hde: status timeout: status=0xd0 { Busy }
Aug 31 11:30:25 p34 kernel: PDC202XX: Primary channel reset.
Aug 31 11:30:25 p34 kernel: hde: no DRQ after issuing MULTWRITE_EXT
Aug 31 11:30:25 p34 kernel: ide2: reset: success
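
The status bytes in this log can be decoded by hand. The sketch below is a minimal C illustration, assuming the standard ATA status-register bit layout and the SFF-8038i bus-master status layout; the function names and the bus-master bit labels are hypothetical, chosen for this example, and only mimic the log's `{ ... }` formatting rather than reproduce the kernel's own code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Pairs a register bit with the label used when printing it. */
struct bit_name {
    unsigned char bit;
    const char *name;
};

/* Appends s to buf, never writing past len bytes in total. */
static void append(char *buf, size_t len, const char *s)
{
    size_t used = strlen(buf);
    if (used + 1 < len)
        strncat(buf, s, len - used - 1);
}

/* Writes "{ Name1 Name2 ... }" for every bit of val listed in tbl. */
static void format_bits(unsigned char val, const struct bit_name *tbl,
                        size_t n, char *buf, size_t len)
{
    assert(len > 0);
    buf[0] = '\0';
    append(buf, len, "{ ");
    for (size_t i = 0; i < n; i++) {
        if (val & tbl[i].bit) {
            append(buf, len, tbl[i].name);
            append(buf, len, " ");
        }
    }
    append(buf, len, "}");
}

/* ATA status register: the "status=0x.." value in the log. */
static const struct bit_name ata_bits[] = {
    { 0x40, "DriveReady" },     /* DRDY */
    { 0x20, "DeviceFault" },    /* DF */
    { 0x10, "SeekComplete" },   /* DSC */
    { 0x08, "DataRequest" },    /* DRQ: drive wants to transfer data */
    { 0x04, "CorrectedError" },
    { 0x02, "Index" },
    { 0x01, "Error" },          /* ERR: details in the error register */
};

void ata_status_str(unsigned char stat, char *buf, size_t len)
{
    if (stat & 0x80) {          /* BSY: all other bits are invalid */
        assert(len > 0);
        buf[0] = '\0';
        append(buf, len, "{ Busy }");
        return;
    }
    format_bits(stat, ata_bits, sizeof ata_bits / sizeof ata_bits[0],
                buf, len);
}

/* Bus-master IDE status register (SFF-8038i): the "dma status" value.
 * The labels are made up for this example. */
static const struct bit_name bm_bits[] = {
    { 0x01, "Active" },           /* DMA transfer still in progress */
    { 0x02, "Error" },
    { 0x04, "Interrupt" },        /* drive raised its interrupt */
    { 0x20, "Drive0DMACapable" },
    { 0x40, "Drive1DMACapable" },
    { 0x80, "SimplexOnly" },
};

void bm_status_str(unsigned char stat, char *buf, size_t len)
{
    format_bits(stat, bm_bits, sizeof bm_bits / sizeof bm_bits[0],
                buf, len);
}
```

With this decoding, `ata_status_str(0x58, ...)` gives `{ DriveReady SeekComplete DataRequest }` and `ata_status_str(0xd0, ...)` gives `{ Busy }`, matching the two log lines. `dma status == 0x20` decodes to `{ Drive0DMACapable }` only, which suggests the DMA engine was no longer active and had neither an error nor an interrupt pending when the timer expired.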

After this, the machine locks up with 2.6.13.

With 2.6.13-rc7, I have not seen this once.

Can anyone offer any insight into why this is happening? I have a few
machines with the ATA/133 controller and 400GB drives; therefore, I'd
prefer to fix the problem rather than hook up older ATA/100 drives
just so I can run newer kernels...

Thanks.


* Re: Linux Kernel 2.6.13-rc7 (WORKS) (2.6.13, DRQ/System CRASH)
From: Justin Piszcz @ 2005-08-31 18:09 UTC
  To: linux-kernel, akpm, support; +Cc: B.Zolnierkiewicz, linux-ide, apiszcz

I do not even have IDE Taskfile Access enabled, so how is the kernel 
printing these error messages before it freezes?

linux-2.6.13/drivers/ide/ide-taskfile.c:  printk(KERN_ERR "%s: no DRQ after issuing %sWRITE%s\n",


   ATA/ATAPI/MFM/RLL support
     [ ] IDE Taskfile Access


Does anyone have any suggestions on how I can solve this problem?




* Re: Linux Kernel 2.6.13-rc7 (WORKS) (2.6.13, DRQ/System CRASH)
From: Bartlomiej Zolnierkiewicz @ 2005-09-05  7:26 UTC
  To: Justin Piszcz; +Cc: linux-kernel, akpm, support, linux-ide, apiszcz

On 8/31/05, Justin Piszcz <jpiszcz@lucidpixels.com> wrote:
> I do not even have IDE Taskfile Access enabled, so how is the kernel
> printing these error messages before it freezes?
> 
> linux-2.6.13/drivers/ide/ide-taskfile.c:  printk(KERN_ERR "%s: no DRQ after issuing %sWRITE%s\n",
> 
> 
>    ATA/ATAPI/MFM/RLL support
>      [ ] IDE Taskfile Access

After the DMA timeout, the driver reverted to PIO, and ide-taskfile.c
holds the PIO code as well as the IDE Taskfile Access code.
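
The message comes from that PIO path: after issuing a PIO write command such as MULTWRITE_EXT, the driver waits for the drive to clear BSY and raise DRQ before it moves any data. The sketch below models only that handshake; the `fake_drive` type, its scripted status sequence, and the poll-count limit are hypothetical stand-ins (the real driver reads hardware registers and uses a time-based timeout), so it is an illustration, not ide-taskfile.c.

```c
#include <assert.h>
#include <stddef.h>

#define ATA_BUSY 0x80   /* BSY */
#define ATA_DRQ  0x08   /* DRQ */
#define ATA_ERR  0x01   /* ERR */

/* Hypothetical stand-in for the drive: a scripted sequence of status-register
 * values. Once the script runs out, the last value repeats forever. */
struct fake_drive {
    const unsigned char *seq;
    size_t n;
    size_t pos;
};

static unsigned char read_status(struct fake_drive *d)
{
    assert(d->n > 0);
    unsigned char s = d->seq[d->pos];
    if (d->pos + 1 < d->n)
        d->pos++;
    return s;
}

/* After a PIO write command has been issued, wait until the drive clears BSY
 * and raises DRQ, meaning it is ready to accept data.
 * Returns 0 when ready, -1 if the drive reports ERR, and -2 if max_polls
 * reads pass without DRQ (the "no DRQ after issuing ..." case).
 * A poll count replaces the real driver's time-based timeout. */
int wait_for_drq(struct fake_drive *d, int max_polls)
{
    for (int i = 0; i < max_polls; i++) {
        unsigned char s = read_status(d);
        if (s & ATA_BUSY)
            continue;           /* other bits are invalid while busy */
        if (s & ATA_ERR)
            return -1;
        if (s & ATA_DRQ)
            return 0;
    }
    return -2;
}
```

If the drive keeps returning the log's status byte 0xd0 (BSY set), `wait_for_drq()` never sees DRQ and returns -2, which corresponds to the "no DRQ after issuing MULTWRITE_EXT" line.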

Bartlomiej


* Re: Linux Kernel 2.6.13-rc7 (WORKS) (2.6.13, DRQ/System CRASH)
From: Bartlomiej Zolnierkiewicz @ 2005-09-05  7:32 UTC
  To: Justin Piszcz; +Cc: linux-kernel, akpm, support, linux-ide, apiszcz

On 8/31/05, Justin Piszcz <jpiszcz@lucidpixels.com> wrote:
> All,
> 
> I am trying to get everyone together on this to hopefully solve a serious
> bug that I have seen on multiple machines with:
> 
> a) A Promise ATA/133 controller (ATA/100 works OK)
> b) Kernel 2.6.12 or 2.6.13 (2.6.13-rc7 appears to be OK)
> 
> The drive is a Seagate 7200.8 400GB 7200RPM 8MB cache disk.
> hde: ST3400832A, ATA DISK drive
> 
> With older kernels, if I *DO NOT ENABLE DMA* it does not crash.
> If I *ENABLE DMA* then proceed to do anything with the disk, it will
> FREEZE the box, no oops, etc, *FREEZE*.
> 
> hdparm -t /dev/hde
> mkfs.xfs -f /dev/hde1
> 
> Will freeze the box.
> 
> -------
> 
> Linux Kernel 2.6.13 final experiences the same problems as 2.6.12.5.
> 
> I have e-mailed the list quite a few times with this issue, I am surprised
> very few people run into it.
> 
> Here is the error in the logs:
> 
> Aug 31 11:30:25 p34 kernel: hde: dma_timer_expiry: dma status == 0x20
> Aug 31 11:30:25 p34 kernel: hde: DMA timeout retry
> Aug 31 11:30:25 p34 kernel: PDC202XX: Primary channel reset.
> Aug 31 11:30:25 p34 kernel: hde: timeout waiting for DMA
> Aug 31 11:30:25 p34 kernel: hde: status error: status=0x58 { DriveReady SeekComplete DataRequest }
> Aug 31 11:30:25 p34 kernel: hde: drive not ready for command
> Aug 31 11:30:25 p34 kernel: hde: status timeout: status=0xd0 { Busy }
> Aug 31 11:30:25 p34 kernel: PDC202XX: Primary channel reset.
> Aug 31 11:30:25 p34 kernel: hde: no DRQ after issuing MULTWRITE_EXT
> Aug 31 11:30:25 p34 kernel: ide2: reset: success
> 
> After this, the machine locks up with 2.6.13.
> 
> With 2.6.13-rc7, I have not seen this once.

There were absolutely no IDE changes from -rc7 to 2.6.13 final,
and I don't see anything suspicious in the patch.

You may try using git to track down this regression
(but it looks like a bad drive to me).
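
Tracking a regression with git amounts to a binary search between a known-good and a known-bad revision (`git bisect start`, then `git bisect good` or `git bisect bad` after testing each kernel it checks out). The C sketch below shows only the search logic, assuming revisions are numbered oldest to newest, exactly one good-to-bad transition exists, and a hypothetical `is_bad` callback stands in for building, booting, and running `hdparm -t` on each kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical test step: nonzero if revision i shows the hang. In practice
 * this means building kernel i, booting it, and running e.g. hdparm -t. */
typedef int (*is_bad_fn)(size_t i, void *ctx);

/* Revisions 0..n-1 are ordered oldest to newest; 0 is known good and n-1 is
 * known bad. Assuming a single good-to-bad transition, returns the first bad
 * revision after about log2(n) calls to is_bad. */
size_t first_bad(size_t n, is_bad_fn is_bad, void *ctx)
{
    assert(n >= 2);
    size_t good = 0, bad = n - 1;
    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;
        if (is_bad(mid, ctx))
            bad = mid;      /* regression is at mid or earlier */
        else
            good = mid;     /* regression is after mid */
    }
    return bad;
}

/* Example predicate for trying first_bad() without real kernels: revision i
 * is "bad" when i >= *(const size_t *)ctx. */
int bad_from_threshold(size_t i, void *ctx)
{
    return i >= *(const size_t *)ctx;
}
```

Because each test halves the remaining range, 100 candidate revisions need at most 7 test kernels.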

Bartlomiej

> Can anyone offer any insight to why this is happening? I have a few
> machines with the ATA/133 controller and 400GB drives; therefore, I'd
> prefer to fix the problem rather than hooking up older, ATA/100 drives,
> just so I can run newer kernels...


* Re: Linux Kernel 2.6.13-rc7 (WORKS) (2.6.13, DRQ/System CRASH)
From: Alan Cox @ 2005-09-05 12:29 UTC
  To: Bartlomiej Zolnierkiewicz
  Cc: Justin Piszcz, linux-kernel, akpm, linux-ide, apiszcz

On Mon, 2005-09-05 at 09:26 +0200, Bartlomiej Zolnierkiewicz wrote:
> After DMA timeout driver reverted back to PIO,
> ide-taskfile.c also holds PIO code besides IDE Taskfile Access.

On SMP, it will potentially freeze after a DMA timeout. There are some
paths in that code which take the same lock twice and hang, plus some
timer races.

Justin, can you make a backup (I mean that seriously), then build a
kernel with spinlock debugging enabled, and see if you can reproduce the
problem and get a trace?

If it's the locking, you'll get a trace and the kernel will continue. At
that point, because the spinlock debug code continues unsafely through a
double lock after the trace, you are in the "danger zone", hence the
backup warning.

[Yes, the spinlock debug code really should warn you that it's dangerous
for non-debug uses, or be patched, as it is in Fedora, to trace and stop.]

If it's a hardware or other problem, it will still hang.

If it's an unrelated lock problem, it should still produce a trace.
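
The failure mode described here, a CPU taking a lock it already holds, can be illustrated with a toy user-space lock that records its owner. This is a hypothetical sketch using C11 atomics, not the kernel's spinlock or its CONFIG_DEBUG_SPINLOCK code; owner ids are plain non-negative integers standing in for CPU numbers. A plain spinlock would spin forever on the second acquisition (the hang); this debug variant reports it and returns instead.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>

#define DBG_UNLOCKED (-1)

/* Toy lock that remembers who holds it. Owner ids are hypothetical
 * non-negative integers standing in for CPU numbers. */
struct dbg_spinlock {
    atomic_int owner;           /* DBG_UNLOCKED when free */
};

void dbg_spin_init(struct dbg_spinlock *l)
{
    atomic_init(&l->owner, DBG_UNLOCKED);
}

/* Returns 0 once the lock is taken. If `self` already holds it, a plain
 * spinlock would spin forever; this one reports the recursion and returns -1.
 * A caller that ignores the -1 and continues will typically unlock twice, so
 * execution past this point is not safe. */
int dbg_spin_lock(struct dbg_spinlock *l, int self)
{
    assert(self >= 0);
    if (atomic_load(&l->owner) == self) {
        fprintf(stderr, "BUG: spinlock recursion by owner %d\n", self);
        return -1;
    }
    for (;;) {
        int expected = DBG_UNLOCKED;
        if (atomic_compare_exchange_weak(&l->owner, &expected, self))
            return 0;
        /* held by another owner: keep spinning */
    }
}

/* Releases the lock; unlocking a lock you do not hold is reported too. */
void dbg_spin_unlock(struct dbg_spinlock *l, int self)
{
    int expected = self;
    if (!atomic_compare_exchange_strong(&l->owner, &expected, DBG_UNLOCKED))
        fprintf(stderr, "BUG: unlock by %d, but owner is %d\n", self, expected);
}
```

Recording the owner is what turns a silent hang into a report: the lock can tell "held by someone else, keep waiting" apart from "held by me, this will never succeed".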

Why you see this only on 2.6.13 and not on 2.6.13-rc7, I don't know. It
makes me wonder if you have a bad drive, but then you imply that going
back to rc7 makes it stable again. Can you therefore also check that the
.config options match between the two kernels?
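
Comparing the two configurations is usually just a `diff` of the two .config files. If a scripted check is preferred, the C sketch below reports option lines present in only one file; it assumes both files are already loaded as arrays of lines, treats `CONFIG_...` and `# CONFIG_... is not set` lines as options, ignores all other lines, and uses hypothetical helper names (`config_diff`, `is_option`, `contains`).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Nonzero for lines that record an option: "CONFIG_FOO=..." or
 * "# CONFIG_FOO is not set". Other comments and blank lines are ignored. */
static int is_option(const char *line)
{
    return strncmp(line, "CONFIG_", 7) == 0 ||
           strncmp(line, "# CONFIG_", 9) == 0;
}

static int contains(const char *const *list, size_t n, const char *line)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(list[i], line) == 0)
            return 1;
    return 0;
}

/* Hypothetical helper: prints option lines found in only one of two
 * .config files ("-" for the first, "+" for the second) and returns how
 * many there are. Both files must already be split into lines without
 * trailing newlines. Pass out = NULL to only count. */
size_t config_diff(const char *const *a, size_t na,
                   const char *const *b, size_t nb, FILE *out)
{
    assert((a != NULL || na == 0) && (b != NULL || nb == 0));
    size_t diffs = 0;
    for (size_t i = 0; i < na; i++) {
        if (is_option(a[i]) && !contains(b, nb, a[i])) {
            if (out)
                fprintf(out, "- %s\n", a[i]);
            diffs++;
        }
    }
    for (size_t i = 0; i < nb; i++) {
        if (is_option(b[i]) && !contains(a, na, b[i])) {
            if (out)
                fprintf(out, "+ %s\n", b[i]);
            diffs++;
        }
    }
    return diffs;
}
```

The linear `contains` lookup makes the comparison quadratic, which is acceptable for a few thousand .config lines; a zero return means the option lines of the two kernels match.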

Alan


* Re: Linux Kernel 2.6.13-rc7 (WORKS) (2.6.13, DRQ/System CRASH)
From: Justin Piszcz @ 2005-09-05 17:23 UTC
  To: Alan Cox
  Cc: Bartlomiej Zolnierkiewicz, linux-kernel, akpm, linux-ide, apiszcz

Also, part of the problem may be that I have two ATA/133 Promise cards
in one box and only one ATA/133 card in the other box.

Kernel 2.6.13 has fixed the problem with one ATA/133 card in the box.
Kernel 2.6.13 has not fixed the problem with two ATA/133 cards in the box.

FYI

Justin.



