* Linux Kernel 2.6.13-rc7 (WORKS) (2.6.13, DRQ/System CRASH)
From: Justin Piszcz @ 2005-08-31 17:37 UTC
To: linux-kernel, akpm, support; +Cc: B.Zolnierkiewicz, linux-ide, apiszcz
All,
I am trying to get everyone together on this to hopefully solve a serious
bug that I have seen on multiple machines with:
a) A Promise ATA/133 controller (ATA/100 works OK)
b) Kernel 2.6.12 or 2.6.13 (2.6.13-rc7 appears to be OK)
The drive is a Seagate 7200.8 400GB 7200RPM 8MB cache disk.
hde: ST3400832A, ATA DISK drive
With older kernels, if I *DO NOT ENABLE DMA* it does not crash.
If I *ENABLE DMA* and then do anything with the disk, it will FREEZE the
box: no oops or anything, just a *FREEZE*. For example, either of these
commands will freeze the box:
hdparm -t /dev/hde
mkfs.xfs -f /dev/hde1
-------
Linux Kernel 2.6.13 final experiences the same problems as 2.6.12.5.
I have e-mailed the list quite a few times about this issue; I am surprised
so few people run into it.
Here is the error in the logs:
Aug 31 11:30:25 p34 kernel: hde: dma_timer_expiry: dma status == 0x20
Aug 31 11:30:25 p34 kernel: hde: DMA timeout retry
Aug 31 11:30:25 p34 kernel: PDC202XX: Primary channel reset.
Aug 31 11:30:25 p34 kernel: hde: timeout waiting for DMA
Aug 31 11:30:25 p34 kernel: hde: status error: status=0x58 { DriveReady SeekComplete DataRequest }
Aug 31 11:30:25 p34 kernel: hde: drive not ready for command
Aug 31 11:30:25 p34 kernel: hde: status timeout: status=0xd0 { Busy }
Aug 31 11:30:25 p34 kernel: PDC202XX: Primary channel reset.
Aug 31 11:30:25 p34 kernel: hde: no DRQ after issuing MULTWRITE_EXT
Aug 31 11:30:25 p34 kernel: ide2: reset: success
After this, the machine locks up with 2.6.13.
With 2.6.13-rc7, I have not seen this once.
Can anyone offer any insight into why this is happening? I have a few
machines with the ATA/133 controller and 400GB drives, so I'd prefer to
fix the problem rather than hook up older ATA/100 drives just so I can
run newer kernels...
Thanks.
* Re: Linux Kernel 2.6.13-rc7 (WORKS) (2.6.13, DRQ/System CRASH)
From: Justin Piszcz @ 2005-08-31 18:09 UTC
To: linux-kernel, akpm, support; +Cc: B.Zolnierkiewicz, linux-ide, apiszcz
I do not even have IDE Taskfile Access enabled, so how is the kernel
printing these error messages before it freezes?
linux-2.6.13/drivers/ide/ide-taskfile.c: printk(KERN_ERR "%s: no DRQ after issuing %sWRITE%s\n",
ATA/ATAPI/MFM/RLL support
  [ ] IDE Taskfile Access
Does anyone have any suggestions on how I can solve this problem?
* Re: Linux Kernel 2.6.13-rc7 (WORKS) (2.6.13, DRQ/System CRASH)
From: Bartlomiej Zolnierkiewicz @ 2005-09-05 7:26 UTC
To: Justin Piszcz; +Cc: linux-kernel, akpm, support, linux-ide, apiszcz
On 8/31/05, Justin Piszcz <jpiszcz@lucidpixels.com> wrote:
> I do not even have IDE Taskfile Access enabled, so how is the kernel
> printing these error messages before it freezes?
>
> linux-2.6.13/drivers/ide/ide-taskfile.c: printk(KERN_ERR "%s: no DRQ after issuing %sWRITE%s\n",
>
> ATA/ATAPI/MFM/RLL support
>   [ ] IDE Taskfile Access
After the DMA timeout, the driver fell back to PIO;
ide-taskfile.c also holds the PIO code, not only IDE Taskfile Access.
Bartlomiej
* Re: Linux Kernel 2.6.13-rc7 (WORKS) (2.6.13, DRQ/System CRASH)
From: Alan Cox @ 2005-09-05 12:29 UTC
To: Bartlomiej Zolnierkiewicz
Cc: Justin Piszcz, linux-kernel, akpm, linux-ide, apiszcz
On Mon, 2005-09-05 at 09:26 +0200, Bartlomiej Zolnierkiewicz wrote:
> After the DMA timeout, the driver fell back to PIO;
> ide-taskfile.c also holds the PIO code, not only IDE Taskfile Access.
On SMP, the driver can freeze after a DMA timeout. There are some paths
in that code which take the same lock twice and hang, plus some timer
races.
Justin, can you make a backup (I mean that seriously), then build a
kernel with spinlock debugging enabled and see whether you can reproduce
the problem and get a trace?
If it's the locking, you'll get a trace and the kernel will continue. At
that point you are in the "danger zone", because after the trace the
spinlock debug code continues unsafely through the double lock; hence
the backup warning.
[Yes, the spinlock debug code really should warn you that it's dangerous
for non-debug use, or be patched, as it is in Fedora, to trace and stop.]
If it's a hardware or other problem, it will still hang.
If it's an unrelated lock problem, it should still get a trace.
I don't know why you see this only on 2.6.13 and not on 2.6.13-rc7. It
makes me wonder whether you have a bad drive, but then you imply that
going back to -rc7 makes it stable again. Can you therefore also check
that the .config options of the two kernels match?
Alan
* Re: Linux Kernel 2.6.13-rc7 (WORKS) (2.6.13, DRQ/System CRASH)
From: Justin Piszcz @ 2005-09-05 17:23 UTC
To: Alan Cox
Cc: Bartlomiej Zolnierkiewicz, linux-kernel, akpm, linux-ide, apiszcz
Also, part of the problem may be that I have two ATA/133 Promise cards in
one box and only one ATA/133 card in the other.
Kernel 2.6.13 has fixed the problem in the box with one ATA/133 card.
Kernel 2.6.13 has not fixed the problem in the box with two ATA/133 cards.
FYI
Justin.
* Re: Linux Kernel 2.6.13-rc7 (WORKS) (2.6.13, DRQ/System CRASH)
From: Bartlomiej Zolnierkiewicz @ 2005-09-05 7:32 UTC
To: Justin Piszcz; +Cc: linux-kernel, akpm, support, linux-ide, apiszcz
On 8/31/05, Justin Piszcz <jpiszcz@lucidpixels.com> wrote:
> After this, the machine locks up with 2.6.13.
>
> With 2.6.13-rc7, I have not seen this once.
There are absolutely no IDE changes from -rc7 to 2.6.13 final,
and I don't see anything suspicious in the patch.
You may try using git to track down this regression
(but it looks like a bad drive to me).
Bartlomiej