public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* IDE DMA Problems...system hangs
@ 2001-02-14 20:09 Jasmeet Sidhu
  2001-02-14 20:28 ` Alan Cox
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Jasmeet Sidhu @ 2001-02-14 20:09 UTC (permalink / raw)
  To: linux-kernel

Hey guys,

I am attaching my previous email for additional info.  Now I am using 
kernel 2.4.1-ac12 and these problems have not gone away.

Anybody else having these problems with an IDE RAID 5?

The RAID 5 performance should also be questioned... here are some numbers 
returned by hdparm

/dev/hda -    IBM DTLA 20GB (ext2)
/dev/md0 - 8 IBM DTLA 45GB (Reiserfs)

[root@bertha hdparm-3.9]# ./hdparm -t /dev/hda
/dev/hda:
Timing buffered disk reads:  64 MB in  2.36 seconds = 27.12 MB/sec

[root@bertha hdparm-3.9]# ./hdparm -t /dev/md0
/dev/md0:
Timing buffered disk reads:  64 MB in 22.16 seconds =  2.89 MB/sec

Is this to be expected?  This much performance loss?  Anybody else using 
IDE RAID?  I would really appreciate your input on this setup.

Here is the log from syslog

Feb 13 05:23:27 bertha kernel: hdo: dma_intr: status=0x51 { DriveReady 
SeekComplete Error }
Feb 13 05:23:27 bertha kernel: hdo: dma_intr: error=0x84 { DriveStatusError 
BadCRC }
Feb 13 05:23:28 bertha kernel: hdo: dma_intr: status=0x51 { DriveReady 
SeekComplete Error }
Feb 13 05:23:28 bertha kernel: hdo: dma_intr: error=0x84 { DriveStatusError 
BadCRC }
Feb 13 05:23:28 bertha kernel: hdo: dma_intr: status=0x51 { DriveReady 
SeekComplete Error }
Feb 13 05:23:28 bertha kernel: hdo: dma_intr: error=0x84 { DriveStatusError 
BadCRC }
Feb 13 05:23:28 bertha kernel: hdo: dma_intr: status=0x51 { DriveReady 
SeekComplete Error }
Feb 13 05:23:28 bertha kernel: hdo: dma_intr: error=0x84 { DriveStatusError 
BadCRC }
Feb 13 05:23:33 bertha kernel: hdo: dma_intr: status=0x51 { DriveReady 
SeekComplete Error }
Feb 13 05:23:33 bertha kernel: hdo: dma_intr: error=0x84 { DriveStatusError 
BadCRC }
...
Feb 13 08:47:48 bertha kernel: hdo: dma_intr: status=0x51 { DriveReady 
SeekComplete Error }
Feb 13 08:47:48 bertha kernel: hdo: dma_intr: error=0x84 { DriveStatusError 
BadCRC }
...
Feb 13 09:54:07 bertha kernel: hdo: dma_intr: status=0x51 { DriveReady 
SeekComplete Error }
Feb 13 09:54:07 bertha kernel: hdo: dma_intr: error=0x84 { DriveStatusError 
BadCRC }
...
Feb 13 12:10:43 bertha kernel: hds: dma_intr: bad DMA status
Feb 13 12:10:43 bertha kernel: hds: dma_intr: status=0x50 { DriveReady 
SeekComplete }
Feb 13 12:12:22 bertha kernel: hdg: timeout waiting for DMA
Feb 13 12:12:22 bertha kernel: ide_dmaproc: chipset supported 
ide_dma_timeout func only: 14
Feb 13 12:12:22 bertha kernel: hdg: irq timeout: status=0x50 { DriveReady 
SeekComplete }
Feb 13 12:12:42 bertha kernel: hdg: timeout waiting for DMA
Feb 13 12:12:42 bertha kernel: ide_dmaproc: chipset supported 
ide_dma_timeout func only: 14
Feb 13 12:12:42 bertha kernel: hdg: irq timeout: status=0x50 { DriveReady 
SeekComplete }
Feb 13 12:13:02 bertha kernel: hdg: timeout waiting for DMA
Feb 13 12:13:02 bertha kernel: ide_dmaproc: chipset supported 
ide_dma_timeout func only: 14
Feb 13 12:13:02 bertha kernel: hdg: irq timeout: status=0x50 { DriveReady 
SeekComplete }
Feb 13 12:13:12 bertha kernel: hdg: timeout waiting for DMA
Feb 13 12:13:12 bertha kernel: ide_dmaproc: chipset supported 
ide_dma_timeout func only: 14
Feb 13 12:13:12 bertha kernel: hdg: irq timeout: status=0x50 { DriveReady 
SeekComplete }
Feb 13 12:13:12 bertha kernel: hdg: DMA disabled
Feb 13 12:13:12 bertha kernel: ide3: reset: success	<------- * SYSTEM HUNG 
AT THIS POINT *
Feb 13 23:31:13 bertha syslogd 1.3-3: restart.



--------------------------------------------------
To: linux-kernel@vger.kernel.org
Subject: DMA blues... (Raid5 + Promise ATA/100)


I have a software raid setup using the latest kernel, but the system keeps 
crashing.

Each drive is connected to its respective IDE port via ATA/100-capable 
cables.  Each drive is a master...no slaves.  The configuration is that 
/dev/hdc is a hot spare and /dev/hd[e,g,i,k,m,o,q,s] are all set up as 
raid 5.  These are all 75GB drives and are recognized as such.

I have searched the linux-kernel archives and saw many posts addressing 
the problems that I was experiencing, namely the freezing caused by the 
code in the body of delay_50ms() in drivers/ide/ide.c.  This was fixed in 
the current patch, as discussed earlier on the linux-kernel mailing list, 
by using mdelay(50).  This fixed the problems to some extent: the system 
seemed very reliable and I did not get a single "DriveStatusError BadCRC" 
or "DriveReady SeekComplete Index Error" for a while.  But after I had 
copied a large amount of data to the raid, about 17GB, the system crashed 
completely and could only be recovered by a cold reboot.  Before using the 
latest patches, the system would usually crash after about 4-6GB of data 
had been moved.  Here are the log entries...

Feb 12 06:41:12 bertha kernel: hdo: dma_intr: status=0x53 { DriveReady 
SeekComplete Index Error }
Feb 12 06:41:12 bertha kernel: hdo: dma_intr: error=0x84 { DriveStatusError 
BadCRC }
Feb 12 06:45:42 bertha kernel: hdo: timeout waiting for DMA
Feb 12 06:45:42 bertha kernel: hdo: ide_dma_timeout: Lets do it again!stat 
= 0x50, dma_stat = 0x20
Feb 12 06:45:42 bertha kernel: hdo: irq timeout: status=0x50 { DriveReady 
SeekComplete }
Feb 12 06:45:42 bertha kernel: hdo: ide_set_handler: handler not null; 
old=c01d0710, new=c01dac70
Feb 12 06:45:42 bertha kernel: bug: kernel timer added twice at c01d0585.
Feb 12 09:13:15 bertha syslogd 1.3-3: restart.

Let me know if I should post any additional information that might help in 
troubleshooting.  Is this a possible issue with the kernel code, or is it 
a problem with the hardware?  Any help is appreciated...

- Jasmeet Sidhu

Some other info that might help:

[root@bertha /root]# uname -a
Linux bertha 2.4.1-ac9 #1 Mon Feb 12 02:43:08 PST 2001 i686 unknown

Patches Applied to Kernel 2.4.1:
	1) ide.2.4.1-p8.all.01172001.patch
	2) patch-2.4.1-ac9

Asus A7V VIA KT133 and Onboard Promise ATA100 Controller (PDC20267)
1GHz AMD Thunderbird Athlon Processor
Four Promise PCI ATA100 Controllers (PDC20267)
Netgear GA620 Gigabit Ethernet Card

Boot Drive (Root + Swap)
hda: IBM-DTLA-307020, ATA DISK drive

Raid 5 Drives:
hdc: IBM-DTLA-307075, ATA DISK drive	*SPARE*
hde: IBM-DTLA-307075, ATA DISK drive
hdg: IBM-DTLA-307075, ATA DISK drive
hdi: IBM-DTLA-307075, ATA DISK drive
hdk: IBM-DTLA-307075, ATA DISK drive
hdm: IBM-DTLA-307075, ATA DISK drive
hdo: IBM-DTLA-307075, ATA DISK drive
hdq: IBM-DTLA-307075, ATA DISK drive
hds: IBM-DTLA-307075, ATA DISK drive 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: IDE DMA Problems...system hangs
  2001-02-14 20:09 IDE DMA Problems...system hangs Jasmeet Sidhu
@ 2001-02-14 20:28 ` Alan Cox
  2001-02-14 21:37   ` Barry K. Nathan
  2001-02-14 20:40 ` Jasmeet Sidhu
  2001-02-14 21:34 ` Thomas Dodd
  2 siblings, 1 reply; 9+ messages in thread
From: Alan Cox @ 2001-02-14 20:28 UTC (permalink / raw)
  To: Jasmeet Sidhu; +Cc: linux-kernel

> Anybody else having these problems with an IDE RAID 5?
> The RAID 5 performance should also be questioned... here are some numbers 
> returned by hdparm

You will get horribly bad performance off raid5 if you have stripes on both
hda/hdb  or hdc/hdd etc.

> Feb 13 05:23:27 bertha kernel: hdo: dma_intr: status=0x51 { DriveReady 
> SeekComplete Error }
> Feb 13 05:23:27 bertha kernel: hdo: dma_intr: error=0x84 { DriveStatusError 
> BadCRC }

You have inadequate cabling. CRC errors are indications of that. Make sure you
are using sufficiently short cables for ATA33 and proper 80pin ATA66 cables.
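
As an aside for readers of the log: the status=0x51 / status=0x53 values
above are raw ATA status-register bytes. A small illustrative decoder (a
sketch using the standard ATA bit layout and the flag names the 2.4 IDE
driver prints; not code from this thread):

```shell
#!/bin/sh
# Decode an ATA status byte into the flag names the 2.4 IDE driver
# prints. Bit layout per the ATA spec: BSY 0x80, DRDY 0x40, DF 0x20,
# DSC 0x10, DRQ 0x08, CORR 0x04, IDX 0x02, ERR 0x01.
decode_status() {
    s=$1; out=""
    [ $((s & 0x80)) -ne 0 ] && out="$out Busy"
    [ $((s & 0x40)) -ne 0 ] && out="$out DriveReady"
    [ $((s & 0x20)) -ne 0 ] && out="$out DeviceFault"
    [ $((s & 0x10)) -ne 0 ] && out="$out SeekComplete"
    [ $((s & 0x08)) -ne 0 ] && out="$out DataRequest"
    [ $((s & 0x04)) -ne 0 ] && out="$out CorrectedError"
    [ $((s & 0x02)) -ne 0 ] && out="$out Index"
    [ $((s & 0x01)) -ne 0 ] && out="$out Error"
    echo "{$out }"
}
decode_status 0x51   # -> { DriveReady SeekComplete Error }
decode_status 0x53   # -> { DriveReady SeekComplete Index Error }
```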

> Feb 13 12:12:42 bertha kernel: hdg: irq timeout: status=0x50 { DriveReady 
> SeekComplete }
> Feb 13 12:13:02 bertha kernel: hdg: timeout waiting for DMA

This could be cabling too, can't be sure.

> Feb 13 12:13:12 bertha kernel: hdg: DMA disabled

It gave up using DMA

> Feb 13 12:13:12 bertha kernel: ide3: reset: success	<------- * SYSTEM HUNG 
> AT THIS POINT *

Ok, that's reasonable behaviour, except it shouldn't have then hung.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: IDE DMA Problems...system hangs
  2001-02-14 20:09 IDE DMA Problems...system hangs Jasmeet Sidhu
  2001-02-14 20:28 ` Alan Cox
@ 2001-02-14 20:40 ` Jasmeet Sidhu
  2001-02-14 20:54   ` Alan Cox
  2001-02-15 23:38   ` Jasmeet Sidhu
  2001-02-14 21:34 ` Thomas Dodd
  2 siblings, 2 replies; 9+ messages in thread
From: Jasmeet Sidhu @ 2001-02-14 20:40 UTC (permalink / raw)
  To: Alan Cox, Jasmeet Sidhu; +Cc: linux-kernel

At 08:28 PM 2/14/2001 +0000, Alan Cox wrote:
> > Anybody else having these problems with an IDE RAID 5?
> > The RAID 5 performance should also be questioned... here are some numbers
> > returned by hdparm
>
>You will get horribly bad performance off raid5 if you have stripes on both
>hda/hdb  or hdc/hdd etc.

If I am reading this correctly, then by striping on both hda/hdb and 
hdc/hdd you mean that I have two drives per IDE channel.  In other words, 
you think I have a Master and Slave type of setup?  This is 
incorrect.  Each drive on the system is a master.  I have 5 Promise cards 
in the system (4 PCI and 1 onboard on the ASUS A7V mobo).  This gives me 
the ability to have 10 master drives.  Since I am only striping on one 
drive per IDE channel, the penalty should not be much in terms of 
performance.  Maybe it's just that the hdparm utility is not a good tool 
for benchmarking a raid set?


> > Feb 13 05:23:27 bertha kernel: hdo: dma_intr: status=0x51 { DriveReady
> > SeekComplete Error }
> > Feb 13 05:23:27 bertha kernel: hdo: dma_intr: error=0x84 { 
> DriveStatusError
> > BadCRC }
>
>You have inadequate cabling. CRC errors are indications of that. Make sure you
>are using sufficiently short cables for ATA33 and proper 80pin ATA66 cables.

All the cables are ATA/100 capable, but I cannot think of another reason 
why I might be getting CRC errors.  I will invest in better cables and 
see if it changes anything.

> > Feb 13 12:12:42 bertha kernel: hdg: irq timeout: status=0x50 { DriveReady
> > SeekComplete }
> > Feb 13 12:13:02 bertha kernel: hdg: timeout waiting for DMA
>
>This could be cabling too, cant be sure
>
> > Feb 13 12:13:12 bertha kernel: hdg: DMA disabled
>
>It gave up using DMA

Agreed.


> > Feb 13 12:13:12 bertha kernel: ide3: reset: success   <------- * SYSTEM 
> HUNG
> > AT THIS POINT *
>
>Ok, that's reasonable behaviour, except it shouldn't have then hung.

This is also my main point of frustration.  The system should be able to 
disable DMA if it's causing a lot of problems, but it should not hang.  I 
have been experiencing this for quite a while with the newer 
kernels.  Should I try the latest ac13 patch?  I glanced over the changes 
and it didn't seem like anything had changed regarding the IDE subsystem.

Is there any way I can force the kernel to output more messages...maybe 
that could help narrow down the problem?
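
One generic knob worth knowing (a sketch assuming the stock /proc layout; 
not specific to the IDE driver): the console log level can be raised so 
that even KERN_DEBUG messages reach the console and syslog.

```shell
#!/bin/sh
# Show the current kernel console log levels; the four fields are
# console, default, minimum, and boot-time default.
cat /proc/sys/kernel/printk
# As root, pass everything (including KERN_DEBUG) through:
# echo 8 > /proc/sys/kernel/printk
```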

J.Sidhu







^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: IDE DMA Problems...system hangs
  2001-02-14 20:40 ` Jasmeet Sidhu
@ 2001-02-14 20:54   ` Alan Cox
  2001-02-15 23:38   ` Jasmeet Sidhu
  1 sibling, 0 replies; 9+ messages in thread
From: Alan Cox @ 2001-02-14 20:54 UTC (permalink / raw)
  To: Jasmeet Sidhu; +Cc: Alan Cox, Jasmeet Sidhu, linux-kernel

> >You will get horribly bad performance off raid5 if you have stripes on both
> >hda/hdb  or hdc/hdd etc.
> 
> If I am reading this correctly, then by striping on both hda/hdb and 
> hdc/hdd you mean that I have two drives per IDE channel.  In other words, 
> you think I have a Master and a Slave type of a setup?  This is 
> incorrect.  Each drive on the system is a master.  I have 5 promise cards 

Ok then your performance should be fine (at least reasonably so, the lack
of tagged queueing does hurt)

> IDE channel, the penalty should not be much in terms of performance.  Maybe 
> it's just that the hdparm utility is not a good tool for benchmarking a 
> raid set?

It's not a good RAID benchmark tool, but it's a good indication of general 
problems.  Bonnie is a good tool for accurate assessment.
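
For reference, a typical Bonnie invocation against the array might look 
like the sketch below (the path is hypothetical; the file size should be 
roughly twice RAM so the page cache cannot hide disk behaviour):

```shell
# Illustrative Bonnie run on a directory living on /dev/md0.
# -d: scratch directory, -s: test file size in MB.
bonnie -d /raid/tmp -s 2048
```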

> disable DMA if its giving it a lot of problems, but it should not hang.  I 
> have been experiencing this for quite a while with the newer 
> kernels.  Should I try the latest ac13 patch?  I glanced of the changes and 
> didnt seem like anything had changed regarding the ide subsystem.

I've not changed anything related to DMA handling specifically. The current
-ac does have a fix for a couple of cases where an IDE reset on the promise
could hang the box dead. That may be the problem.

> Is there anyway I can force the kernel to output more messages...maybe that 
> could help narrow down the problem?

Ask andre@linux-ide.org. He may know the status of the Promise support.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: IDE DMA Problems...system hangs
  2001-02-14 20:09 IDE DMA Problems...system hangs Jasmeet Sidhu
  2001-02-14 20:28 ` Alan Cox
  2001-02-14 20:40 ` Jasmeet Sidhu
@ 2001-02-14 21:34 ` Thomas Dodd
  2 siblings, 0 replies; 9+ messages in thread
From: Thomas Dodd @ 2001-02-14 21:34 UTC (permalink / raw)
  To: linux-kernel; +Cc: Jasmeet Sidhu

Jasmeet Sidhu wrote:
> 
> Hey guys,
> 
> I am attaching my previous email for additional info.  Now I am using
> kernel 2.4.1-ac12 and these problems have not gone away.
> 
> Anybody else having these problems with an IDE RAID 5?
> 
> The RAID 5 performance should also be questioned... here are some numbers
> returned by hdparm
> 
> /dev/hda -    IBM DTLA 20GB (ext2)
> /dev/md0 - 8 IBM DTLA 45GB (Reiserfs)
> 
> [root@bertha hdparm-3.9]# ./hdparm -t /dev/hda
> /dev/hda:
> Timing buffered disk reads:  64 MB in  2.36 seconds = 27.12 MB/sec
> 
> [root@bertha hdparm-3.9]# ./hdparm -t /dev/md0
> /dev/md0:
> Timing buffered disk reads:  64 MB in 22.16 seconds =  2.89 MB/sec
> 
> Is this to be expected?  This much performance loss?  Anybody else using
> IDE RAID?  I would really appreciate your input on this setup.

md2 = RAID0 ext2

hda = hdb = IBM DTTA-351010 (10GB, 5400RPM, UDMA33)

# hdparm -tT /dev/hda /dev/md2
/dev/hda:
Timing buffered disk reads:  64 MB in  5.27 seconds = 12.14 MB/sec
Timing buffer-cache reads:   128 MB in  0.82 seconds =156.10 MB/sec

/dev/md2:
Timing buffered disk reads:  64 MB in  3.34 seconds = 19.16 MB/sec
Timing buffer-cache reads:   128 MB in  0.80 seconds =160.00 MB/sec

On AMD K7 w/ 7409 (Viper) chipset, DMA66 mode w/ 80-pin cable.
kernel = 2.4.1-ac8, no errors in kernel log.
So I get a 58% increase. You should almost max out the bus.

You probably have a bad cable. Try hdparm on each disk and see if
any of them has errors / causes the lockup.
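
That per-disk pass can be scripted; a minimal sketch, with the device 
list assumed to match the RAID members described above:

```shell
#!/bin/sh
# Time each RAID member individually: one drive that is much slower, or
# that logs CRC errors during the test, points at its cable/channel.
check_disks() {
    for d in "$@"; do
        if [ -b "$d" ]; then
            hdparm -t "$d"
        else
            echo "skip: $d not present"
        fi
    done
}
check_disks /dev/hde /dev/hdg /dev/hdi /dev/hdk /dev/hdm /dev/hdo /dev/hdq /dev/hds
```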

	-Thomas

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: IDE DMA Problems...system hangs
  2001-02-14 20:28 ` Alan Cox
@ 2001-02-14 21:37   ` Barry K. Nathan
  0 siblings, 0 replies; 9+ messages in thread
From: Barry K. Nathan @ 2001-02-14 21:37 UTC (permalink / raw)
  To: Alan Cox; +Cc: Jasmeet Sidhu, linux-kernel

Alan Cox wrote: 
>> Feb 13 05:23:27 bertha kernel: hdo: dma_intr: status=0x51 { DriveReady 
>> SeekComplete Error }
>> Feb 13 05:23:27 bertha kernel: hdo: dma_intr: error=0x84 { DriveStatusError 
>> BadCRC }
> 
>You have inadequate cabling. CRC errors are indications of that. Make sure you
>are using sufficiently short cables for ATA33 and proper 80pin ATA66 cables.

I've had cases (on VIA chipsets) where, even for ATA33, a 40-pin cable
caused CRC errors and an 80-pin cable fixed things. (The same 40-pin
cable does ATA33 without problems on an AMD 750 or an Intel BX, though.)

IIRC, Andre Hedrick has said in the past that a marginal PSU or
motherboard can also cause CRC errors.

-Barry K. Nathan <barryn@pobox.com>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: IDE DMA Problems...system hangs
  2001-02-14 20:40 ` Jasmeet Sidhu
  2001-02-14 20:54   ` Alan Cox
@ 2001-02-15 23:38   ` Jasmeet Sidhu
  2001-02-16  0:03     ` Andre Hedrick
  2001-02-16  9:40     ` Alan Cox
  1 sibling, 2 replies; 9+ messages in thread
From: Jasmeet Sidhu @ 2001-02-15 23:38 UTC (permalink / raw)
  To: linux-kernel


 >>I've not changed anything related to DMA handling specifically. The current
 >>-ac does have a fix for a couple of cases where an IDE reset on the promise
 >>could hang the box dead. That may be the problem.

I tried the new patches (2.4.1-ac13) and it seemed very stable.  But after 
moving about 50GB of data to the raid5, the system crashed.  Here is the 
syslog... (the system had been up for about 20 hours)

Feb 14 03:48:53 bertha kernel: hdo: dma_intr: status=0x51 { DriveReady 
SeekComplete Error }
Feb 14 03:48:53 bertha kernel: hdo: dma_intr: error=0x84 { DriveStatusError 
BadCRC }
<snip - about 40 lines exact same hdo: error>
Feb 14 19:35:52 bertha kernel: hdo: dma_intr: error=0x84 { DriveStatusError 
BadCRC }
Feb 14 19:35:52 bertha kernel: hdo: dma_intr: status=0x51 { DriveReady 
SeekComplete Error }
Feb 14 19:35:52 bertha kernel: hdo: dma_intr: error=0x84 { DriveStatusError 
BadCRC }
Feb 14 20:13:06 bertha kernel: hdi: dma_intr: bad DMA status
Feb 14 20:13:06 bertha kernel: hdi: dma_intr: status=0x50 { DriveReady 
SeekComplete }

Feb 15 01:26:34 bertha kernel: hdo: dma_intr: status=0x51 { DriveReady 
SeekComplete Error }
Feb 15 01:26:34 bertha kernel: hdo: dma_intr: error=0x84 { DriveStatusError 
BadCRC }
Feb 15 01:26:34 bertha kernel: hdo: dma_intr: status=0x51 { DriveReady 
SeekComplete Error }
Feb 15 01:26:34 bertha kernel: hdo: dma_intr: error=0x84 { DriveStatusError 
BadCRC }
Feb 15 01:26:38 bertha kernel: hdo: dma_intr: status=0x51 { DriveReady 
SeekComplete Error }
Feb 15 01:26:38 bertha kernel: hdo: dma_intr: error=0x84 { DriveStatusError 
BadCRC }
Feb 15 01:45:06 bertha kernel: hdo: dma_intr: status=0x53 { DriveReady 
SeekComplete Index Error }
Feb 15 01:45:06 bertha kernel: hdo: dma_intr: error=0x84 { DriveStatusError 
BadCRC }
Feb 15 01:45:06 bertha kernel: hdo: dma_intr: status=0x51 { DriveReady 
SeekComplete Error }
Feb 15 01:45:06 bertha kernel: hdo: dma_intr: error=0x84 { DriveStatusError 
BadCRC }
Feb 15 01:45:06 bertha kernel: hdo: dma_intr: status=0x51 { DriveReady 
SeekComplete Error }
Feb 15 01:45:06 bertha kernel: hdo: dma_intr: error=0x84 { DriveStatusError 
BadCRC }
Feb 15 01:54:01 bertha kernel: hdg: timeout waiting for DMA
<SYSTEM FROZEN>

Jasmeet


At 08:54 PM 2/14/2001 +0000, Alan Cox wrote:
> <snip - quoted in full above>


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: IDE DMA Problems...system hangs
  2001-02-15 23:38   ` Jasmeet Sidhu
@ 2001-02-16  0:03     ` Andre Hedrick
  2001-02-16  9:40     ` Alan Cox
  1 sibling, 0 replies; 9+ messages in thread
From: Andre Hedrick @ 2001-02-16  0:03 UTC (permalink / raw)
  To: Jasmeet Sidhu; +Cc: linux-kernel


You have junk for cables, or they are not shielded correctly from
crosstalk.  But I do not think this is the case.
Go check your power supply for stability and load.
Then do a ripple/noise test to make sure that, under load, it does not
cause the clock on the drives to fail.


On Thu, 15 Feb 2001, Jasmeet Sidhu wrote:

> 
>  >>I've not changed anything related to DMA handling specifically. The current
>  >>-ac does have a fix for a couple of cases where an IDE reset on the promise
>  >>could hang the box dead. That may be the problem.
> 
> I tried the new patches (2.4.1-ac13) and it seemed very stable.  After 
> moving about 50GB of data to the raid5, the system crashed.  here is the 
> syslog... (the system had been up for about 20 hours)
> 
> Feb 14 03:48:53 bertha kernel: hdo: dma_intr: status=0x51 { DriveReady 
> SeekComplete Error }
> Feb 14 03:48:53 bertha kernel: hdo: dma_intr: error=0x84 { DriveStatusError 
> BadCRC }
> <snip - about 40 lines exact same hdo: error>
> Feb 14 19:35:52 bertha kernel: hdo: dma_intr: error=0x84 { DriveStatusError 
> BadCRC }
> Feb 14 19:35:52 bertha kernel: hdo: dma_intr: status=0x51 { DriveReady 
> SeekComplete Error }
> Feb 14 19:35:52 bertha kernel: hdo: dma_intr: error=0x84 { DriveStatusError 
> BadCRC }
> Feb 14 20:13:06 bertha kernel: hdi: dma_intr: bad DMA status
> Feb 14 20:13:06 bertha kernel: hdi: dma_intr: status=0x50 { DriveReady 
> SeekComplete }
> 
> Feb 15 01:26:34 bertha kernel: hdo: dma_intr: status=0x51 { DriveReady 
> SeekComplete Error }
> Feb 15 01:26:34 bertha kernel: hdo: dma_intr: error=0x84 { DriveStatusError 
> BadCRC }
> Feb 15 01:26:34 bertha kernel: hdo: dma_intr: status=0x51 { DriveReady 
> SeekComplete Error }
> Feb 15 01:26:34 bertha kernel: hdo: dma_intr: error=0x84 { DriveStatusError 
> BadCRC }
> Feb 15 01:26:38 bertha kernel: hdo: dma_intr: status=0x51 { DriveReady 
> SeekComplete Error }
> Feb 15 01:26:38 bertha kernel: hdo: dma_intr: error=0x84 { DriveStatusError 
> BadCRC }
> Feb 15 01:45:06 bertha kernel: hdo: dma_intr: status=0x53 { DriveReady 
> SeekComplete Index Error }
> Feb 15 01:45:06 bertha kernel: hdo: dma_intr: error=0x84 { DriveStatusError 
> BadCRC }
> Feb 15 01:45:06 bertha kernel: hdo: dma_intr: status=0x51 { DriveReady 
> SeekComplete Error }
> Feb 15 01:45:06 bertha kernel: hdo: dma_intr: error=0x84 { DriveStatusError 
> BadCRC }
> Feb 15 01:45:06 bertha kernel: hdo: dma_intr: status=0x51 { DriveReady 
> SeekComplete Error }
> Feb 15 01:45:06 bertha kernel: hdo: dma_intr: error=0x84 { DriveStatusError 
> BadCRC }
> Feb 15 01:54:01 bertha kernel: hdg: timeout waiting for DMA
> <SYSTEM FROZEN>
> 
> Jasmeet
> 
> 
> At 08:54 PM 2/14/2001 +0000, Alan Cox wrote:
> > <snip - quoted in full above>

Andre Hedrick
Linux ATA Development
ASL Kernel Development
-----------------------------------------------------------------------------
ASL, Inc.                                     Toll free: 1-877-ASL-3535
1757 Houret Court                             Fax: 1-408-941-2071
Milpitas, CA 95035                            Web: www.aslab.com


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: IDE DMA Problems...system hangs
  2001-02-15 23:38   ` Jasmeet Sidhu
  2001-02-16  0:03     ` Andre Hedrick
@ 2001-02-16  9:40     ` Alan Cox
  1 sibling, 0 replies; 9+ messages in thread
From: Alan Cox @ 2001-02-16  9:40 UTC (permalink / raw)
  To: Jasmeet Sidhu; +Cc: linux-kernel

> I tried the new patches (2.4.1-ac13) and it seemed very stable.  After 
> moving about 50GB of data to the raid5, the system crashed.  here is the 
> syslog... (the system had been up for about 20 hours)

Ok, so better but not perfect.

> Feb 15 01:54:01 bertha kernel: hdg: timeout waiting for DMA
> <SYSTEM FROZEN>

hdg is on a Promise card?

Alan


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2001-02-16  9:40 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2001-02-14 20:09 IDE DMA Problems...system hangs Jasmeet Sidhu
2001-02-14 20:28 ` Alan Cox
2001-02-14 21:37   ` Barry K. Nathan
2001-02-14 20:40 ` Jasmeet Sidhu
2001-02-14 20:54   ` Alan Cox
2001-02-15 23:38   ` Jasmeet Sidhu
2001-02-16  0:03     ` Andre Hedrick
2001-02-16  9:40     ` Alan Cox
2001-02-14 21:34 ` Thomas Dodd

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox