public inbox for linux-kernel@vger.kernel.org
* [Fwd: HPT 370 / RAID 5 possible corruption issue.]
@ 2001-10-09 23:01 Dylan Griffiths
  2001-10-11 23:24 ` Jakob Østergaard
  0 siblings, 1 reply; 3+ messages in thread
From: Dylan Griffiths @ 2001-10-09 23:01 UTC
  To: Linux Kernel

	I'm forwarding this here since Ingo/Andre haven't replied to me in a week. 
  I don't like silent data corruption, so I hope SOMEONE pays attention to 
this.

-------- Original Message --------
Subject: HPT 370 / RAID 5 possible corruption issue.
Date: Tue, 02 Oct 2001 13:44:24 -0600
From: Dylan Griffiths <Dylan_G@bigfoot.com>
To: Andre Hedrick <andre@linux-ide.org>
CC: mingo@redhat.com

	Hi.  I have an HPT 370 in a box here.  It has 2 Quantum drives connected to
it (one master per channel) that are in a RAID 5 set with two more
Quantums on the VIA onboard IDE controller.  When I run an md5sum of a
group of files vs. the precomputed md5sums, sometimes they don't match, and
the mismatches show up in different files from run to run.

	After googling around the web, I found a similar report with the HPT 366
controller and software RAID:
http://www.linux-consulting.com/Raid/Docs/raid_highload.tst.txt

In there, the fellow found that reading from a drive connected to the HPT
366 controller would have different results depending on load.

The first time I noticed it was on my client box (this is a RAID5 homedir
exported by NFS to the entire LAN).  This sequence shows how I couldn't
verify a download of a TV show I was going to watch:

dylang@shadowgate:~/movies/TV$ cfv Star.Trek.ENT-S1E01-Broken.Bow-Part.2.sfv Star.Trek.ENT-S1E01-Broken.Bow-Part.2.r*
Star.Trek.ENT-S1E01-Broken.Bow-Part.2.r23 : crc does not match (43BB6DFC!=F53FAF16)
Star.Trek.ENT-S1E01-Broken.Bow-Part.2.r36 : crc does not match (F28A8F31!=E64A1180)
Star.Trek.ENT-S1E01-Broken.Bow-Part.2.rar : crc does not match (F0A13E95!=5943980D)
41 files, 38 OK, 3 badcrc.  153.198 seconds, 5189.3K/s

dylang@shadowgate:~/movies/TV$ cfv Star.Trek.ENT-S1E01-Broken.Bow-Part.2.sfv Star.Trek.ENT-S1E01-Broken.Bow-Part.2.r*
Star.Trek.ENT-S1E01-Broken.Bow-Part.2.r04 : crc does not match (C8BDABAB!=BF3DB71A)
41 files, 40 OK, 1 badcrc.  229.229 seconds, 3468.1K/s

dylang@shadowgate:~/movies/TV$ cfv Star.Trek.ENT-S1E01-Broken.Bow-Part.2.sfv Star.Trek.ENT-S1E01-Broken.Bow-Part.2.r*
Star.Trek.ENT-S1E01-Broken.Bow-Part.2.r18 : crc does not match (74F22550!=CEDD8A8F)
41 files, 40 OK, 1 badcrc.  133.518 seconds, 5954.2K/s

dylang@shadowgate:~/movies/TV$ cfv Star.Trek.ENT-S1E01-Broken.Bow-Part.2.sfv Star.Trek.ENT-S1E01-Broken.Bow-Part.2.r*
Star.Trek.ENT-S1E01-Broken.Bow-Part.2.r22 : crc does not match (5E5E484E!=CE08A191)
Star.Trek.ENT-S1E01-Broken.Bow-Part.2.r25 : crc does not match (B0006BB7!=38531314)
41 files, 39 OK, 2 badcrc.  132.956 seconds, 5979.4K/s

dylang@shadowgate:~/movies/TV$ cfv Star.Trek.ENT-S1E01-Broken.Bow-Part.2.sfv Star.Trek.ENT-S1E01-Broken.Bow-Part.2.r*
41 files, 41 OK.  139.748 seconds, 5688.8K/s

dylang@shadowgate:~/movies/TV$ cfv Star.Trek.ENT-S1E01-Broken.Bow-Part.2.sfv Star.Trek.ENT-S1E01-Broken.Bow-Part.2.r*
Star.Trek.ENT-S1E01-Broken.Bow-Part.2.r02 : crc does not match (484D1D14!=771F6235)
41 files, 40 OK, 1 badcrc.  138.904 seconds, 5723.4K/s
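
cfv is checking each file against the CRC32 value listed for it in the .sfv file. For anyone who wants to repeat the check without cfv, here is a minimal Python sketch of the same idea. It assumes the common SFV layout (lines starting with ';' are comments; each entry is a filename, whitespace, then eight hex digits). The function names are made up for illustration and are not part of cfv.

```python
import os
import zlib


def crc32_of_file(path, bufsize=1 << 16):
    """CRC32 of a file as an unsigned 32-bit int, read in bufsize chunks."""
    crc = 0
    with open(path, "rb") as f:
        while True:
            block = f.read(bufsize)
            if not block:
                break
            crc = zlib.crc32(block, crc)  # running CRC across chunks
    return crc & 0xFFFFFFFF


def parse_sfv(text):
    """Return (filename, expected_crc) pairs from SFV text (assumed layout)."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):
            continue  # blank line or comment
        parts = line.rsplit(None, 1)  # filename may itself contain spaces
        if len(parts) != 2:
            continue  # malformed entry, skip it
        name, crc_hex = parts
        entries.append((name, int(crc_hex, 16)))
    return entries


def verify_sfv(sfv_path):
    """Return (filename, computed_crc, expected_crc) for every failing entry.

    Filenames are resolved relative to the directory holding the .sfv file.
    """
    base = os.path.dirname(sfv_path)
    with open(sfv_path, encoding="latin-1") as f:
        entries = parse_sfv(f.read())
    failures = []
    for name, expected in entries:
        got = crc32_of_file(os.path.join(base, name))
        if got != expected:
            failures.append((name, got, expected))
    return failures
```

An empty list from `verify_sfv("Star.Trek.ENT-S1E01-Broken.Bow-Part.2.sfv")` means every file matched. The result labels computed and expected values explicitly rather than guessing which side of cfv's `!=` is which. Since the bad files change from run to run here, running the check several times in a row is what exposes the problem.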

I thought it might've been a network problem, but on the server itself:

dylang@kaneda:~/movies/TV$ ~/cfv Star.Trek.ENT-S1E01-Broken.Bow-Part.2.sfv Star.Trek.ENT-S1E01-Broken.Bow-Part.2.r*
41 files, 41 OK.  82.916 seconds, 9588.0K/s

dylang@kaneda:~/movies/TV$ ~/cfv Star.Trek.ENT-S1E01-Broken.Bow-Part.2.sfv Star.Trek.ENT-S1E01-Broken.Bow-Part.2.r*
Star.Trek.ENT-S1E01-Broken.Bow-Part.2.r26 : crc does not match (365E02AE!=ED0E2554)
      ** Heavy NFS activity (4 x 100MB files moved)
41 files, 40 OK, 1 badcrc.  154.080 seconds, 5159.7K/s

dylang@kaneda:~/movies/TV$ ~/cfv Star.Trek.ENT-S1E01-Broken.Bow-Part.2.sfv Star.Trek.ENT-S1E01-Broken.Bow-Part.2.r*
41 files, 41 OK.  82.232 seconds, 9667.7K/s

dylang@kaneda:~/movies/TV$ ~/cfv Star.Trek.ENT-S1E01-Broken.Bow-Part.2.sfv Star.Trek.ENT-S1E01-Broken.Bow-Part.2.r*
41 files, 41 OK.  81.370 seconds, 9770.2K/s

dylang@kaneda:~/movies/TV$ ~/cfv Star.Trek.ENT-S1E01-Broken.Bow-Part.2.sfv Star.Trek.ENT-S1E01-Broken.Bow-Part.2.r*
41 files, 41 OK.  81.954 seconds, 9700.6K/s

dylang@kaneda:~/movies/TV$ ~/cfv Star.Trek.ENT-S1E01-Broken.Bow-Part.2.sfv Star.Trek.ENT-S1E01-Broken.Bow-Part.2.r*
41 files, 41 OK.  128.051 seconds, 6208.5K/s
      *** lighter NFS activity (1 x 100MB file moved)


	So which is buckling under pressure: the RAID 5 code, or the HPT 370 driver
or card?  The system is an Athlon 550 with 768MB of PC133 RAM running
kernel 2.4.10 and using an EEPro 100 for networking.
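
One way to narrow that down is to take md and the filesystem out of the picture: read the same region of a raw device several times and see whether the bytes ever change. Below is a hedged Python sketch of that idea; the function names are hypothetical. The device names in the usage note are assumptions based on the dmesg output (hde and hdg on the HPT370, hda and hdb on the VIA controller); under devfs the paths may differ. One caveat: repeated reads of a range smaller than RAM can be served from the page cache instead of the disk, so the range should be larger than the 768MB in this box.

```python
import hashlib


def read_digests(path, offset, length, rounds, bufsize=1 << 20):
    """Read bytes [offset, offset + length) of `path` `rounds` times.

    Returns one MD5 hex digest per pass. On healthy hardware every digest
    should be identical, whether `path` is a regular file or a block device.
    """
    digests = []
    for _ in range(rounds):
        h = hashlib.md5()
        with open(path, "rb") as f:
            f.seek(offset)
            remaining = length
            while remaining > 0:
                block = f.read(min(remaining, bufsize))
                if not block:
                    break  # reached end of file or device
                h.update(block)
                remaining -= len(block)
        digests.append(h.hexdigest())
    return digests


def changed_passes(digests):
    """Indexes of passes whose digest differs from the first pass."""
    return [i for i, d in enumerate(digests) if d != digests[0]]
```

For example, `changed_passes(read_digests("/dev/hde", 0, 1 << 30, 5))` reads the first 1GB of hde five times; a non-empty result means the drive or controller returned different data for the same range. Running the same check against /dev/hda, ideally under the kind of NFS load that triggered the errors above, would help tell the HPT370 side apart from the VIA side.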

root@kaneda:~# cat /proc/interrupts
             CPU0
    0:    5844936          XT-PIC  timer
    1:          2          XT-PIC  keyboard
    2:          0          XT-PIC  cascade
    8:          1          XT-PIC  rtc
    9:    6993615          XT-PIC  eth0
   10:     423816          XT-PIC  ide2, ide3
   14:   17763212          XT-PIC  ide0
   15:    3159230          XT-PIC  ide1


IDE dmesg output:

Uniform Multi-Platform E-IDE driver Revision: 6.31
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
VP_IDE: IDE controller on PCI bus 00 dev 39
VP_IDE: chipset revision 16
VP_IDE: not 100% native mode: will probe irqs later
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
VP_IDE: VIA vt82c686a (rev 22) IDE UDMA66 controller on pci00:07.1
      ide0: BM-DMA at 0xffa0-0xffa7, BIOS settings: hda:DMA, hdb:DMA
      ide1: BM-DMA at 0xffa8-0xffaf, BIOS settings: hdc:DMA, hdd:DMA
HPT370: IDE controller on PCI bus 00 dev 60
PCI: Enabling device 00:0c.0 (0005 -> 0007)
HPT370: chipset revision 3
HPT370: not 100% native mode: will probe irqs later
      ide2: BM-DMA at 0xcc00-0xcc07, BIOS settings: hde:DMA, hdf:pio
      ide3: BM-DMA at 0xcc08-0xcc0f, BIOS settings: hdg:DMA, hdh:pio
hda: QUANTUM FIREBALLP AS40.0, ATA DISK drive
hdb: QUANTUM FIREBALLP AS40.0, ATA DISK drive
hdc: FUJITSU MPG3204AT E, ATA DISK drive
hdd: FUJITSU MPG3204AT E, ATA DISK drive
hde: QUANTUM FIREBALLP AS40.0, ATA DISK drive
hdg: QUANTUM FIREBALLP AS40.0, ATA DISK drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
ide1 at 0x170-0x177,0x376 on irq 15
ide2 at 0xdc00-0xdc07,0xd802 on irq 10
ide3 at 0xd400-0xd407,0xd002 on irq 10
hda: 78177792 sectors (40027 MB) w/1902KiB Cache, CHS=4866/255/63
hdb: 78177792 sectors (40027 MB) w/1902KiB Cache, CHS=4866/255/63
hdc: 40031712 sectors (20496 MB) w/512KiB Cache, CHS=39714/16/63
hdd: 40031712 sectors (20496 MB) w/512KiB Cache, CHS=39714/16/63
hde: 78177792 sectors (40027 MB) w/1902KiB Cache, CHS=77557/16/63, UDMA(100)
hdg: 78177792 sectors (40027 MB) w/1902KiB Cache, CHS=77557/16/63, UDMA(100)

RAID info:

root@kaneda:~# cat /proc/mdstat
Personalities : [linear] [raid1] [raid5] [multipath]
read_ahead 1024 sectors
md1 : active linear ide/host0/bus1/target1/lun0/part1[1] ide/host0/bus1/target0/lun0/part1[0]
        40031488 blocks 32k rounding

md0 : active raid5 ide/host2/bus1/target0/lun0/part6[3] ide/host2/bus0/target0/lun0/part6[2] ide/host0/bus0/target1/lun0/part6[1] ide/host0/bus0/target0/lun0/part6[0]
        111266112 blocks level 5, 32k chunk, algorithm 2 [4/4] [UUUU]

unused devices: <none>
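
The mdstat output gives enough to work out which member holds a given piece of md0: 4 disks, 32k chunks, and algorithm 2, which in the md driver is the left-symmetric layout. With that, a corrupted region of md0 could be attributed to a disk on the HPT370 or on the VIA controller. The Python sketch below is my reading of the left-symmetric mapping, not a copy of the 2.4 raid5 code. It assumes the 0.90 superblock (kept at the end of each partition, so member data starts at offset 0 of the partition). It also assumes the [n] role numbers above correspond to hda6, hdb6, hde6 and hdg6 under the usual devfs naming; the ROLE_TO_DRIVE table is hypothetical. Turning a file's blocks into md0 offsets is a separate step not shown here.

```python
CHUNK_BYTES = 32 * 1024  # "32k chunk" in /proc/mdstat
RAID_DISKS = 4           # "[4/4]" in /proc/mdstat

# Hypothetical mapping from md role number to drive (assumed devfs naming).
ROLE_TO_DRIVE = {0: "hda6 (VIA)", 1: "hdb6 (VIA)",
                 2: "hde6 (HPT370)", 3: "hdg6 (HPT370)"}


def left_symmetric_map(offset, raid_disks=RAID_DISKS, chunk_bytes=CHUNK_BYTES):
    """Map a byte offset on the md device to (data_role, parity_role, member_offset).

    Left-symmetric RAID5: parity rotates from the last disk backwards, and the
    data chunks of each stripe start on the disk just after the parity disk.
    """
    data_disks = raid_disks - 1
    chunk_number, in_chunk = divmod(offset, chunk_bytes)
    stripe, data_index = divmod(chunk_number, data_disks)
    parity_role = data_disks - stripe % raid_disks
    data_role = (parity_role + 1 + data_index) % raid_disks
    member_offset = stripe * chunk_bytes + in_chunk  # offset within the member partition
    return data_role, parity_role, member_offset
```

For instance, `ROLE_TO_DRIVE[left_symmetric_map(off)[0]]` names the drive that would hold the byte at md0 offset `off`. If the bad spots found by repeated checks kept landing on roles 2 and 3, that would point toward the HPT370 side.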



-- 
     www.kuro5hin.org -- technology and culture, from the trenches.
                          -=-=-=-=-=-
Those that give up liberty to obtain safety deserve neither.
  -- Benjamin Franklin
   http://www.zdnet.com/zdnn/stories/news/0,4586,2812463,00.html
   http://slashdot.org/article.pl?sid=01/09/16/1647231
                          -=-=-=-=-=-



* Re: [Fwd: HPT 370 / RAID 5 possible corruption issue.]
  2001-10-09 23:01 [Fwd: HPT 370 / RAID 5 possible corruption issue.] Dylan Griffiths
@ 2001-10-11 23:24 ` Jakob Østergaard
  2001-10-14  6:21   ` Dylan Griffiths
  0 siblings, 1 reply; 3+ messages in thread
From: Jakob Østergaard @ 2001-10-11 23:24 UTC
  To: Dylan Griffiths; +Cc: Linux Kernel

On Tue, Oct 09, 2001 at 05:01:50PM -0600, Dylan Griffiths wrote:
...
> 	Hi.  I have an HPT 370 in a box here.  It has 2 Quantum drives connected to
> it (one master per channel) that are in a RAID 5 set with two more
> Quantums on the VIA onboard IDE controller.  When I run an md5sum of a
> group of files vs. the precomputed md5sums, sometimes they don't match in
> different spots.
> 
> 	After googling around the web, I found a similar report with the HPT 366
> controller and software RAID:
> http://www.linux-consulting.com/Raid/Docs/raid_highload.tst.txt
> 
> In there, the fellow found that reading from a drive connected to the HPT
> 366 controller would have different results depending on load.

I can't say what the current status is.  But some time ago some people I know
got burnt by silent corruption from using HPT cards with RAID5 and RAID0.  The
cards were replaced with Promise cards, and the problem went away (as it should;
I've been running a lot of RAID on Promise cards and never saw the problem).

As long as there are Promise cards to get, I'm not going anywhere near HPT.

Maybe there's a fix somewhere, maybe there's a magic BIOS setting or upgrade,
maybe something else can make it work, I don't know.  Promise cards are cheap
so I don't care.

Sorry for not being able to give you "good" information, but at least now you
got "some" information.   Hope it helps, for what it's worth.

Cheers,

-- 
................................................................
:   jakob@unthought.net   : And I see the elder races,         :
:.........................: putrid forms of man                :
:   Jakob Østergaard      : See him rise and claim the earth,  :
:        OZ9ABN           : his downfall is at hand.           :
:.........................:............{Konkhra}...............:


* Re: [Fwd: HPT 370 / RAID 5 possible corruption issue.]
  2001-10-11 23:24 ` Jakob Østergaard
@ 2001-10-14  6:21   ` Dylan Griffiths
  0 siblings, 0 replies; 3+ messages in thread
From: Dylan Griffiths @ 2001-10-14  6:21 UTC
  To: Jakob Østergaard; +Cc: Linux Kernel

Jakob Østergaard wrote:

> I can't say what the current status is.  But some time ago some people I know
> got burnt by silent corruption from using HPT cards with RAID5 and RAID0.  The
> cards were replaced with Promise cards, and the problem went away (as it should;
> I've been running a lot of RAID on Promise cards and never saw the problem).


I've got a spare Promise card now; I will test it and keep the list posted
on the results.

 
> As long as there are Promise cards to get, I'm not going anywhere near HPT.
> 
> Maybe there's a fix somewhere, maybe there's a magic BIOS setting or upgrade,
> maybe something else can make it work, I don't know.  Promise cards are cheap
> so I don't care.
> 
> Sorry for not being able to give you "good" information, but at least now you
> got "some" information.   Hope it helps, for what it's worth.
> 

I wonder, if the HPT card support is so bad or the hardware itself is so
squirrelly, why it's not marked as UNSTABLE or doesn't carry a note about
the hardware being evil.

