From: Mark Cooke <mpc@jts.homeip.net>
To: linux-ide@vger.kernel.org
Subject: 2.4.26rc1 / HPT 374 / RAID = data read corruption with disks on primary channels.
Date: Tue, 30 Mar 2004 21:04:18 +0100
Message-ID: <1080677057.11947.83.camel@sage.kitchen>
Hi all,
I've been having some trouble with an Abit IT7 machine. It is a
pentium-4 machine, 1GB ram (passes days of all-test memtest86), 80GB
seagate on the ICH4 as the system disk. 4 x 160GB seagates, one on each
channel of the HPT374.
hdc: ICH4 80GB ST380021A FwRev=3.10
hde: HPT Disk 0 160GB ST3160023A FwRev=3.04
hdg: HPT Disk 1 160GB ST3160023A FwRev=3.06
hdi: HPT Disk 2 160GB ST3160023A FwRev=3.06
hdk: HPT Disk 3 160GB ST3160023A FwRev=3.06
The 160G disks are all split into 4 partitions, and a set of 4-disk
RAID-5 partitions created using one partition from each disk. All are
running ext3.
Checksumming a large (>RAM sized) file on any of the raid-5 devices
gives a different checksum every time, i.e. 'while true; do md5sum
big_file ; done' produces a list of different checksums. The same file
on the ICH4 works as expected.
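For anyone who wants to repeat the checksum loop above from a script, here is a minimal Python sketch of the same test. The function names, chunk size and run count are illustrative choices, not part of the original test. As noted above, the file has to be larger than RAM so that every pass really reads from the disks instead of the page cache.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def distinct_checksums(path, runs=5):
    """Checksum the same file `runs` times and return the set of digests seen."""
    return {file_md5(path) for _ in range(runs)}
```

A healthy read path should give a set with exactly one element, e.g. `len(distinct_checksums("big_file")) == 1`; more than one element reproduces the problem described here.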
Characterising the errors shows random blocks of 4-byte corruption. I
ran a second test copying file 1 to file 2, and doing a 'cmp -l 1 2',
and on a 1GB file it gave 20 bytes of errors, in 5 groups of 4
contiguous bytes. The errors do not have an obvious pattern of single
bit errors, nor am I seeing any messages in the system logs relating to
the file copying. The number and location of the errors vary without
any obvious pattern.
All the disks passed their individual (long offline) SMART self-tests,
a dd if=/dev/hdX of=/dev/null, and are all connected as udma5, with
80wire cables. Drive temperatures are all showing under 40C after an
extended period of intensive file checksumming / copying.
See below for details of the further work I did to track this, but at
this point, I believe there is some strange issue with the primary
channel on the two highpoint controllers that does not exist for the two
drives on the secondary channels, as arrays built from disks 1+3 work
without errors, whereas any use of disks 0+2 produces random read
errors.
Questions:
Any known issues with what I'm trying to do ?
Any workarounds / suggestions for isolating the problem ?
Any recommendations for PCI ide cards that work right ?
Thanks for any input!
Mark
Further investigation summary:
After finding the above I moved the data off one of the raid-5
partitions and did some experiments with different disks/raid levels:
1. 4 disk raid-0 stripe. This gave the 4-byte corruption errors.
2. 2 disk raid-0 stripe, using disk0+2. This gave the same 4-byte
corruption errors.
3. 2 disk raid-1 mirror, using disk1+3, but with disk-3 failed out of
the array to reduce i/o to a minimal level. This works without errors.
4. 2 disk raid-1 mirror, using disk1+3. This works without errors.
5. 2 disk raid-1 mirror, using disk0+2. This produces errors again.
6. 2 disk raid-1 mirror, using disk0+2, with disk-2 failed out of the
array. More errors.
7. 2 disk raid-1 mirror, using disk0+2, with disk-0 failed out of the
array. More errors.
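The seven experiments above can be cross-checked mechanically. This is only a sketch: it encodes each experiment as the set of disks actively in the array plus whether errors were seen, then looks for disks that appear in a failing run and in no clean run. The names `EXPERIMENTS`, `DEVICE`, `suspects` and `unexplained` are made up for the illustration; the disk numbers and device names come from the drive list at the top.

```python
# (experiment number, disks actively in the array, errors seen?)
# Disk numbers are the "HPT Disk N" indices from the drive list.
EXPERIMENTS = [
    (1, {0, 1, 2, 3}, True),   # 4-disk raid-0
    (2, {0, 2}, True),         # 2-disk raid-0 on 0+2
    (3, {1}, False),           # raid-1 on 1+3, disk 3 failed out
    (4, {1, 3}, False),        # raid-1 on 1+3
    (5, {0, 2}, True),         # raid-1 on 0+2
    (6, {0}, True),            # raid-1 on 0+2, disk 2 failed out
    (7, {2}, True),            # raid-1 on 0+2, disk 0 failed out
]

DEVICE = {0: "hde", 1: "hdg", 2: "hdi", 3: "hdk"}


def suspects(experiments):
    """Disks that appear in at least one failing run and in no clean run."""
    clean = set().union(*(disks for _, disks, bad in experiments if not bad))
    failing = set().union(*(disks for _, disks, bad in experiments if bad))
    return failing - clean


def unexplained(experiments, suspect_disks):
    """Failing runs that contain no suspect disk (ideally an empty list)."""
    return [n for n, disks, bad in experiments
            if bad and not (disks & suspect_disks)]


bad_disks = suspects(EXPERIMENTS)
print(sorted(DEVICE[d] for d in bad_disks))   # ['hde', 'hdi']
print(unexplained(EXPERIMENTS, bad_disks))    # []
```

With these seven results, the suspects are disks 0 and 2 (hde and hdi), and every failing run includes at least one of them. This says nothing about why those two misbehave, only that the results are consistent.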
Example cmp -l output from a 1GB file:
294961149 6 204
294961150 0 317
294961151 223 311
294961152 377 40
434229245 173 3
434229246 16 4
434229247 342 23
434229248 141 210
497602557 65 377
497602558 71 377
497602559 220 377
497602560 42 377
625459197 35 36
625459198 263 151
625459199 252 244
625459200 322 102
634101757 377 17
634101758 377 232
634101759 377 302
634101760 377 234
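To look for structure in a listing like the one above, a short Python sketch can parse it and group the differing bytes into contiguous runs. cmp -l numbers bytes from 1 and prints the two byte values in octal. The helper names and the 4096-byte block size used for the alignment check are assumptions for illustration; the block size is not stated anywhere in this report.

```python
CMP_OUTPUT = """\
294961149 6 204
294961150 0 317
294961151 223 311
294961152 377 40
434229245 173 3
434229246 16 4
434229247 342 23
434229248 141 210
497602557 65 377
497602558 71 377
497602559 220 377
497602560 42 377
625459197 35 36
625459198 263 151
625459199 252 244
625459200 322 102
634101757 377 17
634101758 377 232
634101759 377 302
634101760 377 234
"""

BLOCK = 4096  # assumed block size for the alignment check


def parse_cmp_l(text):
    """Parse `cmp -l` lines into (byte_number, left_value, right_value)."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        offset, left, right = line.split()
        # Byte numbers are decimal and 1-based; byte values are octal.
        rows.append((int(offset), int(left, 8), int(right, 8)))
    return rows


def group_runs(rows):
    """Group rows into runs of consecutive byte numbers as (first, last)."""
    runs = []
    for offset, _, _ in rows:
        if runs and offset == runs[-1][1] + 1:
            runs[-1][1] = offset          # extend the current run
        else:
            runs.append([offset, offset])  # start a new run
    return [tuple(r) for r in runs]


for first, last in group_runs(parse_cmp_l(CMP_OUTPUT)):
    # Columns: first byte, last byte, run length, last byte modulo BLOCK
    print(first, last, last - first + 1, last % BLOCK)
```

On the 20 lines above this gives five runs of 4 bytes each, and in every run the last byte number is a multiple of 4096, so the differences sit in the final four bytes of a 4096-byte block of the file. Whether that alignment holds for other runs of the test is not established here.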
lspci:
00:00.0 Host bridge: Intel Corp. 82845 845 (Brookdale) Chipset Host Bridge (rev 11)
00:01.0 PCI bridge: Intel Corp. 82845 845 (Brookdale) Chipset AGP Bridge (rev 11)
00:1d.0 USB Controller: Intel Corp. 82801DB USB (Hub #1) (rev 01)
00:1d.1 USB Controller: Intel Corp. 82801DB USB (Hub #2) (rev 01)
00:1d.2 USB Controller: Intel Corp. 82801DB USB (Hub #3) (rev 01)
00:1d.7 USB Controller: Intel Corp. 82801DB USB EHCI Controller (rev 01)
00:1e.0 PCI bridge: Intel Corp. 82801BA/CA/DB PCI Bridge (rev 81)
00:1f.0 ISA bridge: Intel Corp. 82801DB ISA Bridge (LPC) (rev 01)
00:1f.1 IDE interface: Intel Corp. 82801DB ICH4 IDE (rev 01)
00:1f.3 SMBus: Intel Corp. 82801DB SMBus (rev 01)
00:1f.5 Multimedia audio controller: Intel Corp. 82801DB AC'97 Audio (rev 01)
01:00.0 VGA compatible controller: nVidia Corporation NV11DDR [GeForce2 MX 100 DDR/200 DDR] (rev b2)
02:00.0 Multimedia video controller: Brooktree Corporation Bt848 Video Capture (rev 12)
02:01.0 SCSI storage controller: Tekram Technology Co.,Ltd. TRM-S1040 (rev 01)
02:02.0 Ethernet controller: 3Com Corporation 3c905B 100BaseTX [Cyclone] (rev 30)
02:03.0 Network controller: Harris Semiconductor Prism 2.5 Wavelan chipset (rev 01)
02:04.0 RAID bus controller: Triones Technologies, Inc. HPT374 (rev 07)
02:04.1 RAID bus controller: Triones Technologies, Inc. HPT374 (rev 07)
02:05.0 USB Controller: VIA Technologies, Inc. USB (rev 50)
02:05.1 USB Controller: VIA Technologies, Inc. USB (rev 50)
02:05.2 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 51)
02:06.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
02:07.0 FireWire (IEEE 1394): Texas Instruments TSB43AB23 IEEE-1394a-2000 Controller (PHY/Link)
/proc/interrupts:
CPU0
0: 299882 IO-APIC-edge timer
1: 5 IO-APIC-edge keyboard
2: 0 XT-PIC cascade
7: 0 XT-PIC parport0
8: 1 IO-APIC-edge rtc
14: 71251 IO-APIC-edge ide0
15: 505029 IO-APIC-edge ide1
16: 222120 IO-APIC-level usb-uhci, bttv0, nvidia
17: 21 IO-APIC-level DC395x_TRM, Intel 82801DB-ICH4, ohci1394
18: 76625 IO-APIC-level usb-uhci, usb-uhci, eth0
19: 24273 IO-APIC-level usb-uhci, usb-uhci, wifi0
20: 2913200 IO-APIC-level ide2, ide3, ide4, ide5
21: 0 IO-APIC-level ehci_hcd
22: 4338 IO-APIC-level eth1
23: 0 IO-APIC-level ehci_hcd
NMI: 0
LOC: 299838
ERR: 0
MIS: 0
raidtab:
raiddev /dev/md1
raid-level 5
nr-raid-disks 4
nr-spare-disks 0
persistent-superblock 1
chunk-size 32
device /dev/hde1
raid-disk 0
device /dev/hdg1
raid-disk 1
device /dev/hdi1
raid-disk 2
device /dev/hdk1
raid-disk 3
raiddev /dev/md2
raid-level 5
nr-raid-disks 4
nr-spare-disks 0
persistent-superblock 1
chunk-size 32
device /dev/hde2
raid-disk 0
device /dev/hdg2
raid-disk 1
device /dev/hdi2
raid-disk 2
device /dev/hdk2
raid-disk 3
raiddev /dev/md3
raid-level 5
nr-raid-disks 4
nr-spare-disks 0
persistent-superblock 1
chunk-size 32
device /dev/hde3
raid-disk 0
device /dev/hdg3
raid-disk 1
device /dev/hdi3
raid-disk 2
device /dev/hdk3
raid-disk 3
raiddev /dev/md4
raid-level 0
nr-raid-disks 4
nr-spare-disks 0
persistent-superblock 1
chunk-size 64
device /dev/hde4
raid-disk 0
device /dev/hdg4
raid-disk 1
device /dev/hdi4
raid-disk 2
device /dev/hdk4
raid-disk 3
dmesg extract:
hda: JLMS XJ-HD165H, ATAPI CD/DVD-ROM drive
hdb: CD-RW CDR-6S52, ATAPI CD/DVD-ROM drive
hdc: ST380021A, ATA DISK drive
hde: ST3160023A, ATA DISK drive
hdg: ST3160023A, ATA DISK drive
hdi: ST3160023A, ATA DISK drive
hdk: ST3160023A, ATA DISK drive
hdc: attached ide-disk driver.
hdc: host protected area => 1
hdc: 156301488 sectors (80026 MB) w/2048KiB Cache, CHS=155061/16/63, UDMA(100)
hde: attached ide-disk driver.
hde: host protected area => 1
hde: 312581808 sectors (160042 MB) w/8192KiB Cache, CHS=19457/255/63, UDMA(100)
hdg: attached ide-disk driver.
hdg: host protected area => 1
hdg: 312581808 sectors (160042 MB) w/8192KiB Cache, CHS=19457/255/63, UDMA(100)
hdi: attached ide-disk driver.
hdi: host protected area => 1
hdi: 312581808 sectors (160042 MB) w/8192KiB Cache, CHS=19457/255/63, UDMA(100)
hdk: attached ide-disk driver.
hdk: host protected area => 1
hdk: 312581808 sectors (160042 MB) w/8192KiB Cache, CHS=19457/255/63, UDMA(100)
hdc: hdc1 hdc2 hdc3 hdc4
hde: hde1 hde2 hde3 hde4
hdg: hdg1 hdg2 hdg3 hdg4
hdi: hdi1 hdi2 hdi3 hdi4
hdk: hdk1 hdk2 hdk3 hdk4
/proc/ide/hpt366
HighPoint HPT366/368/370/372/374
Controller: 0
Chipset: HPT374
--------------- Primary Channel --------------- Secondary Channel --------------
Enabled:        yes                             yes
Cable:          ATA-66                          ATA-66
--------------- drive0 --------- drive1 ------- drive0 ---------- drive1 -------
DMA capable:    yes              no             yes               no
Mode:           UDMA             off            UDMA              off
Controller: 1
Chipset: HPT374
--------------- Primary Channel --------------- Secondary Channel --------------
Enabled:        yes                             yes
Cable:          ATA-66                          ATA-66
--------------- drive0 --------- drive1 ------- drive0 ---------- drive1 -------
DMA capable:    yes              no             yes               no
Mode:           UDMA             off            UDMA              off
/proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Pentium(R) 4 CPU 2.00GHz
stepping : 4
cpu MHz : 2009.991
cache size : 512 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips : 4010.80
kernel config and drive smart logs attached.
[-- Attachment #2: config.gz --]
[-- Type: application/x-gzip, Size: 11127 bytes --]
[-- Attachment #3: smart.gz --]
[-- Type: application/x-gzip, Size: 2014 bytes --]