* Silent Corruption on RAID5
@ 2006-01-22 16:44 Michael Barnwell
2006-01-22 18:42 ` Mitchell Laks
2006-01-27 12:29 ` Molle Bestefich
0 siblings, 2 replies; 4+ messages in thread
From: Michael Barnwell @ 2006-01-22 16:44 UTC (permalink / raw)
To: linux-raid
Hi,
I'm experiencing silent data corruption on my RAID 5 set of four 400GB
SATA disks.
I first had the problem a couple of weeks ago and thought it was related
to using reiserfs on my system, because I hadn't used it before and I have
another perfectly functional RAID 5 array running ext3. After lots of
testing I find the problem happens with ext3 on the array as well, and
after even more testing I find that the problem only occurs on the array,
not on the individual hard disks.
My test consists of making a ~10GB file of zeros, then checking it for
non-zero bytes. I've also tried creating the file of zeros on a
functional array and copying it across, with the same results.
dd bs=1024 count=10000k if=/dev/zero of=./10GB.tst
od -t x1 s0/10GB.tst
These commands give me one row of zeros on my other RAID 5 set on the
same box, and on each individual hard disk in the array when I put ext3
on them all to see if one was faulty. But when the disks are in RAID, od
spouts lots of non-zeros at me.
<snip>
21524747740 00 00 00 00 00 00 00 00 00 00 00 00 00 50 5c 36
21524747760 00 10 00 00 00 00 a7 23 00 10 00 80 00 00 00 00
21524750000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
21525147740 00 00 00 00 00 00 00 00 00 00 00 00 00 50 5c 36
21525147760 00 10 00 00 00 00 a7 23 00 10 00 80 00 00 00 00
21525150000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
<snip>
There is a fair bit of that, and the last time I ran it I got an I/O
error and the mount went into read-only mode.
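A side note on reading the dump above: od prints offsets in octal by default, so the two corrupted runs shown start at octal 21524747740 and 21525147740. The following is a minimal sketch for converting such offsets to decimal and measuring the distance between them. The function names oct_to_dec and oct_gap are made up for illustration, and the sketch assumes a shell with 64-bit arithmetic (bash or dash on Linux, for example).

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of od or any package.

# Convert an octal offset, as printed in od's default address column,
# to decimal.
oct_to_dec() {
    # A leading 0 makes shell arithmetic read the constant as octal.
    echo "$((0$1))"
}

# Distance in bytes between two octal offsets (second minus first).
oct_gap() {
    echo "$(( 0$2 - 0$1 ))"
}

# Example with the two offsets from the dump:
#   oct_gap 21524747740 21525147740
```

For the two offsets above the gap works out to 65536 bytes, which equals the 64k chunk size that /proc/mdstat reports for md2 further down. Whether that is significant is not established here. Running od with -A d prints decimal offsets directly and avoids the conversion.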
I figure the problem is either with the Silicon Image 3114 hardware or
the driver that supports the array, or with the RAID subsystem. But as I
mentioned earlier, my other RAID 5 set of three 120GB drives on two IDE
controllers works fine.
I'm running Debian sarge with a 2.6.15-1 kernel. The machine has an Athlon
XP2200, 1GB of RAM, an Asus A7N8X-Deluxe motherboard, 2 Maxtor IDE
controllers, one Silicon Image 3114 PCI adapter, and the on-board
Silicon Image 3112 controller. There are 2x 10GB IDE disks and a DVD ROM
drive on the on-board IDE controller, 3x 120GB Seagate hard disks on the
PCI IDE adapters, 2x 80GB Seagate disks on the on-board SilImg 3112
controller and finally 4x 400GB disks on the SilImg 3114 PCI adapter.
biggs:/mnt/test/s0# uname -a
Linux biggs 2.6.15.1.060121 #1 Sat Jan 21 17:01:30 GMT 2006 i686 GNU/Linux
biggs:/mnt/test/s0# cat /proc/mdstat
Personalities : [raid1] [raid5]
md2 : active raid5 sdd1[4] sdc1[2] sdb1[1] sda1[0]
1172126208 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_]
[=>...................] recovery = 8.3% (32500608/390708736)
finish=253.3min speed=23564K/sec
md1 : active raid5 hdg1[0] hde1[2] hdi1[1]
234436352 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]
md0 : active raid1 hdb2[0] hda2[1]
9502336 blocks [2/2] [UU]
unused devices: <none>
(Note: md2 is the array with problems, and I've done the tests when it's
been fully synced, with the same results.)
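Since the listing above shows md2 still recovering ([4/3] [UUU_]), a small check for arrays with a missing member can help confirm the array state before each test run. This is a rough sketch: degraded_arrays is a hypothetical helper name, and it assumes the /proc/mdstat layout shown above, where an underscore in the [UUU_] status field marks a member that is missing or still rebuilding.

```shell
#!/bin/sh
# Hypothetical helper for illustration. Reads mdstat-formatted text on
# stdin and prints the name of each array whose status field contains
# an underscore.
degraded_arrays() {
    awk '
        # Remember the array name from lines such as "md2 : active raid5 ..."
        /^md[0-9]+ :/ { name = $1 }
        # A status field such as [UUU_] has at least one "_" in it.
        /\[[U_]*_[U_]*\]/ { print name }
    '
}

# Typical use on a live system:
#   degraded_arrays < /proc/mdstat
```

An empty result suggests every listed array has all of its members active, under the layout assumption stated above.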
So, does anyone have any suggestions or tests I could perform to narrow
down where my problem is?
Regards,
Michael Barnwell.
* Re: Silent Corruption on RAID5
2006-01-22 16:44 Silent Corruption on RAID5 Michael Barnwell
@ 2006-01-22 18:42 ` Mitchell Laks
2006-01-22 20:58 ` Michael Barnwell
2006-01-27 12:29 ` Molle Bestefich
1 sibling, 1 reply; 4+ messages in thread
From: Mitchell Laks @ 2006-01-22 18:42 UTC (permalink / raw)
To: linux-raid
On Sunday 22 January 2006 11:44 am, Michael Barnwell wrote:
> Hi,
>
> I'm experiencing silent data corruption on my RAID 5 set of four 400GB
> SATA disks.
> dd bs=1024 count=10000k if=/dev/zero of=./10GB.tst
> od -t x1 s0/10GB.tst
>
> These commands give me one row of zeros on my other RAID 5 set on the
> I'm running Debian sarge with a 2.6.15-1 kernel, it has an Athlon
> XP2200, 1GB of RAM, Asus A7N8X-Deluxe motherboard, 2 Maxtor IDE
> controllers, one Silicon Image 3114 PCI adapter, along with the on-board
> Silicon Image 3112 controller - 2x 10GB IDE disks and a DVD ROM drive on
> the on-board IDE controller, 3x 120GB Seagate hard disks on the PCI IDE
> adapters, 2x 80GB Seagate disks on the on-board SilImg 3112 controller
> and finally 4x 400GB disks on the SilImg 3114 PCI adapter.
>
Dear Michael,
If you look at my recent post and the response from David Greaves, I suspect
it is because of the presence of multiple different SATA controllers.
Could you try running your test with ONLY the SilImg 3114 adapter
populated with disks? Also, I don't know whether the 3112 and 3114 use
different kernel modules; if they do, make sure the other one is not loaded.
I ran your test on my raid1 system with the Debian sid 2.6.15 kernel, on both
the motherboard sata_via and PCI card sata_promise controlled raid devices
(I have raid1 though), and had no problem.
I could only run od -t x1 10GB.tst.
What is the "s0 " for?
I tried s0 or -s0 and the machine didn't accept that switch for od.
od -t x1 -s0 10GB.tst
"od: no type may be specified when dumping strings"
For what it's worth, on my system the Promise controller wipes out the
VIA VT8237 onboard controller. You seem to have the opposite problem.
I am afraid that SATA controllers may not yet be stable enough for
production.
Mitchell Laks
> So, does anyone have any suggestions or tests I could perform to narrow
> down where my problem is?
>
> Regards,
>
> Michael Barnwell.
* Re: Silent Corruption on RAID5
2006-01-22 18:42 ` Mitchell Laks
@ 2006-01-22 20:58 ` Michael Barnwell
0 siblings, 0 replies; 4+ messages in thread
From: Michael Barnwell @ 2006-01-22 20:58 UTC (permalink / raw)
To: Mitchell Laks; +Cc: linux-raid
Hi,
Mitchell Laks wrote:
> On Sunday 22 January 2006 11:44 am, Michael Barnwell wrote:
>> Hi,
>>
>> I'm experiencing silent data corruption on my RAID 5 set of four 400GB
>> SATA disks.
>
>> dd bs=1024 count=10000k if=/dev/zero of=./10GB.tst
>> od -t x1 s0/10GB.tst
>>
>> These commands give me one row of zeros on my other RAID 5 set on the
>
>> I'm running Debian sarge with a 2.6.15-1 kernel, it has an Athlon
>> XP2200, 1GB of RAM, Asus A7N8X-Deluxe motherboard, 2 Maxtor IDE
>> controllers, one Silicon Image 3114 PCI adapter, along with the on-board
>> Silicon Image 3112 controller - 2x 10GB IDE disks and a DVD ROM drive on
>> the on-board IDE controller, 3x 120GB Seagate hard disks on the PCI IDE
>> adapters, 2x 80GB Seagate disks on the on-board SilImg 3112 controller
>> and finally 4x 400GB disks on the SilImg 3114 PCI adapter.
>>
>
> Dear Michael,
>
> If you look at my recent post and the response from David Greaves, I suspect
> it is because of the presence of multiple different SATA controllers.
I just tried disabling the on-board SATA controller via the jumper on
the motherboard and then recreating the array and file system and the
problem happened again.
> Could you try running your test with ONLY the SilImg 3114 adapter
> populated with disks? Also, I don't know whether the 3112 and 3114 use
> different kernel modules; if they do, make sure the other one is not loaded.
They use the same module.
> I ran your test on my raid1 system with the Debian sid 2.6.15 kernel, on both
> the motherboard sata_via and PCI card sata_promise controlled raid devices
> (I have raid1 though), and had no problem.
>
> I could only run od -t x1 10GB.tst.
> What is the "s0 " for?
> I tried s0 or -s0 and the machine didn't accept that switch for od.
>
> od -t x1 -s0 10GB.tst
> "od: no type may be specified when dumping strings"
That was a copy-and-paste error; it's just od -t x1 10GB.tst
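As a quicker variant of the same check, the sketch below compares the file against /dev/zero with cmp instead of reading an od dump by eye. The helper names all_zero and count_nonzero are invented for illustration, and the sketch assumes GNU cmp, whose -n option limits the comparison to a given number of bytes.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; assumes GNU cmp (diffutils).

# Succeed (exit 0) if FILE contains only zero bytes.
all_zero() {
    # Arithmetic expansion strips any padding wc may print.
    size=$(( $(wc -c < "$1") ))
    # Compare exactly SIZE bytes of FILE against the endless zero stream.
    cmp -s -n "$size" "$1" /dev/zero
}

# Print how many bytes in FILE are not zero.
count_nonzero() {
    size=$(( $(wc -c < "$1") ))
    # cmp -l prints one line per differing byte.
    cmp -l -n "$size" "$1" /dev/zero | wc -l
}

# Typical use on the test file:
#   all_zero 10GB.tst && echo clean || echo corrupted
```

Without -s, cmp also reports the position of the first differing byte, which gives a starting point for a closer look with od.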
> For what it's worth, on my system the Promise controller wipes out the
> VIA VT8237 onboard controller. You seem to have the opposite problem.
I tried a BIOS update this morning, because it updated the SATA BIOS on
the on-board controller and allowed me to see both controllers during
boot (the PCI one finds its drives and lets me access the SilImg BIOS,
then the on-board one does the same).
> I am afraid that SATA controllers may not yet be stable enough for
> production.
Are other chipsets better supported?
> Mitchell Laks
>
<snip>
Thanks,
Michael Barnwell.
* Re: Silent Corruption on RAID5
2006-01-22 16:44 Silent Corruption on RAID5 Michael Barnwell
2006-01-22 18:42 ` Mitchell Laks
@ 2006-01-27 12:29 ` Molle Bestefich
1 sibling, 0 replies; 4+ messages in thread
From: Molle Bestefich @ 2006-01-27 12:29 UTC (permalink / raw)
To: Michael Barnwell; +Cc: linux-raid
Michael Barnwell wrote:
> I'm experiencing silent data corruption
> on my RAID 5 set of four 400GB SATA disks.
I have roughly the same hardware:
* AMD Opteron 250
* Silicon Image 3114
* 300 GB Maxtor SATA
Just to add a data point, I've run your test on my RAID 1 (not RAID 5!)
without problems.
localhost ~ # dd bs=1024 count=10000k if=/dev/zero of=./10GB.tst
10240000+0 records in
10240000+0 records out
localhost ~ # od -t x1 ./10GB.tst
0000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
116100000000
localhost ~ # uname -a
Linux localhost 2.6.12.6-xen #6 SMP Fri Jan 6 06:49:53 CET 2006 x86_64
AMD Opteron(tm) Processor 250 AuthenticAMD GNU/Linux