linux-raid.vger.kernel.org archive mirror
* Call for RAID-6 users
@ 2004-07-23 23:32 H. Peter Anvin
  2004-07-26 21:38 ` Jim Paris
  2004-07-30 21:11 ` maarten van den Berg
  0 siblings, 2 replies; 26+ messages in thread
From: H. Peter Anvin @ 2004-07-23 23:32 UTC (permalink / raw)
  To: linux-raid

I'm considering removing the "experimental" label from RAID-6.  It
appears at this point to be just as stable as RAID-5 (since it's based
on the RAID-5 code, that's obviously all that can be expected.)

Thus, if you have used RAID-6 and have good or bad experiences, I'd
like to know them as soon as possible.

	-hpa



* Re: Call for RAID-6 users
  2004-07-23 23:32 Call for RAID-6 users H. Peter Anvin
@ 2004-07-26 21:38 ` Jim Paris
  2004-07-27  2:05   ` Matthew - RAID
  2004-07-30 15:58   ` H. Peter Anvin
  2004-07-30 21:11 ` maarten van den Berg
  1 sibling, 2 replies; 26+ messages in thread
From: Jim Paris @ 2004-07-26 21:38 UTC (permalink / raw)
  To: linux-raid

> Thus, if you have used RAID-6 and have good or bad experiences, I'd
> like to know them as soon as possible.

Just tried setting up a RAID-6 on a new server, and I'm seeing
complete filesystem corruption.

I have 6 250GB disks, and want them all in the array.  I've created it
degraded, with the first disk missing, since that disk temporarily
holds the system.

Using kernel 2.6.7, mdadm 1.6.0, I did something like this:

# mdadm --create /dev/md1 --level=6 --chunk=128 --raid-devices=6 missing /dev/hd{g,i,k,m,o}2

which gives me:

md1 : active raid6 hdo2[5] hdm2[4] hdk2[3] hdi2[2] hdg2[1]
      976269312 blocks level 6, 128k chunk, algorithm 2 [6/5] [_UUUUU]

Then created the filesystem:
# mkreiserfs /dev/md1
# reiserfsck /dev/md1  # <-- no errors
# df -H                # <-- shows proper size (1 TB)

Then copied the system to it:
# mount /dev/md1 /mnt/root
# cd /mnt/root ; tar --one-file-system -cf - / | tar --preserve -xvf - ; cd /
# umount /mnt/root
# reiserfsck /dev/md1  # <-- many, many errors

There were no errors in dmesg while copying the data to the
filesystem, or while running reiserfsck.  The filesystem gives tons of
errors if I try to use it as root.  Not sure what else to try.  It's
easy to reproduce, and seems to fail the exact same way every time.

I need this server up soon, so I may just settle for RAID5, but I can
keep it around for testing for a few days.  Let me know if you'd like
access to the machine.

-jim


* Re: Call for RAID-6 users
  2004-07-26 21:38 ` Jim Paris
@ 2004-07-27  2:05   ` Matthew - RAID
  2004-07-27  2:12     ` Jim Paris
  2004-07-30 15:58   ` H. Peter Anvin
  1 sibling, 1 reply; 26+ messages in thread
From: Matthew - RAID @ 2004-07-27  2:05 UTC (permalink / raw)
  To: Jim Paris, linux-raid


On Mon, 26 Jul 2004 17:38:11 -0400, "Jim Paris" <jim@jtan.com> said:
> > Thus, if you have used RAID-6 and have good or bad experiences, I'd
> > like to know them as soon as possible.
> 
> Just tried setting up a RAID-6 on a new server, and I'm seeing
> complete filesystem corruption.
> # cd /mnt/root ; tar --one-file-system -cf - / | tar --preserve -xvf - ;
> cd /
> # umount /mnt/root
> # reiserfsck /dev/md1  # <-- many, many errors

My reading of things was that /proc and any in-use mount points needed
to be handled specially when using tar to do the copy.  Then again, the
--one-file-system argument could be taking care of that; I haven't heard
of using it.

Is it OK to use tar on / including /proc and /dev like this?

cpio, cp, tar - I've seen HOWTOs use all three to copy / !  


* Re: Call for RAID-6 users
  2004-07-27  2:05   ` Matthew - RAID
@ 2004-07-27  2:12     ` Jim Paris
  2004-07-27 16:40       ` Ricky Beam
  0 siblings, 1 reply; 26+ messages in thread
From: Jim Paris @ 2004-07-27  2:12 UTC (permalink / raw)
  To: Matthew - RAID; +Cc: linux-raid

> > > Thus, if you have used RAID-6 and have good or bad experiences, I'd
> > > like to know them as soon as possible.
> > 
> > Just tried setting up a RAID-6 on a new server, and I'm seeing
> > complete filesystem corruption.
> > # cd /mnt/root ; tar --one-file-system -cf - / | tar --preserve -xvf - ;
> > cd /
> > # umount /mnt/root
> > # reiserfsck /dev/md1  # <-- many, many errors
> 
> My reading of things was that /proc and any in-use mount points needed
> to be handled specially when using tar to do the copy.  Then again, the
> --one-file-system argument could be taking care of that; I haven't heard
> of using it.
> 
> Is it OK to use tar on / including /proc and /dev like this?

I've done the same thing with setting up RAID-5 in the past, so the
procedure should be okay.  --one-file-system excludes /proc, and
tar handles special files in /dev properly.
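
(For reference, the cp and cpio variants the HOWTOs mention would look
roughly like this; just a sketch, assuming GNU tools and the new
filesystem mounted at /mnt/root:

# cp -ax / /mnt/root
# cd / ; find . -xdev -print0 | cpio --null -pdmuv /mnt/root

cp's -x and find's -xdev stay on one filesystem, so /proc and other
mounts are skipped much like tar's --one-file-system.)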

I can do more specific tests (writing particular data to the disk and
reading it back), but I'm not sure what patterns would be useful.

-jim


* Re: Call for RAID-6 users
  2004-07-27  2:12     ` Jim Paris
@ 2004-07-27 16:40       ` Ricky Beam
  2004-07-27 17:20         ` Jim Paris
  0 siblings, 1 reply; 26+ messages in thread
From: Ricky Beam @ 2004-07-27 16:40 UTC (permalink / raw)
  To: Jim Paris; +Cc: linux-raid

On Mon, 26 Jul 2004, Jim Paris wrote:
>tar handles special files in /dev properly.

Don't bet your life on it.  GNU Tar (tm) will work most of the time on
most systems (read: non-linux), but NEVER assume tar will read device
entries correctly. (cpio is the "correct" tool.  dump is the Correct
Tool (tm).)

--Ricky




* Re: Call for RAID-6 users
  2004-07-27 16:40       ` Ricky Beam
@ 2004-07-27 17:20         ` Jim Paris
  2004-07-27 18:19           ` Jim Paris
  0 siblings, 1 reply; 26+ messages in thread
From: Jim Paris @ 2004-07-27 17:20 UTC (permalink / raw)
  To: Ricky Beam; +Cc: linux-raid

> >tar handles special files in /dev properly.
> 
> Don't bet your life on it.  GNU Tar (tm) will work most of the time on
> most systems (read: non-linux), but NEVER assume tar will read device
> entries correctly. (cpio is the "correct" tool.  dump is the Correct
> Tool (tm).)

Whatever.  My problem is with RAID-6, not tar.

-jim


* Re: Call for RAID-6 users
  2004-07-27 17:20         ` Jim Paris
@ 2004-07-27 18:19           ` Jim Paris
  2004-07-27 18:48             ` Jim Paris
  0 siblings, 1 reply; 26+ messages in thread
From: Jim Paris @ 2004-07-27 18:19 UTC (permalink / raw)
  To: linux-raid

[-- Attachment #1: Type: text/plain, Size: 420 bytes --]

> > >tar handles special files in /dev properly.
> > 
> > Don't bet your life on it.  GNU Tar (tm) will work most of the time on
> > most systems (read: non-linux), but NEVER assume tar will read device
> > entries correctly. (cpio is the "correct" tool.  dump is the Correct
> > Tool (tm).)

Attached is another trace that shows this corruption on a single
regular file copied with "cp" onto a fresh filesystem.

-jim


[-- Attachment #2: typescript --]
[-- Type: text/plain, Size: 3109 bytes --]

Script started on Tue Jul 27 14:11:19 2004
bucket:/# mdadm --stop /dev/md1
bucket:/# mdadm --zero-superblock /dev/hd[gikmo]2
mdadm: /dev/hdg2 does not appear to have an MD superblock.
mdadm: /dev/hdi2 does not appear to have an MD superblock.
mdadm: /dev/hdk2 does not appear to have an MD superblock.
mdadm: /dev/hdm2 does not appear to have an MD superblock.
mdadm: /dev/hdo2 does not appear to have an MD superblock.
bucket:/# for i in /dev/hd[gikmo]2; do dd if=/dev/zero of=$i bs=1M count=100 ; done
100+0 records in
100+0 records out
104857600 bytes transferred in 2.967763 seconds (35332202 bytes/sec)
100+0 records in
100+0 records out
104857600 bytes transferred in 2.969712 seconds (35309012 bytes/sec)
100+0 records in
100+0 records out
104857600 bytes transferred in 2.973760 seconds (35260950 bytes/sec)
100+0 records in
100+0 records out
104857600 bytes transferred in 2.965225 seconds (35362443 bytes/sec)
100+0 records in
100+0 records out
104857600 bytes transferred in 2.976899 seconds (35223769 bytes/sec)
bucket:/# mdadm --create /dev/md1 --level=6 --chunk=128 --raid-devices=6 missing  /dev/hd[gikmo]2
mdadm: array /dev/md1 started.
bucket:/# mkreiserfs /dev/md1
mkreiserfs 3.6.17 (2003 www.namesys.com)

A pair of credits:
Continuing core development of ReiserFS is  mostly paid for by Hans Reiser from
money made selling licenses  in addition to the GPL to companies who don't want
it known that they use ReiserFS  as a foundation for their proprietary product.
And my lawyer asked 'People pay you money for this?'. Yup. Life is good. If you
buy ReiserFS, you can focus on your value add rather than reinventing an entire
FS.

Vitaly Fertman wrote  fsck for V3 and  maintains the reiserfsprogs package now.
He wrote librepair,  userspace plugins repair code, fsck for V4,  and worked on
developing libreiser4 and userspace plugins with Umka.


Guessing about desired format.. Kernel 2.6.7 is running.
Format 3.6 with standard journal
Count of blocks on the device: 244067328
Number of blocks consumed by mkreiserfs formatting process: 15660
Blocksize: 4096
Hash function used to sort names: "r5"
Journal Size 8193 blocks (first block 18)
Journal Max transaction length 1024
inode generation number: 0
UUID: 7fbea519-7d20-40dd-916c-d99136e6e347
ATTENTION: YOU SHOULD REBOOT AFTER FDISK!
	ALL DATA WILL BE LOST ON '/dev/md1'!
Continue (y/n):y
Initializing journal - 0%....20%....40%....60%....80%....100%
Syncing..ok

Tell your friends to use a kernel based on 2.4.18 or later, and especially not a
kernel based on 2.4.9, when you use reiserFS. Have fun.

ReiserFS is successfully created on /dev/md1.
bucket:/# dd if=/dev/urandom of=testfile bs=1M count=100
100+0 records in
100+0 records out
104857600 bytes transferred in 17.675100 seconds (5932504 bytes/sec)
bucket:/# mount /dev/md1 /mnt/root
bucket:/# cp testfile /mnt/root
bucket:/# umount /mnt/root
bucket:/# mount /dev/md1 /mnt/root
bucket:/# md5sum testfile /mnt/root/testfile
95454f5f81b58f89cdd2f6954d721302  testfile
3b9c4173f3c1c0b315938b5a864f411b  /mnt/root/testfile
bucket:/# exit

Script done on Tue Jul 27 14:15:21 2004


* Re: Call for RAID-6 users
  2004-07-27 18:19           ` Jim Paris
@ 2004-07-27 18:48             ` Jim Paris
  2004-07-28  3:09               ` Jim Paris
  0 siblings, 1 reply; 26+ messages in thread
From: Jim Paris @ 2004-07-27 18:48 UTC (permalink / raw)
  To: linux-raid

[-- Attachment #1: Type: text/plain, Size: 293 bytes --]

> Attached is another trace that shows this corruption on a single
> regular file copied with "cp" onto a fresh filesystem.

And here's a trace showing problems even without a filesystem.
Writing data near the end is fatal.  This does not happen if I write
to the first 1G of the array.

-jim

[-- Attachment #2: typescript --]
[-- Type: text/plain, Size: 5326 bytes --]

Script started on Tue Jul 27 14:44:26 2004
bucket:~# mdadm --stop /dev/md1
bucket:~# mdadm --zero-superblock /dev/hd[gikmo]2
bucket:~# for i in /dev/hd[gikmo]2 ; do dd if=/dev/zero of=$i bs=1M count=1 ; done
1+0 records in
1+0 records out
1048576 bytes transferred in 0.032001 seconds (32766876 bytes/sec)
1+0 records in
1+0 records out
1048576 bytes transferred in 0.032197 seconds (32567587 bytes/sec)
1+0 records in
1+0 records out
1048576 bytes transferred in 0.032125 seconds (32640483 bytes/sec)
1+0 records in
1+0 records out
1048576 bytes transferred in 0.032064 seconds (32702568 bytes/sec)
1+0 records in
1+0 records out
1048576 bytes transferred in 0.032315 seconds (32448665 bytes/sec)
bucket:~# mdadm --create /dev/md1 --level=6 --chunk=128 --raid-devices=6 missing /dev/hd[gikmo]2
mdadm: array /dev/md1 started.
bucket:~# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid5] [multipath] [raid6] 
md1 : active raid6 hdo2[5] hdm2[4] hdk2[3] hdi2[2] hdg2[1]
      976269312 blocks level 6, 128k chunk, algorithm 2 [6/5] [_UUUUU]
      
md0 : active raid1 hdo1[5] hdm1[4] hdk1[3] hdi1[2] hdg1[1]
      128384 blocks [6/5] [_UUUUU]
      
unused devices: <none>
bucket:~# dd if=/dev/urandom of=/dev/md1 bs=1M count=100 seek=900000
100+0 records in
100+0 records out
104857600 bytes transferred in 19.095634 seconds (5491182 bytes/sec)
bucket:~# mdadm --stop /dev/md1
bucket:~# mdadm --run /dev/md1
mdadm: failed to run array /dev/md1: Invalid argument
bucket:~# dmesg
md: bind<hdg2>
md: bind<hdi2>
md: bind<hdk2>
md: bind<hdm2>
md: bind<hdo2>
raid6: device hdo2 operational as raid disk 5
raid6: device hdm2 operational as raid disk 4
raid6: device hdk2 operational as raid disk 3
raid6: device hdi2 operational as raid disk 2
raid6: device hdg2 operational as raid disk 1
raid6: allocated 6269kB for md1
raid6: raid level 6 set md1 active with 5 out of 6 devices, algorithm 2
RAID6 conf printout:
 --- rd:6 wd:5 fd:1
 disk 1, o:1, dev:hdg2
 disk 2, o:1, dev:hdi2
 disk 3, o:1, dev:hdk2
 disk 4, o:1, dev:hdm2
 disk 5, o:1, dev:hdo2
md: md1 stopped.
md: unbind<hdo2>
md: export_rdev(hdo2)
md: unbind<hdm2>
md: export_rdev(hdm2)
md: unbind<hdk2>
md: export_rdev(hdk2)
md: unbind<hdi2>
md: export_rdev(hdi2)
md: unbind<hdg2>
md: export_rdev(hdg2)
md: bug in file drivers/md/md.c, line 1513

md:	**********************************
md:	* <COMPLETE RAID STATE PRINTOUT> *
md:	**********************************
md1: 
md0: <hdo1><hdm1><hdk1><hdi1><hdg1>
md: rdev hdo1, SZ:00128384 F:0 S:1 DN:5
md: rdev superblock:
md:  SB: (V:0.90.0) ID:<65daa413.7da47b48.1e1593ff.5a0a11c8> CT:410564ab
md:     L1 S00128384 ND:5 RD:6 md0 LO:0 CS:0
md:     UT:410567ad ST:1 AD:5 WD:5 FD:0 SD:0 CSUM:86d6d898 E:0000001c
     D  0:  DISK<N:0,unknown-block(0,0)(0,0),R:0,S:8>
     D  1:  DISK<N:1,hdg1(34,1),R:1,S:6>
     D  2:  DISK<N:2,hdi1(56,1),R:2,S:6>
     D  3:  DISK<N:3,hdk1(57,1),R:3,S:6>
     D  4:  DISK<N:4,hdm1(88,1),R:4,S:6>
     D  5:  DISK<N:5,hdo1(89,1),R:5,S:6>
md:     THIS:  DISK<N:5,hdo1(89,1),R:5,S:6>
md: rdev hdm1, SZ:00128384 F:0 S:1 DN:4
md: rdev superblock:
md:  SB: (V:0.90.0) ID:<65daa413.7da47b48.1e1593ff.5a0a11c8> CT:410564ab
md:     L1 S00128384 ND:5 RD:6 md0 LO:0 CS:0
md:     UT:410567ad ST:1 AD:5 WD:5 FD:0 SD:0 CSUM:86d6d895 E:0000001c
     D  0:  DISK<N:0,unknown-block(0,0)(0,0),R:0,S:8>
     D  1:  DISK<N:1,hdg1(34,1),R:1,S:6>
     D  2:  DISK<N:2,hdi1(56,1),R:2,S:6>
     D  3:  DISK<N:3,hdk1(57,1),R:3,S:6>
     D  4:  DISK<N:4,hdm1(88,1),R:4,S:6>
     D  5:  DISK<N:5,hdo1(89,1),R:5,S:6>
md:     THIS:  DISK<N:4,hdm1(88,1),R:4,S:6>
md: rdev hdk1, SZ:00128384 F:0 S:1 DN:3
md: rdev superblock:
md:  SB: (V:0.90.0) ID:<65daa413.7da47b48.1e1593ff.5a0a11c8> CT:410564ab
md:     L1 S00128384 ND:5 RD:6 md0 LO:0 CS:0
md:     UT:410567ad ST:1 AD:5 WD:5 FD:0 SD:0 CSUM:86d6d874 E:0000001c
     D  0:  DISK<N:0,unknown-block(0,0)(0,0),R:0,S:8>
     D  1:  DISK<N:1,hdg1(34,1),R:1,S:6>
     D  2:  DISK<N:2,hdi1(56,1),R:2,S:6>
     D  3:  DISK<N:3,hdk1(57,1),R:3,S:6>
     D  4:  DISK<N:4,hdm1(88,1),R:4,S:6>
     D  5:  DISK<N:5,hdo1(89,1),R:5,S:6>
md:     THIS:  DISK<N:3,hdk1(57,1),R:3,S:6>
md: rdev hdi1, SZ:00128384 F:0 S:1 DN:2
md: rdev superblock:
md:  SB: (V:0.90.0) ID:<65daa413.7da47b48.1e1593ff.5a0a11c8> CT:410564ab
md:     L1 S00128384 ND:5 RD:6 md0 LO:0 CS:0
md:     UT:410567ad ST:1 AD:5 WD:5 FD:0 SD:0 CSUM:86d6d871 E:0000001c
     D  0:  DISK<N:0,unknown-block(0,0)(0,0),R:0,S:8>
     D  1:  DISK<N:1,hdg1(34,1),R:1,S:6>
     D  2:  DISK<N:2,hdi1(56,1),R:2,S:6>
     D  3:  DISK<N:3,hdk1(57,1),R:3,S:6>
     D  4:  DISK<N:4,hdm1(88,1),R:4,S:6>
     D  5:  DISK<N:5,hdo1(89,1),R:5,S:6>
md:     THIS:  DISK<N:2,hdi1(56,1),R:2,S:6>
md: rdev hdg1, SZ:00128384 F:0 S:1 DN:1
md: rdev superblock:
md:  SB: (V:0.90.0) ID:<65daa413.7da47b48.1e1593ff.5a0a11c8> CT:410564ab
md:     L1 S00128384 ND:5 RD:6 md0 LO:0 CS:0
md:     UT:410567ad ST:1 AD:5 WD:5 FD:0 SD:0 CSUM:86d6d859 E:0000001c
     D  0:  DISK<N:0,unknown-block(0,0)(0,0),R:0,S:8>
     D  1:  DISK<N:1,hdg1(34,1),R:1,S:6>
     D  2:  DISK<N:2,hdi1(56,1),R:2,S:6>
     D  3:  DISK<N:3,hdk1(57,1),R:3,S:6>
     D  4:  DISK<N:4,hdm1(88,1),R:4,S:6>
     D  5:  DISK<N:5,hdo1(89,1),R:5,S:6>
md:     THIS:  DISK<N:1,hdg1(34,1),R:1,S:6>
md:	**********************************

bucket:~# exit

Script done on Tue Jul 27 14:44:51 2004


* Re: Call for RAID-6 users
  2004-07-27 18:48             ` Jim Paris
@ 2004-07-28  3:09               ` Jim Paris
  2004-07-28  8:36                 ` David Greaves
  0 siblings, 1 reply; 26+ messages in thread
From: Jim Paris @ 2004-07-28  3:09 UTC (permalink / raw)
  To: linux-raid

> And here's a trace showing problems even without a filesystem.
> Writing data near the end is fatal.  This does not happen if I write
> to the first 1G of the array.

Sorry, that test was bogus, and I needed to learn how to use mdadm. 
I haven't actually managed to cause corruption on a raw device with no
filesystem.  However, copying a single 200MB file onto Reiserfs will
cause corruption.  It takes a lot more work (e.g. actually copying an
installed system onto it), but XFS shows eventual corruption as well,
so it's not specific to the filesystem type.

I see no problems if I start the array with a complete set of disks;
the corruption only happens if it starts degraded (tested with both 1
and 2 disks missing, and with the missing disks being at both the
beginning and the end).  This happens on Linux 2.6.3 and 2.6.7, with
mdadm 1.5.0 and 1.4.0, with and without CONFIG_LBD.  RAID-5 works
correctly in all tested configurations.  I have tried varying the
number of disks in the array.

Interestingly, if I start it with all disks, it starts reconstructing
immediately.  If I start it with only one disk missing, it does not
reconstruct anything.  Shouldn't it be creating one of P or Q?
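
(For anyone reproducing this, a quick way to see whether md is actually
rebuilding anything; just a sketch:

# cat /proc/mdstat                          # <-- shows a resync/recovery progress line, if any
# mdadm --detail /dev/md1 | grep -i state

In the degraded-create case described above, no such line shows up.)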

-jim


* Re: Call for RAID-6 users
  2004-07-28  3:09               ` Jim Paris
@ 2004-07-28  8:36                 ` David Greaves
  2004-07-28 10:02                   ` Jim Paris
  0 siblings, 1 reply; 26+ messages in thread
From: David Greaves @ 2004-07-28  8:36 UTC (permalink / raw)
  To: linux-raid

Jim Paris wrote:

>>And here's a trace showing problems even without a filesystem.
>>Writing data near the end is fatal.  This does not happen if I write
>>to the first 1G of the array.
>
>Sorry, that test was bogus, and I needed to learn how to use mdadm. 
>I haven't actually managed to cause corruption on a raw device with no
>filesystem.  However, copying a single 200MB file onto Reiserfs will
>cause corruption.  It takes a lot more work (e.g. actually copying an
>installed system onto it), but XFS shows eventual corruption as well,
>so it's not specific to the filesystem type.
>
>I see no problems if I start the array with a complete set of disks;
>the corruption only happens if it starts degraded (tested with both 1
>and 2 disks missing, and with the missing disks being at both the
>beginning and the end).  This happens on Linux 2.6.3 and 2.6.7, with
>mdadm 1.5.0 and 1.4.0, with and without CONFIG_LBD.  RAID-5 works
>correctly in all tested configurations.  I have tried varying the
>number of disks in the array.
>
FWIW a month or so ago I used mdadm + 2.6.4 and constructed a 5x250Gb 
RAID 5 array with one drive missing.
When I added the missing drive and reconstruction had finished I had fs 
corruption.

I used the reiser tools to fix it but lost an awful lot of data.

I reported it in detail here 
[http://marc.theaimsgroup.com/?l=linux-raid&m=108687793611905&w=2] and 
got zero response <shrug>

Since then it's been fine.

I don't have much faith in it though ;)

David
PS: around that time there was a patch
[http://marc.theaimsgroup.com/?l=linux-raid&m=108635099921570&w=2]
for a bug in the RAID5 resync code.  It was only against raid5.c, but
it doesn't look RAID5-algorithm-specific... :)


* Re: Call for RAID-6 users
  2004-07-28  8:36                 ` David Greaves
@ 2004-07-28 10:02                   ` Jim Paris
  0 siblings, 0 replies; 26+ messages in thread
From: Jim Paris @ 2004-07-28 10:02 UTC (permalink / raw)
  To: David Greaves; +Cc: linux-raid

Hi David,

> FWIW a month or so ago I used mdadm + 2.6.4 and constructed a 5x250Gb 
> RAID 5 array with one drive missing.
> When I added the missing drive and reconstruction had finished I had fs 
> corruption.
> 
> I used the reiser tools to fix it but lost an awful lot of data.
> 
> I reported it in detail here 
> [http://marc.theaimsgroup.com/?l=linux-raid&m=108687793611905&w=2] and 
> got zero response <shrug>

Yeah, I saw that posting.  For me, raid5 appears to work fine,
although like you, my faith is dropping. :)  For all I know, my RAID6
problems could also exist in the very-similar RAID5 code, but just not
show up as often.

> PS: around that time there was a patch
> [http://marc.theaimsgroup.com/?l=linux-raid&m=108635099921570&w=2]
> for a bug in the RAID5 resync code.  It was only against raid5.c, but
> it doesn't look RAID5-algorithm-specific... :)

Thanks for the tip.  Unfortunately, that fix was already in the RAID6
code in 2.6.7.  Just in case, I upgraded my kernel to 2.6.8-rc2, which
includes that patch, and still have the same problem.

-jim


* Re: Call for RAID-6 users
  2004-07-26 21:38 ` Jim Paris
  2004-07-27  2:05   ` Matthew - RAID
@ 2004-07-30 15:58   ` H. Peter Anvin
  2004-07-30 19:39     ` Jim Paris
  1 sibling, 1 reply; 26+ messages in thread
From: H. Peter Anvin @ 2004-07-30 15:58 UTC (permalink / raw)
  To: linux-raid

Followup to:  <20040726213811.GA17363@jim.sh>
By author:    Jim Paris <jim@jtan.com>
In newsgroup: linux.dev.raid
>
> > Thus, if you have used RAID-6 and have good or bad experiences, I'd
> > like to know them as soon as possible.
> 
> Just tried setting up a RAID-6 on a new server, and I'm seeing
> complete filesystem corruption.
> 
> I have 6 250GB disks, and want them all in the array.  I've created it
> degraded, with the first disk missing, since that disk temporarily
> holds the system.
> 
> Using kernel 2.6.7, mdadm 1.6.0, I did something like this:
> 
> # mdadm --create /dev/md1 --level=6 --chunk=128 --raid-devices=6 missing /dev/hd{g,i,k,m,o}2
> 
> which gives me:
> 
> md1 : active raid6 hdo2[5] hdm2[4] hdk2[3] hdi2[2] hdg2[1]
>       976269312 blocks level 6, 128k chunk, algorithm 2 [6/5] [_UUUUU]
> 

Okay, found the messages...

Can you create failures by creating a full array and then failing out
drives?  That would rule out problems with the way mdadm creates the
array.

**** When the array is just created, it's not synchronized!!! ****

Thus, when the array is first created it needs to finish synchronizing
before it's usable.  My current guess based on what I've seen so far
is that it's a bug in mdadm in creating arrays with exactly 1 missing
drive, as opposed to a kernel bug.
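
As a rough sketch (assuming /dev/md1 as in the earlier messages), the
initial sync can be checked and waited for like this:

# cat /proc/mdstat        # <-- a resync/recovery line means the initial sync is still running
# while grep -Eq 'resync|recovery' /proc/mdstat ; do sleep 30 ; done

Only after that finishes should a filesystem be created and used on it.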

	-hpa


* Re: Call for RAID-6 users
  2004-07-30 15:58   ` H. Peter Anvin
@ 2004-07-30 19:39     ` Jim Paris
  2004-07-30 19:45       ` H. Peter Anvin
  0 siblings, 1 reply; 26+ messages in thread
From: Jim Paris @ 2004-07-30 19:39 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 833 bytes --]

> I haven't seen any of those messages, so this is the first case
> happening.

I figured you were just busy, but wanted to see if anyone else could
guide my debugging before my boss made me give up and do RAID-5 :)
Thanks for the reply.

> Can you create failures by creating a full array and then failing out
> drives?  That would rule out problems with the way mdadm creates the
> array.

Yes, same problem.  If I create a full array with 6 devices, wait for
it to finish synchronizing, then fail the first drive, I see the
same corruption.  See attached r6test-full.sh to demonstrate.
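
Roughly, the steps are (this is just a sketch; the attached script differs
in details, and hdX2 here stands in for the sixth partition):

# mdadm --create /dev/md1 --level=6 --chunk=128 --raid-devices=6 /dev/hdX2 /dev/hd[gikmo]2
# while grep -Eq 'resync|recovery' /proc/mdstat ; do sleep 30 ; done
# mdadm /dev/md1 --fail /dev/hdX2 --remove /dev/hdX2
# mkreiserfs /dev/md1                          # <-- answer 'y' at the prompt
# dd if=/dev/urandom of=/root/testfile bs=1M count=200
# mount /dev/md1 /mnt/root
# cp /root/testfile /mnt/root
# umount /mnt/root ; mount /dev/md1 /mnt/root
# md5sum /root/testfile /mnt/root/testfile     # <-- the sums differ when the bug hits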

> My current guess based on what I've seen so far is that it's a bug
> in mdadm in creating arrays with exactly 1 missing drive, as opposed
> to a kernel bug.

FWIW, this does occur with an array created with 2 missing drives, as
well.

-jim

[-- Attachment #2: r6test-full.sh --]
[-- Type: application/x-sh, Size: 1039 bytes --]


* Re: Call for RAID-6 users
  2004-07-30 19:39     ` Jim Paris
@ 2004-07-30 19:45       ` H. Peter Anvin
  0 siblings, 0 replies; 26+ messages in thread
From: H. Peter Anvin @ 2004-07-30 19:45 UTC (permalink / raw)
  To: Jim Paris; +Cc: linux-raid

Jim Paris wrote:
> 
> Yes, same problem.  If I create a full array with 6 devices, wait for
> it to finish synchronizing, then fail the first drive, I see the
> same corruption.  See attached r6test-full.sh to demonstrate.
> 

Okay, I will look at this next week (I'm travelling and about to step 
onto a plane...)

	-hpa


* Re: Call for RAID-6 users
  2004-07-23 23:32 Call for RAID-6 users H. Peter Anvin
  2004-07-26 21:38 ` Jim Paris
@ 2004-07-30 21:11 ` maarten van den Berg
  2004-07-30 21:38   ` maarten van den Berg
  2004-08-05 23:46   ` H. Peter Anvin
  1 sibling, 2 replies; 26+ messages in thread
From: maarten van den Berg @ 2004-07-30 21:11 UTC (permalink / raw)
  To: linux-raid

On Saturday 24 July 2004 01:32, H. Peter Anvin wrote:
> I'm considering removing the "experimental" label from RAID-6.  It
> appears at this point to be just as stable as RAID-5 (since it's based
> on the RAID-5 code, that's obviously all that can be expected.)

This encouraged me to try it today...

> Thus, if you have used RAID-6 and have good or bad experiences, I'd
> like to know them as soon as possible.

I'm still early in the testing phase, so nothing to report as yet.  
But I have a question:  I tried to reproduce a reported issue when creating a 
degraded raid6 array.  But when I created a raid6 array with one disk 
missing, /proc/mdstat reported no resync going on.  Am I not correct in 
assuming that raid6 with 1 missing drive should at least start resyncing the 
other drive(s) ?  It would only be really degraded with two missing drives...

So instead, I defined a full raid6 array which it is now resyncing...
My resync speed is rather slow (6000K/sec). I'll have to compare it to 
resyncing a raid5 array though before concluding anything from that.  Cause 
this system is somewhat CPU challenged indeed: a lowly celeron 500.

I will try to run some script(s) provided on this list to see if I can 
reproduce anything.

System info:

SuSE 9.1 from DVD media, 
	(with all updates installed _PRIOR_ to creating the array)
Kernel 2.6.5-7.95
mdadm - v1.5.0 - 22 Jan 2004

Harddisks and/or controllers:
one 160 GB ATA off the onboard controller (hda)
two 160GB SATA off a promise 150TX2 (as sda and sdb)
two 160 GB SATA off a SiI 3112 controller (as hde and hdg)

Maarten

-- 
When I answered where I wanted to go today, they just hung up -- Unknown



* Re: Call for RAID-6 users
  2004-07-30 21:11 ` maarten van den Berg
@ 2004-07-30 21:38   ` maarten van den Berg
  2004-07-31  0:28     ` maarten van den Berg
  2004-08-05 23:51     ` H. Peter Anvin
  2004-08-05 23:46   ` H. Peter Anvin
  1 sibling, 2 replies; 26+ messages in thread
From: maarten van den Berg @ 2004-07-30 21:38 UTC (permalink / raw)
  To: linux-raid

On Friday 30 July 2004 23:11, maarten van den Berg wrote:
> On Saturday 24 July 2004 01:32, H. Peter Anvin wrote:

> I'm still early in the testing phase, so nothing to report as yet.
> But I have a question:  I tried to reproduce a reported issue when creating
> a degraded raid6 array.  But when I created a raid6 array with one disk
> missing, /proc/mdstat reported no resync going on.  Am I not correct in
> assuming that raid6 with 1 missing drive should at least start resyncing
> the other drive(s) ?  It would only be really degraded with two missing
> drives...
>
> So instead, I defined a full raid6 array which it is now resyncing...
> My resync speed is rather slow (6000K/sec). I'll have to compare it to
> resyncing a raid5 array though before concluding anything from that.  Cause
> this system is somewhat CPU challenged indeed: a lowly celeron 500.

To confirm, after stopping the raid6 array (didn't want to wait this long) I 
created a raid5 array on the same machine and it resyncs at 14000K/sec.
Is this expected behaviour, the 6M/sec for raid6 vs 14M/sec for raid5 ?
I suppose raid6 has to sync two drives, which would maybe explain the speed 
difference(?)   In any case, hdparm -tT reports 50M/sec on each single drive.
Is this discrepancy in speed normal ?
(yes yes, I played with the /proc/sys/dev/raid/ speed settings (to no avail))
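
(For the record, the knobs under /proc/sys/dev/raid/ are these two:

# cat /proc/sys/dev/raid/speed_limit_min /proc/sys/dev/raid/speed_limit_max
# echo 25000 > /proc/sys/dev/raid/speed_limit_min   # <-- raise the floor, in KB/sec

Raising speed_limit_min makes md resync faster even when the array is
otherwise busy, but it won't help if the box is CPU-bound.)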

Maarten

-- 
When I answered where I wanted to go today, they just hung up -- Unknown



* Re: Call for RAID-6 users
  2004-07-30 21:38   ` maarten van den Berg
@ 2004-07-31  0:28     ` maarten van den Berg
  2004-08-01 13:03       ` Kernel panic, FS corruption Was: " maarten van den Berg
  2004-08-05 23:51     ` H. Peter Anvin
  1 sibling, 1 reply; 26+ messages in thread
From: maarten van den Berg @ 2004-07-31  0:28 UTC (permalink / raw)
  To: linux-raid

On Friday 30 July 2004 23:38, maarten van den Berg wrote:
> On Friday 30 July 2004 23:11, maarten van den Berg wrote:
> > On Saturday 24 July 2004 01:32, H. Peter Anvin wrote:

Again replying to myself.   I have a full report now.

Realizing this all took way too much time, I started from scratch: I made
multiple small partitions (2GB), put a raid6 array on one set and a
raid5 array on the other.  Both are full arrays; no missing drives.  I used
reiserfs on both.  Hard- and software specs as before, back in the thread.

I tested it by copying trees from / to the respective raid arrays and running 
md5sum on the source and the copies (and repeating after reboots).
Then I went and disconnected SATA cables to get them degraded.  The first
cable went perfectly: both arrays came up fine, an md5sum on the available
files checked out, and a new copy + md5sum on that went fine too.
The second cable, however, went wrong: I inadvertently moved a third cable,
so I was left with three missing devices; let's skip over that.  When I
reattached that cable, the md1 raid6 device was still fine, with two failed
drives.  I did the <copy new stuff, run md5sum over it> thing again.

Then I reattached all cables. I did verify the md5sums before refilling the 
raid6 array using mdadm -a, and did that afterwards too.  To my astonishment, 
the raid5 array was back up again.  I thought raid5 with two drives missing 
was deactivated, but obviously things have changed now and a missing drive
no longer equals a failed drive, I presume.
/proc/mdstat just after booting looked like this:

Personalities : [raid1] [raid5] [raid6]
md1 : active raid6 hdg3[2] hda3[0] sda3[3]
      5879424 blocks level 6, 64k chunk, algorithm 2 [5/3] [U_UU_]

md2 : active raid5 hdg4[2] hde4[1] hda4[0] sda4[3]
      7839232 blocks level 5, 64k chunk, algorithm 2 [5/4] [UUUU_]

md0 : active raid1 sda1[1] hda1[0]
      1574272 blocks [3/2] [UU_]

The md5sums after hotadding were the same as before and verified fine.
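
(A check like this can be scripted with a manifest; a rough sketch, with
/mnt/md1 standing in for wherever the array is mounted:

# cd /usr ; find . -xdev -type f -print0 | xargs -0 md5sum > /root/usr.md5
# cp -a /usr /mnt/md1/
# cd /mnt/md1/usr ; md5sum -c /root/usr.md5 | grep -v ': OK$'

If the last command prints nothing, every copied file matches its source.)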

Now, seeing as the <disconnect cable> trick doesn't mark a drive failed, should
I repeat the tests with marking a drive failed, either by doing that through
mdadm or maybe by pulling the cable while the system is up?  Cause I'm not
totally convinced now that the array got marked degraded.  I could mount it
with two drives missing [raid6], but the fact that the raid5 device didn't get
broken puzzles me a bit...

Oh well, since I'm just experimenting I'll take the plunge anyway and pull a 
live cable now:
...
Well, the first thing to observe is that the system becomes unresponsive 
immediately.  New logins don't spawn, and /var/log/messages says this:
	kernel: ATA: abnormal status 0x7F on port 0xD481521C
Now even the keyboard doesn't respond anymore...  reset-button !

Upon reboot, mdadm --detail reports the missing disk as "removed", not failed.
But maybe that is the same(?).  Rebooting again after reattaching the cable, 
this time the arrays stayed degraded.  I ran the ubiquitous md5sums but found
nothing wrong, both before hot-adding the missing drives and after.

So, at least in my experience raid6 works fine.  Also, the problems reported
with SuSE 9.1 could not be observed (probably because the kernel was updated).
Moreover, it seems the underlying SATA is stable [with these cards], which
I'm very glad to see, having read some of the stories...

More version-info etcetera upon request.

Maarten

P.S.: My resync speed stays this low.  Anything that can be done...? 

-- 
When I answered where I wanted to go today, they just hung up -- Unknown



* Kernel panic, FS corruption  Was: Re: Call for RAID-6 users
  2004-07-31  0:28     ` maarten van den Berg
@ 2004-08-01 13:03       ` maarten van den Berg
  2004-08-01 18:05         ` Jim Paris
  0 siblings, 1 reply; 26+ messages in thread
From: maarten van den Berg @ 2004-08-01 13:03 UTC (permalink / raw)
  To: linux-raid

On Saturday 31 July 2004 02:28, maarten van den Berg wrote:
> On Friday 30 July 2004 23:38, maarten van den Berg wrote:
> > On Friday 30 July 2004 23:11, maarten van den Berg wrote:
> > > On Saturday 24 July 2004 01:32, H. Peter Anvin wrote:


I eventually got a kernel panic when copying large amounts of data to a 
[degraded] raid6 array, which this time was the full 600 GB size.
Don't know if it is helpful to anyone but info below:

Message from syslogd@agent2 at Sun Aug  1 08:59:28 2004 ...
agent2 kernel: REISERFS: panic (device Null superblock): vs-6025: 
check_internal_block_head: invalid level level=58989, nr_items=6145, 
free_space=39964 rdkey
 
Umount didn't work, neither did shutdown. After reset I have FS corruption, 
according to reiserfsck:

agent2:~ # cat /proc/mdstat
Personalities : [raid1] [raid6]
md1 : active raid6 hdg3[3] hde3[2] hda3[0] sda3[4] sdb3[5]
      618437888 blocks level 6, 64k chunk, algorithm 2 [6/5] [U_UUUU]

md0 : active raid1 sdb1[2] sda1[3] hda1[0] hde1[1] hdg1[4]
      1574272 blocks [3/3] [UUU]

unused devices: <none>
agent2:~ # reiserfsck /dev/md1
reiserfsck 3.6.13 (2003 www.namesys.com)

*************************************************************
** If you are using the latest reiserfsprogs and  it fails **
** please  email bug reports to reiserfs-list@namesys.com, **
** providing  as  much  information  as  possible --  your **
** hardware,  kernel,  patches,  settings,  all reiserfsck **
** messages  (including version),  the reiserfsck logfile, **
** check  the  syslog file  for  any  related information. **
** If you would like advice on using this program, support **
** is available  for $25 at  www.namesys.com/support.html. **
*************************************************************

Will read-only check consistency of the filesystem on /dev/md1
Will put log info to 'stdout'

Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes
###########
reiserfsck --check started at Sun Aug  1 14:45:08 2004
###########
Replaying journal..
Trans replayed: mountid 10, transid 2171, desc 5755, len 30, commit 5786, next 
trans offset 5769
Trans replayed: mountid 10, transid 2172, desc 5787, len 14, commit 5802, next 
trans offset 5785
Trans replayed: mountid 10, transid 2173, desc 5803, len 23, commit 5827, next 
trans offset 5810
Trans replayed: mountid 10, transid 2174, desc 5828, len 27, commit 5856, next 
trans offset 5839
Trans replayed: mountid 10, transid 2175, desc 5857, len 25, commit 5883, next 
trans offset 5866
Trans replayed: mountid 10, transid 2176, desc 5884, len 27, commit 5912, next 
trans offset 5895
Trans replayed: mountid 10, transid 2177, desc 5913, len 26, commit 5940, next 
trans offset 5923
Trans replayed: mountid 10, transid 2178, desc 5941, len 24, commit 5966, next 
trans offset 5949
Reiserfs journal '/dev/md1' in blocks [18..8211]: 8 transactions replayed
Checking internal tree../  1 (of   2)/  3 (of 128)/ 12 (of 170)block 67043329: 
The level of the node (65534) is not correct, (1) expected
 the problem in the internal node occured (67043329), whole subtree is skipped
/ 14 (of 128)/105 (of 133)block 139100161: The level of the node (65534) is 
not correct, (1) expected
 the problem in the internal node occured (139100161), whole subtree is 
skipped
/ 15 (of 128)/ 23 (of 170)block 5701633: The level of the node (44292) is not 
correct, (1) expected
 the problem in the internal node occured (5701633), whole subtree is skipped
/ 16 (of 128)/ 80 (of 170)block 109215745: The level of the node (65534) is 
not correct, (1) expected

[snip much more of the same...]

 the problem in the internal node occured (4718593), whole subtree is skipped
/120 (of 133)/ 47 (of 170)block 59801637: The level of the node (65534) is not 
correct, (1) expected
 the problem in the internal node occured (59801637), whole subtree is skipped
/123 (of 133)/ 72 (of 169)block 126386304: The level of the node (4828) is not 
correct, (1) expected
 the problem in the internal node occured (126386304), whole subtree is 
skipped
/124 (of 133)block 126386316: The level of the node (58989) is not correct, 
(2) expected
 the problem in the internal node occured (126386316), whole subtree is 
skipped
finished
Comparing bitmaps..vpf-10640: The on-disk and the correct bitmaps differs.
Bad nodes were found, Semantic pass skipped
92 found corruptions can be fixed only when running with --rebuild-tree
###########
reiserfsck finished at Sun Aug  1 14:47:17 2004
###########


Hours before the kernel panic, during a copy, I see tons of this in syslog:

Aug  1 04:15:54 agent2 kernel: ReiserFS: warning: is_tree_node: node level 
65534 does not match to the expected o
ne 1
Aug  1 04:15:54 agent2 kernel: ReiserFS: md1: warning: vs-5150: search_by_key: 
invalid format found in block 6704
3329. Fsck?
Aug  1 04:15:54 agent2 kernel: ReiserFS: md1: warning: vs-13070: 
reiserfs_read_locked_inode: i/o failure occurred
 trying to find stat data of [130 132 0x0 SD]
Aug  1 04:15:54 agent2 kernel: ReiserFS: warning: is_tree_node: node level 
65534 does not match to the expected o
ne 1
Aug  1 04:15:54 agent2 kernel: ReiserFS: md1: warning: vs-5150: search_by_key: 
invalid format found in block 6704
3329. Fsck?
Aug  1 04:15:54 agent2 kernel: ReiserFS: md1: warning: vs-13070: 
reiserfs_read_locked_inode: i/o failure occurred
 trying to find stat data of [130 132 0x0 SD]
Aug  1 04:15:54 agent2 kernel: ReiserFS: warning: is_tree_node: node level 
65534 does not match to the expected o
ne 1
Aug  1 04:15:54 agent2 kernel: ReiserFS: md1: warning: vs-5150: search_by_key: 
invalid format found in block 6704
3329. Fsck?
Aug  1 04:15:54 agent2 kernel: ReiserFS: md1: warning: vs-13070: 
reiserfs_read_locked_inode: i/o failure occurred
 trying to find stat data of [130 132 0x0 SD]
Aug  1 04:15:54 agent2 kernel: ReiserFS: warning: is_tree_node: node level 
65534 does not match to the expected o
ne 1
Aug  1 04:15:54 agent2 kernel: ReiserFS: md1: warning: vs-5150: search_by_key: 
invalid format found in block 6704
3329. Fsck?
Aug  1 04:15:54 agent2 kernel: ReiserFS: md1: warning: vs-13070: 
reiserfs_read_locked_inode: i/o failure occurred
 trying to find stat data of [130 132 0x0 SD]

This lasted about a minute (last entry dated Aug  1 04:16:46) but logged
thousands of lines in that time.  Then syslog is quiet again until the kernel
panic occurs:

Aug  1 08:49:55 agent2 -- MARK --
Aug  1 08:59:00 agent2 /USR/SBIN/CRON[8553]: (root) CMD ( rm -f /var/spool/
cron/lastrun/cron.hourly)
Aug  1 08:59:28 agent2 kernel: REISERFS: panic (device Null superblock): 
vs-6025: check_internal_block_head: inva
lid level level=58989, nr_items=6145, free_space=39964 rdkey
Aug  1 08:59:28 agent2 kernel: ------------[ cut here ]------------
Aug  1 08:59:28 agent2 kernel: kernel BUG at fs/reiserfs/prints.c:362!
Aug  1 08:59:28 agent2 kernel: invalid operand: 0000 [#1]
Aug  1 08:59:28 agent2 kernel: CPU:    0
Aug  1 08:59:28 agent2 kernel: EIP:    0060:[__crc_ide_end_request
+942296/1608427]    Not tainted
Aug  1 08:59:28 agent2 kernel: EIP:    0060:[<d48ad7c1>]    Not tainted
Aug  1 08:59:28 agent2 kernel: EFLAGS: 00010286   (2.6.5-7.95-default)
Aug  1 08:59:28 agent2 kernel: EIP is at reiserfs_panic+0x31/0x60 [reiserfs]
Aug  1 08:59:28 agent2 kernel: eax: 00000093   ebx: 00000000   ecx: 00000002   
edx: d2181f38
Aug  1 08:59:28 agent2 kernel: esi: d255b000   edi: ccd43d48   ebp: 0000002a   
esp: c3415898
Aug  1 08:59:28 agent2 kernel: ds: 007b   es: 007b   ss: 0068
Aug  1 08:59:28 agent2 kernel: Process cp (pid: 8456, threadinfo=c3414000 
task=d18f4700)
Aug  1 08:59:29 agent2 kernel: Stack: d48c5a0c d48c34fe d48d1520 000003f0 
d48ad85a 00000000 d48c5a54 ccd43d48
Aug  1 08:59:29 agent2 kernel:        000003f0 c3415924 d255b2a8 d48b161e 
d255b000 c4cb9800 00000000 000017d8
Aug  1 08:59:29 agent2 kernel:        ccd43d48 d0a7fa3c 00000000 00000001 
c3415914 c3415924 d0a7fa3c 00000001
Aug  1 08:59:29 agent2 kernel: Call Trace:
Aug  1 08:59:29 agent2 kernel:  [__crc_ide_end_request+942449/1608427] 
check_internal+0x6a/0x80 [reiserfs]
Aug  1 08:59:29 agent2 kernel:  [<d48ad85a>] check_internal+0x6a/0x80 
[reiserfs]
Aug  1 08:59:29 agent2 kernel:  [__crc_ide_end_request+958261/1608427] 
internal_move_pointers_items+0x1be/0x2c0 [
reiserfs]
Aug  1 08:59:29 agent2 kernel:  [<d48b161e>] internal_move_pointers_items
+0x1be/0x2c0 [reiserfs]
Aug  1 08:59:29 agent2 kernel:  [__crc_ide_end_request+958904/1608427] 
internal_shift_right+0xb1/0xd0 [reiserfs]
Aug  1 08:59:29 agent2 kernel:  [<d48b18a1>] internal_shift_right+0xb1/0xd0 
[reiserfs]
Aug  1 08:59:29 agent2 kernel:  [__crc_ide_end_request+959947/1608427] 
balance_internal+0x174/0xae0 [reiserfs]
Aug  1 08:59:29 agent2 kernel:  [<d48b1cb4>] balance_internal+0x174/0xae0 
[reiserfs]
Aug  1 08:59:29 agent2 kernel:  [__crc_ide_end_request+424174/1608427] 
ata_qc_issue+0xf7/0x2a0 [libata]
Aug  1 08:59:29 agent2 kernel:  [<d482efd7>] ata_qc_issue+0xf7/0x2a0 [libata]
Aug  1 08:59:29 agent2 kernel:  [__crc_ide_end_request+985323/1608427] 
get_cnode+0x14/0x70 [reiserfs]
Aug  1 08:59:29 agent2 kernel:  [<d48b7fd4>] get_cnode+0x14/0x70 [reiserfs]
Aug  1 08:59:29 agent2 kernel:  [__crc_ide_end_request+991353/1608427] 
journal_mark_dirty+0x102/0x230 [reiserfs]
Aug  1 08:59:29 agent2 kernel:  [<d48b9762>] journal_mark_dirty+0x102/0x230 
[reiserfs]
Aug  1 08:59:29 agent2 kernel:  [__crc_ide_end_request+950897/1608427] 
leaf_delete_items_entirely+0x15a/0x200 [re
iserfs]
Aug  1 08:59:29 agent2 kernel:  [<d48af95a>] leaf_delete_items_entirely
+0x15a/0x200 [reiserfs]
Aug  1 08:59:29 agent2 kernel:  [__crc_ide_end_request+950259/1608427] 
leaf_paste_in_buffer+0x1fc/0x320 [reiserfs]
Aug  1 08:59:29 agent2 kernel:  [<d48af6dc>] leaf_paste_in_buffer+0x1fc/0x320 
[reiserfs]
Aug  1 08:59:29 agent2 kernel:  [__crc_ide_end_request+859729/1608427] 
do_balance+0x78a/0x3160 [reiserfs]
Aug  1 08:59:29 agent2 kernel:  [<d489953a>] do_balance+0x78a/0x3160 
[reiserfs]
Aug  1 08:59:29 agent2 kernel:  [autoremove_wake_function+0/48] 
autoremove_wake_function+0x0/0x30
Aug  1 08:59:29 agent2 kernel:  [<c011f1c0>] autoremove_wake_function+0x0/0x30
Aug  1 08:59:29 agent2 kernel:  [submit_bh+393/544] submit_bh+0x189/0x220
Aug  1 08:59:29 agent2 kernel:  [<c0159f49>] submit_bh+0x189/0x220
Aug  1 08:59:29 agent2 kernel:  [__bread+81/160] __bread+0x51/0xa0
Aug  1 08:59:29 agent2 kernel:  [<c015d221>] __bread+0x51/0xa0
Aug  1 08:59:29 agent2 kernel:  [__crc_ide_end_request+921709/1608427] 
get_neighbors+0xe6/0x140 [reiserfs]
Aug  1 08:59:29 agent2 kernel:  [<d48a8756>] get_neighbors+0xe6/0x140 
[reiserfs]
Aug  1 08:59:29 agent2 kernel:  [__crc_ide_end_request+921750/1608427] 
get_neighbors+0x10f/0x140 [reiserfs]
Aug  1 08:59:29 agent2 kernel:  [<d48a877f>] get_neighbors+0x10f/0x140 
[reiserfs]
Aug  1 08:59:29 agent2 kernel:  [wake_up_buffer+5/32] wake_up_buffer+0x5/0x20
Aug  1 08:59:29 agent2 kernel:  [<c015b2d5>] wake_up_buffer+0x5/0x20
Aug  1 08:59:29 agent2 kernel:  [__crc_ide_end_request+986558/1608427] 
reiserfs_prepare_for_journal+0x47/0x70 [re
iserfs]
Aug  1 08:59:29 agent2 kernel:  [<d48b84a7>] reiserfs_prepare_for_journal
+0x47/0x70 [reiserfs]
Aug  1 08:59:29 agent2 kernel:  [__crc_ide_end_request+924363/1608427] 
fix_nodes+0x884/0x1ba0 [reiserfs]
Aug  1 08:59:29 agent2 kernel:  [<d48a91b4>] fix_nodes+0x884/0x1ba0 [reiserfs]
Aug  1 08:59:29 agent2 kernel:  [__crc_ide_end_request+975120/1608427] 
reiserfs_paste_into_item+0x1d9/0x220 [reis
erfs]
Aug  1 08:59:29 agent2 kernel:  [<d48b57f9>] reiserfs_paste_into_item
+0x1d9/0x220 [reiserfs]
Aug  1 08:59:29 agent2 kernel:  [__crc_ide_end_request+874042/1608427] 
reiserfs_add_entry+0x293/0x430 [reiserfs]
Aug  1 08:59:29 agent2 kernel:  [<d489cd23>] reiserfs_add_entry+0x293/0x430 
[reiserfs]
Aug  1 08:59:29 agent2 kernel:  [__crc_ide_end_request+878853/1608427] 
reiserfs_create+0x11e/0x1e0 [reiserfs]
Aug  1 08:59:29 agent2 kernel:  [<d489dfee>] reiserfs_create+0x11e/0x1e0 
[reiserfs]
Aug  1 08:59:29 agent2 kernel:  [__crc_ide_end_request+1016040/1608427] 
reiserfs_permission+0x1/0x10 [reiserfs]
Aug  1 08:59:29 agent2 kernel:  [<d48bf7d1>] reiserfs_permission+0x1/0x10 
[reiserfs]
Aug  1 08:59:29 agent2 kernel:  [__crc_ide_end_request+1016046/1608427] 
reiserfs_permission+0x7/0x10 [reiserfs]
Aug  1 08:59:29 agent2 kernel:  [<d48bf7d7>] reiserfs_permission+0x7/0x10 
[reiserfs]
Aug  1 08:59:29 agent2 kernel:  [vfs_create+153/304] vfs_create+0x99/0x130
Aug  1 08:59:29 agent2 kernel:  [<c01656f9>] vfs_create+0x99/0x130
Aug  1 08:59:29 agent2 kernel:  [open_namei+830/1072] open_namei+0x33e/0x430
Aug  1 08:59:29 agent2 kernel:  [<c016772e>] open_namei+0x33e/0x430
Aug  1 08:59:29 agent2 kernel:  [filp_open+78/128] filp_open+0x4e/0x80
Aug  1 08:59:29 agent2 kernel:  [<c0155b8e>] filp_open+0x4e/0x80
Aug  1 08:59:29 agent2 kernel:  [sys_open+131/208] sys_open+0x83/0xd0
Aug  1 08:59:29 agent2 kernel:  [<c0155c43>] sys_open+0x83/0xd0
Aug  1 08:59:29 agent2 kernel:  [sysenter_past_esp+82/121] sysenter_past_esp
+0x52/0x79
Aug  1 08:59:29 agent2 kernel:  [<c0107dc9>] sysenter_past_esp+0x52/0x79
Aug  1 08:59:29 agent2 kernel:
Aug  1 08:59:29 agent2 kernel: Code: 0f 0b 6a 01 0e 35 8c d4 b8 fe 34 8c d4 83 
c4 0c 85 db 74 06
Aug  1 09:09:55 agent2 -- MARK --
Aug  1 09:29:55 agent2 -- MARK --


Maarten


-- 
When I answered where I wanted to go today, they just hung up -- Unknown



* Re: Kernel panic, FS corruption  Was: Re: Call for RAID-6 users
  2004-08-01 13:03       ` Kernel panic, FS corruption Was: " maarten van den Berg
@ 2004-08-01 18:05         ` Jim Paris
  2004-08-01 22:10           ` maarten van den Berg
  2004-08-05 23:54           ` H. Peter Anvin
  0 siblings, 2 replies; 26+ messages in thread
From: Jim Paris @ 2004-08-01 18:05 UTC (permalink / raw)
  To: maarten van den Berg; +Cc: linux-raid

> I eventually got a kernel panic when copying large amounts of data to a 
> [degraded] raid6 array, which this time was the full 600 GB size.
> Don't know if it is helpful to anyone but info below:

The panic is from reiserfs, and it's occurring because the FS is
getting corrupted due to the raid6 problems.

-jim


* Re: Kernel panic, FS corruption  Was: Re: Call for RAID-6 users
  2004-08-01 18:05         ` Jim Paris
@ 2004-08-01 22:10           ` maarten van den Berg
  2004-08-05 23:54           ` H. Peter Anvin
  1 sibling, 0 replies; 26+ messages in thread
From: maarten van den Berg @ 2004-08-01 22:10 UTC (permalink / raw)
  To: linux-raid

On Sunday 01 August 2004 20:05, you wrote:
> > I eventually got a kernel panic when copying large amounts of data to a
> > [degraded] raid6 array, which this time was the full 600 GB size.
> > Don't know if it is helpful to anyone but info below:
>
> The panic is from reiserfs, and it's occurring because the FS is
> getting corrupted due to the raid6 problems.

Ok.  Thanks.  I expected as much.  I will now try to make a raid 5 array 
instead, and make double sure I do not suffer the same fate.
I am cautious because I do not want to be bitten by a bug in this SuSE kernel 
or a bug in one of the SATA drivers. 

A pity, though, that this was not discovered when testing with the smallish
arrays I ran yesterday.  It seems to take a lot of beating to reproduce.

Maarten

> -jim

-- 
When I answered where I wanted to go today, they just hung up -- Unknown



* Re: Call for RAID-6 users
  2004-07-30 21:11 ` maarten van den Berg
  2004-07-30 21:38   ` maarten van den Berg
@ 2004-08-05 23:46   ` H. Peter Anvin
  1 sibling, 0 replies; 26+ messages in thread
From: H. Peter Anvin @ 2004-08-05 23:46 UTC (permalink / raw)
  To: linux-raid

Followup to:  <200407302311.04942.maarten@ultratux.net>
By author:    maarten van den Berg <maarten@ultratux.net>
In newsgroup: linux.dev.raid
> 
> I'm still early in the testing phase, so nothing to report as yet.  
> But I have a question:  I tried to reproduce a reported issue when creating a 
> degraded raid6 array.  But when I created a raid6 array with one disk 
> missing, /proc/mdstat reported no resync going on.  Am I not correct in 
> assuming that raid6 with 1 missing drive should at least start resyncing the 
> other drive(s) ?  It would only be really degraded with two missing drives...
> 

This is correct: when an array is first created it needs a resync, and
with fewer than two drives missing that resync should happen.

> So instead, I defined a full raid6 array which it is now resyncing...
> My resync speed is rather slow (6000K/sec). I'll have to compare it to 
> resyncing a raid5 array though before concluding anything from that.  Cause 
> this system is somewhat CPU challenged indeed: a lowly celeron 500.

The RAID-6 computations on that system will be quite slow indeed.  At
least you have MMX.

	-hpa


* Re: Call for RAID-6 users
  2004-07-30 21:38   ` maarten van den Berg
  2004-07-31  0:28     ` maarten van den Berg
@ 2004-08-05 23:51     ` H. Peter Anvin
  1 sibling, 0 replies; 26+ messages in thread
From: H. Peter Anvin @ 2004-08-05 23:51 UTC (permalink / raw)
  To: linux-raid

Followup to:  <200407302338.33823.maarten@ultratux.net>
By author:    maarten van den Berg <maarten@ultratux.net>
In newsgroup: linux.dev.raid
>
> On Friday 30 July 2004 23:11, maarten van den Berg wrote:
> > On Saturday 24 July 2004 01:32, H. Peter Anvin wrote:
> 
> > I'm still early in the testing phase, so nothing to report as yet.
> > But I have a question:  I tried to reproduce a reported issue when creating
> > a degraded raid6 array.  But when I created a raid6 array with one disk
> > missing, /proc/mdstat reported no resync going on.  Am I not correct in
> > assuming that raid6 with 1 missing drive should at least start resyncing
> > the other drive(s) ?  It would only be really degraded with two missing
> > drives...
> >
> > So instead, I defined a full raid6 array which it is now resyncing...
> > My resync speed is rather slow (6000K/sec). I'll have to compare it to
> > resyncing a raid5 array though before concluding anything from that.  Cause
> > this system is somewhat CPU challenged indeed: a lowly celeron 500.
> 
> To confirm, after stopping the raid6 array (didn't want to wait this long) I 
> created a raid5 array on the same machine and it resyncs at 14000K/sec.
> Is this expected behaviour, the 6M/sec for raid6 vs 14M/sec for raid5 ?
> I suppose raid6 has to sync two drives, which would maybe explain the speed 
> difference(?)   In any case, hdparm -tT reports 50M/sec on each single drive.
> Is this discrepancy in speed normal ?
> (yes yes, I played with the /proc/sys/dev/raid/ speed settings (to no avail))
> 

A newly created RAID-5 array uses a special trick to do the initial
sync faster.  Unfortunately that trick is not possible for RAID-6.

	-hpa


* Re: Kernel panic, FS corruption  Was: Re: Call for RAID-6 users
  2004-08-01 18:05         ` Jim Paris
  2004-08-01 22:10           ` maarten van den Berg
@ 2004-08-05 23:54           ` H. Peter Anvin
  2004-08-06  0:19             ` Jim Paris
  1 sibling, 1 reply; 26+ messages in thread
From: H. Peter Anvin @ 2004-08-05 23:54 UTC (permalink / raw)
  To: linux-raid

Followup to:  <20040801180536.GA3897@jim.sh>
By author:    Jim Paris <jim@jtan.com>
In newsgroup: linux.dev.raid
>
> > I eventually got a kernel panic when copying large amounts of data to a 
> > [degraded] raid6 array, which this time was the full 600 GB size.
> > Don't know if it is helpful to anyone but info below:
> 
> The panic is from reiserfs, and it's occurring because the FS is
> getting corrupted due to the raid6 problems.
> 

It's still very odd to me that so far the only thing that triggers
this kind of problem is reiserfs.  Either reiserfs just has a really
odd series of access patterns, or it is relying on behaviour which
isn't actually guaranteed.  I suspect the former, but it's still odd.

	-hpa


* Re: Kernel panic, FS corruption  Was: Re: Call for RAID-6 users
  2004-08-05 23:54           ` H. Peter Anvin
@ 2004-08-06  0:19             ` Jim Paris
  2004-08-06  0:36               ` H. Peter Anvin
  0 siblings, 1 reply; 26+ messages in thread
From: Jim Paris @ 2004-08-06  0:19 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: linux-raid

> > > I eventually got a kernel panic when copying large amounts of data to a 
> > > [degraded] raid6 array, which this time was the full 600 GB size.
> > > Don't know if it is helpful to anyone but info below:
> > 
> > The panic is from reiserfs, and it's occurring because the FS is
> > getting corrupted due to the raid6 problems.
> 
> It's still very odd to me that so far the only thing that triggers
> this kind of problem is reiserfs.  Either reiserfs just has a really
> odd series of access patterns, or it is relying on behaviour which
> isn't actually guaranteed.  I suspect the former, but it's still odd.

No, I did see the same corruption with XFS; it just took more work
before it would show up (ie. I couldn't get it to show up by simply
copying one huge file; I had to untar a full filesystem onto it).
So I would suspect the odd access patterns.  I could also run a test
with EXT2/3 if you'd like.  I didn't manage to trigger the corruption
directly on the md device, but my access pattern was quite simple in
that case (dd big blocks to different areas).

Are you able to reproduce the problem with the scripts I sent earlier?
If not, I can give you access to a machine that can.

-jim


* Re: Kernel panic, FS corruption  Was: Re: Call for RAID-6 users
  2004-08-06  0:19             ` Jim Paris
@ 2004-08-06  0:36               ` H. Peter Anvin
  2004-08-06  4:04                 ` Jim Paris
  0 siblings, 1 reply; 26+ messages in thread
From: H. Peter Anvin @ 2004-08-06  0:36 UTC (permalink / raw)
  To: Jim Paris; +Cc: linux-raid

Jim Paris wrote:
> 
> No, I did see the same corruption with XFS; it just took more work
> before it would show up (ie. I couldn't get it to show up by simply
> copying one huge file; I had to untar a full filesystem onto it).
> So I would suspect the odd access patterns.  I could also run a test
> with EXT2/3 if you'd like.  I didn't manage to trigger the corruption
> directly on the md device, but my access pattern was quite simple in
> that case (dd big blocks to different areas).
> 

If you can reproduce it with ext2/3 it would make debugging simpler, because I 
understand the ext code and data structures a lot better.

Thanks for that data element; it pretty much confirms my suspicions.

>
> Are you able to reproduce the problem with the scripts I sent earlier?
> If not, I can give you access to a machine that can.
> 

I hate to admit it, but I haven't had a chance to try yet.

	-hpa


* Re: Kernel panic, FS corruption  Was: Re: Call for RAID-6 users
  2004-08-06  0:36               ` H. Peter Anvin
@ 2004-08-06  4:04                 ` Jim Paris
  0 siblings, 0 replies; 26+ messages in thread
From: Jim Paris @ 2004-08-06  4:04 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: linux-raid

> If you can reproduce it with ext2/3 it would make debugging simpler, 
> because I understand the ext code and data structures a lot better.

This demonstrates it on ext2.  I can't seem to reproduce it with just
simple use of 'dd', but it shows up if I untar a ton of data.

This script:
- creates five 100MB "disks" through loopback
- puts them in a six-disk RAID-6 array (resulting size=400MB, degraded)
- untars about 350MB of data to the array
- runs e2fsck, which shows filesystem errors

Usage: 
- put r6ext.sh and big.tar.bz2 in a directory
- run r6ext.sh as root

Sorry for the huge files, but e2fsck didn't show any problems when I
scaled everything down by a factor of 10.  You could probably make
your own big.tar.bz2 and see the same problem, as there's nothing
special about this data.

  http://stonewall.mit.edu/~jim/r6ext.sh
  http://stonewall.mit.edu/~jim/big.tar.bz2 (77MB)
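
In outline, it does something like this (a rough sketch only; device and
path names here are placeholders, not what the actual script uses):

# for i in 0 1 2 3 4 ; do dd if=/dev/zero of=/tmp/disk$i bs=1M count=100 ; losetup /dev/loop$i /tmp/disk$i ; done
# mdadm --create /dev/md9 --level=6 --chunk=64 --raid-devices=6 missing /dev/loop[0-4]
# mke2fs /dev/md9
# mount /dev/md9 /mnt/test
# tar -C /mnt/test -xjf big.tar.bz2           # <-- ~350MB of arbitrary data
# umount /mnt/test
# e2fsck -f /dev/md9                          # <-- reports errors when the bug hits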

-jim
