* Call for RAID-6 users
From: H. Peter Anvin @ 2004-07-23 23:32 UTC
To: linux-raid

I'm considering removing the "experimental" label from RAID-6. It appears
at this point to be just as stable as RAID-5 (since it's based on the
RAID-5 code, that's obviously all that can be expected.)

Thus, if you have used RAID-6 and have good or bad experiences, I'd like
to know them as soon as possible.

	-hpa
* Re: Call for RAID-6 users
From: Jim Paris @ 2004-07-26 21:38 UTC
To: linux-raid

> Thus, if you have used RAID-6 and have good or bad experiences, I'd
> like to know them as soon as possible.

Just tried setting up a RAID-6 on a new server, and I'm seeing complete
filesystem corruption.

I have 6 250GB disks, and want them all in the array. I've created it
degraded, with the first disk missing, since that disk temporarily holds
the system.

Using kernel 2.6.7 and mdadm 1.6.0, I did something like this:

# mdadm --create /dev/md1 --level=6 --chunk=128 --raid-devices=6 missing /dev/hd{g,i,k,m,o}2

which gives me:

md1 : active raid6 hdo2[5] hdm2[4] hdk2[3] hdi2[2] hdg2[1]
      976269312 blocks level 6, 128k chunk, algorithm 2 [6/5] [_UUUUU]

Then I created the filesystem:

# mkreiserfs /dev/md1
# reiserfsck /dev/md1   # <-- no errors
# df -H                 # <-- shows proper size (1 TB)

Then I copied the system to it:

# mount /dev/md1 /mnt/root
# cd /mnt/root ; tar --one-file-system -cf - / | tar --preserve -xvf - ; cd /
# umount /mnt/root
# reiserfsck /dev/md1   # <-- many, many errors

There were no errors in dmesg while copying the data to the filesystem,
or while running reiserfsck. The filesystem gives tons of errors if I try
to use it as root.

Not sure what else to try. It's easy to reproduce, and it seems to fail
the exact same way every time. I need this server up soon, so I may just
settle for RAID-5, but I can keep it around for testing for a few days.
Let me know if you'd like access to the machine.

-jim
* Re: Call for RAID-6 users
From: Matthew - RAID @ 2004-07-27 2:05 UTC
To: Jim Paris, linux-raid

On Mon, 26 Jul 2004 17:38:11 -0400, "Jim Paris" <jim@jtan.com> said:
> > Thus, if you have used RAID-6 and have good or bad experiences, I'd
> > like to know them as soon as possible.
>
> Just tried setting up a RAID-6 on a new server, and I'm seeing
> complete filesystem corruption.
>
> # cd /mnt/root ; tar --one-file-system -cf - / | tar --preserve -xvf - ; cd /
> # umount /mnt/root
> # reiserfsck /dev/md1   # <-- many, many errors

My reading of things was that /proc and any in-use mount points needed
to be handled specially when using tar to do the copy. Then again, the
--one-file-system argument could be taking care of that; I haven't heard
of using it.

Is it OK to use tar on / including /proc and /dev like this? cpio, cp,
tar -- I've seen HOWTOs use all three to copy /!
* Re: Call for RAID-6 users
From: Jim Paris @ 2004-07-27 2:12 UTC
To: Matthew - RAID
Cc: linux-raid

> > > Thus, if you have used RAID-6 and have good or bad experiences, I'd
> > > like to know them as soon as possible.
> >
> > Just tried setting up a RAID-6 on a new server, and I'm seeing
> > complete filesystem corruption.
> > # cd /mnt/root ; tar --one-file-system -cf - / | tar --preserve -xvf - ; cd /
> > # umount /mnt/root
> > # reiserfsck /dev/md1   # <-- many, many errors
>
> My reading of things was that /proc and any in-use mount points needed
> to be handled specially when using tar to do the copy. Then again, the
> --one-file-system argument could be taking care of that; I haven't heard
> of using it.
>
> Is it OK to use tar on / including /proc and /dev like this?

I've done the same thing when setting up RAID-5 in the past, so the
procedure should be okay. --one-file-system excludes /proc, and tar
handles special files in /dev properly.

I can do more specific tests (writing particular data to the disk and
reading it back), but I'm not sure what patterns would be useful.

-jim
* Re: Call for RAID-6 users
From: Ricky Beam @ 2004-07-27 16:40 UTC
To: Jim Paris
Cc: linux-raid

On Mon, 26 Jul 2004, Jim Paris wrote:
> tar handles special files in /dev properly.

Don't bet your life on it. GNU Tar (tm) will work most of the time on
most systems (read: non-linux), but NEVER assume tar will read device
entries correctly. (cpio is the "correct" tool. dump is the Correct
Tool (tm).)

--Ricky
* Re: Call for RAID-6 users
From: Jim Paris @ 2004-07-27 17:20 UTC
To: Ricky Beam
Cc: linux-raid

> > tar handles special files in /dev properly.
>
> Don't bet your life on it. GNU Tar (tm) will work most of the time on
> most systems (read: non-linux), but NEVER assume tar will read device
> entries correctly. (cpio is the "correct" tool. dump is the Correct
> Tool (tm).)

Whatever. My problem is with RAID-6, not tar.

-jim
* Re: Call for RAID-6 users 2004-07-27 17:20 ` Jim Paris @ 2004-07-27 18:19 ` Jim Paris 2004-07-27 18:48 ` Jim Paris 0 siblings, 1 reply; 26+ messages in thread From: Jim Paris @ 2004-07-27 18:19 UTC (permalink / raw) To: linux-raid [-- Attachment #1: Type: text/plain, Size: 420 bytes --] > > >tar handles special files in /dev properly. > > > > Don't bet your life on it. GNU Tar (tm) will work most of the time on > > most systems (read: non-linux), but NEVER assume tar will read device > > entries correctly. (cpio is the "correct" tool. dump is the Correct > > Tool (tm).) Attached is another trace that shows this corruption on a single regular file copied with "cp" onto a fresh filesystem. -jim [-- Attachment #2: typescript --] [-- Type: text/plain, Size: 3109 bytes --] Script started on Tue Jul 27 14:11:19 2004 bucket:/# mdadm --stop /dev/md1 bucket:/# mdadm --zero-superblock /dev/hd[gikmo]2 mdadm: /dev/hdg2 does not appear to have an MD superblock. mdadm: /dev/hdi2 does not appear to have an MD superblock. mdadm: /dev/hdk2 does not appear to have an MD superblock. mdadm: /dev/hdm2 does not appear to have an MD superblock. mdadm: /dev/hdo2 does not appear to have an MD superblock. bucket:/# for i in /dev/hd[gikmo]2; do dd if=/dev/zero of=$i bs=1M count=100 ; done 100+0 records in 100+0 records out 104857600 bytes transferred in 2.967763 seconds (35332202 bytes/sec) 100+0 records in 100+0 records out 104857600 bytes transferred in 2.969712 seconds (35309012 bytes/sec) 100+0 records in 100+0 records out 104857600 bytes transferred in 2.973760 seconds (35260950 bytes/sec) 100+0 records in 100+0 records out 104857600 bytes transferred in 2.965225 seconds (35362443 bytes/sec) 100+0 records in 100+0 records out 104857600 bytes transferred in 2.976899 seconds (35223769 bytes/sec) bucket:/# mdadm --create /dev/md1 --level=6 --chunk=128 --raid-devices=6 missing /dev/hd[gikmo]2 mdadm: array /dev/md1 started. bucket:/# mkreiserfs /dev/md1 mkreiserfs 3.6.17 (2003 www.namesys.com) A pair of credits: Continuing core development of ReiserFS is mostly paid for by Hans Reiser from money made selling licenses in addition to the GPL to companies who don't want it known that they use ReiserFS as a foundation for their proprietary product. And my lawyer asked 'People pay you money for this?'. Yup. Life is good. If you buy ReiserFS, you can focus on your value add rather than reinventing an entire FS. Vitaly Fertman wrote fsck for V3 and maintains the reiserfsprogs package now. He wrote librepair, userspace plugins repair code, fsck for V4, and worked on developing libreiser4 and userspace plugins with Umka. Guessing about desired format.. Kernel 2.6.7 is running. Format 3.6 with standard journal Count of blocks on the device: 244067328 Number of blocks consumed by mkreiserfs formatting process: 15660 Blocksize: 4096 Hash function used to sort names: "r5" Journal Size 8193 blocks (first block 18) Journal Max transaction length 1024 inode generation number: 0 UUID: 7fbea519-7d20-40dd-916c-d99136e6e347 ATTENTION: YOU SHOULD REBOOT AFTER FDISK! ALL DATA WILL BE LOST ON '/dev/md1'! Continue (y/n):y Initializing journal - 0%....20%....40%....60%....80%....100% Syncing..ok Tell your friends to use a kernel based on 2.4.18 or later, and especially not a kernel based on 2.4.9, when you use reiserFS. Have fun. ReiserFS is successfully created on /dev/md1. 
bucket:/# dd if=/dev/urandom of=testfile bs=1M count=100
100+0 records in
100+0 records out
104857600 bytes transferred in 17.675100 seconds (5932504 bytes/sec)
bucket:/# mount /dev/md1 /mnt/root
bucket:/# cp testfile /mnt/root
bucket:/# umount /mnt/root
bucket:/# mount /dev/md1 /mnt/root
bucket:/# md5sum testfile /mnt/root/testfile
95454f5f81b58f89cdd2f6954d721302  testfile
3b9c4173f3c1c0b315938b5a864f411b  /mnt/root/testfile
bucket:/# exit
Script done on Tue Jul 27 14:15:21 2004
* Re: Call for RAID-6 users 2004-07-27 18:19 ` Jim Paris @ 2004-07-27 18:48 ` Jim Paris 2004-07-28 3:09 ` Jim Paris 0 siblings, 1 reply; 26+ messages in thread From: Jim Paris @ 2004-07-27 18:48 UTC (permalink / raw) To: linux-raid [-- Attachment #1: Type: text/plain, Size: 293 bytes --] > Attached is another trace that shows this corruption on a single > regular file copied with "cp" onto a fresh filesystem. And here's a trace showing problems even without a filesystem. Writing data near the end is fatal. This does not happen if I write to the first 1G of the array. -jim [-- Attachment #2: typescript --] [-- Type: text/plain, Size: 5326 bytes --] Script started on Tue Jul 27 14:44:26 2004 bucket:~# mdadm --stop /dev/md1 bucket:~# mdadm --zero-superblock /dev/hd[gikmo]2 bucket:~# for i in /dev/hd[gikmo]2 ; do dd if=/dev/zero of=$i bs=1M count=1 ; done 1+0 records in 1+0 records out 1048576 bytes transferred in 0.032001 seconds (32766876 bytes/sec) 1+0 records in 1+0 records out 1048576 bytes transferred in 0.032197 seconds (32567587 bytes/sec) 1+0 records in 1+0 records out 1048576 bytes transferred in 0.032125 seconds (32640483 bytes/sec) 1+0 records in 1+0 records out 1048576 bytes transferred in 0.032064 seconds (32702568 bytes/sec) 1+0 records in 1+0 records out 1048576 bytes transferred in 0.032315 seconds (32448665 bytes/sec) bucket:~# mdadm --create /dev/md1 --level=6 --chunk=128 --raid-devices=6 missing /dev/hd[gikmo]2 mdadm: array /dev/md1 started. bucket:~# cat /proc/mdstat Personalities : [linear] [raid0] [raid1] [raid5] [multipath] [raid6] md1 : active raid6 hdo2[5] hdm2[4] hdk2[3] hdi2[2] hdg2[1] 976269312 blocks level 6, 128k chunk, algorithm 2 [6/5] [_UUUUU] md0 : active raid1 hdo1[5] hdm1[4] hdk1[3] hdi1[2] hdg1[1] 128384 blocks [6/5] [_UUUUU] unused devices: <none> bucket:~# dd if=/dev/urandom of=/dev/md1 bs=1M count=100 seek=900000 100+0 records in 100+0 records out 104857600 bytes transferred in 19.095634 seconds (5491182 bytes/sec) bucket:~# mdadm --stop /dev/md1 bucket:~# mdadm --run /dev/md1 mdadm: failed to run array /dev/md1: Invalid argument bucket:~# dmesg md: bind<hdg2> md: bind<hdi2> md: bind<hdk2> md: bind<hdm2> md: bind<hdo2> raid6: device hdo2 operational as raid disk 5 raid6: device hdm2 operational as raid disk 4 raid6: device hdk2 operational as raid disk 3 raid6: device hdi2 operational as raid disk 2 raid6: device hdg2 operational as raid disk 1 raid6: allocated 6269kB for md1 raid6: raid level 6 set md1 active with 5 out of 6 devices, algorithm 2 RAID6 conf printout: --- rd:6 wd:5 fd:1 disk 1, o:1, dev:hdg2 disk 2, o:1, dev:hdi2 disk 3, o:1, dev:hdk2 disk 4, o:1, dev:hdm2 disk 5, o:1, dev:hdo2 md: md1 stopped. 
md: unbind<hdo2> md: export_rdev(hdo2) md: unbind<hdm2> md: export_rdev(hdm2) md: unbind<hdk2> md: export_rdev(hdk2) md: unbind<hdi2> md: export_rdev(hdi2) md: unbind<hdg2> md: export_rdev(hdg2) md: bug in file drivers/md/md.c, line 1513 md: ********************************** md: * <COMPLETE RAID STATE PRINTOUT> * md: ********************************** md1: md0: <hdo1><hdm1><hdk1><hdi1><hdg1> md: rdev hdo1, SZ:00128384 F:0 S:1 DN:5 md: rdev superblock: md: SB: (V:0.90.0) ID:<65daa413.7da47b48.1e1593ff.5a0a11c8> CT:410564ab md: L1 S00128384 ND:5 RD:6 md0 LO:0 CS:0 md: UT:410567ad ST:1 AD:5 WD:5 FD:0 SD:0 CSUM:86d6d898 E:0000001c D 0: DISK<N:0,unknown-block(0,0)(0,0),R:0,S:8> D 1: DISK<N:1,hdg1(34,1),R:1,S:6> D 2: DISK<N:2,hdi1(56,1),R:2,S:6> D 3: DISK<N:3,hdk1(57,1),R:3,S:6> D 4: DISK<N:4,hdm1(88,1),R:4,S:6> D 5: DISK<N:5,hdo1(89,1),R:5,S:6> md: THIS: DISK<N:5,hdo1(89,1),R:5,S:6> md: rdev hdm1, SZ:00128384 F:0 S:1 DN:4 md: rdev superblock: md: SB: (V:0.90.0) ID:<65daa413.7da47b48.1e1593ff.5a0a11c8> CT:410564ab md: L1 S00128384 ND:5 RD:6 md0 LO:0 CS:0 md: UT:410567ad ST:1 AD:5 WD:5 FD:0 SD:0 CSUM:86d6d895 E:0000001c D 0: DISK<N:0,unknown-block(0,0)(0,0),R:0,S:8> D 1: DISK<N:1,hdg1(34,1),R:1,S:6> D 2: DISK<N:2,hdi1(56,1),R:2,S:6> D 3: DISK<N:3,hdk1(57,1),R:3,S:6> D 4: DISK<N:4,hdm1(88,1),R:4,S:6> D 5: DISK<N:5,hdo1(89,1),R:5,S:6> md: THIS: DISK<N:4,hdm1(88,1),R:4,S:6> md: rdev hdk1, SZ:00128384 F:0 S:1 DN:3 md: rdev superblock: md: SB: (V:0.90.0) ID:<65daa413.7da47b48.1e1593ff.5a0a11c8> CT:410564ab md: L1 S00128384 ND:5 RD:6 md0 LO:0 CS:0 md: UT:410567ad ST:1 AD:5 WD:5 FD:0 SD:0 CSUM:86d6d874 E:0000001c D 0: DISK<N:0,unknown-block(0,0)(0,0),R:0,S:8> D 1: DISK<N:1,hdg1(34,1),R:1,S:6> D 2: DISK<N:2,hdi1(56,1),R:2,S:6> D 3: DISK<N:3,hdk1(57,1),R:3,S:6> D 4: DISK<N:4,hdm1(88,1),R:4,S:6> D 5: DISK<N:5,hdo1(89,1),R:5,S:6> md: THIS: DISK<N:3,hdk1(57,1),R:3,S:6> md: rdev hdi1, SZ:00128384 F:0 S:1 DN:2 md: rdev superblock: md: SB: (V:0.90.0) ID:<65daa413.7da47b48.1e1593ff.5a0a11c8> CT:410564ab md: L1 S00128384 ND:5 RD:6 md0 LO:0 CS:0 md: UT:410567ad ST:1 AD:5 WD:5 FD:0 SD:0 CSUM:86d6d871 E:0000001c D 0: DISK<N:0,unknown-block(0,0)(0,0),R:0,S:8> D 1: DISK<N:1,hdg1(34,1),R:1,S:6> D 2: DISK<N:2,hdi1(56,1),R:2,S:6> D 3: DISK<N:3,hdk1(57,1),R:3,S:6> D 4: DISK<N:4,hdm1(88,1),R:4,S:6> D 5: DISK<N:5,hdo1(89,1),R:5,S:6> md: THIS: DISK<N:2,hdi1(56,1),R:2,S:6> md: rdev hdg1, SZ:00128384 F:0 S:1 DN:1 md: rdev superblock: md: SB: (V:0.90.0) ID:<65daa413.7da47b48.1e1593ff.5a0a11c8> CT:410564ab md: L1 S00128384 ND:5 RD:6 md0 LO:0 CS:0 md: UT:410567ad ST:1 AD:5 WD:5 FD:0 SD:0 CSUM:86d6d859 E:0000001c D 0: DISK<N:0,unknown-block(0,0)(0,0),R:0,S:8> D 1: DISK<N:1,hdg1(34,1),R:1,S:6> D 2: DISK<N:2,hdi1(56,1),R:2,S:6> D 3: DISK<N:3,hdk1(57,1),R:3,S:6> D 4: DISK<N:4,hdm1(88,1),R:4,S:6> D 5: DISK<N:5,hdo1(89,1),R:5,S:6> md: THIS: DISK<N:1,hdg1(34,1),R:1,S:6> md: ********************************** bucket:~# exit Script done on Tue Jul 27 14:44:51 2004 ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Call for RAID-6 users 2004-07-27 18:48 ` Jim Paris @ 2004-07-28 3:09 ` Jim Paris 2004-07-28 8:36 ` David Greaves 0 siblings, 1 reply; 26+ messages in thread From: Jim Paris @ 2004-07-28 3:09 UTC (permalink / raw) To: linux-raid > And here's a trace showing problems even without a filesystem. > Writing data near the end is fatal. This does not happen if I write > to the first 1G of the array. Sorry, that test was bogus, and I needed to learn how to use mdadm. I haven't actually managed to cause corruption on a raw device with no filesystem. However, copying a single 200MB file onto Reiserfs will cause corruption. It takes a lot more work (e.g. actually copying an installed system onto it), but XFS shows eventual corruption as well, so it's not specific to the filesystem type. I see no problems if I start the array with a complete set of disks; the corruption only happens if it starts degraded (tested with both 1 and 2 disks missing, and with the missing disks being at both the beginning and the end). This happens on Linux 2.6.3 and 2.6.7, with mdadm 1.5.0 and 1.4.0, with and without CONFIG_LBD. RAID-5 works correctly in all tested configurations. I have tried varying the number of disks in the array. Interestingly, if I start it with all disks, it starts reconstructing immediately. If I start it with only one disk missing, it does not reconstruct anything. Shouldn't it be creating one of P or Q? -jim ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Call for RAID-6 users 2004-07-28 3:09 ` Jim Paris @ 2004-07-28 8:36 ` David Greaves 2004-07-28 10:02 ` Jim Paris 0 siblings, 1 reply; 26+ messages in thread From: David Greaves @ 2004-07-28 8:36 UTC (permalink / raw) To: linux-raid Jim Paris wrote: >>And here's a trace showing problems even without a filesystem. >>Writing data near the end is fatal. This does not happen if I write >>to the first 1G of the array. >> >> > >Sorry, that test was bogus, and I needed to learn how to use mdadm. >I haven't actually managed to cause corruption on a raw device with no >filesystem. However, copying a single 200MB file onto Reiserfs will >cause corruption. It takes a lot more work (e.g. actually copying an >installed system onto it), but XFS shows eventual corruption as well, >so it's not specific to the filesystem type. > >I see no problems if I start the array with a complete set of disks; >the corruption only happens if it starts degraded (tested with both 1 >and 2 disks missing, and with the missing disks being at both the >beginning and the end). This happens on Linux 2.6.3 and 2.6.7, with >mdadm 1.5.0 and 1.4.0, with and without CONFIG_LBD. RAID-5 works >correctly in all tested configurations. I have tried varying the >number of disks in the array. > > > FWIW a month or so ago I used mdadm + 2.6.4 and constructed a 5x250Gb RAID 5 array with one drive missing. When I added the missing drive and reconstruction had finished I had fs corruption. I used the reiser tools to fix it but lost an awful lot of data. I reported it in detail here [http://marc.theaimsgroup.com/?l=linux-raid&m=108687793611905&w=2] and got zero response <shrug> Since then it's been fine. I don't have much faith in it though ;) David PS around that time there was a patch [http://marc.theaimsgroup.com/?l=linux-raid&m=108635099921570&w=2] for a bug in the RAID5 resync code. it was only for the raid5.c It doesn't look raid5 algorithm specific ... :) ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Call for RAID-6 users 2004-07-28 8:36 ` David Greaves @ 2004-07-28 10:02 ` Jim Paris 0 siblings, 0 replies; 26+ messages in thread From: Jim Paris @ 2004-07-28 10:02 UTC (permalink / raw) To: David Greaves; +Cc: linux-raid Hi David, > FWIW a month or so ago I used mdadm + 2.6.4 and constructed a 5x250Gb > RAID 5 array with one drive missing. > When I added the missing drive and reconstruction had finished I had fs > corruption. > > I used the reiser tools to fix it but lost an awful lot of data. > > I reported it in detail here > [http://marc.theaimsgroup.com/?l=linux-raid&m=108687793611905&w=2] and > got zero response <shrug> Yeah, I saw that posting. For me, raid5 appears to work fine, although like you, my faith is dropping. :) For all I know, my RAID6 problems could also exist in the very-similar RAID5 code, but just not show up as often. > PS around that time there was a patch > [http://marc.theaimsgroup.com/?l=linux-raid&m=108635099921570&w=2] > for a bug in the RAID5 resync code. > it was only for the raid5.c > It doesn't look raid5 algorithm specific ... :) Thanks for the tip. Unfortunately, that fix was already in the RAID6 code in 2.6.7. Just in case, I upgraded my kernel to 2.6.8-rc2, which includes that patch, and still have the same problem. -jim ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Call for RAID-6 users
From: H. Peter Anvin @ 2004-07-30 15:58 UTC
To: linux-raid

Followup to: <20040726213811.GA17363@jim.sh>
By author: Jim Paris <jim@jtan.com>
In newsgroup: linux.dev.raid

> > Thus, if you have used RAID-6 and have good or bad experiences, I'd
> > like to know them as soon as possible.
>
> Just tried setting up a RAID-6 on a new server, and I'm seeing
> complete filesystem corruption.
>
> I have 6 250GB disks, and want them all in the array. I've created it
> degraded, with the first disk missing, since that disk temporarily
> holds the system.
>
> Using kernel 2.6.7, mdadm 1.6.0, I did something like this:
>
> # mdadm --create /dev/md1 --level=6 --chunk=128 --raid-devices=6 missing /dev/hd{g,i,k,m,o}2
>
> which gives me:
>
> md1 : active raid6 hdo2[5] hdm2[4] hdk2[3] hdi2[2] hdg2[1]
>       976269312 blocks level 6, 128k chunk, algorithm 2 [6/5] [_UUUUU]

Okay, found the messages...

Can you create failures by creating a full array and then failing out
drives? That would rule out problems with the way mdadm creates the
array.

**** When the array is just created, it's not synchronized!!! ****

Thus, when the array is first created it needs to finish synchronizing
before it's usable.

My current guess, based on what I've seen so far, is that it's a bug in
mdadm in creating arrays with exactly 1 missing drive, as opposed to a
kernel bug.

	-hpa
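A minimal sketch of the kind of test hpa is asking for: create a complete
array, let the initial sync finish, then fail and remove one member and
re-run the corruption test. The device names below are only examples, not
Jim's actual setup:

  # Create a non-degraded 6-disk RAID-6 and wait for the initial resync.
  mdadm --create /dev/md1 --level=6 --chunk=128 --raid-devices=6 /dev/sd[abcdef]1
  while grep -q resync /proc/mdstat; do sleep 60; done
  # Now fail and remove one member, then repeat the filesystem test.
  mdadm /dev/md1 --fail /dev/sda1
  mdadm /dev/md1 --remove /dev/sda1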
* Re: Call for RAID-6 users
From: Jim Paris @ 2004-07-30 19:39 UTC
To: H. Peter Anvin
Cc: linux-raid

> I haven't seen any of those messages, so this is the first case
> happening.

I figured you were just busy, but wanted to see if anyone else could
guide my debugging before my boss made me give up and do RAID-5 :)
Thanks for the reply.

> Can you create failures by creating a full array and then fail out
> drives? That would rule out problems with the way mdadm creates the
> array.

Yes, same problem. If I create a full array with 6 devices, wait for it
to finish synchronizing, and then fail the first drive, I see the same
corruption. See the attached r6test-full.sh to demonstrate.

> My current guess based on what I've seen so far is that it's a bug
> in mdadm in creating arrays with exactly 1 missing drive, as opposed
> to a kernel bug.

FWIW, this does occur with an array created with 2 missing drives, as
well.

-jim

[-- Attachment: r6test-full.sh (application/x-sh, 1039 bytes) --]
* Re: Call for RAID-6 users
From: H. Peter Anvin @ 2004-07-30 19:45 UTC
To: Jim Paris
Cc: linux-raid

Jim Paris wrote:
> Yes, same problem. If I create a full array with 6 devices, wait for
> it to finish the synchronizing, then fail the first drive, I see the
> same corruption. See attached r6test-full.sh to demonstrate.

Okay, I will look at this next week (I'm travelling and about to step
onto a plane...)

	-hpa
* Re: Call for RAID-6 users
From: maarten van den Berg @ 2004-07-30 21:11 UTC
To: linux-raid

On Saturday 24 July 2004 01:32, H. Peter Anvin wrote:
> I'm considering removing the "experimental" label from RAID-6. It
> appears at this point to be just as stable as RAID-5 (since it's based
> on RAID-5 code that's obviously all that can be expected.)

This encouraged me to try it today...

> Thus, if you have used RAID-6 and have good or bad experiences, I'd
> like to know them as soon as possible.

I'm still early in the testing phase, so nothing to report as yet.

But I have a question: I tried to reproduce a reported issue when
creating a degraded raid6 array. But when I created a raid6 array with
one disk missing, /proc/mdstat reported no resync going on. Am I not
correct in assuming that raid6 with one missing drive should at least
start resyncing the other drive(s)? It would only be really degraded
with two missing drives...

So instead, I defined a full raid6 array, which it is now resyncing...
My resync speed is rather slow (6000K/sec). I'll have to compare it to
resyncing a raid5 array, though, before concluding anything from that,
because this system is somewhat CPU-challenged indeed: a lowly Celeron
500.

I will try to run some script(s) provided on this list to see if I can
reproduce anything.

System info:
SuSE 9.1 from DVD media (with all updates installed _PRIOR_ to creating
the array)
Kernel 2.6.5-7.95
mdadm - v1.5.0 - 22 Jan 2004

Harddisks and/or controllers:
one 160GB ATA off the onboard controller (hda)
two 160GB SATA off a Promise 150TX2 (as sda and sdb)
two 160GB SATA off a SiI 3112 controller (as hde and hdg)

Maarten
* Re: Call for RAID-6 users
From: maarten van den Berg @ 2004-07-30 21:38 UTC
To: linux-raid

On Friday 30 July 2004 23:11, maarten van den Berg wrote:
> On Saturday 24 July 2004 01:32, H. Peter Anvin wrote:
>
> I'm still early in the testing phase, so nothing to report as yet.
> But I have a question: I tried to reproduce a reported issue when
> creating a degraded raid6 array. But when I created a raid6 array with
> one disk missing, /proc/mdstat reported no resync going on. Am I not
> correct in assuming that raid6 with 1 missing drive should at least
> start resyncing the other drive(s)? It would only be really degraded
> with two missing drives...
>
> So instead, I defined a full raid6 array which it is now resyncing...
> My resync speed is rather slow (6000K/sec). I'll have to compare it to
> resyncing a raid5 array though before concluding anything from that.
> Cause this system is somewhat CPU challenged indeed: a lowly celeron
> 500.

To confirm: after stopping the raid6 array (I didn't want to wait this
long), I created a raid5 array on the same machine, and it resyncs at
14000K/sec.

Is this expected behaviour, 6M/sec for raid6 vs 14M/sec for raid5?
I suppose raid6 has to sync two drives, which would maybe explain the
speed difference(?) In any case, hdparm -tT reports 50M/sec on each
single drive. Is this discrepancy in speed normal?

(Yes, yes, I played with the /proc/sys/dev/raid/ speed settings, to no
avail.)

Maarten
* Re: Call for RAID-6 users
From: maarten van den Berg @ 2004-07-31 0:28 UTC
To: linux-raid

On Friday 30 July 2004 23:38, maarten van den Berg wrote:
> On Friday 30 July 2004 23:11, maarten van den Berg wrote:
> > On Saturday 24 July 2004 01:32, H. Peter Anvin wrote:

Again replying to myself. I have a full report now.

Realizing this all took way too much time, I started from scratch and
defined multiple small partitions (2GB), with a raid6 array on one set
and a raid5 array on the other. Both are full arrays; no missing drives.
I used reiserfs on both. Hard- and software specs as before, back in the
thread.

I tested it by copying trees from / to the respective raid arrays and
running md5sum on the source and the copies (and repeating after
reboots). Then I went and disconnected SATA cables to get them degraded.
The first cable went perfectly: both arrays came up fine, and an md5sum
on the available files, plus a new copy + md5sum on that, went fine too.

The second cable, however, went wrong; I inadvertently moved a third
cable, so I was left with three missing devices. Let's skip over that:
when I reattached that cable, the md1 raid6 device was still fine, with
two failed drives. I did the <copy new stuff, run md5sum over it> thing
again.

Then I reattached all cables. I verified the md5sums before refilling
the raid6 array using mdadm -a, and did that afterwards too.

To my astonishment, the raid5 array was back up again. I thought raid5
with two drives missing was deactivated, but obviously things have
changed now and a missing drive no longer equals a failed drive. I
presume.

/proc/mdstat just after booting looked like this:

Personalities : [raid1] [raid5] [raid6]
md1 : active raid6 hdg3[2] hda3[0] sda3[3]
      5879424 blocks level 6, 64k chunk, algorithm 2 [5/3] [U_UU_]
md2 : active raid5 hdg4[2] hde4[1] hda4[0] sda4[3]
      7839232 blocks level 5, 64k chunk, algorithm 2 [5/4] [UUUU_]
md0 : active raid1 sda1[1] hda1[0]
      1574272 blocks [3/2] [UU_]

The md5sums after hot-adding were the same as before and verified fine.

Now, seeing as the <disconnect cable> trick doesn't mark a drive failed,
should I repeat the tests marking drives failed, either through mdadm or
maybe by pulling a cable while the system is up? Because I'm not totally
convinced now that the array got marked degraded. I could mount it with
two drives missing [raid6], but the fact that the raid5 device didn't
get broken puzzles me a bit...

Oh well, since I'm just experimenting I'll take the plunge anyway and
pull a live cable now: ...

Well, the first thing to observe is that the system becomes unresponsive
immediately. New logins don't spawn, and /var/log/messages says this:

kernel: ATA: abnormal status 0x7F on port 0xD481521C

Now even the keyboard doesn't respond anymore... reset button!

Upon reboot, mdadm --detail reports the missing disk as "removed", not
failed. But maybe that is the same(?). Rebooting again after reattaching
the cable, this time the arrays stayed degraded. I ran the ubiquitous
md5sums but found nothing wrong, either before hot-adding the missing
drives or after.

So, at least in my experience, raid6 works fine. Also, the problems
reported with SuSE 9.1 could not be observed (probably due to updating
the kernel).

Moreover, it also seems the underlying SATA is stable [with these
cards], which I'm very glad to notice, reading some of the stories...

More version info etcetera upon request.

Maarten

P.S.: My resync speed stays this low. Anything that can be done...?
* Kernel panic, FS corruption Was: Re: Call for RAID-6 users 2004-07-31 0:28 ` maarten van den Berg @ 2004-08-01 13:03 ` maarten van den Berg 2004-08-01 18:05 ` Jim Paris 0 siblings, 1 reply; 26+ messages in thread From: maarten van den Berg @ 2004-08-01 13:03 UTC (permalink / raw) To: linux-raid On Saturday 31 July 2004 02:28, maarten van den Berg wrote: > On Friday 30 July 2004 23:38, maarten van den Berg wrote: > > On Friday 30 July 2004 23:11, maarten van den Berg wrote: > > > On Saturday 24 July 2004 01:32, H. Peter Anvin wrote: I eventually got a kernel panic when copying large amounts of data to a [degraded] raid6 array, which this time was the full 600 GB size. Don't know if it is helpful to anyone but info below: Message from syslogd@agent2 at Sun Aug 1 08:59:28 2004 ... agent2 kernel: REISERFS: panic (device Null superblock): vs-6025: check_internal_block_head: invalid level level=58989, nr_items=6145, free_space=39964 rdkey Umount didn't work, neither did shutdown. After reset I have FS corruption, according to reiserfsck: agent2:~ # cat /proc/mdstat Personalities : [raid1] [raid6] md1 : active raid6 hdg3[3] hde3[2] hda3[0] sda3[4] sdb3[5] 618437888 blocks level 6, 64k chunk, algorithm 2 [6/5] [U_UUUU] md0 : active raid1 sdb1[2] sda1[3] hda1[0] hde1[1] hdg1[4] 1574272 blocks [3/3] [UUU] unused devices: <none> agent2:~ # reiserfsck /dev/md1 reiserfsck 3.6.13 (2003 www.namesys.com) ************************************************************* ** If you are using the latest reiserfsprogs and it fails ** ** please email bug reports to reiserfs-list@namesys.com, ** ** providing as much information as possible -- your ** ** hardware, kernel, patches, settings, all reiserfsck ** ** messages (including version), the reiserfsck logfile, ** ** check the syslog file for any related information. ** ** If you would like advice on using this program, support ** ** is available for $25 at www.namesys.com/support.html. ** ************************************************************* Will read-only check consistency of the filesystem on /dev/md1 Will put log info to 'stdout' Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes ########### reiserfsck --check started at Sun Aug 1 14:45:08 2004 ########### Replaying journal.. 
Trans replayed: mountid 10, transid 2171, desc 5755, len 30, commit 5786, next trans offset 5769 Trans replayed: mountid 10, transid 2172, desc 5787, len 14, commit 5802, next trans offset 5785 Trans replayed: mountid 10, transid 2173, desc 5803, len 23, commit 5827, next trans offset 5810 Trans replayed: mountid 10, transid 2174, desc 5828, len 27, commit 5856, next trans offset 5839 Trans replayed: mountid 10, transid 2175, desc 5857, len 25, commit 5883, next trans offset 5866 Trans replayed: mountid 10, transid 2176, desc 5884, len 27, commit 5912, next trans offset 5895 Trans replayed: mountid 10, transid 2177, desc 5913, len 26, commit 5940, next trans offset 5923 Trans replayed: mountid 10, transid 2178, desc 5941, len 24, commit 5966, next trans offset 5949 Reiserfs journal '/dev/md1' in blocks [18..8211]: 8 transactions replayed Checking internal tree../ 1 (of 2)/ 3 (of 128)/ 12 (of 170)block 67043329: The level of the node (65534) is not correct, (1) expected the problem in the internal node occured (67043329), whole subtree is skipped / 14 (of 128)/105 (of 133)block 139100161: The level of the node (65534) is not correct, (1) expected the problem in the internal node occured (139100161), whole subtree is skipped / 15 (of 128)/ 23 (of 170)block 5701633: The level of the node (44292) is not correct, (1) expected the problem in the internal node occured (5701633), whole subtree is skipped / 16 (of 128)/ 80 (of 170)block 109215745: The level of the node (65534) is not correct, (1) expected [snip much more of the same...] the problem in the internal node occured (4718593), whole subtree is skipped /120 (of 133)/ 47 (of 170)block 59801637: The level of the node (65534) is not correct, (1) expected the problem in the internal node occured (59801637), whole subtree is skipped /123 (of 133)/ 72 (of 169)block 126386304: The level of the node (4828) is not correct, (1) expected the problem in the internal node occured (126386304), whole subtree is skipped /124 (of 133)block 126386316: The level of the node (58989) is not correct, (2) expected the problem in the internal node occured (126386316), whole subtree is skipped finished Comparing bitmaps..vpf-10640: The on-disk and the correct bitmaps differs. Bad nodes were found, Semantic pass skipped 92 found corruptions can be fixed only when running with --rebuild-tree ########### reiserfsck finished at Sun Aug 1 14:47:17 2004 ########### Hours before the kernel panic, during a copy, I see tons of this in syslog: Aug 1 04:15:54 agent2 kernel: ReiserFS: warning: is_tree_node: node level 65534 does not match to the expected o ne 1 Aug 1 04:15:54 agent2 kernel: ReiserFS: md1: warning: vs-5150: search_by_key: invalid format found in block 6704 3329. Fsck? Aug 1 04:15:54 agent2 kernel: ReiserFS: md1: warning: vs-13070: reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [130 132 0x0 SD] Aug 1 04:15:54 agent2 kernel: ReiserFS: warning: is_tree_node: node level 65534 does not match to the expected o ne 1 Aug 1 04:15:54 agent2 kernel: ReiserFS: md1: warning: vs-5150: search_by_key: invalid format found in block 6704 3329. Fsck? Aug 1 04:15:54 agent2 kernel: ReiserFS: md1: warning: vs-13070: reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [130 132 0x0 SD] Aug 1 04:15:54 agent2 kernel: ReiserFS: warning: is_tree_node: node level 65534 does not match to the expected o ne 1 Aug 1 04:15:54 agent2 kernel: ReiserFS: md1: warning: vs-5150: search_by_key: invalid format found in block 6704 3329. Fsck? 
Aug 1 04:15:54 agent2 kernel: ReiserFS: md1: warning: vs-13070: reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [130 132 0x0 SD] Aug 1 04:15:54 agent2 kernel: ReiserFS: warning: is_tree_node: node level 65534 does not match to the expected o ne 1 Aug 1 04:15:54 agent2 kernel: ReiserFS: md1: warning: vs-5150: search_by_key: invalid format found in block 6704 3329. Fsck? Aug 1 04:15:54 agent2 kernel: ReiserFS: md1: warning: vs-13070: reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [130 132 0x0 SD] This lasted about a minute -last entry dated Aug 1 04:16:46- but logged thousands of lines during that. Then syslog is quiet again until the kernel panic occurs: Aug 1 08:49:55 agent2 -- MARK -- Aug 1 08:59:00 agent2 /USR/SBIN/CRON[8553]: (root) CMD ( rm -f /var/spool/ cron/lastrun/cron.hourly) Aug 1 08:59:28 agent2 kernel: REISERFS: panic (device Null superblock): vs-6025: check_internal_block_head: inva lid level level=58989, nr_items=6145, free_space=39964 rdkey Aug 1 08:59:28 agent2 kernel: ------------[ cut here ]------------ Aug 1 08:59:28 agent2 kernel: kernel BUG at fs/reiserfs/prints.c:362! Aug 1 08:59:28 agent2 kernel: invalid operand: 0000 [#1] Aug 1 08:59:28 agent2 kernel: CPU: 0 Aug 1 08:59:28 agent2 kernel: EIP: 0060:[__crc_ide_end_request +942296/1608427] Not tainted Aug 1 08:59:28 agent2 kernel: EIP: 0060:[<d48ad7c1>] Not tainted Aug 1 08:59:28 agent2 kernel: EFLAGS: 00010286 (2.6.5-7.95-default) Aug 1 08:59:28 agent2 kernel: EIP is at reiserfs_panic+0x31/0x60 [reiserfs] Aug 1 08:59:28 agent2 kernel: eax: 00000093 ebx: 00000000 ecx: 00000002 edx: d2181f38 Aug 1 08:59:28 agent2 kernel: esi: d255b000 edi: ccd43d48 ebp: 0000002a esp: c3415898 Aug 1 08:59:28 agent2 kernel: ds: 007b es: 007b ss: 0068 Aug 1 08:59:28 agent2 kernel: Process cp (pid: 8456, threadinfo=c3414000 task=d18f4700) Aug 1 08:59:29 agent2 kernel: Stack: d48c5a0c d48c34fe d48d1520 000003f0 d48ad85a 00000000 d48c5a54 ccd43d48 Aug 1 08:59:29 agent2 kernel: 000003f0 c3415924 d255b2a8 d48b161e d255b000 c4cb9800 00000000 000017d8 Aug 1 08:59:29 agent2 kernel: ccd43d48 d0a7fa3c 00000000 00000001 c3415914 c3415924 d0a7fa3c 00000001 Aug 1 08:59:29 agent2 kernel: Call Trace: Aug 1 08:59:29 agent2 kernel: [__crc_ide_end_request+942449/1608427] check_internal+0x6a/0x80 [reiserfs] Aug 1 08:59:29 agent2 kernel: [<d48ad85a>] check_internal+0x6a/0x80 [reiserfs] Aug 1 08:59:29 agent2 kernel: [__crc_ide_end_request+958261/1608427] internal_move_pointers_items+0x1be/0x2c0 [ reiserfs] Aug 1 08:59:29 agent2 kernel: [<d48b161e>] internal_move_pointers_items +0x1be/0x2c0 [reiserfs] Aug 1 08:59:29 agent2 kernel: [__crc_ide_end_request+958904/1608427] internal_shift_right+0xb1/0xd0 [reiserfs] Aug 1 08:59:29 agent2 kernel: [<d48b18a1>] internal_shift_right+0xb1/0xd0 [reiserfs] Aug 1 08:59:29 agent2 kernel: [__crc_ide_end_request+959947/1608427] balance_internal+0x174/0xae0 [reiserfs] Aug 1 08:59:29 agent2 kernel: [<d48b1cb4>] balance_internal+0x174/0xae0 [reiserfs] Aug 1 08:59:29 agent2 kernel: [__crc_ide_end_request+424174/1608427] ata_qc_issue+0xf7/0x2a0 [libata] Aug 1 08:59:29 agent2 kernel: [<d482efd7>] ata_qc_issue+0xf7/0x2a0 [libata] Aug 1 08:59:29 agent2 kernel: [__crc_ide_end_request+985323/1608427] get_cnode+0x14/0x70 [reiserfs] Aug 1 08:59:29 agent2 kernel: [<d48b7fd4>] get_cnode+0x14/0x70 [reiserfs] Aug 1 08:59:29 agent2 kernel: [__crc_ide_end_request+991353/1608427] journal_mark_dirty+0x102/0x230 [reiserfs] Aug 1 08:59:29 agent2 kernel: [<d48b9762>] 
journal_mark_dirty+0x102/0x230 [reiserfs] Aug 1 08:59:29 agent2 kernel: [__crc_ide_end_request+950897/1608427] leaf_delete_items_entirely+0x15a/0x200 [re iserfs] Aug 1 08:59:29 agent2 kernel: [<d48af95a>] leaf_delete_items_entirely +0x15a/0x200 [reiserfs] Aug 1 08:59:29 agent2 kernel: [__crc_ide_end_request+950259/1608427] leaf_paste_in_buffer+0x1fc/0x320 [reiserfs] Aug 1 08:59:29 agent2 kernel: [<d48af6dc>] leaf_paste_in_buffer+0x1fc/0x320 [reiserfs] Aug 1 08:59:29 agent2 kernel: [__crc_ide_end_request+859729/1608427] do_balance+0x78a/0x3160 [reiserfs] Aug 1 08:59:29 agent2 kernel: [<d489953a>] do_balance+0x78a/0x3160 [reiserfs] Aug 1 08:59:29 agent2 kernel: [autoremove_wake_function+0/48] autoremove_wake_function+0x0/0x30 Aug 1 08:59:29 agent2 kernel: [<c011f1c0>] autoremove_wake_function+0x0/0x30 Aug 1 08:59:29 agent2 kernel: [submit_bh+393/544] submit_bh+0x189/0x220 Aug 1 08:59:29 agent2 kernel: [<c0159f49>] submit_bh+0x189/0x220 Aug 1 08:59:29 agent2 kernel: [__bread+81/160] __bread+0x51/0xa0 Aug 1 08:59:29 agent2 kernel: [<c015d221>] __bread+0x51/0xa0 Aug 1 08:59:29 agent2 kernel: [__crc_ide_end_request+921709/1608427] get_neighbors+0xe6/0x140 [reiserfs] Aug 1 08:59:29 agent2 kernel: [<d48a8756>] get_neighbors+0xe6/0x140 [reiserfs] Aug 1 08:59:29 agent2 kernel: [__crc_ide_end_request+921750/1608427] get_neighbors+0x10f/0x140 [reiserfs] Aug 1 08:59:29 agent2 kernel: [<d48a877f>] get_neighbors+0x10f/0x140 [reiserfs] Aug 1 08:59:29 agent2 kernel: [wake_up_buffer+5/32] wake_up_buffer+0x5/0x20 Aug 1 08:59:29 agent2 kernel: [<c015b2d5>] wake_up_buffer+0x5/0x20 Aug 1 08:59:29 agent2 kernel: [__crc_ide_end_request+986558/1608427] reiserfs_prepare_for_journal+0x47/0x70 [re iserfs] Aug 1 08:59:29 agent2 kernel: [<d48b84a7>] reiserfs_prepare_for_journal +0x47/0x70 [reiserfs] Aug 1 08:59:29 agent2 kernel: [__crc_ide_end_request+924363/1608427] fix_nodes+0x884/0x1ba0 [reiserfs] Aug 1 08:59:29 agent2 kernel: [<d48a91b4>] fix_nodes+0x884/0x1ba0 [reiserfs] Aug 1 08:59:29 agent2 kernel: [__crc_ide_end_request+975120/1608427] reiserfs_paste_into_item+0x1d9/0x220 [reis erfs] Aug 1 08:59:29 agent2 kernel: [<d48b57f9>] reiserfs_paste_into_item +0x1d9/0x220 [reiserfs] Aug 1 08:59:29 agent2 kernel: [__crc_ide_end_request+874042/1608427] reiserfs_add_entry+0x293/0x430 [reiserfs] Aug 1 08:59:29 agent2 kernel: [<d489cd23>] reiserfs_add_entry+0x293/0x430 [reiserfs] Aug 1 08:59:29 agent2 kernel: [__crc_ide_end_request+878853/1608427] reiserfs_create+0x11e/0x1e0 [reiserfs] Aug 1 08:59:29 agent2 kernel: [<d489dfee>] reiserfs_create+0x11e/0x1e0 [reiserfs] Aug 1 08:59:29 agent2 kernel: [__crc_ide_end_request+1016040/1608427] reiserfs_permission+0x1/0x10 [reiserfs] Aug 1 08:59:29 agent2 kernel: [<d48bf7d1>] reiserfs_permission+0x1/0x10 [reiserfs] Aug 1 08:59:29 agent2 kernel: [__crc_ide_end_request+1016046/1608427] reiserfs_permission+0x7/0x10 [reiserfs] Aug 1 08:59:29 agent2 kernel: [<d48bf7d7>] reiserfs_permission+0x7/0x10 [reiserfs] Aug 1 08:59:29 agent2 kernel: [vfs_create+153/304] vfs_create+0x99/0x130 Aug 1 08:59:29 agent2 kernel: [<c01656f9>] vfs_create+0x99/0x130 Aug 1 08:59:29 agent2 kernel: [open_namei+830/1072] open_namei+0x33e/0x430 Aug 1 08:59:29 agent2 kernel: [<c016772e>] open_namei+0x33e/0x430 Aug 1 08:59:29 agent2 kernel: [filp_open+78/128] filp_open+0x4e/0x80 Aug 1 08:59:29 agent2 kernel: [<c0155b8e>] filp_open+0x4e/0x80 Aug 1 08:59:29 agent2 kernel: [sys_open+131/208] sys_open+0x83/0xd0 Aug 1 08:59:29 agent2 kernel: [<c0155c43>] sys_open+0x83/0xd0 Aug 1 08:59:29 agent2 kernel: 
[sysenter_past_esp+82/121] sysenter_past_esp +0x52/0x79 Aug 1 08:59:29 agent2 kernel: [<c0107dc9>] sysenter_past_esp+0x52/0x79 Aug 1 08:59:29 agent2 kernel: Aug 1 08:59:29 agent2 kernel: Code: 0f 0b 6a 01 0e 35 8c d4 b8 fe 34 8c d4 83 c4 0c 85 db 74 06 Aug 1 09:09:55 agent2 -- MARK -- Aug 1 09:29:55 agent2 -- MARK -- Maarten -- When I answered where I wanted to go today, they just hung up -- Unknown ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Kernel panic, FS corruption Was: Re: Call for RAID-6 users
From: Jim Paris @ 2004-08-01 18:05 UTC
To: maarten van den Berg
Cc: linux-raid

> I eventually got a kernel panic when copying large amounts of data to a
> [degraded] raid6 array, which this time was the full 600 GB size.
> Don't know if it is helpful to anyone but info below:

The panic is from reiserfs, and it's occurring because the FS is getting
corrupted due to the raid6 problems.

-jim
* Re: Kernel panic, FS corruption Was: Re: Call for RAID-6 users
From: maarten van den Berg @ 2004-08-01 22:10 UTC
To: linux-raid

On Sunday 01 August 2004 20:05, you wrote:
> > I eventually got a kernel panic when copying large amounts of data to
> > a [degraded] raid6 array, which this time was the full 600 GB size.
> > Don't know if it is helpful to anyone but info below:
>
> The panic is from reiserfs, and it's occuring because the FS is
> getting corrupted due to the raid6 problems.

Ok. Thanks. I expected as much. I will now try to make a raid5 array
instead, and make double sure I do not suffer the same fate. I am
cautious because I do not want to be bitten by a bug in this SuSE kernel
or a bug in one of the SATA drivers.

Pity, though, that this was not discovered when testing with the
smallish arrays I ran yesterday. It seems like it takes a lot of beating
to reproduce.

Maarten
* Re: Kernel panic, FS corruption Was: Re: Call for RAID-6 users
From: H. Peter Anvin @ 2004-08-05 23:54 UTC
To: linux-raid

Followup to: <20040801180536.GA3897@jim.sh>
By author: Jim Paris <jim@jtan.com>
In newsgroup: linux.dev.raid

> > I eventually got a kernel panic when copying large amounts of data to
> > a [degraded] raid6 array, which this time was the full 600 GB size.
> > Don't know if it is helpful to anyone but info below:
>
> The panic is from reiserfs, and it's occuring because the FS is
> getting corrupted due to the raid6 problems.

It's still very odd to me that so far the only thing that triggers this
kind of problem is reiserfs. Either reiserfs just has a really odd series
of access patterns, or it is relying on behaviour which isn't actually
guaranteed. I suspect the former, but it's still odd.

	-hpa
* Re: Kernel panic, FS corruption Was: Re: Call for RAID-6 users
From: Jim Paris @ 2004-08-06 0:19 UTC
To: H. Peter Anvin
Cc: linux-raid

> > > I eventually got a kernel panic when copying large amounts of data
> > > to a [degraded] raid6 array, which this time was the full 600 GB
> > > size. Don't know if it is helpful to anyone but info below:
> >
> > The panic is from reiserfs, and it's occuring because the FS is
> > getting corrupted due to the raid6 problems.
>
> It's still very odd to me that so far the only thing that triggers
> this kind of problems is reiserfs. Either reiserfs just has a really
> odd series of access patterns, or it is relying on behaviour which
> isn't actually guaranteed. I suspect the former, but it's still odd.

No, I did see the same corruption with XFS; it just took more work
before it would show up (i.e. I couldn't get it to show up by simply
copying one huge file; I had to untar a full filesystem onto it). So I
would suspect the odd access patterns. I could also run a test with
ext2/3 if you'd like. I didn't manage to trigger the corruption directly
on the md device, but my access pattern was quite simple in that case
(dd'ing big blocks to different areas).

Are you able to reproduce the problem with the scripts I sent earlier?
If not, I can give you access to a machine that can.

-jim
* Re: Kernel panic, FS corruption Was: Re: Call for RAID-6 users
From: H. Peter Anvin @ 2004-08-06 0:36 UTC
To: Jim Paris
Cc: linux-raid

Jim Paris wrote:
> No, I did see the same corruption with XFS; it just took more work
> before it would show up (ie. I couldn't get it to show up by simply
> copying one huge file; I had to untar a full filesystem onto it).
> So I would suspect the odd access patterns. I could also run a test
> with EXT2/3 if you'd like. I didn't manage to trigger the corruption
> directly on the md device, but my access pattern was quite simple in
> that case (dd big blocks to different areas).

If you can reproduce it with ext2/3 it would make debugging simpler,
because I understand the ext code and data structures a lot better.
Thanks for that data element; it pretty much confirms my suspicions.

> Are you able to reproduce the problem with the scripts I sent earlier?
> If not, I can give you access to a machine that can.

I hate to admit it, but I haven't had a chance to try yet.

	-hpa
* Re: Kernel panic, FS corruption Was: Re: Call for RAID-6 users
From: Jim Paris @ 2004-08-06 4:04 UTC
To: H. Peter Anvin
Cc: linux-raid

> If you can reproduce it with ext2/3 it would make debugging simpler,
> because I understand the ext code and data structures a lot better.

This demonstrates it on ext2. I can't seem to reproduce it with just
simple use of 'dd', but it shows up if I untar a ton of data.

This script:
- creates five 100MB "disks" through loopback
- puts them in a six-disk RAID-6 array (resulting size = 400MB, degraded)
- untars about 350MB of data to the array
- runs e2fsck, which shows filesystem errors

Usage:
- put r6ext.sh and big.tar.bz2 in a directory
- run r6ext.sh as root

Sorry for the huge files, but e2fsck didn't show any problems when I
scaled everything down by a factor of 10. You could probably make your
own big.tar.bz2 and see the same problem, as there's nothing special
about this data.

http://stonewall.mit.edu/~jim/r6ext.sh
http://stonewall.mit.edu/~jim/big.tar.bz2 (77MB)

-jim
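The actual r6ext.sh is not reproduced in the archive; the sketch below is
only a rough reconstruction of the steps Jim describes above, with
made-up paths and a hypothetical mount point, in case the link goes away:

  #!/bin/sh
  # Hypothetical reconstruction of the described test, not the original r6ext.sh.
  set -e
  for i in 0 1 2 3 4; do
      dd if=/dev/zero of=disk$i.img bs=1M count=100   # five 100MB backing files
      losetup /dev/loop$i disk$i.img
  done
  # Six-device RAID-6 with one member missing => degraded from the start.
  mdadm --create /dev/md1 --level=6 --raid-devices=6 missing /dev/loop[0-4]
  mke2fs /dev/md1
  mkdir -p /mnt/r6test
  mount /dev/md1 /mnt/r6test
  tar -xjf big.tar.bz2 -C /mnt/r6test    # untar ~350MB of data
  umount /mnt/r6test
  e2fsck -f /dev/md1                     # reports errors on affected kernels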
* Re: Call for RAID-6 users
From: H. Peter Anvin @ 2004-08-05 23:51 UTC
To: linux-raid

Followup to: <200407302338.33823.maarten@ultratux.net>
By author: maarten van den Berg <maarten@ultratux.net>
In newsgroup: linux.dev.raid

> On Friday 30 July 2004 23:11, maarten van den Berg wrote:
> > On Saturday 24 July 2004 01:32, H. Peter Anvin wrote:
> >
> > I'm still early in the testing phase, so nothing to report as yet.
> > But I have a question: I tried to reproduce a reported issue when
> > creating a degraded raid6 array. But when I created a raid6 array
> > with one disk missing, /proc/mdstat reported no resync going on. Am
> > I not correct in assuming that raid6 with 1 missing drive should at
> > least start resyncing the other drive(s)? It would only be really
> > degraded with two missing drives...
> >
> > So instead, I defined a full raid6 array which it is now resyncing...
> > My resync speed is rather slow (6000K/sec). I'll have to compare it
> > to resyncing a raid5 array though before concluding anything from
> > that. Cause this system is somewhat CPU challenged indeed: a lowly
> > celeron 500.
>
> To confirm, after stopping the raid6 array (didn't want to wait this
> long) I created a raid5 array on the same machine and it resyncs at
> 14000K/sec. Is this expected behaviour, the 6M/sec for raid6 vs
> 14M/sec for raid5? I suppose raid6 has to sync two drives, which would
> maybe explain the speed difference(?) In any case, hdparm -tT report
> 50M/sec on each single drive. Is this discrepancy in speed normal?
> (yes yes, I played with the /proc/sys/dev/raid/ speed settings (to no
> avail))

A newly created RAID-5 array uses a special trick to do the initial sync
faster. Unfortunately that trick is not possible for RAID-6.

	-hpa
* Re: Call for RAID-6 users
From: H. Peter Anvin @ 2004-08-05 23:46 UTC
To: linux-raid

Followup to: <200407302311.04942.maarten@ultratux.net>
By author: maarten van den Berg <maarten@ultratux.net>
In newsgroup: linux.dev.raid

> I'm still early in the testing phase, so nothing to report as yet.
> But I have a question: I tried to reproduce a reported issue when
> creating a degraded raid6 array. But when I created a raid6 array with
> one disk missing, /proc/mdstat reported no resync going on. Am I not
> correct in assuming that raid6 with 1 missing drive should at least
> start resyncing the other drive(s)? It would only be really degraded
> with two missing drives...

This is correct; when an array is first created it needs resync, and
with less than two drives missing this should happen.

> So instead, I defined a full raid6 array which it is now resyncing...
> My resync speed is rather slow (6000K/sec). I'll have to compare it to
> resyncing a raid5 array though before concluding anything from that.
> Cause this system is somewhat CPU challenged indeed: a lowly celeron
> 500.

The RAID-6 computations on that system will be quite slow indeed. At
least you have MMX.

	-hpa