* misunderstanding of spare and raid devices?
From: Karsten Römke @ 2011-06-30 10:51 UTC
To: linux-raid
Hello,
I've spent some hours trying to create a RAID 5 device with 4 disks and 1 spare.
I first tried the openSUSE tool, but it didn't give me what I wanted, so I tried mdadm.
Try:
mdadm --create /dev/md0 --level=5 --raid-devices=4 --spare-devices=1 /dev/sda3 /dev/sdb2 /dev/sdc5 /dev/sdd5 /dev/sde5
leads to
Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md0 : active (auto-read-only) raid5 sdd5[5](S) sde5[4](S) sdc5[2] sdb2[1] sda3[0]
13759296 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_]
2 spares - I don't understand that.
kspace9:~ # mdadm --create /dev/md0 --level=5 --raid-devices=4 /dev/sda3 /dev/sdb2 /dev/sdc5 /dev/sdd5
leads to
md0 : active (auto-read-only) raid5 sdd5[4](S) sdc5[2] sdb2[1] sda3[0]
13759296 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_]
1 spare - but why? I expect 4 active disks and 1 spare.
kspace9:~ # mdadm --create /dev/md0 --level=5 --raid-devices=5 /dev/sda3 /dev/sdb2 /dev/sdc5 /dev/sdd5 /dev/sde5
leads to
md0 : active (auto-read-only) raid5 sde5[5](S) sdd5[3] sdc5[2] sdb2[1] sda3[0]
18345728 blocks level 5, 64k chunk, algorithm 2 [5/4] [UUUU_]
That's what I want, but I reached it more or less by trial and error.
Where is my mistake in thinking (my "Denkfehler", as we say in German)?
I use
kspace9:~ # mdadm --version
mdadm - v3.0.3 - 22nd October 2009
Any hints would be nice
karsten
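
For reference, a minimal sketch of the intended setup, using the same five partitions named above (as the replies below explain, the extra "spare" is only there until the initial rebuild finishes):

  # create a 4-member RAID5 plus one hot spare (same as the first attempt above)
  mdadm --create /dev/md0 --level=5 --raid-devices=4 --spare-devices=1 \
      /dev/sda3 /dev/sdb2 /dev/sdc5 /dev/sdd5 /dev/sde5

  # watch the initial build; one "(S)" marker disappears once recovery completes
  cat /proc/mdstat
  mdadm --detail /dev/md0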
* Re: misunderstanding of spare and raid devices?
From: Robin Hill @ 2011-06-30 10:58 UTC
To: Karsten Römke; +Cc: linux-raid

On Thu Jun 30, 2011 at 12:51:37 +0200, Karsten Römke wrote:

> mdadm --create /dev/md0 --level=5 --raid-devices=4 --spare-devices=1
> /dev/sda3 /dev/sdb2 /dev/sdc5 /dev/sdd5 /dev/sde5
>
> leads to
> Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
> md0 : active (auto-read-only) raid5 sdd5[5](S) sde5[4](S) sdc5[2] sdb2[1] sda3[0]
>       13759296 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_]
>
> 2 spares - I don't understand that.

That's perfectly normal. The RAID5 array is created in degraded mode,
then recovered onto the final disk. That way it becomes available for
use immediately, rather than requiring all the parity to be calculated
before the array is ready. As it's been started in auto-read-only mode
(not sure why though) then it hasn't started recovery yet. Running
"mdadm -w /dev/md0" or mounting the array will kick it into read-write
mode and start the recovery process.

HTH,
    Robin
--
Robin Hill <robin@robinhill.me.uk>
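
A minimal sketch of the step Robin describes, assuming the array has just been created and is still sitting in auto-read-only mode:

  # switch the array to read-write; this kicks off the pending recovery
  mdadm -w /dev/md0          # long form: mdadm --readwrite /dev/md0

  # follow the rebuild until the member count matches the intended layout
  watch -n 5 cat /proc/mdstat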
* Re: misunderstanding of spare and raid devices?
From: Karsten Römke @ 2011-06-30 13:09 UTC
To: linux-raid

Hello,
I suppose it works now. After mdadm -w /dev/md0 it starts syncing.

md0 : active raid5 sdd5[4] sde5[5](S) sdc5[2] sdb2[1] sda3[0]
      13759296 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_]
      [=>...................]  recovery =  6.2% (286656/4586432) finish=0.9min speed=71664K/sec

Thanks to both of you.
Karsten

On 30.06.2011 12:58, Robin Hill wrote:
> That's perfectly normal. The RAID5 array is created in degraded mode,
> then recovered onto the final disk. [...] Running "mdadm -w /dev/md0"
> or mounting the array will kick it into read-write mode and start the
> recovery process.
* Re: misunderstanding of spare and raid devices?
From: John Robinson @ 2011-06-30 11:30 UTC
To: Karsten Römke; +Cc: linux-raid

On 30/06/2011 11:51, Karsten Römke wrote:
> mdadm --create /dev/md0 --level=5 --raid-devices=4 --spare-devices=1
> /dev/sda3 /dev/sdb2 /dev/sdc5 /dev/sdd5 /dev/sde5
>
> leads to
> md0 : active (auto-read-only) raid5 sdd5[5](S) sde5[4](S) sdc5[2]
> sdb2[1] sda3[0]
>       13759296 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_]
>
> 2 spares - I don't understand that.

When you create a RAID 5 array, it starts degraded, and a resync is
performed from the first N-1 drives to the last one. If you create a
5-drive RAID-5, this shows up as 4 drives and a spare, but once the
resync is finished it's 5 active drives.

Going back to your first attempt, it'll show as 3 drives and 2 spares,
but once the initial resync is finished, it'll be 4 drives and 1 spare.

mdadm --detail /dev/md0 will show more information to confirm that this
is what is happening.

Hope this helps.

Cheers,

John.
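
A minimal sketch of the check John suggests; the field names below come from mdadm's --detail output and may vary slightly between versions:

  mdadm --detail /dev/md0
  # look at the State line (e.g. "clean, degraded, recovering") and the
  # Active Devices / Working Devices / Spare Devices counters: while the
  # initial build is running, one member is still counted as a spare,
  # and the counts settle to the final layout once recovery completes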
* Re: misunderstanding of spare and raid devices?
From: Phil Turmel @ 2011-06-30 12:32 UTC
To: Karsten Römke; +Cc: John Robinson, linux-raid

Hi Karsten,

On 06/30/2011 07:30 AM, John Robinson wrote:
> On 30/06/2011 11:51, Karsten Römke wrote:
>> mdadm --create /dev/md0 --level=5 --raid-devices=4 --spare-devices=1
>> /dev/sda3 /dev/sdb2 /dev/sdc5 /dev/sdd5 /dev/sde5
>> [...]
>> 2 spares - I don't understand that.

Just to clarify for you, as your comment below suggests some confusion
as to the role of a spare:

When the resync finished on this, if you had let it, you would have had
three drives' capacity, with parity interspersed, on four drives. The
fifth drive would have been idle, but ready to replace any of the other
four without intervention from you.

>> kspace9:~ # mdadm --create /dev/md0 --level=5 --raid-devices=5 /dev/sda3
>> /dev/sdb2 /dev/sdc5 /dev/sdd5 /dev/sde5
>> leads to
>> md0 : active (auto-read-only) raid5 sde5[5](S) sdd5[3] sdc5[2] sdb2[1]
>> sda3[0]
>>       18345728 blocks level 5, 64k chunk, algorithm 2 [5/4] [UUUU_]

This will end up with four drives' capacity, with parity interspersed,
on five drives. No spare.

>> That's what I want, but I reached it more or less by trial and error.
>> Where is my mistake in thinking?

I hope this helps you decide which layout is the one you really want.
If you think you want the first layout, you should also consider raid6
(dual redundancy). There's a performance penalty, but your data would
be significantly safer.

Phil
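
A rough capacity check for the layouts Phil describes, using the per-member size visible in the /proc/mdstat output above (4586432 one-KiB blocks per partition); the numbers are only illustrative:

  member=4586432     # KiB per member partition, from /proc/mdstat above
  echo "raid5, 4 members + 1 spare: $(( 3 * member )) KiB usable"   # 13759296
  echo "raid5, 5 members, no spare: $(( 4 * member )) KiB usable"   # 18345728
  echo "raid6, 5 members:           $(( 3 * member )) KiB usable"   # 13759296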
* Re: misunderstanding of spare and raid devices? - and one question more
From: Karsten Römke @ 2011-06-30 12:52 UTC
To: linux-raid

Hi Phil,
your explanations are clear to me.

On 30.06.2011 14:32, Phil Turmel wrote:
> Just to clarify for you, as your comment below suggests some confusion
> as to the role of a spare:
>
> When the resync finished on this, if you had let it, you would have had
> three drives' capacity, with parity interspersed, on four drives. The
> fifth drive would have been idle, but ready to replace any of the other
> four without intervention from you.

Yes, I understand it exactly that way - but I was wondering why I saw
2 spares.

> This will end up with four drives' capacity, with parity interspersed,
> on five drives. No spare.
>
>>> That's what I want, but I reached it more or less by trial and error.
>>> Where is my mistake in thinking?

No - that's not what I want, but at first it seemed to be the right way.
After my previous posting, before putting the raid back under LVM, I ran
mdadm --detail and saw that the capacity can't match: I have around
16 GB, but I expected 12 GB - so I decided to stop my experiments until
I got a hint, which came very fast.

> I hope this helps you decide which layout is the one you really want.
> If you think you want the first layout, you should also consider raid6
> (dual redundancy). There's a performance penalty, but your data would
> be significantly safer.

I have to say I have looked at raid 6 only at a glance.
Is there any experience of what percentage of performance penalty to
expect?

Thanks
Karsten
* Re: misunderstanding of spare and raid devices? - and one question more
From: Phil Turmel @ 2011-06-30 13:34 UTC
To: Karsten Römke; +Cc: linux-raid

On 06/30/2011 08:52 AM, Karsten Römke wrote:
[...]
>> This will end up with four drives' capacity, with parity interspersed,
>> on five drives. No spare.
>
> No - that's not what I want, but at first it seemed to be the right way.
> After my previous posting, before putting the raid back under LVM, I ran
> mdadm --detail and saw that the capacity can't match: I have around
> 16 GB, but I expected 12 GB - so I decided to stop my experiments until
> I got a hint, which came very fast.

So the first layout is the one you wanted. Each drive is ~4GB? Or is
this just a test setup?

>> I hope this helps you decide which layout is the one you really want.
>> If you think you want the first layout, you should also consider raid6
>> (dual redundancy). There's a performance penalty, but your data would
>> be significantly safer.
>
> I have to say I have looked at raid 6 only at a glance.
> Is there any experience of what percentage of performance penalty to
> expect?

I don't have percentages to share, no. They would vary a lot based on
number of disks and type of CPU. As an estimate though, you can expect
raid6 to be about as fast as raid5 when reading from a non-degraded
array. Certain read workloads could even be faster, as the data is
spread over more spindles. It will be slower to write in all cases.
The extra "Q" parity for raid6 is quite complex to calculate.

In a single disk failure situation, both raid5 and raid6 will use the
"P" parity to reconstruct the missing information, so their
single-degraded read performance will be comparable. With two disk
failures, raid6 performance plummets, as every read requires a complete
inverse "Q" solution. Of course, two disk failures in raid5 stops your
system. So running at a crawl, with data intact, is better than no data.

You should also consider the odds of failure during rebuild, which is a
serious concern for large raid5 arrays. This was discussed recently on
this list:

http://marc.info/?l=linux-raid&m=130754284831666&w=2

If your CPU has free cycles, I suggest you run raid6 instead of
raid5+spare.

Phil
* Re: misunderstanding of spare and raid devices? - and one question more
From: Karsten Römke @ 2011-06-30 14:05 UTC
To: linux-raid

Hi Phil,

> So the first layout is the one you wanted. Each drive is ~4GB? Or is
> this just a test setup?

It's not a test setup. Historical reasons. I started with Linux around
1995 and have used software raid for a long time. So I have had these
4 GB partitions for a long time, and whenever I upgrade storage or a
disk says goodbye, I use a new 4 GB partition... Later I put several
raid arrays under LVM, so I have no trouble with space on a single
partition.

> [...] With two disk failures, raid6 performance plummets, as every
> read requires a complete inverse "Q" solution. Of course, two disk
> failures in raid5 stops your system. So running at a crawl, with data
> intact, is better than no data.

That's the reason to think about a spare disk.

> You should also consider the odds of failure during rebuild, which is
> a serious concern for large raid5 arrays. This was discussed recently
> on this list:
>
> http://marc.info/?l=linux-raid&m=130754284831666&w=2
>
> If your CPU has free cycles, I suggest you run raid6 instead of
> raid5+spare.

I think there are free cycles, so I should try it.

Thanks
Karsten
* Re: misunderstanding of spare and raid devices? - and one question more
From: Karsten Römke @ 2011-06-30 14:21 UTC
To: linux-raid

Hi Phil,

> If your CPU has free cycles, I suggest you run raid6 instead of
> raid5+spare.

I started the raid 6 array and got:

Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md0 : active raid6 sde5[4] sdd5[3] sdc5[2] sdb2[1] sda3[0]
      13759296 blocks level 6, 64k chunk, algorithm 2 [5/5] [UUUUU]
      [=================>...]  resync = 87.4% (4013184/4586432) finish=0.4min speed=20180K/sec

When I started the raid 5 array I got:

md0 : active raid5 sdd5[4] sde5[5](S) sdc5[2] sdb2[1] sda3[0]
      13759296 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_]
      [=>...................]  recovery =  6.2% (286656/4586432) finish=0.9min speed=71664K/sec

So should I expect about three times lower write speed - or is this
calculation too simple?

Karsten
* Re: misunderstanding of spare and raid devices? - and one question more
From: Phil Turmel @ 2011-06-30 14:44 UTC
To: Karsten Römke; +Cc: linux-raid

On 06/30/2011 10:21 AM, Karsten Römke wrote:
> I started the raid 6 array and got:
>
> md0 : active raid6 sde5[4] sdd5[3] sdc5[2] sdb2[1] sda3[0]
>       13759296 blocks level 6, 64k chunk, algorithm 2 [5/5] [UUUUU]
>       [=================>...]  resync = 87.4% (4013184/4586432) finish=0.4min speed=20180K/sec
>
> When I started the raid 5 array I got:
>
> md0 : active raid5 sdd5[4] sde5[5](S) sdc5[2] sdb2[1] sda3[0]
>       13759296 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_]
>       [=>...................]  recovery =  6.2% (286656/4586432) finish=0.9min speed=71664K/sec
>
> So should I expect about three times lower write speed - or is this
> calculation too simple?

That's a bigger difference than I would have expected for resync, which
works in full stripes. If you have a workload with many small random
writes, this slowdown is quite possible. Is your CPU maxed out while
writing to the raid6?

Can you run some speed tests? dd streaming read or write in one window,
with "iostat -xm 1" in another is a decent test of peak performance.
bonnie++, dbench, and iozone are good for more generic workload
simulation.

Phil
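
A minimal sketch of the dd/iostat test Phil suggests; the mount point and file size are placeholders:

  # terminal 1: streaming write (oflag=direct bypasses the page cache),
  # then a streaming read of the same file
  dd if=/dev/zero of=/raid6/ddtest bs=1M count=4096 oflag=direct
  dd if=/raid6/ddtest of=/dev/null bs=1M iflag=direct
  rm /raid6/ddtest

  # terminal 2: per-device utilisation and throughput, refreshed every second
  iostat -xm 1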
* Re: misunderstanding of spare and raid devices? - and one question more
From: Karsten Römke @ 2011-07-02 8:34 UTC
To: Phil Turmel; +Cc: linux-raid

Hi Phil,
I have done some tests and appended the results; maybe they are of
interest for somebody. As a conclusion I would say raid5 and raid6 make
nearly no difference in my situation.

Thanks to all for the hints and explanations
Karsten

--------------------------------------------------------------------------------
First - just copy a directory with a size of 2.9 GB. It was copied once
before, so I think the data may still have been cached.

kspace9:~ # date ; cp -a /home/roemke/HHertzTex/OLDER/* /raid5/ ; date
Fr  1. Jul 16:15:57 CEST 2011
Fr  1. Jul 16:16:26 CEST 2011

kspace9:~ # date ; cp -a /home/roemke/HHertzTex/OLDER/* /raid6/ ; date
Fr  1. Jul 16:17:27 CEST 2011
Fr  1. Jul 16:17:58 CEST 2011

--------------------------------------------------------------------------------
Now a test with bonnie++. I found this example online and the parameters
seem sensible to me (I've never done performance tests on hard disks
before, so I searched for an example):
  -n 0 : no file creation tests
  -u 0 : run as root
  -r   : memory in megabytes (calculated to 7999)
  -s   : file size (calculated to 15998)
  -f   : fast, skip per-char IO tests
  -b   : no write buffering
  -d   : set directory

kspace9:~ # bonnie++ -n 0 -u 0 -r `free -m | grep 'Mem:' | awk '{print $2}'` -s $(echo "scale=0;`free -m | grep 'Mem:' | awk '{print $2}'`*2" | bc -l) -f -b -d /raid5
Using uid:0, gid:0.
Writing intelligently...done
Rewriting...done
Reading intelligently...done
start 'em...done...done...done...
Version 1.03d       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
kspace9      15998M           96365  20 48302  12           149445  18 113.7   0
kspace9,15998M,,,96365,20,48302,12,,,149445,18,113.7,0,,,,,,,,,,,,,

kspace9:~ # bonnie++ -n 0 -u 0 -r `free -m | grep 'Mem:' | awk '{print $2}'` -s $(echo "scale=0;`free -m | grep 'Mem:' | awk '{print $2}'`*2" | bc -l) -f -b -d /raid6
Using uid:0, gid:0.
Writing intelligently...done
Rewriting...done
Reading intelligently...done
start 'em...done...done...done...
Version 1.03d       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
kspace9      15998M          100321  22 48617  13           131651  16 120.2   1
kspace9,15998M,,,100321,22,48617,13,,,131651,16,120.2,1,,,,,,,,,,,,,

================================================================================
Results for the old raid 1: a test I ran unintentionally, because I
forgot to mount the raid array :-)  Useful as comparison results :-)

kspace9:~ # date ; cp -r /home/roemke/HHertzTex/OLDER/ /raid5/ ; date
Fr  1. Jul 16:07:32 CEST 2011   <-- not raid 5, old raid 1, forgot to mount
Fr  1. Jul 16:08:39 CEST 2011
(similar to the copy tests above)

Similar test with bonnie++:
kspace9:~ # bonnie++ -n 0 -u 0 -r `free -m | grep 'Mem:' | awk '{print $2}'` -s $(echo "scale=0;`free -m | grep 'Mem:' | awk '{print $2}'`*2" | bc -l) -f -b -d /raid5   <-- not raid 5, still the old raid 1
Using uid:0, gid:0.
Writing intelligently...done
Rewriting...done
Reading intelligently...done
start 'em...done...done...done...
Version 1.03d       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
kspace9      15998M           62977   9 34410   9           101979  13  66.7   0
kspace9,15998M,,,62977,9,34410,9,,,101979,13,66.7,0,,,,,,,,,,,,,
* Re: misunderstanding of spare and raid devices? - and one question more
From: David Brown @ 2011-07-02 9:42 UTC
To: linux-raid

On 02/07/11 10:34, Karsten Römke wrote:
> Hi Phil,
> I have done some tests and appended the results; maybe they are of
> interest for somebody. As a conclusion I would say raid5 and raid6 make
> nearly no difference in my situation.
> Thanks to all for the hints and explanations
> Karsten

If raid6 doesn't have any noticeable performance costs compared to
raid5 for your usage, then you should definitely use raid6 rather than
raid5 + spare. Think of it as raid5 + spare with the rebuild done in
advance!

mvh.,

David
* Re: misunderstanding of spare and raid devices? - and one question more
From: NeilBrown @ 2011-06-30 21:28 UTC
To: Karsten Römke; +Cc: linux-raid

On Thu, 30 Jun 2011 16:21:57 +0200 Karsten Römke <k.roemke@gmx.de> wrote:

> I started the raid 6 array and got:
>
> md0 : active raid6 sde5[4] sdd5[3] sdc5[2] sdb2[1] sda3[0]
>       13759296 blocks level 6, 64k chunk, algorithm 2 [5/5] [UUUUU]
>       [=================>...]  resync = 87.4% (4013184/4586432) finish=0.4min speed=20180K/sec
                                 ^^^^^^
                                 Note: resync
>
> When I started the raid 5 array I got:
>
> md0 : active raid5 sdd5[4] sde5[5](S) sdc5[2] sdb2[1] sda3[0]
>       13759296 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_]
>       [=>...................]  recovery =  6.2% (286656/4586432) finish=0.9min speed=71664K/sec
                                 ^^^^^^^^
                                 Note: recovery.
>
> So should I expect about three times lower write speed - or is this
> calculation too simple?

You are comparing two different things, neither of which is write speed.
If you want to measure write speed, you should try writing and measure
that.

When you create a RAID5, mdadm deliberately triggers recovery rather
than resync, as it is likely to be faster. This is why you see a
missing device and an extra spare. I don't remember why it doesn't do
that with RAID6.

NeilBrown
* Re: misunderstanding of spare and raid devices? - and one question more
From: David Brown @ 2011-07-01 7:23 UTC
To: linux-raid

On 30/06/2011 23:28, NeilBrown wrote:
> You are comparing two different things, neither of which is write speed.
> If you want to measure write speed, you should try writing and measure
> that.
>
> When you create a RAID5, mdadm deliberately triggers recovery rather
> than resync, as it is likely to be faster. This is why you see a
> missing device and an extra spare. I don't remember why it doesn't do
> that with RAID6.

What's the difference between a "resync" and a "recovery"? Is it that a
"resync" will read the whole stripe, check if it is valid, and if it is
not it then generates the parity, while a "recovery" will always
generate the parity?

If that's the case, then one reason it might not do that with raid6 is
if the code is common with the raid5 to raid6 grow case. Then a
"resync" would leave the raid5 parity untouched, so that the set keeps
some redundancy, whereas a "recovery" would temporarily leave the
stripe unprotected.
* Re: misunderstanding of spare and raid devices? - and one question more
From: Robin Hill @ 2011-07-01 8:50 UTC
To: David Brown; +Cc: linux-raid

On Fri Jul 01, 2011 at 09:23:43 +0200, David Brown wrote:

> What's the difference between a "resync" and a "recovery"? Is it that a
> "resync" will read the whole stripe, check if it is valid, and if it is
> not it then generates the parity, while a "recovery" will always
> generate the parity?

From the names, recovery would mean that it's reading from N-1 disks,
and recreating data/parity to rebuild the final disk (as when it
recovers from a drive failure), whereas resync will be reading from all
N disks and checking/recreating the parity (as when you're running a
repair on the array).

The main reason I can see for doing a resync on RAID6 rather than a
recovery is if the data reconstruction from the Q parity is far slower
than the construction of the Q parity itself (I've no idea how the
mathematics works out for this).

Cheers,
    Robin
* Re: misunderstanding of spare and raid devices? - and one question more
From: David Brown @ 2011-07-01 10:18 UTC
To: linux-raid

On 01/07/2011 10:50, Robin Hill wrote:
> From the names, recovery would mean that it's reading from N-1 disks,
> and recreating data/parity to rebuild the final disk (as when it
> recovers from a drive failure), whereas resync will be reading from all
> N disks and checking/recreating the parity (as when you're running a
> repair on the array).
>
> The main reason I can see for doing a resync on RAID6 rather than a
> recovery is if the data reconstruction from the Q parity is far slower
> than the construction of the Q parity itself (I've no idea how the
> mathematics works out for this).

Well, data reconstruction from Q parity /is/ more demanding than
constructing the Q parity in the first place (the mathematics is the
part that I know about). That's why a two-disk degraded raid6 array is
significantly slower (or, more accurately, significantly more
cpu-intensive) than a one-disk degraded raid6 array.

But that doesn't make a difference here - you are rebuilding one or two
disks, so you have to use the data you've got whether you are doing a
resync or a recovery.
* Re: misunderstanding of spare and raid devices? - and one question more
From: Robin Hill @ 2011-07-01 11:29 UTC
To: David Brown; +Cc: linux-raid

On Fri Jul 01, 2011 at 12:18:22PM +0200, David Brown wrote:

> But that doesn't make a difference here - you are rebuilding one or two
> disks, so you have to use the data you've got whether you are doing a
> resync or a recovery.

Yes, but in a resync all the data you have available is the data
blocks, and you're reconstructing all the P and Q parity blocks. With a
recovery, the data you have available is some of the data blocks and
some of the P & Q parity blocks, so for some stripes you'll be
reconstructing the parity and for others you'll be regenerating the
data using the parity (and for some you'll be doing one of each).

Cheers,
    Robin
* Re: misunderstanding of spare and raid devices? - and one question more
From: David Brown @ 2011-07-01 12:45 UTC
To: linux-raid

On 01/07/2011 13:29, Robin Hill wrote:
> Yes, but in a resync all the data you have available is the data
> blocks, and you're reconstructing all the P and Q parity blocks. With a
> recovery, the data you have available is some of the data blocks and
> some of the P & Q parity blocks, so for some stripes you'll be
> reconstructing the parity and for others you'll be regenerating the
> data using the parity (and for some you'll be doing one of each).

If it were that simple, then the resync (as used by RAID6 creates)
would not be so much slower than the recovery used in a RAID5 build...

With a resync, you first check if the parity blocks are correct (by
generating them from the data blocks and comparing them to the read
parity blocks). If they are not correct, you write out the parity
blocks. With a recovery, you /know/ that one block is incorrect and
re-generate that (from the data blocks if it is a parity block, or
using the parities if it is a data block).

Consider the two cases raid5 and raid6 separately.

When you build your raid5 array, there is nothing worth keeping in the
data - the aim is simply to make the stripes consistent. There are two
possible routes - consider the data blocks to be "correct" and do a
resync to make sure the parity blocks match, or consider the first n-1
disks to be "correct" and do a recovery to make sure the n'th disk
matches. For recovery, that means reading n-1 blocks in a stripe, doing
a big xor, and writing out the remaining block (whether it is data or
parity). For resync, it means reading all n blocks, and checking the
xor. If there is no match (which will be the norm when building an
array), then the correct parity is calculated and written out. Thus a
resync takes longer than a recovery, and a recovery is used.

When you build your raid6 array, you have the same two choices. For a
resync, you have to read all n blocks, calculate P and Q, compare them,
then (as there will be no match) write out P and Q. In comparison to
the raid5 recovery, you've done a couple of unnecessary block reads and
compares, and the time-consuming Q calculation and write. But if you
chose recovery, then you'd be assuming the first n-2 blocks are correct
and re-calculating the last two blocks. This avoids the extra reads and
compares, but if the two parity blocks are within the first n-2 blocks
read, then the recovery calculations will be much slower. Hence a
resync is faster for raid6.

I suppose the raid6 build could be optimised a little by skipping the
extra reads when you know in advance that they will not match. But
either that is already being done, or it is considered a small issue
that is not worth changing (since it only has an effect during the
initial build).
* Re: misunderstanding of spare and raid devices? - and one question more
From: NeilBrown @ 2011-07-01 13:02 UTC
To: David Brown; +Cc: linux-raid

On Fri, 01 Jul 2011 14:45:00 +0200 David Brown <david@westcontrol.com> wrote:

> When you build your raid6 array, you have the same two choices. For a
> resync, you have to read all n blocks, calculate P and Q, compare them,
> then (as there will be no match) write out P and Q. In comparison to
> the raid5 recovery, you've done a couple of unnecessary block reads and
> compares, and the time-consuming Q calculation and write. But if you
> chose recovery, then you'd be assuming the first n-2 blocks are correct
> and re-calculating the last two blocks. This avoids the extra reads and
> compares, but if the two parity blocks are within the first n-2 blocks
> read, then the recovery calculations will be much slower. Hence a
> resync is faster for raid6.

Almost everything you say is correct.

However I'm not convinced that a raid6 resync is faster than a raid6
recovery (on devices where P and Q are not mostly correct). I suspect
it is just an historical oversight that RAID6 doesn't force a recovery
for the initial create.

In case anyone would like to test: it is easy to force a recovery by
specifying missing devices:

   mdadm -C /dev/md0 -l6 -n6 /dev/sd[abcd] missing missing -x2 /dev/sd[ef]

and easy to force a resync by using --force:

   mdadm -C /dev/md0 -l5 -n5 /dev/sd[abcde] --force

It is only really a valid test if you know that the P and Q that resync
will read are not going to be correct most of the time.

NeilBrown
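
One way to time the comparison Neil suggests - a sketch only, assuming throwaway devices; the second create below is a plain RAID6 create (which resyncs) rather than Neil's RAID5 example, so the two runs are directly comparable:

  # variant A: recovery-style build (two members created as "missing", rebuilt from spares)
  mdadm -C /dev/md0 -l6 -n6 /dev/sd[abcd] missing missing -x2 /dev/sd[ef]
  mdadm -w /dev/md0            # leave auto-read-only so the rebuild actually starts
  time mdadm --wait /dev/md0   # returns once recovery has finished

  # stop the array and wipe the member superblocks before the second run
  mdadm --stop /dev/md0
  mdadm --zero-superblock /dev/sd[abcdef]

  # variant B: normal RAID6 create, which does a resync instead
  mdadm -C /dev/md0 -l6 -n6 /dev/sd[abcdef]
  mdadm -w /dev/md0
  time mdadm --wait /dev/md0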