* RAID 5 - One drive dropped while replacing another
  From: Bryan Wintermute @ 2011-02-01 23:27 UTC
  To: linux-raid

Hi,

I have a RAID5 setup with 15 drives. One drive died, so I purchased a
replacement and added it to the array. During this process, however, another
drive dropped. Upon further inspection, the second failed drive has some bad
sectors that appear to be keeping mdadm from completing the recovery.

If I remove the replacement drive, thus keeping mdadm from recovering, the
array functions and most of the data is intact (save for the occasional
missing files and folders). Mdadm is able to recover for about 30 seconds to
1 minute before it drops the drive and quits the recovery.

Is there anything I can do to get around these bad sectors or force mdadm to
ignore them to at least complete the recovery? I don't care about losing some
of the data, but it'd be nice to not lose all of it.

I'm running Ubuntu 10.04 LTS x64 with mdadm - v2.6.7.1 - 15th October 2008.

Please let me know if you need any more information.

Thank you for your time and any help,
Bryan

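A sketch of the diagnostics typically requested in a situation like this;
the array and device names below are hypothetical and would need adjusting
to the actual setup:

  mdadm --detail /dev/md0          # array state: which members are active, failed or spare
  mdadm --examine /dev/sd[a-p]1    # per-member superblocks and event counters
  smartctl -a /dev/sdX             # SMART health of the suspect drives (smartmontools)
  dmesg | grep -i -e ata -e md     # kernel I/O errors around the time of the drop
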
* Re: RAID 5 - One drive dropped while replacing another
  From: Roman Mamedov @ 2011-02-01 23:36 UTC
  To: Bryan Wintermute; Cc: linux-raid

On Tue, 1 Feb 2011 15:27:50 -0800 Bryan Wintermute
<bryanwintermute@gmail.com> wrote:

> I have a RAID5 setup with 15 drives.

Looks like you got the problem you were so desperately asking for, with this
crazy setup. :(

> Is there anything I can do to get around these bad sectors or force mdadm
> to ignore them to at least complete the recovery?

I suppose the second failed drive is still mostly alive, just has some
unreadable areas? If so, I suggest that you get another new clean drive, and
while your mdadm array is stopped, copy whatever you can with e.g. dd_rescue
from the semi-dead drive to this new one. Then remove the bad drive from the
system, and start the array with the new drive instead of the bad one.

--
With respect,
Roman

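A minimal sketch of the recovery Roman describes, assuming GNU ddrescue is
available and using hypothetical names for the failing drive (/dev/sdX), its
replacement (/dev/sdY), the array (/dev/md0) and the remaining members:

  # Stop the array so nothing touches the failing member during the copy
  mdadm --stop /dev/md0

  # Copy everything readable onto the new drive; the mapfile makes the copy
  # resumable and records which sectors are still missing
  ddrescue -f -n /dev/sdX /dev/sdY rescue.map    # quick first pass, skip slow areas
  ddrescue -f -r3 /dev/sdX /dev/sdY rescue.map   # retry the bad areas a few times

  # Disconnect the failing drive, then assemble with the copy in its place
  # (--force may be needed if the event counts no longer match)
  mdadm --assemble --force /dev/md0 /dev/sd[a-n]1 /dev/sdY

Sectors that could not be read simply remain unfilled on the copy, so some
files will come back corrupt, but the array as a whole can be reassembled
and the rebuild onto the replacement drive can then complete.
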
* RE: RAID 5 - One drive dropped while replacing another
  From: Leslie Rhorer @ 2011-02-02 6:20 UTC
  To: 'Roman Mamedov', 'Bryan Wintermute'; Cc: linux-raid

> -----Original Message-----
> From: linux-raid-owner@vger.kernel.org On Behalf Of Roman Mamedov
> Sent: Tuesday, February 01, 2011 5:36 PM
> Subject: Re: RAID 5 - One drive dropped while replacing another
>
> On Tue, 1 Feb 2011 15:27:50 -0800
> Bryan Wintermute <bryanwintermute@gmail.com> wrote:
>
> > I have a RAID5 setup with 15 drives.
>
> Looks like you got the problem you were so desperately asking for, with
> this crazy setup. :(
>
> > Is there anything I can do to get around these bad sectors or force
> > mdadm to ignore them to at least complete the recovery?
>
> I suppose the second failed drive is still mostly alive, just has some
> unreadable areas? If so, I suggest that you get another new clean drive,
> and while your mdadm array is stopped, copy whatever you can with e.g.
> dd_rescue from the semi-dead drive to this new one. Then remove the bad
> drive from the system, and start the array with the new drive instead of
> the bad one.

Before asking this, I would first ask, "How dead is the first dead drive?"
Using dd_rescue on the "dead" drives might recover more data. Or not. It
might be time to drag out the backups.

* Re: RAID 5 - One drive dropped while replacing another
  From: hansbkk @ 2011-02-02 14:21 UTC
  To: Roman Mamedov; Cc: Bryan Wintermute, linux-raid

On Wed, Feb 2, 2011 at 6:36 AM, Roman Mamedov <rm@romanrm.ru> wrote:
>
>> I have a RAID5 setup with 15 drives.
>
> Looks like you got the problem you were so desperately asking for, with this
> crazy setup. :(

Please give some more details as to what's so crazy about this.

I would think RAID6 would have made more sense, possibly with an additional
spare if these are large drives (over a few hundred GB?).

Or is there an upper limit as to the number of drives that's advisable for
any array?

If so, then what do people reckon a reasonable limit should be for a RAID6
made up of 2TB drives?

* Re: RAID 5 - One drive dropped while replacing another
  From: Roman Mamedov @ 2011-02-02 14:28 UTC
  To: hansbkk; Cc: Bryan Wintermute, linux-raid

On Wed, 2 Feb 2011 21:21:20 +0700 hansbkk@gmail.com wrote:

> On Wed, Feb 2, 2011 at 6:36 AM, Roman Mamedov <rm@romanrm.ru> wrote:
> >
> >> I have a RAID5 setup with 15 drives.
> >
> > Looks like you got the problem you were so desperately asking for, with
> > this crazy setup. :(
>
> Please give some more details as to what's so crazy about this.
>
> I would think RAID6 would have made more sense, possibly with an
> additional spare if these are large drives (over a few hundred GB?)

Exactly, RAID6 would make an order of magnitude more sense.

A 15-drive RAID5 array is just one step (one drive failure) from becoming a
14-drive RAID0 array (reliability-wise). Would you also ask "what's wrong
with having a 14-drive RAID0"?

See the link below for some array failure probability calculations:
http://louwrentius.com/blog/2010/08/raid-5-vs-raid-6-or-do-you-care-about-your-data/

--
With respect,
Roman

* Re: RAID 5 - One drive dropped while replacing another
  From: hansbkk @ 2011-02-02 15:28 UTC
  To: Roman Mamedov, Robin Hill; Cc: Bryan Wintermute, linux-raid

On Wed, Feb 2, 2011 at 9:28 PM, Roman Mamedov <rm@romanrm.ru> wrote:

> Exactly, RAID6 would make an order of magnitude more sense.
> A 15-drive RAID5 array is just one step (one drive failure) from becoming a
> 14-drive RAID0 array (reliability-wise).
> Would you also ask "what's wrong with having a 14-drive RAID0"?

Thanks Roman, I just wanted to check that's what you meant.

On Wed, Feb 2, 2011 at 9:47 PM, Robin Hill <robin@robinhill.me.uk> wrote:

>> Or is there an upper limit as to the number of drives that's advisable
>> for any array?
>
> I'm sure there's advice out there on this one - probably a recommended
> minimum percentage of capacity used for redundancy. I've not looked
> though - I tend to go with gut feeling & err on the side of caution.
>
>> If so, then what do people reckon a reasonable limit should be for a
>> RAID6 made up of 2TB drives?
>
> As the drive capacities go up, you need to be thinking more carefully
> about redundancy - with a 2TB drive, your rebuild time is probably over
> a day. Rebuild also tends to put more load on drives than normal, so is
> more likely to cause a secondary (or even tertiary) failure. I'd be
> looking at RAID6 regardless, and throwing in a hot spare if there's more
> than 5 data drives. If there's more than 10 then I'd be going with
> multiple arrays.

Thanks for the detailed reply, Robin. I'm also sure there's advice "out
there", but I figure there's no more authoritative place to explore this
topic than here; I hope people don't mind the tangent.

So, keeping the drive size fixed at 2TB for the sake of argument, do people
agree with the following as a conservative rule of thumb? It's obviously
adjustable depending on the financial resources available and the importance
of keeping the data online, given that restoring this much data from backups
would take a very long time. This example is for a money-poor environment
that could live with a day or two of downtime if necessary.

  less than 6 drives => RAID5
  6-8 drives         => RAID6
  9-12 drives        => RAID6 + spare
  over 12 drives     => start spanning multiple arrays (I use LVM in any case)

On Wed, Feb 2, 2011 at 9:29 PM, Mathias Burén <mathias.buren@gmail.com> wrote:

> With 15 drives, where only 1 can fail (RAID5) without data loss, it's
> quite a high risk that 2 (or more) drives will fail within a short
> period of time. If you have fewer drives, this chance decreases. For
> large numbers of drives I recommend RAID10 personally (or RAID1+0,
> whichever you prefer).
>
> RAID6 + 1 hot spare is also nice, and cheaper (for ~10 drives).

Mathias, RAID1+0 (not talking about "md RAID10" here) would only protect my
data if the "right pair" of drives failed at the same time, depending on
luck, whereas RAID6 would allow **any** two (and RAID6+spare any *three*)
drives to fail without my losing data. So I thought I'd always prefer RAID6.

Or were you perhaps thinking of something fancy using "spare pools", or
whatever they're called, to allow multiple spares to fill in for failures at
the underlying RAID1 layer? Now that I think about it, that seems like a good
idea if it could be made to work, as the simple mirrors do repair much
faster. But of course the greatly reduced usable-space ratio makes this
pretty expensive...

> with a 2TB drive, your rebuild time is probably over a day.

On my lower-end systems, a RAID6 over 2TB drives takes about 10-11 hours per
failed disk to rebuild, and that's using embedded bitmaps and with nothing
else going on.

* Re: RAID 5 - One drive dropped while replacing another
  From: hansbkk @ 2011-02-02 16:29 UTC
  To: Scott E. Armitage; Cc: Roman Mamedov, Robin Hill, Bryan Wintermute, linux-raid

On Wed, Feb 2, 2011 at 11:03 PM, Scott E. Armitage
<launchpad@scott.armitage.name> wrote:

> RAID1+0 can lose up to half the drives in the array, as long as no single
> mirror loses all its drives. Instead of only being able to survive "the
> right pair", it's quite the opposite: RAID1+0 will only fail if "the wrong
> pair" of drives fail.

AFAICT it's a glass half-full/half-empty thing. Maybe it's just my
personality, but I don't like leaving such things to chance. Maybe if I had
more than two drives per array, but that would be **very** inefficient (i.e.
an expensive usable-space ratio).

However, following up on the "spare-group" idea, I'd like confirmation that
this scenario would work. From the man page:

  mdadm may move a spare drive from one array to another if they are in
  the same spare-group and if the destination array has a failed drive
  but no spares.

Given that all component drives are the same size, mdadm.conf contains:

  ARRAY /dev/md0 level=raid1 num-devices=2 spare-group=bigraid10
  ARRAY /dev/md1 level=raid1 num-devices=2 spare-group=bigraid10
  etc.

I then add any number of spares to any of the RAID1 arrays (which under
RAID1+0 would in turn be components of the RAID0 span one layer up -
personally I'd use LVM for this), and the follow/monitor mode feature would
allocate these spares to whatever RAID1 array needed them.

Does this make sense?

If so, I would recognize this as being more fault-tolerant than RAID6, with
the big advantage being fast rebuild times - and performance advantages too,
especially on writes - but obviously at a relatively higher cost.

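A sketch of how this is usually wired up, with hypothetical device names.
Note that the spare migration described in that man page excerpt is carried
out by mdadm's monitor mode, so a monitor daemon has to be running for
spares to move between arrays:

  # /etc/mdadm/mdadm.conf
  ARRAY /dev/md0 level=raid1 num-devices=2 spare-group=bigraid10
  ARRAY /dev/md1 level=raid1 num-devices=2 spare-group=bigraid10

  # add a spare to any one array in the group
  mdadm --add /dev/md0 /dev/sdq1

  # the monitor daemon moves spares between arrays sharing a spare-group
  # when one of them loses a member and has no spare of its own
  mdadm --monitor --scan --daemonise --mail=root
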
* Re: RAID 5 - One drive dropped while replacing another
  From: David Brown @ 2011-02-02 21:15 UTC
  To: linux-raid

On 02/02/11 17:29, hansbkk@gmail.com wrote:

> On Wed, Feb 2, 2011 at 11:03 PM, Scott E. Armitage
> <launchpad@scott.armitage.name> wrote:
>> RAID1+0 can lose up to half the drives in the array, as long as no single
>> mirror loses all its drives. Instead of only being able to survive "the
>> right pair", it's quite the opposite: RAID1+0 will only fail if "the wrong
>> pair" of drives fail.
>
> AFAICT it's a glass half-full/half-empty thing. Maybe it's just my
> personality, but I don't like leaving such things to chance. Maybe if
> I had more than two drives per array, but that would be **very**
> inefficient (i.e. an expensive usable-space ratio).
>
> However, following up on the "spare-group" idea, I'd like confirmation
> that this scenario would work. From the man page:
>
>   mdadm may move a spare drive from one array to another if they are in
>   the same spare-group and if the destination array has a failed drive
>   but no spares.
>
> Given that all component drives are the same size, mdadm.conf contains:
>
>   ARRAY /dev/md0 level=raid1 num-devices=2 spare-group=bigraid10
>   ARRAY /dev/md1 level=raid1 num-devices=2 spare-group=bigraid10
>   etc.
>
> I then add any number of spares to any of the RAID1 arrays (which under
> RAID1+0 would in turn be components of the RAID0 span one layer up -
> personally I'd use LVM for this), and the follow/monitor mode feature
> would allocate these spares to whatever RAID1 array needed them.
>
> Does this make sense?
>
> If so, I would recognize this as being more fault-tolerant than RAID6,
> with the big advantage being fast rebuild times - and performance
> advantages too, especially on writes - but obviously at a relatively
> higher cost.

You have to be precise about what you mean by fault-tolerant.

With RAID6, /any/ two drives can fail and your system is still running. Hot
spares don't change that - they just minimise the time before one of the
failed drives is replaced.

If you have a set of RAID1 pairs that are striped together (by LVM or RAID0),
then you can only tolerate a single failed drive. You /might/ tolerate more
failures. For example, if you have 4 pairs, then a random second failure has
a 7/8 chance of being on a different pair, and therefore safe. If you crunch
the numbers, it's possible that the average or expected number of failures
you can tolerate is more than 2. But for the guaranteed worst-case scenario,
your set can only tolerate a single drive failure. Again, hot spares don't
change that - they only reduce your degraded (and therefore risky) time.

* RE: RAID 5 - One drive dropped while replacing another
  From: Leslie Rhorer @ 2011-02-02 17:25 UTC
  To: hansbkk; Cc: linux-raid

> -----Original Message-----
> From: linux-raid-owner@vger.kernel.org On Behalf Of hansbkk@gmail.com
> Sent: Wednesday, February 02, 2011 9:29 AM
> Subject: Re: RAID 5 - One drive dropped while replacing another
>
> On Wed, Feb 2, 2011 at 9:28 PM, Roman Mamedov <rm@romanrm.ru> wrote:
>
> > Exactly, RAID6 would make an order of magnitude more sense.
> > A 15-drive RAID5 array is just one step (one drive failure) from
> > becoming a 14-drive RAID0 array (reliability-wise).
> > Would you also ask "what's wrong with having a 14-drive RAID0"?
>
> Thanks Roman, I just wanted to check that's what you meant.
>
> On Wed, Feb 2, 2011 at 9:47 PM, Robin Hill <robin@robinhill.me.uk> wrote:
>
> >> Or is there an upper limit as to the number of drives that's advisable
> >> for any array?
> >
> > I'm sure there's advice out there on this one - probably a recommended
> > minimum percentage of capacity used for redundancy. I've not looked
> > though - I tend to go with gut feeling & err on the side of caution.
> >
> >> If so, then what do people reckon a reasonable limit should be for a
> >> RAID6 made up of 2TB drives?

That depends on many factors. The bottom-line question is, "How safe does the
live system need to be?" If taking the system down to recover from backups is
not an unreasonable liability, then there is no hard limit to the number of
drives. For that matter, if being down for a long period is not considered an
unacceptable hardship, or if one is running high-availability mirrored
systems, then a 20-disk RAID0 might be reasonable.

> > As the drive capacities go up, you need to be thinking more carefully
> > about redundancy - with a 2TB drive, your rebuild time is probably over
> > a day. Rebuild also tends to put more load on drives than normal, so is
> > more likely to cause a secondary (or even tertiary) failure. I'd be
> > looking at RAID6 regardless, and throwing in a hot spare if there's more
> > than 5 data drives. If there's more than 10 then I'd be going with
> > multiple arrays.
>
> Thanks for the detailed reply, Robin. I'm also sure there's advice "out
> there", but I figure there's no more authoritative place to explore
> this topic than here; I hope people don't mind the tangent.
>
> So, keeping the drive size fixed at 2TB for the sake of argument, do
> people agree with the following as a conservative rule of thumb?
> Obviously adjustable depending on financial resources available and
> the importance of keeping the data online, given that restoring this
> much data from backups would take a very long time. This example is for
> a money-poor environment that could live with a day or two of downtime
> if necessary.
>
>   less than 6 drives => RAID5
>   6-8 drives         => RAID6
>   9-12 drives        => RAID6 + spare
>   over 12 drives     => start spanning multiple arrays (I use LVM in any case)

That's pretty conservative, yes, for middle-of-the-road availability. For a
system whose necessary availability is not too high, it is considerable
overkill. For a system whose availability is critical, it's not conservative
enough.

> On Wed, Feb 2, 2011 at 9:29 PM, Mathias Burén <mathias.buren@gmail.com>
> wrote:
>
> > With 15 drives, where only 1 can fail (RAID5) without data loss, it's
> > quite a high risk that 2 (or more) drives will fail within a short
> > period of time. If you have fewer drives, this chance decreases. For
> > large numbers of drives I recommend RAID10 personally (or RAID1+0,
> > whichever you prefer).
> >
> > RAID6 + 1 hot spare is also nice, and cheaper (for ~10 drives).
>
> Mathias, RAID1+0 (not talking about "md RAID10" here) would only
> protect my data if the "right pair" of drives failed at the same time,

That assumes the RAID1 array elements only have 2 members. With 3 members,
the reliability goes way up. Of course, so does the cost.

> depending on luck, whereas RAID6 would allow **any** two (and
> RAID6+spare any *three*) drives to fail without my losing data. So I

That's specious. RAID6 + spare only allows two overlapping failures. If the
failures don't overlap, then even RAID5 without a spare can tolerate an
unlimited number of failures. All the hot spare does is allow for immediate
initiation of the rebuild, reducing the probability of a drive failure during
the period of degradation. It doesn't increase the resiliency of the array.

> thought I'd always prefer RAID6. Or were you perhaps thinking of
> something fancy using "spare pools", or whatever they're called, to
> allow multiple spares to fill in for failures at the underlying RAID1
> layer? Now that I think about it, that seems like a good idea if it
> could be made to work, as the simple mirrors do repair much faster. But
> of course the greatly reduced usable-space ratio makes this pretty
> expensive...

There are lots of strategies for increasing resiliency. The compromise is
always a three-way competition between cost, speed, and availability.

> > with a 2TB drive, your rebuild time is probably over a day.
>
> On my lower-end systems, a RAID6 over 2TB drives takes about 10-11
> hours per failed disk to rebuild, and that's using embedded bitmaps
> and with nothing else going on.

I've never had one rebuild from a bare drive that fast.

* Re: RAID 5 - One drive dropped while replacing another
  From: hansbkk @ 2011-02-02 17:51 UTC
  To: lrhorer; Cc: linux-raid

Thanks for your considered response, Leslie.

On Thu, Feb 3, 2011 at 12:25 AM, Leslie Rhorer <lrhorer@satx.rr.com> wrote:

>> So, keeping the drive size fixed at 2TB for the sake of argument, do
>> people agree with the following as a conservative rule of thumb?
>> Obviously adjustable depending on financial resources available and
>> the importance of keeping the data online, given that restoring this
>> much data from backups would take a very long time. This example is for
>> a money-poor environment that could live with a day or two of downtime
>> if necessary.
>>
>>   less than 6 drives => RAID5
>>   6-8 drives         => RAID6
>>   9-12 drives        => RAID6 + spare
>>   over 12 drives     => start spanning multiple arrays (I use LVM in any case)
>
> That's pretty conservative, yes, for middle-of-the-road availability.
> For a system whose necessary availability is not too high, it is
> considerable overkill. For a system whose availability is critical, it's
> not conservative enough.

So maybe for my "money-poor environment that could live with a day or two of
downtime" I'll add a drive or two as my own personal rule of thumb. Thanks
for the feedback.

> That assumes the RAID1 array elements only have 2 members. With 3
> members, the reliability goes way up. Of course, so does the cost.

Prohibitively so for my situation, at least for the large storage volumes. My
OS boot partitions are replicated on every drive, so some of them have 20+
members, but at 10GB per 2TB, not expensive 8-)

>> depending on luck, whereas RAID6 would allow **any** two (and
>> RAID6+spare any *three*) drives to fail without my losing data. So I
>
> That's specious. RAID6 + spare only allows two overlapping failures.

Well yes, but my environment doesn't have pager notification to a "hot-spare
sysadmin" standing by ready to jump in. In fact, the replacement hardware
would need to be requested, purchase-ordered, etc., so in that case the spare
does make a difference to resilience, doesn't it? If I had the replacement
drive handy I'd just make it a hot spare rather than keeping it on the shelf
anyway.

>> On my lower-end systems, a RAID6 over 2TB drives takes about 10-11
>> hours per failed disk to rebuild, and that's using embedded bitmaps
>> and with nothing else going on.
>
> I've never had one rebuild from a bare drive that fast.

This wasn't a bare drive, but a re-add after I'd been doing some grub2 and
maintenance work on another array from SystemRescueCD; I'm not sure why the
member failed. It's not a particularly fast platform: consumer SATA2 Hitachi
drives attached to an onboard Intel controller, ICH7 I believe. cat
/proc/mdstat was showing around 60K, while the RAID1s rebuild at around
double that.

Would the fact that it was at less than 30% capacity make a difference?

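As an aside on the resync speeds being quoted: md's rebuild rate is throttled
by two kernel tunables, so it is worth checking whether the limits, rather
than the hardware, are the bottleneck. A sketch, assuming the usual proc
paths and root access:

  cat /proc/mdstat                                  # current resync speed and ETA
  cat /proc/sys/dev/raid/speed_limit_min            # floor, KiB/s per device (default 1000)
  cat /proc/sys/dev/raid/speed_limit_max            # ceiling, KiB/s per device (default 200000)
  echo 50000 > /proc/sys/dev/raid/speed_limit_min   # temporarily let resync take more bandwidth
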
* RE: RAID 5 - One drive dropped while replacing another
  From: Leslie Rhorer @ 2011-02-02 20:56 UTC
  To: hansbkk; Cc: linux-raid

> > That's pretty conservative, yes, for middle-of-the-road availability.
> > For a system whose necessary availability is not too high, it is
> > considerable overkill. For a system whose availability is critical,
> > it's not conservative enough.
>
> So maybe for my "money-poor environment that could live with a day or
> two of downtime" I'll add a drive or two as my own personal rule of
> thumb. Thanks for the feedback.

Yeah, as long as you have good backups for all critical data and you can live
with the thought of being at least partially down for an extended period
while the data restores from your backups, then whatever level of redundancy
you can afford should be fine. If it is critical that you be able to have
some portion of the data immediately available at all times, then a more
aggressive (and probably expensive) solution is in order. You might also have
to give up some performance for greater reliability.

> > That assumes the RAID1 array elements only have 2 members. With 3
> > members, the reliability goes way up. Of course, so does the cost.
>
> Prohibitively so for my situation, at least for the large storage
> volumes.

I can relate.

> My OS boot partitions are replicated on every drive, so some
> of them have 20+ members, but at 10GB per 2TB, not expensive 8-)

That really sounds like overkill. I'm a big fan of keeping my data storage
completely separate from my boot media. Indeed, the boot drives are internal
to the servers, while the data drives reside in RAID chassis. Small drives
are really cheap (in my case, free), so I simply chunked a couple of old
drives into each server chassis, partitioned them into root, boot, and swap
partitions, and bound them into three RAID1 arrays. Every few days, I back up
the files on the / and /boot arrays onto the data array using tar. Rsync
would also be a good candidate. The space utilization, as you say, is pretty
trivial, as is the drain on the server resources.

> >> depending on luck, whereas RAID6 would allow **any** two (and
> >> RAID6+spare any *three*) drives to fail without my losing data. So I
> >
> > That's specious. RAID6 + spare only allows two overlapping failures.
>
> Well yes, but my environment doesn't have pager notification to a
> "hot-spare sysadmin" standing by ready to jump in. In fact, the
> replacement hardware would need to be requested, purchase-ordered, etc.,
> so in that case the spare does make a difference to resilience, doesn't
> it?

It improves availability, not really resilience. It also of course impacts
reliability.

> If I had the replacement drive handy I'd just make it a hot spare
> rather than keeping it on the shelf anyway.

Oh yeah, I'm not disparaging the hot spare. It's just that if two members
suffer overlapping failures, then the array is without any redundancy until
the resync is complete.

> >> On my lower-end systems, a RAID6 over 2TB drives takes about 10-11
> >> hours per failed disk to rebuild, and that's using embedded bitmaps
> >> and with nothing else going on.
> >
> > I've never had one rebuild from a bare drive that fast.
>
> This wasn't a bare drive, but a re-add after I'd been doing some grub2
> and maintenance work on another array from SystemRescueCD; I'm not sure
> why the member failed.

That's a very different matter. A re-add of a failed member can be very
brief. When the system has to read every last byte of data from the live
array, calculate parity, and then write the calculated data back to the blank
drive, it can easily take close to or even more than a day per TB, depending
on the system load.

> It's not a particularly fast platform: consumer SATA2 Hitachi drives
> attached to an onboard Intel controller, ICH7 I believe. cat
> /proc/mdstat was showing around 60K, while the RAID1s rebuild at around
> double that.
>
> Would the fact that it was at less than 30% capacity make a difference?

That, I'm not sure. I'm not certain of the mechanics of mdadm, and I have
never done a drive replacement on a system that lightly filled.

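For anyone wondering why the re-add was so quick: the "embedded bitmaps"
mentioned above are md's write-intent bitmaps, which record which regions
changed while a member was missing, so a briefly-failed member only resyncs
the dirty chunks. A sketch of the relevant commands, with /dev/md0 and
/dev/sdX1 as placeholders:

  mdadm --grow --bitmap=internal /dev/md0   # add an internal write-intent bitmap to an existing array
  mdadm /dev/md0 --re-add /dev/sdX1         # a recently-dropped member resyncs only the dirty regions
  mdadm --grow --bitmap=none /dev/md0       # remove the bitmap if its small write overhead matters
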
* Re: RAID 5 - One drive dropped while replacing another
  From: Mathias Burén @ 2011-02-02 14:29 UTC
  To: hansbkk; Cc: Roman Mamedov, Bryan Wintermute, linux-raid

On 2 February 2011 14:21, <hansbkk@gmail.com> wrote:

> On Wed, Feb 2, 2011 at 6:36 AM, Roman Mamedov <rm@romanrm.ru> wrote:
>>
>>> I have a RAID5 setup with 15 drives.
>>
>> Looks like you got the problem you were so desperately asking for, with this
>> crazy setup. :(
>
> Please give some more details as to what's so crazy about this.
>
> I would think RAID6 would have made more sense, possibly with an
> additional spare if these are large drives (over a few hundred GB?)
>
> Or is there an upper limit as to the number of drives that's advisable
> for any array?
>
> If so, then what do people reckon a reasonable limit should be for a
> RAID6 made up of 2TB drives?

With 15 drives, where only 1 can fail (RAID5) without data loss, it's quite a
high risk that 2 (or more) drives will fail within a short period of time. If
you have fewer drives, this chance decreases. For large numbers of drives I
recommend RAID10 personally (or RAID1+0, whichever you prefer).

RAID6 + 1 hot spare is also nice, and cheaper (for ~10 drives).

// Mathias

* Re: RAID 5 - One drive dropped while replacing another
  From: Robin Hill @ 2011-02-02 14:47 UTC
  To: linux-raid

On Wed Feb 02, 2011 at 09:21:20PM +0700, hansbkk@gmail.com wrote:

> On Wed, Feb 2, 2011 at 6:36 AM, Roman Mamedov <rm@romanrm.ru> wrote:
> >
> >> I have a RAID5 setup with 15 drives.
> >
> > Looks like you got the problem you were so desperately asking for, with this
> > crazy setup. :(
>
> Please give some more details as to what's so crazy about this.

Just the number of drives in a single RAID5 array, I think. I'd be looking at
RAID6 well before I got to 10 drives.

> I would think RAID6 would have made more sense, possibly with an
> additional spare if these are large drives (over a few hundred GB?)

With 15, RAID6 + spare would probably be what I'd go with (depending on drive
size of course, and whether you have cold spares handy). For very large
drives, multiple arrays would be safer.

> Or is there an upper limit as to the number of drives that's advisable
> for any array?

I'm sure there's advice out there on this one - probably a recommended
minimum percentage of capacity used for redundancy. I've not looked, though -
I tend to go with gut feeling & err on the side of caution.

> If so, then what do people reckon a reasonable limit should be for a
> RAID6 made up of 2TB drives?

As the drive capacities go up, you need to be thinking more carefully about
redundancy - with a 2TB drive, your rebuild time is probably over a day.
Rebuild also tends to put more load on drives than normal, so is more likely
to cause a secondary (or even tertiary) failure. I'd be looking at RAID6
regardless, and throwing in a hot spare if there's more than 5 data drives.
If there's more than 10 then I'd be going with multiple arrays.

Cheers,
    Robin
--
Robin Hill <robin@robinhill.me.uk>
Little Jim says .... "He fallen in de water !!"

* Re: RAID 5 - One drive dropped while replacing another
  From: David Brown @ 2011-02-02 16:24 UTC
  To: linux-raid

On 02/02/2011 15:47, Robin Hill wrote:

> On Wed Feb 02, 2011 at 09:21:20PM +0700, hansbkk@gmail.com wrote:
>
>> On Wed, Feb 2, 2011 at 6:36 AM, Roman Mamedov <rm@romanrm.ru> wrote:
>>>
>>>> I have a RAID5 setup with 15 drives.
>>>
>>> Looks like you got the problem you were so desperately asking for, with this
>>> crazy setup. :(
>>
>> Please give some more details as to what's so crazy about this.
>
> Just the number of drives in a single RAID5 array, I think. I'd be
> looking at RAID6 well before I got to 10 drives.
>
>> I would think RAID6 would have made more sense, possibly with an
>> additional spare if these are large drives (over a few hundred GB?)
>
> With 15, RAID6 + spare would probably be what I'd go with (depending on
> drive size of course, and whether you have cold spares handy). For very
> large drives, multiple arrays would be safer.
>
>> Or is there an upper limit as to the number of drives that's advisable
>> for any array?
>
> I'm sure there's advice out there on this one - probably a recommended
> minimum percentage of capacity used for redundancy. I've not looked,
> though - I tend to go with gut feeling & err on the side of caution.
>
>> If so, then what do people reckon a reasonable limit should be for a
>> RAID6 made up of 2TB drives?
>
> As the drive capacities go up, you need to be thinking more carefully
> about redundancy - with a 2TB drive, your rebuild time is probably over
> a day. Rebuild also tends to put more load on drives than normal, so is
> more likely to cause a secondary (or even tertiary) failure. I'd be
> looking at RAID6 regardless, and throwing in a hot spare if there's more
> than 5 data drives. If there's more than 10 then I'd be going with
> multiple arrays.

If the load due to rebuild is a problem, it can make sense to split the raid
into parts. If you've got the money, you can start with a set of raid1 pairs
and then build raid5 (or even raid6) on top of that. With raid 1+5, you can
survive any 3 drive failures, and generally more than that unless you are
very unlucky in the combinations. However, rebuilds are very fast - they are
just a direct copy from one disk to its neighbour, and thus are less of a
load on the rest of the system.

Of course, there is a cost - if you have 15 2TB drives, with one being a warm
spare shared amongst the raid1 pairs, you have only 6 x 2TB storage.

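A sketch of the layered layout David describes - RAID1 pairs with RAID5
striped across them - using hypothetical partition names; mdadm is happy to
treat the md pairs as components of another array:

  # three mirrored pairs...
  mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
  mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sdc1 /dev/sdd1
  mdadm --create /dev/md3 --level=1 --raid-devices=2 /dev/sde1 /dev/sdf1

  # ...with parity striping on top, so a whole pair can also be lost
  mdadm --create /dev/md10 --level=5 --raid-devices=3 /dev/md1 /dev/md2 /dev/md3
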
* Re: RAID 5 - One drive dropped while replacing another
  From: hansbkk @ 2011-02-02 16:48 UTC
  To: David Brown; Cc: linux-raid

On Wed, Feb 2, 2011 at 11:24 PM, David Brown <david@westcontrol.com> wrote:

> Of course, there is a cost - if you have 15 2TB drives, with one being a
> warm spare shared amongst the raid1 pairs, you have only 6 x 2TB storage.

Please correct me if I got the math (or anything else) wrong.

If one didn't implement RAID5 (rather just using RAID0 or LVM) over the
RAID1s, and were OK with the 6 x 2TB usable space out of 15 drives, then
you'd have three spares (actually hot spares, right?), allowing for *any
four* drives to fail (but still as long as it wasn't a matched pair within
the _much shorter_ rebuild time window).

While a RAID6+spare, where any three can fail (and, unlike the above, any two
can fail within the admittedly longer rebuild window), gives *double* the
usable space - 12 x 2TB.

* Re: RAID 5 - One drive dropped while replacing another
  From: David Brown @ 2011-02-02 21:22 UTC
  To: linux-raid

On 02/02/11 17:48, hansbkk@gmail.com wrote:

> On Wed, Feb 2, 2011 at 11:24 PM, David Brown <david@westcontrol.com> wrote:
>> Of course, there is a cost - if you have 15 2TB drives, with one being a
>> warm spare shared amongst the raid1 pairs, you have only 6 x 2TB storage.
>
> Please correct me if I got the math (or anything else) wrong.
>
> If one didn't implement RAID5 (rather just using RAID0 or LVM) over the
> RAID1s, and were OK with the 6 x 2TB usable space out of 15 drives, then
> you'd have three spares (actually hot spares, right?), allowing for *any
> four* drives to fail (but still as long as it wasn't a matched pair
> within the _much shorter_ rebuild time window).

Your maths is okay - it's your understanding of failures that is wrong :-)

If you use raid0 over your raid1 pairs, and you lose both halves of a raid1
pair, your data is gone. You can have a dozen hot spares if you want; it
still doesn't matter.

The point of having multiple redundancy is to handle multiple failures at the
same time, or while doing a rebuild. Hot spares are there so that a rebuild
can start automatically without someone feeding a new drive into the machine,
thus minimising your risk window. But they don't improve the redundancy
themselves.

The /really/ good thing about a hot spare is that the administrator doesn't
have to remove the faulty disk until the array is rebuilt, and safely
redundant again. That way there is no disaster when he removes the wrong
disk...

> While a RAID6+spare, where any three can fail (and, unlike the above,
> any two can fail within the admittedly longer rebuild window), gives
> *double* the usable space - 12 x 2TB.

Again, you are incorrect about the number of drives that can fail. RAID6 +
spare means any /two/ drives can fail. You are correct about having a lot
more usable space, but you have lower redundancy and a lot longer rebuild
times.
