* Safe disk replace
From: Chris Dunlop @ 2012-09-04  4:14 UTC
To: linux-raid

G'day,

What is the best way to replace a fully-functional or minimally-failing
(e.g. occasional bad sectors) disk in a live array whilst maintaining as
much redundancy as possible during the process?

It seems the standard way to replace a disk is to fail out the unwanted
disk, add the new disk, then wait for the array to rebuild. However this
means during the rebuild you've lost some or all of your redundancy,
depending on the raid level of the array. This can be a significant issue,
e.g. if you're replacing a 4 TB disk it could mean 10 to 20 hours or much
more of heightened risk, depending on the rebuild bandwidth available.

Another way would be to add in the new disk and grow the array, wait for
the rebuild, then fail out and remove the old disk, shrink the array, and
again wait for the rebuild. However once again you lose (some of) your
redundancy from the time you've failed the old disk till the rebuild
completes; again, potentially many hours. Unless there's some way of
telling md to shrink the array off the unwanted device before removing it,
and md is smart enough to retain full redundancy during the process?

Another way might be to fail out the old drive, create a raid-1 between
the old and new drives whilst doing some dance with dd and the original
raid metadata and the new raid-1 metadata to make it appear the raid-1 was
the original raid member, "re-add" the raid-1 device to the original raid,
wait for the rebuild of both the raid-1 and the original raid, fail out
the raid-1, do a reverse dd dance to make the new disk look like a primary
member of the original raid, then "re-add" the new disk into the original
raid. This would mean you only lose redundancy for the windows where the
original raid has a failed-out member, i.e. seconds, if done properly.

Is this method possible and, if sufficient care is taken, sensible?

If it's possible, is this something that could or should be built into md
to automate the process and perhaps reduce or completely eliminate the
window of reduced redundancy?

...or, indeed, is this something that's already built into md and I need
to do some significant self-flagellation with the clue bat?

Cheers,

Chris.
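For reference, the conventional replacement Chris describes boils down to
something like the following. The device names are examples only (an array
/dev/md0 with outgoing member /dev/sdd1 and replacement /dev/sde1); the
array runs with reduced redundancy until the rebuild finishes:

    mdadm /dev/md0 --fail /dev/sdd1
    mdadm /dev/md0 --remove /dev/sdd1
    mdadm /dev/md0 --add /dev/sde1
    cat /proc/mdstat     # recovery runs on a degraded array until it completes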
* Re: Safe disk replace
From: David Brown @ 2012-09-04 10:28 UTC
To: Chris Dunlop; +Cc: linux-raid

On 04/09/2012 06:14, Chris Dunlop wrote:
> What is the best way to replace a fully-functional or minimally-failing
> (e.g. occasional bad sectors) disk in a live array whilst maintaining as
> much redundancy as possible during the process?
> [...]

It looks like you've thought through most of the possibilities here. I
don't think there is a "best" way to do this sort of replacement, as it
depends a bit on the circumstances - what sort of array you have from
before, whether you have a spare disk slot, etc.

The "raid1" copy you mention will one day be possible with "hot replace"
<http://neil.brown.name/blog/20110216044002#2>

I don't know how far along this idea is at the moment.

I know that it is possible to get much of that effect today if you use
single-disk raid1 "mirrors" as the basis for raid5/6/whatever instead of
building it directly on disks or partitions. Then it would be easy to add
a new disk to a "mirror", wait for it to sync, then remove the old disk.

It is, I believe, possible to turn an existing drive/partition into part
of a raid1 without metadata, but I am not sure of the details. That could
be used to deal with an existing raid5/6 array: first, make sure you have
a write-intent bitmap. Then remove a disk, make a no-metadata raid1 with
it, and put it back into the array. There are a lot of details to get
right here, so you would want to practice it first! Bad sectors or read
failures on the original disk could quickly cause complications.

If you have a raid5 array and want to replace a disk safely, it is
relatively easy. Get an extra disk (this can be a USB disk, a networked
disk, etc., if you don't mind the slower speed) and grow your array to an
asymmetric raid6 (layout "left-symmetric-6", I believe). This puts the
extra parity on the extra disk and does not change the layout of the rest
of the array. Once the grow/rebuild is complete, you can remove the old
disk, replace it with the new one, and re-sync. Convert back to normal
raid5 (which does not need to change the rest of the array), and remove
the extra disk.

Again, practice this before doing it on live disks - and make sure you
have a good backup. RAID can help protect data from disk errors, but not
from human errors!
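As a rough sketch of the raid5-to-asymmetric-raid6 route David outlines,
with hypothetical names (a four-disk raid5 /dev/md0, temporary extra disk
/dev/sdx1, old member /dev/sdd1, permanent replacement /dev/sdy1). The
exact grow options vary between mdadm versions, so check the man page and
rehearse on loop devices first:

    mdadm /dev/md0 --add /dev/sdx1
    # keep the existing data layout so only the new Q parity is written;
    # --layout=preserve (or an explicit *-6 layout such as left-symmetric-6)
    mdadm --grow /dev/md0 --level=6 --raid-devices=5 --layout=preserve
    # wait for the conversion to finish, then swap the old member:
    mdadm /dev/md0 --fail /dev/sdd1 --remove /dev/sdd1
    mdadm /dev/md0 --add /dev/sdy1
    # wait for recovery, then drop back to raid5 and detach the extra disk:
    mdadm --grow /dev/md0 --level=5 --raid-devices=4
    mdadm /dev/md0 --remove /dev/sdx1

While the array is an asymmetric raid6, removing the old member still
leaves single-disk redundancy, which is the point of the exercise.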
* Re: Safe disk replace
From: Mikael Abrahamsson @ 2012-09-04 12:26 UTC
To: David Brown; +Cc: Chris Dunlop, linux-raid

On Tue, 4 Sep 2012, David Brown wrote:

> The "raid1" copy you mention will one day be possible with "hot replace"
> <http://neil.brown.name/blog/20110216044002#2>
>
> I don't know how far along this idea is at the moment.

https://lwn.net/Articles/465048/

"hot-replace support for RAID4/5/6:

In order to activate hot-replace you need to mark the device as
'replaceable'. This happens automatically when a write error is recorded
in a bad-block log (if you happen to have one).

It can be achieved manually by
  echo replaceable > /sys/block/mdXX/md/dev-YYY/state

This makes YYY, in XX, replaceable."

I don't know if it actually made it into 3.2; I believe I saw somewhere
that it was available for 3.3, but Neil Brown should know more.

--
Mikael Abrahamsson    email: swmike@swm.pp.se
* Re: Safe disk replace
From: Robin Hill @ 2012-09-04 15:33 UTC
To: linux-raid

On Tue Sep 04, 2012 at 02:26:24PM +0200, Mikael Abrahamsson wrote:

> "hot-replace support for RAID4/5/6:
> [...]
> I don't know if it actually made it into 3.2; I believe I saw somewhere
> that it was available for 3.3, but Neil Brown should know more.

I'm currently upgrading my RAID-6 arrays via hot-replacement. The process
I followed (to replace device YYY in array mdXX) is:
 - add the new disk to the array as a spare
 - echo want_replacement > /sys/block/mdXX/md/dev-YYY/state

That kicks off the recovery (a straight disk-to-disk copy from YYY to the
new disk). After the rebuild is complete, YYY gets failed in the array, so
it can be safely removed:
 - mdadm -r /dev/mdXX /dev/YYY

That's worked fine so far, and looks to run at the single-disk write
speed. There were no errors on the old disks though, so I've not seen how
that gets handled (it _should_ just do a parity-based recovery from the
remaining disks and continue).

Cheers,
    Robin
--
Robin Hill <robin@robinhill.me.uk>
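Spelled out with example names (array /dev/md0, outgoing member /dev/sdd1,
new disk /dev/sde1 -- substitute your own), the sequence Robin describes
looks roughly like this:

    mdadm /dev/md0 --add /dev/sde1                             # new disk joins as a spare
    echo want_replacement > /sys/block/md0/md/dev-sdd1/state   # background copy sdd1 -> sde1
    cat /proc/mdstat                                           # the incoming disk carries an (R) flag while copying
    # once the copy completes, md marks sdd1 faulty and it can be removed:
    mdadm /dev/md0 --remove /dev/sdd1

The array keeps its full redundancy throughout the copy, since the old
member stays active until the replacement is in sync.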
* Re: Safe disk replace
From: Mikael Abrahamsson @ 2012-09-04 16:34 UTC
To: Robin Hill; +Cc: linux-raid

On Tue, 4 Sep 2012, Robin Hill wrote:

> I'm currently upgrading my RAID-6 arrays via hot-replacement. The
> process I followed (to replace device YYY in array mdXX) is:
>  - add the new disk to the array as a spare
>  - echo want_replacement > /sys/block/mdXX/md/dev-YYY/state

What kernel version are you using?

--
Mikael Abrahamsson    email: swmike@swm.pp.se
* Re: Safe disk replace
From: Robin Hill @ 2012-09-04 17:12 UTC
To: Mikael Abrahamsson; +Cc: Robin Hill, linux-raid

On Tue Sep 04, 2012 at 06:34:39PM +0200, Mikael Abrahamsson wrote:

> What kernel version are you using?

3.4.9 at the moment. A quick search on the list suggests that this
functionality went in at 3.3 though.

Cheers,
    Robin
--
Robin Hill <robin@robinhill.me.uk>
* Re: Safe disk replace
From: John Drescher @ 2012-09-05 14:25 UTC
To: linux-raid

> I'm currently upgrading my RAID-6 arrays via hot-replacement. The
> process I followed (to replace device YYY in array mdXX) is:
>  - add the new disk to the array as a spare
>  - echo want_replacement > /sys/block/mdXX/md/dev-YYY/state
>
> That kicks off the recovery (a straight disk-to-disk copy from YYY to
> the new disk). After the rebuild is complete, YYY gets failed in the
> array, so it can be safely removed:
>  - mdadm -r /dev/mdXX /dev/YYY

Thanks for the info. I've wanted this feature for years at work.

I am testing this now on my test box. Here I have 13 x 250GB SATA 1
drives. Yes, these are 8+ years old.

md1 : active raid6 sda2[13](R) sdk2[17] sdj2[18] sdf2[16] sdm2[19] sdl2[14] sdi2[12] sdg2[15] sde2[5] sdd2[4] sdh2[21] sdb2[20] sdc2[1]
      2431477760 blocks super 1.2 level 6, 512k chunk, algorithm 2 [12/12] [UUUUUUUUUUUU]
      [>....................]  recovery =  3.4% (8401408/243147776) finish=75.9min speed=51540K/sec

Speeds are faster than failing a drive, but I would do this more for the
lower chance of failure than for the improved performance:

md1 : active raid6 sdk2[17] sdj2[18] sdf2[16] sdm2[19] sdl2[14] sdi2[12] sdg2[15] sde2[5] sdd2[4] sdh2[21] sdb2[20] sdc2[1]
      2431477760 blocks super 1.2 level 6, 512k chunk, algorithm 2 [12/11] [_UUUUUUUUUUU]
      [>....................]  recovery =  1.2% (3134952/243147776) finish=100.1min speed=39954K/sec

John
* Re: Safe disk replace
From: John Drescher @ 2012-09-05 19:35 UTC
To: linux-raid

On Wed, Sep 5, 2012 at 10:25 AM, John Drescher <drescherjm@gmail.com> wrote:
> I am testing this now on my test box. Here I have 13 x 250GB SATA 1
> drives.
> [...]
> Speeds are faster than failing a drive, but I would do this more for the
> lower chance of failure than for the improved performance.

I found something interesting. I issued want_replacement without spares.

localhost md # echo want_replacement > dev-sdd2/state
localhost md # cat /proc/mdstat
Personalities : [raid1] [raid10] [raid6] [raid5] [raid4] [raid0] [linear] [multipath]
md0 : active raid1 sda1[10](S) sdj1[0] sdk1[2] sdf1[11](S) sdb1[12](S) sdg1[9] sdh1[8] sdl1[7] sdm1[6] sde1[5] sdd1[4] sdi1[3] sdc1[1]
      1048512 blocks [10/10] [UUUUUUUUUU]

md1 : active raid6 sdb2[20] sdk2[17] sda2[13] sdj2[18] sdf2[16] sdm2[19] sdl2[14] sdi2[12] sdg2[15] sde2[5] sdd2[4] sdh2[21] sdc2[1](F)
      2431477760 blocks super 1.2 level 6, 512k chunk, algorithm 2 [12/11] [UUUUUUUUUUUU]

Then I added the failed disk from a previous round as a spare.

localhost md # mdadm --manage /dev/md1 --remove /dev/sdc2
mdadm: hot removed /dev/sdc2 from /dev/md1
localhost md # mdadm --zero-superblock /dev/sdc2
localhost md # mdadm --manage /dev/md1 --add /dev/sdc2
mdadm: added /dev/sdc2

localhost md # cat /proc/mdstat
Personalities : [raid1] [raid10] [raid6] [raid5] [raid4] [raid0] [linear] [multipath]
md0 : active raid1 sda1[10](S) sdj1[0] sdk1[2] sdf1[11](S) sdb1[12](S) sdg1[9] sdh1[8] sdl1[7] sdm1[6] sde1[5] sdd1[4] sdi1[3] sdc1[1]
      1048512 blocks [10/10] [UUUUUUUUUU]

md1 : active raid6 sdc2[22](R) sdb2[20] sdk2[17] sda2[13] sdj2[18] sdf2[16] sdm2[19] sdl2[14] sdi2[12] sdg2[15] sde2[5] sdd2[4] sdh2[21]
      2431477760 blocks super 1.2 level 6, 512k chunk, algorithm 2 [12/11] [UUUUUUUUUUUU]
      [>....................]  recovery =  0.6% (1592256/243147776) finish=119.2min speed=33746K/sec

Now it's taking much longer, and it says 12/11 instead of 12/12.

John
* Re: Safe disk replace
From: John Drescher @ 2012-09-05 19:46 UTC
To: linux-raid

On Wed, Sep 5, 2012 at 3:35 PM, John Drescher <drescherjm@gmail.com> wrote:
> I found something interesting. I issued want_replacement without spares.
> [...]
> Now it's taking much longer, and it says 12/11 instead of 12/12.

I am not sure why it is taking longer this time. However, from the drive
activity lights on the LSI SAS cards it appears that only two drives are
active in the copy, so the raid appears to be doing the correct thing,
apart from the minor difference of 12/11 versus 12/12.

John
* Re: Safe disk replace
From: Robin Hill @ 2012-09-05 20:32 UTC
To: linux-raid

On Wed Sep 05, 2012 at 03:35:29PM -0400, John Drescher wrote:

> I found something interesting. I issued want_replacement without spares.
> [...]
> Then I added the failed disk from a previous round as a spare.
> [...]
> Now it's taking much longer, and it says 12/11 instead of 12/12.

The problem's actually at the point it finishes the recovery. When it
fails the replaced disk, it treats it as a failure of an in-array disk.
You get the failure email and the array shows as degraded, even though it
has the full number of working devices. Your 12/11 would have shown even
before you started doing the second replacement. It doesn't seem to cause
any problems in use though, and it gets corrected after a reboot.

Cheers,
    Robin
--
Robin Hill <robin@robinhill.me.uk>
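A quick way to check that the post-replacement "degraded" report is only
cosmetic is to look at the device counts in the array detail (again with a
hypothetical /dev/md0); with the quirk Robin describes, expect the State
line to say degraded while Working Devices still shows the full member
count and the replaced disk appears as the lone failed device:

    mdadm --detail /dev/md0 | grep -E 'State|Devices'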
* Re: Safe disk replace
From: John Drescher @ 2012-09-06 12:59 UTC
To: linux-raid

> The problem's actually at the point it finishes the recovery. When it
> fails the replaced disk, it treats it as a failure of an in-array disk.
> You get the failure email and the array shows as degraded, even though
> it has the full number of working devices. Your 12/11 would have shown
> even before you started doing the second replacement. It doesn't seem to
> cause any problems in use though, and it gets corrected after a reboot.

Thanks. You are correct. It did show 12/11 before the replacement
happened, and even after it finished.

John
* Re: Safe disk replace
From: NeilBrown @ 2012-09-10  1:01 UTC
To: Robin Hill; +Cc: linux-raid

On Wed, 5 Sep 2012 21:32:03 +0100 Robin Hill <robin@robinhill.me.uk> wrote:

> The problem's actually at the point it finishes the recovery. When it
> fails the replaced disk, it treats it as a failure of an in-array disk.
> You get the failure email and the array shows as degraded, even though
> it has the full number of working devices. Your 12/11 would have shown
> even before you started doing the second replacement. It doesn't seem to
> cause any problems in use though, and it gets corrected after a reboot.

Thanks for the bug report. This patch should fix it.

NeilBrown

From d72d7b15e100fc0f9ac95999f39360f44e7b875d Mon Sep 17 00:00:00 2001
From: NeilBrown <neilb@suse.de>
Date: Mon, 10 Sep 2012 11:00:32 +1000
Subject: [PATCH] md/raid5: fix calculate of 'degraded' when a replacement
 becomes active.

When a replacement device becomes active, we mark the device that it
replaces as 'faulty' so that it can subsequently get removed.
However 'calc_degraded' only pays attention to the primary device, not
the replacement, so the array appears to become degraded, which is
wrong.

So teach 'calc_degraded' to consider any replacement if a primary
device is faulty.

Reported-by: Robin Hill <robin@robinhill.me.uk>
Reported-by: John Drescher <drescherjm@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 7c8151a..919327a 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -419,6 +419,8 @@ static int calc_degraded(struct r5conf *conf)
 	degraded = 0;
 	for (i = 0; i < conf->previous_raid_disks; i++) {
 		struct md_rdev *rdev = rcu_dereference(conf->disks[i].rdev);
+		if (rdev && test_bit(Faulty, &rdev->flags))
+			rdev = rcu_dereference(conf->disks[i].replacement);
 		if (!rdev || test_bit(Faulty, &rdev->flags))
 			degraded++;
 		else if (test_bit(In_sync, &rdev->flags))
@@ -443,6 +445,8 @@ static int calc_degraded(struct r5conf *conf)
 	degraded2 = 0;
 	for (i = 0; i < conf->raid_disks; i++) {
 		struct md_rdev *rdev = rcu_dereference(conf->disks[i].rdev);
+		if (rdev && test_bit(Faulty, &rdev->flags))
+			rdev = rcu_dereference(conf->disks[i].replacement);
 		if (!rdev || test_bit(Faulty, &rdev->flags))
 			degraded2++;
 		else if (test_bit(In_sync, &rdev->flags))
* Re: Safe disk replace
From: Chris Dunlop @ 2012-09-06  3:28 UTC
To: linux-raid

On Tue, Sep 04, 2012 at 04:33:42PM +0100, Robin Hill wrote:

> I'm currently upgrading my RAID-6 arrays via hot-replacement. The
> process I followed (to replace device YYY in array mdXX) is:
>  - add the new disk to the array as a spare
>  - echo want_replacement > /sys/block/mdXX/md/dev-YYY/state
>
> That kicks off the recovery (a straight disk-to-disk copy from YYY to
> the new disk). After the rebuild is complete, YYY gets failed in the
> array, so it can be safely removed:
>  - mdadm -r /dev/mdXX /dev/YYY
>
> That's worked fine so far, and looks to run at the single-disk write
> speed.

Thanks all, this is exactly what I was looking for!

Cheers,

Chris
Thread overview: 13+ messages (newest: 2012-09-10  1:01 UTC)

  2012-09-04  4:14 Safe disk replace      Chris Dunlop
  2012-09-04 10:28 `                      David Brown
  2012-09-04 12:26   `                    Mikael Abrahamsson
  2012-09-04 15:33     `                  Robin Hill
  2012-09-04 16:34       `                Mikael Abrahamsson
  2012-09-04 17:12         `              Robin Hill
  2012-09-05 14:25       `                John Drescher
  2012-09-05 19:35         `              John Drescher
  2012-09-05 19:46           `            John Drescher
  2012-09-05 20:32           `            Robin Hill
  2012-09-06 12:59             `          John Drescher
  2012-09-10  1:01             `          NeilBrown
  2012-09-06  3:28       `                Chris Dunlop