* raid 5 to raid 6 reshape gone bad
From: Travis Brown @ 2011-11-13 2:56 UTC
To: linux-raid
I was reshaping my 5 drive raid 5 with spare to a raid 6 array when the drive I was using for my backup went offline. If that's not Murphy's law, I don't know what is. The array is still up and usable, but I'm afraid to reboot or do anything to it, really. Suggestions on getting this thing back to usable are very welcome.
Thanks,
Travis
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md126 : active raid6 sde3[0] sdf3[3] sdb3[1] sdd3[4] sdc3[2]
5856728064 blocks super 0.91 level 6, 512k chunk, algorithm 18 [5/4] [UUUU_]
[>....................] reshape = 0.9% (19267584/1952242688) finish=623878.3min speed=51K/sec
/dev/md126:
Version : 0.91
Creation Time : Wed Nov 10 20:19:03 2010
Raid Level : raid6
Array Size : 5856728064 (5585.41 GiB 5997.29 GB)
Used Dev Size : 1952242688 (1861.80 GiB 1999.10 GB)
Raid Devices : 5
Total Devices : 5
Preferred Minor : 126
Persistence : Superblock is persistent
Update Time : Sat Nov 12 21:55:46 2011
State : clean, degraded, recovering
Active Devices : 4
Working Devices : 5
Failed Devices : 0
Spare Devices : 1
Layout : left-symmetric-6
Chunk Size : 512K
Reshape Status : 0% complete
New Layout : left-symmetric
UUID : 3fd8b303:7727aa3b:c5d110f2:f9137e1d
Events : 0.124051
Number Major Minor RaidDevice State
0 8 67 0 active sync /dev/sde3
1 8 19 1 active sync /dev/sdb3
2 8 35 2 active sync /dev/sdc3
3 8 83 3 active sync /dev/sdf3
4 8 51 4 spare rebuilding /dev/sdd3

* Re: raid 5 to raid 6 reshape gone bad
From: Keith Keller @ 2011-11-13 3:33 UTC
To: linux-raid

On 2011-11-13, Travis Brown <teb@jetcom.org> wrote:
> I was reshaping my 5 drive raid 5 with spare to a raid 6 array when the drive I was using for my backup went offline. If that's not Murphy's law, I don't know what is. The array is still up and usable, but I'm afraid to reboot or do anything to it, really. Suggestions on getting this thing back to usable are very welcome.
>
> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
> md126 : active raid6 sde3[0] sdf3[3] sdb3[1] sdd3[4] sdc3[2]
>       5856728064 blocks super 0.91 level 6, 512k chunk, algorithm 18 [5/4] [UUUU_]

Do you have hot-swap drives? If so, simply mark sdd3 as faulty with
mdadm, remove it from the array, then physically remove and replace the
drive and use mdadm to add the new one. The raid6 should start to rebuild
automatically.

If you don't have hot-swap drives, you should still be able to use mdadm
to set sdd3 faulty and remove it, shut down, replace sdd, and reboot. Make
sure your /etc/mdadm.conf file is up to date and accurate before shutting
down. Alternatively, make sure mdadm *won't* probe for raid arrays, and
instead assemble the array manually after booting; that might be a surer
way to do it. Either way, with four working disks your new raid6 can
suffer one more drive failure and still be usable.

Of course, insert the standard disclaimer about RAID not being a
replacement for backups. :) If it won't take too long, you might consider
making and/or refreshing your backups before attempting anything. (The
hot-swap method is reasonably safe as long as you're *positive* you're
removing the correct disk.)

--keith

--
kkeller@wombat.san-francisco.ca.us
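
As a minimal sketch, the fail/remove/add sequence described above would look
roughly like this with the device names from this thread (check mdadm(8) for
your version, and note Neil's objection below about doing any of this while a
reshape is in progress):

    mdadm /dev/md126 --fail /dev/sdd3        # mark the member faulty
    mdadm /dev/md126 --remove /dev/sdd3      # drop it from the array
    # physically swap the disk and recreate the sdd3 partition, then:
    mdadm /dev/md126 --add /dev/sdd3         # add the new partition; the rebuild starts automatically
    cat /proc/mdstat                         # watch recovery progress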

* Re: raid 5 to raid 6 reshape gone bad
From: Keith Keller @ 2011-11-13 3:55 UTC
To: linux-raid

On 2011-11-13, Keith Keller <kkeller@wombat.san-francisco.ca.us> wrote:
> Do you have hot-swap drives? If so, simply mark sdd3 as faulty with
> mdadm, remove it from the array, then physically remove and replace the
> drive and use mdadm to add the new one. The raid6 should start to rebuild
> automatically.

...or, better still, don't listen to me and take Neil's advice. :)

--keith

--
kkeller@wombat.san-francisco.ca.us

* Re: raid 5 to raid 6 reshape gone bad
From: NeilBrown @ 2011-11-13 3:55 UTC
To: Keith Keller; +Cc: linux-raid

On Sat, 12 Nov 2011 19:33:24 -0800 Keith Keller <kkeller@wombat.san-francisco.ca.us> wrote:

> On 2011-11-13, Travis Brown <teb@jetcom.org> wrote:
> > I was reshaping my 5 drive raid 5 with spare to a raid 6 array when the drive I was using for my backup went offline. [...]
> >
> > md126 : active raid6 sde3[0] sdf3[3] sdb3[1] sdd3[4] sdc3[2]
> >       5856728064 blocks super 0.91 level 6, 512k chunk, algorithm 18 [5/4] [UUUU_]
>
> Do you have hot-swap drives? If so, simply mark sdd3 as faulty with
> mdadm, remove it from the array, then physically remove and replace the
> drive and use mdadm to add the new one. The raid6 should start to rebuild
> automatically.

Not good advice. This is about reshaping, not rebuilding. I'm sorry, but I think you've completely missed the point.

NeilBrown

* Re: raid 5 to raid 6 reshape gone bad
From: NeilBrown @ 2011-11-13 3:35 UTC
To: Travis Brown; +Cc: linux-raid

On Sat, 12 Nov 2011 21:56:56 -0500 Travis Brown <teb@jetcom.org> wrote:

> I was reshaping my 5 drive raid 5 with spare to a raid 6 array when the drive I was using for my backup went offline. If that's not Murphy's law, I don't know what is. The array is still up and usable, but I'm afraid to reboot or do anything to it, really. Suggestions on getting this thing back to usable are very welcome.
>
> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
> md126 : active raid6 sde3[0] sdf3[3] sdb3[1] sdd3[4] sdc3[2]
>       5856728064 blocks super 0.91 level 6, 512k chunk, algorithm 18 [5/4] [UUUU_]
>       [>....................]  reshape =  0.9% (19267584/1952242688) finish=623878.3min speed=51K/sec

1/ Don't Panic.
   You seem to have achieved this step quite effectively - congratulations.

2/ Stop the array cleanly. Not having a backup will only cause possible
   corruption if the machine crashes while the reshape is happening. The
   reshape has stopped, so there is no chance of corruption, but you still
   need to cleanly stop the array.
   (A subsequent version of mdadm may allow you to continue the reshape
   without the stop/restart step, but we aren't there yet.)

3/ Make sure you have a version of mdadm which is at least 3.2. I would
   suggest the latest: 3.2.2. You particularly need the --invalid-backup
   flag.

4/ Reassemble the array with e.g.

     mdadm --assemble /dev/md126 --backup=/some/file \
         --invalid-backup /dev/sd[bcdef]3

   The backup file does not need to exist (I think). Maybe create an empty
   file and use that, just to be safe.
   The "--invalid-backup" flag says to mdadm: "Yes, I know the backup file
   is currently invalid and you cannot restore anything from it. I happen
   to know that there is no need to restore anything because I did a clean
   shutdown. Just use the backup file for making new backups as you
   continue the reshape."

NeilBrown
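
A minimal sketch of steps 2-4 as they might look on this system, assuming
/home is the only thing mounted from md126, that mdadm >= 3.2.2 is already
installed, and that /root/md126-backup is just a placeholder path
(--backup-file is the documented long form of the backup option):

    mdadm --version                          # step 3: should report at least 3.2.2
    umount /home                             # nothing may be using the array
    mdadm --stop /dev/md126                  # step 2: cleanly stop the array
    touch /root/md126-backup                 # step 4: an empty backup file, just to be safe
    mdadm --assemble /dev/md126 --backup-file=/root/md126-backup \
          --invalid-backup /dev/sd[bcdef]3
    cat /proc/mdstat                         # the reshape should resume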

* Re: raid 5 to raid 6 reshape gone bad
From: Travis Brown @ 2011-11-13 13:11 UTC
To: NeilBrown; +Cc: linux-raid

Thanks Neil.

Luckily, this array isn't /root, but it does have /home. That means the only
way I can effectively stop it is probably going to be to reboot the machine.
Since the kernel auto-assembles the RAID, I think I'll have to boot from a
rescue disk to do this? Is there any way to have mdadm/the kernel look for a
backup file on startup? If I have to do it from a rescue disk, the machine
will effectively be unusable until the raid finishes its reshaping.

Thanks,
Travis

On Nov 12, 2011, at 10:35 PM, NeilBrown wrote:

> 2/ Stop the array cleanly. [...] The reshape has stopped, so there is no
>    chance of corruption, but you still need to cleanly stop the array.
> [...]
> 4/ Reassemble the array with e.g.
>
>      mdadm --assemble /dev/md126 --backup=/some/file \
>          --invalid-backup /dev/sd[bcdef]3
> [...]

* Re: raid 5 to raid 6 reshape gone bad
From: Travis Brown @ 2011-11-13 13:50 UTC
To: NeilBrown; +Cc: linux-raid

I was able to unmount /home by logging in as 'root' so I could stop the RAID, so that's a moot point for my immediate needs.

Thanks,
Travis

On Nov 13, 2011, at 8:11 AM, Travis Brown wrote:

> Luckily, this array isn't /root, but it does have /home. That means the only
> way I can effectively stop it is probably going to be to reboot the machine.
> [...]

* Re: raid 5 to raid 6 reshape gone bad
From: Travis Brown @ 2011-11-13 14:53 UTC
To: NeilBrown; +Cc: linux-raid

So I have the RAID rebuilding, and the output of /proc/mdstat looks ok (I think):

root@bravo:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md126 : active raid6 sde3[0] sdd3[4] sdf3[3] sdb3[1]
      5856728064 blocks super 0.91 level 6, 512k chunk, algorithm 18 [5/3] [UU_U_]
      [>....................]  reshape =  2.0% (39448064/1952242688) finish=7375.5min speed=4321K/sec

md11 : active raid6 sde2[0] sdb2[1] sdd2[4] sdc2[2] sdf2[3]
      3180672 blocks level 6, 64k chunk, algorithm 2 [5/5] [UUUUU]

md10 : active raid1 sde1[0] sdb1[1] sdd1[4] sdf1[3] sdc1[2]
      208704 blocks [5/5] [UUUUU]

unused devices: <none>

But mdadm --detail /dev/md126 now shows I have a drive removed:

/dev/md126:
Version : 0.91
Creation Time : Wed Nov 10 20:19:03 2010
Raid Level : raid6
Array Size : 5856728064 (5585.41 GiB 5997.29 GB)
Used Dev Size : 1952242688 (1861.80 GiB 1999.10 GB)
Raid Devices : 5
Total Devices : 4
Preferred Minor : 126
Persistence : Superblock is persistent

Update Time : Sun Nov 13 09:50:52 2011
State : active, degraded, recovering
Active Devices : 3
Working Devices : 4
Failed Devices : 0
Spare Devices : 1

Layout : left-symmetric-6
Chunk Size : 512K

Reshape Status : 2% complete
New Layout : left-symmetric

UUID : 3fd8b303:7727aa3b:c5d110f2:f9137e1d
Events : 0.172355

Number Major Minor RaidDevice State
0 8 67 0 active sync /dev/sde3
1 8 19 1 active sync /dev/sdb3
2 0 0 2 removed
3 8 83 3 active sync /dev/sdf3
4 8 51 4 spare rebuilding /dev/sdd3

Is that expected?

Thanks again. I am pretty confident all my data is there, and I do have a (1-day-old) backup of the important stuff. The other 2.5TB of stuff isn't really /that/ important, but I don't want to have to explain to the wife why all her favorite episodes of NCIS that she recorded are gone :)

Thanks,
Travis

On Nov 12, 2011, at 10:35 PM, NeilBrown wrote:

> 4/ Reassemble the array with e.g.
>
>      mdadm --assemble /dev/md126 --backup=/some/file \
>          --invalid-backup /dev/sd[bcdef]3
> [...]
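
As a minimal sketch, one way to check what happened to sdc3 (device names as
in this thread; the exact log messages will vary by kernel, and smartctl
assumes smartmontools is installed):

    dmesg | grep -i md                       # look for lines about assembling md126 and any kicked members
    mdadm --examine /dev/sdc3                # superblock state and event count on the dropped member
    smartctl -a /dev/sdc                     # check the drive itself for errors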

* Re: raid 5 to raid 6 reshape gone bad
From: NeilBrown @ 2011-11-14 6:07 UTC
To: Travis Brown; +Cc: linux-raid

On Sun, 13 Nov 2011 09:53:52 -0500 Travis Brown <teb@jetcom.org> wrote:

> So I have the RAID rebuilding, and the output of /proc/mdstat looks ok (I think):
>
> root@bravo:~# cat /proc/mdstat
> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
> md126 : active raid6 sde3[0] sdd3[4] sdf3[3] sdb3[1]
>       5856728064 blocks super 0.91 level 6, 512k chunk, algorithm 18 [5/3] [UU_U_]
>       [>....................]  reshape =  2.0% (39448064/1952242688) finish=7375.5min speed=4321K/sec
> [...]
>
> But mdadm --detail /dev/md126 now shows I have a drive removed:
> [...]
> 2 0 0 2 removed
> [...]
>
> Is that expected?

That is unfortunate. It looks like sdc3 didn't make it back into the array
when you re-assembled it. I wonder why not? Are there any kernel logs from
when you started the array?

Your RAID6 is now double-degraded, so another failure would be bad. That
should be unlikely, though.

As soon as the reshape finishes you need to see about adding sdc3 back into
the array so that it rebuilds. The rebuild will be a lot faster than the
reshape is.

The best thing to do now is to just leave it reshaping. However, if there
are some really important files that you could conveniently back up, now
might be a good time, just in case.

NeilBrown
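
As a minimal sketch, the re-add Neil describes could look like this once the
reshape has finished (assuming sdc3 and its drive turn out to be healthy; if
the drive itself is failing, replace it first):

    cat /proc/mdstat                         # confirm the reshape is complete
    mdadm /dev/md126 --add /dev/sdc3         # put sdc3 back; a rebuild onto it starts automatically
    watch -n 60 cat /proc/mdstat             # monitor the rebuild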