* Recommendations needed for RAID5 recovery
@ 2016-06-24 19:55 Peter Gebhard
2016-06-24 20:44 ` Another Sillyname
0 siblings, 1 reply; 9+ messages in thread
From: Peter Gebhard @ 2016-06-24 19:55 UTC (permalink / raw)
To: linux-raid
Hello,
I have been asked to attempt data recovery on a RAID5 array which appears to have had two disk failures (in an array of four disks). I would be grateful if anyone on this list could offer recommendations for my next steps. I have provided below the current state of the array, per https://raid.wiki.kernel.org/index.php/RAID_Recovery.
It appears from the output below that one of the disks (sdd1) failed last year and the admin did not notice this. Now, it appears a second disk (sdg1) has recently had read errors and was kicked out of the array.
Should I try to restore the array using the recreate_array.pl script provided on the RAID_Recovery site? Should I then try to recreate the array and/or perform ‘fsck’?
Thank you greatly in advance!
raid.status:
/dev/sdd1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : bfb03f95:834b520d:60f773d8:ecb6b9e3
Name : <->:0
Creation Time : Tue Nov 29 17:33:39 2011
Raid Level : raid5
Raid Devices : 4
Avail Dev Size : 3907024896 (1863.01 GiB 2000.40 GB)
Array Size : 5860535808 (5589.04 GiB 6001.19 GB)
Used Dev Size : 3907023872 (1863.01 GiB 2000.40 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : ec1a6336:0a991298:5b409bf1:4585ccbe
Update Time : Sun Jun 7 02:28:00 2015
Checksum : f9323080 - correct
Events : 96203
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 0
Array State : AAAA ('A' == active, '.' == missing)
/dev/sde1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : bfb03f95:834b520d:60f773d8:ecb6b9e3
Name : <->:0
Creation Time : Tue Nov 29 17:33:39 2011
Raid Level : raid5
Raid Devices : 4
Avail Dev Size : 3907024896 (1863.01 GiB 2000.40 GB)
Array Size : 5860535808 (5589.04 GiB 6001.19 GB)
Used Dev Size : 3907023872 (1863.01 GiB 2000.40 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : ed13b045:ca75ab96:83045f97:e4fd62cb
Update Time : Sun Jun 19 19:43:31 2016
Checksum : bb6a905f - correct
Events : 344993
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 3
Array State : .A.A ('A' == active, '.' == missing)
/dev/sdf1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : bfb03f95:834b520d:60f773d8:ecb6b9e3
Name : <->:0
Creation Time : Tue Nov 29 17:33:39 2011
Raid Level : raid5
Raid Devices : 4
Avail Dev Size : 3907024896 (1863.01 GiB 2000.40 GB)
Array Size : 5860535808 (5589.04 GiB 6001.19 GB)
Used Dev Size : 3907023872 (1863.01 GiB 2000.40 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : a1ea11e0:6465fe26:483f133d:680014b3
Update Time : Sun Jun 19 19:43:31 2016
Checksum : 738493f3 - correct
Events : 344993
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 1
Array State : .A.A ('A' == active, '.' == missing)
/dev/sdg1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : bfb03f95:834b520d:60f773d8:ecb6b9e3
Name : <->:0
Creation Time : Tue Nov 29 17:33:39 2011
Raid Level : raid5
Raid Devices : 4
Avail Dev Size : 3907024896 (1863.01 GiB 2000.40 GB)
Array Size : 5860535808 (5589.04 GiB 6001.19 GB)
Used Dev Size : 3907023872 (1863.01 GiB 2000.40 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : active
Device UUID : a9737439:17f81210:484d4f4c:c3d34a8a
Update Time : Sun Jun 19 12:18:49 2016
Checksum : 9c6d24bf - correct
Events : 343949
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 2
Array State : .AAA ('A' == active, '.' == missing)
* Re: Recommendations needed for RAID5 recovery
  2016-06-24 19:55 Recommendations needed for RAID5 recovery Peter Gebhard
@ 2016-06-24 20:44 ` Another Sillyname
  2016-06-24 21:37   ` John Stoffel
  0 siblings, 1 reply; 9+ messages in thread
From: Another Sillyname @ 2016-06-24 20:44 UTC (permalink / raw)
To: Linux-RAID

Peter

Before attempting any recovery can I suggest that you get 4 x 2TB
drives and dd the current drives so you have a backup.

Then you can begin to think about performing the raid recovery in the
knowledge you have a fallback position if it blows up.

Regards

Tony

On 24 June 2016 at 20:55, Peter Gebhard <pgeb@seas.upenn.edu> wrote:
> [...]
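A rough sketch of this cloning step, assuming the four members are /dev/sdd through /dev/sdg and the four fresh 2TB drives show up as /dev/sdh through /dev/sdk (hypothetical names -- verify with lsblk before copying anything):

  # /dev/sdh..sdk are the new disks (hypothetical names)
  # GNU dd aborts at the first read error, so a clean run means the
  # source had no unreadable sectors
  dd if=/dev/sdd of=/dev/sdh bs=1M conv=fsync status=progress
  dd if=/dev/sde of=/dev/sdi bs=1M conv=fsync status=progress
  dd if=/dev/sdf of=/dev/sdj bs=1M conv=fsync status=progress
  dd if=/dev/sdg of=/dev/sdk bs=1M conv=fsync status=progress

As the later replies point out, ddrescue is usually preferred for this step precisely because dd stops at the first unreadable sector.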
* Re: Recommendations needed for RAID5 recovery
  2016-06-24 20:44 ` Another Sillyname
@ 2016-06-24 21:37   ` John Stoffel
  2016-06-25 11:43     ` Wols Lists
  0 siblings, 1 reply; 9+ messages in thread
From: John Stoffel @ 2016-06-24 21:37 UTC (permalink / raw)
To: Another Sillyname; +Cc: Linux-RAID

Another> Before attempting any recovery can I suggest that you get 4 x
Another> 2TB drives and dd the current drives so you have a backup.

Not dd, dd_rescue instead. But yes, try to get new hardware and clone
all the suspect drives before you do anything else. Even just cloning
the most recently bad drive might be enough to get you going again.

John
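A minimal ddrescue sketch for cloning the most recently failed member (sdg) onto a fresh disk, assuming GNU ddrescue is installed and the new disk appears as /dev/sdh (a hypothetical name):

  # /dev/sdh is the new disk (hypothetical name)
  # first pass: copy everything that reads cleanly, skip the bad areas
  ddrescue -f -n /dev/sdg /dev/sdh /root/sdg.map
  # second pass: retry the bad areas a few times
  ddrescue -f -r3 /dev/sdg /dev/sdh /root/sdg.map

The map file records exactly which sectors never read back, which helps later when deciding which files not to trust.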
* Re: Recommendations needed for RAID5 recovery
  2016-06-24 21:37 ` John Stoffel
@ 2016-06-25 11:43   ` Wols Lists
  2016-06-25 16:49     ` Phil Turmel
  0 siblings, 1 reply; 9+ messages in thread
From: Wols Lists @ 2016-06-25 11:43 UTC (permalink / raw)
Cc: Linux-RAID

On 24/06/16 22:37, John Stoffel wrote:
>
> Another> Before attempting any recovery can I suggest that you get 4 x
> Another> 2TB drives and dd the current drives so you have a backup.
>
> Not dd, dd_rescue instead. But yes, try to get new hardware and clone
> all the suspect drives before you do anything else. Even just cloning
> the most recently bad drive might be enough to get you going again.

As I got told rather sharply :-) there's a big difference between dd and
ddrescue. IFF dd completes successfully (it'll bomb on an error) then
you have a "known good" copy. In other words, the problem with dd is
that it won't work on a bad drive, but if it does work you're home and dry.

ddrescue will ALWAYS work - but if it can't read a block it will leave
an empty block in the copy! This is a bomb waiting to go off! In other
words, ddrescue is great at recovering what you can from a damaged
filesystem - less so at recovering a disk with a complicated setup on top.

I know you're getting conflicting advice, but I'd try to get a good dd
backup first. I don't know of any utility that will do an md integrity
check on a ddrescue'd disk :-( so you'd need to do a fsck and hope ...

Oh - and make sure your new disks are proper raid - eg WD Red or Seagate
NAS. And are your current disks proper raid? If not, fix the timeout
problem and your life *may* be made a lot simpler ...

Have you got spare SATA ports? If not, go out and get an add-in card! If
you can force the array to assemble, and create a temporary six-drive
array (the two dud ones being assembled with the --replace option to
move them to two new ones), that may be your best bet at recovery. If md
can get a clean read from three drives for each block, then it'll be
able to rebuild the missing block.

Cheers,
Wol
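For reference, the timeout-mismatch workaround Wol alludes to (covered in detail in the readings Phil links in the next message) usually looks along these lines; it assumes desktop-class drives and would be applied to whichever members are still in use:

  # cap each drive's internal error recovery at 7 seconds, where supported
  for x in /dev/sd[defg] ; do smartctl -l scterc,70,70 $x ; done
  # for drives that do not support SCTERC, raise the kernel command timeout instead
  for x in sdd sde sdf sdg ; do echo 180 > /sys/block/$x/device/timeout ; done
  # neither setting survives a reboot, so they are normally reapplied from a boot script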
* Re: Recommendations needed for RAID5 recovery
  2016-06-25 11:43 ` Wols Lists
@ 2016-06-25 16:49   ` Phil Turmel
  2016-06-26 21:12     ` Wols Lists
  0 siblings, 1 reply; 9+ messages in thread
From: Phil Turmel @ 2016-06-25 16:49 UTC (permalink / raw)
To: Wols Lists, Peter Gebhard; +Cc: Linux-RAID, Another Sillyname, John Stoffel

Hi Wol, Peter,

{ Convention on kernel.org is to reply-to-all, bottom or interleave
replies, and trim unnecessary context. CC list fixed up accordingly. }

On 06/25/2016 07:43 AM, Wols Lists wrote:

> I know you're getting conflicting advice, but I'd try to get a good dd
> backup first. I don't know of any utility that will do an md integrity
> check on a ddrescue'd disk :-( so you'd need to do a fsck and hope ...

Conflicting advice indeed. More conflict ahead:

dd is totally useless for raid recovery in all cases. ddrescue may be
of use in this case:

If there is redundancy available for proper MD rewrite of UREs, you
want to run the original devices with the UREs, so they'll get fixed.
No need for dd.

If there's no redundancy available, then you have to fix the UREs
without knowing the correct content, and ddrescue will do that (putting
zeroes in the copy).

> Oh - and make sure your new disks are proper raid - eg WD Red or Seagate
> NAS. And are your current disks proper raid? If not, fix the timeout
> problem and your life *may* be made a lot simpler ...

Yes, timeout mismatch is a common problem and absolutely *must* be
addressed if you run a raid array. Some older posts of mine that help
explain the issue are linked below. If you'd like advice on the status
of your drives, post the output of:

for x in /dev/sd[defg] ; do echo $x ; smartctl -iA -l scterc $x ; done

> Have you got spare SATA ports? If not, go out and get an add-in card! If
> you can force the array to assemble, and create a temporary six-drive
> array (the two dud ones being assembled with the --replace option to
> move them to two new ones), that may be your best bet at recovery. If md
> can get a clean read from three drives for each block, then it'll be
> able to rebuild the missing block.

No. The first drive that dropped out did so more than a year ago -- its
content is totally untrustworthy. It is only suitable for wipe and
re-use if it is physically still OK.

Which means that the balance of the drives have no redundancy available
to reconstruct data for any UREs remaining in the array. If there were,
forced assembly of the originals after any timeout mismatch fixes would
be the correct solution. That would let the remaining redundancy fix
UREs while adding more redundancy (the #1 reason for choosing raid6
over raid5).

Peter, I strongly recommend that you perform a forced assembly on the
three drives, omitting the unit kicked out last year. (After fixing any
timeout issue, if any. Very likely, btw.) Mount the filesystem
read-only and back up the absolutely critical items. Do not use fsck
yet. You may encounter UREs that cause some of these copies to fail,
letting you know which files not to trust later. If you encounter
enough failures to drop the array again, simply repeat the forced
assembly and read-only mount and carry on.

When you've gotten all you can that way, shut down the array and use
ddrescue to duplicate all three drives. Take the originals out of the
box, and force assemble the new drives. Run fsck to fix any remaining
errors from zeroed blocks, then mount and back up anything else you need.
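A rough sketch of the forced assembly and read-only mount described above, using the member names from the mdadm output; the array name /dev/md0 and the mount point are assumptions:

  # /dev/md0 and /mnt/recovery are assumed names
  # stop any partially assembled array first (ignore the error if none exists)
  mdadm --stop /dev/md0
  # force-assemble the three most recently updated members, leaving sdd1 out
  mdadm --assemble --force /dev/md0 /dev/sde1 /dev/sdf1 /dev/sdg1
  # mount read-only and copy off the critical data before anything else
  mkdir -p /mnt/recovery
  mount -o ro /dev/md0 /mnt/recovery     # for ext3/ext4, 'ro,noload' also skips journal replay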
If you need to keep costs down, it would be fairly low risk to just
ddrescue the most recent failure onto the oldest (which will write over
any UREs it currently has). Then force assemble with it instead.

And add a drive to the array to get back to redundant operation.
Consider adding another drive after that and reshaping to raid6. If
your drives really are ok (timeout issue, not physical), then you could
re-use one or more of the originals to get back to full operation. Use
--zero-superblock on them to allow MD to use them again.

Phil

Readings for timeout mismatch: (whole threads if possible)

http://marc.info/?l=linux-raid&m=139050322510249&w=2
http://marc.info/?l=linux-raid&m=135863964624202&w=2
http://marc.info/?l=linux-raid&m=135811522817345&w=1
http://marc.info/?l=linux-raid&m=133761065622164&w=2
http://marc.info/?l=linux-raid&m=132477199207506
http://marc.info/?l=linux-raid&m=133665797115876&w=2
http://marc.info/?l=linux-raid&m=142487508806844&w=3
http://marc.info/?l=linux-raid&m=144535576302583&w=2
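The low-cost path described above, sketched with the same assumed names (/dev/md0 for the array; /dev/sdX1 is a placeholder for whichever wiped original or new partition gets added back):

  # copy the recently failed member's content over the long-stale one;
  # writing to sdd1 also forces its weak sectors to be rewritten or reallocated
  ddrescue -f /dev/sdg1 /dev/sdd1 /root/sdg1.map
  # sdd1 now carries sdg1's superblock and role, so assemble with it instead
  mdadm --assemble --force /dev/md0 /dev/sde1 /dev/sdf1 /dev/sdd1
  # wipe a known-good spare of its old metadata and add it to restore redundancy
  mdadm --zero-superblock /dev/sdX1      # /dev/sdX1 is a placeholder
  mdadm --add /dev/md0 /dev/sdX1
  # optionally, after adding one more spare, reshape to raid6
  # (mdadm may ask for a --backup-file during the critical section)
  mdadm --grow /dev/md0 --level=6 --raid-devices=5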
* Re: Recommendations needed for RAID5 recovery
  2016-06-25 16:49 ` Phil Turmel
@ 2016-06-26 21:12   ` Wols Lists
  2016-06-26 22:18     ` Phil Turmel
  0 siblings, 1 reply; 9+ messages in thread
From: Wols Lists @ 2016-06-26 21:12 UTC (permalink / raw)
To: Phil Turmel, Peter Gebhard; +Cc: Linux-RAID, Another Sillyname, John Stoffel

On 25/06/16 17:49, Phil Turmel wrote:
> Hi Wol, Peter,
>
> { Convention on kernel.org is to reply-to-all, bottom or interleave
> replies, and trim unnecessary context. CC list fixed up accordingly. }

Sorry, but the OP had already been trimmed, I trimmed a bit further...

> On 06/25/2016 07:43 AM, Wols Lists wrote:
>
>> I know you're getting conflicting advice, but I'd try to get a good dd
>> backup first. I don't know of any utility that will do an md integrity
>> check on a ddrescue'd disk :-( so you'd need to do a fsck and hope ...
>
> Conflicting advice indeed. More conflict ahead:
>
> dd is totally useless for raid recovery in all cases. ddrescue may be
> of use in this case:

And if dd gets a copy without errors, what's the difference between that
and a ddrescue? Surely they're identical?

That said, it struck me you're probably better off using ddrescue,
because ddrescue could get that copy in one. So if you can get it in
one, it doesn't matter which you use, so you should use ddrescue because
it saves a wasted attempt with dd. (I've just read the ddrescue man
page. Recommended reading ... :-)

> [...]

Hmm...

Would it be an idea to get 4 by 3TB drives? That way he can do the
backup straight on to a RAID6 array, and IF he gets a successful backup
then the old drives are now redundant, for backups or whatever (3TB Reds
or NASs are about £100 each...)

Cheers,
Wol
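A sketch of the 4 x 3TB idea, assuming the new drives appear as /dev/sdh through /dev/sdk with one partition each (hypothetical names), and that the degraded array is already mounted read-only somewhere such as /mnt/recovery:

  # /dev/sdh1..sdk1 and the mount points are assumed names
  # build a fresh 4-drive raid6 to receive the backup
  mdadm --create /dev/md1 --level=6 --raid-devices=4 /dev/sdh1 /dev/sdi1 /dev/sdj1 /dev/sdk1
  mkfs.ext4 /dev/md1
  mkdir -p /mnt/backup
  mount /dev/md1 /mnt/backup
  # copy everything readable off the old array
  rsync -aHAX /mnt/recovery/ /mnt/backup/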
* Re: Recommendations needed for RAID5 recovery
  2016-06-26 21:12 ` Wols Lists
@ 2016-06-26 22:18   ` Phil Turmel
  2016-06-27  9:06     ` Andreas Klauer
  0 siblings, 1 reply; 9+ messages in thread
From: Phil Turmel @ 2016-06-26 22:18 UTC (permalink / raw)
To: Wols Lists, Peter Gebhard; +Cc: Linux-RAID, Another Sillyname, John Stoffel

On 06/26/2016 05:12 PM, Wols Lists wrote:
> On 25/06/16 17:49, Phil Turmel wrote:
>> dd is totally useless for raid recovery in all cases. ddrescue may be
>> of use in this case:
>
> And if dd gets a copy without errors, what's the difference between that
> and a ddrescue? Surely they're identical?
>
> That said, it struck me you're probably better off using ddrescue,
> because ddrescue could get that copy in one. So if you can get it in
> one, it doesn't matter which you use, so you should use ddrescue because
> it saves a wasted attempt with dd. (I've just read the ddrescue man
> page. Recommended reading ... :-)

dd will only copy a device that has no UREs. If it has no UREs, it'll
work in the raid array even if there's no redundancy. So duplicating
that device is pointless for pure recovery purposes.

If you already know you have one or more UREs on a device and no other
redundancy to reconstruct with, you go straight to ddrescue -- you know
that dd won't work, so why bother? And in this case we know -- the
drive was kicked out of the array. The OP hasn't provided any further
information, so we don't know if this is the typical timeout mismatch
calamity, or if there's more serious problems, but that doesn't change
the advice.

Finally, if you know you have UREs *and* you still have redundancy
(raid6 or raid10,n3 degraded by one, f.e.), you want to keep the drive
with the UREs in place until replaced so MD can perform reconstruction
when it hits that spot. Again, no role for dd.

Now, when the problem isn't just an array that needs to be reassembled
but rather a true reconstruction from unknown parameters, the duplicate
devices are needed because the experiments to discover the parameters
can be destructive. In this case we need complete copies for the
experiments, whether UREs are present or not. ddrescue again. (Knowing
there are UREs makes it important to use the original devices during
recreation after the experiments are done.)

I don't know *any* scenario where dd is useful for raid recovery.

> Hmm...
>
> Would it be an idea to get 4 by 3TB drives? That way he can do the
> backup straight on to a RAID6 array, and IF he gets a successful backup
> then the old drives are now redundant, for backups or whatever (3TB Reds
> or NASs are about £100 each...)

Many hobbyists and small business people have limited budgets. (Been
there!) I try to avoid recommending expensive replacements unless the
situation clearly calls for it. I specifically omitted any suggestion
of where to perform backups since it may make sense for the OP to just
copy important stuff onto an external USB device.

It's entirely possible that the OP is researching timeout mismatch and
patching up some green drives. It's not ideal, but such an array can
operate with the work-arounds for years if it's regularly scrubbed.
And it's cheap, which sometimes matters.

Phil
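Since regular scrubbing is mentioned above, for completeness: a scrub can be started by hand roughly like this (the array name md0 is an assumption; many distributions also ship a cron job or systemd timer that does it on a schedule):

  # request a full consistency check of the array
  echo check > /sys/block/md0/md/sync_action
  cat /proc/mdstat        # watch progress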
* Re: Recommendations needed for RAID5 recovery
  2016-06-26 22:18 ` Phil Turmel
@ 2016-06-27  9:06   ` Andreas Klauer
  2016-06-27 10:05     ` Mikael Abrahamsson
  0 siblings, 1 reply; 9+ messages in thread
From: Andreas Klauer @ 2016-06-27 9:06 UTC (permalink / raw)
To: Phil Turmel
Cc: Wols Lists, Peter Gebhard, Linux-RAID, Another Sillyname, John Stoffel

On Sun, Jun 26, 2016 at 06:18:04PM -0400, Phil Turmel wrote:
> I don't know *any* scenario where dd is useful for raid recovery.

Basically you use dd to make a copy to play around with, but with
overlays (as described in the RAID wiki) you don't need those copies
anymore [as long as the device is intact].

There is conv=noerror,sync, however I found recently that it actually
produces corrupt copies (it changes offsets); I'm still not sure what
conv=noerror,sync is actually supposed to be for.
( http://superuser.com/a/1075837/195171 )

So you shouldn't use dd on bad disks...

I've had my fair share of issues with ddrescue as well; it likes to get
stuck forever and not continue. --min-read-rate=<bytes> might help in
such a case (better to copy regions that still work quickly rather than
spending hours on error correction ...).

ddrescue produces a log (or a map) that records bad regions, so if you
somehow need a device that still produces read errors for those, it can
probably be magicked with the device mapper...

Regards
Andreas Klauer
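The overlay technique Andreas mentions (described on the RAID wiki) can be sketched roughly like this for a single member: reads come from the real device while all writes land in a sparse file, so experiments stay reversible. The file size and names below are illustrative only:

  # sparse file to absorb writes (size is an assumption; grow it if needed)
  truncate -s 10G /tmp/overlay-sde1.img
  loop=$(losetup -f --show /tmp/overlay-sde1.img)
  size=$(blockdev --getsz /dev/sde1)
  # present /dev/mapper/overlay-sde1: reads hit sde1, writes go to the sparse file
  echo "0 $size snapshot /dev/sde1 $loop P 8" | dmsetup create overlay-sde1
  # then run mdadm experiments against /dev/mapper/overlay-sde1 instead of /dev/sde1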
* Re: Recommendations needed for RAID5 recovery
  2016-06-27  9:06 ` Andreas Klauer
@ 2016-06-27 10:05   ` Mikael Abrahamsson
  0 siblings, 0 replies; 9+ messages in thread
From: Mikael Abrahamsson @ 2016-06-27 10:05 UTC (permalink / raw)
To: Andreas Klauer; +Cc: Linux-RAID

On Mon, 27 Jun 2016, Andreas Klauer wrote:

> So you shouldn't use dd on bad disks...

There is no downside to using ddrescue that I can see. If everything
works ok, then you get a zero-error copy, and you get the exact same
result as with dd. If you did have errors, then ddrescue will make the
best of the situation and copy as much as possible from the drive. You
also get a list of the errored sectors that you can use for future
reference.

--
Mikael Abrahamsson    email: swmike@swm.pp.se
end of thread, other threads:[~2016-06-27 10:05 UTC | newest]

Thread overview: 9+ messages -- links below jump to the message on this page:
2016-06-24 19:55 Recommendations needed for RAID5 recovery Peter Gebhard
2016-06-24 20:44 ` Another Sillyname
2016-06-24 21:37   ` John Stoffel
2016-06-25 11:43     ` Wols Lists
2016-06-25 16:49       ` Phil Turmel
2016-06-26 21:12         ` Wols Lists
2016-06-26 22:18           ` Phil Turmel
2016-06-27  9:06             ` Andreas Klauer
2016-06-27 10:05               ` Mikael Abrahamsson