* Recommendations needed for RAID5 recovery
@ 2016-06-24 19:55 Peter Gebhard
2016-06-24 20:44 ` Another Sillyname
0 siblings, 1 reply; 9+ messages in thread
From: Peter Gebhard @ 2016-06-24 19:55 UTC (permalink / raw)
To: linux-raid
Hello,
I have been asked to attempt data recovery on a RAID5 array which appears to have had two disk failures (in an array of four disks). I would be grateful if anyone on this list could offer recommendations for my next steps. I have provided below the current state of the array, per https://raid.wiki.kernel.org/index.php/RAID_Recovery.
It appears from the output below that one of the disks (sdd1) failed last year and the admin did not notice this. Now, it appears a second disk (sdg1) has recently had read errors and was kicked out of the array.
Should I try to restore the array using the recreate_array.pl script provided on the RAID_Recovery site? Should I then try to recreate the array and/or perform ‘fsck’?
Thank you greatly in advance!
raid.status:
/dev/sdd1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : bfb03f95:834b520d:60f773d8:ecb6b9e3
Name : <->:0
Creation Time : Tue Nov 29 17:33:39 2011
Raid Level : raid5
Raid Devices : 4
Avail Dev Size : 3907024896 (1863.01 GiB 2000.40 GB)
Array Size : 5860535808 (5589.04 GiB 6001.19 GB)
Used Dev Size : 3907023872 (1863.01 GiB 2000.40 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : ec1a6336:0a991298:5b409bf1:4585ccbe
Update Time : Sun Jun 7 02:28:00 2015
Checksum : f9323080 - correct
Events : 96203
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 0
Array State : AAAA ('A' == active, '.' == missing)
/dev/sde1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : bfb03f95:834b520d:60f773d8:ecb6b9e3
Name : <->:0
Creation Time : Tue Nov 29 17:33:39 2011
Raid Level : raid5
Raid Devices : 4
Avail Dev Size : 3907024896 (1863.01 GiB 2000.40 GB)
Array Size : 5860535808 (5589.04 GiB 6001.19 GB)
Used Dev Size : 3907023872 (1863.01 GiB 2000.40 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : ed13b045:ca75ab96:83045f97:e4fd62cb
Update Time : Sun Jun 19 19:43:31 2016
Checksum : bb6a905f - correct
Events : 344993
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 3
Array State : .A.A ('A' == active, '.' == missing)
/dev/sdf1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : bfb03f95:834b520d:60f773d8:ecb6b9e3
Name : <->:0
Creation Time : Tue Nov 29 17:33:39 2011
Raid Level : raid5
Raid Devices : 4
Avail Dev Size : 3907024896 (1863.01 GiB 2000.40 GB)
Array Size : 5860535808 (5589.04 GiB 6001.19 GB)
Used Dev Size : 3907023872 (1863.01 GiB 2000.40 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : a1ea11e0:6465fe26:483f133d:680014b3
Update Time : Sun Jun 19 19:43:31 2016
Checksum : 738493f3 - correct
Events : 344993
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 1
Array State : .A.A ('A' == active, '.' == missing)
/dev/sdg1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : bfb03f95:834b520d:60f773d8:ecb6b9e3
Name : <->:0
Creation Time : Tue Nov 29 17:33:39 2011
Raid Level : raid5
Raid Devices : 4
Avail Dev Size : 3907024896 (1863.01 GiB 2000.40 GB)
Array Size : 5860535808 (5589.04 GiB 6001.19 GB)
Used Dev Size : 3907023872 (1863.01 GiB 2000.40 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : active
Device UUID : a9737439:17f81210:484d4f4c:c3d34a8a
Update Time : Sun Jun 19 12:18:49 2016
Checksum : 9c6d24bf - correct
Events : 343949
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 2
Array State : .AAA ('A' == active, '.' == missing)
* Re: Recommendations needed for RAID5 recovery
  2016-06-24 19:55 Recommendations needed for RAID5 recovery Peter Gebhard
@ 2016-06-24 20:44 ` Another Sillyname
  2016-06-24 21:37   ` John Stoffel
  0 siblings, 1 reply; 9+ messages in thread
From: Another Sillyname @ 2016-06-24 20:44 UTC (permalink / raw)
To: Linux-RAID

Peter

Before attempting any recovery can I suggest that you get 4 x 2TB
drives and dd the current drives so you have a backup.

Then you can begin to think about performing the raid recovery in the
knowledge you have a fallback position if it blows up.

Regards

Tony

On 24 June 2016 at 20:55, Peter Gebhard <pgeb@seas.upenn.edu> wrote:
> [...]
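A rough sketch of this cloning step, assuming the four members are /dev/sdd through /dev/sdg and the four fresh 2TB drives show up as /dev/sdh through /dev/sdk (hypothetical names -- verify with lsblk before copying anything):

  # /dev/sdh..sdk are the new disks (hypothetical names)
  # GNU dd aborts at the first read error, so a clean run means the
  # source had no unreadable sectors
  dd if=/dev/sdd of=/dev/sdh bs=1M conv=fsync status=progress
  dd if=/dev/sde of=/dev/sdi bs=1M conv=fsync status=progress
  dd if=/dev/sdf of=/dev/sdj bs=1M conv=fsync status=progress
  dd if=/dev/sdg of=/dev/sdk bs=1M conv=fsync status=progress

As the later replies point out, ddrescue is usually preferred for this step precisely because dd stops at the first unreadable sector.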
* Re: Recommendations needed for RAID5 recovery
  2016-06-24 20:44 ` Another Sillyname
@ 2016-06-24 21:37   ` John Stoffel
  2016-06-25 11:43     ` Wols Lists
  0 siblings, 1 reply; 9+ messages in thread
From: John Stoffel @ 2016-06-24 21:37 UTC (permalink / raw)
To: Another Sillyname; +Cc: Linux-RAID

Another> Before attempting any recovery can I suggest that you get 4 x
Another> 2TB drives and dd the current drives so you have a backup.

Not dd, dd_rescue instead. But yes, try to get new hardware and clone
all the suspect drives before you do anything else. Even just cloning
the most recently bad drive might be enough to get you going again.

John
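A minimal ddrescue sketch for cloning the most recently failed member (sdg) onto a fresh disk, assuming GNU ddrescue is installed and the new disk appears as /dev/sdh (a hypothetical name):

  # /dev/sdh is the new disk (hypothetical name)
  # first pass: copy everything that reads cleanly, skip the bad areas
  ddrescue -f -n /dev/sdg /dev/sdh /root/sdg.map
  # second pass: retry the bad areas a few times
  ddrescue -f -r3 /dev/sdg /dev/sdh /root/sdg.map

The map file records exactly which sectors never read back, which helps later when deciding which files not to trust.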
* Re: Recommendations needed for RAID5 recovery
  2016-06-24 21:37 ` John Stoffel
@ 2016-06-25 11:43   ` Wols Lists
  2016-06-25 16:49     ` Phil Turmel
  0 siblings, 1 reply; 9+ messages in thread
From: Wols Lists @ 2016-06-25 11:43 UTC (permalink / raw)
Cc: Linux-RAID

On 24/06/16 22:37, John Stoffel wrote:
>
> Another> Before attempting any recovery can I suggest that you get 4 x
> Another> 2TB drives and dd the current drives so you have a backup.
>
> Not dd, dd_rescue instead. But yes, try to get new hardware and clone
> all the suspect drives before you do anything else. Even just cloning
> the most recently bad drive might be enough to get you going again.

As I got told rather sharply :-) there's a big difference between dd and
ddrescue. IFF dd completes successfully (it'll bomb on an error) then
you have a "known good" copy. In other words, the problem with dd is
that it won't work on a bad drive, but if it does work you're home and dry.

ddrescue will ALWAYS work - but if it can't read a block it will leave
an empty block in the copy! This is a bomb waiting to go off! In other
words, ddrescue is great at recovering what you can from a damaged
filesystem - less so at recovering a disk with a complicated setup on top.

I know you're getting conflicting advice, but I'd try to get a good dd
backup first. I don't know of any utility that will do an md integrity
check on a ddrescue'd disk :-( so you'd need to do a fsck and hope ...

Oh - and make sure your new disks are proper raid - eg WD Red or Seagate
NAS. And are your current disks proper raid? If not, fix the timeout
problem and your life *may* be made a lot simpler ...

Have you got spare SATA ports? If not, go out and get an add-in card! If
you can force the array to assemble, and create a temporary six-drive
array (the two dud ones being assembled with the --replace option to
move them to two new ones), that may be your best bet at recovery. If md
can get a clean read from three drives for each block, then it'll be
able to rebuild the missing block.

Cheers,
Wol
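For reference, the timeout-mismatch workaround Wol alludes to (covered in detail in the readings Phil links in the next message) usually looks along these lines; it assumes desktop-class drives and would be applied to whichever members are still in use:

  # cap each drive's internal error recovery at 7 seconds, where supported
  for x in /dev/sd[defg] ; do smartctl -l scterc,70,70 $x ; done
  # for drives that do not support SCTERC, raise the kernel command timeout instead
  for x in sdd sde sdf sdg ; do echo 180 > /sys/block/$x/device/timeout ; done
  # neither setting survives a reboot, so they are normally reapplied from a boot script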
* Re: Recommendations needed for RAID5 recovery
  2016-06-25 11:43 ` Wols Lists
@ 2016-06-25 16:49   ` Phil Turmel
  2016-06-26 21:12     ` Wols Lists
  0 siblings, 1 reply; 9+ messages in thread
From: Phil Turmel @ 2016-06-25 16:49 UTC (permalink / raw)
To: Wols Lists, Peter Gebhard; +Cc: Linux-RAID, Another Sillyname, John Stoffel

Hi Wol, Peter,

{ Convention on kernel.org is to reply-to-all, bottom or interleave
replies, and trim unnecessary context. CC list fixed up accordingly. }

On 06/25/2016 07:43 AM, Wols Lists wrote:

> I know you're getting conflicting advice, but I'd try to get a good dd
> backup first. I don't know of any utility that will do an md integrity
> check on a ddrescue'd disk :-( so you'd need to do a fsck and hope ...

Conflicting advice indeed. More conflict ahead:

dd is totally useless for raid recovery in all cases. ddrescue may be
of use in this case:

If there is redundancy available for proper MD rewrite of UREs, you
want to run the original devices with the UREs, so they'll get fixed.
No need for dd.

If there's no redundancy available, then you have to fix the UREs
without knowing the correct content, and ddrescue will do that (putting
zeroes in the copy).

> Oh - and make sure your new disks are proper raid - eg WD Red or Seagate
> NAS. And are your current disks proper raid? If not, fix the timeout
> problem and your life *may* be made a lot simpler ...

Yes, timeout mismatch is a common problem and absolutely *must* be
addressed if you run a raid array. Some older posts of mine that help
explain the issue are linked below. If you'd like advice on the status
of your drives, post the output of:

for x in /dev/sd[defg] ; do echo $x ; smartctl -iA -l scterc $x ; done

> Have you got spare SATA ports? If not, go out and get an add-in card! If
> you can force the array to assemble, and create a temporary six-drive
> array (the two dud ones being assembled with the --replace option to
> move them to two new ones), that may be your best bet at recovery. If md
> can get a clean read from three drives for each block, then it'll be
> able to rebuild the missing block.

No. The first drive that dropped out did so more than a year ago -- its
content is totally untrustworthy. It is only suitable for wipe and
re-use if it is physically still OK.

Which means that the balance of the drives have no redundancy available
to reconstruct data for any UREs remaining in the array. If there were,
forced assembly of the originals after any timeout mismatch fixes would
be the correct solution. That would let the remaining redundancy fix
UREs while adding more redundancy (the #1 reason for choosing raid6
over raid5).

Peter, I strongly recommend that you perform a forced assembly on the
three drives, omitting the unit kicked out last year. (After fixing any
timeout issue, if any. Very likely, btw.) Mount the filesystem
read-only and back up the absolutely critical items. Do not use fsck
yet. You may encounter UREs that cause some of these copies to fail,
letting you know which files not to trust later. If you encounter
enough failures to drop the array again, simply repeat the forced
assembly and read-only mount and carry on.

When you've gotten all you can that way, shut down the array and use
ddrescue to duplicate all three drives. Take the originals out of the
box, and force assemble the new drives. Run fsck to fix any remaining
errors from zeroed blocks, then mount and back up anything else you need.
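A rough sketch of the forced assembly and read-only mount described above, using the member names from the mdadm output; the array name /dev/md0 and the mount point are assumptions:

  # /dev/md0 and /mnt/recovery are assumed names
  # stop any partially assembled array first (ignore the error if none exists)
  mdadm --stop /dev/md0
  # force-assemble the three most recently updated members, leaving sdd1 out
  mdadm --assemble --force /dev/md0 /dev/sde1 /dev/sdf1 /dev/sdg1
  # mount read-only and copy off the critical data before anything else
  mkdir -p /mnt/recovery
  mount -o ro /dev/md0 /mnt/recovery     # for ext3/ext4, 'ro,noload' also skips journal replay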
If you need to keep costs down, it would be fairly low risk to just
ddrescue the most recent failure onto the oldest (which will write over
any UREs it currently has). Then force assemble with it instead.

And add a drive to the array to get back to redundant operation.
Consider adding another drive after that and reshaping to raid6. If
your drives really are ok (timeout issue, not physical), then you could
re-use one or more of the originals to get back to full operation. Use
--zero-superblock on them to allow MD to use them again.

Phil

Readings for timeout mismatch: (whole threads if possible)

http://marc.info/?l=linux-raid&m=139050322510249&w=2
http://marc.info/?l=linux-raid&m=135863964624202&w=2
http://marc.info/?l=linux-raid&m=135811522817345&w=1
http://marc.info/?l=linux-raid&m=133761065622164&w=2
http://marc.info/?l=linux-raid&m=132477199207506
http://marc.info/?l=linux-raid&m=133665797115876&w=2
http://marc.info/?l=linux-raid&m=142487508806844&w=3
http://marc.info/?l=linux-raid&m=144535576302583&w=2
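The low-cost path described above, sketched with the same assumed names (/dev/md0 for the array; /dev/sdX1 is a placeholder for whichever wiped original or new partition gets added back):

  # copy the recently failed member's content over the long-stale one;
  # writing to sdd1 also forces its weak sectors to be rewritten or reallocated
  ddrescue -f /dev/sdg1 /dev/sdd1 /root/sdg1.map
  # sdd1 now carries sdg1's superblock and role, so assemble with it instead
  mdadm --assemble --force /dev/md0 /dev/sde1 /dev/sdf1 /dev/sdd1
  # wipe a known-good spare of its old metadata and add it to restore redundancy
  mdadm --zero-superblock /dev/sdX1      # /dev/sdX1 is a placeholder
  mdadm --add /dev/md0 /dev/sdX1
  # optionally, after adding one more spare, reshape to raid6
  # (mdadm may ask for a --backup-file during the critical section)
  mdadm --grow /dev/md0 --level=6 --raid-devices=5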
* Re: Recommendations needed for RAID5 recovery
  2016-06-25 16:49 ` Phil Turmel
@ 2016-06-26 21:12   ` Wols Lists
  2016-06-26 22:18     ` Phil Turmel
  0 siblings, 1 reply; 9+ messages in thread
From: Wols Lists @ 2016-06-26 21:12 UTC (permalink / raw)
To: Phil Turmel, Peter Gebhard; +Cc: Linux-RAID, Another Sillyname, John Stoffel

On 25/06/16 17:49, Phil Turmel wrote:
> Hi Wol, Peter,
>
> { Convention on kernel.org is to reply-to-all, bottom or interleave
> replies, and trim unnecessary context. CC list fixed up accordingly. }

Sorry, but the OP had already been trimmed, I trimmed a bit further...

> On 06/25/2016 07:43 AM, Wols Lists wrote:
>
>> I know you're getting conflicting advice, but I'd try to get a good dd
>> backup first. I don't know of any utility that will do an md integrity
>> check on a ddrescue'd disk :-( so you'd need to do a fsck and hope ...
>
> Conflicting advice indeed. More conflict ahead:
>
> dd is totally useless for raid recovery in all cases. ddrescue may be
> of use in this case:

And if dd gets a copy without errors, what's the difference between that
and a ddrescue? Surely they're identical?

That said, it struck me you're probably better off using ddrescue,
because ddrescue could get that copy in one. So if you can get it in
one, it doesn't matter which you use, so you should use ddrescue because
it saves a wasted attempt with dd. (I've just read the ddrescue man
page. Recommended reading ... :-)

> [...]

Hmm...

Would it be an idea to get 4 by 3TB drives? That way he can do the
backup straight on to a RAID6 array, and IF he gets a successful backup
then the old drives are now redundant, for backups or whatever (3TB Reds
or NASs are about £100 each...)

Cheers,
Wol
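A sketch of the 4 x 3TB idea, assuming the new drives appear as /dev/sdh through /dev/sdk with one partition each (hypothetical names), and that the degraded array is already mounted read-only somewhere such as /mnt/recovery:

  # /dev/sdh1..sdk1 and the mount points are assumed names
  # build a fresh 4-drive raid6 to receive the backup
  mdadm --create /dev/md1 --level=6 --raid-devices=4 /dev/sdh1 /dev/sdi1 /dev/sdj1 /dev/sdk1
  mkfs.ext4 /dev/md1
  mkdir -p /mnt/backup
  mount /dev/md1 /mnt/backup
  # copy everything readable off the old array
  rsync -aHAX /mnt/recovery/ /mnt/backup/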
* Re: Recommendations needed for RAID5 recovery
  2016-06-26 21:12 ` Wols Lists
@ 2016-06-26 22:18   ` Phil Turmel
  2016-06-27  9:06     ` Andreas Klauer
  0 siblings, 1 reply; 9+ messages in thread
From: Phil Turmel @ 2016-06-26 22:18 UTC (permalink / raw)
To: Wols Lists, Peter Gebhard; +Cc: Linux-RAID, Another Sillyname, John Stoffel

On 06/26/2016 05:12 PM, Wols Lists wrote:
> On 25/06/16 17:49, Phil Turmel wrote:
>> dd is totally useless for raid recovery in all cases. ddrescue may be
>> of use in this case:
>
> And if dd gets a copy without errors, what's the difference between that
> and a ddrescue? Surely they're identical?
>
> That said, it struck me you're probably better off using ddrescue,
> because ddrescue could get that copy in one. So if you can get it in
> one, it doesn't matter which you use, so you should use ddrescue because
> it saves a wasted attempt with dd. (I've just read the ddrescue man
> page. Recommended reading ... :-)

dd will only copy a device that has no UREs. If it has no UREs, it'll
work in the raid array even if there's no redundancy. So duplicating
that device is pointless for pure recovery purposes.

If you already know you have one or more UREs on a device and no other
redundancy to reconstruct with, you go straight to ddrescue -- you know
that dd won't work, so why bother? And in this case we know -- the
drive was kicked out of the array. The OP hasn't provided any further
information, so we don't know if this is the typical timeout mismatch
calamity, or if there's more serious problems, but that doesn't change
the advice.

Finally, if you know you have UREs *and* you still have redundancy
(raid6 or raid10,n3 degraded by one, f.e.), you want to keep the drive
with the UREs in place until replaced so MD can perform reconstruction
when it hits that spot. Again, no role for dd.

Now, when the problem isn't just an array that needs to be reassembled
but rather a true reconstruction from unknown parameters, the duplicate
devices are needed because the experiments to discover the parameters
can be destructive. In this case we need complete copies for the
experiments, whether UREs are present or not. ddrescue again. (Knowing
there are UREs makes it important to use the original devices during
recreation after the experiments are done.)

I don't know *any* scenario where dd is useful for raid recovery.

> Hmm...
>
> Would it be an idea to get 4 by 3TB drives? That way he can do the
> backup straight on to a RAID6 array, and IF he gets a successful backup
> then the old drives are now redundant, for backups or whatever (3TB Reds
> or NASs are about £100 each...)

Many hobbyists and small business people have limited budgets. (Been
there!) I try to avoid recommending expensive replacements unless the
situation clearly calls for it. I specifically omitted any suggestion
of where to perform backups since it may make sense for the OP to just
copy important stuff onto an external USB device.

It's entirely possible that the OP is researching timeout mismatch and
patching up some green drives. It's not ideal, but such an array can
operate with the work-arounds for years if it's regularly scrubbed.
And it's cheap, which sometimes matters.

Phil
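Since regular scrubbing is mentioned above, for completeness: a scrub can be started by hand roughly like this (the array name md0 is an assumption; many distributions also ship a cron job or systemd timer that does it on a schedule):

  # request a full consistency check of the array
  echo check > /sys/block/md0/md/sync_action
  cat /proc/mdstat        # watch progress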
* Re: Recommendations needed for RAID5 recovery
  2016-06-26 22:18 ` Phil Turmel
@ 2016-06-27  9:06   ` Andreas Klauer
  2016-06-27 10:05     ` Mikael Abrahamsson
  0 siblings, 1 reply; 9+ messages in thread
From: Andreas Klauer @ 2016-06-27 9:06 UTC (permalink / raw)
To: Phil Turmel
Cc: Wols Lists, Peter Gebhard, Linux-RAID, Another Sillyname, John Stoffel

On Sun, Jun 26, 2016 at 06:18:04PM -0400, Phil Turmel wrote:
> I don't know *any* scenario where dd is useful for raid recovery.

Basically you use dd to make a copy to play around with, but with
overlays (as described in the RAID wiki) you don't need those copies
anymore [as long as the device is intact].

There is conv=noerror,sync, however I found recently that it actually
produces corrupt copies (it changes offsets); I'm still not sure what
conv=noerror,sync is actually supposed to be for.
( http://superuser.com/a/1075837/195171 )

So you shouldn't use dd on bad disks...

I've had my fair share of issues with ddrescue as well; it likes to get
stuck forever and not continue. --min-read-rate=<bytes> might help in
such a case (better to copy regions that still work quickly rather than
spending hours on error correction ...).

ddrescue produces a log (or a map) that records bad regions, so if you
somehow need a device that still produces read errors for those, it can
probably be magicked with the device mapper...

Regards
Andreas Klauer
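The overlay technique Andreas mentions (described on the RAID wiki) can be sketched roughly like this for a single member: reads come from the real device while all writes land in a sparse file, so experiments stay reversible. The file size and names below are illustrative only:

  # sparse file to absorb writes (size is an assumption; grow it if needed)
  truncate -s 10G /tmp/overlay-sde1.img
  loop=$(losetup -f --show /tmp/overlay-sde1.img)
  size=$(blockdev --getsz /dev/sde1)
  # present /dev/mapper/overlay-sde1: reads hit sde1, writes go to the sparse file
  echo "0 $size snapshot /dev/sde1 $loop P 8" | dmsetup create overlay-sde1
  # then run mdadm experiments against /dev/mapper/overlay-sde1 instead of /dev/sde1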
* Re: Recommendations needed for RAID5 recovery
  2016-06-27  9:06 ` Andreas Klauer
@ 2016-06-27 10:05   ` Mikael Abrahamsson
  0 siblings, 0 replies; 9+ messages in thread
From: Mikael Abrahamsson @ 2016-06-27 10:05 UTC (permalink / raw)
To: Andreas Klauer; +Cc: Linux-RAID

On Mon, 27 Jun 2016, Andreas Klauer wrote:

> So you shouldn't use dd on bad disks...

There is no downside to using ddrescue that I can see. If everything
works ok, then you get a zero-error copy, and you get the exact same
result as with dd. If you did have errors, then ddrescue will make the
best of the situation and copy as much as possible from the drive. You
also get a list of the errored sectors that you can use for future
reference.

--
Mikael Abrahamsson    email: swmike@swm.pp.se
end of thread, other threads:[~2016-06-27 10:05 UTC | newest]

Thread overview: 9+ messages -- links below jump to the message on this page:
2016-06-24 19:55 Recommendations needed for RAID5 recovery Peter Gebhard
2016-06-24 20:44 ` Another Sillyname
2016-06-24 21:37   ` John Stoffel
2016-06-25 11:43     ` Wols Lists
2016-06-25 16:49       ` Phil Turmel
2016-06-26 21:12         ` Wols Lists
2016-06-26 22:18           ` Phil Turmel
2016-06-27  9:06             ` Andreas Klauer
2016-06-27 10:05               ` Mikael Abrahamsson