* Recover from crash in RAID6 due to hardware failure
@ 2021-06-07 3:07 Carlos Maziero
From: Carlos Maziero @ 2021-06-07 3:07 UTC (permalink / raw)
To: linux-raid
Hi all,
My Synology NAS (a very old DS508) awoke this morning to a severe
disk failure: 3 out of 5 disks failed at once. The volume uses RAID 6,
which tolerates at most two failed disks, so I am not able to recover
the data gracefully with the standard Synology tools. I disassembled
the box, cleaned the SATA slots and all the electrical plugs/sockets,
and the disks are now recognized again, but the RAID volume shows up
as crashed and its data is not accessible.
The dmesg log shows this:
[ 38.762338] md: md2 stopped.
[ 38.799568] md: bind<sda3>
[ 38.802723] md: bind<sdb3>
[ 38.807089] md: bind<sdc3>
[ 38.811295] md: bind<sdd3>
[ 38.816123] md: bind<sde3>
[ 38.818918] md: kicking non-fresh sdd3 from array!
[ 38.823766] md: unbind<sdd3>
[ 38.826649] md: export_rdev(sdd3)
[ 38.829978] md: kicking non-fresh sdc3 from array!
[ 38.834773] md: unbind<sdc3>
[ 38.837653] md: export_rdev(sdc3)
[ 38.840976] md: kicking non-fresh sdb3 from array!
[ 38.845770] md: unbind<sdb3>
[ 38.848650] md: export_rdev(sdb3)
[ 38.851973] md: kicking non-fresh sda3 from array!
[ 38.856767] md: unbind<sda3>
[ 38.859657] md: export_rdev(sda3)
[ 38.876922] raid5: device sde3 operational as raid disk 4
[ 38.883676] raid5: allocated 5262kB for md2
[ 38.899483] 4: w=1 pa=0 pr=5 m=2 a=2 r=5 op1=0 op2=0
[ 38.904454] raid5: raid level 6 set md2 active with 1 out of 5 devices, algorithm 2
[ 38.912141] RAID5 conf printout:
[ 38.915366] --- rd:5 wd:1
[ 38.918071] disk 4, o:1, dev:sde3
[ 38.921544] md2: detected capacity change from 0 to 8987271954432
[ 38.973111] md2: unknown partition table
Naively, I searched a bit and found that I could add the disks back to
the array, so I did this:
mdadm /dev/md2 --fail /dev/sda3 --remove /dev/sda3
mdadm /dev/md2 --add /dev/sda3
mdadm /dev/md2 --fail /dev/sdb3 --remove /dev/sdb3
mdadm /dev/md2 --add /dev/sdb3
mdadm /dev/md2 --fail /dev/sdc3 --remove /dev/sdc3
mdadm /dev/md2 --add /dev/sdc3
mdadm /dev/md2 --fail /dev/sdd3 --remove /dev/sdd3
mdadm /dev/md2 --add /dev/sdd3
However, the disks were added as spares and the volume remained
crashed. Now I'm afraid those commands have erased metadata and made
things worse... :-(
Is there a way to reconstruct the array and to recover its data, at
least partially? (The most relevant data was backed up, but there are
some TB of movies and music that I would be glad to recover). The NAS
was mostly used for file reading (serving media files to the local network).
Contents of /proc/mdstat (after the commands above):
Personalities : [raid1] [linear] [raid0] [raid10] [raid6] [raid5] [raid4]
md2 : active raid6 sda3[0](S) sdb3[1](S) sdc3[2](S) sdd3[3](S) sde3[4]
      8776632768 blocks super 1.2 level 6, 64k chunk, algorithm 2 [5/1] [____U]
md1 : active raid1 sda2[1] sdb2[2] sdc2[3] sdd2[0] sde2[4]
      2097088 blocks [5/5] [UUUUU]
md0 : active raid1 sda1[1] sdb1[2] sdc1[3] sdd1[0] sde1[4]
      2490176 blocks [5/5] [UUUUU]
unused devices: <none>
Output of mdadm --detail /dev/md2 (also after the commands above):
/dev/md2:
Version : 1.2
Creation Time : Sat Sep 5 12:46:57 2020
Raid Level : raid6
Array Size : 8776632768 (8370.05 GiB 8987.27 GB)
Used Dev Size : 2925544256 (2790.02 GiB 2995.76 GB)
Raid Devices : 5
Total Devices : 5
Persistence : Superblock is persistent
Update Time : Sun Jun 6 22:48:34 2021
State : clean, FAILED
Active Devices : 1
Working Devices : 5
Failed Devices : 0
Spare Devices : 4
Layout : left-symmetric
Chunk Size : 64K
Name : DiskStation:2 (local to host DiskStation)
UUID : dcb32778:c99523d8:2efdeb61:54303019
Events : 78
    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1       0        0        1      removed
       2       0        0        2      removed
       3       0        0        3      removed
       4       8       67        4      active sync   /dev/sde3

       0       8        3        -      spare   /dev/sda3
       1       8       19        -      spare   /dev/sdb3
       2       8       35        -      spare   /dev/sdc3
       3       8       51        -      spare   /dev/sdd3
Best regards and many thanks for any help!
Carlos
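
(A note for readers of the archive: when members are merely kicked as
"non-fresh", the usual first step would have been a forced assembly
rather than fail/remove/add, which turns the members into spares. A
sketch of what that would have looked like here, assuming the kicked
members' event counts were still close to sde3's:

mdadm --stop /dev/md2
mdadm --assemble --force /dev/md2 /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3 /dev/sde3

--force lets mdadm accept members whose event counters are slightly
stale instead of dropping them.)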

* Re: Recover from crash in RAID6 due to hardware failure
From: Leslie Rhorer @ 2021-06-15 1:33 UTC (permalink / raw)
To: Linux RAID

There is a fair chance you can recover the data by recreating the array:

mdadm -S /dev/md2
mdadm -C -f -e 1.2 -n 6 -c 64K --level=6 -p left-symmetric /dev/md2 /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3 /dev/sde3

On 6/8/2021 6:39 AM, Carlos Maziero wrote:
> On 07/06/2021 07:27, Leslie Rhorer wrote:
>> On 6/6/2021 10:07 PM, Carlos Maziero wrote:
>>
>>> However, the disks were added as spares and the volume remained
>>> crashed. Now I'm afraid that such commands have erased metadata and
>>> made things worse... :-(
>>
>> Yeah.  Did you at any time Examine the drives and save the output?
>>
>> mdadm -E /dev/sd[a-e]3
>>
>> If so, you have a little bit better chance.
>
> Yes, but I did it only after the failure. The output for all disks is
> attached to this message.
>
>>> Is there a way to reconstruct the array and to recover its data, at
>>> least partially?
>>
>> Maybe.  Do you know exactly which physical disk was in which RAID
>> position?  It seems likely the grouping was the same for the corrupted
>> array as for the other arrays, given the drives are partitioned.
>
> Yes, disk sda was in slot 1, and so on. I physically labelled all
> slots and disks.
>
>> First off, try:
>>
>> mdadm -E /dev/sde3 > /etc/mdadm/RAIDfix
>>
>> This should give you the details of the RAID array.  From this, you
>> should be able to re-create the array.  I would heartily recommend
>> getting some new drives and copying the data to them before
>> proceeding.  I would get a 12T drive and copy all of the partitions
>> to it:
>>
>> mkfs /dev/sdf (or mkfs /dev/sdf1)
>> mount /dev/sdf /mnt (or mount /dev/sdf1 /mnt)
>> ddrescue /dev/sda3 /mnt/drivea /tmp/tmpdrivea
>> ddrescue /dev/sdb3 /mnt/driveb /tmp/tmpdriveb
>> ddrescue /dev/sdc3 /mnt/drivec /tmp/tmpdrivec
>> ddrescue /dev/sdd3 /mnt/drived /tmp/tmpdrived
>> ddrescue /dev/sde3 /mnt/drivee /tmp/tmpdrivee
>>
>> You could skimp by getting an 8T drive, and then if drive e doesn't
>> fit, you could create the array without it, and you will be pretty
>> safe.  It's not what I would do, but if you are strapped for cash...
>
> OK, I will try to get a secondary disk for that and another computer,
> since the NAS has only 5 bays and I would need one more for doing such
> operations.
>
>>> Contents of /proc/mdstat (after the commands above):
>>>
>>> Personalities : [raid1] [linear] [raid0] [raid10] [raid6] [raid5] [raid4]
>>> md2 : active raid6 sda3[0](S) sdb3[1](S) sdc3[2](S) sdd3[3](S) sde3[4]
>>>       8776632768 blocks super 1.2 level 6, 64k chunk, algorithm 2 [5/1] [____U]
>>> md1 : active raid1 sda2[1] sdb2[2] sdc2[3] sdd2[0] sde2[4]
>>>       2097088 blocks [5/5] [UUUUU]
>>> md0 : active raid1 sda1[1] sdb1[2] sdc1[3] sdd1[0] sde1[4]
>>>       2490176 blocks [5/5] [UUUUU]
>>
>> There is something odd here.  You say the disks failed, but clearly
>> they are in decent shape.  The first and second partitions on all
>> drives appear to be good.  Did the system recover the RAID1 arrays?
>
> Apparently the failure was not in the disks, but in the NAS hardware.
> I opened it one week ago for a RAM upgrade (replaced the old 512M card
> by a 1GB one), and maybe the slot connecting the main board to the
> SATA board presented a connectivity problem (though the NAS OS said
> nothing about it). Anyway, I had 5 disks in a RAID 6 array and the
> logs showed 3 disks failing at the same time, which is quite unusual.
> This is the reason I believe the disks are physically ok.
>
> Thanks for your attention!
>
> Carlos

* Re: Recover from crash in RAID6 due to hardware failure
From: Leslie Rhorer @ 2021-06-15 1:36 UTC (permalink / raw)
To: Linux RAID

Oops!  'Sorry.  That should be:

mdadm -S /dev/md2
mdadm -C -f -e 1.2 -n 5 -c 64K --level=6 -p left-symmetric /dev/md2 /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3 /dev/sde3

You only have five disks, not six.
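
(Before running a create like the one above, it is worth checking the
intended parameters against the surviving superblocks. A small sketch,
assuming the earlier mdadm -E output is still at hand:

mdadm -E /dev/sde3 | grep -E 'Raid Level|Raid Devices|Layout|Chunk Size|Device Role'

The level, device count, layout, chunk size and each member's role must
match the --create options and device order exactly; otherwise the
recreated array will not line up with the old data.)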

* Re: Recover from crash in RAID6 due to hardware failure
From: Roman Mamedov @ 2021-06-15 7:46 UTC (permalink / raw)
To: Leslie Rhorer; +Cc: Linux RAID

On Mon, 14 Jun 2021 20:36:22 -0500 Leslie Rhorer <lesrhorer@att.net> wrote:

> Oops!  'Sorry.  That should be:
>
> mdadm -S /dev/md2
> mdadm -C -f -e 1.2 -n 5 -c 64K --level=6 -p left-symmetric /dev/md2
> /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3 /dev/sde3
>
> You only have five disks, not six.

No --assume-clean?

--
With respect,
Roman
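
(Roman's point: without --assume-clean, md starts an immediate resync
on the newly created array and rewrites parity; if the device order or
any --create parameter had been guessed wrong, that resync would
overwrite good data. A hedged variant of the command, as a sketch:

mdadm -C -f -e 1.2 -n 5 -c 64K --level=6 -p left-symmetric --assume-clean /dev/md2 /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3 /dev/sde3

With --assume-clean the array comes up without touching parity, so it
can be test-mounted read-only first and recreated again if the layout
guess was wrong.)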

* Re: Recover from crash in RAID6 due to hardware failure
From: Carlos Maziero @ 2021-06-15 11:28 UTC (permalink / raw)
To: Leslie Rhorer, Linux RAID

Hi Leslie, thanks for your suggestion! I succeeded in doing it,
although the path was a bit longer:

a) remove the logical volume (Synology creates one, and it prevents
stopping the array):

# ll /dev/mapper/
crw------- 1 root root  10,  59 Sep  5  2020 control
brw------- 1 root root 253,   0 Jun 10 11:57 vol1-origin
# dmsetup remove vol1-origin

b) stop the array:

# mdadm --stop /dev/md2
mdadm: stopped /dev/md2

c) recreate the array with the original layout:

# mdadm --verbose --create /dev/md2 --chunk=64 --level=6 --raid-devices=5 --metadata=1.2 /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3 /dev/sde3
mdadm: layout defaults to left-symmetric
mdadm: layout defaults to left-symmetric
mdadm: layout defaults to left-symmetric
mdadm: /dev/sda3 appears to be part of a raid array:
    level=raid6 devices=5 ctime=Sat Sep  5 12:46:57 2020
mdadm: layout defaults to left-symmetric
mdadm: /dev/sdb3 appears to be part of a raid array:
    level=raid6 devices=5 ctime=Sat Sep  5 12:46:57 2020
mdadm: layout defaults to left-symmetric
mdadm: /dev/sdc3 appears to be part of a raid array:
    level=raid6 devices=5 ctime=Sat Sep  5 12:46:57 2020
mdadm: layout defaults to left-symmetric
mdadm: /dev/sdd3 appears to be part of a raid array:
    level=raid6 devices=5 ctime=Sat Sep  5 12:46:57 2020
mdadm: layout defaults to left-symmetric
mdadm: /dev/sde3 appears to be part of a raid array:
    level=raid6 devices=5 ctime=Sat Sep  5 12:46:57 2020
mdadm: size set to 2925544256K
Continue creating array? yes
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md2 started.

d) check it:

# cat /proc/mdstat
Personalities : [raid1] [linear] [raid0] [raid10] [raid6] [raid5] [raid4]
md2 : active raid6 sde3[4] sdd3[3] sdc3[2] sdb3[1] sda3[0]
      8776632768 blocks super 1.2 level 6, 64k chunk, algorithm 2 [5/5] [UUUUU]
      [=>...................]  resync =  6.8% (199953972/2925544256) finish=2440.4min speed=18613K/sec
md1 : active raid1 sda2[1] sdb2[2] sdc2[3] sdd2[0] sde2[4]
      2097088 blocks [5/5] [UUUUU]
md0 : active raid1 sdc1[3] sdd1[0] sde1[4]
      2490176 blocks [5/3] [U__UU]
unused devices: <none>

After that, I fsck'ed and mounted it read-only, and now I'm happily
recovering my data... :-)

Thanks again!

Carlos

On 14/06/2021 22:36, Leslie Rhorer wrote:
> Oops!  'Sorry.  That should be:
>
> mdadm -S /dev/md2
> mdadm -C -f -e 1.2 -n 5 -c 64K --level=6 -p left-symmetric /dev/md2
> /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3 /dev/sde3
>
> You only have five disks, not six.
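
(The final check-and-mount step is only described in passing above;
spelled out as a sketch, with the device and filesystem type being
assumptions here, since on a Synology volume the filesystem may sit
behind LVM rather than directly on /dev/md2:

fsck -n /dev/md2
mount -o ro /dev/md2 /mnt

fsck -n reports problems without changing anything, and mounting
read-only keeps the freshly recreated array untouched until the data
has been copied off.)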