* Recover from crash in RAID6 due to hardware failure
@ 2021-06-07 3:07 Carlos Maziero
From: Carlos Maziero @ 2021-06-07 3:07 UTC (permalink / raw)
To: linux-raid
Hi all,
My Synology NAS (a very old DS508) awoke this morning to a severe
disk failure: 3 out of 5 disks failed at once. The volume uses RAID 6,
which tolerates at most two failed disks, so I am not able to recover
the data gracefully with the standard Synology tools. I disassembled
the box, cleaned the SATA slots and all the electrical plugs/sockets,
and the disks are now recognized again, but the RAID volume shows up
as crashed and its data is not accessible.
The dmesg log shows this:
[ 38.762338] md: md2 stopped.
[ 38.799568] md: bind<sda3>
[ 38.802723] md: bind<sdb3>
[ 38.807089] md: bind<sdc3>
[ 38.811295] md: bind<sdd3>
[ 38.816123] md: bind<sde3>
[ 38.818918] md: kicking non-fresh sdd3 from array!
[ 38.823766] md: unbind<sdd3>
[ 38.826649] md: export_rdev(sdd3)
[ 38.829978] md: kicking non-fresh sdc3 from array!
[ 38.834773] md: unbind<sdc3>
[ 38.837653] md: export_rdev(sdc3)
[ 38.840976] md: kicking non-fresh sdb3 from array!
[ 38.845770] md: unbind<sdb3>
[ 38.848650] md: export_rdev(sdb3)
[ 38.851973] md: kicking non-fresh sda3 from array!
[ 38.856767] md: unbind<sda3>
[ 38.859657] md: export_rdev(sda3)
[ 38.876922] raid5: device sde3 operational as raid disk 4
[ 38.883676] raid5: allocated 5262kB for md2
[ 38.899483] 4: w=1 pa=0 pr=5 m=2 a=2 r=5 op1=0 op2=0
[ 38.904454] raid5: raid level 6 set md2 active with 1 out of 5 devices, algorithm 2
[ 38.912141] RAID5 conf printout:
[ 38.915366] --- rd:5 wd:1
[ 38.918071] disk 4, o:1, dev:sde3
[ 38.921544] md2: detected capacity change from 0 to 8987271954432
[ 38.973111] md2: unknown partition table
Naively, I searched a bit and found that I could add the disks back to
the array, so I did this:
mdadm /dev/md2 --fail /dev/sda3 --remove /dev/sda3
mdadm /dev/md2 --add /dev/sda3
mdadm /dev/md2 --fail /dev/sdb3 --remove /dev/sdb3
mdadm /dev/md2 --add /dev/sdb3
mdadm /dev/md2 --fail /dev/sdc3 --remove /dev/sdc3
mdadm /dev/md2 --add /dev/sdc3
mdadm /dev/md2 --fail /dev/sdd3 --remove /dev/sdd3
mdadm /dev/md2 --add /dev/sdd3
However, the disks were added as spares and the volume remained
crashed. Now I'm afraid those commands have erased metadata and made
things worse... :-(
Is there a way to reconstruct the array and to recover its data, at
least partially? (The most relevant data was backed up, but there are
some TB of movies and music that I would be glad to recover). The NAS
was mostly used for file reading (serving media files to the local network).
Contents of /proc/mdstat (after the commands above):
Personalities : [raid1] [linear] [raid0] [raid10] [raid6] [raid5] [raid4]
md2 : active raid6 sda3[0](S) sdb3[1](S) sdc3[2](S) sdd3[3](S) sde3[4]
      8776632768 blocks super 1.2 level 6, 64k chunk, algorithm 2 [5/1] [____U]
md1 : active raid1 sda2[1] sdb2[2] sdc2[3] sdd2[0] sde2[4]
      2097088 blocks [5/5] [UUUUU]
md0 : active raid1 sda1[1] sdb1[2] sdc1[3] sdd1[0] sde1[4]
      2490176 blocks [5/5] [UUUUU]
unused devices: <none>
Output of mdadm --detail /dev/md2 (also after the commands above):
/dev/md2:
Version : 1.2
Creation Time : Sat Sep 5 12:46:57 2020
Raid Level : raid6
Array Size : 8776632768 (8370.05 GiB 8987.27 GB)
Used Dev Size : 2925544256 (2790.02 GiB 2995.76 GB)
Raid Devices : 5
Total Devices : 5
Persistence : Superblock is persistent
Update Time : Sun Jun 6 22:48:34 2021
State : clean, FAILED
Active Devices : 1
Working Devices : 5
Failed Devices : 0
Spare Devices : 4
Layout : left-symmetric
Chunk Size : 64K
Name : DiskStation:2 (local to host DiskStation)
UUID : dcb32778:c99523d8:2efdeb61:54303019
Events : 78
    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1       0        0        1      removed
       2       0        0        2      removed
       3       0        0        3      removed
       4       8       67        4      active sync   /dev/sde3

       0       8        3        -      spare   /dev/sda3
       1       8       19        -      spare   /dev/sdb3
       2       8       35        -      spare   /dev/sdc3
       3       8       51        -      spare   /dev/sdd3
Best regards and many thanks for any help!
Carlos
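
(A note for readers of the archive: when members are merely kicked as
"non-fresh", the usual first step would have been a forced assembly
rather than fail/remove/add, which turns the members into spares. A
sketch of what that would have looked like here, assuming the kicked
members' event counts were still close to sde3's:

mdadm --stop /dev/md2
mdadm --assemble --force /dev/md2 /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3 /dev/sde3

--force lets mdadm accept members whose event counters are slightly
stale instead of dropping them.)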

* Re: Recover from crash in RAID6 due to hardware failure
From: Leslie Rhorer @ 2021-06-15 1:33 UTC (permalink / raw)
To: Linux RAID

There is a fair chance you can recover the data by recreating the array:

mdadm -S /dev/md2
mdadm -C -f -e 1.2 -n 6 -c 64K --level=6 -p left-symmetric /dev/md2 /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3 /dev/sde3

On 6/8/2021 6:39 AM, Carlos Maziero wrote:
> On 07/06/2021 07:27, Leslie Rhorer wrote:
>> On 6/6/2021 10:07 PM, Carlos Maziero wrote:
>>
>>> However, the disks were added as spares and the volume remained
>>> crashed. Now I'm afraid that such commands have erased metadata and
>>> made things worse... :-(
>>
>> Yeah.  Did you at any time Examine the drives and save the output?
>>
>> mdadm -E /dev/sd[a-e]3
>>
>> If so, you have a little bit better chance.
>
> Yes, but I did it only after the failure. The output for all disks is
> attached to this message.
>
>>> Is there a way to reconstruct the array and to recover its data, at
>>> least partially?
>>
>> Maybe.  Do you know exactly which physical disk was in which RAID
>> position?  It seems likely the grouping was the same for the corrupted
>> array as for the other arrays, given the drives are partitioned.
>
> Yes, disk sda was in slot 1, and so on. I physically labelled all
> slots and disks.
>
>> First off, try:
>>
>> mdadm -E /dev/sde3 > /etc/mdadm/RAIDfix
>>
>> This should give you the details of the RAID array.  From this, you
>> should be able to re-create the array.  I would heartily recommend
>> getting some new drives and copying the data to them before
>> proceeding.  I would get a 12T drive and copy all of the partitions
>> to it:
>>
>> mkfs /dev/sdf (or mkfs /dev/sdf1)
>> mount /dev/sdf /mnt (or mount /dev/sdf1 /mnt)
>> ddrescue /dev/sda3 /mnt/drivea /tmp/tmpdrivea
>> ddrescue /dev/sdb3 /mnt/driveb /tmp/tmpdriveb
>> ddrescue /dev/sdc3 /mnt/drivec /tmp/tmpdrivec
>> ddrescue /dev/sdd3 /mnt/drived /tmp/tmpdrived
>> ddrescue /dev/sde3 /mnt/drivee /tmp/tmpdrivee
>>
>> You could skimp by getting an 8T drive, and then if drive e doesn't
>> fit, you could create the array without it, and you will be pretty
>> safe.  It's not what I would do, but if you are strapped for cash...
>
> OK, I will try to get a secondary disk for that and another computer,
> since the NAS has only 5 bays and I would need one more for doing such
> operations.
>
>>> Contents of /proc/mdstat (after the commands above):
>>>
>>> Personalities : [raid1] [linear] [raid0] [raid10] [raid6] [raid5] [raid4]
>>> md2 : active raid6 sda3[0](S) sdb3[1](S) sdc3[2](S) sdd3[3](S) sde3[4]
>>>       8776632768 blocks super 1.2 level 6, 64k chunk, algorithm 2 [5/1] [____U]
>>> md1 : active raid1 sda2[1] sdb2[2] sdc2[3] sdd2[0] sde2[4]
>>>       2097088 blocks [5/5] [UUUUU]
>>> md0 : active raid1 sda1[1] sdb1[2] sdc1[3] sdd1[0] sde1[4]
>>>       2490176 blocks [5/5] [UUUUU]
>>
>> There is something odd here.  You say the disks failed, but clearly
>> they are in decent shape.  The first and second partitions on all
>> drives appear to be good.  Did the system recover the RAID1 arrays?
>
> Apparently the failure was not in the disks, but in the NAS hardware.
> I opened it one week ago for a RAM upgrade (replaced the old 512M card
> by a 1GB one), and maybe the slot connecting the main board to the
> SATA board presented a connectivity problem (though the NAS OS said
> nothing about it). Anyway, I had 5 disks in a RAID 6 array and the
> logs showed 3 disks failing at the same time, which is quite unusual.
> This is the reason I believe the disks are physically ok.
>
> Thanks for your attention!
>
> Carlos

* Re: Recover from crash in RAID6 due to hardware failure
From: Leslie Rhorer @ 2021-06-15 1:36 UTC (permalink / raw)
To: Linux RAID

Oops!  'Sorry.  That should be:

mdadm -S /dev/md2
mdadm -C -f -e 1.2 -n 5 -c 64K --level=6 -p left-symmetric /dev/md2 /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3 /dev/sde3

You only have five disks, not six.
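
(Before running a create like the one above, it is worth checking the
intended parameters against the surviving superblocks. A small sketch,
assuming the earlier mdadm -E output is still at hand:

mdadm -E /dev/sde3 | grep -E 'Raid Level|Raid Devices|Layout|Chunk Size|Device Role'

The level, device count, layout, chunk size and each member's role must
match the --create options and device order exactly; otherwise the
recreated array will not line up with the old data.)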

* Re: Recover from crash in RAID6 due to hardware failure
From: Roman Mamedov @ 2021-06-15 7:46 UTC (permalink / raw)
To: Leslie Rhorer; +Cc: Linux RAID

On Mon, 14 Jun 2021 20:36:22 -0500 Leslie Rhorer <lesrhorer@att.net> wrote:

> Oops!  'Sorry.  That should be:
>
> mdadm -S /dev/md2
> mdadm -C -f -e 1.2 -n 5 -c 64K --level=6 -p left-symmetric /dev/md2
> /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3 /dev/sde3
>
> You only have five disks, not six.

No --assume-clean?

--
With respect,
Roman
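
(Roman's point: without --assume-clean, md starts an immediate resync
on the newly created array and rewrites parity; if the device order or
any --create parameter had been guessed wrong, that resync would
overwrite good data. A hedged variant of the command, as a sketch:

mdadm -C -f -e 1.2 -n 5 -c 64K --level=6 -p left-symmetric --assume-clean /dev/md2 /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3 /dev/sde3

With --assume-clean the array comes up without touching parity, so it
can be test-mounted read-only first and recreated again if the layout
guess was wrong.)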

* Re: Recover from crash in RAID6 due to hardware failure
From: Carlos Maziero @ 2021-06-15 11:28 UTC (permalink / raw)
To: Leslie Rhorer, Linux RAID

Hi Leslie, thanks for your suggestion! I succeeded in doing it,
although the path was a bit longer:

a) remove the logical volume (Synology creates one, and it prevents
stopping the array):

# ll /dev/mapper/
crw------- 1 root root  10,  59 Sep  5  2020 control
brw------- 1 root root 253,   0 Jun 10 11:57 vol1-origin
# dmsetup remove vol1-origin

b) stop the array:

# mdadm --stop /dev/md2
mdadm: stopped /dev/md2

c) recreate the array with the original layout:

# mdadm --verbose --create /dev/md2 --chunk=64 --level=6 --raid-devices=5 --metadata=1.2 /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3 /dev/sde3
mdadm: layout defaults to left-symmetric
mdadm: layout defaults to left-symmetric
mdadm: layout defaults to left-symmetric
mdadm: /dev/sda3 appears to be part of a raid array:
    level=raid6 devices=5 ctime=Sat Sep  5 12:46:57 2020
mdadm: layout defaults to left-symmetric
mdadm: /dev/sdb3 appears to be part of a raid array:
    level=raid6 devices=5 ctime=Sat Sep  5 12:46:57 2020
mdadm: layout defaults to left-symmetric
mdadm: /dev/sdc3 appears to be part of a raid array:
    level=raid6 devices=5 ctime=Sat Sep  5 12:46:57 2020
mdadm: layout defaults to left-symmetric
mdadm: /dev/sdd3 appears to be part of a raid array:
    level=raid6 devices=5 ctime=Sat Sep  5 12:46:57 2020
mdadm: layout defaults to left-symmetric
mdadm: /dev/sde3 appears to be part of a raid array:
    level=raid6 devices=5 ctime=Sat Sep  5 12:46:57 2020
mdadm: size set to 2925544256K
Continue creating array? yes
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md2 started.

d) check it:

# cat /proc/mdstat
Personalities : [raid1] [linear] [raid0] [raid10] [raid6] [raid5] [raid4]
md2 : active raid6 sde3[4] sdd3[3] sdc3[2] sdb3[1] sda3[0]
      8776632768 blocks super 1.2 level 6, 64k chunk, algorithm 2 [5/5] [UUUUU]
      [=>...................]  resync =  6.8% (199953972/2925544256) finish=2440.4min speed=18613K/sec
md1 : active raid1 sda2[1] sdb2[2] sdc2[3] sdd2[0] sde2[4]
      2097088 blocks [5/5] [UUUUU]
md0 : active raid1 sdc1[3] sdd1[0] sde1[4]
      2490176 blocks [5/3] [U__UU]
unused devices: <none>

After that, I fsck'ed and mounted it read-only, and now I'm happily
recovering my data... :-)

Thanks again!

Carlos

On 14/06/2021 22:36, Leslie Rhorer wrote:
> Oops!  'Sorry.  That should be:
>
> mdadm -S /dev/md2
> mdadm -C -f -e 1.2 -n 5 -c 64K --level=6 -p left-symmetric /dev/md2
> /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3 /dev/sde3
>
> You only have five disks, not six.
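
(The final check-and-mount step is only described in passing above;
spelled out as a sketch, with the device and filesystem type being
assumptions here, since on a Synology volume the filesystem may sit
behind LVM rather than directly on /dev/md2:

fsck -n /dev/md2
mount -o ro /dev/md2 /mnt

fsck -n reports problems without changing anything, and mounting
read-only keeps the freshly recreated array untouched until the data
has been copied off.)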