Need help Recover raid5 array


* Need help Recover raid5 array
@ 2021-12-19  4:31 Tony Bush
  2021-12-19 10:13 ` Wols Lists
  2021-12-19 11:58 ` Andreas Klauer
  0 siblings, 2 replies; 8+ messages in thread
From: Tony Bush @ 2021-12-19  4:31 UTC (permalink / raw)
  To: linux-raid

I have a small Ubuntu server whose hardware I was upgrading, and in
the process I lost my RAID.  I changed the CPU, MOBO, and RAM.  I also
added a new-to-this-system SSD to replace the current SSD (in a future
step).  I forgot that this new-to-this-system SSD had Windows 10 on
it, and I believe it tried to boot while I was hooking up my monitor.
So I think that it saw my RAID drives and tried to fdisk them.  I
pointed mdadm directly at the drives and not at partitions (big
mistake, I know now).  So I think the drives were seen as corrupted
and fdisk corrected the formatting.  I lost my superblocks on 4 of 5
drives.  These are shucked external 10TB drives, and one even shows up
with a 'my drive' partition label and the 2 files that came with those
drives.  I want to recover my RAID and files but don't want to make it
worse.  I have not mounted the drives as writable.  I think the damage
should be limited, but I don't know mdadm well.  I have been digging
for a few days on options, and most advice is generic and bad and, I
feel, would make it worse.  I don't know the original order the drives
were in.

1 drive is fully intact, probably because a BIOS SATA setting did not
enable all drives when I first booted.

The size makes this impractical to dd onto new disks.  The drives were
99% full and I was about to add 2 new drives.  Now, if I can recover
this, I will start a new array correctly and transfer the files
to that.

To fix this, I have been leaning toward making the drives read-only and
using an overlay file, like here:
https://raid.wiki.kernel.org/index.php/Recovering_a_failed_software_RAID#Making_the_harddisks_read-only_using_an_overlay_file
But I don't understand all the commands well enough to apply this to my
situation.  Since I don't know the original drive arrangement, that
may add another level of complexity.  Even if I can figure out the
read-only overlay setup, I still don't know exactly the right way to
proceed on the mdadm front.  If anyone has a handle on a situation
like this, please let me know what I should do.  Thanks
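
A note on the unknown drive order: the --examine output further down
reports /dev/sdd as "Active device 1", so only the other four roles
are unknown, which leaves 4! = 24 candidate orders.  A minimal sketch
that only lists them and runs no mdadm command (the function name
list_orders is made up for illustration, and the sdX names are the
current ones, which may change between boots):

```shell
#!/bin/sh
# List every candidate device order for a 5-disk array where the disk
# in role 1 is already known. Prints only; nothing touches the disks.
# Usage: list_orders KNOWN_ROLE1_DISK DISK DISK DISK DISK
list_orders() {
    known=$1
    shift
    for a in "$@"; do
        for b in "$@"; do
            [ "$b" = "$a" ] && continue
            for c in "$@"; do
                [ "$c" = "$a" ] && continue
                [ "$c" = "$b" ] && continue
                for d in "$@"; do
                    [ "$d" = "$a" ] && continue
                    [ "$d" = "$b" ] && continue
                    [ "$d" = "$c" ] && continue
                    # Columns are roles 0 1 2 3 4; role 1 is fixed.
                    echo "$a $known $b $c $d"
                done
            done
        done
    done
}

# sdd is role 1 per its intact superblock; the other four are unknown.
list_orders sdd sda sdb sdc sde
```

Any candidate order would only ever be tried against overlay devices,
never against the raw disks.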

**Original command history for array:
sudo mdadm --create --verbose /dev/md0 --level=5 --raid-devices=3 /dev/sdc /dev/sdd /dev/sde
cat /proc/mdstat
sudo mkfs.ext4 -F /dev/md0
sudo mkdir -p /media/raid
sudo mount /dev/md0 /media/raid
df -h -x devtmpfs -x tmpfs
cat /proc/mdstat
sudo mdadm --detail --scan | sudo tee -a /etc/mdadm/mdadm.conf
sudo update-initramfs -u

sudo  umount /dev/md0
sudo  umount /dev/md0 -f
sudo  umount /dev/md0
sudo  fsck.ext4 -f /dev/md0
sudo  fsck.ext4
sudo  fsck.ext4 -f /dev/md0 -p
sudo  fsck.ext4 -f /dev/md0 -p -y
sudo  fsck.ext4 -f /dev/md0 -y
sudo  resize2fs /dev/md0
sudo fdisk -l
sudo parted -a optimal /dev/sdf
sudo -i mdadm --add /dev/md0 /dev/sdf
watch cat /proc/mdstat
sudo mdadm --grow /dev/md0 --raid-devices=4
sudo thunar
watch cat /proc/mdstat
sudo mdadm --detail --scan | sudo tee -a /etc/mdadm/mdadm.conf

cat /proc/mdstat
sudo mount /dev/md0 /media/raid
sudo mdadm --assemble --scan
sudo mount /dev/md0 /media/raid

sudo fdisk -l
sudo parted -s -a optimal /dev/sdb mklabel gpt
parted /dev/sdb
sudo parted /dev/sdb
sudo mdadm --add /dev/md0 /dev/sdb1
sudo mdadm --add /dev/md0 /dev/sdb
cat /proc/mdstat
mdadm --grow --raid-devices=4 /dev/md0
sudo mdadm --grow --raid-devices=4 /dev/md0
sudo mdadm --grow --raid-devices=5 /dev/md0
cat /proc/mdstat
sudo e2fsck -f /dev/md0
cat /proc/mdstat
sudo resize2fs /dev/md0
cat /proc/mdstat
sudo e2fsck -f /dev/md0
sudo resize2fs /dev/md0

**Here are some current details:
uname -a
Linux server 5.11.0-40-generic #44-Ubuntu SMP Wed Oct 20 16:16:42 UTC
2021 x86_64 x86_64 x86_64 GNU/Linux

mdadm --version
mdadm - v4.1 - 2018-10-01

**
sudo smartctl -H -i -l scterc /dev/sda
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.11.0-41-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Ultrastar He10/12
Device Model:     WDC WD100EZAZ-11TDBA0
Serial Number:    1EK7U77Z
LU WWN Device Id: 5 000cca 27eedd3d5
Firmware Version: 83.H0A83
User Capacity:    10,000,831,348,736 bytes [10.0 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Tue Nov 30 00:07:28 2021 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SCT Error Recovery Control:
           Read: Disabled
          Write: Disabled

**
sudo smartctl -H -i -l scterc /dev/sdb
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.11.0-41-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Ultrastar He10/12
Device Model:     WDC WD100EMAZ-00WJTA0
Serial Number:    JEHXKMMM
LU WWN Device Id: 5 000cca 267db1416
Firmware Version: 83.H0A83
User Capacity:    10,000,831,348,736 bytes [10.0 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Tue Nov 30 00:08:34 2021 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SCT Error Recovery Control:
           Read:     70 (7.0 seconds)
          Write:     70 (7.0 seconds)

**
sudo smartctl -H -i -l scterc /dev/sdc
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.11.0-41-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Ultrastar He10/12
Device Model:     WDC WD100EMAZ-00WJTA0
Serial Number:    2YHVAJ8D
LU WWN Device Id: 5 000cca 273da10a9
Firmware Version: 83.H0A83
User Capacity:    10,000,831,348,736 bytes [10.0 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Tue Nov 30 00:11:29 2021 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SCT Error Recovery Control:
           Read:     70 (7.0 seconds)
          Write:     70 (7.0 seconds)

**
sudo smartctl -H -i -l scterc /dev/sdd
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.11.0-41-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Ultrastar He10/12
Device Model:     WDC WD100EMAZ-00WJTA0
Serial Number:    2YHVABZD
LU WWN Device Id: 5 000cca 273da1024
Firmware Version: 83.H0A83
User Capacity:    10,000,831,348,736 bytes [10.0 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Tue Nov 30 00:11:58 2021 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SCT Error Recovery Control:
           Read:     70 (7.0 seconds)
          Write:     70 (7.0 seconds)

**
sudo smartctl -H -i -l scterc /dev/sde
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.11.0-41-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Ultrastar He10/12
Device Model:     WDC WD100EMAZ-00WJTA0
Serial Number:    2YHV9GVD
LU WWN Device Id: 5 000cca 273da0cbc
Firmware Version: 83.H0A83
User Capacity:    10,000,831,348,736 bytes [10.0 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Tue Nov 30 00:12:53 2021 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SCT Error Recovery Control:
           Read:     70 (7.0 seconds)
          Write:     70 (7.0 seconds)

***************
sudo mdadm --examine /dev/sda
/dev/sda:
   MBR Magic : aa55
Partition[0] :   4294967295 sectors at            1 (type ee)

sudo mdadm --examine /dev/sda1
/dev/sda1:
   MBR Magic : aa55
Partition[0] :   4294967295 sectors at   4294967295 (type ff)
Partition[1] :   4294967295 sectors at   4294967295 (type ff)
Partition[2] :   4294967295 sectors at   4294967295 (type ff)
Partition[3] :    740229375 sectors at   4294967295 (type ff)

sudo mdadm --examine /dev/sdb
/dev/sdb:
   MBR Magic : aa55
Partition[0] :   4294967295 sectors at            1 (type ee)

sudo mdadm --examine /dev/sdb1
mdadm: cannot open /dev/sdb1: No such file or directory

sudo mdadm --examine /dev/sdc
/dev/sdc:
   MBR Magic : aa55
Partition[0] :   4294967295 sectors at            1 (type ee)

sudo mdadm --examine /dev/sdc1
mdadm: cannot open /dev/sdc1: No such file or directory

sudo mdadm --examine /dev/sdd
/dev/sdd:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 93e81091:84ba78f0:eb8232d9:c3c995f0
           Name : bushserver:0  (local to host bushserver)
  Creation Time : Fri Nov 16 13:20:25 2018
     Raid Level : raid5
   Raid Devices : 5

 Avail Dev Size : 19532616704 (9313.88 GiB 10000.70 GB)
     Array Size : 39065219072 (37255.50 GiB 40002.78 GB)
  Used Dev Size : 19532609536 (9313.87 GiB 10000.70 GB)
    Data Offset : 257024 sectors
   Super Offset : 8 sectors
   Unused Space : before=256944 sectors, after=7168 sectors
          State : clean
    Device UUID : 2abcf2dc:f786e3fd:d22b7da9:7e8eec53

Internal Bitmap : 8 sectors from superblock
    Update Time : Sun Nov 28 15:27:11 2021
  Bad Block Log : 512 entries available at offset 48 sectors
       Checksum : e27debbf - correct
         Events : 213198

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 1
   Array State : AAAAA ('A' == active, '.' == missing, 'R' == replacing)
$ sudo mdadm --examine /dev/sdd1
mdadm: cannot open /dev/sdd1: No such file or directory

sudo mdadm --examine /dev/sde
/dev/sde:
   MBR Magic : aa55
Partition[0] :   4294967295 sectors at            1 (type ee)
$ sudo mdadm --examine /dev/sde1
mdadm: cannot open /dev/sde1: No such file or directory

****************************************************
sudo mdadm --detail /dev/md0
/dev/md0:
           Version : 1.2
        Raid Level : raid0
     Total Devices : 1
       Persistence : Superblock is persistent

             State : inactive
   Working Devices : 1

              Name : bushserver:0  (local to host bushserver)
              UUID : 93e81091:84ba78f0:eb8232d9:c3c995f0
            Events : 213198

    Number   Major   Minor   RaidDevice

       -       8       48        -        /dev/sdd
*******************************************************

./lsdrv
**Warning** The following utility(ies) failed to execute:
  sginfo
  pvs
  lvs
Some information may be missing.

PCI [nvme] 41:00.0 Non-Volatile memory controller: Phison Electronics
Corporation E12 NVMe Controller (rev 01)
└nvme nvme0 Force MP510                              {211182930001291838A6}
 └nvme0n1 447.13g [259:0] Empty/Unknown
  ├nvme0n1p1 431.03g [259:1] Empty/Unknown
  │└Mounted as /dev/nvme0n1p1 @ /
  ├nvme0n1p2 1.00k [259:2] Empty/Unknown
  └nvme0n1p5 15.87g [259:3] Empty/Unknown
PCI [ahci] 00:17.0 SATA controller: Intel Corporation
Q170/Q150/B150/H170/H110/Z170/CM236 Chipset SATA Controller [AHCI
Mode] (rev 31)
├scsi 0:0:0:0 ATA      WDC WD100EZAZ-11
│└sda 9.10t [8:0] Empty/Unknown
│ └sda1 9.10t [8:1] Empty/Unknown
├scsi 1:0:0:0 ATA      WDC WD100EMAZ-00
│└sdb 9.10t [8:16] Empty/Unknown
├scsi 3:0:0:0 ATA      WDC WD100EMAZ-00
│└sdc 9.10t [8:32] Empty/Unknown
├scsi 4:0:0:0 ATA      WDC WD100EMAZ-00
│└sdd 9.10t [8:48] Empty/Unknown
│ └md0 0.00k [9:0] MD v1.2  () inactive, None (None) None
{00000000:-0000-00:00-0000-:000000000000}
│                  Empty/Unknown
└scsi 5:0:0:0 ATA      WDC WD100EMAZ-00
 └sde 9.10t [8:64] Empty/Unknown
PCI [ahci] 04:00.0 SATA controller: ASMedia Technology Inc. ASM1062
Serial ATA Controller (rev 02)
└scsi 6:x:x:x [Empty]
USB [usb-storage] Bus 001 Device 004: ID 1d6b:0104 Linux Foundation
Multifunction Composite Gadget {CAFEBABE}
└scsi 8:0:0:0 Linux    File-CD Gadget
 └sr0 1.00g [11:0] Empty/Unknown
Other Block Devices
├loop0 4.00k [7:0] Empty/Unknown
│└Mounted as /dev/loop0 @ /snap/bare/5
├loop1 144.60m [7:1] Empty/Unknown
│└Mounted as /dev/loop1 @ /snap/chromium/1810
├loop2 99.44m [7:2] Empty/Unknown
│└Mounted as /dev/loop2 @ /snap/core/11798
├loop3 99.44m [7:3] Empty/Unknown
│└Mounted as /dev/loop3 @ /snap/core/11993
├loop4 147.80m [7:4] Empty/Unknown
│└Mounted as /dev/loop4 @ /snap/chromium/1827
├loop5 55.49m [7:5] Empty/Unknown
│└Mounted as /dev/loop5 @ /snap/core18/2253
├loop6 55.50m [7:6] Empty/Unknown
│└Mounted as /dev/loop6 @ /snap/core18/2246
├loop7 65.21m [7:7] Empty/Unknown
│└Mounted as /dev/loop7 @ /snap/gtk-common-themes/1519
├loop8 164.76m [7:8] Empty/Unknown
│└Mounted as /dev/loop8 @ /snap/gnome-3-28-1804/161
├loop9 65.10m [7:9] Empty/Unknown
│└Mounted as /dev/loop9 @ /snap/gtk-common-themes/1515
├loop10 162.87m [7:10] Empty/Unknown
│└Mounted as /dev/loop10 @ /snap/gnome-3-28-1804/145
├loop11 0.00k [7:11] Empty/Unknown
├zram0 1.96g [252:0] Empty/Unknown
├zram1 1.96g [252:1] Empty/Unknown
├zram2 1.96g [252:2] Empty/Unknown
├zram3 1.96g [252:3] Empty/Unknown
├zram4 1.96g [252:4] Empty/Unknown
├zram5 1.96g [252:5] Empty/Unknown
├zram6 1.96g [252:6] Empty/Unknown
└zram7 1.96g [252:7] Empty/Unknown

***********************************
cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
[raid4] [raid10]
md0 : inactive sdd[1](S)
      9766308352 blocks super 1.2

unused devices: <none>
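
As a cross-check, the sizes in the mdadm --examine /dev/sdd output
above agree with each other and with the smartctl capacity.  The
sketch below only does arithmetic on numbers copied from that output.
It assumes, based on the human-readable values printed next to them,
that the device sizes are in 512-byte sectors and the array size is in
KiB.  The variable names are made up for illustration.

```shell
#!/bin/sh
# Numbers copied from the --examine output of the intact member (sdd).
avail_sectors=19532616704   # Avail Dev Size
used_sectors=19532609536    # Used Dev Size
data_offset=257024          # Data Offset (sectors)
raid_devices=5              # Raid Devices

# RAID5 keeps one device's worth of parity, so usable space is
# (n - 1) * used size; dividing sectors by 2 converts to KiB.
array_kib=$(( used_sectors / 2 * (raid_devices - 1) ))

# Data offset plus available size should add up to the whole disk.
disk_bytes=$(( (avail_sectors + data_offset) * 512 ))

# /proc/mdstat reports the member in 1 KiB blocks.
mdstat_blocks=$(( avail_sectors / 2 ))

# Sectors left unused after the data area.
unused_after=$(( avail_sectors - used_sectors ))

echo "array size (KiB):  $array_kib"
echo "disk size (bytes): $disk_bytes"
echo "mdstat blocks:     $mdstat_blocks"
echo "unused after:      $unused_after"
```

The four results match the Array Size, the smartctl User Capacity, the
mdstat block count and the "after" figure in Unused Space, which
suggests the surviving superblock is a sound reference for the
parameters of the other members.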


* Re: Need help Recover raid5 array
  2021-12-19  4:31 Need help Recover raid5 array Tony Bush
@ 2021-12-19 10:13 ` Wols Lists
  2021-12-20  0:25   ` Phil Turmel
  2021-12-19 11:58 ` Andreas Klauer
  1 sibling, 1 reply; 8+ messages in thread
From: Wols Lists @ 2021-12-19 10:13 UTC (permalink / raw)
  To: Tony Bush, linux-raid; +Cc: Phil Turmel, NeilBrown

Good report, thanks! :-)

Looks promising. I'm not going to advise much here, but I've punted this 
to two people I know will be able to help. Of course, it's Christmas, 
they might not be around for a bit so you may have to wait.

This is a situation we've recovered a fair few raids from. The 
superblock may well still be there, just not accessible because of the 
new gpt/mbr.
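
Whether a superblock survived can be checked read-only.  The intact
drive reports "Super Offset : 8 sectors" and "Magic : a92b4efc", and
the magic is stored little-endian, so the first four bytes at that
offset should read fc 4e 2b a9.  A minimal sketch (the function name
md_magic_at is made up; the demo runs against a scratch file rather
than a real disk):

```shell
#!/bin/sh
# Print the first 4 bytes at a given 512-byte sector offset as hex.
# Read-only: dd is only ever given if=, never of=, for the target.
# $1 = device or image file, $2 = offset in 512-byte sectors
md_magic_at() {
    dd if="$1" bs=512 skip="$2" count=1 2>/dev/null \
        | od -A n -t x1 -N 4 | tr -d ' \n'
}

# Demo on a scratch image so nothing here touches a real drive:
# 16 zeroed sectors with the md magic planted at sector 8.
img=$(mktemp)
dd if=/dev/zero of="$img" bs=512 count=16 2>/dev/null
printf '\374\116\053\251' | dd of="$img" bs=512 seek=8 conv=notrunc 2>/dev/null

if [ "$(md_magic_at "$img" 8)" = "fc4e2ba9" ]; then
    echo "md v1.2 magic present at sector 8"
else
    echo "no md magic at sector 8"
fi
rm -f "$img"
```

On the real system the first argument would be a whole disk such as
/dev/sdb, run as root.  A miss only says the first bytes of the
superblock are gone, not that the array data is.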

First things first. You're talking about getting new drives? Can you get 
THREE? Back up three of your four corrupted drives. That way, you've got 
four corrupted drives you can attempt to recover, and four backups as 
insurance. That'll take a long time, the sooner the better.

Then move all your raid drives safely out of the way, build your new 
working system, and play with loopbacks as per the recovery 
instructions. You understand what it's trying to do? Put a read/write 
layer over the underlying read-only disk? Search the list archives for 
people who've done this. It's not difficult, although I'm with you on it 
feeling scary and overwhelming - if you're in an "I need this to work" 
situation it's a bit frightening. Once you're happy it's working, you 
can put your four damaged disks back in and recreate the array.
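
To make the read/write layer concrete, here is a dry-run sketch of the
overlay idea: a sparse file per disk, attached as a loop device, used
as the copy-on-write store of a device-mapper snapshot so that writes
land in the file and the disk is only read.  It prints the commands
and runs none of them.  The helper names, the overlay file names and
the 50G size are assumptions for illustration; the wiki page has the
full procedure.

```shell
#!/bin/sh
# Build a device-mapper table line for a persistent snapshot:
#   <start> <length> snapshot <origin> <COW device> P <chunk sectors>
# $1 = origin size in 512-byte sectors, $2 = origin, $3 = COW device
snapshot_table() {
    printf '0 %s snapshot %s %s P 8\n' "$1" "$2" "$3"
}

# Print (not run) the commands that would set up one overlay.
# $1 = disk name without /dev/, e.g. sdb
plan_overlay() {
    disk=$1
    # Sparse file that will absorb all writes (size is a guess).
    printf '%s\n' "truncate -s 50G overlay-$disk.img"
    # Attach the file to a free loop device and remember its name.
    printf '%s\n' "loop=\$(losetup -f --show overlay-$disk.img)"
    # Size of the real disk in 512-byte sectors.
    printf '%s\n' "size=\$(blockdev --getsz /dev/$disk)"
    # Snapshot device: reads come from the disk, writes go to the file.
    printf '%s\n' "dmsetup create overlay-$disk --table \"0 \$size snapshot /dev/$disk \$loop P 8\""
}

for d in sda sdb sdc sdd sde; do
    plan_overlay "$d"
done
```

If the printed commands were run as root, the experiments would then
use /dev/mapper/overlay-sdX instead of /dev/sdX, and throwing away the
overlay files would undo them.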

The one bad bit of news is we don't know what the GPT/MBR has stomped 
over. If it's stomped over the working array, then you've probably got 
some data loss - hopefully not much.

(NB don't blame Windows. It may well have been, but the frightening 
thing is it could have been the new mobo! or Ubuntu. Both have been 
implicated in previous incidents.)

I'll bow out now,
Cheers,
Wol


* Re: Need help Recover raid5 array
  2021-12-19  4:31 Need help Recover raid5 array Tony Bush
  2021-12-19 10:13 ` Wols Lists
@ 2021-12-19 11:58 ` Andreas Klauer
  2021-12-24  3:55   ` Tony Bush
  1 sibling, 1 reply; 8+ messages in thread
From: Andreas Klauer @ 2021-12-19 11:58 UTC (permalink / raw)
  To: Tony Bush; +Cc: linux-raid

On Sat, Dec 18, 2021 at 11:31:39PM -0500, Tony Bush wrote:
> I forgot that this new-to-this-system SSD had Windows 10 OS on
> it and I believe it tried to boot while I was working on hooking up my
> monitor.  So I think that it saw my raid drives and tried to fdisk
> them.  I did mdadm directly to drive and not to a partition(big
> mistake I know now).  So I think the drives were seen as corrupted and
> fdisk corrected the formatting.

Windows is known to do this but it can just as well happen within Linux.
Hopefully no filesystem formatting took place...

> To fix, I have been leaning toward making the drives ready only and
> using an overlay file. Like here:
> https://raid.wiki.kernel.org/index.php/Recovering_a_failed_software_RAID#Making_the_harddisks_read-only_using_an_overlay_file

This method is so useful that there should be a standard command in Linux
to create and manage overlays; but there is none, so you have to make do
with the 'overlay manipulation functions' as shown in the wiki.

> But i dont understand all the commands well enough to work this for my
> situation.  Seems like since I don't know the original drive
> arrangement that may be adding an additional level of complexity.  If
> I can figure out the read only and overlay, I still don't know exactly
> the right way to proceed on the mdadm front.  Please anyone who has a
> handle on a situation like this, let me know what I should do.  Thanks

I summarized `mdadm --create` for data recovery here:

  https://unix.stackexchange.com/a/131927/30851

In addition you should remove the bogus GPT and MBR partition headers. 
You can use 'wipefs' for this task. (Test it with overlays first...)

  wipefs --all --types pmbr,gpt,dos /dev/...

You are lucky to have all the relevant `mdadm --examine` output, 
so you already know the correct data offset and only need to guess 
the correct order of drives.

Regards
Andreas Klauer


* Re: Need help Recover raid5 array
  2021-12-19 10:13 ` Wols Lists
@ 2021-12-20  0:25   ` Phil Turmel
  0 siblings, 0 replies; 8+ messages in thread
From: Phil Turmel @ 2021-12-20  0:25 UTC (permalink / raw)
  To: Wols Lists, Tony Bush, linux-raid; +Cc: NeilBrown

I'll have some time tomorrow to dig into this.

On 12/19/21 5:13 AM, Wols Lists wrote:
> Good report, thanks! :-)
> 
> Looks promising. I'm not going to advise much here, but I've punted this 
> to two people I know will be able to help. Of course, it's Christmas, 
> they might not be around for a bit so you may have to wait.
> 
> This is a situation we've recovered a fair few raids from. The 
> superblock may well still be there, just not accessible because of the 
> new gpt/mbr.
> 
> First things first. You're talking about getting new drives? Can you get 
> THREE? Back up three of your four corrupted drives. That way, you've got 
> four corrupted drives you can attempt to recover, and four backups as 
> insurance. That'll take a long time, the sooner the better.
> 
> Then move all your raid drives safely out the way, build your new 
> working system, and play with loopbacks as per the recovery 
> instructions. You understand what it's trying to do? Put a read/write 
> layer over the underlying read-only disk? Search the list archives for 
> people who've done this, it's not difficult although I'm with you on it 
> feeling scary and overwhelming - if you're in a "I need this to work" 
> situation it's a bit frightening. Once you're happy it's working, you 
> can put your four damaged disks back in and recreate the array.
> 
> The one bad bit of news is we don't know what the GPT/MBR has stomped 
> over. If it's stomped over the working array, then you've probably got 
> some data loss - hopefully not much.
> 
> (NB don't blame Windows. It may well have been, but the frightening 
> thing is it could have been the new mobo! or Ubuntu. Both have been 
> implicated in previous incidents.)
> 
> I'll bow out now,
> Cheers,
> Wol
> 



* Re: Need help Recover raid5 array
  2021-12-19 11:58 ` Andreas Klauer
@ 2021-12-24  3:55   ` Tony Bush
  2021-12-24  7:45     ` Andreas Klauer
  0 siblings, 1 reply; 8+ messages in thread
From: Tony Bush @ 2021-12-24  3:55 UTC (permalink / raw)
  To: Andreas Klauer, Wols Lists; +Cc: linux-raid

I am at a loss.  I tried setting up an overlay with the 'overlay
manipulation functions' as a script.  It was my first time touching that,
but I think it is working correctly.  I then ran wipefs --all --types
pmbr,gpt,dos /dev/sd{a,b,c,e}.  I wanted to tack on a file system
label of 'linux_raid_member' but don't know how.  Then I did:

sudo mdadm --create /dev/md2 --assume-clean --level=5 --chunk=512K \
    --metadata=1.2 --data-offset=257024s --raid-devices=5 \
    /dev/mapper/sda /dev/mapper/sdb /dev/mapper/sdc /dev/mapper/sdd \
    /dev/mapper/sde
mdadm: /dev/mapper/sdd appears to be part of a raid array:
       level=raid5 devices=5 ctime=Fri Nov 16 13:20:25 2018
mdadm: partition table exists on /dev/mapper/sdd but will be lost or
       meaningless after creating array
Continue creating array? y
mdadm: array /dev/md2 started.
thecompguru@bushserver:~$ cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
[raid4] [raid10]
md2 : active raid5 dm-4[4] dm-3[3] dm-2[2] dm-1[1] dm-0[0]
      39065233408 blocks super 1.2 level 5, 512k chunk, algorithm 2
[5/5] [UUUUU]
      bitmap: 0/73 pages [0KB], 65536KB chunk

unused devices: <none>
thecompguru@bushserver:~$ sudo mount /dev/md2 /media/raid
mount: /media/raid: wrong fs type, bad option, bad superblock on
/dev/md2, missing codepage or helper program, or other error.
thecompguru@bushserver:~$ sudo mdadm --stop /dev/md2
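
A side note on the /proc/mdstat output above: the trial array reports
39065233408 blocks, while the original --examine showed an Array Size of
39065219072.  The sketch below works through the arithmetic, assuming the
per-device sizes in --examine are 512-byte sectors and that mdstat blocks
and Array Size are KiB (array_kib is a hypothetical helper name).  Under
those assumptions the difference appears to come from the new array using
the full Avail Dev Size of each member rather than the original
Used Dev Size.

```sh
# array_kib DEV_SECTORS RAID_DEVICES
# RAID5 keeps one chunk of parity per stripe, so usable capacity is
# (devices - 1) times the per-device size. Sectors / 2 = KiB.
array_kib() {
    dev_sectors=$1
    raid_devices=$2
    echo $(( dev_sectors / 2 * (raid_devices - 1) ))
}

# Used Dev Size from the original --examine of /dev/sdd
echo "original array: $(array_kib 19532609536 5) KiB"
# Avail Dev Size from the same --examine output
echo "trial array:    $(array_kib 19532616704 5) KiB"
```

If the sizes need to match exactly, mdadm --create accepts a --size
option (per-device size in KiB); whether that matters for a read-only
recovery attempt is a judgment call.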

Is this normal, and does it look as expected?  Am I doing this right?  Do I
need to do this 120 times, changing the drive order until it shows up as
working?  I need some hand holding or more step-by-step guidance because I
am just not sure what to do.

Is it possible to do some kind of dd snip and copy out some parts of
the good drive to get mdadm to look for the superblock or whatever
it needs from the other drives?

Check me on the overlay as well.  I just copied the 2 functions and
added a line at the bottom into a .sh executable script and ran with
sudo.
****
devices="/dev/sda /dev/sdb /dev/sdc"

overlay_create()
{
        # free space (in MiB) on the filesystem holding the overlay files
        free=$((`stat -c '%a*%S/1024/1024' -f .`))
        echo free ${free}M
        overlays=""
        overlay_remove
        for d in $devices; do
                b=$(basename $d)
                size_bkl=$(blockdev --getsz $d) # in 512 blocks/sectors
                # reserve 1M space for snapshot header
                # ext3 max file length is 2TB
                # braces (not a subshell) so that return leaves the function
                truncate -s$((((size_bkl+1)/2)+1024))K $b.ovr || { echo "Do you use ext4?"; return 1; }
                loop=$(losetup -f --show -- $b.ovr)
                # https://www.kernel.org/doc/Documentation/device-mapper/snapshot.txt
                dmsetup create $b --table "0 $size_bkl snapshot $d $loop P 8"
                echo $d $((size_bkl/2048))M $loop /dev/mapper/$b
                overlays="$overlays /dev/mapper/$b"
        done
        overlays=${overlays# }
}

overlay_remove()
{
        for d in $devices; do
                b=$(basename $d)
                [ -e /dev/mapper/$b ] && dmsetup remove $b && echo /dev/mapper/$b
                if [ -e $b.ovr ]; then
                        echo $b.ovr
                        l=$(losetup -j $b.ovr | cut -d : -f1)
                        echo $l
                        [ -n "$l" ] && losetup -d $(losetup -j $b.ovr | cut -d : -f1)
                        rm -f $b.ovr &> /dev/null
                fi
        done
}
overlay_create
****
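
For reference, the truncate line in overlay_create sizes each overlay
file as the disk size in KiB plus 1 MiB for the snapshot header.  Here is a
small sketch of that arithmetic for one of the 10 TB drives, using the
User Capacity from the smartctl output; overlay_size_kib is a
hypothetical helper, not part of the wiki functions.  Because truncate
creates a sparse file, the overlay only consumes real space as blocks
are written to it.

```sh
# overlay_size_kib SECTORS
# SECTORS is the device size in 512-byte sectors, as printed by
# `blockdev --getsz`. Mirrors the expression ((size_bkl+1)/2)+1024.
overlay_size_kib() {
    sectors=$1
    echo $(( (sectors + 1) / 2 + 1024 ))
}

# 10,000,831,348,736 bytes (smartctl "User Capacity") in 512-byte sectors
sectors=$(( 10000831348736 / 512 ))
echo "sectors:       $sectors"
echo "overlay file:  $(overlay_size_kib "$sectors")K apparent size"
```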

My only way to proceed right now would be to run overlay_create
again, which I assume starts me fresh on drive changes?  Then I try
creating the array again with a different drive order?  That is not
really feasible.  Can I determine the position of the intact
drive in any way?  Then there would be only 24 possible arrangements
instead of 120.
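
The two counts above are plain factorials: five drives in unknown order
give 5! arrangements, and pinning one drive to a known slot leaves 4!.
Here is a minimal sketch of that arithmetic (factorial is a hypothetical
helper written for illustration):

```sh
# factorial N: prints N! using integer arithmetic
factorial() {
    n=$1
    result=1
    while [ "$n" -gt 1 ]; do
        result=$(( result * n ))
        n=$(( n - 1 ))
    done
    echo "$result"
}

echo "all five slots unknown: $(factorial 5) orders"
echo "one slot known:         $(factorial 4) orders"
```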

Thanks for any help.



* Re: Need help Recover raid5 array
  2021-12-24  3:55   ` Tony Bush
@ 2021-12-24  7:45     ` Andreas Klauer
  2021-12-28 23:56       ` Tony Bush
  0 siblings, 1 reply; 8+ messages in thread
From: Andreas Klauer @ 2021-12-24  7:45 UTC (permalink / raw)
  To: Tony Bush; +Cc: linux-raid

On Thu, Dec 23, 2021 at 10:55:01PM -0500, Tony Bush wrote:
> /dev/mapper/sda /dev/mapper/sdb /dev/mapper/sdc /dev/mapper/sdd
> /dev/mapper/sde

Hi Tony, 

your examine output of the one drive that was left showed Device Role 1,
and the count starts from 0, so that's the 2nd drive in the array. The order
of the others is unknown, so yes, unless you are able to derive the order
from raw data, you simply have to try all combinations. This can be
scripted as well.
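
One way the enumeration could be scripted is sketched below.  This is
only an illustration under stated assumptions: candidate_orders is a
hypothetical helper, it expects exactly four unknown drives, and it pins
the intact drive (sdd here) to the second slot.  The loop at the end is a
dry run that prints each device list instead of calling mdadm.

```sh
# candidate_orders FIXED U1 U2 U3 U4
# Prints every order of the four unknown drives, with FIXED always in
# the second slot (device role 1). One order per line.
candidate_orders() {
    fixed=$1; shift
    for a in "$@"; do
     for b in "$@"; do
      for c in "$@"; do
       for d in "$@"; do
        # keep only orders in which every drive appears exactly once
        if [ "$a" != "$b" ] && [ "$a" != "$c" ] && [ "$a" != "$d" ] &&
           [ "$b" != "$c" ] && [ "$b" != "$d" ] && [ "$c" != "$d" ]; then
            echo "$a $fixed $b $c $d"
        fi
       done
      done
     done
    done
}

# Dry run: show the overlay device list for each of the 24 trials.
candidate_orders sdd sda sdb sdc sde | while read -r order; do
    devs=""
    for name in $order; do
        devs="$devs /dev/mapper/$name"
    done
    echo "trial:$devs"
done
```

Each printed line could then be passed to an mdadm --create call on the
overlay devices, followed by a read-only mount attempt and mdadm --stop.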

Furthermore you should --examine the array you created and make sure 
that all other variables (offset, level, layout, ...), match your 
previous --examine.
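
A rough sketch of such a comparison, assuming both --examine outputs
have been saved to text files first.  The field list, the file names and
the helper names examine_fields and compare_examine are illustrative
choices, not anything mdadm provides.

```sh
# examine_fields FILE
# Extracts the geometry lines from saved `mdadm --examine` output and
# normalises the spacing so two files can be compared as text.
examine_fields() {
    grep -E '^ *(Version|Raid Level|Raid Devices|Data Offset|Layout|Chunk Size|Device Role) :' "$1" |
        sed -e 's/^ *//' -e 's/ *: */: /' | sort
}

# compare_examine BEFORE AFTER
# Returns 0 when the geometry fields match, 1 when they differ,
# 2 when no fields could be read from BEFORE.
compare_examine() {
    before=$(examine_fields "$1")
    after=$(examine_fields "$2")
    [ -n "$before" ] || { echo "no fields found in $1"; return 2; }
    if [ "$before" = "$after" ]; then
        echo "geometry matches"
    else
        echo "geometry differs"
        echo "--- $1"; echo "$before"
        echo "--- $2"; echo "$after"
        return 1
    fi
}

# Usage (file names are hypothetical):
#   mdadm --examine /dev/sdd        > before.txt   # intact drive, original
#   mdadm --examine /dev/mapper/sdd > after.txt    # same drive after --create
#   compare_examine before.txt after.txt
```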

As for re-creating overlays, you can do that for every single step,
but it might not be necessary just for a mount attempt.

Note that there is the case where mounting might succeed but drive 
order is still wrong - find a large file and see if it is fully intact.
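
With a 512K chunk and five devices, one RAID5 stripe carries four data
chunks (2 MiB), so a file well above that size touches every drive.  One
possible check is sketched below, assuming the array happens to hold
some large .gz files; check_archives and the 16M default threshold are
hypothetical choices for illustration.  Any format with a built-in
integrity test would serve the same purpose.

```sh
# check_archives DIR [SIZE]
# Runs `gzip -t` on every .gz file under DIR that matches the find
# -size expression SIZE (default +16M) and prints OK or BAD per file.
check_archives() {
    dir=$1
    min_size=${2:-+16M}
    find "$dir" -type f -name '*.gz' -size "$min_size" | while IFS= read -r f; do
        if gzip -t "$f" 2>/dev/null; then
            echo "OK   $f"
        else
            echo "BAD  $f"
        fi
    done
}

# Usage on the read-only mount from this thread:
#   check_archives /media/raid
```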

Best of luck,
Andreas Klauer


* Re: Need help Recover raid5 array
  2021-12-24  7:45     ` Andreas Klauer
@ 2021-12-28 23:56       ` Tony Bush
  2021-12-29  4:37         ` Andreas Klauer
  0 siblings, 1 reply; 8+ messages in thread
From: Tony Bush @ 2021-12-28 23:56 UTC (permalink / raw)
  To: Andreas Klauer; +Cc: linux-raid

Thanks for the help.  I have successfully recovered my raid.  Here is
what I did if it helps someone else.

My situation: /dev/sdd is intact and is the second drive (device role 1)
in my array, according to the mdadm --examine output for that drive.
I created a script (via nano) to make the overlays, with this inside:
-----------------------------------------------------------------------------
devices="/dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde"

overlay_create()
{
        # free space (in MiB) on the filesystem holding the overlay files
        free=$((`stat -c '%a*%S/1024/1024' -f .`))
        echo free ${free}M
        overlays=""
        overlay_remove
        for d in $devices; do
                b=$(basename $d)
                size_bkl=$(blockdev --getsz $d) # in 512 blocks/sectors
                # reserve 1M space for snapshot header
                # ext3 max file length is 2TB
                # braces (not a subshell) so that return leaves the function
                truncate -s$((((size_bkl+1)/2)+1024))K $b.ovr || { echo "Do you use ext4?"; return 1; }
                loop=$(losetup -f --show -- $b.ovr)
                # https://www.kernel.org/doc/Documentation/device-mapper/snapshot.txt
                dmsetup create $b --table "0 $size_bkl snapshot $d $loop P 8"
                echo $d $((size_bkl/2048))M $loop /dev/mapper/$b
                overlays="$overlays /dev/mapper/$b"
        done
        overlays=${overlays# }
}

overlay_remove()
{
        for d in $devices; do
                b=$(basename $d)
                [ -e /dev/mapper/$b ] && dmsetup remove $b && echo /dev/mapper/$b
                if [ -e $b.ovr ]; then
                        echo $b.ovr
                        l=$(losetup -j $b.ovr | cut -d : -f1)
                        echo $l
                        [ -n "$l" ] && losetup -d $(losetup -j $b.ovr | cut -d : -f1)
                        rm -f $b.ovr &> /dev/null
                fi
        done
}

overlay_create
------------------------------------------------------------------------------
ran script
sudo ./overlay.sh

cleared bogus info on my overlayed drives:
sudo wipefs --all --types pmbr,gpt,dos /dev/mapper/sda
sudo wipefs --all --types pmbr,gpt,dos /dev/mapper/sdb
sudo wipefs --all --types pmbr,gpt,dos /dev/mapper/sdc
sudo wipefs --all --types pmbr,gpt,dos /dev/mapper/sde

 created script createmdadm.sh with this in it:
-------------------------------------------------------------------------------
mdadm --create /dev/md2 --assume-clean --readonly --level=5 \
      --chunk=512K --metadata=1.2 --layout left-symmetric \
      --data-offset=257024s --raid-devices=5 \
      /dev/mapper/sd"$1" /dev/mapper/sdd /dev/mapper/sd"$2" \
      /dev/mapper/sd"$3" /dev/mapper/sd"$4"
-----------------------------------------------------------------------------------

made a list of all 24 permutations of the four unknown drives (sdd
stays fixed in the second slot) to plug into the script one at a time.
The actual list I used is shown below, with examples of the outputs I got:
------
aDbce*
abce
mount: /media/raid: wrong fs type, bad option, bad superblock on /dev/md2, missing codepage or helper program, or other error.
* = this was the output on every attempt to mount the new array unless
noted otherwise
bace*
cabe*
acbe*
bcae*
cbae*
ebac mount: /media/raid: mount(2) system call failed: Structure needs cleaning.

beac*
aebc*
eabc mount: /media/raid: mount(2) system call failed: Structure needs cleaning.
baec*
abec*
aceb*
caeb*
eacb mount: /media/raid: mount(2) system call failed: Structure needs cleaning.

aecb*
ceab*
ecab mount: /media/raid: mount(2) system call failed: Structure needs cleaning.

ecba mount: /media/raid: WARNING: source write-protected, mounted read-only.

ceba
beca
ebca
cbea
bcea
----------------
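The list of 24 orders does not have to be typed by hand. A rough sketch in plain sh, assuming the four unknown drives are a, b, c and e as above; list_orders is a made-up helper name, and the order it prints differs from the order of the list above:

```sh
# Print every ordering of four distinct drive letters, one per line.
# $1 is a space-separated list of exactly four letters.
list_orders() {
    for p in $1; do
        for q in $1; do
            [ "$q" = "$p" ] && continue
            for r in $1; do
                [ "$r" = "$p" ] && continue
                [ "$r" = "$q" ] && continue
                for s in $1; do
                    [ "$s" = "$p" ] && continue
                    [ "$s" = "$q" ] && continue
                    [ "$s" = "$r" ] && continue
                    echo "$p $q $r $s"
                done
            done
        done
    done
}

list_orders "a b c e"
```

Each printed line can be passed straight to createmdadm.sh as its four arguments.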
ran the createmdadm.sh script to create the array:
sudo ./createmdadm.sh a b c e
sudo mount /dev/md2 /media/raid
output: mount: /media/raid: wrong fs type, bad option, bad superblock
on /dev/md2, missing codepage or helper program, or other error.

That didn't work, so I stopped the array (never needed to umount):
sudo mdadm --stop /dev/md2

then repeated the steps with the next permutation of drive arrangement:
sudo ./createmdadm.sh b a c e
sudo mount /dev/md2 /media/raid
sudo mdadm --stop /dev/md2
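
Each attempt in the steps above is the same three commands with a different order plugged in, so the commands can be generated. A sketch that only prints them for review and runs nothing; try_order and candidate_devices are hypothetical helper names, and sdd stays in the second slot exactly as in createmdadm.sh:

```sh
# Build the member list for one candidate order of the four unknown drives.
# sdd is fixed as the second device, matching createmdadm.sh.
candidate_devices() {
    echo "/dev/mapper/sd$1 /dev/mapper/sdd /dev/mapper/sd$2 /dev/mapper/sd$3 /dev/mapper/sd$4"
}

# Print (do not execute) the create / mount / stop sequence for one order.
try_order() {
    devs=$(candidate_devices "$@")
    echo "mdadm --create /dev/md2 --assume-clean --readonly --level=5 --chunk=512K --metadata=1.2 --layout left-symmetric --data-offset=257024s --raid-devices=5 $devs"
    echo "mount /dev/md2 /media/raid"
    echo "mdadm --stop /dev/md2"
}

try_order e c b a
```

The printed lines can be checked by eye and then run by hand against the overlays, never against the real drives.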

I was looking for a change in the output when mounting the raid.
Having the first and second drives correct seemed to give a different output:
output:    mount: /media/raid: mount(2) system call failed: Structure needs cleaning.
The correct arrangement gave this output:
output:    mount: /media/raid: WARNING: source write-protected, mounted read-only.

I then mounted the drive as my user, tested multiple large files and
when i was done crying i unmounted
sudo umount /...
sudo mdadm --stop /dev/md2

modified my overlay.sh last line to overlay_remove instead of overlay_create
removed the overlay the best i know how:
sudo ./overlay.sh

then ran mdadm --create on the real drives, without the --readonly
parameter and with the disk arrangement I had discovered.  Also changed
/dev/md2 to /dev/md0, which is the name my existing config already
uses for this raid:
sudo mdadm --create /dev/md0 --assume-clean --level=5 --chunk=512K \
     --metadata=1.2 --layout left-symmetric --data-offset=257024s \
     --raid-devices=5 /dev/sde /dev/sdd /dev/sdc /dev/sdb /dev/sda
sudo mount /dev/md0 /media/raid
checked my files and rejoiced
Not elegant, but simple enough.  I wish I had had a play-by-play
like this.  Knowing what outputs to expect would have been really
handy.  I hope this helps someone out there.

My next task is creating a new raid that is inside of partitions.
Need to do this with 2 new drives, transfer 10TB, then shrink and
remove a drive to add to new raid, transfer and repeat.  Let me know
if this is a bad idea please.  I fear shrinking or removing the raid
but heard that this was a feature that has been added more recently
and could work.

Thanks guys.

On Fri, Dec 24, 2021 at 2:45 AM Andreas Klauer
<Andreas.Klauer@metamorpher.de> wrote:
>
> On Thu, Dec 23, 2021 at 10:55:01PM -0500, Tony Bush wrote:
> > /dev/mapper/sda /dev/mapper/sdb /dev/mapper/sdc /dev/mapper/sdd
> > /dev/mapper/sde
>
> Hi Tony,
>
> your examine output of the one drive that was left showed Device role 1,
> and count starts from 0 so that's the 2nd drive in the array. The order
> of the others is unknown so yes, unless you are able to derive order
> from raw data, you simply have to try all combinations. This can be
> scripted as well.
>
> Furthermore you should --examine the array you created and make sure
> that all other variables (offset, level, layout, ...), match your
> previous --examine.
>
> As for re-creating overlays, you can do that for every single step
> but it might not be necessary just for mount attempt.
>
> Note that there is the case where mounting might succeed but drive
> order is still wrong - find a large file and see if it is fully intact.
>
> Best of luck,
> Andreas Klauer
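
The --examine check from the quoted reply can be narrowed to the fields that matter when re-creating. A sketch, assuming the usual field labels of mdadm --examine output for 1.2 metadata; examine_summary is a hypothetical helper, and the sample text is illustrative, built from the values used in this thread rather than copied from a real run:

```sh
# Keep only the --examine fields that must match between the original
# superblock and the re-created array. Reads --examine output on stdin.
examine_summary() {
    grep -E '^[[:space:]]*(Raid Level|Raid Devices|Data Offset|Layout|Chunk Size|Device Role)[[:space:]]*:'
}

# Illustrative sample only (assumed formatting, values from this thread).
sample_examine() {
    cat <<'EOF'
        Version : 1.2
     Raid Level : raid5
   Raid Devices : 5
    Data Offset : 257024 sectors
   Super Offset : 8 sectors
         Layout : left-symmetric
     Chunk Size : 512K
    Device Role : Active device 1
EOF
}

sample_examine | examine_summary
```

On a real system the input would come from mdadm --examine, run once against the untouched /dev/sdd and once against a member of the re-created array, with the two summaries compared by eye or with diff.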


* Re: Need help Recover raid5 array
  2021-12-28 23:56       ` Tony Bush
@ 2021-12-29  4:37         ` Andreas Klauer
  0 siblings, 0 replies; 8+ messages in thread
From: Andreas Klauer @ 2021-12-29  4:37 UTC (permalink / raw)
  To: Tony Bush; +Cc: linux-raid

On Tue, Dec 28, 2021 at 06:56:35PM -0500, Tony Bush wrote:
> I then mounted the drive as my user, tested multiple large files and
> when i was done crying i unmounted

:-)

> My next task is creating a new raid that is inside of partitions.

Now that you know how to create overlays and verified the data 
is still there... if you want partitions you can do the whole 
thing one more time.

Your data offset is 257024 sectors so you can take 2048 sectors 
off that to create a partition at 1 MiB offset.

Since GPT has a partition table backup at end of drive, you might 
also have to shrink the filesystem on the RAID just a little to 
make room for that.
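
The offset arithmetic above, spelled out. A sketch assuming 512-byte logical sectors and the standard GPT layout (128 entries of 128 bytes plus one header sector at the end of the disk); the helper names are made up for illustration:

```sh
# Data offset to use inside a partition that starts part_start sectors
# into the disk, so that the data stays at the same absolute sector.
new_data_offset() {
    old_offset=$1
    part_start=$2
    echo $(( old_offset - part_start ))
}

# Sectors taken by the backup GPT at the end of the disk:
# 128 entries * 128 bytes / 512 bytes per sector, plus 1 header sector.
gpt_backup_sectors() {
    echo $(( 128 * 128 / 512 + 1 ))
}

new_data_offset 257024 2048   # prints 254976
gpt_backup_sectors            # prints 33
```

With the partition starting at sector 2048 and a data offset of 254976 inside it, the data still begins at absolute sector 257024, so nothing has to move. Only the last few sectors of the disk become unavailable, which is why the filesystem has to shrink slightly.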

Test it all on overlays until you make it work...

> Need to do this with 2 new drives, transfer 10TB, then shrink and
> remove a drive to add to new raid, transfer and repeat.

> Let me know if this is a bad idea please.

Growing is way more common than the other way around.
Things can go mysteriously wrong when shrinking stuff. 

As an alternative to the overlay method mentioned above, 
if you do not mind re-syncing, you could also mdadm --replace 
your full disk members with partitioned members.

It will complain about the device "not large enough to join array" 
so you still have to shrink the filesystem just a little and shrink 
the RAID itself accordingly until the device is not too small anymore.

The main issue is that mdadm's --max-size and --size options can 
be difficult to deal with. So things can go wrong here, too.

Test it on a separate array (loop devices) first...

# mdadm --grow /dev/md100 --array-size=max
(no output, check dmesg or /proc/mdstat for size)
      25132032 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
# blockdev --getsize64 /dev/md100
25735200768

### shrink filesystem on /dev/md100 to 25100000K

# mdadm --grow /dev/md100 --array-size=25100000K

### double check that filesystem is still OK here

# mdadm /dev/md100 --add /dev/loop5
mdadm: /dev/loop5 not large enough to join array
# mdadm --grow /dev/md100 --size=max
mdadm: component size of /dev/md100 unchanged at 8377344K
# mdadm --grow /dev/md100 --size=8377343K
mdadm: component size of /dev/md100 has been set to 8376832K
# mdadm /dev/md100 --add /dev/loop5
mdadm: /dev/loop5 not large enough to join array
# mdadm --grow /dev/md100 --size=8376831K
mdadm: component size of /dev/md100 has been set to 8376320K
# mdadm /dev/md100 --add /dev/loop5
mdadm: added /dev/loop5
# mdadm /dev/md100 --replace /dev/loop3

# cat /proc/mdstat
md100 : active raid5 loop5[6](R) loop4[5] loop3[4] loop2[2] loop1[1] loop0[0](F)
      25100000 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
      [=========>...........]  recovery = 45.4% (3806592/8376320) finish=0.3min speed=200346K/sec

# mdadm --grow /dev/md100 --size=max
# mdadm --grow /dev/md100 --array-size=max
# resize2fs /dev/md100

Something like this, might maybe work if you had to do it without overlays.

Not sure if there is a simpler way right now.
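
The numbers in the transcript above are consistent with two simple rules: the component size ends up as a multiple of the 512K chunk, and a RAID5 array holds (devices - 1) times the component size. A sketch of that arithmetic, inferred from the output shown rather than from mdadm documentation; the function names are hypothetical:

```sh
# Round a requested component size (in K) down to a multiple of the chunk size (in K).
round_down_to_chunk() {
    size_k=$1
    chunk_k=$2
    echo $(( size_k / chunk_k * chunk_k ))
}

# Usable RAID5 array size in K: one device's worth of space goes to parity.
raid5_array_size_k() {
    devices=$1
    component_k=$2
    echo $(( (devices - 1) * component_k ))
}

round_down_to_chunk 8377343 512   # prints 8376832
round_down_to_chunk 8376831 512   # prints 8376320
raid5_array_size_k 4 8377344      # prints 25132032
raid5_array_size_k 4 8376320      # prints 25128960
```

Since 25128960K is still above the 25100000K the filesystem was shrunk to, the filesystem fits after the component size is reduced.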

Regards
Andreas Klauer


end of thread, other threads:[~2021-12-29  4:37 UTC | newest]

Thread overview: 8+ messages
2021-12-19  4:31 Need help Recover raid5 array Tony Bush
2021-12-19 10:13 ` Wols Lists
2021-12-20  0:25   ` Phil Turmel
2021-12-19 11:58 ` Andreas Klauer
2021-12-24  3:55   ` Tony Bush
2021-12-24  7:45     ` Andreas Klauer
2021-12-28 23:56       ` Tony Bush
2021-12-29  4:37         ` Andreas Klauer
