* Recovery after accidental raid5 superblock rewrite
@ 2017-06-03 19:46 Paul Tonelli
  2017-06-03 21:20 ` Andreas Klauer
  0 siblings, 1 reply; 13+ messages in thread
From: Paul Tonelli @ 2017-06-03 19:46 UTC (permalink / raw)
  To: linux-raid

Hello,

I am trying to recover an ext4 partition on LVM2 on RAID5. After reading 
around for solutions on my own, I asked on the freenode IRC linux-raid 
channel, where I was advised to describe my problem here, so here I am.

The first part of this mail describes what led to my issue, the second 
part is what I tried in order to solve it, the third is my current status, 
and the fourth is my questions. The mail can be read as markdown.

Part 1: creation and loss of the array
=============================================

The raid is on 3 SATA disks of 3 TB each. It was initialised as:

```
mdadm --create --verbose --force /dev/md0 --level=5 --raid-devices=2 /dev/sdb /dev/sdc
pvcreate /dev/md0
vgcreate vg0 /dev/md0
lvcreate -L 2.5T -n data /dev/vg0   # command guessed from the lvm archive files
mkfs.ext4 /dev/vg0/data
```

The raid did not initialize correctly at each boot; I had to rebuild the 
array using:

```
mdadm --create --verbose --force --assume-clean /dev/md0 --level=5 --raid-devices=2 /dev/sdb /dev/sdc
```

It would then mount without issue (autodetection of the vg and lv worked).

I then extended the array as follows to add a third disk (hot-plugged into 
the running system, which matters later). The raid had time to finish 
growing and I was able to extend everything on top of it:

```
sudo mdadm --add /dev/md0 /dev/sdd
sudo mdadm --grow --raid-devices=3 /dev/md0
sudo lvextend -L +256G /dev/mapper/vg0-data
sudo resize2fs /dev/vg0/data
sudo lvextend -L +256G /dev/mapper/vg0-data
sudo resize2fs /dev/vg0/data
sudo lvresize -L +500G  /dev/vg0/data
sudo resize2fs /dev/vg0/data
```

At this point the machine crashed for unrelated reasons.

The data was not backed up: this was a transitional situation where we were 
regrouping data from several machines, and the backup NAS was still being 
set up when this occurred (this was the first mistake).

At reboot, I could not reassemble the raid, so I ran (this was the 
second mistake; I had not read the wiki at that point):

mdadm --create --verbose /dev/md0 --level=5 --raid-devices=3 /dev/sdb /dev/sdc /dev/sdd

I realized my error half an hour later when I could not detect any 
volume group or mount anything, and I immediately stopped the rebuild of 
drive sdd which was in progress (it was stopped at <5%, so the first 5% of 
disk sdd is now wrong).

Actually, between the reboots the hard drive order had changed (because 
the disk had been hot-plugged initially). The most probable change is:

sdc became sdd
sdb became sdc
sdd became sdb
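
For what it's worth, a quick way to pin down which physical disk is which, independently of the /dev/sdX letters handed out at boot, is to look at the serial numbers (the by-id symlinks and lsblk columns below are standard udev/util-linux features; exact column support may vary by version):

```
ls -l /dev/disk/by-id/             # persistent names built from model + serial, pointing at ../../sdX
lsblk -o NAME,SIZE,SERIAL,MODEL    # serial numbers next to the current sdX assignment
```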

I immediately made backups of the three disks to spares using dd (to sde, 
sdf and sdg) and have been testing different methods to get my data back 
ever since, without success.

I made another mistake during the 3 days I spent trying to recover the 
data: I switched two disk ids in a dd command and overwrote the first 
800 MB or so of disk sdc:

```
dd if=/dev/sdc of=/dev/sdf bs=64k count=12500
```

The data contained on the disks is YAML files, pictures (lots of them, in 
a specific order) and binary files. Recovering the huge YAML files (GB 
long) and the structure of the filesystem is the most important.

Part 2: What I tried
====================

The main test has been to rebuild the raid5 with the different possible 
disk orders and try to detect data on it.

I tried several disk orders, restored the physical volume, volume group 
and logical volume using:

```
mdadm --create --assume-clean --level=5 --raid-devices=3 /dev/md0 /dev/sdc missing /dev/sdb
pvcreate --uuid "Q2Z32D-iyPj-9QYp-uXBC-q02e-s8QK-6eqv4d" --restorefile /home/ptonelli/backup_raid/lvm/backup/vg0 /dev/md0
vgcfgrestore vg0 -f /home/ptonelli/backup_raid/lvm/backup/vg0
vgchange -a a vg0
testdisk /dev/vg0/data
```

and did a deep scan of the disks and let it run until it reached 3%. I 
got the following data (m for missing):

- sdd(m) sdb sdc : 258046 782334 1306622 1830910 2355198
- sdd(m) sdc sdb : 783358 1306622, 23562222
- sdb sdd(m) sdc : 258046 1307646 1830910
- sdc sdd(m) sdb : 259070 783358 1307646 1831834 235622
- sdb sdc sdd(m) : nothing detected
- sdc sdb sdd(m) : 259070 782334 1831934 235198

I wrote down only the ext4 superblock start positions returned by testdisk; 
the rest of the data was the same each time and matched the ext4 
partition size I am trying to recover.

Between each test, I restored the disks with the following method:

```
vgchange -a n vg0
mdadm --stop /dev/md0
dd if=/dev/sde of=/dev/sdb bs=64k count=125000
dd if=/dev/sdf of=/dev/sdc bs=64k count=125000
dd if=/dev/sdg of=/dev/sdd bs=64k count=125000
```

On the most promising orders ([sdc, missing, sdb] and [missing, sdb, 
sdc]) I tried to rebuild the ext4 filesystem from the earliest 
superblock using:

```
for i in $(seq 0 64000);do echo $i;e2fsck -b $i /dev/vg0/data;done
#and then
e2fsck -b 32XXX /dev/vg0/data -y
```
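
A less brute-force way to get candidate backup superblock positions is to ask mke2fs where it would put them on a filesystem of this size; this is only a sketch, it assumes the logical volume already has its final size, and -n keeps both commands from writing anything:

```
mke2fs -n /dev/vg0/data                    # dry run: prints "Superblock backups stored on blocks: 32768, 98304, ..."
e2fsck -n -b 32768 -B 4096 /dev/vg0/data   # probe one of those block numbers read-only
```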

Each time a superblock was found around block 32000, with a small 
difference between the two attempts.

I let it run; it spent 3 hours fixing/deleting inodes (from the 
output, about one in ten inodes was modified during the repair). After 3 
hours it was still at ~22,000,000 inodes, so I guess the disk structure 
is incorrect; I expected the repair to be a lot shorter with a correct 
structure.

I completely restored the disks between and after the tests with dd.

Part 3: current situation
=========================

So, what I have:

- all three raid superblocks are screwed and were overwritten without 
backup, but I have the commands used to build the initial array
- I have all the incremental files for the lvm2 structure, and the latest 
file matches the ext4 superblocks found on the disks
- I have a "nearly" complete backup of the three raid5 disks:
   - one is good apart from the raid superblock (sdb)
   - one is missing ~1 GB at the start (sdc)
   - one is missing ~120 GB at the start of the array; I have marked 
this disk as missing for all my tests

but I cannot find my data.

Additional system info: the machine is running amd64 Debian jessie with 
backports enabled; mdadm is the standard Debian version: v3.3.2 - 21st 
August 2014.

I put here the relevant parts of the lvm backup and archive files (I can 
provide the full files if necessary).

before extension:

```
physical_volumes {

pv0 {
         id = "Q2Z32D-iyPj-9QYp-uXBC-q02e-s8QK-6eqv4d"
         device = "/dev/md0"     # Hint only

         status = ["ALLOCATABLE"]
         flags = []
         dev_size = 5860270080   # 2.7289 Terabytes
         pe_start = 2048
         pe_count = 715364       # 2.7289 Terabytes
         }
}

logical_volumes {

data {
id = "OwfU2H-UStb-fkaD-EAvk-fetk-CiOk-xkaWkA"
creation_time = 1494949403      # 2017-05-16 17:43:23 +0200
segment_count = 1

segment1 {
         start_extent = 0
         extent_count = 681575   # 2.6 Terabytes

         type = "striped"
         stripe_count = 1        # linear

         stripes = [
                 "pv0", 0
         ]
```

after extension:

```
pv0 {
         id = "Q2Z32D-iyPj-9QYp-uXBC-q02e-s8QK-6eqv4d"
         device = "/dev/md0"     # Hint only

         status = ["ALLOCATABLE"]
         flags = []
         dev_size = 11720538112  # 5.4578 Terabytes
         pe_start = 2048
         pe_count = 1430729      # 5.4578 Terabytes
}
}

logical_volumes {

data {
creation_time = 1494949403      # 2017-05-16 17:43:23 +0200
segment_count = 1

segment1 {
         start_extent = 0
         extent_count = 1065575  # 4.06485 Terabytes

         type = "striped"
         stripe_count = 1        # linear

         stripes = [
                 "pv0", 0
         ]
```

From the raid wiki's list of things to report, I believe only this 
information is useful, as the raid superblocks are wrong:

```
PCI [ahci] 00:11.4 SATA controller: Intel Corporation Wellsburg sSATA 
Controller [AHCI mode] (rev 05)
├scsi 0:0:0:0 ATA      Crucial_CT1050MX {1651150EFB63}
│└sda 978.09g [8:0] Partitioned (gpt)
│ ├sda1 512.00m [8:1] vfat {5AB9-E482}
│ │└Mounted as /dev/sda1 @ /boot/efi
│ ├sda2 29.80g [8:2] ext4 {f8f9eb9a-fc49-4b2b-8c8c-27278dfc7f29}
│ │└Mounted as /dev/sda2 @ /
│ ├sda3 29.80g [8:3] swap {1ea1c6c1-7ec7-49cc-8696-f1fb8fb6e7b0}
│ └sda4 917.98g [8:4] PV LVM2_member 910.83g used, 7.15g free 
{TJKWU2-oTcU-mSBC-sGHz-ZTg7-8HoY-u0Tyjj}
│  └VG vg_ssd 917.98g 7.15g free {hguqji-h777-K0yt-gjma-gEbO-HUfw-NU9aRK}
│   └redacted
├scsi 1:0:0:0 ATA      WDC WD30EFRX-68E {WD-WCC4N3EP81NC}
│└sdb 2.73t [8:16] Partitioned (gpt)
│ ├sdb1 2.37g [8:17] ext4 '1.42.6-5691' 
{30ef58b3-1e3f-4f33-ade7-7365ebd8c427}
│ ├sdb2 2.00g [8:18] Empty/Unknown
│ └sdb3 2.72t [8:19] Empty/Unknown
├scsi 2:0:0:0 ATA      WDC WD30EFRX-68E {WD-WMC4N1087039}
│└sdc 2.73t [8:32] Partitioned (gpt)
└scsi 3:0:0:0 ATA      WDC WD30EFRX-68E {WD-WCC4N3EP8C25}
  └sdd 2.73t [8:48] Partitioned (gpt)
PCI [ahci] 00:1f.2 SATA controller: Intel Corporation Wellsburg 6-Port 
SATA Controller [AHCI mode] (rev 05)
├scsi 4:x:x:x [Empty]
├scsi 5:0:0:0 ATA      WDC WD30EFRX-68E {WD-WCC4N0XARKSC}
│└sde 2.73t [8:64] Partitioned (gpt)
│ ├sde1 2.37g [8:65] ext4 '1.42.6-5691' 
{30ef58b3-1e3f-4f33-ade7-7365ebd8c427}
│ ├sde2 2.00g [8:66] Empty/Unknown
│ └sde3 2.72t [8:67] Empty/Unknown
├scsi 6:0:0:0 ATA      WDC WD30EFRX-68E {WD-WCC4N7KPUH6U}
│└sdf 2.73t [8:80] Partitioned (gpt)
├scsi 7:0:0:0 ATA      WDC WD30EFRX-68E {WD-WCC4N1TYVTEN}
│└sdg 2.73t [8:96] Partitioned (gpt)
├scsi 8:x:x:x [Empty]
└scsi 9:x:x:x [Empty]
Other Block Devices
└md0 0.00k [9:0] MD vnone  () clear, None (None) None {None}
                  Empty/Unknown
```


I am currently digging through the mailing list archive to find more 
information and things to test.

Part 4: Questions
==================

- How screwed am I? Do you believe I can still get most of my data 
back, and what about the ext4 folder tree?

- What should my next steps be? (I would be happy to follow any link to 
relevant software/procedures.)

- Is all the necessary information here, or should I gather additional 
information before continuing?

- I am at the point where hiring somebody / a company with better 
experience than mine to solve this issue may be necessary. If so, who would 
you advise, if this is an allowed question on the mailing list?


Thank you for reading this far, and thank you in advance if you can take 
the time to answer.


* Re: Recovery after accidental raid5 superblock rewrite
  2017-06-03 19:46 Recovery after accidental raid5 superblock rewrite Paul Tonelli
@ 2017-06-03 21:20 ` Andreas Klauer
  2017-06-03 22:33   ` Paul Tonelli
  0 siblings, 1 reply; 13+ messages in thread
From: Andreas Klauer @ 2017-06-03 21:20 UTC (permalink / raw)
  To: Paul Tonelli; +Cc: linux-raid

On Sat, Jun 03, 2017 at 09:46:44PM +0200, Paul Tonelli wrote:
> I am trying to recover an ext4 partition on lvm2 on raid5.

Okay, your mail is very long, still unclear in places.

This was all done recently? So we do not have to consider that mdadm 
changed its defaults in regards to metadata versions, offsets, ...?

In that case I might have good news for you. 
Provided you didn't screw anything else up.

> ```
> mdadm --create --verbose --force --assume-clean /dev/md0 --level=5 
> --raid-devices=2  /dev/sdb /dev/sdc
> ```

You're not really supposed to do that.
( https://unix.stackexchange.com/a/131927/30851 )

> I immediately made backups of the three disks to spares using dd

This is a key point. If those backups are not good, you have lost.

> I made another mistake during the 3 days I spent trying to recover the 
> data, I switched two disks ids in a dd command and overwrite the first 
> 800Mb or so of disk c:

Just to confirm, this is somehow not covered by your backups?

> Part 2: What I tried
> ====================

In a data recovery situation there is one thing you should absolutely not do.
That is writing to your disks. Please use overlays in the future...
( https://raid.wiki.kernel.org/index.php/Recovering_a_damaged_RAID#Making_the_harddisks_read-only_using_an_overlay_file )
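
For the archive, a minimal sketch of what those overlay functions do for each array member (device name and overlay size here are placeholders, and the dmsetup table follows the wiki's snapshot recipe):

```
dev=/dev/sdb                                    # placeholder: one array member
size=$(blockdev --getsz "$dev")                 # device size in 512-byte sectors
truncate -s 50G /tmp/overlay-sdb                # sparse file that absorbs all writes
loop=$(losetup --find --show /tmp/overlay-sdb)
blockdev --setro "$dev"                         # make the underlying disk refuse writes
dmsetup create sdb-cow --table "0 $size snapshot $dev $loop P 8"
# experiment on /dev/mapper/sdb-cow; the real disk stays untouched
```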

Your experiments wrote all sorts of nonsense to your disks.
As stated above, now it all depends on the backups you made...

>    - one is missing ~120 GB at the start of the array, I have marked 
> this disk as missing for all my tests

Maybe good news for you. Provided those backups are still there.

If I understood your story correctly, then this disk has good data.

RAID5 parity is a simple XOR. a XOR b = c

You had a RAID 5 that was fully grown, fully synced. 
You re-created it with the correct drives but wrong disk order. 
This started a sync.

The sync should have done a XOR b = c (only c is written to disk c)
Wrong order you did c XOR b = a (only a is written to disk a)

It makes no difference. Either way it wrote the data that was already there. 
Merely the data representation (what you got from /dev/md0) was garbage.

As long as you did not write anything to /dev/md0 when you couldn't mount, 
you're good right here. You just have to put the disks in correct order.

Proof:

--- Step 1: RAID Creation ---

# truncate -s 100M a b c
# losetup --find --show a
/dev/loop0
# losetup --find --show b
/dev/loop1
# losetup --find --show c
/dev/loop2
# mdadm --create /dev/md42 --level=5 --raid-devices=3 /dev/loop0 /dev/loop1 /dev/loop2
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md42 started.
# mdadm --wait /dev/md42
# mkfs.ext4 /dev/md42
# mount /dev/md42 loop/
# echo I am selling these fine leather jackets... > loop/somefile.txt
# umount loop/
# mdadm --stop /dev/md42

--- Step 2: Foobaring it up (wrong disk order) ---

# mdadm --create /dev/md42 --level=5 --raid-devices=3 /dev/loop2 /dev/loop1 /dev/loop0
mdadm: /dev/loop2 appears to be part of a raid array:
       level=raid5 devices=3 ctime=Sat Jun  3 23:01:31 2017
mdadm: /dev/loop1 appears to be part of a raid array:
       level=raid5 devices=3 ctime=Sat Jun  3 23:01:31 2017
mdadm: /dev/loop0 appears to be part of a raid array:
       level=raid5 devices=3 ctime=Sat Jun  3 23:01:31 2017
Continue creating array? yes
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md42 started.
# mdadm --wait /dev/md42
# mount /dev/md42 loop/
mount: wrong fs type, bad option, bad superblock on /dev/md42,
       missing codepage or helper program, or other error

       In some cases useful info is found in syslog - try
       dmesg | tail or so.
# mdadm --stop /dev/md42

--- Step 3: Pulling the rabbit out of the hat (correct order, even one missing) ---

# mdadm --create /dev/md42 --level=5 --raid-devices=3 /dev/loop0 missing /dev/loop2
mdadm: /dev/loop0 appears to be part of a raid array:
       level=raid5 devices=3 ctime=Sat Jun  3 23:04:35 2017
mdadm: /dev/loop2 appears to be part of a raid array:
       level=raid5 devices=3 ctime=Sat Jun  3 23:04:35 2017
Continue creating array? yes
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md42 started.
# mount /dev/md42 loop/
# cat loop/somefile.txt 
I am selling these fine leather jackets...

> - I am a the point where hiring somebody / a company with better 
> experience than mine to solve this issue is necessary. If yes who would 
> you advise, if this is an allowed question on the mailing list ?

Oh. I guess I should have asked for money first? Damn.

Seriously though. I don't know if the above will solve your issue. 
It is certainly worth a try. And if it doesn't work it probably means 
something else happened... in that case chances of survival are low.

Pictures (if they are small / unfragmented, with identifiable headers, 
i.e. JPEGs not RAWs) can be recovered but not their filenames / order. 

Filesystem with first roughly 2GiB missing... filesystems _HATE_ that.

Regards
Andreas Klauer


* Re: Recovery after accidental raid5 superblock rewrite
  2017-06-03 21:20 ` Andreas Klauer
@ 2017-06-03 22:33   ` Paul Tonelli
  2017-06-03 23:29     ` Andreas Klauer
  0 siblings, 1 reply; 13+ messages in thread
From: Paul Tonelli @ 2017-06-03 22:33 UTC (permalink / raw)
  To: Andreas Klauer; +Cc: linux-raid

Thank you for your answer and your time.

On 06/03/2017 11:20 PM, Andreas Klauer wrote:
> On Sat, Jun 03, 2017 at 09:46:44PM +0200, Paul Tonelli wrote:
>> I am trying to recover an ext4 partition on lvm2 on raid5.
> Okay, your mail is very long, still unclear in places.
>
> This was all done recently? So we do not have to consider that mdadm
> changed its defaults in regards to metadata versions, offsets, ...?
Correct, no change in version of mdadm, kernel, lvm or anything else; 
mdadm was installed on the machine on the day the raid was created and 
has not been upgraded since (I checked by comparing the apt log 
timestamps and the lvm metadata files).
> In that case I might have good news for you.
> Provided you didn't screw anything else up.
>
>> ```
>> mdadm --create --verbose --force --assume-clean /dev/md0 --level=5
>> --raid-devices=2  /dev/sdb /dev/sdc
>> ```
> You're not really supposed to do that.
> ( https://unix.stackexchange.com/a/131927/30851 )
I know that, now :-/. This was done before the backups.
>
>> I immediately made backups of the three disks to spares using dd
> This is a key point. If those backups are not good, you have lost.
I did make backups (just after erasing the raid superblocks), and I still 
have them; I have been using them as a reference for all the later tests.
>> I made another mistake during the 3 days I spent trying to recover the
>> data, I switched two disks ids in a dd command and overwrite the first
>> 800Mb or so of disk c:
> Just to confirm, this is somehow not covered by your backups?
Right, this is not covered by my backups. I mistakenly copied once from the 
disks I was experimenting with to one of the backups, and not the other way 
around (the third mistake: working too late in the evening).

I am still searching for a way to put a complete block device (/dev/sdX) 
read-only for these tests, I believe using overlays is the solution.
>> Part 2: What I tried
>> ====================
> In a data recovery situation there is one thing you should absolutely not do.
> That is writing to your disks. Please use overlays in the future...
> ( https://raid.wiki.kernel.org/index.php/Recovering_a_damaged_RAID#Making_the_harddisks_read-only_using_an_overlay_file )
Point taken; I had gone the copy-the-whole-disk way, and will try this for 
my next tests.
> Your experiments wrote all sorts of nonsense to your disks.
> As stated above, now it all depends on the backups you made...
Apart from the error just above, I always worked on a copy of the original; 
the originals are still available.
>>     - one is missing ~120 GB at the start of the array, I have marked
>> this disk as missing for all my tests
> Maybe good news for you. Provided those backups are still there.
>
> If I understood your story correctly, then this disk has good data.
>
> RAID5 parity is a simple XOR. a XOR b = c
>
> You had a RAID 5 that was fully grown, fully synced.
Actually, this is one question I have: with mdadm, does creating a raid5 
with two disks and then growing it to 3 create exactly the same structure 
as directly creating a 3-disk raid5? Your message seems to say it is the 
same thing.
> You re-created it with the correct drives but wrong disk order.
> This started a sync.
>
> The sync should have done a XOR b = c (only c is written to disk c)
> Wrong order you did c XOR b = a (only a is written to disk a)
>
> It makes no difference. Either way it wrote the data that was already there.
> Merely the data representation (what you got from /dev/md0) was garbage.
>
> As long as you did not write anything to /dev/md0 when you couldn't mount,
> you're good right here. You just have to put the disks in correct order.
>
> Proof:
>
> --- Step 1: RAID Creation ---
>
> # truncate -s 100M a b c
> # losetup --find --show a
> /dev/loop0
> # losetup --find --show b
> /dev/loop1
> # losetup --find --show c
> /dev/loop2
> # mdadm --create /dev/md42 --level=5 --raid-devices=3 /dev/loop0 /dev/loop1 /dev/loop2
> mdadm: Defaulting to version 1.2 metadata
> mdadm: array /dev/md42 started.
> # mdadm --wait /dev/md42
> # mkfs.ext4 /dev/md42
> # mount /dev/md42 loop/
> # echo I am selling these fine leather jackets... > loop/somefile.txt
> # umount loop/
> # mdadm --stop /dev/md42
>
> --- Step 2: Foobaring it up (wrong disk order) ---
>
> # mdadm --create /dev/md42 --level=5 --raid-devices=3 /dev/loop2 /dev/loop1 /dev/loop0
> mdadm: /dev/loop2 appears to be part of a raid array:
>         level=raid5 devices=3 ctime=Sat Jun  3 23:01:31 2017
> mdadm: /dev/loop1 appears to be part of a raid array:
>         level=raid5 devices=3 ctime=Sat Jun  3 23:01:31 2017
> mdadm: /dev/loop0 appears to be part of a raid array:
>         level=raid5 devices=3 ctime=Sat Jun  3 23:01:31 2017
> Continue creating array? yes
> mdadm: Defaulting to version 1.2 metadata
> mdadm: array /dev/md42 started.
> # mdadm --wait /dev/md42
> # mount /dev/md42 loop/
> mount: wrong fs type, bad option, bad superblock on /dev/md42,
>         missing codepage or helper program, or other error
>
>         In some cases useful info is found in syslog - try
>         dmesg | tail or so.
> # mdadm --stop /dev/md42
>
> --- Step 3: Pulling the rabbit out of the hat (correct order, even one missing) ---
>
> # mdadm --create /dev/md42 --level=5 --raid-devices=3 /dev/loop0 missing /dev/loop2
> mdadm: /dev/loop0 appears to be part of a raid array:
>         level=raid5 devices=3 ctime=Sat Jun  3 23:04:35 2017
> mdadm: /dev/loop2 appears to be part of a raid array:
>         level=raid5 devices=3 ctime=Sat Jun  3 23:04:35 2017
> Continue creating array? yes
> mdadm: Defaulting to version 1.2 metadata
> mdadm: array /dev/md42 started.
> # mount /dev/md42 loop/
> # cat loop/somefile.txt
> I am selling these fine leather jackets...
Thank you.
>> - I am a the point where hiring somebody / a company with better
>> experience than mine to solve this issue is necessary. If yes who would
>> you advise, if this is an allowed question on the mailing list ?
> Oh. I guess should have asked for money first? Damn.
>
> Seriously though. I don't know if the above will solve your issue.
> It is certainly worth a try. And if it doesn't work it probably means
> something else happened... in that case chances of survival are low.
>
> Pictures (if they are small / unfragmented, with identifiable headers,
> i.e. JPEGs not RAWs) can be recovered but not their filenames / order.
In my case, losing even 20% of the pictures is not an issue; the 
filenames / order / directory tree are more important.
>
> Filesystem with first roughly 2GiB missing... filesystems _HATE_ that.
Thank you. So, from what you told me, the next steps should be to:
- start using overlays as described in the wiki (this will save me a lot 
of time):
   - use the correct disk (with only the raid superblock missing)
   - use the disk which was partially xor-ed during the sync, as this has 
no impact on the data
   - do not use the disk with the first GB missing
- try rebuilding the raid with these disks, testing all 6 combinations?

I will try this tomorrow and update depending on the result.

I have come up with a second question from my unsuccessful tests and searching:

Is it possible to copy only a raid superblock from one disk to another 
directly using dd? After reading on the wiki that the raid superblock 
is 256 bytes long plus 2 bytes for each device, I tried:

```
dd if=/dev/sdx of=/dev/sdy count=262 iflag=count_bytes
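# note: with 1.2 metadata the superblock is not at byte 0 -- mdadm -E reports "Super Offset : 8 sectors" (4 KiB in), so a copy of only the first 262 bytes never reaches it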
```

but it did not copy the superblock correctly (mdadm did not find it); 
there may be an offset or something missing.

Thank you again for your time. I will try this tomorrow after a good 
night's sleep; it will be less risky.

> Regards
> Andreas Klauer




* Re: Recovery after accidental raid5 superblock rewrite
  2017-06-03 22:33   ` Paul Tonelli
@ 2017-06-03 23:29     ` Andreas Klauer
  2017-06-04 22:58       ` Paul Tonelli
  0 siblings, 1 reply; 13+ messages in thread
From: Andreas Klauer @ 2017-06-03 23:29 UTC (permalink / raw)
  To: Paul Tonelli; +Cc: linux-raid

On Sun, Jun 04, 2017 at 12:33:43AM +0200, Paul Tonelli wrote:
> I am still searching for a way to put a complete block device (/dev/sdX) 
> read-only for these tests, I believe using overlays is the solution.

Yes.

Overlays are extremely useful for data recovery.

It is unfortunate there is no standard tool to manage them easily.
The "overlay functions" in the wiki come close but people don't find 
out about it until it's too late.

> Actually, this is one question I have: with mdadm, creating a raid5 with 
> two disks and then growing it to 3 creates exactly the same structure as 
> creating directly a 3 disk raid5 ? Your message seems to say it is the 
> same thing.

Good catch. It would probably move the data offset.

# truncate -s 3TB a b c
# mdadm --create /dev/md42 --level=5 --raid-devices=2 /dev/loop[01]
# mdadm --examine /dev/loop0
    Data Offset : 262144 sectors
# mdadm --grow /dev/md42 --raid-devices=3 --add /dev/loop2
# mdadm --examine /dev/loop0
    Data Offset : 262144 sectors
     New Offset : 260096 sectors

So on re-create you have to find and specify the correct --data-offset.
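
For example, something along these lines (the offset value and device names are only illustrative; if I read mdadm(8) right, the 's' suffix means sectors, while a bare number would be taken as kilobytes):

```
mdadm --create /dev/md0 --assume-clean --level=5 --raid-devices=3 \
      --data-offset=260096s /dev/first missing /dev/third   # placeholder device names
mdadm --examine /dev/first | grep 'Data Offset'             # verify what mdadm actually used
```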

How to determine the correct data offset? See if you can find LVM magic 
string "LABELONE" in the first 256MiB of the two disks you didn't 
dd-overwrite. That minus 512 bytes should be the correct offset.

# hexdump -C /dev/some-pv
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000200  4c 41 42 45 4c 4f 4e 45  01 00 00 00 00 00 00 00  |LABELONE........|
00000210  55 87 20 ff 20 00 00 00  4c 56 4d 32 20 30 30 31  |U. . ...LVM2 001|

If unlucky, it just happened to be on the drive you overwrote. 
In that case you have to xor the two others.
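
One way to run that search without dumping whole multi-terabyte devices is to look at just the first 256 MiB (GNU dd/grep options; adjust the device name):

```
dd if=/dev/sdb bs=1M count=256 status=none | grep -abo LABELONE   # prints byte_offset:LABELONE
# (byte_offset - 512) / 512 = candidate --data-offset in sectors
```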

> Is it possible to copy only a raid superblock from one disk to another 
> directly using dd ?

Each superblock is unique (differs in device role and checksum at minimum). 
So copying superblocks usually is not a thing. Even copying drives can 
result in a mess (UUIDs are no longer unique, you have little / no control 
which drive will actually be used). This is also a problem you might 
encounter with overlays in conjunction with autoassembly/automount magicks 
that might be running in the background.

Regards
Andreas Klauer


* Re: Recovery after accidental raid5 superblock rewrite
  2017-06-03 23:29     ` Andreas Klauer
@ 2017-06-04 22:58       ` Paul Tonelli
  2017-06-05  9:24         ` Andreas Klauer
  0 siblings, 1 reply; 13+ messages in thread
From: Paul Tonelli @ 2017-06-04 22:58 UTC (permalink / raw)
  To: Andreas Klauer; +Cc: linux-raid

Hello again. Things are somewhat improving: I haven't got my data back yet, 
but I have things to try:

On 06/04/2017 01:29 AM, Andreas Klauer wrote:
> Good catch. It would probably move the data offset.
>
> # truncate -s 3TB a b c
> # mdadm --create /dev/md42 --level=5 --raid-devices=2 /dev/loop[01]
> # mdadm --examine /dev/loop0
>      Data Offset : 262144 sectors
> # mdadm --grow /dev/md42 --raid-devices=3 --add /dev/loop2
> # mdadm --examine /dev/loop0
>      Data Offset : 262144 sectors
>       New Offset : 260096 sectors
>
> So on re-create you have to find and specify the correct --data-offset.
>
> How to determine the correct data offset? See if you can find LVM magic
> string "LABELONE" in the first 256MiB of the two disks you didn't
> dd-overwrite. That minus 512 bytes should be the correct offset.
>
> # hexdump -C /dev/some-pv
> 00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
> *
> 00000200  4c 41 42 45 4c 4f 4e 45  01 00 00 00 00 00 00 00  |LABELONE........|
> 00000210  55 87 20 ff 20 00 00 00  4c 56 4d 32 20 30 30 31  |U. . ...LVM2 001|
>
> If unlucky it just happened to be in the drive you overwrite.
> In that case you have to xor the two others.
I did find this one at that exact offset 26096 = (133169664 - 512)/512 
on the sdc drive (I rebuilt the xor by rebuilding the raid, using 
overlays this time; it speeds things up a lot compared to my previous tests)

so this matches the commands. Now, after doing a pvcreate and a 
vgcfgrestore using the backup lvm file, e2fsck still refuses to run, but 
testdisk now finds a lot more superblocks:

    ext4                      267262 8729457661 8729190400
    ext4                      791550 8729981949 8729190400
    ext4                     1315838 8730506237 8729190400
    ext4                     1840126 8731030525 8729190400
    ext4                     6034430 8735224829 8729190400
    ext4                     6558718 8735749117 8729190400
    ext4                    12325886 8741516285 8729190400
    ext4                    20714494 8749904893 8729190400
    ext4                    32248830 8761439229 8729190400

I also find the following backup superblocks when looping with e2fsck 
-b XXX (and the same positions with testdisk):

```
33408
98944
164480
230016
...
```

and they always have a +640 difference compared with the backup superblock 
locations a new ext4 on this volume would use, as shown by:

```
mke2fs /dev/vg0/data -n:
Superblock backups stored on blocks:
         32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 
2654208,
         4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 
78675968,
         102400000, 214990848, 512000000, 550731776, 644972544
```

So I believe I still have an offset issue of 640 (in ext4 block numbers) 
or something like that; I am still digging into it.

I have tried to rebuild the ext4 superblocks on top of the logical volume 
with the "offset" option of mke2fs, but this did not work (e2fsck 
still does not run by itself):

```
mke2fs -E offset=640 -n -S /dev/vg0/data
```


I am still digging around to find out where this 640 offset comes from, but 
as the last mail made me move forward significantly (thank you Andreas), I 
am trying again.

Thanks again for reading this far.



* Re: Recovery after accidental raid5 superblock rewrite
  2017-06-04 22:58       ` Paul Tonelli
@ 2017-06-05  9:24         ` Andreas Klauer
  2017-06-05 23:24           ` Paul Tonelli
  0 siblings, 1 reply; 13+ messages in thread
From: Andreas Klauer @ 2017-06-05  9:24 UTC (permalink / raw)
  To: Paul Tonelli; +Cc: linux-raid

On Mon, Jun 05, 2017 at 12:58:03AM +0200, Paul Tonelli wrote:
> I  did find this one at that exact offset 26096 = (133169664 - 512)/512 

26096->260096? Okay.

> Now, after doing a pvcreate and a vgcfgrestore ...

Should not be necessary if the data on two drives was okay? 
You did leave the dd-overwritten drive out as missing, right? 
Do you have the correct disk order and chunk size and offset?

You have to be 140% sure the RAID itself is running correctly, 
otherwise all other steps are bound to fail.

If you run photorec on the RAID and it manages to recover 
a file that is larger than (number of drives * chunk size) 
and intact, you can have some confidence the RAID is okay 
in terms of disk order and chunk size - the offset may 
still be off by a multiple of the stripe alignment, but if the 
offset is correct too, file -s /dev/md0 should say LVM PV 
in your case.
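
For scale, with three drives and the usual 512 KiB chunk (which the later --examine output in this thread also shows), that threshold is roughly 1.5 MiB, and one full data stripe is (3-1) x 512 KiB = 1 MiB. A quick check once an overlay assembly is up could be (photorec invocation from memory, adjust to taste):

```
file -s /dev/md0                  # should report an LVM2 PV if order, chunk size and offset all line up
photorec /d /tmp/recup /dev/md0   # then see whether recovered files above ~1.5 MiB open intact
```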

Regards
Andreas Klauer


* Re: Recovery after accidental raid5 superblock rewrite
  2017-06-05  9:24         ` Andreas Klauer
@ 2017-06-05 23:24           ` Paul Tonelli
  2017-06-05 23:56             ` Andreas Klauer
  0 siblings, 1 reply; 13+ messages in thread
From: Paul Tonelli @ 2017-06-05 23:24 UTC (permalink / raw)
  To: Andreas Klauer; +Cc: linux-raid

Hi again Andreas (and anybody else reading)

On 06/05/2017 11:24 AM, Andreas Klauer wrote:
> On Mon, Jun 05, 2017 at 12:58:03AM +0200, Paul Tonelli wrote:
>> I  did find this one at that exact offset 26096 = (133169664 - 512)/512
> 26096->260096? Okay.
Yes, sorry, that was a typo; the superblock was exactly where you 
expected it. I did the following procedure:

```
export DEVICES="/dev/sdc /dev/sdd /dev/sdb"
/root/create_overlay.sh
mdadm --create /dev/md0 --level=5 --assume-clean --raid-devices=3 missing /dev/mapper/sdd /dev/mapper/sdb
mdadm --add /dev/md0  /dev/mapper/sdc
#sleep 30s
mdadm --stop /dev/md0
xxd -u /dev/mapper/sdc | grep  -C 3 'LABELONE'
 >7f001e0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
 >7f001f0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
 >7f00200: 4C41 4245 4C4F 4E45 0100 0000 0000 0000 LABELONE........
 >7f00210: B5E5 02EA 2000 0000 4C56 4D32 2030 3031  .... ...LVM2 001
 >7f00220: 5132 5A33 3244 6979 506A 3951 5970 7558 Q2Z32DiyPj9QYpuX
 >7f00230: 4243 7130 3265 7338 514B 3665 7176 3464 BCq02es8QK6eqv4d
```

I have let it run for a longer time (10 min) and it does not find 
another LABELONE after that. (I also tried to find LABELONE on the other 
disks without success).


>> Now, after doing a pvcreate and a vgcfgrestore ...
> Should not be necessary if the data on two drives was okay?
> You did leave the dd-overwritten drive out as missing, right?
> Do you have the correct disk order and chunk size and offset?
I believe you are right, the issue is still the raid: I have tried 
photorec and most files I have opened look like they have been 
truncated. I have checked again the commands used to build the raid (I 
had initially extracted them from bash_history, but I also have a record 
with timestamps in /var/log/auth.log thanks to sudo),

and all the commands match (I have put all the commands used on the 
raid at the end of this mail).
> You have to be 140% sure the RAID itself is running correctly,
> otherwise all other steps are bound to fail.
>
> If you run photorec on the RAID and it manages to recover
> a file that is larger than number of drives * chunk size
> and intact, you can have some confidence the RAID is okay
> in terms of disk order and chunk size - the offset may
> still be off by multiple of stripe alignment but if the
> offset is correct too, file -s /dev/md0 should say LVM PV
> in your case.
Photorec does not: I get only small files when I use photorec. The only 
"big" files it recovers are tars (and I believe that is because it does 
not check the integrity of the files). I left it running until ~10 GB of 
files had been recovered.

From what I understand, finding the "LABELONE" at the correct position 
shows:

- apart from the raid superblock, the disks I use (sdd and sdb) have not 
been erased (as sdc is rebuilt from the xor)
- the offset to build the raid is correct (I find all the ext4 backup 
superblocks when running testdisk / checking with e2fsck -b XXX), with 
the issue of the 640 ext4 chunk offset
- the first disk of the array is sdc (as the "LABELONE" can only be 
found on this disk)

so I may need to check the other parameters until, as you put it, "file 
-s /dev/md0 should say LVM PV".

I believe my best option is to write a script that explores several 
parameters using the overlays and see whether one combination finds the 
correct data. I would use as parameters:
- disk order
- raid chunk size

Would you test any other parameters?
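
A minimal sketch of what such a loop could look like (device names, the data offset value and the use of --run to skip mdadm's confirmation prompt are assumptions; this should only ever be pointed at the overlay devices):

```
A=/dev/mapper/sdb; B=/dev/mapper/sdd; C=missing      # placeholders for the overlays
for chunk in 512 256 128 64; do                      # chunk size in KiB
  for order in "$A $B $C" "$A $C $B" "$B $A $C" "$B $C $A" "$C $A $B" "$C $B $A"; do
    mdadm --stop /dev/md0 2>/dev/null
    mdadm --create --run --assume-clean /dev/md0 --level=5 --raid-devices=3 \
          --chunk=$chunk --data-offset=260096s $order
    echo "chunk=$chunk order=[$order] -> $(file -s /dev/md0)"
  done
done
```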

Among the things I have tried, I used the offset option of losetup to 
align the backup superblocks of the logical volume; e2fsck then runs 
directly, but it refuses to continue because of the wrong volume 
size, so I do not think this is the right solution, and the 640 ext4 
offset cannot easily be added at the lvm level. I agree with your previous 
comment that the issues are still at the raid level.

Thanks (again) for reading this far. I will still happily take 
any advice, and will update once I have written and run my script.

Dump of raid building commands
=====================

part 1
---------

sudo commands which created the array:

```
May 16 17:38:47 redacted sshd[10978]: pam_unix(sshd:session): session 
opened for user skoos by (uid=0)
May 16 17:38:47 redacted systemd-logind[1071]: New session 754 of user 
skoos.
May 16 17:41:29 redacted sudo: user : TTY=pts/83 ; PWD=/srv/data ; 
USER=root ; COMMAND=/usr/bin/apt-get install mdadm
May 16 17:41:29 redacted sudo: pam_unix(sudo:session): session opened 
for user root by user(uid=0)
May 16 17:41:45 redacted sudo: pam_unix(sudo:session): session closed 
for user root
May 16 17:41:48 redacted sudo: user : TTY=pts/83 ; PWD=/srv/data ; 
USER=root ; COMMAND=/sbin/mdadm --create --verbose --force 
--assume-clean /dev/md0 --level=5 --raid-devices=2 /dev/sdb /dev/sdc
May 16 17:41:48 redacted sudo: pam_unix(sudo:session): session opened 
for user root by user(uid=0)
May 16 17:42:08 redacted sudo: pam_unix(sudo:session): session closed 
for user root
May 16 17:42:10 redacted sudo: user : TTY=pts/83 ; PWD=/srv/data ; 
USER=root ; COMMAND=/sbin/mdadm --create --verbose --force 
--assume-clean /dev/md0 --level=5 --raid-devices=2 /dev/sdb /dev/sdc
May 16 17:42:10 redacted sudo: pam_unix(sudo:session): session opened 
for user root by user(uid=0)
May 16 17:42:13 redacted sudo: pam_unix(sudo:session): session closed 
for user root
May 16 17:42:43 redacted sudo: user : TTY=pts/83 ; PWD=/srv/data ; 
USER=root ; COMMAND=/sbin/pvcreate /dev/md0
May 16 17:42:43 redacted sudo: pam_unix(sudo:session): session opened 
for user root by user(uid=0)
May 16 17:42:44 redacted sudo: pam_unix(sudo:session): session closed 
for user root
May 16 17:42:49 redacted sudo: user : TTY=pts/83 ; PWD=/srv/data ; 
USER=root ; COMMAND=/sbin/vgdisplay
May 16 17:42:49 redacted sudo: pam_unix(sudo:session): session opened 
for user root by user(uid=0)
May 16 17:42:49 redacted sudo: pam_unix(sudo:session): session closed 
for user root
May 16 17:43:02 redacted sudo: user : TTY=pts/83 ; PWD=/srv/data ; 
USER=root ; COMMAND=/sbin/vgcreate vg0 /dev/md0
May 16 17:43:02 redacted sudo: pam_unix(sudo:session): session opened 
for user root by user(uid=0)
May 16 17:43:02 redacted sudo: pam_unix(sudo:session): session closed 
for user root
May 16 17:43:23 redacted sudo: user : TTY=pts/83 ; PWD=/srv/data ; 
USER=root ; COMMAND=/sbin/lvcreate -L 2.6T -n data /dev/vg0
May 16 17:43:23 redacted sudo: pam_unix(sudo:session): session opened 
for user root by user(uid=0)
May 16 17:43:23 redacted sudo: pam_unix(sudo:session): session closed 
for user root
May 16 17:43:32 redacted sudo: user : TTY=pts/83 ; PWD=/srv/data ; 
USER=root ; COMMAND=/sbin/mkfs.ext4 /dev/vg0/data
May 16 17:43:32 redacted sudo: pam_unix(sudo:session): session opened 
for user root by user(uid=0)
May 16 17:43:48 redacted sudo: pam_unix(sudo:session): session closed 
for user root
May 16 17:45:34 redacted sudo: user : TTY=pts/83 ; PWD=/srv/data ; 
USER=root ; COMMAND=/bin/mount /dev/vg0/data /mnt/data
May 16 17:45:34 redacted sudo: pam_unix(sudo:session): session opened 
for user root by user(uid=0)
May 16 17:45:35 redacted sudo: pam_unix(sudo:session): session closed 
for user root
May 16 17:46:08 redacted sudo: user : TTY=pts/103 ; PWD=/mnt/data ; 
USER=root ; COMMAND=/usr/bin/rsync -avH /srv/data/ /mnt/data
```

part 2
---------

command to rebuild the array after reboot (the array was not detected 
any longer after reboot, but once the following command was run, it 
mounted fine):

```
May 23 13:23:55 redacted sudo: user : TTY=pts/1 ; PWD=/home/user ; 
USER=root ; COMMAND=/sbin/mdadm --create --verbose /dev/md0 --level=5 
--raid-devices=2 /dev/sdb /dev/sdc
```

commands used to grow with third disk

```
May 24 21:04:10 redacted sudo:     root : TTY=pts/30 ; PWD=/root ; 
USER=root ; COMMAND=/sbin/mdadm --add /dev/md0 /dev/sdd
May 24 21:04:10 redacted sudo: pam_unix(sudo:session): session opened 
for user root by (uid=0)
May 24 21:04:10 redacted sudo: pam_unix(sudo:session): session closed 
for user root
May 24 21:04:15 redacted sudo:     root : TTY=pts/30 ; PWD=/root ; 
USER=root ; COMMAND=/sbin/mdadm /dev/md0
May 24 21:04:15 redacted sudo: pam_unix(sudo:session): session opened 
for user root by (uid=0)
May 24 21:04:15 redacted sudo: pam_unix(sudo:session): session closed 
for user root
May 24 21:04:20 redacted sudo:     root : TTY=pts/30 ; PWD=/root ; 
USER=root ; COMMAND=/sbin/mdadm /dev/md0 --detail
May 24 21:04:20 redacted sudo: pam_unix(sudo:session): session opened 
for user root by (uid=0)
May 24 21:04:20 redacted sudo: pam_unix(sudo:session): session closed 
for user root
May 24 21:04:26 redacted sudo:     root : TTY=pts/30 ; PWD=/root ; 
USER=root ; COMMAND=/sbin/mdadm --detail /dev/md0
May 24 21:04:26 redacted sudo: pam_unix(sudo:session): session opened 
for user root by (uid=0)
May 24 21:04:26 redacted sudo: pam_unix(sudo:session): session closed 
for user root
May 24 21:04:46 redacted sudo:     root : TTY=pts/30 ; PWD=/root ; 
USER=root ; COMMAND=/sbin/mdadm --gros /dev/md0 /dev/sdd
May 24 21:04:46 redacted sudo: pam_unix(sudo:session): session opened 
for user root by (uid=0)
May 24 21:04:46 redacted sudo: pam_unix(sudo:session): session closed 
for user root
May 24 21:04:49 redacted sudo:     root : TTY=pts/30 ; PWD=/root ; 
USER=root ; COMMAND=/sbin/mdadm --grow /dev/md0 /dev/sdd
May 24 21:04:49 redacted sudo: pam_unix(sudo:session): session opened 
for user root by (uid=0)
May 24 21:04:49 redacted sudo: pam_unix(sudo:session): session closed 
for user root
May 24 21:05:16 redacted sudo:     root : TTY=pts/30 ; PWD=/root ; 
USER=root ; COMMAND=/sbin/mdadm --grow --raid-devices=3 /dev/md0
May 24 21:05:16 redacted sudo: pam_unix(sudo:session): session opened 
for user root by (uid=0)
May 24 21:05:17 redacted sudo: pam_unix(sudo:session): session closed 
for user root
May 24 21:05:19 redacted sudo:     root : TTY=pts/30 ; PWD=/root ; 
USER=root ; COMMAND=/sbin/mdadm --detail /dev/md0
May 24 21:05:19 redacted sudo: pam_unix(sudo:session): session opened 
for user root by (uid=0)
May 24 21:05:19 redacted sudo: pam_unix(sudo:session): session closed 
for user root
May 24 21:05:41 redacted sudo:     root : TTY=pts/30 ; PWD=/root ; 
USER=root ; COMMAND=/sbin/mdadm --detail /dev/md0
```

part 3
---------

commands used to resize after adding the third disk

```
May 29 09:45:20 redacted sudo:     root : TTY=pts/30 ; PWD=/root ; 
USER=root ; COMMAND=/sbin/mdadm --detail /dev/md0
May 29 09:45:20 redacted sudo: pam_unix(sudo:session): session opened 
for user root by (uid=0)
May 29 09:45:21 redacted sudo: pam_unix(sudo:session): session closed 
for user root
May 29 09:45:37 redacted sudo:     root : TTY=pts/30 ; PWD=/root ; 
USER=root ; COMMAND=/sbin/lvextend -L +500G /dev/vg0/data
May 29 09:45:37 redacted sudo: pam_unix(sudo:session): session opened 
for user root by (uid=0)
May 29 09:45:38 redacted sudo: pam_unix(sudo:session): session closed 
for user root
May 29 09:45:43 redacted sudo:     root : TTY=pts/30 ; PWD=/root ; 
USER=root ; COMMAND=/sbin/resize2fs /dev/vg0/data
May 29 09:45:43 redacted sudo: pam_unix(sudo:session): session opened 
for user root by (uid=0)
May 29 09:45:49 redacted sudo: pam_unix(sudo:session): session closed 
for user root
May 29 09:46:02 redacted sudo:     root : TTY=pts/30 ; PWD=/root ; 
USER=root ; COMMAND=/sbin/lvextend -L +500G /dev/vg0/data
May 29 09:46:02 redacted sudo: pam_unix(sudo:session): session opened 
for user root by (uid=0)
May 29 09:46:03 redacted sudo: pam_unix(sudo:session): session closed 
for user root
May 29 09:46:05 redacted sudo:     root : TTY=pts/30 ; PWD=/root ; 
USER=root ; COMMAND=/sbin/resize2fs /dev/vg0/data
May 29 09:46:05 redacted sudo: pam_unix(sudo:session): session opened 
for user root by (uid=0)
May 29 09:46:12 redacted sudo: pam_unix(sudo:session): session closed 
for user root
May 29 09:46:14 redacted sudo:     root : TTY=pts/30 ; PWD=/root ; 
USER=root ; COMMAND=/sbin/lvextend -L +500G /dev/vg0/data
May 29 09:46:14 redacted sudo: pam_unix(sudo:session): session opened 
for user root by (uid=0)
May 29 09:46:15 redacted sudo: pam_unix(sudo:session): session closed 
for user root
May 29 09:46:16 redacted sudo:     root : TTY=pts/30 ; PWD=/root ; 
USER=root ; COMMAND=/sbin/resize2fs /dev/vg0/data
```



* Re: Recovery after accidental raid5 superblock rewrite
  2017-06-05 23:24           ` Paul Tonelli
@ 2017-06-05 23:56             ` Andreas Klauer
  2017-06-10 20:04               ` Paul Tonelli
  0 siblings, 1 reply; 13+ messages in thread
From: Andreas Klauer @ 2017-06-05 23:56 UTC (permalink / raw)
  To: Paul Tonelli; +Cc: linux-raid

On Tue, Jun 06, 2017 at 01:24:41AM +0200, Paul Tonelli wrote:
> mdadm --create /dev/md0 --level=5 --assume-clean --raid-devices=3 missing /dev/mapper/sdd /dev/mapper/sdb

You did not specify the --data-offset here?
Check mdadm --examine to make sure which offset it's using.

> xxd -u /dev/mapper/sdc | grep  -C 3 'LABELONE'
>  >7f001e0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
>  >7f001f0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
>  >7f00200: 4C41 4245 4C4F 4E45 0100 0000 0000 0000 LABELONE........

If correct this should appear at the start of /dev/md0 (even w/o adding sdc).

LABELONE should appear on the first drive, so this should not be wrong; 
however, sdd and sdb could still have switched order, and of course 
the chunk size could be different (although unlikely according to your log).

> I believe you are right, the issue is still the raid: I have tried 
> photorec and most files I have opened look like they have been 
> truncated.

Well, files could be fragmented, there's a certain sweet spot 
(like - megapixel JPEGs of few megs size) where it's sufficiently 
unlikely to be a problem.

I don't know what files you have, if it's movies it would be okay 
too if the first few megs of the file were playable.

> - apart from the raid superblock, the disks I use (sdd and sdb) have not 
> been erased  (as sdc is rebuilt from xor)

It only rebuilds starting from offset. So it should not have covered that 
offset if you did not specify it. Check it's not there before you --add. 
If it's there then this is after all not the drive you overwrite with dd?

I am confused now.

Regards
Andreas Klauer


* Re: Recovery after accidental raid5 superblock rewrite
  2017-06-05 23:56             ` Andreas Klauer
@ 2017-06-10 20:04               ` Paul Tonelli
  2017-06-10 20:41                 ` Andreas Klauer
  0 siblings, 1 reply; 13+ messages in thread
From: Paul Tonelli @ 2017-06-10 20:04 UTC (permalink / raw)
  To: Andreas Klauer; +Cc: linux-raid

Hello Everybody,

TL;DR: I found a --data-offset that allows an e2fsck to run, but things are 
not as they should be. I still manage to get some data back (pictures which 
are >= 4 MB in size).

On 06/06/2017 01:56 AM, Andreas Klauer wrote:
> On Tue, Jun 06, 2017 at 01:24:41AM +0200, Paul Tonelli wrote:
>> mdadm --create /dev/md0 --level=5 --assume-clean --raid-devices=3 missing /dev/mapper/sdd /dev/mapper/sdb
> You did not specify the --data-offset here?
> Check mdadm --examine to make sure which offset it's using.
The objective was just to reverse the /dev/sdc destruction using the xor 
of /dev/sdb and /dev/sdd; I believe the default offset is earlier than 
the one I specify by hand.
>> xxd -u /dev/mapper/sdc | grep  -C 3 'LABELONE'
>>   >7f001e0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
>>   >7f001f0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
>>   >7f00200: 4C41 4245 4C4F 4E45 0100 0000 0000 0000 LABELONE........
> If correct this should appear at the start of /dev/md0 (even w/o adding sdc).
>
> LABELONE should appear on the first drive so this should not be wrong,
> however the sdd sdb could still have switched  order, and of course
> the chunk size could be different (although unlikely according to your log).
Even after rebuilding the disks, I find the LABELONE at 7f00200. But it 
does not provide a recoverable LVM / ext4 filesystem (the backup 
superblock offset is incorrect for this partition size)
>> I believe you are right, the issue is still the raid: I have tried
>> photorec and most files I have opened look like they have been
>> truncated.
> Well, files could be fragmented, there's a certain sweet spot
> (like - megapixel JPEGs of few megs size) where it's sufficiently
> unlikely to be a problem.
>
> I don't know what files you have, if it's movies it would be okay
> too if the first few megs of the file were playable.
I have multiple assemblies with a custom data-offset in which e2fsck 
agrees to run from a backup superblock (I brute-forced the tests).

the possible assemblies are (offset, disk order):

261120 missing /dev/mapper/sdc /dev/mapper/sdd
261120 missing /dev/mapper/sdd /dev/mapper/sdc
261120 /dev/mapper/sdb missing /dev/mapper/sdd
261120 /dev/mapper/sdb /dev/mapper/sdd missing

I have already found bmp/png files > 4 MB using photorec, so I 
think I am making progress.

Right now I am running e2fsck -n to see which order returns the fewest 
errors; I will then try to get back as many files as I can, still using 
overlays.

>> - apart from the raid superblock, the disks I use (sdd and sdb) have not
>> been erased  (as sdc is rebuilt from xor)
> It only rebuilds starting from offset. So it should not have covered that
> offset if you did not specify it. Check it's not there before you --add.
> If it's there then this is after all not the drive you overwrite with dd?
I believe the offset we specify manually is after (larger than) the default 
one for a 3-disk raid assembly, so it should have rebuilt the previous LVM 
superblock, or am I missing something (again)?
> I am confused now.
So am I, and I am afraid I have used up all the time I could spend getting 
this data back. Thanks to your help, I can still recover many files, 
even if it is not a full filesystem :-). Thank you!

Paul




* Re: Recovery after accidental raid5 superblock rewrite
  2017-06-10 20:04               ` Paul Tonelli
@ 2017-06-10 20:41                 ` Andreas Klauer
  2017-07-31 19:57                   ` Paul Tonelli
  0 siblings, 1 reply; 13+ messages in thread
From: Andreas Klauer @ 2017-06-10 20:41 UTC (permalink / raw)
  To: Paul Tonelli; +Cc: linux-raid

On Sat, Jun 10, 2017 at 10:04:16PM +0200, Paul Tonelli wrote:
> I believe the offset we manually specify is after the default one on a 
> raid assembly with 3 disks, so it should have rebuilt the previous LVM 
> superblock, or am I missing something (again).

It should be the reverse. When you grow the raid, the offset shrinks.

Not able to provide further insights. You have a LVM2 header which, 
given the correct offset, should appear on the /dev/md device and 
in that case everything else should appear too.

If that is not the case then things will be a lot more complicated.

Regards
Andreas Klauer


* Re: Recovery after accidental raid5 superblock rewrite
  2017-06-10 20:41                 ` Andreas Klauer
@ 2017-07-31 19:57                   ` Paul Tonelli
  2017-07-31 20:35                     ` Wols Lists
  2017-08-01 14:01                     ` Phil Turmel
  0 siblings, 2 replies; 13+ messages in thread
From: Paul Tonelli @ 2017-07-31 19:57 UTC (permalink / raw)
  To: Andreas Klauer; +Cc: linux-raid

Hello (again)

Sorry to resurrect this topic from the dead, but I have the same issue 
again.

TL;DR

The difference compared to last time:
- I created a clean raid a few days back
- the data is completely backed up and available
- I can actually access the data

but I still have no clue about what went wrong.

1) I created the raid:

sudo mdadm --create --verbose /dev/md0 --level=5 --raid-devices=3 /dev/sdb /dev/sdc /dev/sdd

Without lvm this time, just a single ext4 filesystem directly on md0:

mkfs.ext4 /dev/md0
mount /dev/md0 /srv/data

I copied 3 TB onto it.

Then I just rebooted the machine.

2) and (again) no md0 assembled at boot:

mdadm -E /dev/sd[bcd]
/dev/sdb:
    MBR Magic : aa55
Partition[0] :   4294967295 sectors at            1 (type ee)
/dev/sdc:
    MBR Magic : aa55
Partition[0] :   4294967295 sectors at            1 (type ee)
/dev/sdd:
    MBR Magic : aa55
Partition[0] :   4294967295 sectors at            1 (type ee)


This time, I am able to get my data back. I first create the overlays and then run:

mdadm --create --verbose /dev/md0 --assume-clean --level=5 --raid-devices=3 /dev/mapper/sdb /dev/mapper/sdc /dev/mapper/sdd
mount /dev/md0 /srv/data

# mdadm --detail --scan /dev/md0
ARRAY /dev/md0 metadata=1.2 name=smic:0 UUID=2ebab32e:82283ac5:4232d2ee:92abf170

and it mounts, so I did exactly the same using the real disks and 
(again) got my data back. The raid is now running:

# mdadm -E /dev/sdb
/dev/sdb:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x1
      Array UUID : 2ebab32e:82283ac5:4232d2ee:92abf170
            Name : smic:0  (local to host smic)
   Creation Time : Mon Jul 31 21:36:12 2017
      Raid Level : raid5
    Raid Devices : 3

  Avail Dev Size : 5860271024 (2794.40 GiB 3000.46 GB)
      Array Size : 5860270080 (5588.79 GiB 6000.92 GB)
   Used Dev Size : 5860270080 (2794.39 GiB 3000.46 GB)
     Data Offset : 262144 sectors
    Super Offset : 8 sectors
    Unused Space : before=262056 sectors, after=944 sectors
           State : clean
     Device UUID : 2d1daf78:1e085799:6f1799b2:e88e60d1

Internal Bitmap : 8 sectors from superblock
     Update Time : Mon Jul 31 21:36:19 2017
   Bad Block Log : 512 entries available at offset 72 sectors
        Checksum : 63abce91 - correct
          Events : 1

          Layout : left-symmetric
      Chunk Size : 512K

    Device Role : Active device 0
    Array State : AAA ('A' == active, '.' == missing, 'R' == replacing)
# mdadm -E /dev/sdc
/dev/sdc:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x1
      Array UUID : 2ebab32e:82283ac5:4232d2ee:92abf170
            Name : smic:0  (local to host smic)
   Creation Time : Mon Jul 31 21:36:12 2017
      Raid Level : raid5
    Raid Devices : 3

  Avail Dev Size : 5860271024 (2794.40 GiB 3000.46 GB)
      Array Size : 5860270080 (5588.79 GiB 6000.92 GB)
   Used Dev Size : 5860270080 (2794.39 GiB 3000.46 GB)
     Data Offset : 262144 sectors
    Super Offset : 8 sectors
    Unused Space : before=262056 sectors, after=944 sectors
           State : clean
     Device UUID : db6d0696:072c5166:1491157d:6ddf34cf

Internal Bitmap : 8 sectors from superblock
     Update Time : Mon Jul 31 21:36:19 2017
   Bad Block Log : 512 entries available at offset 72 sectors
        Checksum : 164e0d53 - correct
          Events : 1

          Layout : left-symmetric
      Chunk Size : 512K

    Device Role : Active device 1
    Array State : AAA ('A' == active, '.' == missing, 'R' == replacing)
# mdadm -E /dev/sdd
/dev/sdd:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x1
      Array UUID : 2ebab32e:82283ac5:4232d2ee:92abf170
            Name : smic:0  (local to host smic)
   Creation Time : Mon Jul 31 21:36:12 2017
      Raid Level : raid5
    Raid Devices : 3

  Avail Dev Size : 5860271024 (2794.40 GiB 3000.46 GB)
      Array Size : 5860270080 (5588.79 GiB 6000.92 GB)
   Used Dev Size : 5860270080 (2794.39 GiB 3000.46 GB)
     Data Offset : 262144 sectors
    Super Offset : 8 sectors
    Unused Space : before=262056 sectors, after=944 sectors
           State : clean
     Device UUID : 0c52bea6:a0cbf060:fe5edff0:4ee71d21

Internal Bitmap : 8 sectors from superblock
     Update Time : Mon Jul 31 21:36:19 2017
   Bad Block Log : 512 entries available at offset 72 sectors
        Checksum : e75866e8 - correct
          Events : 1

          Layout : left-symmetric
      Chunk Size : 512K

    Device Role : Active device 2
    Array State : AAA ('A' == active, '.' == missing, 'R' == replacing)

but a reboot always brings me back to step 2), with the need to press 
ctrl-D at boot time.

What am I missing? Am I using a bad version of mdadm or the kernel 
(mdadm:amd64 3.3.2-5+deb8u2, kernel 4.8.0-0.bpo.2-amd64), or am I doing 
it wrong in another way?

best regards,

On 06/10/2017 10:41 PM, Andreas Klauer wrote:
> On Sat, Jun 10, 2017 at 10:04:16PM +0200, Paul Tonelli wrote:
>> I believe the offset we manually specify is after the default one on a
>> raid assembly with 3 disks, so it should have rebuilt the previous LVM
>> superblock, or am I missing something (again).
> It should be the reverse. When you grow the raid, the offset shrinks.
>
> Not able to provide further insights. You have a LVM2 header which,
> given the correct offset, should appear on the /dev/md device and
> in that case everything else should appear too.
>
> If that is not the case then things will be a lot more complicated.
>
> Regards
> Andreas Klauer




* Re: Recovery after accidental raid5 superblock rewrite
  2017-07-31 19:57                   ` Paul Tonelli
@ 2017-07-31 20:35                     ` Wols Lists
  2017-08-01 14:01                     ` Phil Turmel
  1 sibling, 0 replies; 13+ messages in thread
From: Wols Lists @ 2017-07-31 20:35 UTC (permalink / raw)
  To: Paul Tonelli, Andreas Klauer; +Cc: linux-raid

On 31/07/17 20:57, Paul Tonelli wrote:
> but a reboot always brings me back to step 2), with the need to press
> ctrl-D at boot time.
> 
> What am I missing? Am I using a bad version of the mdadm or kernel
> (mdadm:amd64 3.3.2-5+deb8u2, kernel 4.8.0-0.bpo.2-amd64) or am I doing
> it wrong in another way ?

Okay. Step 2. Does an "mdadm --assemble --scan" work instead? This will
tell us whether your raid array is fine, just that it's not being
properly assembled at boot. Actually, before you do that, try a "mdadm
/dev/md0 --stop".

If the stop then assemble scan works, it tells us that your boot
sequence is at fault, not the array.

Can you post the relevant section of grub.cfg? That might not be
assembling the arrays.

Cheers,
Wol


* Re: Recovery after accidental raid5 superblock rewrite
  2017-07-31 19:57                   ` Paul Tonelli
  2017-07-31 20:35                     ` Wols Lists
@ 2017-08-01 14:01                     ` Phil Turmel
  1 sibling, 0 replies; 13+ messages in thread
From: Phil Turmel @ 2017-08-01 14:01 UTC (permalink / raw)
  To: Paul Tonelli, Andreas Klauer; +Cc: linux-raid

On 07/31/2017 03:57 PM, Paul Tonelli wrote:
> Hello (again)
> 
> Sorry to resuscitate this topic back from the dead, but I have again the
> same issue.
> 
> TL;DR
> 
> The difference compared to last time:
> - I created a clean raid a few days back
> - the data is completely backed up and available
> - I can actually access the data
> 
> but I still have no clue about what went wrong.
> 
> 1) I created the raid:
> 
> sudo mdadm --create --verbose /dev/md0 --level=5 --raid-devices=3
> /dev/sdb /dev/sdc /dev/sdd
> 
> without lvm this time, juste a single ext4 partition
> 
> mkfs.ext4 /dev/md0
> mount /dev/md0 /srv/data
> 
> I copied 3Tb on it
> 
> I just rebooted the machine
> 
> 2) and (again) no md0 assembled at boot:
> 
> mdadm -E /dev/sd[bcd]
> /dev/sdb:
>    MBR Magic : aa55
> Partition[0] :   4294967295 sectors at            1 (type ee)
> /dev/sdc:
>    MBR Magic : aa55
> Partition[0] :   4294967295 sectors at            1 (type ee)
> /dev/sdd:
>    MBR Magic : aa55
> Partition[0] :   4294967295 sectors at            1 (type ee)

You have partition tables on these drives, but you are using the entire
drives when you create your arrays.  You need to zero the first 4k of
each drive to kill off the incorrect partition tables.

You might also have a GPT partition table backup at the end of the disk
that needs to die as well.
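
A sketch of that cleanup (device names assumed, array stopped first; the tail wipe assumes the backup GPT sits in the unused space after the md data, which the "after=944 sectors" in the --examine output above suggests):

```
for d in /dev/sdb /dev/sdc /dev/sdd; do
  end=$(blockdev --getsz "$d")                                 # device size in 512-byte sectors
  dd if=/dev/zero of="$d" bs=4096 count=1                      # wipe MBR + primary GPT header (first 4 KiB)
  dd if=/dev/zero of="$d" bs=512 seek=$((end - 33)) count=33   # wipe the backup GPT at the end of the disk
done
```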

Phil

