* Recovery after accidental raid5 superblock rewrite
@ 2017-06-03 19:46 Paul Tonelli
2017-06-03 21:20 ` Andreas Klauer
From: Paul Tonelli @ 2017-06-03 19:46 UTC (permalink / raw)
To: linux-raid
Hello,
I am trying to recover an ext4 partition on lvm2 on raid5. After reading
around to find solutions by myself, I tried the freenode IRC #linux-raid
channel, where I was advised to describe my problem here, so here I am.
The first part of this mail describes what led to my issue, the second
part what I tried to solve it, the third my current status, and the
fourth my questions. The mail can be read as markdown.
Part 1: creation and loss of the array
=============================================
The RAID is on 3 SATA disks of 3 TB each. It was initialised as:
```
mdadm --create --verbose --force /dev/md0 --level=5 --raid-devices=2 /dev/sdb /dev/sdc
pvcreate /dev/md0
vgcreate vg0 /dev/md0
lvcreate -L 2.5T -n data /dev/vg0 # command guessed from the lvm archive files
mkfs.ext4 /dev/vg0/data
```
The RAID did not initialize correctly at boot; at each boot I had to
re-create the array using:
```
mdadm --create --verbose --force --assume-clean /dev/md0 --level=5 --raid-devices=2 /dev/sdb /dev/sdc
```
It would then mount without issue (autodetection of the VG and LV worked).
I then extended as follows to add a third disk (hot-plugged into the
running system, which matters later). The RAID had time to finish growing
and I was able to extend everything on top of it:
```
sudo mdadm --add /dev/md0 /dev/sdd
sudo mdadm --grow --raid-devices=3 /dev/md0
sudo lvextend -L +256G /dev/mapper/vg0-data
sudo resize2fs /dev/vg0/data
sudo lvextend -L +256G /dev/mapper/vg0-data
sudo resize2fs /dev/vg0/data
sudo lvresize -L +500G /dev/vg0/data
sudo resize2fs /dev/vg0/data
```
Here the machine crashed for unrelated reasons.
The data was not backed up: this was a transitional situation where we
were regrouping data from several machines, and the backup NAS was still
being set up when this occurred (this was the first mistake).
At reboot, I could not reassemble the RAID, so I did (this was the
second mistake, I had not read the wiki at this time):
mdadm --create --verbose /dev/md0 --level=5 --raid-devices=3 /dev/sdb /dev/sdc /dev/sdd
I realized my error half an hour later when I could not detect any
volume group or mount anything, and immediately stopped the rebuild of
drive sdd which was in progress (it was stopped at <5%, so the first 5%
of disk sdd is now wrong).
Actually, between the reboots, the hard drive order had changed (because
the disk had been hot-plugged initially); the most probable change is:
sdc became sdd
sdb became sdc
sdd became sdb
I immediately made backups of the three disks to spares using dd (to sde
sdf and sdg) and have been testing different methods to get back my data
ever since without success.
I made another mistake during the 3 days I spent trying to recover the
data: I switched two disk IDs in a dd command and overwrote the first
800 MB or so of disk sdc:
```
dd if=/dev/sdc of=/dev/sdf bs=64k count=12500
```
The data contained on the disks is YAML files, pictures (lots of them,
in a specific order) and binary files. Recovering the huge YAML files
(gigabytes long) and the structure of the filesystem matters most.
Part 2: What I tried
====================
The main test has been to rebuild the raid5 with the different possible
disk orders and try to detect data on it.
I tried several disk orders, restored the physical volume, volume group
and logical volume using:
```
mdadm --create --assume-clean --level=5 --raid-devices=3 /dev/md0 /dev/sdc missing /dev/sdb
pvcreate --uuid "Q2Z32D-iyPj-9QYp-uXBC-q02e-s8QK-6eqv4d" --restorefile /home/ptonelli/backup_raid/lvm/backup/vg0 /dev/md0
vgcfgrestore vg0 -f /home/ptonelli/backup_raid/lvm/backup/vg0
vgchange -a a vg0
testdisk /dev/vg0/data
```
and did a deep scan of the disks and let it run until it reached 3%. I
got the following data (m for missing):
- sdd(m) sdb sdc : 258046 782334 1306622 1830910 2355198
- sdd(m) sdc sdb : 783358 1306622, 23562222
- sdb sdd(m) sdc : 258046 1307646 1830910
- sdc sdd(m) sdb : 259070 783358 1307646 1831834 235622
- sdb sdc sdd(m) : nothing detected
- sdc sdb sdd(m) : 259070 782334 1831934 235198
I wrote down only the ext4 superblock start positions returned by
testdisk; the rest of the data was the same each time and matched the
size of the ext4 partition I am trying to recover.
Between each test, I restored the disks with the following method:
```
vgchange -a n vg0
mdadm --stop /dev/md0
dd if=/dev/sde of=/dev/sdb bs=64k count=125000
dd if=/dev/sdf of=/dev/sdc bs=64k count=125000
dd if=/dev/sdg of=/dev/sdd bs=64k count=125000
```
On the most promising orders ([sdc, missing, sdb] and [missing, sdb,
sdc]) I tried to rebuild the ext4 filesystem from the earliest
superblock using:
```
for i in $(seq 0 64000);do echo $i;e2fsck -b $i /dev/vg0/data;done
#and then
e2fsck -b 32XXX /dev/vg0/data -y
```
Each time the superblock was found around block 32000, with a small
difference between the two attempts.
I let it run; it kept fixing/deleting inodes for 3 hours (from the
output, about one in ten inodes was modified during the repair). After 3
hours it was still at ~22,000,000 inodes, so I guess the disk structure
is incorrect; I would expect the repair to be much shorter with a
correct structure.
I completely restored the disks between and after the tests with dd.
Part 3: current situation
=========================
So what I have:
- all three RAID superblocks are screwed and were overwritten without
backup, but I have the commands used to build the initial array
- I have all the incremental files for the lvm2 structure, and the latest
file matches the ext4 superblocks found on the disks
- I have a "nearly" complete backup of the three raid5 disks:
  - one is good apart from the RAID superblock (sdb)
  - one is missing ~1 GB at the start (sdc)
  - one is missing ~120 GB at the start of the array, I have marked
this disk as missing for all my tests
but I cannot find my data.
Additional system info: the machine is running amd64 Debian Jessie with
backports enabled; mdadm is the standard Debian version: v3.3.2 - 21st
August 2014.
Here is the relevant part of the LVM backup and archive files (I can
provide the full files if necessary).
before extension:
```
physical_volumes {
pv0 {
id = "Q2Z32D-iyPj-9QYp-uXBC-q02e-s8QK-6eqv4d"
device = "/dev/md0" # Hint only
status = ["ALLOCATABLE"]
flags = []
dev_size = 5860270080 # 2.7289 Terabytes
pe_start = 2048
pe_count = 715364 # 2.7289 Terabytes
}
}
logical_volumes {
data {
id = "OwfU2H-UStb-fkaD-EAvk-fetk-CiOk-xkaWkA"
creation_time = 1494949403 # 2017-05-16 17:43:23 +0200
segment_count = 1
segment1 {
start_extent = 0
extent_count = 681575 # 2.6 Terabytes
type = "striped"
stripe_count = 1 # linear
stripes = [
"pv0", 0
]
```
after extension:
```
pv0 {
id = "Q2Z32D-iyPj-9QYp-uXBC-q02e-s8QK-6eqv4d"
device = "/dev/md0" # Hint only
status = ["ALLOCATABLE"]
flags = []
dev_size = 11720538112 # 5.4578 Terabytes
pe_start = 2048
pe_count = 1430729 # 5.4578 Terabytes
}
}
logical_volumes {
data {
creation_time = 1494949403 # 2017-05-16 17:43:23 +0200
segment_count = 1
segment1 {
start_extent = 0
extent_count = 1065575 # 4.06485 Terabytes
type = "striped"
stripe_count = 1 # linear
stripes = [
"pv0", 0
]
```
From the RAID wiki, I believe only this information is useful, as the
RAID superblocks are wrong:
```
PCI [ahci] 00:11.4 SATA controller: Intel Corporation Wellsburg sSATA
Controller [AHCI mode] (rev 05)
├scsi 0:0:0:0 ATA Crucial_CT1050MX {1651150EFB63}
│└sda 978.09g [8:0] Partitioned (gpt)
│ ├sda1 512.00m [8:1] vfat {5AB9-E482}
│ │└Mounted as /dev/sda1 @ /boot/efi
│ ├sda2 29.80g [8:2] ext4 {f8f9eb9a-fc49-4b2b-8c8c-27278dfc7f29}
│ │└Mounted as /dev/sda2 @ /
│ ├sda3 29.80g [8:3] swap {1ea1c6c1-7ec7-49cc-8696-f1fb8fb6e7b0}
│ └sda4 917.98g [8:4] PV LVM2_member 910.83g used, 7.15g free
{TJKWU2-oTcU-mSBC-sGHz-ZTg7-8HoY-u0Tyjj}
│ └VG vg_ssd 917.98g 7.15g free {hguqji-h777-K0yt-gjma-gEbO-HUfw-NU9aRK}
│ └redacted
├scsi 1:0:0:0 ATA WDC WD30EFRX-68E {WD-WCC4N3EP81NC}
│└sdb 2.73t [8:16] Partitioned (gpt)
│ ├sdb1 2.37g [8:17] ext4 '1.42.6-5691'
{30ef58b3-1e3f-4f33-ade7-7365ebd8c427}
│ ├sdb2 2.00g [8:18] Empty/Unknown
│ └sdb3 2.72t [8:19] Empty/Unknown
├scsi 2:0:0:0 ATA WDC WD30EFRX-68E {WD-WMC4N1087039}
│└sdc 2.73t [8:32] Partitioned (gpt)
└scsi 3:0:0:0 ATA WDC WD30EFRX-68E {WD-WCC4N3EP8C25}
└sdd 2.73t [8:48] Partitioned (gpt)
PCI [ahci] 00:1f.2 SATA controller: Intel Corporation Wellsburg 6-Port
SATA Controller [AHCI mode] (rev 05)
├scsi 4:x:x:x [Empty]
├scsi 5:0:0:0 ATA WDC WD30EFRX-68E {WD-WCC4N0XARKSC}
│└sde 2.73t [8:64] Partitioned (gpt)
│ ├sde1 2.37g [8:65] ext4 '1.42.6-5691'
{30ef58b3-1e3f-4f33-ade7-7365ebd8c427}
│ ├sde2 2.00g [8:66] Empty/Unknown
│ └sde3 2.72t [8:67] Empty/Unknown
├scsi 6:0:0:0 ATA WDC WD30EFRX-68E {WD-WCC4N7KPUH6U}
│└sdf 2.73t [8:80] Partitioned (gpt)
├scsi 7:0:0:0 ATA WDC WD30EFRX-68E {WD-WCC4N1TYVTEN}
│└sdg 2.73t [8:96] Partitioned (gpt)
├scsi 8:x:x:x [Empty]
└scsi 9:x:x:x [Empty]
Other Block Devices
└md0 0.00k [9:0] MD vnone () clear, None (None) None {None}
Empty/Unknown
```
I am currently digging through the mailing list archive to find more
information and things to test.
Part 4: Questions
==================
- How screwed am I? Do you believe I can still get most of my data
back, and what about the ext4 directory tree?
- What should be my next steps (I would be happy to get any link to
relevant software/procedures)?
- Is all the necessary information here, or should I gather additional
information before continuing?
- I am at the point where hiring somebody / a company with more
experience than mine to solve this issue may be necessary. If so, who
would you advise, if this is an allowed question on the mailing list?
Thank you for reading down to this point, and thank you in advance if
you can take the time to answer.
* Re: Recovery after accidental raid5 superblock rewrite
2017-06-03 19:46 Recovery after accidental raid5 superblock rewrite Paul Tonelli
@ 2017-06-03 21:20 ` Andreas Klauer
2017-06-03 22:33 ` Paul Tonelli
From: Andreas Klauer @ 2017-06-03 21:20 UTC (permalink / raw)
To: Paul Tonelli; +Cc: linux-raid
On Sat, Jun 03, 2017 at 09:46:44PM +0200, Paul Tonelli wrote:
> I am trying to recover an ext4 partition on lvm2 on raid5.
Okay, your mail is very long, still unclear in places.
This was all done recently? So we do not have to consider that mdadm
changed its defaults in regards to metadata versions, offsets, ...?
In that case I might have good news for you.
Provided you didn't screw anything else up.
> ```
> mdadm --create --verbose --force --assume-clean /dev/md0 --level=5
> --raid-devices=2 /dev/sdb /dev/sdc
> ```
You're not really supposed to do that.
( https://unix.stackexchange.com/a/131927/30851 )
> I immediately made backups of the three disks to spares using dd
This is a key point. If those backups are not good, you have lost.
> I made another mistake during the 3 days I spent trying to recover the
> data: I switched two disk IDs in a dd command and overwrote the first
> 800 MB or so of disk sdc:
Just to confirm, this is somehow not covered by your backups?
> Part 2: What I tried
> ====================
In a data recovery situation there is one thing you should absolutely not do.
That is writing to your disks. Please use overlays in the future...
( https://raid.wiki.kernel.org/index.php/Recovering_a_damaged_RAID#Making_the_harddisks_read-only_using_an_overlay_file )
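For reference, a minimal overlay setup along the lines of the wiki's
"overlay functions" looks roughly like this (sketch only; device name,
overlay size and snapshot chunk size are placeholders):
```
dev=/dev/sdb
ovl=/tmp/overlay-sdb.img
sectors=$(blockdev --getsz "$dev")   # device size in 512-byte sectors
truncate -s 10G "$ovl"               # sparse file that absorbs all writes
loop=$(losetup --find --show "$ovl")
echo "0 $sectors snapshot $dev $loop P 8" | dmsetup create sdb-ovl
# experiment on /dev/mapper/sdb-ovl; the real $dev is never written to.
# Tear down with: dmsetup remove sdb-ovl && losetup -d "$loop"
```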
Your experiments wrote all sorts of nonsense to your disks.
As stated above, now it all depends on the backups you made...
> - one is missing ~120 GB at the start of the array, I have marked
> this disk as missing for all my tests
Maybe good news for you. Provided those backups are still there.
If I understood your story correctly, then this disk has good data.
RAID5 parity is a simple XOR. a XOR b = c
You had a RAID 5 that was fully grown, fully synced.
You re-created it with the correct drives but wrong disk order.
This started a sync.
The sync should have done a XOR b = c (only c is written to disk c)
Wrong order you did c XOR b = a (only a is written to disk a)
It makes no difference. Either way it wrote the data that was already there.
Merely the data representation (what you got from /dev/md0) was garbage.
As long as you did not write anything to /dev/md0 when you couldn't mount,
you're good right here. You just have to put the disks in correct order.
Proof:
--- Step 1: RAID Creation ---
# truncate -s 100M a b c
# losetup --find --show a
/dev/loop0
# losetup --find --show b
/dev/loop1
# losetup --find --show c
/dev/loop2
# mdadm --create /dev/md42 --level=5 --raid-devices=3 /dev/loop0 /dev/loop1 /dev/loop2
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md42 started.
# mdadm --wait /dev/md42
# mkfs.ext4 /dev/md42
# mount /dev/md42 loop/
# echo I am selling these fine leather jackets... > loop/somefile.txt
# umount loop/
# mdadm --stop /dev/md42
--- Step 2: Foobaring it up (wrong disk order) ---
# mdadm --create /dev/md42 --level=5 --raid-devices=3 /dev/loop2 /dev/loop1 /dev/loop0
mdadm: /dev/loop2 appears to be part of a raid array:
level=raid5 devices=3 ctime=Sat Jun 3 23:01:31 2017
mdadm: /dev/loop1 appears to be part of a raid array:
level=raid5 devices=3 ctime=Sat Jun 3 23:01:31 2017
mdadm: /dev/loop0 appears to be part of a raid array:
level=raid5 devices=3 ctime=Sat Jun 3 23:01:31 2017
Continue creating array? yes
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md42 started.
# mdadm --wait /dev/md42
# mount /dev/md42 loop/
mount: wrong fs type, bad option, bad superblock on /dev/md42,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so.
# mdadm --stop /dev/md42
--- Step 3: Pulling the rabbit out of the hat (correct order, even one missing) ---
# mdadm --create /dev/md42 --level=5 --raid-devices=3 /dev/loop0 missing /dev/loop2
mdadm: /dev/loop0 appears to be part of a raid array:
level=raid5 devices=3 ctime=Sat Jun 3 23:04:35 2017
mdadm: /dev/loop2 appears to be part of a raid array:
level=raid5 devices=3 ctime=Sat Jun 3 23:04:35 2017
Continue creating array? yes
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md42 started.
# mount /dev/md42 loop/
# cat loop/somefile.txt
I am selling these fine leather jackets...
> - I am at the point where hiring somebody / a company with more
> experience than mine to solve this issue may be necessary. If so, who
> would you advise, if this is an allowed question on the mailing list?
Oh. I guess I should have asked for money first? Damn.
Seriously though. I don't know if the above will solve your issue.
It is certainly worth a try. And if it doesn't work it probably means
something else happened... in that case chances of survival are low.
Pictures (if they are small / unfragmented, with identifiable headers,
i.e. JPEGs not RAWs) can be recovered but not their filenames / order.
Filesystem with first roughly 2GiB missing... filesystems _HATE_ that.
Regards
Andreas Klauer
* Re: Recovery after accidental raid5 superblock rewrite
2017-06-03 21:20 ` Andreas Klauer
@ 2017-06-03 22:33 ` Paul Tonelli
2017-06-03 23:29 ` Andreas Klauer
From: Paul Tonelli @ 2017-06-03 22:33 UTC (permalink / raw)
To: Andreas Klauer; +Cc: linux-raid
Thank you for your answer and your time.
On 06/03/2017 11:20 PM, Andreas Klauer wrote:
> On Sat, Jun 03, 2017 at 09:46:44PM +0200, Paul Tonelli wrote:
>> I am trying to recover an ext4 partition on lvm2 on raid5.
> Okay, your mail is very long, still unclear in places.
>
> This was all done recently? So we do not have to consider that mdadm
> changed its defaults in regards to metadata versions, offsets, ...?
Correct, no change in version of mdadm, kernel, lvm or anything else;
mdadm was installed on the machine on the day the RAID was created and
has not been upgraded since (I checked by comparing the apt log
timestamps and the lvm metadata files).
> In that case I might have good news for you.
> Provided you didn't screw anything else up.
>
>> ```
>> mdadm --create --verbose --force --assume-clean /dev/md0 --level=5
>> --raid-devices=2 /dev/sdb /dev/sdc
>> ```
> You're not really supposed to do that.
> ( https://unix.stackexchange.com/a/131927/30851 )
I know that, now :-/. This was done before the backups.
>
>> I immediately made backups of the three disks to spares using dd
> This is a key point. If those backups are not good, you have lost.
I did make backups (just after erasing the RAID superblock) and still
have them; I have been using them as a reference for all the later tests.
>> I made another mistake during the 3 days I spent trying to recover the
>> data: I switched two disk IDs in a dd command and overwrote the first
>> 800 MB or so of disk sdc:
> Just to confirm, this is somehow not covered by your backups?
Right, this is not covered by my backups. I mistakenly copied from the
disks I was experimenting with to one of the backups, instead of the
opposite, once (the third mistake was working too late in the evening).
I am still searching for a way to put a complete block device (/dev/sdX)
read-only for these tests, I believe using overlays is the solution.
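As an aside, the kernel can flag a whole block device read-only with
blockdev (sketch below), but since mdadm --create still needs to write
its metadata, overlays are the better fit for these tests:
```
blockdev --setro /dev/sdX   # reject writes to the whole device
blockdev --getro /dev/sdX   # prints 1 while the read-only flag is set
```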
>> Part 2: What I tried
>> ====================
> In a data recovery situation there is one thing you should absolutely not do.
> That is writing to your disks. Please use overlays in the future...
> ( https://raid.wiki.kernel.org/index.php/Recovering_a_damaged_RAID#Making_the_harddisks_read-only_using_an_overlay_file )
Point taken. I had gone the copy-the-whole-disk way; I will try this for
my next tests.
> Your experiments wrote all sorts of nonsense to your disks.
> As stated above, now it all depends on the backups you made...
Apart from the error just above, I always used a copy of the originals;
the originals are still available.
>> - one is missing ~120 GB at the start of the array, I have marked
>> this disk as missing for all my tests
> Maybe good news for you. Provided those backups are still there.
>
> If I understood your story correctly, then this disk has good data.
>
> RAID5 parity is a simple XOR. a XOR b = c
>
> You had a RAID 5 that was fully grown, fully synced.
Actually, this is one question I have: with mdadm, does creating a raid5
with two disks and then growing it to 3 create exactly the same structure
as directly creating a 3-disk raid5? Your message seems to say it is the
same thing.
> You re-created it with the correct drives but wrong disk order.
> This started a sync.
>
> The sync should have done a XOR b = c (only c is written to disk c)
> Wrong order you did c XOR b = a (only a is written to disk a)
>
> It makes no difference. Either way it wrote the data that was already there.
> Merely the data representation (what you got from /dev/md0) was garbage.
>
> As long as you did not write anything to /dev/md0 when you couldn't mount,
> you're good right here. You just have to put the disks in correct order.
>
> Proof:
>
> --- Step 1: RAID Creation ---
>
> # truncate -s 100M a b c
> # losetup --find --show a
> /dev/loop0
> # losetup --find --show b
> /dev/loop1
> # losetup --find --show c
> /dev/loop2
> # mdadm --create /dev/md42 --level=5 --raid-devices=3 /dev/loop0 /dev/loop1 /dev/loop2
> mdadm: Defaulting to version 1.2 metadata
> mdadm: array /dev/md42 started.
> # mdadm --wait /dev/md42
> # mkfs.ext4 /dev/md42
> # mount /dev/md42 loop/
> # echo I am selling these fine leather jackets... > loop/somefile.txt
> # umount loop/
> # mdadm --stop /dev/md42
>
> --- Step 2: Foobaring it up (wrong disk order) ---
>
> # mdadm --create /dev/md42 --level=5 --raid-devices=3 /dev/loop2 /dev/loop1 /dev/loop0
> mdadm: /dev/loop2 appears to be part of a raid array:
> level=raid5 devices=3 ctime=Sat Jun 3 23:01:31 2017
> mdadm: /dev/loop1 appears to be part of a raid array:
> level=raid5 devices=3 ctime=Sat Jun 3 23:01:31 2017
> mdadm: /dev/loop0 appears to be part of a raid array:
> level=raid5 devices=3 ctime=Sat Jun 3 23:01:31 2017
> Continue creating array? yes
> mdadm: Defaulting to version 1.2 metadata
> mdadm: array /dev/md42 started.
> # mdadm --wait /dev/md42
> # mount /dev/md42 loop/
> mount: wrong fs type, bad option, bad superblock on /dev/md42,
> missing codepage or helper program, or other error
>
> In some cases useful info is found in syslog - try
> dmesg | tail or so.
> # mdadm --stop /dev/md42
>
> --- Step 3: Pulling the rabbit out of the hat (correct order, even one missing) ---
>
> # mdadm --create /dev/md42 --level=5 --raid-devices=3 /dev/loop0 missing /dev/loop2
> mdadm: /dev/loop0 appears to be part of a raid array:
> level=raid5 devices=3 ctime=Sat Jun 3 23:04:35 2017
> mdadm: /dev/loop2 appears to be part of a raid array:
> level=raid5 devices=3 ctime=Sat Jun 3 23:04:35 2017
> Continue creating array? yes
> mdadm: Defaulting to version 1.2 metadata
> mdadm: array /dev/md42 started.
> # mount /dev/md42 loop/
> # cat loop/somefile.txt
> I am selling these fine leather jackets...
Thank you.
>> - I am at the point where hiring somebody / a company with more
>> experience than mine to solve this issue may be necessary. If so, who
>> would you advise, if this is an allowed question on the mailing list?
> Oh. I guess I should have asked for money first? Damn.
>
> Seriously though. I don't know if the above will solve your issue.
> It is certainly worth a try. And if it doesn't work it probably means
> something else happened... in that case chances of survival are low.
>
> Pictures (if they are small / unfragmented, with identifiable headers,
> i.e. JPEGs not RAWs) can be recovered but not their filenames / order.
For my case, losing even 20% of the pictures is not an issue; the
filenames / order / directory tree are more important.
>
> Filesystem with first roughly 2GiB missing... filesystems _HATE_ that.
Thank you. So from what you told me, the next steps should be to:
- start using overlays as described in the wiki; this will save me a lot
of time
- use the correct disk (with only the RAID superblock missing)
- use the disk which was partially xor-ed during the sync, as this has
no impact on the data
- not use the disk with the first GB missing
- try rebuilding the RAID with these disks by testing all 6 combinations?
I will try this tomorrow and update depending on the result.
A second question has come out of my unsuccessful tests and searching:
is it possible to copy only a RAID superblock from one disk to another
directly using dd? After reading on the wiki that the RAID superblock is
256 bytes long plus 2 bytes for each device, I tried:
```
dd if=/dev/sdx of=/dev/sdy count=262 iflag=count_bytes
```
but it did not copy the superblock correctly (mdadm did not find it);
there may be an offset or something missing.
Thank you again for your time. I will try this tomorrow after a good
night's sleep; it will be less risky.
> Regards
> Andreas Klauer
* Re: Recovery after accidental raid5 superblock rewrite
2017-06-03 22:33 ` Paul Tonelli
@ 2017-06-03 23:29 ` Andreas Klauer
2017-06-04 22:58 ` Paul Tonelli
From: Andreas Klauer @ 2017-06-03 23:29 UTC (permalink / raw)
To: Paul Tonelli; +Cc: linux-raid
On Sun, Jun 04, 2017 at 12:33:43AM +0200, Paul Tonelli wrote:
> I am still searching for a way to put a complete block device (/dev/sdX)
> read-only for these tests, I believe using overlays is the solution.
Yes.
Overlays are extremely useful for data recovery.
It is unfortunate there is no standard tool to manage them easily.
The "overlay functions" in the wiki come close but people don't find
out about it until it's too late.
> Actually, this is one question I have: with mdadm, does creating a raid5
> with two disks and then growing it to 3 create exactly the same structure
> as directly creating a 3-disk raid5? Your message seems to say it is the
> same thing.
Good catch. It would probably move the data offset.
# truncate -s 3TB a b c
# mdadm --create /dev/md42 --level=5 --raid-devices=2 /dev/loop[01]
# mdadm --examine /dev/loop0
Data Offset : 262144 sectors
# mdadm --grow /dev/md42 --raid-devices=3 --add /dev/loop2
# mdadm --examine /dev/loop0
Data Offset : 262144 sectors
New Offset : 260096 sectors
So on re-create you have to find and specify the correct --data-offset.
How to determine the correct data offset? See if you can find LVM magic
string "LABELONE" in the first 256MiB of the two disks you didn't
dd-overwrite. That minus 512 bytes should be the correct offset.
# hexdump -C /dev/some-pv
00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000200 4c 41 42 45 4c 4f 4e 45 01 00 00 00 00 00 00 00 |LABELONE........|
00000210 55 87 20 ff 20 00 00 00 4c 56 4d 32 20 30 30 31 |U. . ...LVM2 001|
If unlucky it just happened to be on the drive you overwrote.
In that case you have to xor the two others.
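A quick way to turn that search into a number (sketch; device name
assumed, only the first 256 MiB are scanned):
```
byte=$(dd if=/dev/sdX bs=1M count=256 2>/dev/null \
       | grep --text --byte-offset --only-matching -F LABELONE \
       | head -n1 | cut -d: -f1)
echo "data offset: $(( (byte - 512) / 512 )) sectors"
```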
> Is it possible to copy only a raid superblock from one disk to another
> directly using dd ?
Each superblock is unique (differs in device role and checksum at minimum).
So copying superblocks usually is not a thing. Even copying drives can
result in a mess (UUIDs are no longer unique, you have little / no control
which drive will actually be used). This is also a problem you might
encounter with overlays in conjunction with autoassembly/automount magicks
that might be running in the background.
Regards
Andreas Klauer
* Re: Recovery after accidental raid5 superblock rewrite
2017-06-03 23:29 ` Andreas Klauer
@ 2017-06-04 22:58 ` Paul Tonelli
2017-06-05 9:24 ` Andreas Klauer
From: Paul Tonelli @ 2017-06-04 22:58 UTC (permalink / raw)
To: Andreas Klauer; +Cc: linux-raid
Hello again. Things are somewhat improving: I haven't got my data back
yet, but I have things to work with.
On 06/04/2017 01:29 AM, Andreas Klauer wrote:
> Good catch. It would probably move the data offset.
>
> # truncate -s 3TB a b c
> # mdadm --create /dev/md42 --level=5 --raid-devices=2 /dev/loop[01]
> # mdadm --examine /dev/loop0
> Data Offset : 262144 sectors
> # mdadm --grow /dev/md42 --raid-devices=3 --add /dev/loop2
> # mdadm --examine /dev/loop0
> Data Offset : 262144 sectors
> New Offset : 260096 sectors
>
> So on re-create you have to find and specify the correct --data-offset.
>
> How to determine the correct data offset? See if you can find LVM magic
> string "LABELONE" in the first 256MiB of the two disks you didn't
> dd-overwrite. That minus 512 bytes should be the correct offset.
>
> # hexdump -C /dev/some-pv
> 00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
> *
> 00000200 4c 41 42 45 4c 4f 4e 45 01 00 00 00 00 00 00 00 |LABELONE........|
> 00000210 55 87 20 ff 20 00 00 00 4c 56 4d 32 20 30 30 31 |U. . ...LVM2 001|
>
> If unlucky it just happened to be on the drive you overwrote.
> In that case you have to xor the two others.
I did find this one at that exact offset 26096 = (133169664 - 512)/512
on the sdc drive (I rebuilt the xor by rebuilding the raid, using
overlays this time, it speeds up things a lot compared to my previous tests)
so this matches the commands. Now, after doing a pvcreate and a
vgcfgrestore using the backup lvm file, e2fsck still refuses to run, but
testdisk now finds a lot more superblocks:
ext4 267262 8729457661 8729190400
ext4 791550 8729981949 8729190400
ext4 1315838 8730506237 8729190400
ext4 1840126 8731030525 8729190400
ext4 6034430 8735224829 8729190400
ext4 6558718 8735749117 8729190400
ext4 12325886 8741516285 8729190400
ext4 20714494 8749904893 8729190400
ext4 32248830 8761439229 8729190400
I also find the following backup superblocks when looping with e2fsck -b
XXX (and with testdisk):
```
33408
98944
164480
230016
...
```
and they always have a +640 difference compared to the backup superblock
locations of a new ext4 filesystem I would create on this volume with:
```
mke2fs /dev/vg0/data -n:
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632,
2654208,
4096000, 7962624, 11239424, 20480000, 23887872, 71663616,
78675968,
102400000, 214990848, 512000000, 550731776, 644972544
```
So I believe I still have an offset issue of 640 blocks or something
like that. I am still digging into this issue.
I have tried to rebuild the ext4 superblocks on top of the logical volume
with the "offset" extended option of mke2fs, but this did not work
(e2fsck still does not run by itself):
```
mke2fs -E offset=640 -n -S /dev/vg0/data
```
I am still digging around to find out where this 640 offset comes from,
but as the last mail moved things forward significantly (thank you
Andreas), I am trying again.
Thanks again for reading down to this point.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Recovery after accidental raid5 superblock rewrite
2017-06-04 22:58 ` Paul Tonelli
@ 2017-06-05 9:24 ` Andreas Klauer
2017-06-05 23:24 ` Paul Tonelli
From: Andreas Klauer @ 2017-06-05 9:24 UTC (permalink / raw)
To: Paul Tonelli; +Cc: linux-raid
On Mon, Jun 05, 2017 at 12:58:03AM +0200, Paul Tonelli wrote:
> I did find this one at that exact offset 26096 = (133169664 - 512)/512
26096->260096? Okay.
> Now, after doing a pvcreate and a vgcfgrestore ...
Should not be necessary if the data on two drives was okay?
You did leave the dd-overwritten drive out as missing, right?
Do you have the correct disk order and chunk size and offset?
You have to be 140% sure the RAID itself is running correctly,
otherwise all other steps are bound to fail.
If you run photorec on the RAID and it manages to recover
a file that is larger than number of drives * chunk size
and intact, you can have some confidence the RAID is okay
in terms of disk order and chunk size - the offset may
still be off by multiple of stripe alignment but if the
offset is correct too, file -s /dev/md0 should say LVM PV
in your case.
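In command form, that check is roughly (sketch):
```
file -s /dev/md0                  # should report an LVM2 PV if order/chunk/offset are right
photorec /d /tmp/recup /dev/md0   # then inspect recovered files larger than drives * chunk
```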
Regards
Andreas Klauer
* Re: Recovery after accidental raid5 superblock rewrite
2017-06-05 9:24 ` Andreas Klauer
@ 2017-06-05 23:24 ` Paul Tonelli
2017-06-05 23:56 ` Andreas Klauer
From: Paul Tonelli @ 2017-06-05 23:24 UTC (permalink / raw)
To: Andreas Klauer; +Cc: linux-raid
Hi again Andreas (and anybody else reading)
On 06/05/2017 11:24 AM, Andreas Klauer wrote:
> On Mon, Jun 05, 2017 at 12:58:03AM +0200, Paul Tonelli wrote:
>> I did find this one at that exact offset 26096 = (133169664 - 512)/512
> 26096->260096? Okay.
Yes, sorry, this was a typo; the superblock was exactly where you
expected it. I used the following procedure:
```
export DEVICES="/dev/sdc /dev/sdd /dev/sdb"
/root/create_overlay.sh
mdadm --create /dev/md0 --level=5 --assume-clean --raid-devices=3 missing /dev/mapper/sdd /dev/mapper/sdb
mdadm --add /dev/md0 /dev/mapper/sdc
#sleep 30s
mdadm --stop /dev/md0
xxd -u /dev/mapper/sdc | grep -C 3 'LABELONE'
>7f001e0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
>7f001f0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
>7f00200: 4C41 4245 4C4F 4E45 0100 0000 0000 0000 LABELONE........
>7f00210: B5E5 02EA 2000 0000 4C56 4D32 2030 3031 .... ...LVM2 001
>7f00220: 5132 5A33 3244 6979 506A 3951 5970 7558 Q2Z32DiyPj9QYpuX
>7f00230: 4243 7130 3265 7338 514B 3665 7176 3464 BCq02es8QK6eqv4d
```
I have let it run for a longer time (10 min) and it does not find
another LABELONE after that. (I also tried to find LABELONE on the other
disks without success).
>> Now, after doing a pvcreate and a vgcfgrestore ...
> Should not be necessary if the data on two drives was okay?
> You did leave the dd-overwritten drive out as missing, right?
> Do you have the correct disk order and chunk size and offset?
I believe you are right, the issue is still the raid: I have tried
photorec and most files I have opened look like they have been
truncated. I have checked the commands used to build the raid again (I
had initially extracted them from bash_history, but I also have a record
with timestamps in /var/log/auth.log because of sudo), and all the
commands match (I have put all the commands used on the raid at the end
of this mail).
> You have to be 140% sure the RAID itself is running correctly,
> otherwise all other steps are bound to fail.
>
> If you run photorec on the RAID and it manages to recover
> a file that is larger than number of drives * chunk size
> and intact, you can have some confidence the RAID is okay
> in terms of disk order and chunk size - the offset may
> still be off by multiple of stripe alignment but if the
> offset is correct too, file -s /dev/md0 should say LVM PV
> in your case.
Photorec does not: I only get small files when I use photorec. The only
"big" files it recovers are tars (and I believe that is because it does
not check the integrity of the files). I left it running until ~10 GB of
files had been recovered.
From what I understand, finding the "LABELONE" at the correct position
shows:
- apart from the raid superblock, the disks I use (sdd and sdb) have not
been erased (as sdc is rebuilt from xor)
- the offset to build the raid is correct (I find all the ext4 backup
superblocks when running testdisk / checking with e2fsck -b XXX), apart
from the 640-block ext4 offset issue
- the first disk of the array is sdc (as the "LABELONE" can only be
found on this disk)
so I may need to check the other parameters until file -s /dev/md0 says
LVM PV.
I believe my best option is to write a script that explores several
parameters on top of the overlays and see whether one of them finds the
correct data. I would use as parameters:
- disk order
- raid chunk size
Would you test any other parameters?
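Something like this is what I have in mind (untested sketch; it assumes
the overlays are already set up, that the damaged disk is the one left
out, and that mdadm accepts the 's' (sectors) suffix for --data-offset):
```
#!/bin/bash
# brute-force disk order and chunk size on top of the overlays
offset=261120s                                   # data offset in sectors
good1=/dev/mapper/sdb good2=/dev/mapper/sdd      # the two overlays I trust
orders=(
  "missing $good1 $good2"  "missing $good2 $good1"
  "$good1 missing $good2"  "$good2 missing $good1"
  "$good1 $good2 missing"  "$good2 $good1 missing"
)
for chunk in 64 128 256 512; do
  for order in "${orders[@]}"; do
    mdadm --stop /dev/md0 2>/dev/null
    mdadm --create /dev/md0 --run --assume-clean --level=5 --raid-devices=3 \
          --chunk=$chunk --data-offset=$offset $order
    echo "chunk=$chunk order=[$order] -> $(file -s /dev/md0)"
  done
done
```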
Among the things I have tried, I used the offset option of losetup to
align the backup superblocks of the logical volume; e2fsck then runs
directly, but it refuses to continue because of the wrong volume size, so
I do not think this is the right solution, and the 640-block ext4 offset
cannot easily be added to LVM. I agree with your previous comment that
the issue is still at the RAID level.
Thanks (again) for reading down to this point. I will happily take any
advice, and will update once I have written and run my script.
Dump of raid building commands
=====================
part 1
---------
sudo commands which created the array:
```
May 16 17:38:47 redacted sshd[10978]: pam_unix(sshd:session): session
opened for user skoos by (uid=0)
May 16 17:38:47 redacted systemd-logind[1071]: New session 754 of user
skoos.
May 16 17:41:29 redacted sudo: user : TTY=pts/83 ; PWD=/srv/data ;
USER=root ; COMMAND=/usr/bin/apt-get install mdadm
May 16 17:41:29 redacted sudo: pam_unix(sudo:session): session opened
for user root by user(uid=0)
May 16 17:41:45 redacted sudo: pam_unix(sudo:session): session closed
for user root
May 16 17:41:48 redacted sudo: user : TTY=pts/83 ; PWD=/srv/data ;
USER=root ; COMMAND=/sbin/mdadm --create --verbose --force
--assume-clean /dev/md0 -
-level=5 --raid-devices=2 /dev/sdb /dev/sdc
May 16 17:41:48 redacted sudo: pam_unix(sudo:session): session opened
for user root by user(uid=0)
May 16 17:42:08 redacted sudo: pam_unix(sudo:session): session closed
for user root
May 16 17:42:10 redacted sudo: user : TTY=pts/83 ; PWD=/srv/data ;
USER=root ; COMMAND=/sbin/mdadm --create --verbose --force
--assume-clean /dev/md0 -
-level=5 --raid-devices=2 /dev/sdb /dev/sdc
May 16 17:42:10 redacted sudo: pam_unix(sudo:session): session opened
for user root by user(uid=0)
May 16 17:42:13 redacted sudo: pam_unix(sudo:session): session closed
for user root
May 16 17:42:43 redacted sudo: user : TTY=pts/83 ; PWD=/srv/data ;
USER=root ; COMMAND=/sbin/pvcreate /dev/md0
May 16 17:42:43 redacted sudo: pam_unix(sudo:session): session opened
for user root by user(uid=0)
May 16 17:42:44 redacted sudo: pam_unix(sudo:session): session closed
for user root
May 16 17:42:49 redacted sudo: user : TTY=pts/83 ; PWD=/srv/data ;
USER=root ; COMMAND=/sbin/vgdisplay
May 16 17:42:49 redacted sudo: pam_unix(sudo:session): session opened
for user root by user(uid=0)
May 16 17:42:49 redacted sudo: pam_unix(sudo:session): session closed
for user root
May 16 17:43:02 redacted sudo: user : TTY=pts/83 ; PWD=/srv/data ;
USER=root ; COMMAND=/sbin/vgcreate vg0 /dev/md0
May 16 17:43:02 redacted sudo: pam_unix(sudo:session): session opened
for user root by user(uid=0)
May 16 17:43:02 redacted sudo: pam_unix(sudo:session): session closed
for user root
May 16 17:43:23 redacted sudo: user : TTY=pts/83 ; PWD=/srv/data ;
USER=root ; COMMAND=/sbin/lvcreate -L 2.6T -n data /dev/vg0
May 16 17:43:23 redacted sudo: pam_unix(sudo:session): session opened
for user root by user(uid=0)
May 16 17:43:23 redacted sudo: pam_unix(sudo:session): session closed
for user root
May 16 17:43:32 redacted sudo: user : TTY=pts/83 ; PWD=/srv/data ;
USER=root ; COMMAND=/sbin/mkfs.ext4 /dev/vg0/data
May 16 17:43:32 redacted sudo: pam_unix(sudo:session): session opened
for user root by user(uid=0)
May 16 17:43:48 redacted sudo: pam_unix(sudo:session): session closed
for user root
May 16 17:45:34 redacted sudo: user : TTY=pts/83 ; PWD=/srv/data ;
USER=root ; COMMAND=/bin/mount /dev/vg0/data /mnt/data
May 16 17:45:34 redacted sudo: pam_unix(sudo:session): session opened
for user root by user(uid=0)
May 16 17:45:35 redacted sudo: pam_unix(sudo:session): session closed
for user root
May 16 17:46:08 redacted sudo: user : TTY=pts/103 ; PWD=/mnt/data ;
USER=root ; COMMAND=/usr/bin/rsync -avH /srv/data/ /mnt/data
```
part 2
---------
command to rebuild the array after reboot (the array was not detected
any longer after reboot, but once the following command was run, it
mounted fine):
```
May 23 13:23:55 redacted sudo: user : TTY=pts/1 ; PWD=/home/user ;
USER=root ; COMMAND=/sbin/mdadm --create --verbose /dev/md0 --level=5
--raid-devices=2 /dev/sdb /dev/sdc
```
commands used to grow with third disk
```
May 24 21:04:10 redacted sudo: root : TTY=pts/30 ; PWD=/root ;
USER=root ; COMMAND=/sbin/mdadm --add /dev/md0 /dev/sdd
May 24 21:04:10 redacted sudo: pam_unix(sudo:session): session opened
for user root by (uid=0)
May 24 21:04:10 redacted sudo: pam_unix(sudo:session): session closed
for user root
May 24 21:04:15 redacted sudo: root : TTY=pts/30 ; PWD=/root ;
USER=root ; COMMAND=/sbin/mdadm /dev/md0
May 24 21:04:15 redacted sudo: pam_unix(sudo:session): session opened
for user root by (uid=0)
May 24 21:04:15 redacted sudo: pam_unix(sudo:session): session closed
for user root
May 24 21:04:20 redacted sudo: root : TTY=pts/30 ; PWD=/root ;
USER=root ; COMMAND=/sbin/mdadm /dev/md0 --detail
May 24 21:04:20 redacted sudo: pam_unix(sudo:session): session opened
for user root by (uid=0)
May 24 21:04:20 redacted sudo: pam_unix(sudo:session): session closed
for user root
May 24 21:04:26 redacted sudo: root : TTY=pts/30 ; PWD=/root ;
USER=root ; COMMAND=/sbin/mdadm --detail /dev/md0
May 24 21:04:26 redacted sudo: pam_unix(sudo:session): session opened
for user root by (uid=0)
May 24 21:04:26 redacted sudo: pam_unix(sudo:session): session closed
for user root
May 24 21:04:46 redacted sudo: root : TTY=pts/30 ; PWD=/root ;
USER=root ; COMMAND=/sbin/mdadm --gros /dev/md0 /dev/sdd
May 24 21:04:46 redacted sudo: pam_unix(sudo:session): session opened
for user root by (uid=0)
May 24 21:04:46 redacted sudo: pam_unix(sudo:session): session closed
for user root
May 24 21:04:49 redacted sudo: root : TTY=pts/30 ; PWD=/root ;
USER=root ; COMMAND=/sbin/mdadm --grow /dev/md0 /dev/sdd
May 24 21:04:49 redacted sudo: pam_unix(sudo:session): session opened
for user root by (uid=0)
May 24 21:04:49 redacted sudo: pam_unix(sudo:session): session closed
for user root
May 24 21:05:16 redacted sudo: root : TTY=pts/30 ; PWD=/root ;
USER=root ; COMMAND=/sbin/mdadm --grow --raid-devices=3 /dev/md0
May 24 21:05:16 redacted sudo: pam_unix(sudo:session): session opened
for user root by (uid=0)
May 24 21:05:17 redacted sudo: pam_unix(sudo:session): session closed
for user root
May 24 21:05:19 redacted sudo: root : TTY=pts/30 ; PWD=/root ;
USER=root ; COMMAND=/sbin/mdadm --detail /dev/md0
May 24 21:05:19 redacted sudo: pam_unix(sudo:session): session opened
for user root by (uid=0)
May 24 21:05:19 redacted sudo: pam_unix(sudo:session): session closed
for user root
May 24 21:05:41 redacted sudo: root : TTY=pts/30 ; PWD=/root ;
USER=root ; COMMAND=/sbin/mdadm --detail /dev/md0
```
part 3
---------
commands used to resize after adding the third disk
```
May 29 09:45:20 redacted sudo: root : TTY=pts/30 ; PWD=/root ;
USER=root ; COMMAND=/sbin/mdadm --detail /dev/md0
May 29 09:45:20 redacted sudo: pam_unix(sudo:session): session opened
for user root by (uid=0)
May 29 09:45:21 redacted sudo: pam_unix(sudo:session): session closed
for user root
May 29 09:45:37 redacted sudo: root : TTY=pts/30 ; PWD=/root ;
USER=root ; COMMAND=/sbin/lvextend -L +500G /dev/vg0/data
May 29 09:45:37 redacted sudo: pam_unix(sudo:session): session opened
for user root by (uid=0)
May 29 09:45:38 redacted sudo: pam_unix(sudo:session): session closed
for user root
May 29 09:45:43 redacted sudo: root : TTY=pts/30 ; PWD=/root ;
USER=root ; COMMAND=/sbin/resize2fs /dev/vg0/data
May 29 09:45:43 redacted sudo: pam_unix(sudo:session): session opened
for user root by (uid=0)
May 29 09:45:49 redacted sudo: pam_unix(sudo:session): session closed
for user root
May 29 09:46:02 redacted sudo: root : TTY=pts/30 ; PWD=/root ;
USER=root ; COMMAND=/sbin/lvextend -L +500G /dev/vg0/data
May 29 09:46:02 redacted sudo: pam_unix(sudo:session): session opened
for user root by (uid=0)
May 29 09:46:03 redacted sudo: pam_unix(sudo:session): session closed
for user root
May 29 09:46:05 redacted sudo: root : TTY=pts/30 ; PWD=/root ;
USER=root ; COMMAND=/sbin/resize2fs /dev/vg0/data
May 29 09:46:05 redacted sudo: pam_unix(sudo:session): session opened
for user root by (uid=0)
May 29 09:46:12 redacted sudo: pam_unix(sudo:session): session closed
for user root
May 29 09:46:14 redacted sudo: root : TTY=pts/30 ; PWD=/root ;
USER=root ; COMMAND=/sbin/lvextend -L +500G /dev/vg0/data
May 29 09:46:14 redacted sudo: pam_unix(sudo:session): session opened
for user root by (uid=0)
May 29 09:46:15 redacted sudo: pam_unix(sudo:session): session closed
for user root
May 29 09:46:16 redacted sudo: root : TTY=pts/30 ; PWD=/root ;
USER=root ; COMMAND=/sbin/resize2fs /dev/vg0/data
```
* Re: Recovery after accidental raid5 superblock rewrite
2017-06-05 23:24 ` Paul Tonelli
@ 2017-06-05 23:56 ` Andreas Klauer
2017-06-10 20:04 ` Paul Tonelli
From: Andreas Klauer @ 2017-06-05 23:56 UTC (permalink / raw)
To: Paul Tonelli; +Cc: linux-raid
On Tue, Jun 06, 2017 at 01:24:41AM +0200, Paul Tonelli wrote:
> mdadm --create /dev/md0 --level=5 --assume-clean --raid-devices=3 missing /dev/mapper/sdd /dev/mapper/sdb
You did not specify the --data-offset here?
Check mdadm --examine to make sure which offset it's using.
> xxd -u /dev/mapper/sdc | grep -C 3 'LABELONE'
> >7f001e0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
> >7f001f0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
> >7f00200: 4C41 4245 4C4F 4E45 0100 0000 0000 0000 LABELONE........
If correct this should appear at the start of /dev/md0 (even w/o adding sdc).
LABELONE should appear on the first drive so this should not be wrong,
however the sdd sdb could still have switched order, and of course
the chunk size could be different (although unlikely according to your log).
> I believe you are right, the issue is still the raid: I have tried
> photorec and most files I have opened look like they have been
> truncated.
Well, files could be fragmented, there's a certain sweet spot
(like - megapixel JPEGs of few megs size) where it's sufficiently
unlikely to be a problem.
I don't know what files you have, if it's movies it would be okay
too if the first few megs of the file were playable.
> - apart from the raid superblock, the disks I use (sdd and sdb) have not
> been erased (as sdc is rebuilt from xor)
It only rebuilds starting from offset. So it should not have covered that
offset if you did not specify it. Check it's not there before you --add.
If it's there then this is after all not the drive you overwrote with dd?
I am confused now.
Regards
Andreas Klauer
* Re: Recovery after accidental raid5 superblock rewrite
2017-06-05 23:56 ` Andreas Klauer
@ 2017-06-10 20:04 ` Paul Tonelli
2017-06-10 20:41 ` Andreas Klauer
From: Paul Tonelli @ 2017-06-10 20:04 UTC (permalink / raw)
To: Andreas Klauer; +Cc: linux-raid
Hello Everybody,
TL;DR: I found a --data-offset that allows an e2fsck to run, but things
are not as they should be. I still manage to get data back (pictures
which are >= 4 MB in size).
On 06/06/2017 01:56 AM, Andreas Klauer wrote:
> On Tue, Jun 06, 2017 at 01:24:41AM +0200, Paul Tonelli wrote:
>> mdadm --create /dev/md0 --level=5 --assume-clean --raid-devices=3 missing /dev/mapper/sdd /dev/mapper/sdb
> You did not specify the --data-offset here?
> Check mdadm --examine to make sure which offset it's using.
The objective was just to reverse the /dev/sdc destruction using the xor
of /dev/sdb and /dev/sdd; I believe the default offset is earlier than
the one I specify by hand.
>> xxd -u /dev/mapper/sdc | grep -C 3 'LABELONE'
>> >7f001e0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
>> >7f001f0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
>> >7f00200: 4C41 4245 4C4F 4E45 0100 0000 0000 0000 LABELONE........
> If correct this should appear at the start of /dev/md0 (even w/o adding sdc).
>
> LABELONE should appear on the first drive so this should not be wrong,
> however the sdd sdb could still have switched order, and of course
> the chunk size could be different (although unlikely according to your log).
Even after rebuilding the disks, I find the LABELONE at 7f00200. But it
does not provide a recoverable LVM / ext4 filesystem (the backup
superblock offset is incorrect for this partition size)
>> I believe you are right, the issue is still the raid: I have tried
>> photorec and most files I have opened look like they have been
>> truncated.
> Well, files could be fragmented, there's a certain sweet spot
> (like - megapixel JPEGs of few megs size) where it's sufficiently
> unlikely to be a problem.
>
> I don't know what files you have, if it's movies it would be okay
> too if the first few megs of the file were playable.
I have multiple assemblies with a custom data-offset in which e2fsck
agrees to run from a backup superblock (I have brute-forced the tests).
The possible assemblies are (offset, disk order):
261120 missing /dev/mapper/sdc /dev/mapper/sdd
261120 missing /dev/mapper/sdd /dev/mapper/sdc
261120 /dev/mapper/sdb missing /dev/mapper/sdd
261120 /dev/mapper/sdb /dev/mapper/sdd missing
I have already found bmp/png files which are > 4 MB using photorec, so I
think I am making progress.
Right now, I am running e2fsck -n to see which order returns the fewest
errors; I will then try to get back as many files as I can, still using
overlays.
>> - apart from the raid superblock, the disks I use (sdd and sdb) have not
>> been erased (as sdc is rebuilt from xor)
> It only rebuilds starting from offset. So it should not have covered that
> offset if you did not specify it. Check it's not there before you --add.
> If it's there then this is after all not the drive you overwrote with dd?
I believe the offset we manually specify is after the default one on a
raid assembly with 3 disks, so it should have rebuilt the previous LVM
superblock, or am I missing something (again).
> I am confused now.
So am I, and I am afraid I have used up all the time I could spend on
getting this data back. Thanks to your help, I can still recover many
files, even if it is not a full filesystem :-). Thank you!
Paul
* Re: Recovery after accidental raid5 superblock rewrite
2017-06-10 20:04 ` Paul Tonelli
@ 2017-06-10 20:41 ` Andreas Klauer
2017-07-31 19:57 ` Paul Tonelli
From: Andreas Klauer @ 2017-06-10 20:41 UTC (permalink / raw)
To: Paul Tonelli; +Cc: linux-raid
On Sat, Jun 10, 2017 at 10:04:16PM +0200, Paul Tonelli wrote:
> I believe the offset we manually specify is after the default one on a
> raid assembly with 3 disks, so it should have rebuilt the previous LVM
> superblock, or am I missing something (again).
It should be the reverse. When you grow the raid, the offset shrinks.
Not able to provide further insights. You have a LVM2 header which,
given the correct offset, should appear on the /dev/md device and
in that case everything else should appear too.
If that is not the case then things will be a lot more complicated.
Regards
Andreas Klauer
* Re: Recovery after accidental raid5 superblock rewrite
2017-06-10 20:41 ` Andreas Klauer
@ 2017-07-31 19:57 ` Paul Tonelli
2017-07-31 20:35 ` Wols Lists
2017-08-01 14:01 ` Phil Turmel
From: Paul Tonelli @ 2017-07-31 19:57 UTC (permalink / raw)
To: Andreas Klauer; +Cc: linux-raid
Hello (again),
Sorry to bring this topic back from the dead, but I have the same issue
again.
TL;DR
The difference compared to last time:
- I created a clean raid a few days back
- the data is completely backed up and available
- I can actually access the data
but I still have no clue about what went wrong.
1) I created the raid:
sudo mdadm --create --verbose /dev/md0 --level=5 --raid-devices=3 /dev/sdb /dev/sdc /dev/sdd
without lvm this time, just a single ext4 filesystem:
mkfs.ext4 /dev/md0
mount /dev/md0 /srv/data
I copied 3 TB onto it and then just rebooted the machine.
2) and (again) no md0 assembled at boot:
mdadm -E /dev/sd[bcd]
/dev/sdb:
MBR Magic : aa55
Partition[0] : 4294967295 sectors at 1 (type ee)
/dev/sdc:
MBR Magic : aa55
Partition[0] : 4294967295 sectors at 1 (type ee)
/dev/sdd:
MBR Magic : aa55
Partition[0] : 4294967295 sectors at 1 (type ee)
This time, I am able to get my data back. I first created the overlays
and ran:
mdadm --create --verbose /dev/md0 --assume-clean --level=5 --raid-devices=3 /dev/mapper/sdb /dev/mapper/sdc /dev/mapper/sdd
mount /dev/md0 /srv/data
# mdadm --detail --scan /dev/md0
ARRAY /dev/md0 metadata=1.2 name=smic:0 UUID=2ebab32e:82283ac5:4232d2ee:92abf170
and it mounts, so I did exactly the same using the real disks and
(again) got my data back. The raid is now running:
# mdadm -E /dev/sdb
/dev/sdb:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : 2ebab32e:82283ac5:4232d2ee:92abf170
Name : smic:0 (local to host smic)
Creation Time : Mon Jul 31 21:36:12 2017
Raid Level : raid5
Raid Devices : 3
Avail Dev Size : 5860271024 (2794.40 GiB 3000.46 GB)
Array Size : 5860270080 (5588.79 GiB 6000.92 GB)
Used Dev Size : 5860270080 (2794.39 GiB 3000.46 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262056 sectors, after=944 sectors
State : clean
Device UUID : 2d1daf78:1e085799:6f1799b2:e88e60d1
Internal Bitmap : 8 sectors from superblock
Update Time : Mon Jul 31 21:36:19 2017
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : 63abce91 - correct
Events : 1
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 0
Array State : AAA ('A' == active, '.' == missing, 'R' == replacing)
# mdadm -E /dev/sdc
/dev/sdc:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : 2ebab32e:82283ac5:4232d2ee:92abf170
Name : smic:0 (local to host smic)
Creation Time : Mon Jul 31 21:36:12 2017
Raid Level : raid5
Raid Devices : 3
Avail Dev Size : 5860271024 (2794.40 GiB 3000.46 GB)
Array Size : 5860270080 (5588.79 GiB 6000.92 GB)
Used Dev Size : 5860270080 (2794.39 GiB 3000.46 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262056 sectors, after=944 sectors
State : clean
Device UUID : db6d0696:072c5166:1491157d:6ddf34cf
Internal Bitmap : 8 sectors from superblock
Update Time : Mon Jul 31 21:36:19 2017
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : 164e0d53 - correct
Events : 1
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 1
Array State : AAA ('A' == active, '.' == missing, 'R' == replacing)
# mdadm -E /dev/sdd
/dev/sdd:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : 2ebab32e:82283ac5:4232d2ee:92abf170
Name : smic:0 (local to host smic)
Creation Time : Mon Jul 31 21:36:12 2017
Raid Level : raid5
Raid Devices : 3
Avail Dev Size : 5860271024 (2794.40 GiB 3000.46 GB)
Array Size : 5860270080 (5588.79 GiB 6000.92 GB)
Used Dev Size : 5860270080 (2794.39 GiB 3000.46 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262056 sectors, after=944 sectors
State : clean
Device UUID : 0c52bea6:a0cbf060:fe5edff0:4ee71d21
Internal Bitmap : 8 sectors from superblock
Update Time : Mon Jul 31 21:36:19 2017
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : e75866e8 - correct
Events : 1
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 2
Array State : AAA ('A' == active, '.' == missing, 'R' == replacing)
but a reboot always brings me back to step 2), with the need to press
ctrl-D at boot time.
What am I missing? Am I using a bad version of the mdadm or kernel
(mdadm:amd64 3.3.2-5+deb8u2, kernel 4.8.0-0.bpo.2-amd64) or am I doing
it wrong in another way ?
best regards,
On 06/10/2017 10:41 PM, Andreas Klauer wrote:
> On Sat, Jun 10, 2017 at 10:04:16PM +0200, Paul Tonelli wrote:
>> I believe the offset we manually specify is after the default one on a
>> raid assembly with 3 disks, so it should have rebuilt the previous LVM
>> superblock, or am I missing something (again).
> It should be the reverse. When you grow the raid, the offset shrinks.
>
> Not able to provide further insights. You have a LVM2 header which,
> given the correct offset, should appear on the /dev/md device and
> in that case everything else should appear too.
>
> If that is not the case then things will be a lot more complicated.
>
> Regards
> Andreas Klauer
* Re: Recovery after accidental raid5 superblock rewrite
2017-07-31 19:57 ` Paul Tonelli
@ 2017-07-31 20:35 ` Wols Lists
2017-08-01 14:01 ` Phil Turmel
From: Wols Lists @ 2017-07-31 20:35 UTC (permalink / raw)
To: Paul Tonelli, Andreas Klauer; +Cc: linux-raid
On 31/07/17 20:57, Paul Tonelli wrote:
> but a reboot always brings me back to step 2), with the need to press
> ctrl-D at boot time.
>
> What am I missing? Am I using a bad version of the mdadm or kernel
> (mdadm:amd64 3.3.2-5+deb8u2, kernel 4.8.0-0.bpo.2-amd64) or am I doing
> it wrong in another way ?
Okay. Step 2. Does an "mdadm --assemble --scan" work instead? This will
tell us whether your raid array is fine, just that it's not being
properly assembled at boot. Actually, before you do that, try a "mdadm
/dev/md0 --stop".
If the stop then assemble scan works, it tells us that your boot
sequence is at fault, not the array.
Can you post the relevant section of grub.cfg? That might not be
assembling the arrays.
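In concrete terms, something like:
```
mdadm --stop /dev/md0
mdadm --assemble --scan --verbose
cat /proc/mdstat     # does md0 come back on its own?
```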
Cheers,
Wol
* Re: Recovery after accidental raid5 superblock rewrite
2017-07-31 19:57 ` Paul Tonelli
2017-07-31 20:35 ` Wols Lists
@ 2017-08-01 14:01 ` Phil Turmel
From: Phil Turmel @ 2017-08-01 14:01 UTC (permalink / raw)
To: Paul Tonelli, Andreas Klauer; +Cc: linux-raid
On 07/31/2017 03:57 PM, Paul Tonelli wrote:
> Hello (again)
>
> Sorry to resuscitate this topic back from the dead, but I have again the
> same issue.
>
> TL;DR
>
> The difference compared to last time:
> - I created a clean raid a few days back
> - the data is completely backed up and available
> - I can actually access the data
>
> but I still have no clue about what went wrong.
>
> 1) I created the raid:
>
> sudo mdadm --create --verbose /dev/md0 --level=5 --raid-devices=3
> /dev/sdb /dev/sdc /dev/sdd
>
> without lvm this time, juste a single ext4 partition
>
> mkfs.ext4 /dev/md0
> mount /dev/md0 /srv/data
>
> I copied 3Tb on it
>
> I just rebooted the machine
>
> 2) and (again) no md0 assembled at boot:
>
> mdadm -E /dev/sd[bcd]
> /dev/sdb:
> MBR Magic : aa55
> Partition[0] : 4294967295 sectors at 1 (type ee)
> /dev/sdc:
> MBR Magic : aa55
> Partition[0] : 4294967295 sectors at 1 (type ee)
> /dev/sdd:
> MBR Magic : aa55
> Partition[0] : 4294967295 sectors at 1 (type ee)
You have partition tables on these drives, but you are using the entire
drives when you create your arrays. You need to zero the first 4k of
each drive to kill off the incorrect partition tables.
You might also have a GPT partition table backup at the end of the disk
that needs to die as well.
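A sketch of that cleanup (double-check the device names first; the v1.2
md superblock sits 4 KiB into each device, so the first 4 KiB are safe
to clear):
```
for d in /dev/sdb /dev/sdc /dev/sdd; do
    dd if=/dev/zero of="$d" bs=4096 count=1     # primary MBR/GPT at the start
    end=$(blockdev --getsz "$d")                # device size in 512-byte sectors
    dd if=/dev/zero of="$d" bs=512 seek=$((end - 33)) count=33   # backup GPT at the end
done
```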
Phil