* Re: data recovery on raid5
@ 2006-04-22 18:57 Jonathan
2006-04-22 19:48 ` Molle Bestefich
0 siblings, 1 reply; 21+ messages in thread
From: Jonathan @ 2006-04-22 18:57 UTC (permalink / raw)
To: linux-raid
Having RAID fail on a Friday evening is pretty bad timing - not that
there is any good time for such a thing. I'm the sysadmin for the
machine in question (apologies for starting a new thread rather than
replying - I just subscribed to the list).
From my reading, it seems like maybe:
mdadm --assemble /dev/md0 --uuid=8fe1fe85:eeb90460:c525faab:cdaab792
/dev/etherd/e0.[01234]
would be a thing to try?
Frankly, I'm terrified that I'll screw this up - I'm not too savvy with RAID.
Following is a record of the only thing that I've done so far:
Please note that /dev/md1 is composed of 5 additional drives which share
the same hardware as the failed /dev/md0, but are in no other way related.
We're seriously considering sending the drives to a data recovery place
and spending a bazillion bucks to recover the data. If anyone reading
this feels confident that they can help us rebuild this array and get us
to a place where we can copy the data off of it, please send mail to
support@abhost.net. We'll be happy to pay you for your services - I'll
post a summary of what we did when all is done.
Help, please.
Comparing the superblocks below with those posted yesterday, you can see
that things have changed. I'm pulling my hair out - I hope I didn't bork
our data.
-- Jonathan
hazel /tmp # df -H
Filesystem Size Used Avail Use% Mounted on
/dev/hda4 67G 5.8G 58G 10% /
udev 526M 177k 526M 1% /dev
/dev/hda3 8.1G 34M 7.7G 1% /tmp
none 526M 0 526M 0% /dev/shm
/dev/md1 591G 34M 561G 1% /md1
hazel /tmp # mdadm -C /dev/md0 -n 4 -l 5 missing /dev/etherd/e0.[023]
mdadm: /dev/etherd/e0.0 appears to be part of a raid array:
level=5 devices=4 ctime=Mon Jan 3 03:16:48 2005
mdadm: /dev/etherd/e0.2 appears to be part of a raid array:
level=5 devices=4 ctime=Mon Jan 3 03:16:48 2005
mdadm: /dev/etherd/e0.3 appears to contain an ext2fs file system
size=720300416K mtime=Wed Oct 5 16:39:28 2005
mdadm: /dev/etherd/e0.3 appears to be part of a raid array:
level=5 devices=4 ctime=Mon Jan 3 03:16:48 2005
Continue creating array? y
mdadm: array /dev/md0 started.
hazel /tmp # aoe-stat
e0.0 eth1 up
e0.1 eth1 up
e0.2 eth1 up
e0.3 eth1 up
e0.4 eth1 up
e0.5 eth1 up
e0.6 eth1 up
e0.7 eth1 up
e0.8 eth1 up
e0.9 eth1 up
hazel /tmp # cat /proc/mdstat
Personalities : [raid5]
md1 : active raid5 etherd/e0.9[4] etherd/e0.8[3] etherd/e0.7[2]
etherd/e0.6[1] etherd/e0.5[0]
586082688 blocks level 5, 32k chunk, algorithm 0 [4/4] [UUUU]
md0 : active raid5 etherd/e0.3[3] etherd/e0.2[2] etherd/e0.0[1]
586082688 blocks level 5, 64k chunk, algorithm 2 [4/3] [_UUU]
unused devices: <none>
hazel /tmp # mkdir /md0
hazel /tmp # mount -r /dev/md0 /md0
mount: wrong fs type, bad option, bad superblock on /dev/md0,
or too many mounted file systems
hazel /tmp # mount -t ext2 -r /dev/md0 /md0
mount: wrong fs type, bad option, bad superblock on /dev/md0,
or too many mounted file systems
hazel /tmp # mdadm -S /dev/md0
hazel /tmp # aoe-stat
e0.0 eth1 up
e0.1 eth1 up
e0.2 eth1 up
e0.3 eth1 up
e0.4 eth1 up
e0.5 eth1 up
e0.6 eth1 up
e0.7 eth1 up
e0.8 eth1 up
e0.9 eth1 up
hazel /tmp # cat /proc/mdstat
Personalities : [raid5]
md1 : active raid5 etherd/e0.9[4] etherd/e0.8[3] etherd/e0.7[2]
etherd/e0.6[1] etherd/e0.5[0]
586082688 blocks level 5, 32k chunk, algorithm 0 [4/4] [UUUU]
unused devices: <none>
hazel /tmp # mdadm -E /dev/etherd/e0.[01234]
/dev/etherd/e0.0:
Magic : a92b4efc
Version : 00.90.02
UUID : ec0bdbb3:f625880f:dbf65130:057d069c
Creation Time : Fri Apr 21 22:56:18 2006
Raid Level : raid5
Device Size : 195360896 (186.31 GiB 200.05 GB)
Raid Devices : 4
Total Devices : 3
Preferred Minor : 0
Update Time : Fri Apr 21 22:56:18 2006
State : clean
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0
Checksum : 1742f65 - correct
Events : 0.3493634
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 1 152 0 1 active sync /dev/etherd/e0.0
0 0 0 0 0 removed
1 1 152 0 1 active sync /dev/etherd/e0.0
2 2 152 32 2 active sync /dev/etherd/e0.2
3 3 152 48 3 active sync /dev/etherd/e0.3
/dev/etherd/e0.1:
Magic : a92b4efc
Version : 00.90.00
UUID : 8fe1fe85:eeb90460:c525faab:cdaab792
Creation Time : Mon Jan 3 03:16:48 2005
Raid Level : raid5
Device Size : 195360896 (186.31 GiB 200.05 GB)
Raid Devices : 4
Total Devices : 5
Preferred Minor : 0
Update Time : Fri Apr 21 14:03:12 2006
State : clean
Active Devices : 2
Working Devices : 3
Failed Devices : 3
Spare Devices : 1
Checksum : 4cc991d7 - correct
Events : 0.3493633
Layout : left-asymmetric
Chunk Size : 32K
Number Major Minor RaidDevice State
this 4 152 16 4 spare /dev/etherd/e0.1
0 0 0 0 0 removed
1 1 0 0 1 faulty removed
2 2 152 32 2 active sync /dev/etherd/e0.2
3 3 152 48 3 active sync /dev/etherd/e0.3
4 4 152 16 4 spare /dev/etherd/e0.1
/dev/etherd/e0.2:
Magic : a92b4efc
Version : 00.90.02
UUID : ec0bdbb3:f625880f:dbf65130:057d069c
Creation Time : Fri Apr 21 22:56:18 2006
Raid Level : raid5
Device Size : 195360896 (186.31 GiB 200.05 GB)
Raid Devices : 4
Total Devices : 3
Preferred Minor : 0
Update Time : Fri Apr 21 22:56:18 2006
State : clean
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0
Checksum : 1742f87 - correct
Events : 0.3493634
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 2 152 32 2 active sync /dev/etherd/e0.2
0 0 0 0 0 removed
1 1 152 0 1 active sync /dev/etherd/e0.0
2 2 152 32 2 active sync /dev/etherd/e0.2
3 3 152 48 3 active sync /dev/etherd/e0.3
/dev/etherd/e0.3:
Magic : a92b4efc
Version : 00.90.02
UUID : ec0bdbb3:f625880f:dbf65130:057d069c
Creation Time : Fri Apr 21 22:56:18 2006
Raid Level : raid5
Device Size : 195360896 (186.31 GiB 200.05 GB)
Raid Devices : 4
Total Devices : 3
Preferred Minor : 0
Update Time : Fri Apr 21 22:56:18 2006
State : clean
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0
Checksum : 1742f99 - correct
Events : 0.3493634
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 3 152 48 3 active sync /dev/etherd/e0.3
0 0 0 0 0 removed
1 1 152 0 1 active sync /dev/etherd/e0.0
2 2 152 32 2 active sync /dev/etherd/e0.2
3 3 152 48 3 active sync /dev/etherd/e0.3
/dev/etherd/e0.4:
Magic : a92b4efc
Version : 00.90.00
UUID : 8fe1fe85:eeb90460:c525faab:cdaab792
Creation Time : Mon Jan 3 03:16:48 2005
Raid Level : raid5
Device Size : 195360896 (186.31 GiB 200.05 GB)
Raid Devices : 4
Total Devices : 5
Preferred Minor : 0
Update Time : Thu Apr 20 21:07:50 2006
State : clean
Active Devices : 4
Working Devices : 5
Failed Devices : 0
Spare Devices : 1
Checksum : 4cc84d59 - correct
Events : 0.3482550
Layout : left-asymmetric
Chunk Size : 32K
Number Major Minor RaidDevice State
this 0 152 64 0 active sync /dev/etherd/e0.4
0 0 152 64 0 active sync /dev/etherd/e0.4
1 1 152 0 1 active sync /dev/etherd/e0.0
2 2 152 32 2 active sync /dev/etherd/e0.2
3 3 152 48 3 active sync /dev/etherd/e0.3
4 4 152 16 4 spare /dev/etherd/e0.1
* Re: data recovery on raid5
2006-04-22 18:57 data recovery on raid5 Jonathan
@ 2006-04-22 19:48 ` Molle Bestefich
2006-04-22 20:07 ` Jonathan
2006-04-23 2:46 ` Neil Brown
0 siblings, 2 replies; 21+ messages in thread
From: Molle Bestefich @ 2006-04-22 19:48 UTC (permalink / raw)
To: Jonathan; +Cc: linux-raid
Jonathan wrote:
> # mdadm -C /dev/md0 -n 4 -l 5 missing /dev/etherd/e0.[023]
I think you should have tried "mdadm --assemble --force" first, as I
proposed earlier.
By doing the above, you have effectively replaced your version 0.9.0
superblocks with version 0.9.2. I don't know if version 0.9.2
superblocks are larger than 0.9.0, Neil hasn't responded to that yet.
Potentially hazardous, who knows.
Anyway.
This is from your old superblock as described by Sam Hopkins:
> /dev/etherd/<blah>:
> Chunk Size : 32K
This is from what you've just posted:
> /dev/etherd/<blah>:
> Chunk Size : 64K
If I were you, I'd recreate your superblocks now, but with the correct
chunk size (use -c).
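For reference, a minimal sketch of that re-create with the original 32K
chunk size (same devices as before; as it turns out further down the
thread, the --parity layout has to match as well):
  mdadm -S /dev/md0     # stop the wrongly-created array first
  mdadm -C /dev/md0 -c 32 -n 4 -l 5 missing /dev/etherd/e0.[023]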
> We'll be happy to pay you for your services.
I'll be modest and charge you a penny per byte of data recovered, ho hum.
* Re: data recovery on raid5
2006-04-22 19:48 ` Molle Bestefich
@ 2006-04-22 20:07 ` Jonathan
2006-04-22 20:22 ` Molle Bestefich
2006-04-22 20:28 ` Carlos Carvalho
2006-04-23 2:46 ` Neil Brown
1 sibling, 2 replies; 21+ messages in thread
From: Jonathan @ 2006-04-22 20:07 UTC (permalink / raw)
To: linux-raid
I was already terrified of screwing things up -- now I'm afraid of
making things worse.
Based on what was posted before, is this a sensible thing to try?
mdadm -C /dev/md0 -c 32 -n 4 -l 5 missing /dev/etherd/e0.[023]
Is what I've done to the superblock size recoverable?
I don't understand how mdadm --assemble would know what to do, which is
why I didn't try it initially. That said, obviously my lack of
understanding isn't helping one bit.
I don't think I can afford a penny per byte, but I'd happily part with
hundreds of dollars to get the data back. I would really like someone
with more knowledge than me to hold my hand before I continue to make
things worse.
help please - support@abhost.net
-- Jonathan
Molle Bestefich wrote:
>Jonathan wrote:
>># mdadm -C /dev/md0 -n 4 -l 5 missing /dev/etherd/e0.[023]
>
>I think you should have tried "mdadm --assemble --force" first, as I
>proposed earlier.
>
>By doing the above, you have effectively replaced your version 0.9.0
>superblocks with version 0.9.2. I don't know if version 0.9.2
>superblocks are larger than 0.9.0, Neil hasn't responded to that yet.
>Potentially hazardous, who knows.
>
>Anyway.
>This is from your old superblock as described by Sam Hopkins:
>
>>/dev/etherd/<blah>:
>> Chunk Size : 32K
>
>This is from what you've just posted:
>
>>/dev/etherd/<blah>:
>> Chunk Size : 64K
>
>If I were you, I'd recreate your superblocks now, but with the correct
>chunk size (use -c).
>
>>We'll be happy to pay you for your services.
>
>I'll be modest and charge you a penny per byte of data recovered, ho hum.
* Re: data recovery on raid5
2006-04-22 20:07 ` Jonathan
@ 2006-04-22 20:22 ` Molle Bestefich
2006-04-22 20:32 ` Jonathan
2006-04-22 20:28 ` Carlos Carvalho
1 sibling, 1 reply; 21+ messages in thread
From: Molle Bestefich @ 2006-04-22 20:22 UTC (permalink / raw)
To: Jonathan; +Cc: linux-raid
Jonathan wrote:
> I was already terrified of screwing things up
> now I'm afraid of making things worse
Adrenalin... makes life worth living there for a sec, doesn't it ;o)
> based on what was posted before is this a sensible thing to try?
> mdadm -C /dev/md0 -c 32 -n 4 -l 5 missing /dev/etherd/e0.[023]
Yes, looks exactly right.
> Is what I've done to the superblock size recoverable?
I don't think you've done anything at all.
I just *don't know* if you have, that's all.
Was just trying to say that it wasn't super-cautious of you to begin
with, that's all :-).
> I don't understand how mdadm --assemble would know what to do,
> which is why I didn't try it initially.
By giving it --force, you tell it to forcefully mount the array even
though it might be damaged.
That means including some disks (the freshest ones) that are out of sync.
That help?
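For what it's worth, a sketch of what that force-assemble would have
looked like (UUID and device list taken from the earlier posts; whether
it can still work after the superblocks were re-created is doubtful, see
below in the thread):
  mdadm --assemble --force /dev/md0 \
      --uuid=8fe1fe85:eeb90460:c525faab:cdaab792 /dev/etherd/e0.[01234]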
* Re: data recovery on raid5
2006-04-22 20:22 ` Molle Bestefich
@ 2006-04-22 20:32 ` Jonathan
2006-04-22 20:38 ` Molle Bestefich
2006-04-22 20:51 ` Molle Bestefich
0 siblings, 2 replies; 21+ messages in thread
From: Jonathan @ 2006-04-22 20:32 UTC (permalink / raw)
To: linux-raid
Well, the block sizes are back to 32k now, but I still had no luck
mounting /dev/md0 once I created the array. Below is a record of what I
just tried:
how safe should the following be?
mdadm --assemble /dev/md0 --uuid=8fe1fe85:eeb90460:c525faab:cdaab792
/dev/etherd/e0.[01234]
I am *really* not interested in making my situation worse.
-- Jonathan
hazel /virtual # mdadm -C /dev/md0 -c 32 -n 4 -l 5 missing
/dev/etherd/e0.[023]
mdadm: /dev/etherd/e0.0 appears to be part of a raid array:
level=5 devices=4 ctime=Fri Apr 21 22:56:18 2006
mdadm: /dev/etherd/e0.2 appears to be part of a raid array:
level=5 devices=4 ctime=Fri Apr 21 22:56:18 2006
mdadm: /dev/etherd/e0.3 appears to contain an ext2fs file system
size=720300416K mtime=Wed Oct 5 16:39:28 2005
mdadm: /dev/etherd/e0.3 appears to be part of a raid array:
level=5 devices=4 ctime=Fri Apr 21 22:56:18 2006
Continue creating array? y
mdadm: array /dev/md0 started.
hazel /virtual # cat /proc/mdstat
Personalities : [raid5]
md1 : active raid5 etherd/e0.9[4] etherd/e0.8[3] etherd/e0.7[2]
etherd/e0.6[1] etherd/e0.5[0]
586082688 blocks level 5, 32k chunk, algorithm 0 [4/4] [UUUU]
md0 : active raid5 etherd/e0.3[3] etherd/e0.2[2] etherd/e0.0[1]
586082688 blocks level 5, 32k chunk, algorithm 2 [4/3] [_UUU]
unused devices: <none>
hazel /virtual # mount -t ext2 -r /dev/md0 /md0
mount: wrong fs type, bad option, bad superblock on /dev/md0,
or too many mounted file systems
hazel /virtual # mdadm -S /dev/md0
hazel /virtual # cat /proc/mdstat
Personalities : [raid5]
md1 : active raid5 etherd/e0.9[4] etherd/e0.8[3] etherd/e0.7[2]
etherd/e0.6[1] etherd/e0.5[0]
586082688 blocks level 5, 32k chunk, algorithm 0 [4/4] [UUUU]
unused devices: <none>
hazel /virtual # mdadm -E /dev/etherd/e0.[01234]
/dev/etherd/e0.0:
Magic : a92b4efc
Version : 00.90.02
UUID : 518b5d59:44292ca3:6c358813:c6f00804
Creation Time : Sat Apr 22 13:25:40 2006
Raid Level : raid5
Device Size : 195360896 (186.31 GiB 200.05 GB)
Raid Devices : 4
Total Devices : 3
Preferred Minor : 0
Update Time : Sat Apr 22 13:25:40 2006
State : clean
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0
Checksum : 6aaa56f - correct
Events : 0.3493635
Layout : left-symmetric
Chunk Size : 32K
Number Major Minor RaidDevice State
this 1 152 0 1 active sync /dev/etherd/e0.0
0 0 0 0 0 removed
1 1 152 0 1 active sync /dev/etherd/e0.0
2 2 152 32 2 active sync /dev/etherd/e0.2
3 3 152 48 3 active sync /dev/etherd/e0.3
/dev/etherd/e0.1:
Magic : a92b4efc
Version : 00.90.00
UUID : 8fe1fe85:eeb90460:c525faab:cdaab792
Creation Time : Mon Jan 3 03:16:48 2005
Raid Level : raid5
Device Size : 195360896 (186.31 GiB 200.05 GB)
Raid Devices : 4
Total Devices : 5
Preferred Minor : 0
Update Time : Fri Apr 21 14:03:12 2006
State : clean
Active Devices : 2
Working Devices : 3
Failed Devices : 3
Spare Devices : 1
Checksum : 4cc991d7 - correct
Events : 0.3493633
Layout : left-asymmetric
Chunk Size : 32K
Number Major Minor RaidDevice State
this 4 152 16 4 spare /dev/etherd/e0.1
0 0 0 0 0 removed
1 1 0 0 1 faulty removed
2 2 152 32 2 active sync /dev/etherd/e0.2
3 3 152 48 3 active sync /dev/etherd/e0.3
4 4 152 16 4 spare /dev/etherd/e0.1
/dev/etherd/e0.2:
Magic : a92b4efc
Version : 00.90.02
UUID : 518b5d59:44292ca3:6c358813:c6f00804
Creation Time : Sat Apr 22 13:25:40 2006
Raid Level : raid5
Device Size : 195360896 (186.31 GiB 200.05 GB)
Raid Devices : 4
Total Devices : 3
Preferred Minor : 0
Update Time : Sat Apr 22 13:25:40 2006
State : clean
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0
Checksum : 6aaa591 - correct
Events : 0.3493635
Layout : left-symmetric
Chunk Size : 32K
Number Major Minor RaidDevice State
this 2 152 32 2 active sync /dev/etherd/e0.2
0 0 0 0 0 removed
1 1 152 0 1 active sync /dev/etherd/e0.0
2 2 152 32 2 active sync /dev/etherd/e0.2
3 3 152 48 3 active sync /dev/etherd/e0.3
/dev/etherd/e0.3:
Magic : a92b4efc
Version : 00.90.02
UUID : 518b5d59:44292ca3:6c358813:c6f00804
Creation Time : Sat Apr 22 13:25:40 2006
Raid Level : raid5
Device Size : 195360896 (186.31 GiB 200.05 GB)
Raid Devices : 4
Total Devices : 3
Preferred Minor : 0
Update Time : Sat Apr 22 13:25:40 2006
State : clean
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0
Checksum : 6aaa5a3 - correct
Events : 0.3493635
Layout : left-symmetric
Chunk Size : 32K
Number Major Minor RaidDevice State
this 3 152 48 3 active sync /dev/etherd/e0.3
0 0 0 0 0 removed
1 1 152 0 1 active sync /dev/etherd/e0.0
2 2 152 32 2 active sync /dev/etherd/e0.2
3 3 152 48 3 active sync /dev/etherd/e0.3
/dev/etherd/e0.4:
Magic : a92b4efc
Version : 00.90.00
UUID : 8fe1fe85:eeb90460:c525faab:cdaab792
Creation Time : Mon Jan 3 03:16:48 2005
Raid Level : raid5
Device Size : 195360896 (186.31 GiB 200.05 GB)
Raid Devices : 4
Total Devices : 5
Preferred Minor : 0
Update Time : Thu Apr 20 21:07:50 2006
State : clean
Active Devices : 4
Working Devices : 5
Failed Devices : 0
Spare Devices : 1
Checksum : 4cc84d59 - correct
Events : 0.3482550
Layout : left-asymmetric
Chunk Size : 32K
Number Major Minor RaidDevice State
this 0 152 64 0 active sync /dev/etherd/e0.4
0 0 152 64 0 active sync /dev/etherd/e0.4
1 1 152 0 1 active sync /dev/etherd/e0.0
2 2 152 32 2 active sync /dev/etherd/e0.2
3 3 152 48 3 active sync /dev/etherd/e0.3
4 4 152 16 4 spare /dev/etherd/e0.1
* Re: data recovery on raid5
2006-04-22 20:32 ` Jonathan
@ 2006-04-22 20:38 ` Molle Bestefich
2006-04-22 20:55 ` Jonathan
2006-04-22 20:51 ` Molle Bestefich
1 sibling, 1 reply; 21+ messages in thread
From: Molle Bestefich @ 2006-04-22 20:38 UTC (permalink / raw)
To: Jonathan; +Cc: linux-raid
Jonathan wrote:
> Well, the block sizes are back to 32k now, but I still had no luck
> mounting /dev/md0 once I created the array.
Ahem, I missed something.
Sorry, the 'a' was hard to spot.
Your array used layout : left-asymmetric, while the superblock you've
just created has layout: left-symmetric.
Try again, but add the option "--parity=left-asymmetric"
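In other words, something along these lines (a sketch; same devices and
chunk size as before, only the layout option is new):
  mdadm -S /dev/md0
  mdadm -C /dev/md0 -c 32 -n 4 -l 5 --parity=left-asymmetric \
      missing /dev/etherd/e0.[023]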
* Re: data recovery on raid5
2006-04-22 20:38 ` Molle Bestefich
@ 2006-04-22 20:55 ` Jonathan
2006-04-22 21:17 ` Molle Bestefich
2006-04-22 23:17 ` Christian Pedaschus
0 siblings, 2 replies; 21+ messages in thread
From: Jonathan @ 2006-04-22 20:55 UTC (permalink / raw)
To: linux-raid
hazel /virtual # mdadm -C /dev/md0 -c 32 -n 4 -l 5
--parity=left-asymmetric missing /dev/etherd/e0.[023]
mdadm: /dev/etherd/e0.0 appears to be part of a raid array:
level=5 devices=4 ctime=Sat Apr 22 13:25:40 2006
mdadm: /dev/etherd/e0.2 appears to be part of a raid array:
level=5 devices=4 ctime=Sat Apr 22 13:25:40 2006
mdadm: /dev/etherd/e0.3 appears to contain an ext2fs file system
size=720300416K mtime=Wed Oct 5 16:39:28 2005
mdadm: /dev/etherd/e0.3 appears to be part of a raid array:
level=5 devices=4 ctime=Sat Apr 22 13:25:40 2006
Continue creating array? y
mdadm: array /dev/md0 started.
hazel /virtual # mount -t ext2 -r /dev/md0 /md0
hazel /virtual # df -H
Filesystem Size Used Avail Use% Mounted on
/dev/hda4 67G 5.8G 58G 10% /
udev 526M 177k 526M 1% /dev
/dev/hda3 8.1G 34M 7.7G 1% /tmp
none 526M 0 526M 0% /dev/shm
/dev/md1 591G 11G 551G 2% /virtual
/dev/md0 591G 54G 507G 10% /md0
now I'm doing a:
(cd /md0 && tar cf - . ) | (cd /virtual/recover/ && tar xvfp -)
thank you thank you thank you thank you thank you thank you
Molle Bestefich wrote:
>Jonathan wrote:
>>Well, the block sizes are back to 32k now, but I still had no luck
>>mounting /dev/md0 once I created the array.
>
>Ahem, I missed something.
>Sorry, the 'a' was hard to spot.
>
>Your array used layout : left-asymmetric, while the superblock you've
>just created has layout: left-symmetric.
>
>Try again, but add the option "--parity=left-asymmetric"
* Re: data recovery on raid5
2006-04-22 20:55 ` Jonathan
@ 2006-04-22 21:17 ` Molle Bestefich
2006-04-22 21:42 ` Carlos Carvalho
2006-04-22 22:30 ` David Greaves
2006-04-22 23:17 ` Christian Pedaschus
1 sibling, 2 replies; 21+ messages in thread
From: Molle Bestefich @ 2006-04-22 21:17 UTC (permalink / raw)
To: Jonathan; +Cc: linux-raid
Jonathan wrote:
> /dev/md0 591G 54G 507G 10% /md0
Hrm. Have you been hassling us for a mere 54G worth of data?
(Just kidding :-D.)
> now I'm doing a:
> (cd /md0 && tar cf - . ) | (cd /virtual/recover/ && tar xvfp -)
I wonder - if you have the disk space, why not take David Greaves'
advice and create a backup of the individual disks before fiddling
with md superblocks?
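For the record, a minimal sketch of that per-disk backup (the target
path is a placeholder; each image is roughly 200 GB, so the destination
needs the space):
  for d in 0 1 2 3 4; do
      dd if=/dev/etherd/e0.$d of=/somewhere/e0.$d.img bs=1M
  done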
Anyway, a quick cheat sheet might come in handy:
* Hot-add a disk to an array (outdated raid data on that disk won't be used):
# mdadm /dev/md1 --add /dev/etherd/e0.xyz
* Check that parity is OK for an entire array:
# echo check > /sys/block/md1/md/sync_action
* Manually start a resync:
# echo repair > /sys/block/md1/md/sync_action
* Experiment with MDADM by using sparse files:
# dd if=/dev/zero of=sparse0 bs=1M seek=200000 count=1
# dd if=/dev/zero of=sparse1 bs=1M seek=200000 count=1
# dd if=/dev/zero of=sparse2 bs=1M seek=200000 count=1
# dd if=/dev/zero of=sparse3 bs=1M seek=200000 count=1
# losetup /dev/loop0 sparse0
# losetup /dev/loop1 sparse1
# losetup /dev/loop2 sparse2
# losetup /dev/loop3 sparse3
# mdadm -C /dev/md5 -c 32 -p left-a -n 4 -l 5 missing /dev/loop{1,2,3}
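(And to tear the experiment down again afterwards - a sketch, assuming
nothing is mounted on /dev/md5:)
# mdadm -S /dev/md5
# for i in 0 1 2 3; do losetup -d /dev/loop$i; done
# rm sparse0 sparse1 sparse2 sparse3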
> thank you thank you thank you thank you thank you thank you
np.
(wait, does that mean I won't get my money? ;-))
* Re: data recovery on raid5
2006-04-22 21:17 ` Molle Bestefich
@ 2006-04-22 21:42 ` Carlos Carvalho
2006-04-22 22:58 ` Molle Bestefich
2006-04-22 22:30 ` David Greaves
1 sibling, 1 reply; 21+ messages in thread
From: Carlos Carvalho @ 2006-04-22 21:42 UTC (permalink / raw)
To: linux-raid
Molle Bestefich (molle.bestefich@gmail.com) wrote on 22 April 2006 23:17:
>Jonathan wrote:
>> thank you thank you thank you thank you thank you thank you
Nice job Molle!
>np.
>(wait, does that mean I won't get my money? ;-))
At least you showed someone a new meaning of "support"...
Is it my turn now? Just look at this:
md1 : active raid5 sdp2[15](F) sdo2[16](F) sdn2[13] sdm2[12] sdl2[11] sdk2[10] sdj2[9] sdi2[8] sdh2[17](F) sdg2[18](F) sdf2[5] sde2[4] sdd2[3] sdc2[2] sdb2[1] sda2[0]
4198656 blocks level 5, 128k chunk, algorithm 2 [15/12] [UUUUUU__UUUUUU_]
md3 : active raid5 sdp5[15](F) sdo5[16](F) sdn5[13] sdm5[12] sdl5[11] sdk5[10] sdj5[9] sdi5[8] sdh5[17](F) sdg5[18](F) sdf5[5] sde5[4] sdd5[3] sdc5[2] sdb5[1] sda5[0]
588487424 blocks level 5, 128k chunk, algorithm 2 [15/12] [UUUUUU__UUUUUU_]
md4 : active raid5 sdp6[15](S) sdo6[16](F) sdn6[13] sdm6[12] sdl6[11] sdk6[10] sdj6[9] sdi6[8] sdh6[17](F) sdg6[18](F) sdf6[5] sde6[4] sdd6[3] sdc6[2] sdb6[1] sda6[0]
2141065472 blocks level 5, 128k chunk, algorithm 2 [15/12] [UUUUUU__UUUUUU_]
This is a 15-disk array + spare where 3 or 4 disks dropped, *very* likely
because of these miserable crappy SATA power connectors :-( :-( It's
unbelievable that the industry changes from a working standard to
unreliable trash in such common and widespread devices...
I'm trying to get enough stamina to put the damn thing back on line...
Or at least try to :-(
* Re: data recovery on raid5
2006-04-22 21:42 ` Carlos Carvalho
@ 2006-04-22 22:58 ` Molle Bestefich
0 siblings, 0 replies; 21+ messages in thread
From: Molle Bestefich @ 2006-04-22 22:58 UTC (permalink / raw)
To: Carlos Carvalho; +Cc: linux-raid
Carlos Carvalho wrote:
> Is it my turn now?
Hah, nah, based on your postings you seem to know more about MD than I do ;-)
> [UUUUUU__UUUUUU_]
>
> This is a 15-disk array + spare where 3 or 4 disks dropped, *very* likely
> because of these miserable crappy SATA power connectors :-( :-(
They really suck, don't they.
I had a SATA data connector that was slightly skewed the other day, no
more than 1.5mm, and it caused random read errors. Gah.
> It's unbelievable that the industry changes from a working standard
> to unreliable trash in such common and widespread devices...
Yes. Sigh..
> I'm trying to get enough stamina to put the damn thing back on line...
Guess it's just --assemble --force and then cross your fingers...
Hmm, your mdstat does look a bit weird.
Why wasn't the spare taken into use on md4?
Perhaps you should check the event counters.
Just in case the spares got added late in the game, got half-rebuilt
and the MD crapped out?
Wouldn't want to include them in the assemble in that case...
(Dunno if it could even happen. Hopefully not =))
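A quick sketch for eyeballing those event counters (device names taken
from the md4 line in your mdstat; adjust the glob as needed):
  for d in /dev/sd[a-p]6; do echo -n "$d  "; mdadm -E $d | grep Events; done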
* Re: data recovery on raid5
2006-04-22 21:17 ` Molle Bestefich
2006-04-22 21:42 ` Carlos Carvalho
@ 2006-04-22 22:30 ` David Greaves
1 sibling, 0 replies; 21+ messages in thread
From: David Greaves @ 2006-04-22 22:30 UTC (permalink / raw)
To: Molle Bestefich; +Cc: Jonathan, linux-raid
Molle Bestefich wrote:
> Anyway, a quick cheat sheet might come in handy:
>
Which is why I posted about a wiki a few days back :)
I'm progressing it and I'll see if we can't get something up.
There's a lot of info on the list and it would be nice to get it a
little more focused...
David
--
* Re: data recovery on raid5
2006-04-22 20:55 ` Jonathan
2006-04-22 21:17 ` Molle Bestefich
@ 2006-04-22 23:17 ` Christian Pedaschus
1 sibling, 0 replies; 21+ messages in thread
From: Christian Pedaschus @ 2006-04-22 23:17 UTC (permalink / raw)
To: Jonathan; +Cc: linux-raid
Nice to hear you got your data back.
Now it's perhaps a good time to donate some money to some
people/OSS projects for saving your ass ;) ;)
Greets, Chris
Jonathan wrote:
> hazel /virtual # mdadm -C /dev/md0 -c 32 -n 4 -l 5
> --parity=left-asymmetric missing /dev/etherd/e0.[023]
> mdadm: /dev/etherd/e0.0 appears to be part of a raid array:
> level=5 devices=4 ctime=Sat Apr 22 13:25:40 2006
> mdadm: /dev/etherd/e0.2 appears to be part of a raid array:
> level=5 devices=4 ctime=Sat Apr 22 13:25:40 2006
> mdadm: /dev/etherd/e0.3 appears to contain an ext2fs file system
> size=720300416K mtime=Wed Oct 5 16:39:28 2005
> mdadm: /dev/etherd/e0.3 appears to be part of a raid array:
> level=5 devices=4 ctime=Sat Apr 22 13:25:40 2006
> Continue creating array? y
> mdadm: array /dev/md0 started.
> hazel /virtual # mount -t ext2 -r /dev/md0 /md0
> hazel /virtual # df -H
> Filesystem Size Used Avail Use% Mounted on
> /dev/hda4 67G 5.8G 58G 10% /
> udev 526M 177k 526M 1% /dev
> /dev/hda3 8.1G 34M 7.7G 1% /tmp
> none 526M 0 526M 0% /dev/shm
> /dev/md1 591G 11G 551G 2% /virtual
> /dev/md0 591G 54G 507G 10% /md0
>
> now I'm doing a:
>
> (cd /md0 && tar cf - . ) | (cd /virtual/recover/ && tar xvfp -)
>
> thank you thank you thank you thank you thank you thank you
>
>
> Molle Bestefich wrote:
>
>> Jonathan wrote:
>>
>>
>>> Well, the block sizes are back to 32k now, but I still had no luck
>>> mounting /dev/md0 once I created the array.
>>>
>>
>>
>> Ahem, I missed something.
>> Sorry, the 'a' was hard to spot.
>>
>> Your array used layout : left-asymmetric, while the superblock you've
>> just created has layout: left-symmetric.
>>
>> Try again, but add the option "--parity=left-asymmetric"
>>
>
* Re: data recovery on raid5
2006-04-22 20:32 ` Jonathan
2006-04-22 20:38 ` Molle Bestefich
@ 2006-04-22 20:51 ` Molle Bestefich
1 sibling, 0 replies; 21+ messages in thread
From: Molle Bestefich @ 2006-04-22 20:51 UTC (permalink / raw)
To: Jonathan; +Cc: linux-raid
Jonathan wrote:
> how safe should the following be?
>
> mdadm --assemble /dev/md0 --uuid=8fe1fe85:eeb90460:c525faab:cdaab792
> /dev/etherd/e0.[01234]
You can hardly do --assemble anymore.
After you have recreated superblocks on some of the devices, those are
conceptually part of a different raid array. At least as seen by MD.
> I am *really* not interested in making my situation worse.
We'll keep going till you've got your data back.
Recreating superblocks again on e0.{0,2,3} can't hurt, since you've
already done this and thereby nuked the old superblocks.
You can shake your own hand and thank yourself now (oh, and Sam too)
for posting all the debug output you have. Otherwise we would
probably never have spotted nor known about the parity/chunk size
differences :o).
* Re: data recovery on raid5
2006-04-22 20:07 ` Jonathan
2006-04-22 20:22 ` Molle Bestefich
@ 2006-04-22 20:28 ` Carlos Carvalho
1 sibling, 0 replies; 21+ messages in thread
From: Carlos Carvalho @ 2006-04-22 20:28 UTC (permalink / raw)
To: linux-raid
Jonathan (jrs@abhost.net) wrote on 22 April 2006 13:07:
>I was already terrified of screwing things up -- now I'm afraid of
>making things worse
>
>based on what was posted before is this a sensible thing to try?
>
>mdadm -C /dev/md0 -c 32 -n 4 -l 5 missing /dev/etherd/e0.[023]
>
>Is what I've done to the superblock size recoverable?
Raid metadata are stored at the end of the partition, in a small area.
Perhaps some overwrite of data happened, I don't know, but it'd be
small. If you re-make the superblock with the right chunk size you
can read the data back.
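Since e0.1 and e0.4 were not touched by the re-create, they still carry
the original superblock; a quick sketch for double-checking the old
chunk size and layout before re-making anything:
  mdadm -E /dev/etherd/e0.1 | egrep 'Chunk Size|Layout|UUID'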
I'd suggest following the help you're getting on the list, and if you
don't understand something ask before running any commands. Also don't
do anything that causes writes to the disks (such as fsck) before
getting the raid back...
* Re: data recovery on raid5
2006-04-22 19:48 ` Molle Bestefich
2006-04-22 20:07 ` Jonathan
@ 2006-04-23 2:46 ` Neil Brown
1 sibling, 0 replies; 21+ messages in thread
From: Neil Brown @ 2006-04-23 2:46 UTC (permalink / raw)
To: Molle Bestefich; +Cc: Jonathan, linux-raid
On Saturday April 22, molle.bestefich@gmail.com wrote:
> Jonathan wrote:
> > # mdadm -C /dev/md0 -n 4 -l 5 missing /dev/etherd/e0.[023]
>
> I think you should have tried "mdadm --assemble --force" first, as I
> proposed earlier.
>
> By doing the above, you have effectively replaced your version 0.9.0
> superblocks with version 0.9.2. I don't know if version 0.9.2
> superblocks are larger than 0.9.0, Neil hasn't responded to that yet.
> Potentially hazardous, who knows.
There is no difference in the superblock between 0.90.0 and 0.90.2.
md has always used version numbers, but always in a confusing way.
There should be two completely separate version numbers: the version
for the format of the superblock, and the version for the software
implementation. md confuses these two.
To try to sort it out, I have decided that:
- The 'major' version number is the overall choice of superblock
This is currently 0 or 1
- The 'minor' version encodes minor variation in the superblock.
For version 1, this is different locations (there are other bits
in the superblock to allow new fields to be added)
For version 0, it is currently only used to make sure old software
doesn't try to assemble an array which is undergoing a
reshape, as that would confuse it totally.
- The 'patchlevel' is used to indicate feature availability in
the implementation. It really shouldn't be stored in the superblock,
but it is for historical reasons. It is not checked when
validating a superblock.
To quote from md.h
/*
* MD_PATCHLEVEL_VERSION indicates kernel functionality.
* >=1 means different superblock formats are selectable using SET_ARRAY_INFO
* and major_version/minor_version accordingly
* >=2 means that Internal bitmaps are supported by setting MD_SB_BITMAP_PRESENT
* in the super status byte
* >=3 means that bitmap superblock version 4 is supported, which uses
* little-endian representation rather than host-endian
*/
Hope that helps.
NeilBrown
* data recovery on raid5
@ 2006-04-21 23:11 Sam Hopkins
2006-04-21 23:31 ` Mike Tran
` (4 more replies)
0 siblings, 5 replies; 21+ messages in thread
From: Sam Hopkins @ 2006-04-21 23:11 UTC (permalink / raw)
To: linux-raid; +Cc: jrs, support
[-- Attachment #1: Type: text/plain, Size: 1508 bytes --]
Hello,
I have a client with a failed raid5 that is in desperate need of the
data that's on the raid. The attached file holds the mdadm -E
superblocks that are hopefully the keys to the puzzle. Linux-raid
folks, if you can give any help here it would be much appreciated.
# mdadm -V
mdadm - v1.7.0 - 11 August 2004
# uname -a
Linux hazel 2.6.13-gentoo-r5 #1 SMP Sat Jan 21 13:24:15 PST 2006 i686 Intel(R) Pentium(R) 4 CPU 2.40GHz GenuineIntel GNU/Linux
Here's my take:
Logfiles show that last night drive /dev/etherd/e0.4 failed and around
noon today /dev/etherd/e0.0 failed. This jibes with the superblock
dates and info.
My assessment is that since the last known good configuration was
0 <missing>
1 /dev/etherd/e0.0
2 /dev/etherd/e0.2
3 /dev/etherd/e0.3
then we should shoot for this. I couldn't figure out how to get there
using mdadm -A since /dev/etherd/e0.0 isn't in sync with e0.2 or e0.3.
If anyone can suggest a way to get this back using -A, please chime in.
The alternative is to recreate the array with this configuration hoping
the data blocks will all line up properly so the filesystem can be mounted
and data retrieved. It looks like the following command is the right
way to do this, but not being an expert I (and the client) would like
someone else to verify the sanity of this approach.
Will
mdadm -C /dev/md0 -n 4 -l 5 missing /dev/etherd/e0.[023]
do what we want?
Linux-raid folks, please reply-to-all as we're probably all not on
the list.
Thanks for your help,
Sam
[-- Attachment #2: mdadm-e.0234 --]
[-- Type: text/plain, Size: 4122 bytes --]
/dev/etherd/e0.0:
Magic : a92b4efc
Version : 00.90.00
UUID : 8fe1fe85:eeb90460:c525faab:cdaab792
Creation Time : Mon Jan 3 03:16:48 2005
Raid Level : raid5
Device Size : 195360896 (186.31 GiB 200.05 GB)
Raid Devices : 4
Total Devices : 5
Preferred Minor : 0
Update Time : Fri Apr 21 12:45:07 2006
State : clean
Active Devices : 3
Working Devices : 4
Failed Devices : 1
Spare Devices : 1
Checksum : 4cc955da - correct
Events : 0.3488315
Layout : left-asymmetric
Chunk Size : 32K
Number Major Minor RaidDevice State
this 1 152 0 1 active sync /dev/etherd/e0.0
0 0 0 0 0 removed
1 1 152 0 1 active sync /dev/etherd/e0.0
2 2 152 32 2 active sync /dev/etherd/e0.2
3 3 152 48 3 active sync /dev/etherd/e0.3
4 4 152 16 0 spare /dev/etherd/e0.1
/dev/etherd/e0.2:
Magic : a92b4efc
Version : 00.90.00
UUID : 8fe1fe85:eeb90460:c525faab:cdaab792
Creation Time : Mon Jan 3 03:16:48 2005
Raid Level : raid5
Device Size : 195360896 (186.31 GiB 200.05 GB)
Raid Devices : 4
Total Devices : 5
Preferred Minor : 0
Update Time : Fri Apr 21 14:03:12 2006
State : clean
Active Devices : 2
Working Devices : 3
Failed Devices : 3
Spare Devices : 1
Checksum : 4cc991e9 - correct
Events : 0.3493633
Layout : left-asymmetric
Chunk Size : 32K
Number Major Minor RaidDevice State
this 2 152 32 2 active sync /dev/etherd/e0.2
0 0 0 0 0 removed
1 1 0 0 1 faulty removed
2 2 152 32 2 active sync /dev/etherd/e0.2
3 3 152 48 3 active sync /dev/etherd/e0.3
4 4 152 16 4 spare /dev/etherd/e0.1
/dev/etherd/e0.3:
Magic : a92b4efc
Version : 00.90.00
UUID : 8fe1fe85:eeb90460:c525faab:cdaab792
Creation Time : Mon Jan 3 03:16:48 2005
Raid Level : raid5
Device Size : 195360896 (186.31 GiB 200.05 GB)
Raid Devices : 4
Total Devices : 5
Preferred Minor : 0
Update Time : Fri Apr 21 14:03:12 2006
State : clean
Active Devices : 2
Working Devices : 3
Failed Devices : 3
Spare Devices : 1
Checksum : 4cc991fb - correct
Events : 0.3493633
Layout : left-asymmetric
Chunk Size : 32K
Number Major Minor RaidDevice State
this 3 152 48 3 active sync /dev/etherd/e0.3
0 0 0 0 0 removed
1 1 0 0 1 faulty removed
2 2 152 32 2 active sync /dev/etherd/e0.2
3 3 152 48 3 active sync /dev/etherd/e0.3
4 4 152 16 4 spare /dev/etherd/e0.1
/dev/etherd/e0.4:
Magic : a92b4efc
Version : 00.90.00
UUID : 8fe1fe85:eeb90460:c525faab:cdaab792
Creation Time : Mon Jan 3 03:16:48 2005
Raid Level : raid5
Device Size : 195360896 (186.31 GiB 200.05 GB)
Raid Devices : 4
Total Devices : 5
Preferred Minor : 0
Update Time : Thu Apr 20 21:07:50 2006
State : clean
Active Devices : 4
Working Devices : 5
Failed Devices : 0
Spare Devices : 1
Checksum : 4cc84d59 - correct
Events : 0.3482550
Layout : left-asymmetric
Chunk Size : 32K
Number Major Minor RaidDevice State
this 0 152 64 0 active sync /dev/etherd/e0.4
0 0 152 64 0 active sync /dev/etherd/e0.4
1 1 152 0 1 active sync /dev/etherd/e0.0
2 2 152 32 2 active sync /dev/etherd/e0.2
3 3 152 48 3 active sync /dev/etherd/e0.3
4 4 152 16 4 spare /dev/etherd/e0.1
* Re: data recovery on raid5
2006-04-21 23:11 Sam Hopkins
@ 2006-04-21 23:31 ` Mike Tran
2006-04-21 23:38 ` Mike Hardy
` (3 subsequent siblings)
4 siblings, 0 replies; 21+ messages in thread
From: Mike Tran @ 2006-04-21 23:31 UTC (permalink / raw)
To: Sam Hopkins; +Cc: linux-raid, jrs, support
Sam Hopkins wrote:
>Hello,
>
>I have a client with a failed raid5 that is in desperate need of the
>data that's on the raid. The attached file holds the mdadm -E
>superblocks that are hopefully the keys to the puzzle. Linux-raid
>folks, if you can give any help here it would be much appreciated.
>
># mdadm -V
>mdadm - v1.7.0 - 11 August 2004
># uname -a
>Linux hazel 2.6.13-gentoo-r5 #1 SMP Sat Jan 21 13:24:15 PST 2006 i686 Intel(R) Pentium(R) 4 CPU 2.40GHz GenuineIntel GNU/Linux
>
>Here's my take:
>
>Logfiles show that last night drive /dev/etherd/e0.4 failed and around
>noon today /dev/etherd/e0.0 failed. This jibes with the superblock
>dates and info.
>
>My assessment is that since the last known good configuration was
>0 <missing>
>1 /dev/etherd/e0.0
>2 /dev/etherd/e0.2
>3 /dev/etherd/e0.3
>
>then we should shoot for this. I couldn't figure out how to get there
>using mdadm -A since /dev/etherd/e0.0 isn't in sync with e0.2 or e0.3.
>If anyone can suggest a way to get this back using -A, please chime in.
>
>The alternative is to recreate the array with this configuration hoping
>the data blocks will all line up properly so the filesystem can be mounted
>and data retrieved. It looks like the following command is the right
>way to do this, but not being an expert I (and the client) would like
>someone else to verify the sanity of this approach.
>
>Will
>
>mdadm -C /dev/md0 -n 4 -l 5 missing /dev/etherd/e0.[023]
>
>do what we want?
>
>Linux-raid folks, please reply-to-all as we're probably all not on
>the list.
>
>
>
Yes, I would re-create the array with 1 missing disk, mount it read-only,
and verify your data. If things are OK, remount read-write and remember to
add a new disk to fix the degraded array.
With the "missing" keyword there is no resync/recovery, thus the data on
disk will be intact.
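A sketch of that sequence (the -c and --parity values have to match the
old superblock exactly, as later messages in the thread work out; which
disk to add back afterwards is an assumption here):
  mdadm -C /dev/md0 -c 32 -n 4 -l 5 --parity=left-asymmetric \
      missing /dev/etherd/e0.[023]
  mount -t ext2 -r /dev/md0 /md0          # read-only: verify the data first
  mount -o remount,rw /dev/md0 /md0       # remount read-write once satisfied
  mdadm /dev/md0 --add /dev/etherd/e0.4   # repair the degraded array (triggers a rebuild)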
--
Regards,
Mike T.
* Re: data recovery on raid5
2006-04-21 23:11 Sam Hopkins
2006-04-21 23:31 ` Mike Tran
@ 2006-04-21 23:38 ` Mike Hardy
2006-04-22 4:03 ` Molle Bestefich
` (2 subsequent siblings)
4 siblings, 0 replies; 21+ messages in thread
From: Mike Hardy @ 2006-04-21 23:38 UTC (permalink / raw)
To: Sam Hopkins; +Cc: linux-raid, jrs, support
Recreate the array from the constituent drives in the order you mention,
with 'missing' in place of the first drive that failed?
It won't resync because it has a missing drive.
If you created it correctly, the data will be there.
If you didn't create it correctly, you can keep trying permutations of
4-disk arrays with one missing until you see your data, and you should
find it.
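A sketch of one such permutation attempt (this particular ordering is
just an example, not the known-good one; the chunk size and layout from
the old superblock still have to be carried along):
  mdadm -S /dev/md0
  mdadm -C /dev/md0 -c 32 -n 4 -l 5 --parity=left-asymmetric \
      missing /dev/etherd/e0.0 /dev/etherd/e0.3 /dev/etherd/e0.2
  mount -t ext2 -r /dev/md0 /md0 && ls /md0   # readable data means the order is right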
-Mike
Sam Hopkins wrote:
> Hello,
>
> I have a client with a failed raid5 that is in desperate need of the
> data that's on the raid. The attached file holds the mdadm -E
> superblocks that are hopefully the keys to the puzzle. Linux-raid
> folks, if you can give any help here it would be much appreciated.
>
> # mdadm -V
> mdadm - v1.7.0 - 11 August 2004
> # uname -a
> Linux hazel 2.6.13-gentoo-r5 #1 SMP Sat Jan 21 13:24:15 PST 2006 i686 Intel(R) Pentium(R) 4 CPU 2.40GHz GenuineIntel GNU/Linux
>
> Here's my take:
>
> Logfiles show that last night drive /dev/etherd/e0.4 failed and around
> noon today /dev/etherd/e0.0 failed. This jibes with the superblock
> dates and info.
>
> My assessment is that since the last known good configuration was
> 0 <missing>
> 1 /dev/etherd/e0.0
> 2 /dev/etherd/e0.2
> 3 /dev/etherd/e0.3
>
> then we should shoot for this. I couldn't figure out how to get there
> using mdadm -A since /dev/etherd/e0.0 isn't in sync with e0.2 or e0.3.
> If anyone can suggest a way to get this back using -A, please chime in.
>
> The alternative is to recreate the array with this configuration hoping
> the data blocks will all line up properly so the filesystem can be mounted
> and data retrieved. It looks like the following command is the right
> way to do this, but not being an expert I (and the client) would like
> someone else to verify the sanity of this approach.
>
> Will
>
> mdadm -C /dev/md0 -n 4 -l 5 missing /dev/etherd/e0.[023]
>
> do what we want?
>
> Linux-raid folks, please reply-to-all as we're probably all not on
> the list.
>
> Thanks for your help,
>
> Sam
>
>
> [full mdadm -E superblock dump snipped - identical to the attachment in
> Sam's original message above]
* Re: data recovery on raid5
2006-04-21 23:11 Sam Hopkins
2006-04-21 23:31 ` Mike Tran
2006-04-21 23:38 ` Mike Hardy
@ 2006-04-22 4:03 ` Molle Bestefich
2006-04-22 7:43 ` David Greaves
2006-04-22 8:51 ` David Greaves
4 siblings, 0 replies; 21+ messages in thread
From: Molle Bestefich @ 2006-04-22 4:03 UTC (permalink / raw)
To: Sam Hopkins; +Cc: linux-raid, jrs, support
Sam Hopkins wrote:
> mdadm -C /dev/md0 -n 4 -l 5 missing /dev/etherd/e0.[023]
While it should work, a bit drastic perhaps?
I'd start with mdadm --assemble --force.
With --force, mdadm will pull the event counter of the most-recently
failed drive up to current status which should give you a readable
array.
After that, you could try running a check by echo'ing "check" into
"sync_action".
If the check succeeds, fine, hotadd the last drive to your array and
MD will start resync'ing.
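A sketch of those two steps (paths as in the cheat sheet elsewhere in
the thread; which drive to hot-add is an assumption):
  echo check > /sys/block/md0/md/sync_action   # run a parity consistency check
  cat /proc/mdstat                             # watch its progress
  mdadm /dev/md0 --add /dev/etherd/e0.4        # hot-add the last drive; MD resyncs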
If the check fails because of a bad block, you'll have to make a decision.
Live with the lost blocks, or try and reconstruct from the first kicked disk.
I posted a patch this week that will allow you to forcefully get the
array started with all of the disks - but beware, MD wasn't made with
this in mind and will probably be confused and sometimes pick data
from the first-kicked drive over data from the other drives. Only
forcefully start the array with all drives if you absolutely have
to...
Oh, and I'm not an expert by any means, so take everything I say with
a grain of salt :-).
* Re: data recovery on raid5
2006-04-21 23:11 Sam Hopkins
` (2 preceding siblings ...)
2006-04-22 4:03 ` Molle Bestefich
@ 2006-04-22 7:43 ` David Greaves
2006-04-22 8:51 ` David Greaves
4 siblings, 0 replies; 21+ messages in thread
From: David Greaves @ 2006-04-22 7:43 UTC (permalink / raw)
To: Sam Hopkins; +Cc: linux-raid, jrs, support
Sam Hopkins wrote:
> Hello,
>
> I have a client with a failed raid5 that is in desperate need of the
> data that's on the raid. The attached file holds the mdadm -E
> superblocks that are hopefully the keys to the puzzle. Linux-raid
> folks, if you can give any help here it would be much appreciated.
>
Have you read the archive? There were a couple of similar problems
earlier this month.
Take a look at 2 April 06 - "help recreating a raid5"
Also "Re: help wanted - 6-disk raid5 borked: _ _ U U U U"
> # mdadm -V
> mdadm - v1.7.0 - 11 August 2004
>
Can't hurt to upgrade mdadm
> # uname -a
> Linux hazel 2.6.13-gentoo-r5 #1 SMP Sat Jan 21 13:24:15 PST 2006 i686 Intel(R) Pentium(R) 4 CPU 2.40GHz GenuineIntel GNU/Linux
>
> Here's my take:
>
> Logfiles show that last night drive /dev/etherd/e0.4 failed and around
> noon today /dev/etherd/e0.0 failed. This jibes with the superblock
> dates and info.
>
> My assessment is that since the last known good configuration was
> 0 <missing>
> 1 /dev/etherd/e0.0
> 2 /dev/etherd/e0.2
> 3 /dev/etherd/e0.3
>
> then we should shoot for this. I couldn't figure out how to get there
> using mdadm -A since /dev/etherd/e0.0 isn't in sync with e0.2 or e0.3.
> If anyone can suggest a way to get this back using -A, please chime in.
>
See the patch Molle provided - it seemed to work for him and took the
guesswork out of the create parameters.
I personally didn't use it since Neil didn't bless it :)
> The alternative is to recreate the array with this configuration hoping
> the data blocks will all line up properly so the filesystem can be mounted
> and data retrieved. It looks like the following command is the right
> way to do this, but not being an expert I (and the client) would like
> someone else to verify the sanity of this approach.
>
> Will
>
> mdadm -C /dev/md0 -n 4 -l 5 missing /dev/etherd/e0.[023]
>
> do what we want?
>
It looks right to me - but see comments below...
Also, can you take disk images of the devices (dd if=/dev/etherd/e0.0
of=/somewhere/e0.0.img) to allow for retries?
> ------------------------------------------------------------------------
>
> /dev/etherd/e0.0:
> Events : 0.3488315
> /dev/etherd/e0.2:
> Events : 0.3493633
> /dev/etherd/e0.3:
> Events : 0.3493633
> /dev/etherd/e0.4:
> Events : 0.3482550
>
I don't know precisely what 'Events' are but I read this as being a lot
of activity on e0.[23] after e0.0 went down.
I think that's odd.
Maybe the kernel isn't stopping the device when it degrades - I seem to
remember something like this but I'm probably wrong... archives again...
This shouldn't affect the situation you're now in (horse,bolt,door etc)
but fixing it may make life better should another problem like this
occur - or it may not. Eventually there may be info in a wiki to help
understand this stuff.
HTH
David
--
* Re: data recovery on raid5
2006-04-21 23:11 Sam Hopkins
` (3 preceding siblings ...)
2006-04-22 7:43 ` David Greaves
@ 2006-04-22 8:51 ` David Greaves
4 siblings, 0 replies; 21+ messages in thread
From: David Greaves @ 2006-04-22 8:51 UTC (permalink / raw)
To: Sam Hopkins; +Cc: linux-raid, jrs, support
Sam Hopkins wrote:
> Hello,
>
> I have a client with a failed raid5 that is in desperate need of the
> data that's on the raid. The attached file holds the mdadm -E
> superblocks that are hopefully the keys to the puzzle. Linux-raid
> folks, if you can give any help here it would be much appreciated.
>
snip
> Linux-raid folks, please reply-to-all as we're probably all not on
> the list.
>
If you're going to post messages to public mailing lists (and solicit
help and private cc's!!!) then you should not be using mechanisms like
the one below. Please Google if you don't understand why not.
I've been getting so much junk mail that I'm resorting to
a draconian mechanism to avoid the mail. In order
to make sure that there's a real person sending mail, I'm
asking you to explicitly enable access. To do that, send
mail to sah at this domain with the token:
qSGTt
in the subject of your mail message. After that, you
shouldn't get any bounces from me. Sorry if this is
an inconvenience.
David