* Re: data recovery on raid5
@ 2006-04-22 18:57 Jonathan
2006-04-22 19:48 ` Molle Bestefich
0 siblings, 1 reply; 21+ messages in thread
From: Jonathan @ 2006-04-22 18:57 UTC (permalink / raw)
To: linux-raid
Having RAID fail on a Friday evening is pretty bad timing - not that
there is any good time for such a thing. I'm the sysadmin for the
machine in question (apologies for starting a new thread rather than
replying - I just subscribed to the list).
From my reading, it seems like maybe:
mdadm --assemble /dev/md0 --uuid=8fe1fe85:eeb90460:c525faab:cdaab792
/dev/etherd/e0.[01234]
would be a thing to try?
Frankly, I'm terrified that I'll screw this up - I'm not too savvy with RAID.
Following is a record of the only thing that I've done so far:
Please note that /dev/md1 is composed of 5 additional drives which share
the same hardware as the failed /dev/md0, but are in no other way related.
We're seriously considering sending the drives to a data recovery place
and spending a bazillion bucks to recover the data. If anyone reading
this feels confident that they can help us rebuild this array and get us
to a place where we can copy the data off of it, please send mail to
support@abhost.net. We'll be happy to pay you for your services - I'll
post a summary of what we did when all is done.
Help, please.
Comparing the superblocks below with those posted yesterday, you can see
that things have changed. I'm pulling my hair out - I hope I didn't bork
our data.
-- Jonathan
hazel /tmp # df -H
Filesystem Size Used Avail Use% Mounted on
/dev/hda4 67G 5.8G 58G 10% /
udev 526M 177k 526M 1% /dev
/dev/hda3 8.1G 34M 7.7G 1% /tmp
none 526M 0 526M 0% /dev/shm
/dev/md1 591G 34M 561G 1% /md1
hazel /tmp # mdadm -C /dev/md0 -n 4 -l 5 missing /dev/etherd/e0.[023]
mdadm: /dev/etherd/e0.0 appears to be part of a raid array:
level=5 devices=4 ctime=Mon Jan 3 03:16:48 2005
mdadm: /dev/etherd/e0.2 appears to be part of a raid array:
level=5 devices=4 ctime=Mon Jan 3 03:16:48 2005
mdadm: /dev/etherd/e0.3 appears to contain an ext2fs file system
size=720300416K mtime=Wed Oct 5 16:39:28 2005
mdadm: /dev/etherd/e0.3 appears to be part of a raid array:
level=5 devices=4 ctime=Mon Jan 3 03:16:48 2005
Continue creating array? y
mdadm: array /dev/md0 started.
hazel /tmp # aoe-stat
e0.0 eth1 up
e0.1 eth1 up
e0.2 eth1 up
e0.3 eth1 up
e0.4 eth1 up
e0.5 eth1 up
e0.6 eth1 up
e0.7 eth1 up
e0.8 eth1 up
e0.9 eth1 up
hazel /tmp # cat /proc/mdstat
Personalities : [raid5]
md1 : active raid5 etherd/e0.9[4] etherd/e0.8[3] etherd/e0.7[2]
etherd/e0.6[1] etherd/e0.5[0]
586082688 blocks level 5, 32k chunk, algorithm 0 [4/4] [UUUU]
md0 : active raid5 etherd/e0.3[3] etherd/e0.2[2] etherd/e0.0[1]
586082688 blocks level 5, 64k chunk, algorithm 2 [4/3] [_UUU]
unused devices: <none>
hazel /tmp # mkdir /md0
hazel /tmp # mount -r /dev/md0 /md0
mount: wrong fs type, bad option, bad superblock on /dev/md0,
or too many mounted file systems
hazel /tmp # mount -t ext2 -r /dev/md0 /md0
mount: wrong fs type, bad option, bad superblock on /dev/md0,
or too many mounted file systems
hazel /tmp # mdadm -S /dev/md0
hazel /tmp # aoe-stat
e0.0 eth1 up
e0.1 eth1 up
e0.2 eth1 up
e0.3 eth1 up
e0.4 eth1 up
e0.5 eth1 up
e0.6 eth1 up
e0.7 eth1 up
e0.8 eth1 up
e0.9 eth1 up
hazel /tmp # cat /proc/mdstat
Personalities : [raid5]
md1 : active raid5 etherd/e0.9[4] etherd/e0.8[3] etherd/e0.7[2]
etherd/e0.6[1] etherd/e0.5[0]
586082688 blocks level 5, 32k chunk, algorithm 0 [4/4] [UUUU]
unused devices: <none>
hazel /tmp # mdadm -E /dev/etherd/e0.[01234]
/dev/etherd/e0.0:
Magic : a92b4efc
Version : 00.90.02
UUID : ec0bdbb3:f625880f:dbf65130:057d069c
Creation Time : Fri Apr 21 22:56:18 2006
Raid Level : raid5
Device Size : 195360896 (186.31 GiB 200.05 GB)
Raid Devices : 4
Total Devices : 3
Preferred Minor : 0
Update Time : Fri Apr 21 22:56:18 2006
State : clean
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0
Checksum : 1742f65 - correct
Events : 0.3493634
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 1 152 0 1 active sync /dev/etherd/e0.0
0 0 0 0 0 removed
1 1 152 0 1 active sync /dev/etherd/e0.0
2 2 152 32 2 active sync /dev/etherd/e0.2
3 3 152 48 3 active sync /dev/etherd/e0.3
/dev/etherd/e0.1:
Magic : a92b4efc
Version : 00.90.00
UUID : 8fe1fe85:eeb90460:c525faab:cdaab792
Creation Time : Mon Jan 3 03:16:48 2005
Raid Level : raid5
Device Size : 195360896 (186.31 GiB 200.05 GB)
Raid Devices : 4
Total Devices : 5
Preferred Minor : 0
Update Time : Fri Apr 21 14:03:12 2006
State : clean
Active Devices : 2
Working Devices : 3
Failed Devices : 3
Spare Devices : 1
Checksum : 4cc991d7 - correct
Events : 0.3493633
Layout : left-asymmetric
Chunk Size : 32K
Number Major Minor RaidDevice State
this 4 152 16 4 spare /dev/etherd/e0.1
0 0 0 0 0 removed
1 1 0 0 1 faulty removed
2 2 152 32 2 active sync /dev/etherd/e0.2
3 3 152 48 3 active sync /dev/etherd/e0.3
4 4 152 16 4 spare /dev/etherd/e0.1
/dev/etherd/e0.2:
Magic : a92b4efc
Version : 00.90.02
UUID : ec0bdbb3:f625880f:dbf65130:057d069c
Creation Time : Fri Apr 21 22:56:18 2006
Raid Level : raid5
Device Size : 195360896 (186.31 GiB 200.05 GB)
Raid Devices : 4
Total Devices : 3
Preferred Minor : 0
Update Time : Fri Apr 21 22:56:18 2006
State : clean
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0
Checksum : 1742f87 - correct
Events : 0.3493634
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 2 152 32 2 active sync /dev/etherd/e0.2
0 0 0 0 0 removed
1 1 152 0 1 active sync /dev/etherd/e0.0
2 2 152 32 2 active sync /dev/etherd/e0.2
3 3 152 48 3 active sync /dev/etherd/e0.3
/dev/etherd/e0.3:
Magic : a92b4efc
Version : 00.90.02
UUID : ec0bdbb3:f625880f:dbf65130:057d069c
Creation Time : Fri Apr 21 22:56:18 2006
Raid Level : raid5
Device Size : 195360896 (186.31 GiB 200.05 GB)
Raid Devices : 4
Total Devices : 3
Preferred Minor : 0
Update Time : Fri Apr 21 22:56:18 2006
State : clean
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0
Checksum : 1742f99 - correct
Events : 0.3493634
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 3 152 48 3 active sync /dev/etherd/e0.3
0 0 0 0 0 removed
1 1 152 0 1 active sync /dev/etherd/e0.0
2 2 152 32 2 active sync /dev/etherd/e0.2
3 3 152 48 3 active sync /dev/etherd/e0.3
/dev/etherd/e0.4:
Magic : a92b4efc
Version : 00.90.00
UUID : 8fe1fe85:eeb90460:c525faab:cdaab792
Creation Time : Mon Jan 3 03:16:48 2005
Raid Level : raid5
Device Size : 195360896 (186.31 GiB 200.05 GB)
Raid Devices : 4
Total Devices : 5
Preferred Minor : 0
Update Time : Thu Apr 20 21:07:50 2006
State : clean
Active Devices : 4
Working Devices : 5
Failed Devices : 0
Spare Devices : 1
Checksum : 4cc84d59 - correct
Events : 0.3482550
Layout : left-asymmetric
Chunk Size : 32K
Number Major Minor RaidDevice State
this 0 152 64 0 active sync /dev/etherd/e0.4
0 0 152 64 0 active sync /dev/etherd/e0.4
1 1 152 0 1 active sync /dev/etherd/e0.0
2 2 152 32 2 active sync /dev/etherd/e0.2
3 3 152 48 3 active sync /dev/etherd/e0.3
4 4 152 16 4 spare /dev/etherd/e0.1
* Re: data recovery on raid5
2006-04-22 18:57 data recovery on raid5 Jonathan
@ 2006-04-22 19:48 ` Molle Bestefich
2006-04-22 20:07 ` Jonathan
2006-04-23 2:46 ` Neil Brown
0 siblings, 2 replies; 21+ messages in thread
From: Molle Bestefich @ 2006-04-22 19:48 UTC (permalink / raw)
To: Jonathan; +Cc: linux-raid
Jonathan wrote:
> # mdadm -C /dev/md0 -n 4 -l 5 missing /dev/etherd/e0.[023]
I think you should have tried "mdadm --assemble --force" first, as I
proposed earlier.
By doing the above, you have effectively replaced your version 0.9.0
superblocks with version 0.9.2. I don't know if version 0.9.2
superblocks are larger than 0.9.0, Neil hasn't responded to that yet.
Potentially hazardous, who knows.
Anyway.
This is from your old superblock as described by Sam Hopkins:
> /dev/etherd/<blah>:
> Chunk Size : 32K
This is from what you've just posted:
> /dev/etherd/<blah>:
> Chunk Size : 64K
If I were you, I'd recreate your superblocks now, but with the correct
chunk size (use -c).
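For reference, a minimal sketch of that re-create with the original 32K
chunk size (same devices as before; as it turns out further down the
thread, the --parity layout has to match as well):
  mdadm -S /dev/md0     # stop the wrongly-created array first
  mdadm -C /dev/md0 -c 32 -n 4 -l 5 missing /dev/etherd/e0.[023]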
> We'll be happy to pay you for your services.
I'll be modest and charge you a penny per byte of data recovered, ho hum.
* Re: data recovery on raid5
2006-04-22 19:48 ` Molle Bestefich
@ 2006-04-22 20:07 ` Jonathan
2006-04-22 20:22 ` Molle Bestefich
2006-04-22 20:28 ` Carlos Carvalho
2006-04-23 2:46 ` Neil Brown
1 sibling, 2 replies; 21+ messages in thread
From: Jonathan @ 2006-04-22 20:07 UTC (permalink / raw)
To: linux-raid
I was already terrified of screwing things up -- now I'm afraid of
making things worse.
Based on what was posted before, is this a sensible thing to try?
mdadm -C /dev/md0 -c 32 -n 4 -l 5 missing /dev/etherd/e0.[023]
Is what I've done to the superblock size recoverable?
I don't understand how mdadm --assemble would know what to do, which is
why I didn't try it initially. That said, obviously my lack of
understanding isn't helping one bit.
I don't think I can afford a penny per byte, but I'd happily part with
hundreds of dollars to get the data back. I would really like someone
with more knowledge than me to hold my hand before I continue to make
things worse.
help please - support@abhost.net
-- Jonathan
Molle Bestefich wrote:
>Jonathan wrote:
>># mdadm -C /dev/md0 -n 4 -l 5 missing /dev/etherd/e0.[023]
>
>I think you should have tried "mdadm --assemble --force" first, as I
>proposed earlier.
>
>By doing the above, you have effectively replaced your version 0.9.0
>superblocks with version 0.9.2. I don't know if version 0.9.2
>superblocks are larger than 0.9.0, Neil hasn't responded to that yet.
>Potentially hazardous, who knows.
>
>Anyway.
>This is from your old superblock as described by Sam Hopkins:
>
>>/dev/etherd/<blah>:
>> Chunk Size : 32K
>
>This is from what you've just posted:
>
>>/dev/etherd/<blah>:
>> Chunk Size : 64K
>
>If I were you, I'd recreate your superblocks now, but with the correct
>chunk size (use -c).
>
>>We'll be happy to pay you for your services.
>
>I'll be modest and charge you a penny per byte of data recovered, ho hum.
* Re: data recovery on raid5
2006-04-22 20:07 ` Jonathan
@ 2006-04-22 20:22 ` Molle Bestefich
2006-04-22 20:32 ` Jonathan
2006-04-22 20:28 ` Carlos Carvalho
1 sibling, 1 reply; 21+ messages in thread
From: Molle Bestefich @ 2006-04-22 20:22 UTC (permalink / raw)
To: Jonathan; +Cc: linux-raid
Jonathan wrote:
> I was already terrified of screwing things up
> now I'm afraid of making things worse
Adrenalin... makes life worth living there for a sec, doesn't it ;o)
> based on what was posted before is this a sensible thing to try?
> mdadm -C /dev/md0 -c 32 -n 4 -l 5 missing /dev/etherd/e0.[023]
Yes, looks exactly right.
> Is what I've done to the superblock size recoverable?
I don't think you've done anything at all.
I just *don't know* if you have, that's all.
Was just trying to say that it wasn't super-cautious of you to begin
with, that's all :-).
> I don't understand how mdadm --assemble would know what to do,
> which is why I didn't try it initially.
By giving it --force, you tell it to forcefully mount the array even
though it might be damaged.
That means including some disks (the freshest ones) that are out of sync.
That help?
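For what it's worth, a sketch of what that force-assemble would have
looked like (UUID and device list taken from the earlier posts; whether
it can still work after the superblocks were re-created is doubtful, see
below in the thread):
  mdadm --assemble --force /dev/md0 \
      --uuid=8fe1fe85:eeb90460:c525faab:cdaab792 /dev/etherd/e0.[01234]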
* Re: data recovery on raid5
2006-04-22 20:22 ` Molle Bestefich
@ 2006-04-22 20:32 ` Jonathan
2006-04-22 20:38 ` Molle Bestefich
2006-04-22 20:51 ` Molle Bestefich
0 siblings, 2 replies; 21+ messages in thread
From: Jonathan @ 2006-04-22 20:32 UTC (permalink / raw)
To: linux-raid
Well, the block sizes are back to 32k now, but I still had no luck
mounting /dev/md0 once I created the array. Below is a record of what I
just tried:
how safe should the following be?
mdadm --assemble /dev/md0 --uuid=8fe1fe85:eeb90460:c525faab:cdaab792
/dev/etherd/e0.[01234]
I am *really* not interested in making my situation worse.
-- Jonathan
hazel /virtual # mdadm -C /dev/md0 -c 32 -n 4 -l 5 missing
/dev/etherd/e0.[023]
mdadm: /dev/etherd/e0.0 appears to be part of a raid array:
level=5 devices=4 ctime=Fri Apr 21 22:56:18 2006
mdadm: /dev/etherd/e0.2 appears to be part of a raid array:
level=5 devices=4 ctime=Fri Apr 21 22:56:18 2006
mdadm: /dev/etherd/e0.3 appears to contain an ext2fs file system
size=720300416K mtime=Wed Oct 5 16:39:28 2005
mdadm: /dev/etherd/e0.3 appears to be part of a raid array:
level=5 devices=4 ctime=Fri Apr 21 22:56:18 2006
Continue creating array? y
mdadm: array /dev/md0 started.
hazel /virtual # cat /proc/mdstat
Personalities : [raid5]
md1 : active raid5 etherd/e0.9[4] etherd/e0.8[3] etherd/e0.7[2]
etherd/e0.6[1] etherd/e0.5[0]
586082688 blocks level 5, 32k chunk, algorithm 0 [4/4] [UUUU]
md0 : active raid5 etherd/e0.3[3] etherd/e0.2[2] etherd/e0.0[1]
586082688 blocks level 5, 32k chunk, algorithm 2 [4/3] [_UUU]
unused devices: <none>
hazel /virtual # mount -t ext2 -r /dev/md0 /md0
mount: wrong fs type, bad option, bad superblock on /dev/md0,
or too many mounted file systems
hazel /virtual # mdadm -S /dev/md0
hazel /virtual # cat /proc/mdstat
Personalities : [raid5]
md1 : active raid5 etherd/e0.9[4] etherd/e0.8[3] etherd/e0.7[2]
etherd/e0.6[1] etherd/e0.5[0]
586082688 blocks level 5, 32k chunk, algorithm 0 [4/4] [UUUU]
unused devices: <none>
hazel /virtual # mdadm -E /dev/etherd/e0.[01234]
/dev/etherd/e0.0:
Magic : a92b4efc
Version : 00.90.02
UUID : 518b5d59:44292ca3:6c358813:c6f00804
Creation Time : Sat Apr 22 13:25:40 2006
Raid Level : raid5
Device Size : 195360896 (186.31 GiB 200.05 GB)
Raid Devices : 4
Total Devices : 3
Preferred Minor : 0
Update Time : Sat Apr 22 13:25:40 2006
State : clean
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0
Checksum : 6aaa56f - correct
Events : 0.3493635
Layout : left-symmetric
Chunk Size : 32K
Number Major Minor RaidDevice State
this 1 152 0 1 active sync /dev/etherd/e0.0
0 0 0 0 0 removed
1 1 152 0 1 active sync /dev/etherd/e0.0
2 2 152 32 2 active sync /dev/etherd/e0.2
3 3 152 48 3 active sync /dev/etherd/e0.3
/dev/etherd/e0.1:
Magic : a92b4efc
Version : 00.90.00
UUID : 8fe1fe85:eeb90460:c525faab:cdaab792
Creation Time : Mon Jan 3 03:16:48 2005
Raid Level : raid5
Device Size : 195360896 (186.31 GiB 200.05 GB)
Raid Devices : 4
Total Devices : 5
Preferred Minor : 0
Update Time : Fri Apr 21 14:03:12 2006
State : clean
Active Devices : 2
Working Devices : 3
Failed Devices : 3
Spare Devices : 1
Checksum : 4cc991d7 - correct
Events : 0.3493633
Layout : left-asymmetric
Chunk Size : 32K
Number Major Minor RaidDevice State
this 4 152 16 4 spare /dev/etherd/e0.1
0 0 0 0 0 removed
1 1 0 0 1 faulty removed
2 2 152 32 2 active sync /dev/etherd/e0.2
3 3 152 48 3 active sync /dev/etherd/e0.3
4 4 152 16 4 spare /dev/etherd/e0.1
/dev/etherd/e0.2:
Magic : a92b4efc
Version : 00.90.02
UUID : 518b5d59:44292ca3:6c358813:c6f00804
Creation Time : Sat Apr 22 13:25:40 2006
Raid Level : raid5
Device Size : 195360896 (186.31 GiB 200.05 GB)
Raid Devices : 4
Total Devices : 3
Preferred Minor : 0
Update Time : Sat Apr 22 13:25:40 2006
State : clean
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0
Checksum : 6aaa591 - correct
Events : 0.3493635
Layout : left-symmetric
Chunk Size : 32K
Number Major Minor RaidDevice State
this 2 152 32 2 active sync /dev/etherd/e0.2
0 0 0 0 0 removed
1 1 152 0 1 active sync /dev/etherd/e0.0
2 2 152 32 2 active sync /dev/etherd/e0.2
3 3 152 48 3 active sync /dev/etherd/e0.3
/dev/etherd/e0.3:
Magic : a92b4efc
Version : 00.90.02
UUID : 518b5d59:44292ca3:6c358813:c6f00804
Creation Time : Sat Apr 22 13:25:40 2006
Raid Level : raid5
Device Size : 195360896 (186.31 GiB 200.05 GB)
Raid Devices : 4
Total Devices : 3
Preferred Minor : 0
Update Time : Sat Apr 22 13:25:40 2006
State : clean
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0
Checksum : 6aaa5a3 - correct
Events : 0.3493635
Layout : left-symmetric
Chunk Size : 32K
Number Major Minor RaidDevice State
this 3 152 48 3 active sync /dev/etherd/e0.3
0 0 0 0 0 removed
1 1 152 0 1 active sync /dev/etherd/e0.0
2 2 152 32 2 active sync /dev/etherd/e0.2
3 3 152 48 3 active sync /dev/etherd/e0.3
/dev/etherd/e0.4:
Magic : a92b4efc
Version : 00.90.00
UUID : 8fe1fe85:eeb90460:c525faab:cdaab792
Creation Time : Mon Jan 3 03:16:48 2005
Raid Level : raid5
Device Size : 195360896 (186.31 GiB 200.05 GB)
Raid Devices : 4
Total Devices : 5
Preferred Minor : 0
Update Time : Thu Apr 20 21:07:50 2006
State : clean
Active Devices : 4
Working Devices : 5
Failed Devices : 0
Spare Devices : 1
Checksum : 4cc84d59 - correct
Events : 0.3482550
Layout : left-asymmetric
Chunk Size : 32K
Number Major Minor RaidDevice State
this 0 152 64 0 active sync /dev/etherd/e0.4
0 0 152 64 0 active sync /dev/etherd/e0.4
1 1 152 0 1 active sync /dev/etherd/e0.0
2 2 152 32 2 active sync /dev/etherd/e0.2
3 3 152 48 3 active sync /dev/etherd/e0.3
4 4 152 16 4 spare /dev/etherd/e0.1
* Re: data recovery on raid5
2006-04-22 20:32 ` Jonathan
@ 2006-04-22 20:38 ` Molle Bestefich
2006-04-22 20:55 ` Jonathan
2006-04-22 20:51 ` Molle Bestefich
1 sibling, 1 reply; 21+ messages in thread
From: Molle Bestefich @ 2006-04-22 20:38 UTC (permalink / raw)
To: Jonathan; +Cc: linux-raid
Jonathan wrote:
> Well, the block sizes are back to 32k now, but I still had no luck
> mounting /dev/md0 once I created the array.
Ahem, I missed something.
Sorry, the 'a' was hard to spot.
Your array used layout : left-asymmetric, while the superblock you've
just created has layout: left-symmetric.
Try again, but add the option "--parity=left-asymmetric"
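In other words, something along these lines (a sketch; same devices and
chunk size as before, only the layout option is new):
  mdadm -S /dev/md0
  mdadm -C /dev/md0 -c 32 -n 4 -l 5 --parity=left-asymmetric \
      missing /dev/etherd/e0.[023]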
* Re: data recovery on raid5
2006-04-22 20:38 ` Molle Bestefich
@ 2006-04-22 20:55 ` Jonathan
2006-04-22 21:17 ` Molle Bestefich
2006-04-22 23:17 ` Christian Pedaschus
0 siblings, 2 replies; 21+ messages in thread
From: Jonathan @ 2006-04-22 20:55 UTC (permalink / raw)
To: linux-raid
hazel /virtual # mdadm -C /dev/md0 -c 32 -n 4 -l 5
--parity=left-asymmetric missing /dev/etherd/e0.[023]
mdadm: /dev/etherd/e0.0 appears to be part of a raid array:
level=5 devices=4 ctime=Sat Apr 22 13:25:40 2006
mdadm: /dev/etherd/e0.2 appears to be part of a raid array:
level=5 devices=4 ctime=Sat Apr 22 13:25:40 2006
mdadm: /dev/etherd/e0.3 appears to contain an ext2fs file system
size=720300416K mtime=Wed Oct 5 16:39:28 2005
mdadm: /dev/etherd/e0.3 appears to be part of a raid array:
level=5 devices=4 ctime=Sat Apr 22 13:25:40 2006
Continue creating array? y
mdadm: array /dev/md0 started.
hazel /virtual # mount -t ext2 -r /dev/md0 /md0
hazel /virtual # df -H
Filesystem Size Used Avail Use% Mounted on
/dev/hda4 67G 5.8G 58G 10% /
udev 526M 177k 526M 1% /dev
/dev/hda3 8.1G 34M 7.7G 1% /tmp
none 526M 0 526M 0% /dev/shm
/dev/md1 591G 11G 551G 2% /virtual
/dev/md0 591G 54G 507G 10% /md0
now I'm doing a:
(cd /md0 && tar cf - . ) | (cd /virtual/recover/ && tar xvfp -)
thank you thank you thank you thank you thank you thank you
Molle Bestefich wrote:
>Jonathan wrote:
>>Well, the block sizes are back to 32k now, but I still had no luck
>>mounting /dev/md0 once I created the array.
>
>Ahem, I missed something.
>Sorry, the 'a' was hard to spot.
>
>Your array used layout : left-asymmetric, while the superblock you've
>just created has layout: left-symmetric.
>
>Try again, but add the option "--parity=left-asymmetric"
* Re: data recovery on raid5
2006-04-22 20:55 ` Jonathan
@ 2006-04-22 21:17 ` Molle Bestefich
2006-04-22 21:42 ` Carlos Carvalho
2006-04-22 22:30 ` David Greaves
2006-04-22 23:17 ` Christian Pedaschus
1 sibling, 2 replies; 21+ messages in thread
From: Molle Bestefich @ 2006-04-22 21:17 UTC (permalink / raw)
To: Jonathan; +Cc: linux-raid
Jonathan wrote:
> /dev/md0 591G 54G 507G 10% /md0
Hrm. Have you been hassling us for a mere 54G worth of data?
(Just kidding :-D.)
> now I'm doing a:
> (cd /md0 && tar cf - . ) | (cd /virtual/recover/ && tar xvfp -)
I wonder - if you have the disk space, why not take David Greaves'
advice and create a backup of the individual disks before fiddling
with md superblocks?
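For the record, a minimal sketch of that per-disk backup (the target
path is a placeholder; each image is roughly 200 GB, so the destination
needs the space):
  for d in 0 1 2 3 4; do
      dd if=/dev/etherd/e0.$d of=/somewhere/e0.$d.img bs=1M
  done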
Anyway, a quick cheat sheet might come in handy:
* Hot-add a disk to an array (outdated raid data on that disk won't be used):
# mdadm /dev/md1 --add /dev/etherd/e0.xyz
* Check that parity is OK for an entire array:
# echo check > /sys/block/md1/md/sync_action
* Manually start a resync:
# echo repair > /sys/block/md1/md/sync_action
* Experiment with MDADM by using sparse files:
# dd if=/dev/zero of=sparse0 bs=1M seek=200000 count=1
# dd if=/dev/zero of=sparse1 bs=1M seek=200000 count=1
# dd if=/dev/zero of=sparse2 bs=1M seek=200000 count=1
# dd if=/dev/zero of=sparse3 bs=1M seek=200000 count=1
# losetup /dev/loop0 sparse0
# losetup /dev/loop1 sparse1
# losetup /dev/loop2 sparse2
# losetup /dev/loop3 sparse3
# mdadm -C /dev/md5 -c 32 -p left-a -n 4 -l 5 missing /dev/loop{1,2,3}
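(And to tear the experiment down again afterwards - a sketch, assuming
nothing is mounted on /dev/md5:)
# mdadm -S /dev/md5
# for i in 0 1 2 3; do losetup -d /dev/loop$i; done
# rm sparse0 sparse1 sparse2 sparse3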
> thank you thank you thank you thank you thank you thank you
np.
(wait, does that mean I won't get my money? ;-))
* Re: data recovery on raid5
2006-04-22 21:17 ` Molle Bestefich
@ 2006-04-22 21:42 ` Carlos Carvalho
2006-04-22 22:58 ` Molle Bestefich
2006-04-22 22:30 ` David Greaves
1 sibling, 1 reply; 21+ messages in thread
From: Carlos Carvalho @ 2006-04-22 21:42 UTC (permalink / raw)
To: linux-raid
Molle Bestefich (molle.bestefich@gmail.com) wrote on 22 April 2006 23:17:
>Jonathan wrote:
>> thank you thank you thank you thank you thank you thank you
Nice job Molle!
>np.
>(wait, does that mean I won't get my money? ;-))
At least you showed someone a new meaning of "support"...
Is it my turn now? Just look at this:
md1 : active raid5 sdp2[15](F) sdo2[16](F) sdn2[13] sdm2[12] sdl2[11] sdk2[10] sdj2[9] sdi2[8] sdh2[17](F) sdg2[18](F) sdf2[5] sde2[4] sdd2[3] sdc2[2] sdb2[1] sda2[0]
4198656 blocks level 5, 128k chunk, algorithm 2 [15/12] [UUUUUU__UUUUUU_]
md3 : active raid5 sdp5[15](F) sdo5[16](F) sdn5[13] sdm5[12] sdl5[11] sdk5[10] sdj5[9] sdi5[8] sdh5[17](F) sdg5[18](F) sdf5[5] sde5[4] sdd5[3] sdc5[2] sdb5[1] sda5[0]
588487424 blocks level 5, 128k chunk, algorithm 2 [15/12] [UUUUUU__UUUUUU_]
md4 : active raid5 sdp6[15](S) sdo6[16](F) sdn6[13] sdm6[12] sdl6[11] sdk6[10] sdj6[9] sdi6[8] sdh6[17](F) sdg6[18](F) sdf6[5] sde6[4] sdd6[3] sdc6[2] sdb6[1] sda6[0]
2141065472 blocks level 5, 128k chunk, algorithm 2 [15/12] [UUUUUU__UUUUUU_]
This is a 15-disk array + spare where 3 or 4 disks dropped, *very* likely
because of these miserable crappy SATA power connectors :-( :-( It's
unbelievable that the industry changes from a working standard to
unreliable trash in such common and widespread devices...
I'm trying to get enough stamina to put the damn thing back on line...
Or at least try to :-(
* Re: data recovery on raid5
2006-04-22 21:42 ` Carlos Carvalho
@ 2006-04-22 22:58 ` Molle Bestefich
0 siblings, 0 replies; 21+ messages in thread
From: Molle Bestefich @ 2006-04-22 22:58 UTC (permalink / raw)
To: Carlos Carvalho; +Cc: linux-raid
Carlos Carvalho wrote:
> Is it my turn now?
Hah, nah, based on your postings you seem to know more about MD than I do ;-)
> [UUUUUU__UUUUUU_]
>
> This is a 15-disk array + spare where 3 or 4 disks dropped, *very* likely
> because of these miserable crappy SATA power connectors :-( :-(
They really suck, don't they.
I had a SATA data connector that was slightly skewed the other day, no
more than 1.5mm, and it caused random read errors. Gah.
> It's unbelievable that the industry changes from a working standard
> to unreliable trash in such common and widespread devices...
Yes. Sigh..
> I'm trying to get enough stamina to put the damn thing back on line...
Guess it's just --assemble --force and then cross your fingers...
Hmm, your mdstat does look a bit weird.
Why wasn't the spare taken into use on md4?
Perhaps you should check the event counters.
Just in case the spares got added late in the game, got half-rebuilt
and the MD crapped out?
Wouldn't want to include them in the assemble in that case...
(Dunno if it could even happen. Hopefully not =))
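A quick sketch for eyeballing those event counters (device names taken
from the md4 line in your mdstat; adjust the glob as needed):
  for d in /dev/sd[a-p]6; do echo -n "$d  "; mdadm -E $d | grep Events; done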
* Re: data recovery on raid5
2006-04-22 21:17 ` Molle Bestefich
2006-04-22 21:42 ` Carlos Carvalho
@ 2006-04-22 22:30 ` David Greaves
1 sibling, 0 replies; 21+ messages in thread
From: David Greaves @ 2006-04-22 22:30 UTC (permalink / raw)
To: Molle Bestefich; +Cc: Jonathan, linux-raid
Molle Bestefich wrote:
> Anyway, a quick cheat sheet might come in handy:
>
Which is why I posted about a wiki a few days back :)
I'm progressing it and I'll see if we can't get something up.
There's a lot of info on the list and it would be nice to get it a
little more focused...
David
--
* Re: data recovery on raid5
2006-04-22 20:55 ` Jonathan
2006-04-22 21:17 ` Molle Bestefich
@ 2006-04-22 23:17 ` Christian Pedaschus
1 sibling, 0 replies; 21+ messages in thread
From: Christian Pedaschus @ 2006-04-22 23:17 UTC (permalink / raw)
To: Jonathan; +Cc: linux-raid
Nice to hear you got your data back.
Now it's perhaps a good time to donate some money to some
people/OSS projects for saving your ass ;) ;)
Greets, Chris
Jonathan wrote:
> hazel /virtual # mdadm -C /dev/md0 -c 32 -n 4 -l 5
> --parity=left-asymmetric missing /dev/etherd/e0.[023]
> mdadm: /dev/etherd/e0.0 appears to be part of a raid array:
> level=5 devices=4 ctime=Sat Apr 22 13:25:40 2006
> mdadm: /dev/etherd/e0.2 appears to be part of a raid array:
> level=5 devices=4 ctime=Sat Apr 22 13:25:40 2006
> mdadm: /dev/etherd/e0.3 appears to contain an ext2fs file system
> size=720300416K mtime=Wed Oct 5 16:39:28 2005
> mdadm: /dev/etherd/e0.3 appears to be part of a raid array:
> level=5 devices=4 ctime=Sat Apr 22 13:25:40 2006
> Continue creating array? y
> mdadm: array /dev/md0 started.
> hazel /virtual # mount -t ext2 -r /dev/md0 /md0
> hazel /virtual # df -H
> Filesystem Size Used Avail Use% Mounted on
> /dev/hda4 67G 5.8G 58G 10% /
> udev 526M 177k 526M 1% /dev
> /dev/hda3 8.1G 34M 7.7G 1% /tmp
> none 526M 0 526M 0% /dev/shm
> /dev/md1 591G 11G 551G 2% /virtual
> /dev/md0 591G 54G 507G 10% /md0
>
> now I'm doing a:
>
> (cd /md0 && tar cf - . ) | (cd /virtual/recover/ && tar xvfp -)
>
> thank you thank you thank you thank you thank you thank you
>
>
> Molle Bestefich wrote:
>
>> Jonathan wrote:
>>
>>
>>> Well, the block sizes are back to 32k now, but I still had no luck
>>> mounting /dev/md0 once I created the array.
>>>
>>
>>
>> Ahem, I missed something.
>> Sorry, the 'a' was hard to spot.
>>
>> Your array used layout : left-asymmetric, while the superblock you've
>> just created has layout: left-symmetric.
>>
>> Try again, but add the option "--parity=left-asymmetric"
>>
>
* Re: data recovery on raid5
2006-04-22 20:32 ` Jonathan
2006-04-22 20:38 ` Molle Bestefich
@ 2006-04-22 20:51 ` Molle Bestefich
1 sibling, 0 replies; 21+ messages in thread
From: Molle Bestefich @ 2006-04-22 20:51 UTC (permalink / raw)
To: Jonathan; +Cc: linux-raid
Jonathan wrote:
> how safe should the following be?
>
> mdadm --assemble /dev/md0 --uuid=8fe1fe85:eeb90460:c525faab:cdaab792
> /dev/etherd/e0.[01234]
You can hardly do --assemble anymore.
After you have recreated superblocks on some of the devices, those are
conceptually part of a different raid array. At least as seen by MD.
> I am *really* not interested in making my situation worse.
We'll keep going till you've got your data back.
Recreating superblocks again on e0.{0,2,3} can't hurt, since you've
already done this and thereby nuked the old superblocks.
You can shake your own hand and thank yourself now (oh, and Sam too)
for posting all the debug output you have. Otherwise we would
probably never have spotted nor known about the parity/chunk size
differences :o).
* Re: data recovery on raid5
2006-04-22 20:07 ` Jonathan
2006-04-22 20:22 ` Molle Bestefich
@ 2006-04-22 20:28 ` Carlos Carvalho
1 sibling, 0 replies; 21+ messages in thread
From: Carlos Carvalho @ 2006-04-22 20:28 UTC (permalink / raw)
To: linux-raid
Jonathan (jrs@abhost.net) wrote on 22 April 2006 13:07:
>I was already terrified of screwing things up -- now I'm afraid of
>making things worse
>
>based on what was posted before is this a sensible thing to try?
>
>mdadm -C /dev/md0 -c 32 -n 4 -l 5 missing /dev/etherd/e0.[023]
>
>Is what I've done to the superblock size recoverable?
Raid metadata are stored at the end of the partition, in a small area.
Perhaps some overwrite of data happened, I don't know, but it'd be
small. If you re-make the superblock with the right chunk size you
can read the data back.
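Since e0.1 and e0.4 were not touched by the re-create, they still carry
the original superblock; a quick sketch for double-checking the old
chunk size and layout before re-making anything:
  mdadm -E /dev/etherd/e0.1 | egrep 'Chunk Size|Layout|UUID'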
I'd suggest following the help you're getting on the list, and if you
don't understand something ask before running any commands. Also don't
do anything that causes writes to the disks (such as fsck) before
getting the raid back...
* Re: data recovery on raid5
2006-04-22 19:48 ` Molle Bestefich
2006-04-22 20:07 ` Jonathan
@ 2006-04-23 2:46 ` Neil Brown
1 sibling, 0 replies; 21+ messages in thread
From: Neil Brown @ 2006-04-23 2:46 UTC (permalink / raw)
To: Molle Bestefich; +Cc: Jonathan, linux-raid
On Saturday April 22, molle.bestefich@gmail.com wrote:
> Jonathan wrote:
> > # mdadm -C /dev/md0 -n 4 -l 5 missing /dev/etherd/e0.[023]
>
> I think you should have tried "mdadm --assemble --force" first, as I
> proposed earlier.
>
> By doing the above, you have effectively replaced your version 0.9.0
> superblocks with version 0.9.2. I don't know if version 0.9.2
> superblocks are larger than 0.9.0, Neil hasn't responded to that yet.
> Potentially hazardous, who knows.
There is no difference in the superblock between 0.90.0 and 0.90.2.
md has always used version numbers, but always in a confusing way.
There should be two completely separate version numbers: the version
for the format of the superblock, and the version for the software
implementation. md confuses these two.
To try to sort it out, I have decided that:
- The 'major' version number is the overall choice of superblock
This is currently 0 or 1
- The 'minor' version encodes minor variation in the superblock.
For version 1, this is different locations (there are other bits
in the superblock to allow new fields to be added)
For version 0, it is currently only used to make sure old software
doesn't try to assemble an array which is undergoing a
reshape, as that would confuse it totally.
- The 'patchlevel' is used to indicate feature availability in
the implementation. It really shouldn't be stored in the superblock,
but it is for historical reasons. It is not checked when
validating a superblock.
To quote from md.h
/*
* MD_PATCHLEVEL_VERSION indicates kernel functionality.
* >=1 means different superblock formats are selectable using SET_ARRAY_INFO
* and major_version/minor_version accordingly
* >=2 means that Internal bitmaps are supported by setting MD_SB_BITMAP_PRESENT
* in the super status byte
* >=3 means that bitmap superblock version 4 is supported, which uses
* little-endian representation rather than host-endian
*/
Hope that helps.
NeilBrown
* data recovery on raid5
@ 2006-04-21 23:11 Sam Hopkins
2006-04-21 23:31 ` Mike Tran
` (4 more replies)
0 siblings, 5 replies; 21+ messages in thread
From: Sam Hopkins @ 2006-04-21 23:11 UTC (permalink / raw)
To: linux-raid; +Cc: jrs, support
[-- Attachment #1: Type: text/plain, Size: 1508 bytes --]
Hello,
I have a client with a failed raid5 that is in desperate need of the
data that's on the raid. The attached file holds the mdadm -E
superblocks that are hopefully the keys to the puzzle. Linux-raid
folks, if you can give any help here it would be much appreciated.
# mdadm -V
mdadm - v1.7.0 - 11 August 2004
# uname -a
Linux hazel 2.6.13-gentoo-r5 #1 SMP Sat Jan 21 13:24:15 PST 2006 i686 Intel(R) Pentium(R) 4 CPU 2.40GHz GenuineIntel GNU/Linux
Here's my take:
Logfiles show that last night drive /dev/etherd/e0.4 failed and around
noon today /dev/etherd/e0.0 failed. This jibes with the superblock
dates and info.
My assessment is that since the last known good configuration was
0 <missing>
1 /dev/etherd/e0.0
2 /dev/etherd/e0.2
3 /dev/etherd/e0.3
then we should shoot for this. I couldn't figure out how to get there
using mdadm -A since /dev/etherd/e0.0 isn't in sync with e0.2 or e0.3.
If anyone can suggest a way to get this back using -A, please chime in.
The alternative is to recreate the array with this configuration hoping
the data blocks will all line up properly so the filesystem can be mounted
and data retrieved. It looks like the following command is the right
way to do this, but not being an expert I (and the client) would like
someone else to verify the sanity of this approach.
Will
mdadm -C /dev/md0 -n 4 -l 5 missing /dev/etherd/e0.[023]
do what we want?
Linux-raid folks, please reply-to-all as we're probably all not on
the list.
Thanks for your help,
Sam
[-- Attachment #2: mdadm-e.0234 --]
[-- Type: text/plain, Size: 4122 bytes --]
/dev/etherd/e0.0:
Magic : a92b4efc
Version : 00.90.00
UUID : 8fe1fe85:eeb90460:c525faab:cdaab792
Creation Time : Mon Jan 3 03:16:48 2005
Raid Level : raid5
Device Size : 195360896 (186.31 GiB 200.05 GB)
Raid Devices : 4
Total Devices : 5
Preferred Minor : 0
Update Time : Fri Apr 21 12:45:07 2006
State : clean
Active Devices : 3
Working Devices : 4
Failed Devices : 1
Spare Devices : 1
Checksum : 4cc955da - correct
Events : 0.3488315
Layout : left-asymmetric
Chunk Size : 32K
Number Major Minor RaidDevice State
this 1 152 0 1 active sync /dev/etherd/e0.0
0 0 0 0 0 removed
1 1 152 0 1 active sync /dev/etherd/e0.0
2 2 152 32 2 active sync /dev/etherd/e0.2
3 3 152 48 3 active sync /dev/etherd/e0.3
4 4 152 16 0 spare /dev/etherd/e0.1
/dev/etherd/e0.2:
Magic : a92b4efc
Version : 00.90.00
UUID : 8fe1fe85:eeb90460:c525faab:cdaab792
Creation Time : Mon Jan 3 03:16:48 2005
Raid Level : raid5
Device Size : 195360896 (186.31 GiB 200.05 GB)
Raid Devices : 4
Total Devices : 5
Preferred Minor : 0
Update Time : Fri Apr 21 14:03:12 2006
State : clean
Active Devices : 2
Working Devices : 3
Failed Devices : 3
Spare Devices : 1
Checksum : 4cc991e9 - correct
Events : 0.3493633
Layout : left-asymmetric
Chunk Size : 32K
Number Major Minor RaidDevice State
this 2 152 32 2 active sync /dev/etherd/e0.2
0 0 0 0 0 removed
1 1 0 0 1 faulty removed
2 2 152 32 2 active sync /dev/etherd/e0.2
3 3 152 48 3 active sync /dev/etherd/e0.3
4 4 152 16 4 spare /dev/etherd/e0.1
/dev/etherd/e0.3:
Magic : a92b4efc
Version : 00.90.00
UUID : 8fe1fe85:eeb90460:c525faab:cdaab792
Creation Time : Mon Jan 3 03:16:48 2005
Raid Level : raid5
Device Size : 195360896 (186.31 GiB 200.05 GB)
Raid Devices : 4
Total Devices : 5
Preferred Minor : 0
Update Time : Fri Apr 21 14:03:12 2006
State : clean
Active Devices : 2
Working Devices : 3
Failed Devices : 3
Spare Devices : 1
Checksum : 4cc991fb - correct
Events : 0.3493633
Layout : left-asymmetric
Chunk Size : 32K
Number Major Minor RaidDevice State
this 3 152 48 3 active sync /dev/etherd/e0.3
0 0 0 0 0 removed
1 1 0 0 1 faulty removed
2 2 152 32 2 active sync /dev/etherd/e0.2
3 3 152 48 3 active sync /dev/etherd/e0.3
4 4 152 16 4 spare /dev/etherd/e0.1
/dev/etherd/e0.4:
Magic : a92b4efc
Version : 00.90.00
UUID : 8fe1fe85:eeb90460:c525faab:cdaab792
Creation Time : Mon Jan 3 03:16:48 2005
Raid Level : raid5
Device Size : 195360896 (186.31 GiB 200.05 GB)
Raid Devices : 4
Total Devices : 5
Preferred Minor : 0
Update Time : Thu Apr 20 21:07:50 2006
State : clean
Active Devices : 4
Working Devices : 5
Failed Devices : 0
Spare Devices : 1
Checksum : 4cc84d59 - correct
Events : 0.3482550
Layout : left-asymmetric
Chunk Size : 32K
Number Major Minor RaidDevice State
this 0 152 64 0 active sync /dev/etherd/e0.4
0 0 152 64 0 active sync /dev/etherd/e0.4
1 1 152 0 1 active sync /dev/etherd/e0.0
2 2 152 32 2 active sync /dev/etherd/e0.2
3 3 152 48 3 active sync /dev/etherd/e0.3
4 4 152 16 4 spare /dev/etherd/e0.1
* Re: data recovery on raid5
2006-04-21 23:11 Sam Hopkins
@ 2006-04-21 23:31 ` Mike Tran
2006-04-21 23:38 ` Mike Hardy
` (3 subsequent siblings)
4 siblings, 0 replies; 21+ messages in thread
From: Mike Tran @ 2006-04-21 23:31 UTC (permalink / raw)
To: Sam Hopkins; +Cc: linux-raid, jrs, support
Sam Hopkins wrote:
>Hello,
>
>I have a client with a failed raid5 that is in desperate need of the
>data that's on the raid. The attached file holds the mdadm -E
>superblocks that are hopefully the keys to the puzzle. Linux-raid
>folks, if you can give any help here it would be much appreciated.
>
># mdadm -V
>mdadm - v1.7.0 - 11 August 2004
># uname -a
>Linux hazel 2.6.13-gentoo-r5 #1 SMP Sat Jan 21 13:24:15 PST 2006 i686 Intel(R) Pentium(R) 4 CPU 2.40GHz GenuineIntel GNU/Linux
>
>Here's my take:
>
>Logfiles show that last night drive /dev/etherd/e0.4 failed and around
>noon today /dev/etherd/e0.0 failed. This jibes with the superblock
>dates and info.
>
>My assessment is that since the last known good configuration was
>0 <missing>
>1 /dev/etherd/e0.0
>2 /dev/etherd/e0.2
>3 /dev/etherd/e0.3
>
>then we should shoot for this. I couldn't figure out how to get there
>using mdadm -A since /dev/etherd/e0.0 isn't in sync with e0.2 or e0.3.
>If anyone can suggest a way to get this back using -A, please chime in.
>
>The alternative is to recreate the array with this configuration hoping
>the data blocks will all line up properly so the filesystem can be mounted
>and data retrieved. It looks like the following command is the right
>way to do this, but not being an expert I (and the client) would like
>someone else to verify the sanity of this approach.
>
>Will
>
>mdadm -C /dev/md0 -n 4 -l 5 missing /dev/etherd/e0.[023]
>
>do what we want?
>
>Linux-raid folks, please reply-to-all as we're probably all not on
>the list.
>
>
>
Yes, I would re-create the array with 1 missing disk, mount it read-only,
and verify your data. If things are OK, remount read-write and remember to
add a new disk to fix the degraded array.
With the "missing" keyword there is no resync/recovery, thus the data on
disk will be intact.
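A sketch of that sequence (the -c and --parity values have to match the
old superblock exactly, as later messages in the thread work out; which
disk to add back afterwards is an assumption here):
  mdadm -C /dev/md0 -c 32 -n 4 -l 5 --parity=left-asymmetric \
      missing /dev/etherd/e0.[023]
  mount -t ext2 -r /dev/md0 /md0          # read-only: verify the data first
  mount -o remount,rw /dev/md0 /md0       # remount read-write once satisfied
  mdadm /dev/md0 --add /dev/etherd/e0.4   # repair the degraded array (triggers a rebuild)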
--
Regards,
Mike T.
* Re: data recovery on raid5
2006-04-21 23:11 Sam Hopkins
2006-04-21 23:31 ` Mike Tran
@ 2006-04-21 23:38 ` Mike Hardy
2006-04-22 4:03 ` Molle Bestefich
` (2 subsequent siblings)
4 siblings, 0 replies; 21+ messages in thread
From: Mike Hardy @ 2006-04-21 23:38 UTC (permalink / raw)
To: Sam Hopkins; +Cc: linux-raid, jrs, support
Recreate the array from the constituent drives in the order you mention,
with 'missing' in place of the first drive that failed?
It won't resync because it has a missing drive.
If you created it correctly, the data will be there.
If you didn't create it correctly, you can keep trying permutations of
4-disk arrays with one missing until you see your data, and you should
find it.
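A sketch of one such permutation attempt (this particular ordering is
just an example, not the known-good one; the chunk size and layout from
the old superblock still have to be carried along):
  mdadm -S /dev/md0
  mdadm -C /dev/md0 -c 32 -n 4 -l 5 --parity=left-asymmetric \
      missing /dev/etherd/e0.0 /dev/etherd/e0.3 /dev/etherd/e0.2
  mount -t ext2 -r /dev/md0 /md0 && ls /md0   # readable data means the order is right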
-Mike
Sam Hopkins wrote:
> Hello,
>
> I have a client with a failed raid5 that is in desperate need of the
> data that's on the raid. The attached file holds the mdadm -E
> superblocks that are hopefully the keys to the puzzle. Linux-raid
> folks, if you can give any help here it would be much appreciated.
>
> # mdadm -V
> mdadm - v1.7.0 - 11 August 2004
> # uname -a
> Linux hazel 2.6.13-gentoo-r5 #1 SMP Sat Jan 21 13:24:15 PST 2006 i686 Intel(R) Pentium(R) 4 CPU 2.40GHz GenuineIntel GNU/Linux
>
> Here's my take:
>
> Logfiles show that last night drive /dev/etherd/e0.4 failed and around
> noon today /dev/etherd/e0.0 failed. This jibes with the superblock
> dates and info.
>
> My assessment is that since the last known good configuration was
> 0 <missing>
> 1 /dev/etherd/e0.0
> 2 /dev/etherd/e0.2
> 3 /dev/etherd/e0.3
>
> then we should shoot for this. I couldn't figure out how to get there
> using mdadm -A since /dev/etherd/e0.0 isn't in sync with e0.2 or e0.3.
> If anyone can suggest a way to get this back using -A, please chime in.
>
> The alternative is to recreate the array with this configuration hoping
> the data blocks will all line up properly so the filesystem can be mounted
> and data retrieved. It looks like the following command is the right
> way to do this, but not being an expert I (and the client) would like
> someone else to verify the sanity of this approach.
>
> Will
>
> mdadm -C /dev/md0 -n 4 -l 5 missing /dev/etherd/e0.[023]
>
> do what we want?
>
> Linux-raid folks, please reply-to-all as we're probably all not on
> the list.
>
> Thanks for your help,
>
> Sam
>
>
> [full mdadm -E superblock dump snipped - identical to the attachment in
> Sam's original message above]
* Re: data recovery on raid5
2006-04-21 23:11 Sam Hopkins
2006-04-21 23:31 ` Mike Tran
2006-04-21 23:38 ` Mike Hardy
@ 2006-04-22 4:03 ` Molle Bestefich
2006-04-22 7:43 ` David Greaves
2006-04-22 8:51 ` David Greaves
4 siblings, 0 replies; 21+ messages in thread
From: Molle Bestefich @ 2006-04-22 4:03 UTC (permalink / raw)
To: Sam Hopkins; +Cc: linux-raid, jrs, support
Sam Hopkins wrote:
> mdadm -C /dev/md0 -n 4 -l 5 missing /dev/etherd/e0.[023]
While it should work, a bit drastic perhaps?
I'd start with mdadm --assemble --force.
With --force, mdadm will pull the event counter of the most-recently
failed drive up to current status which should give you a readable
array.
After that, you could try running a check by echo'ing "check" into
"sync_action".
If the check succeeds, fine, hotadd the last drive to your array and
MD will start resync'ing.
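A sketch of those two steps (paths as in the cheat sheet elsewhere in
the thread; which drive to hot-add is an assumption):
  echo check > /sys/block/md0/md/sync_action   # run a parity consistency check
  cat /proc/mdstat                             # watch its progress
  mdadm /dev/md0 --add /dev/etherd/e0.4        # hot-add the last drive; MD resyncs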
If the check fails because of a bad block, you'll have to make a decision.
Live with the lost blocks, or try and reconstruct from the first kicked disk.
I posted a patch this week that will allow you to forcefully get the
array started with all of the disks - but beware, MD wasn't made with
this in mind and will probably be confused and sometimes pick data
from the first-kicked drive over data from the other drives. Only
forcefully start the array with all drives if you absolutely have
to...
Oh, and I'm not an expert by any means, so take everything I say with
a grain of salt :-).
* Re: data recovery on raid5
2006-04-21 23:11 Sam Hopkins
` (2 preceding siblings ...)
2006-04-22 4:03 ` Molle Bestefich
@ 2006-04-22 7:43 ` David Greaves
2006-04-22 8:51 ` David Greaves
4 siblings, 0 replies; 21+ messages in thread
From: David Greaves @ 2006-04-22 7:43 UTC (permalink / raw)
To: Sam Hopkins; +Cc: linux-raid, jrs, support
Sam Hopkins wrote:
> Hello,
>
> I have a client with a failed raid5 that is in desperate need of the
> data that's on the raid. The attached file holds the mdadm -E
> superblocks that are hopefully the keys to the puzzle. Linux-raid
> folks, if you can give any help here it would be much appreciated.
>
Have you read the archive? There were a couple of similar problems
earlier this month.
Take a look at 2 April 06 - "help recreating a raid5"
Also "Re: help wanted - 6-disk raid5 borked: _ _ U U U U"
> # mdadm -V
> mdadm - v1.7.0 - 11 August 2004
>
Can't hurt to upgrade mdadm
> # uname -a
> Linux hazel 2.6.13-gentoo-r5 #1 SMP Sat Jan 21 13:24:15 PST 2006 i686 Intel(R) Pentium(R) 4 CPU 2.40GHz GenuineIntel GNU/Linux
>
> Here's my take:
>
> Logfiles show that last night drive /dev/etherd/e0.4 failed and around
> noon today /dev/etherd/e0.0 failed. This jibes with the superblock
> dates and info.
>
> My assessment is that since the last known good configuration was
> 0 <missing>
> 1 /dev/etherd/e0.0
> 2 /dev/etherd/e0.2
> 3 /dev/etherd/e0.3
>
> then we should shoot for this. I couldn't figure out how to get there
> using mdadm -A since /dev/etherd/e0.0 isn't in sync with e0.2 or e0.3.
> If anyone can suggest a way to get this back using -A, please chime in.
>
See the patch Molle provided - it seemed to work for him and took the
guesswork out of the create parameters.
I personally didn't use it since Neil didn't bless it :)
> The alternative is to recreate the array with this configuration hoping
> the data blocks will all line up properly so the filesystem can be mounted
> and data retrieved. It looks like the following command is the right
> way to do this, but not being an expert I (and the client) would like
> someone else to verify the sanity of this approach.
>
> Will
>
> mdadm -C /dev/md0 -n 4 -l 5 missing /dev/etherd/e0.[023]
>
> do what we want?
>
It looks right to me - but see comments below...
Also, can you take disk images of the devices (dd if=/dev/etherd/e0.0
of=/somewhere/e0.0.img) to allow for retries?
> ------------------------------------------------------------------------
>
> /dev/etherd/e0.0:
> Events : 0.3488315
> /dev/etherd/e0.2:
> Events : 0.3493633
> /dev/etherd/e0.3:
> Events : 0.3493633
> /dev/etherd/e0.4:
> Events : 0.3482550
>
I don't know precisely what 'Events' are but I read this as being a lot
of activity on e0.[23] after e0.0 went down.
I think that's odd.
Maybe the kernel isn't stopping the device when it degrades - I seem to
remember something like this but I'm probably wrong... archives again...
This shouldn't affect the situation you're now in (horse,bolt,door etc)
but fixing it may make life better should another problem like this
occur - or it may not. Eventually there may be info in a wiki to help
understand this stuff.
HTH
David
--
* Re: data recovery on raid5
2006-04-21 23:11 Sam Hopkins
` (3 preceding siblings ...)
2006-04-22 7:43 ` David Greaves
@ 2006-04-22 8:51 ` David Greaves
4 siblings, 0 replies; 21+ messages in thread
From: David Greaves @ 2006-04-22 8:51 UTC (permalink / raw)
To: Sam Hopkins; +Cc: linux-raid, jrs, support
Sam Hopkins wrote:
> Hello,
>
> I have a client with a failed raid5 that is in desperate need of the
> data that's on the raid. The attached file holds the mdadm -E
> superblocks that are hopefully the keys to the puzzle. Linux-raid
> folks, if you can give any help here it would be much appreciated.
>
snip
> Linux-raid folks, please reply-to-all as we're probably all not on
> the list.
>
If you're going to post messages to public mailing lists (and solicit
help and private cc's!!!) then you should not be using mechanisms like
the one below. Please Google if you don't understand why not.
I've been getting so much junk mail that I'm resorting to
a draconian mechanism to avoid the mail. In order
to make sure that there's a real person sending mail, I'm
asking you to explicitly enable access. To do that, send
mail to sah at this domain with the token:
qSGTt
in the subject of your mail message. After that, you
shouldn't get any bounces from me. Sorry if this is
an inconvenience.
David