linux-raid.vger.kernel.org archive mirror
* raid5 recovery dramas.
@ 2008-06-24  6:05 Mark Davies
  2008-06-26  2:43 ` Mark Davies
  2008-06-27 10:28 ` Neil Brown
  0 siblings, 2 replies; 8+ messages in thread
From: Mark Davies @ 2008-06-24  6:05 UTC (permalink / raw)
  To: linux-raid

Hi all,

Hoping to find some information to help me recover my software raid5 array.

Some background information first (excuse the hostname)

uname -a
Linux Fuckyfucky3 2.6.18-4-686 #1 SMP Wed May 9 23:03:12 UTC 2007 i686 
GNU/Linux


It's a debian box that initially had 4 disks in a software raid5 array.

The problem started when I attempted to add another disk and grow the 
array.  I'd already done this once before, going from 3 to 4 disks, 
using the instructions on this page:  "http://scotgate.org/?p=107".

However this time I unmounted the volume but didn't do an fsck before 
starting.  I also discovered that, for some reason, mdadm wasn't 
monitoring the array.

Bad mistakes obviously - and I hope I've learnt from them.

The short version is that two of the disks had errors, and so mdadm 
disabled those disks about 50MB into the reshape.  Both subsequently 
failed SMART tests.

I bought two new disks and used ddrescue to copy the failing disks onto 
them, which seemed to work well.

Now however I can't restart the array.

I can see all 5 superblocks:

:~# mdadm --examine /dev/sd?1
/dev/sda1:
           Magic : a92b4efc
         Version : 01
     Feature Map : 0x4
      Array UUID : 43eff327:8d1aa506:c0df2849:005c003f
            Name : 'Fuckyfucky3':1
   Creation Time : Sun Dec 23 01:28:08 2007
      Raid Level : raid5
    Raid Devices : 5

     Device Size : 976767856 (465.76 GiB 500.11 GB)
      Array Size : 3907069952 (1863.04 GiB 2000.42 GB)
       Used Size : 976767488 (465.76 GiB 500.10 GB)
    Super Offset : 976767984 sectors
           State : clean
     Device UUID : 5b38c5a2:798c6793:91ad6d1e:9cfee153

   Reshape pos'n : 143872 (140.52 MiB 147.32 MB)
   Delta Devices : 1 (4->5)

     Update Time : Fri May 16 23:55:29 2008
        Checksum : 5354498d - correct
          Events : 1420762

          Layout : left-symmetric
      Chunk Size : 128K

     Array Slot : 3 (failed, 1, failed, 2, failed, 0)
    Array State : uuU__ 3 failed
/dev/sdb1:
           Magic : a92b4efc
         Version : 01
     Feature Map : 0x4
      Array UUID : 43eff327:8d1aa506:c0df2849:005c003f
            Name : 'Fuckyfucky3':1
   Creation Time : Sun Dec 23 01:28:08 2007
      Raid Level : raid5
    Raid Devices : 5

     Device Size : 976767856 (465.76 GiB 500.11 GB)
      Array Size : 3907069952 (1863.04 GiB 2000.42 GB)
       Used Size : 976767488 (465.76 GiB 500.10 GB)
    Super Offset : 976767984 sectors
           State : clean
     Device UUID : 673ba6d4:6c46fd55:745c9c93:3fa8bf21

   Reshape pos'n : 143872 (140.52 MiB 147.32 MB)
   Delta Devices : 1 (4->5)

     Update Time : Fri May 16 23:55:29 2008
        Checksum : 8ad75f10 - correct
          Events : 1420762

          Layout : left-symmetric
      Chunk Size : 128K

     Array Slot : 1 (failed, 1, failed, 2, failed, 0)
    Array State : uUu__ 3 failed
/dev/sdc1:
           Magic : a92b4efc
         Version : 01
     Feature Map : 0x4
      Array UUID : 43eff327:8d1aa506:c0df2849:005c003f
            Name : 'Fuckyfucky3':1
   Creation Time : Sun Dec 23 01:28:08 2007
      Raid Level : raid5
    Raid Devices : 5

     Device Size : 976767856 (465.76 GiB 500.11 GB)
      Array Size : 3907069952 (1863.04 GiB 2000.42 GB)
       Used Size : 976767488 (465.76 GiB 500.10 GB)
    Super Offset : 976767984 sectors
           State : clean
     Device UUID : 99b87c50:a919bd63:599a135f:9af385ba

   Reshape pos'n : 143872 (140.52 MiB 147.32 MB)
   Delta Devices : 1 (4->5)

     Update Time : Fri May 16 23:55:29 2008
        Checksum : 78ab38c3 - correct
          Events : 1420762

          Layout : left-symmetric
      Chunk Size : 128K

     Array Slot : 5 (failed, 1, failed, 2, failed, 0)
    Array State : Uuu__ 3 failed
/dev/sdd1:
           Magic : a92b4efc
         Version : 01
     Feature Map : 0x4
      Array UUID : 43eff327:8d1aa506:c0df2849:005c003f
            Name : 'Fuckyfucky3':1
   Creation Time : Sun Dec 23 01:28:08 2007
      Raid Level : raid5
    Raid Devices : 5

     Device Size : 976767856 (465.76 GiB 500.11 GB)
      Array Size : 3907069952 (1863.04 GiB 2000.42 GB)
       Used Size : 976767488 (465.76 GiB 500.10 GB)
    Super Offset : 976767984 sectors
           State : clean
     Device UUID : 89201477:8e950d20:9193016d:f5c9deb0

   Reshape pos'n : 143872 (140.52 MiB 147.32 MB)
   Delta Devices : 1 (4->5)

     Update Time : Fri May 16 23:55:29 2008
        Checksum : 5fc43e52 - correct
          Events : 0

          Layout : left-symmetric
      Chunk Size : 128K

     Array Slot : 6 (failed, 1, failed, 2, failed, 0)
    Array State : uuu__ 3 failed
/dev/sde1:
           Magic : a92b4efc
         Version : 01
     Feature Map : 0x4
      Array UUID : 43eff327:8d1aa506:c0df2849:005c003f
            Name : 'Fuckyfucky3':1
   Creation Time : Sun Dec 23 01:28:08 2007
      Raid Level : raid5
    Raid Devices : 5

     Device Size : 976767856 (465.76 GiB 500.11 GB)
      Array Size : 3907069952 (1863.04 GiB 2000.42 GB)
       Used Size : 976767488 (465.76 GiB 500.10 GB)
    Super Offset : 976767984 sectors
           State : clean
     Device UUID : 89b53542:d1d820bc:f2ece884:4785869a

   Reshape pos'n : 143872 (140.52 MiB 147.32 MB)
   Delta Devices : 1 (4->5)

     Update Time : Fri May 16 23:55:29 2008
        Checksum : c89dd220 - correct
          Events : 1418968

          Layout : left-symmetric
      Chunk Size : 128K

     Array Slot : 6 (failed, 1, failed, 2, failed, 0)
    Array State : uuu__ 3 failed
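As a sanity check, the sizes mdadm prints are consistent with each other: Device Size and Array Size are counts of 512-byte sectors, and the reshape position is reported in KiB. A quick sketch of the arithmetic (just the numbers from the superblocks above):

```python
# Cross-check of the mdadm --examine numbers: Device Size and Array Size
# are 512-byte sector counts; "Reshape pos'n" is in KiB.
SECTOR = 512

device_size_sectors = 976767856
array_size_sectors = 3907069952
reshape_pos_kib = 143872

print(round(device_size_sectors * SECTOR / 1e9, 2))    # 500.11 GB
print(round(device_size_sectors * SECTOR / 2**30, 2))  # 465.76 GiB
print(round(array_size_sectors * SECTOR / 1e9, 2))     # 2000.42 GB
print(round(array_size_sectors * SECTOR / 2**30, 2))   # 1863.04 GiB
print(round(reshape_pos_kib * 1024 / 1e6, 2))          # 147.32 MB
```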




When I try to start the array, I get:

~# mdadm --assemble --verbose /dev/md1 /dev/sda1 /dev/sdb1 /dev/sdc1 
/dev/sdd1 /dev/sde1
mdadm: looking for devices for /dev/md1
mdadm: /dev/sda1 is identified as a member of /dev/md1, slot 2.
mdadm: /dev/sdb1 is identified as a member of /dev/md1, slot 1.
mdadm: /dev/sdc1 is identified as a member of /dev/md1, slot 0.
mdadm: /dev/sdd1 is identified as a member of /dev/md1, slot -1.
mdadm: /dev/sde1 is identified as a member of /dev/md1, slot -1.
mdadm: added /dev/sdb1 to /dev/md1 as 1
mdadm: added /dev/sda1 to /dev/md1 as 2
mdadm: no uptodate device for slot 3 of /dev/md1
mdadm: no uptodate device for slot 4 of /dev/md1
mdadm: added /dev/sdd1 to /dev/md1 as -1
mdadm: failed to add /dev/sde1 to /dev/md1: Device or resource busy
mdadm: added /dev/sdc1 to /dev/md1 as 0
mdadm: /dev/md1 assembled from 3 drives and -1 spares - not enough to 
start the array.




Any help would be much appreciated.   If I can provide any more 
information, just ask.

As to why /dev/sde1 is busy, I don't know.  lsof shows no files open.


Regards,


Mark.


* Re: raid5 recovery dramas.
  2008-06-24  6:05 raid5 recovery dramas Mark Davies
@ 2008-06-26  2:43 ` Mark Davies
  2008-06-26 13:38   ` David Greaves
  2008-06-27 10:28 ` Neil Brown
  1 sibling, 1 reply; 8+ messages in thread
From: Mark Davies @ 2008-06-26  2:43 UTC (permalink / raw)
  To: linux-raid

No takers?  Can anyone suggest a different list I could repost this to, 
or any extra information I should include?

I found a link to an mdadm create/permutation script:

http://linux-raid.osdl.org/index.php/Permute_array.pl

Would that appear to be useful in my situation?
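(For scale: a permute-style script over five members has up to 5! candidate slot orderings to try, and each wrong guess means another risky --create pass. A quick illustrative count, not the script itself:)

```python
# Number of slot orderings a permute-style recovery script would face
# for a 5-device array.
from itertools import permutations

members = ["sda1", "sdb1", "sdc1", "sdd1", "sde1"]
orderings = list(permutations(members))
print(len(orderings))  # 120
```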

My problematic array was created with mdadm version:

mdadm --version
mdadm - v2.5.6 - 9 November 2006

If I were to boot from a LiveCD (to get around the

mdadm: failed to add /dev/sde1 to /dev/md1: Device or resource busy

error), would the version of mdadm have to be the same, or just more recent?

Oh, and I'm willing to send a sixpack of beer or whatever in thanks.  :)



Regards,


Mark.





* Re: raid5 recovery dramas.
  2008-06-26  2:43 ` Mark Davies
@ 2008-06-26 13:38   ` David Greaves
  2008-06-26 14:25     ` Mark Davies
  0 siblings, 1 reply; 8+ messages in thread
From: David Greaves @ 2008-06-26 13:38 UTC (permalink / raw)
  To: Mark Davies, Neil Brown; +Cc: linux-raid

Mark Davies wrote:
> No takers?  Is there a different list anyone can suggest I repost this
> to, and any extra information I could include?

You are in the right place - but this may be a nasty problem.
I'd wait for Neil to comment (cc'ed to attract his attention to this one)

You've grown an array from 4 to 5 disks and had a two-disk failure part way through - ouch!

However, you've recovered the two failed disks onto new drives using ddrescue,
but of course the superblock event counts are wrong.
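To make "wrong event counts" concrete: md trusts the members whose Events counters agree on the highest value, and treats the rest as stale. A small sketch (a hypothetical helper, not part of mdadm) that pulls the counters out of --examine output and flags the laggards:

```python
# Extract per-device Events counters from `mdadm --examine` output and
# flag members whose counter lags the newest (i.e. stale superblocks).
import re

def stale_members(examine_text):
    counts, dev = {}, None
    for line in examine_text.splitlines():
        head = re.match(r"(/dev/\S+):", line.strip())
        if head:
            dev = head.group(1)          # start of a new device section
        ev = re.search(r"Events : (\d+)", line)
        if ev and dev:
            counts[dev] = int(ev.group(1))
    newest = max(counts.values())
    return sorted(d for d, e in counts.items() if e != newest)

# Abbreviated sample from the --examine output in this thread:
sample = """/dev/sda1:
          Events : 1420762
/dev/sdd1:
          Events : 0
/dev/sde1:
          Events : 1418968
"""
print(stale_members(sample))  # ['/dev/sdd1', '/dev/sde1']
```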

It may be that a simple --assemble --force would work, but I've not had
enough experience with failed grow operations to say.

The /dev/sde1 problem *may* be caused by LVM - try stopping that. Either way,
doing this from an up-to-date rescue CD sounds sensible.

You *don't* want to mess with --create and --permute. That's almost guaranteed
to kill the array in this case (due to the reshape).

David




* Re: raid5 recovery dramas.
  2008-06-26 13:38   ` David Greaves
@ 2008-06-26 14:25     ` Mark Davies
  0 siblings, 0 replies; 8+ messages in thread
From: Mark Davies @ 2008-06-26 14:25 UTC (permalink / raw)
  To: linux-raid; +Cc: Neil Brown

Hi David,

Thanks for your reply.  Good summary of events too.

> the /dev/sde1 problem *may* be caused by lvm - try stopping that. However doing
> this from an uptodate rescue CD sounds sensible.

I'm not running LVM - there's only one ext3 partition on that array.

Will try a live CD and see what that does.

So it's not overly critical if the LiveCD has a slightly different 
kernel version and version of mdadm?


Cheers,


Mark.


* Re: raid5 recovery dramas.
  2008-06-24  6:05 raid5 recovery dramas Mark Davies
  2008-06-26  2:43 ` Mark Davies
@ 2008-06-27 10:28 ` Neil Brown
  2008-06-27 11:14   ` Mark Davies
  1 sibling, 1 reply; 8+ messages in thread
From: Neil Brown @ 2008-06-27 10:28 UTC (permalink / raw)
  To: Mark Davies; +Cc: linux-raid

On Tuesday June 24, mark@curly.ii.net wrote:
> Hi all,
> 
> Hoping to find some information to help me recover my software raid5 array.

You are in a rather sticky situation.

Neither sdd1 nor sde1 knows where it belongs in the array.  If they
did, then "mdadm --assemble --force" would probably be able to help
you (I should test that).  But they don't.

Do you have any boot logs from before you started the reshape that
show which device fills which slot in the array?

sdd1 has an event count of 0.  That is really odd.  Any idea how that
happened?  Did you remove it from the array and try to add it back?
That wouldn't have been a good idea.

I'm at a bit of a loss as to what to suggest.  The data is mostly
there, but getting it back is tricky.

What you need to do is 
   choose whichever of sdd and sde you think is device '3'
     (sdc is 0, sdb is 1, sda is 2).
   rewrite the metadata to assert this fact
   assemble the array read-only with sd[abc] and the one you chose
   read the data to make sure it is all there
   switch to read-write so the reshape completes, leaving you with
    a degraded array
   add the other drive and let it recover.

The early steps in particular are not easy.
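For background on why the slot number matters so much: in the left-symmetric layout named in the superblocks, the parity chunk rotates across the members from stripe to stripe, so a member in the wrong slot scrambles every stripe. A sketch of the standard formula (illustrative, not taken from the md source):

```python
# Left-symmetric RAID5: parity rotates from the last disk "leftward",
# and data chunks follow in order starting just after the parity disk.
def left_symmetric(stripe, n_disks):
    parity = n_disks - 1 - (stripe % n_disks)
    data = [(parity + 1 + i) % n_disks for i in range(n_disks - 1)]
    return parity, data

# The pre-reshape geometry here was 4 disks: stripe 0 puts parity on
# disk 3 with data on disks 0..2, stripe 1 on disk 2, and so on.
print(left_symmetric(0, 4))  # (3, [0, 1, 2])
print(left_symmetric(1, 4))  # (2, [3, 0, 1])
```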

I'll try to find some time to experiment, but I cannot promise
anything.

If you can remember everything you tried to do (maybe in
.bash_history) that might help.

NeilBrown



> 
> Some background information first (excuse the hostname)
> 
> uname -a
> Linux Fuckyfucky3 2.6.18-4-686 #1 SMP Wed May 9 23:03:12 UTC 2007 i686 
> GNU/Linux
> 
> 
> It's a debian box that initially had 4 disks in a software raid5 array.
> 
> The problem started when I attempted to add another disk and grow the 
> array.  I'd already done this from 3-4 disks using the instruction on 
> this page:  "http://scotgate.org/?p=107".
> 
> However this time I unmounted the volume, but didn't do a fsck before 
> starting.  I also discovered that for some reason mdadm wasn't 
> monitoring the array.
> 
> Bad mistakes obviously - and I hope I've learnt from them.
> 
> Short version is that two of the disks had errors on them, and so mdadm 
> disabled those disks about 50MB into the reshape.  Both failed SMART 
> tests subsequently.
> 
> I bought two new disks, and used dd-recue to make copies of them, which 
> seemed to work well.
> 
> Now however I can't restart the array.
> 
> I can see all 5 superblocks:
> 
> :~# mdadm --examine /dev/sd?1
> /dev/sda1:
>            Magic : a92b4efc
>          Version : 01
>      Feature Map : 0x4
>       Array UUID : 43eff327:8d1aa506:c0df2849:005c003f
>             Name : 'Fuckyfucky3':1
>    Creation Time : Sun Dec 23 01:28:08 2007
>       Raid Level : raid5
>     Raid Devices : 5
> 
>      Device Size : 976767856 (465.76 GiB 500.11 GB)
>       Array Size : 3907069952 (1863.04 GiB 2000.42 GB)
>        Used Size : 976767488 (465.76 GiB 500.10 GB)
>     Super Offset : 976767984 sectors
>            State : clean
>      Device UUID : 5b38c5a2:798c6793:91ad6d1e:9cfee153
> 
>    Reshape pos'n : 143872 (140.52 MiB 147.32 MB)
>    Delta Devices : 1 (4->5)
> 
>      Update Time : Fri May 16 23:55:29 2008
>         Checksum : 5354498d - correct
>           Events : 1420762
> 
>           Layout : left-symmetric
>       Chunk Size : 128K
> 
>      Array Slot : 3 (failed, 1, failed, 2, failed, 0)
>     Array State : uuU__ 3 failed
> /dev/sdb1:
>            Magic : a92b4efc
>          Version : 01
>      Feature Map : 0x4
>       Array UUID : 43eff327:8d1aa506:c0df2849:005c003f
>             Name : 'Fuckyfucky3':1
>    Creation Time : Sun Dec 23 01:28:08 2007
>       Raid Level : raid5
>     Raid Devices : 5
> 
>      Device Size : 976767856 (465.76 GiB 500.11 GB)
>       Array Size : 3907069952 (1863.04 GiB 2000.42 GB)
>        Used Size : 976767488 (465.76 GiB 500.10 GB)
>     Super Offset : 976767984 sectors
>            State : clean
>      Device UUID : 673ba6d4:6c46fd55:745c9c93:3fa8bf21
> 
>    Reshape pos'n : 143872 (140.52 MiB 147.32 MB)
>    Delta Devices : 1 (4->5)
> 
>      Update Time : Fri May 16 23:55:29 2008
>         Checksum : 8ad75f10 - correct
>           Events : 1420762
> 
>           Layout : left-symmetric
>       Chunk Size : 128K
> 
>      Array Slot : 1 (failed, 1, failed, 2, failed, 0)
>     Array State : uUu__ 3 failed
> /dev/sdc1:
>            Magic : a92b4efc
>          Version : 01
>      Feature Map : 0x4
>       Array UUID : 43eff327:8d1aa506:c0df2849:005c003f
>             Name : 'Fuckyfucky3':1
>    Creation Time : Sun Dec 23 01:28:08 2007
>       Raid Level : raid5
>     Raid Devices : 5
> 
>      Device Size : 976767856 (465.76 GiB 500.11 GB)
>       Array Size : 3907069952 (1863.04 GiB 2000.42 GB)
>        Used Size : 976767488 (465.76 GiB 500.10 GB)
>     Super Offset : 976767984 sectors
>            State : clean
>      Device UUID : 99b87c50:a919bd63:599a135f:9af385ba
> 
>    Reshape pos'n : 143872 (140.52 MiB 147.32 MB)
>    Delta Devices : 1 (4->5)
> 
>      Update Time : Fri May 16 23:55:29 2008
>         Checksum : 78ab38c3 - correct
>           Events : 1420762
> 
>           Layout : left-symmetric
>       Chunk Size : 128K
> 
>      Array Slot : 5 (failed, 1, failed, 2, failed, 0)
>     Array State : Uuu__ 3 failed
> /dev/sdd1:
>            Magic : a92b4efc
>          Version : 01
>      Feature Map : 0x4
>       Array UUID : 43eff327:8d1aa506:c0df2849:005c003f
>             Name : 'Fuckyfucky3':1
>    Creation Time : Sun Dec 23 01:28:08 2007
>       Raid Level : raid5
>     Raid Devices : 5
> 
>      Device Size : 976767856 (465.76 GiB 500.11 GB)
>       Array Size : 3907069952 (1863.04 GiB 2000.42 GB)
>        Used Size : 976767488 (465.76 GiB 500.10 GB)
>     Super Offset : 976767984 sectors
>            State : clean
>      Device UUID : 89201477:8e950d20:9193016d:f5c9deb0
> 
>    Reshape pos'n : 143872 (140.52 MiB 147.32 MB)
>    Delta Devices : 1 (4->5)
> 
>      Update Time : Fri May 16 23:55:29 2008
>         Checksum : 5fc43e52 - correct
>           Events : 0
> 
>           Layout : left-symmetric
>       Chunk Size : 128K
> 
>      Array Slot : 6 (failed, 1, failed, 2, failed, 0)
>     Array State : uuu__ 3 failed
> /dev/sde1:
>            Magic : a92b4efc
>          Version : 01
>      Feature Map : 0x4
>       Array UUID : 43eff327:8d1aa506:c0df2849:005c003f
>             Name : 'Fuckyfucky3':1
>    Creation Time : Sun Dec 23 01:28:08 2007
>       Raid Level : raid5
>     Raid Devices : 5
> 
>      Device Size : 976767856 (465.76 GiB 500.11 GB)
>       Array Size : 3907069952 (1863.04 GiB 2000.42 GB)
>        Used Size : 976767488 (465.76 GiB 500.10 GB)
>     Super Offset : 976767984 sectors
>            State : clean
>      Device UUID : 89b53542:d1d820bc:f2ece884:4785869a
> 
>    Reshape pos'n : 143872 (140.52 MiB 147.32 MB)
>    Delta Devices : 1 (4->5)
> 
>      Update Time : Fri May 16 23:55:29 2008
>         Checksum : c89dd220 - correct
>           Events : 1418968
> 
>           Layout : left-symmetric
>       Chunk Size : 128K
> 
>      Array Slot : 6 (failed, 1, failed, 2, failed, 0)
>     Array State : uuu__ 3 failed
> 
> 
> 
> 
> When I try to start the array, I get:
> 
> ~# mdadm --assemble --verbose /dev/md1 /dev/sda1 /dev/sdb1 /dev/sdc1 
> /dev/sdd1 /dev/sde1
> mdadm: looking for devices for /dev/md1
> mdadm: /dev/sda1 is identified as a member of /dev/md1, slot 2.
> mdadm: /dev/sdb1 is identified as a member of /dev/md1, slot 1.
> mdadm: /dev/sdc1 is identified as a member of /dev/md1, slot 0.
> mdadm: /dev/sdd1 is identified as a member of /dev/md1, slot -1.
> mdadm: /dev/sde1 is identified as a member of /dev/md1, slot -1.
> mdadm: added /dev/sdb1 to /dev/md1 as 1
> mdadm: added /dev/sda1 to /dev/md1 as 2
> mdadm: no uptodate device for slot 3 of /dev/md1
> mdadm: no uptodate device for slot 4 of /dev/md1
> mdadm: added /dev/sdd1 to /dev/md1 as -1
> mdadm: failed to add /dev/sde1 to /dev/md1: Device or resource busy
> mdadm: added /dev/sdc1 to /dev/md1 as 0
> mdadm: /dev/md1 assembled from 3 drives and -1 spares - not enough to 
> start the array.
> 
> 
> 
> 
> Any help would be much appreciated.   If I can provide any more 
> information, just ask.
> 
> As to why /dev/sde1 is busy, I don't know.  lsof shows no files open.
> 
> 
> Regards,
> 
> 
> Mark.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
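On the "Device or resource busy" point: a partial assembly attempt can leave member devices claimed by a half-built, inactive array, and stopping that array usually releases them. A minimal sketch (device names as in the thread; the mdadm call needs root and is harmless if /dev/md1 does not exist):

```shell
# A failed --assemble can leave /dev/md1 half-built and still holding
# some member devices, which makes later attempts report "busy".
stop_partial_array() {
    # Show current md state if the md driver is loaded.
    [ -r /proc/mdstat ] && cat /proc/mdstat
    # Stop the inactive array so its members are released; errors
    # (e.g. no such array, not root) are swallowed here.
    mdadm --stop "$1" 2>/dev/null || true
}

stop_partial_array /dev/md1
# Then retry the assemble:
#   mdadm --assemble --verbose /dev/md1 /dev/sd[abcde]1
```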

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: raid5 recovery dramas.
  2008-06-27 10:28 ` Neil Brown
@ 2008-06-27 11:14   ` Mark Davies
  2008-06-27 20:44     ` Neil Brown
  0 siblings, 1 reply; 8+ messages in thread
From: Mark Davies @ 2008-06-27 11:14 UTC (permalink / raw)
  To: Neil Brown, linux-raid

Neil Brown wrote:
> You are in a rather sticky situation.

Hmm, yes, I'm starting to realise that.
> 
> Neither  sdd1 or sde1 know where they belong in the array.  If they
> did, then  "mdadm --assemble --force" would probably be able to help
> you (I should test that).  But they don't.
> 
> Do you have any boot logs from before you started the reshape that
> show which device fills which slot in the array?
> 
Not that I can find, and the physical drives have changed since I used 
dd_rescue to recover from the bad sectors.

> sdd1 has an event count of 0.  That is really odd.  Any idea how that
> happened?  Did you remove it from the array and try to add it back?
> That wouldn't have been a good idea.
> 
I don't recall removing any drives, but it was a month or so ago that 
this saga started.  I think I was fairly careful not to do anything 
irreversible.

Just checked the bash history, and I didn't remove any drives.  Amusing 
history though - you can almost smell the desperation and fear in every 
entry.

> I'm at a bit of a loss as to what to suggest.  The data is mostly
> there, but getting it back is tricky.
> 
> What you need to do is 
>    choose one of sdd and sde which you think is device  '3'
>      (sdc is 0, sdb is 1, sda is 2).
>    rewrite the metadata to assert this fact
>    assemble the array read-only with sd[abc] and the one you choose
>    read the data to make sure it is all there
>    switch to read-write so the reshape completes, leaving you with
>     a degraded array
>    add the other drive and let it recover.
> 
> The early steps in particular are not easy.

Since there are only two options, what's to stop me taking a backup of 
the metadata, rewriting the metadata on one drive, assembling the array, 
and seeing if the data makes sense?  If it does, great.  If it doesn't, 
restore the metadata and repeat the process on the other drive.

Or am I missing an important step?
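For what it's worth, the backup half of that plan is cheap to rehearse. Below is a sketch of the idea using dd, demonstrated on a scratch file rather than the real disk; on /dev/sdd1 the region to save would start at the Super Offset of 976767984 sectors reported by --examine, while the file name and sector numbers in the demo are made up:

```shell
# On the real disk the backup would look something like:
#   dd if=/dev/sdd1 of=sdd1-sb.bak bs=512 skip=976767984 count=16

img=/tmp/fake-member.img
truncate -s 1M "$img"                       # scratch "device"
# Plant a recognizable pattern where the metadata would live.
printf 'SUPERBLOCK' | dd of="$img" bs=512 seek=8 conv=notrunc 2>/dev/null

# 1. Back up the metadata region (sector 8, one sector, in this demo).
dd if="$img" of=/tmp/sb.bak bs=512 skip=8 count=1 2>/dev/null
# 2. Deliberately clobber it (stands in for rewriting the metadata).
dd if=/dev/zero of="$img" bs=512 seek=8 count=1 conv=notrunc 2>/dev/null
# 3. Restore from the backup if the experiment doesn't pan out.
dd if=/tmp/sb.bak of="$img" bs=512 seek=8 count=1 conv=notrunc 2>/dev/null

# The pattern survives the clobber-and-restore round trip.
dd if="$img" bs=512 skip=8 count=1 2>/dev/null | head -c 10   # SUPERBLOCK
```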


> 
> I'll try to find some time to experiment, but I cannot promise
> anything.
> 
> If you can remember everything you tried to do (maybe in
> .bash_history) that might help.
> 
> NeilBrown

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: raid5 recovery dramas.
  2008-06-27 11:14   ` Mark Davies
@ 2008-06-27 20:44     ` Neil Brown
  2008-06-30  7:03       ` Mark Davies
  0 siblings, 1 reply; 8+ messages in thread
From: Neil Brown @ 2008-06-27 20:44 UTC (permalink / raw)
  To: Mark Davies; +Cc: linux-raid

On Friday June 27, mark@curly.ii.net wrote:
> > I'm at a bit of a loss as to what to suggest.  The data is mostly
> > there, but getting it back is tricky.
> > 
> > What you need to do is 
> >    choose one of sdd and sde which you think is device  '3'
> >      (sdc is 0, sdb is 1, sda is 2).
> >    rewrite the metadata to assert this fact
> >    assemble the array read-only with sd[abc] and the one you choose
> >    read the data to make sure it is all there
> >    switch to read-write so the reshape completes, leaving you with
> >     a degraded array
> >    add the other drive and let it recover.
> > 
> > The early steps in particular are not easy.
> 
> Since there are only two options, what's to stop me taking a backup of 
> the metadata, rewriting the metadata on one drive, assembling the array, 
> and seeing if the data makes sense?  If it does, great.  If it doesn't, 
> restore the metadata and repeat the process on the other drive.
> 
> Or am I missing an important step?

Yes, you could do that.
But re-writing the metadata is non-trivial, and I'm not confident
about how to start the array read-only.
  echo 1 > /sys/module/md-mod/parameters/start_ro
might do it, but I would want to test with some scratch data first.

I would create some loopback devices over files, try to reproduce a
similar situation, assemble the array, and assure myself that the
reshape doesn't start automatically (it shouldn't while the array is
read-only) before actually doing anything to the real devices.

NeilBrown
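The loopback rehearsal described above is less daunting than it may sound. A hedged sketch follows; the file names, sizes, and the /dev/md9 name are illustrative, and the losetup/mdadm steps need root, so they are skipped when run unprivileged:

```shell
# Stand-in "disks": four sparse files (works without privileges).
for i in 1 2 3 4; do
    truncate -s 64M "/tmp/md-test-$i.img"
done

# The privileged part: attach loop devices and build a scratch raid5
# array to rehearse the assemble/metadata steps against.
if [ "$(id -u)" -eq 0 ]; then
    devs=""
    for i in 1 2 3 4; do
        d=$(losetup --find --show "/tmp/md-test-$i.img") && devs="$devs $d"
    done
    # --run skips the confirmation prompt; /dev/md9 is an arbitrary name.
    mdadm --create /dev/md9 --run --level=5 --raid-devices=4 $devs || true
    # ...now fail members, dd-copy the backing files, and re-create the
    # failure scenario before trying anything on the real array.
fi
```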

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: raid5 recovery dramas.
  2008-06-27 20:44     ` Neil Brown
@ 2008-06-30  7:03       ` Mark Davies
  0 siblings, 0 replies; 8+ messages in thread
From: Mark Davies @ 2008-06-30  7:03 UTC (permalink / raw)
  To: linux-raid; +Cc: Neil Brown

Neil Brown wrote:

> I would create some loopback devices over files, try to reproduce a
> similar situation, assemble the array, and assure myself that the
> reshape doesn't start automatically (it shouldn't while the array is
> read-only) before actually doing anything to the real devices.

Hmm, I understand what you're asking and what it means, but actually 
/doing/ it is beyond my skills at the moment.

I haven't had a chance to bring down the box and try booting from a 
liveCD, but based on the above, I don't think that's likely to work.

Thanks for your help, but I'm feeling a little discouraged at this point.


Cheers,


Mark.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2008-06-30  7:03 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-06-24  6:05 raid5 recovery dramas Mark Davies
2008-06-26  2:43 ` Mark Davies
2008-06-26 13:38   ` David Greaves
2008-06-26 14:25     ` Mark Davies
2008-06-27 10:28 ` Neil Brown
2008-06-27 11:14   ` Mark Davies
2008-06-27 20:44     ` Neil Brown
2008-06-30  7:03       ` Mark Davies

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).