Re: RAID5 disk failure during rebuild of spare, any chance of recovery when one of the failed devices is suspected to be intact?

From: Nicolas Jungers <nicolas@jungers.net>
To: "Tor Arne Vestbø" <torarnv@gmail.com>
Cc: linux-raid <linux-raid@vger.kernel.org>
Subject: Re: RAID5 disk failure during rebuild of spare, any chance of recovery when one of the failed devices is suspected to be intact?
Date: Mon, 16 Aug 2010 18:37:56 +0200
Message-ID: <4C696964.7030205@jungers.net>
In-Reply-To: <AANLkTinaGGH6H2cj3tXR8m0yef9YE65R4cHrQGoxwJfY@mail.gmail.com>

On 08/16/2010 06:27 PM, Tor Arne Vestbø wrote:
> On Mon, Aug 16, 2010 at 10:43 AM, Tim Small<tim@seoss.co.uk>  wrote:
>> On 16/08/10 07:12, Nicolas Jungers wrote:
>>>
>>> On 08/16/2010 07:54 AM, Tor Arne Vestbø wrote:
>>>>
>>>> You mean sdc and sde plus either sdb or sdd, depending on which
>>>> one I think is more sane at this point?
>>>
>>> I'd try both.  Do a ddrescue of the failing one and try that (with copies
>>> of the others) and check what's coming out.
>>
>> As an alternative to using ddrescue, you could quickly prototype various
>> arrangements (without writing anything to the drives) using a device-mapper
>> copy-on-write mapping - I posted some details to the list a while back when
>> I was trying to use this to reconstruct a hw raid array...  Check the list
>> archives for details.
>
> Cool, here's what I tried:
>
> Created sparse files for each of the devices
>
>    dd if=/dev/zero of=sdb_cow bs=1 count=0 seek=2GB
>
> Mapped that to a loop device
>
>    losetup /dev/loop1 sdb_cow
>
> Then ran the following for each device:
>
>    cow_size=`blockdev --getsize /dev/sdb1`
>    chunk_size=64
>    echo "0 $cow_size snapshot /dev/sdb1 /dev/loop1 p $chunk_size" |
> dmsetup create sdb1_cow
>
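
The per-device steps quoted above could be wrapped in a small helper so the
same commands run for each member. This is only a sketch under assumptions:
the function names and the `_cow.img` file name are hypothetical,
`truncate -s 2G` stands in for the dd invocation (2 GiB rather than 2 GB),
and `losetup --find --show` picks a free loop device instead of a fixed
/dev/loop1. Note that the snapshot chunk size is given in 512-byte sectors,
so 64 here means 32 KiB and is unrelated to the 64K md chunk size.

```shell
#!/bin/sh
# Build the device-mapper table line for a persistent snapshot overlay.
# Arguments: origin size in 512-byte sectors, origin device, COW device,
# snapshot chunk size in sectors.
snapshot_table() {
    printf '0 %s snapshot %s %s p %s\n' "$1" "$2" "$3" "$4"
}

# Create a sparse file, attach it to a free loop device, and layer a
# copy-on-write snapshot over the partition. Needs root. Writes go to the
# sparse file, not to the origin partition.
make_overlay() {
    name=$1                         # partition name, e.g. sdb1
    cow_file=${name}_cow.img        # hypothetical file name
    truncate -s 2G "$cow_file"      # sparse: no blocks allocated yet
    loop_dev=$(losetup --find --show "$cow_file")
    sectors=$(blockdev --getsz "/dev/$name")
    snapshot_table "$sectors" "/dev/$name" "$loop_dev" 64 |
        dmsetup create "${name}_cow"
}
```

Nothing runs until a function is called, for example
`for p in sdb1 sdc1 sde1; do make_overlay "$p"; done` as root.
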
> After these were created I tried the following:
>
> # mdadm -v -C /dev/md0 -l5 -n4 /dev/mapper/sdb1_cow \
>     /dev/mapper/sdc1_cow missing /dev/mapper/sde1_cow
> mdadm: layout defaults to left-symmetric
> mdadm: chunk size defaults to 64K
> mdadm: /dev/mapper/sdb1_cow appears to be part of a raid array:
>      level=raid5 devices=4 ctime=Sun Mar  2 22:52:53 2008
> mdadm: /dev/mapper/sdc1_cow appears to be part of a raid array:
>      level=raid5 devices=4 ctime=Sun Mar  2 22:52:53 2008
> mdadm: /dev/mapper/sde1_cow appears to be part of a raid array:
>      level=raid5 devices=4 ctime=Sun Mar  2 22:52:53 2008
> mdadm: size set to 732571904K
> Continue creating array? Y
> mdadm: array /dev/md0 started.
>
> # mdadm --detail /dev/md0
> /dev/md0:
>          Version : 00.90
>    Creation Time : Mon Aug 16 18:20:06 2010
>       Raid Level : raid5
>       Array Size : 2197715712 (2095.91 GiB 2250.46 GB)
>    Used Dev Size : 732571904 (698.64 GiB 750.15 GB)
>     Raid Devices : 4
>    Total Devices : 3
> Preferred Minor : 0
>      Persistence : Superblock is persistent
>
>      Update Time : Mon Aug 16 18:20:06 2010
>            State : clean, degraded
>   Active Devices : 3
> Working Devices : 3
>   Failed Devices : 0
>    Spare Devices : 0
>
>           Layout : left-symmetric
>       Chunk Size : 64K
>
>             UUID : 916ceaa2:b877a3cc:3973abef:31f2d600 (local to host monstre)
>           Events : 0.1
>
>      Number   Major   Minor   RaidDevice State
>         0     251        9        0      active sync   /dev/block/251:9
>         1     251       10        1      active sync   /dev/block/251:10
>         2       0        0        2      removed
>         3     251       12        3      active sync   /dev/block/251:12
>
> And I can now mount /dev/mapper/raid-home !
>
> The question now is, what next? Should I start copying things off to a
> backup, or run fsck first or something else to try to repair errors?
> Or perhaps are the 2GB sparse files too small for anything like that?

My advice: copy everything off first.  You have an unreliable disk in the
middle of your data.
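
On the size of the sparse files: anything written to the array (an fsck
repair, a journal replay) lands in the copy-on-write files, and a
device-mapper snapshot becomes invalid once its COW store is full. A rough
way to keep an eye on that is to read the allocated/total sector pair that
`dmsetup status` reports for a snapshot target. The sketch below assumes
that status format and the overlay names from this thread; the function
names are hypothetical.

```shell
#!/bin/sh
# Print the percentage of a snapshot's COW store in use, given one line of
# `dmsetup status` output such as "0 1465143808 snapshot 2048/4194304 16".
# Prints nothing if the line is not a snapshot line with a used/total pair
# (for example when the snapshot is reported as invalid).
cow_percent_used() {
    printf '%s\n' "$1" | awk '$3 == "snapshot" {
        split($4, a, "/")
        if (a[2] > 0) printf "%d\n", a[1] * 100 / a[2]
    }'
}

# Report usage for each overlay; needs root. The names are the ones created
# earlier in this thread.
report_overlays() {
    for name in sdb1_cow sdc1_cow sde1_cow; do
        status=$(dmsetup status "$name")
        printf '%s: %s%%\n' "$name" "$(cow_percent_used "$status")"
    done
}
```

Calling `report_overlays` before and after a copy run shows whether reading
alone is consuming any of the 2GB.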

N.
