From: David Brown <david@westcontrol.com>
To: linux-raid@vger.kernel.org
Subject: Re: A few questions regarding RAID5/RAID6 recovery
Date: Tue, 26 Apr 2011 09:21:53 +0200
Message-ID: <ip5rnd$g96$1@dough.gmane.org>
In-Reply-To: <002001cc0370$cd40ac90$67c205b0$@priv.hu>
On 25/04/2011 19:47, Kővári Péter wrote:
> Hi all,
>
> Since this is my first post here, let me first thank all the developers
> for their great tool. It really is a wonderful piece of software. ;)
>
> I have heard a lot of horror stories about the case where a member of a
> RAID5/6 array gets kicked out due to I/O errors, and then, after the
> replacement and during the reconstruction, another drive fails and the
> array becomes unusable. (For RAID6, add one more drive to the story and
> the problem is the same, so let's just talk about RAID5 now.) I want to
> prepare myself for this kind of unlucky event and build up a strategy
> that I can follow if it ever happens. (I hope it never does, but...)
>
> Let's assume we have a 4-drive RAID5 that has become degraded, the
> failed drive has been replaced, the rebuild process then failed, and
> now we have an array with 2 good disks, one failed disk and one that is
> only partially synchronized (the new one). We also have the disk that
> originally failed and was removed from the array. If I assume that both
> failed disks have some bad sectors but are otherwise still operational
> (they can be dd-ed, for example), then, except in the unlikely event
> that both disks failed on the very same physical sector (chunk?), the
> data is theoretically all there and could be retrieved. So my question
> is: can we retrieve it using mdadm and some "tricks"? I am thinking of
> something like this:
>
> 1. Assemble (or --create --assume-clean) the array in degraded mode
>    using the 2 good drives plus whichever of the 2 failed drives has
>    its bad sectors further into the disk than the other one.
> 2. Add the new drive, let the array start rebuilding, and wait for the
>    process to go beyond the point where the other failed drive has its
>    bad sectors.
> 3. Stop/pause/??? the rebuild process, and - if possible - make a note
>    of the exact sector (chunk) where the rebuild was paused.
> 4. Assemble (or --create --assume-clean) the array again, but this
>    time using the other failed drive.
> 5. Add the new drive again and continue the rebuild from the point
>    where the last rebuild was paused. Since we are now past the point
>    where this failed disk has its bad sectors, the rebuild should
>    finish fine.
> 6. Finally, remove the failed disk and replace it with another new
>    drive.
>
> Can this be done using mdadm somehow?
>
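In principle the plumbing for that kind of forced reassembly exists, but
treat the following only as a rough, untested sketch of the sort of
commands involved - the device names, chunk size and metadata version
below are placeholders and would have to match your original array
exactly, and --create over live data is dangerous if you get any detail
wrong:

  # Re-create the array over the two good disks plus the "better" of the
  # two failed disks, in the original slot order, with the dead slot left
  # as "missing".  --assume-clean stops md from starting a resync.
  # (sda1/sdb1/sdc1, chunk size and metadata version are placeholders.)
  mdadm --create /dev/md0 --assume-clean --level=5 --raid-devices=4 \
        --chunk=64 --metadata=0.90 /dev/sda1 /dev/sdb1 /dev/sdc1 missing

  # Add the new disk; recovery onto it starts automatically.
  mdadm /dev/md0 --add /dev/sdd1

  # Watch how far recovery has got (sectors done / total).
  cat /proc/mdstat
  cat /sys/block/md0/md/sync_completed

  # Stop the recovery before it reaches the known bad region of the
  # failing disk currently in the array (the sector number is made up):
  echo 1000000000 > /sys/block/md0/md/sync_max

The later steps - swapping in the other failed disk and resuming from a
recorded sector - are where it gets hairy; sync_min/sync_max in the same
sysfs directory let you bound the resync window, but I wouldn't attempt
any of this without rehearsing on scratch devices (or loopback images)
first.
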
> My next question is not really a question but rather a wish. From my
> point of view, the situation described above is by far the biggest
> weakness not just of Linux software RAID but of all the other hardware
> RAID solutions I know of (I don't know many, though), especially
> nowadays, when we use larger and larger disks. So I'm wondering whether
> there is any RAID or RAID-like solution that - along with redundancy -
> provides some automatic stripe (chunk) reallocation feature, something
> like what modern hard disks do with their "reallocated sectors": the
> RAID driver reserves some chunks/stripes for reallocation, and when an
> I/O error happens on one of the active chunks, then instead of kicking
> the disk out, it marks the stripe/chunk bad, moves the data to one of
> the reserved ones, and continues (along with some warning, of course).
> Only if writing to the reserved chunk also fails would it be necessary
> to kick the member out immediately.
>
> The other thing I wonder about is why the RAID solutions I know of use
> the "first remove the failed drive, then add the new one" strategy
> instead of "add the new one, try to recover, then remove the failed
> one". They use the former even when a spare drive is available,
> because - as far as I know - they won't use the failed disk during the
> rebuild. Why? With the latter strategy, it would be much easier to
> recover from situations like the one above.
>
> Thanks for your response.
>
> Best regards, Peter
>
You are not alone in these concerns. A couple of months ago there was a
long thread here about a roadmap for md raid. The first two entries are
a "bad block log" to allow reading of good blocks from a failing disk,
and "hot replace" to sync a replacement disk before removing the failing
one. Being on a roadmap doesn't mean that these features will make it
to md raid in the near future - but it does mean that there are already
rough plans to solve these problems.
<http://neil.brown.name/blog/20110216044002>
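(Purely as an illustration of what the user-visible side of "hot
replace" might eventually look like - the syntax below is a guess on my
part, not something mdadm supports today:

  # Rebuild onto a new disk while the failing one is still in the array,
  # and only drop the failing disk once the copy has completed.
  mdadm /dev/md0 --add /dev/sde1
  mdadm /dev/md0 --replace /dev/sdb1 --with /dev/sde1

The point being that the failing disk stays available as a data source
for any stripes the remaining disks cannot reconstruct on their own.)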