* A few questions regarding RAID5/RAID6 recovery
@ 2011-04-25 17:47 Kővári Péter
2011-04-25 19:51 ` Ryan Wagoner
2011-04-26 7:21 ` David Brown
0 siblings, 2 replies; 4+ messages in thread
From: Kővári Péter @ 2011-04-25 17:47 UTC (permalink / raw)
To: linux-raid
Hi all,
Since this is my first post here, let me first thank all developers for their great tool. It really is a wonderful piece of software. ;)
I have heard a lot of horror stories about the event where a member of a RAID5/6 array gets kicked off due to I/O errors, and then, after the replacement and during the reconstruction, another drive fails and the array becomes unusable. (For RAID6, add one more drive to the story and the problem is the same, so let's just talk about RAID5 now.) I want to prepare myself for this kind of unlucky event and build up a strategy that I can follow once it happens. (I hope never, but...)
Let's assume we have a four-drive RAID5 that has become degraded, the failed drive has been replaced, and then the rebuild process failed, so now we have an array with two good disks, one failed disk and one that is only partially synchronized (the new one). We also still have the disk that originally failed and was removed from the array. If I assume that both failed disks have some bad sectors but are otherwise operational (they can be dd-ed, for example), then, except in the unlikely event that both disks failed on the very same physical sector (chunk?), the data is theoretically still there and could be retrieved. So my question is: can we retrieve it using mdadm and some "tricks"? I am thinking of something like this:
1. Assemble (or --create --assume-clean) the array in degraded mode using the two good drives plus whichever of the two failed drives has its bad sectors located further along than the other failed drive's.
2. Add the new drive, let the array start rebuilding, and wait for the process to go beyond the point where the other failed drive has its bad sectors.
3. Stop/pause/??? the rebuild process and, if possible, make a note of the exact sector (chunk) where the rebuild was paused.
4. Assemble (or --create --assume-clean) the array again, but this time using the other failed drive.
5. Add the new drive again and continue the rebuild from the point where the last rebuild was paused. Since we are now past the point where this failed disk has its bad sectors, the rebuild should finish fine.
6. Finally, remove the failed disk and replace it with another new drive.
Can this be done using mdadm somehow?
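Roughly, I imagine the first few steps would look something like the following. This is just a rough, untested sketch with hypothetical device names (sda1 and sdb1 are the good members, sdc1 is the failed drive I keep for now, sde1 is the new drive); the level, chunk size, metadata version and device order would of course have to match the original array exactly.
mdadm --create /dev/md0 --assume-clean --level=5 --raid-devices=4 \
      --chunk=64 /dev/sda1 /dev/sdb1 /dev/sdc1 missing    # step 1: degraded, one slot left missing
mdadm /dev/md0 --add /dev/sde1                            # step 2: add the new drive, rebuild starts
cat /proc/mdstat                                          # step 3: watch how far the rebuild has got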
My next question is not really a question but rather a wish. In my view, the situation described above is by far the biggest weakness not only of Linux software RAID but of every hardware RAID solution I know of (I don't know many, though), especially nowadays, when we use larger and larger disks. So I'm wondering whether there is any RAID or RAID-like solution that, along with redundancy, provides some automatic stripe (chunk) reallocation feature? Something like what modern hard disks do with their "reallocated sectors": the RAID driver reserves some chunks/stripes for reallocation, and once an I/O error happens on any of the active chunks, then instead of kicking the disk off it marks the stripe/chunk bad, moves the data to one of the reserved ones, and continues (along with some warning, of course). Only if writing to the reserved chunk also failed would it be necessary to kick the member off immediately.
The other thing I wonder about is why the RAID solutions I know of use the "first remove the failed drive, then add the new one" strategy instead of "add the new one, try to recover, then remove the failed one". They use the former even when a spare drive is available, because, as far as I know, they won't utilize the failed disk for the rebuild. Why? With the latter strategy, it would be a joy to recover from situations like the one above.
Thanks for your response.
Best regards,
Peter
* Re: A few questions regarding RAID5/RAID6 recovery
2011-04-25 17:47 A few questions regarding RAID5/RAID6 recovery Kővári Péter
@ 2011-04-25 19:51 ` Ryan Wagoner
2011-04-26 7:21 ` David Brown
1 sibling, 0 replies; 4+ messages in thread
From: Ryan Wagoner @ 2011-04-25 19:51 UTC (permalink / raw)
To: linux-raid
2011/4/25 Kővári Péter <peter@kovari.priv.hu>:
> Hi all,
>
> Since this is my first post here, let me first thank all developers for their great tool. It really is a wonderful piece of software. ;)
>
> I have heard a lot of horror stories about the event where a member of a RAID5/6 array gets kicked off due to I/O errors, and then, after the replacement and during the reconstruction, another drive fails and the array becomes unusable. (For RAID6, add one more drive to the story and the problem is the same, so let's just talk about RAID5 now.) I want to prepare myself for this kind of unlucky event and build up a strategy that I can follow once it happens. (I hope never, but...)
From what I understand, if you run weekly RAID scrubs you will limit
the possibility of this happening. CentOS / Red Hat already have this
scheduled. If not, you can add a cron job that triggers a check or
repair. Make sure you replace DEV with your md device (e.g. md0):
echo check > /sys/block/DEV/md/sync_action
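For example, a minimal cron entry for this might look something like the sketch below (this assumes the array is md0 and uses the /etc/crontab format with a user field; adjust the schedule as you see fit):
# run a RAID consistency check every Sunday at 01:00
0 1 * * 0 root /bin/sh -c 'echo check > /sys/block/md0/md/sync_action'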
I have had 3 x 1TB drives in RAID 5 for the past 2.5 years and have not
had a drive kicked out or an error found. If an error is ever found,
then since it is caught early I should have a good chance of replacing
the failing drive without incurring another error.
Ryan
* Re: A few questions regarding RAID5/RAID6 recovery
2011-04-25 17:47 A few questions regarding RAID5/RAID6 recovery Kővári Péter
2011-04-25 19:51 ` Ryan Wagoner
@ 2011-04-26 7:21 ` David Brown
2011-04-26 16:13 ` Peter Kovari
1 sibling, 1 reply; 4+ messages in thread
From: David Brown @ 2011-04-26 7:21 UTC (permalink / raw)
To: linux-raid
On 25/04/2011 19:47, Kővári Péter wrote:
> Hi all,
>
> [...]
>
> So I'm wondering whether there is any RAID or RAID-like solution
> that, along with redundancy, provides some automatic stripe (chunk)
> reallocation feature?
>
> [...]
>
> The other thing I wonder about is why the RAID solutions I know of
> use the "first remove the failed drive, then add the new one"
> strategy instead of "add the new one, try to recover, then remove
> the failed one".
>
> [...]
>
> Best regards, Peter
>
You are not alone in these concerns. A couple of months ago there was a
long thread here about a roadmap for md raid. The first two entries are
a "bad block log" to allow reading of good blocks from a failing disk,
and "hot replace" to sync a replacement disk before removing the failing
one. Being on a roadmap doesn't mean that these features will make it
to md raid in the near future - but it does mean that there are already
rough plans to solve these problems.
<http://neil.brown.name/blog/20110216044002>
* RE: A few questions regarding RAID5/RAID6 recovery
2011-04-26 7:21 ` David Brown
@ 2011-04-26 16:13 ` Peter Kovari
0 siblings, 0 replies; 4+ messages in thread
From: Peter Kovari @ 2011-04-26 16:13 UTC (permalink / raw)
To: linux-raid
-----Original Message-----
From: linux-raid-owner@vger.kernel.org [mailto:linux-raid-owner@vger.kernel.org] On Behalf Of David Brown
Sent: Tuesday, April 26, 2011 9:22 AM
To: linux-raid@vger.kernel.org
Subject: Re: A few questions regarding RAID5/RAID6 recovery
> You are not alone in these concerns. A couple of months ago there was a
> long thread here about a roadmap for md raid. The first two entries are
> a "bad block log" to allow reading of good blocks from a failing disk,
> and "hot replace" to sync a replacement disk before removing the failing
> one. Being on a roadmap doesn't mean that these features will make it
> to md raid in the near future - but it does mean that there are already
> rough plans to solve these problems.
> <http://neil.brown.name/blog/20110216044002>
Thank you, David, this explains a lot. I hope we'll see this implemented some day.
Can you comment on my first question too, please? Basically, I'm just curious whether there is a way to stop and restart the rebuilding process (and change/re-create the array in between).
By the way, I read somewhere that "--stop" followed by "--create --assume-clean" only works on arrays with v0.9 superblocks, because v1.x overwrites existing data during create. That doesn't make sense to me - but I'm not sure - so is it true? And if not, is it enough to use the same superblock version for "--create" to make this work without data loss?
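If it helps, what I have in mind is to check the existing superblock version first and then force the same one when re-creating, something like this (hypothetical device names, untested; the level, chunk size and device order would have to match the original array):
mdadm --examine /dev/sda1 | grep -i version     # shows e.g. 0.90 or 1.2
mdadm --create /dev/md0 --assume-clean --metadata=0.90 --level=5 \
      --raid-devices=4 --chunk=64 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1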
Thanks,
Peter