From: Benjamin ESTRABAUD <be@mpstor.com>
To: linux-raid@vger.kernel.org
Cc: Andreas Boman <aboman@midgaard.us>
Subject: Re: Failed during rebuild (raid5)
Date: Fri, 03 May 2013 12:38:47 +0100
Message-ID: <5183A1C7.5000905@mpstor.com>
In-Reply-To: <51839E4F.7050102@midgaard.us>
On 03/05/13 12:23, Andreas Boman wrote:
> I have (had?) a raid 5 array, with 5 disks (1.5TB/EA), smartd warned
> one was getting bad, so I replaced it with an identical disk.
> I issued mdadm --manage --add /dev/md127 /dev/sdX
>
> The array seemed to be rebuilding, was at around 15% when I went to bed.
>
> This morning I came up to see the array degraded with two missing
> drives, another failed during the rebuild.
>
> I powered the system down, and since I have the disk smartd flagged as
> bad and tried to just plug that in and power up hoping to see the
> array come back up -no such luck (not enough disks).
>
Unfortunately this happens far too often: RAID members fail silently
over time. They develop bad blocks, and you won't know about it until
something tries to read or write one of them. When that happens, the
disk gets kicked out of the array. At that stage you replace the disk,
not knowing that other areas on the other RAID members have also gone
bad, and the rebuild, which has to read every remaining member in full,
trips over them. The sensible options are either to run RAID 6, which
dramatically reduces the potential for a double failure, or to stay on
RAID 5 but run a check of the entire array at least weekly, carefully
monitoring the changes SMART reports after each run (reading the entire
array causes any bad blocks to be detected and reallocated).
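
For the periodic check, something along these lines may help. It is only
a sketch: it assumes the array is md127 as in your mdadm command, that
smartmontools is installed, and that the member names passed in are
placeholders you replace with your own. The helper name scrub_cmds is
made up for illustration. It only prints the commands so you can review
them first; pipe the output to "sh" as root once you are happy with it.

```shell
# Print the commands for one md scrub pass plus a SMART look at each member.
# Usage: scrub_cmds <md name, e.g. md127> [member device ...]
scrub_cmds() {
    md=$1; shift
    # Ask md to read every stripe of the array and verify parity.
    echo "echo check > /sys/block/$md/md/sync_action"
    # Progress of the running check is shown here.
    echo "cat /proc/mdstat"
    # Non-zero after the check finishes means inconsistent stripes were found.
    echo "cat /sys/block/$md/md/mismatch_cnt"
    # Compare Reallocated_Sector_Ct and Current_Pending_Sector between runs.
    for dev in "$@"; do
        echo "smartctl -A $dev"
    done
}

# Example with placeholder member names:
scrub_cmds md127 /dev/sdA /dev/sdB
```

Run from cron once a week, the first command alone is enough to start
the check; the others are for looking at the result afterwards.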
> I powered the system down again, and now I'm trying to evaluate my
> best options to recover. Hoping to have some good advice in my inbox
> when I get back from the office. I'll be able to boot the thing up and
> get log info this afternoon.
>
I have had to recover an array like that twice. The most important
thing is probably to limit the data loss on the second drive, the one
that is failing right now, by "ddrescueing" all of its data onto another
drive before it gets more damaged (the longer the failing drive stays
online, the lower your chances). Use GNU ddrescue for that purpose.
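
A typical two-pass ddrescue run looks roughly like the sketch below.
The device names and the mapfile path are placeholders, not names from
your system, and rescue_cmds is just a made-up helper that prints the
commands so you can double-check source and destination before running
anything. Getting those two the wrong way round would overwrite the
failing drive.

```shell
# Print a two-pass GNU ddrescue plan.
# Usage: rescue_cmds <failing drive> <new drive, same size or larger> <mapfile>
# Keep the mapfile on a filesystem that is on neither of the two drives.
rescue_cmds() {
    src=$1 dst=$2 map=$3
    # Pass 1: copy everything that reads cleanly and skip the slow work
    # on bad areas (-n). -f is needed because the output is a device.
    echo "ddrescue -f -n $src $dst $map"
    # Pass 2: return to the bad areas with direct disc access (-d) and
    # up to three retry passes (-r3). The mapfile lets ddrescue resume.
    echo "ddrescue -f -d -r3 $src $dst $map"
}

# Example with placeholder names:
rescue_cmds /dev/sdX /dev/sdY /root/sdX.map
```

Whatever is still listed as bad in the mapfile after the second pass is
the data that could not be read from that drive.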
Once you have rescued the failing drive onto a new one, you could then
try to add that new recovered drive in place of the failing one and
start the resync as you did before.
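
In mdadm terms that could look like the sketch below. It assumes the
array is /dev/md127 as in your command; every member name is a
placeholder, and reassemble_cmds is a made-up helper that only prints
the commands. Whether --force is actually needed depends on the event
counts that --examine reports, so treat that step as an assumption and
look at the --examine output before going any further.

```shell
# Print the steps to bring the array back up with the cloned drive.
# Usage: reassemble_cmds <md device> <fresh drive to add> <member ...>
# The member list is the surviving good drives plus the ddrescue clone.
reassemble_cmds() {
    md=$1 fresh=$2; shift 2
    # Make sure nothing is holding the half-assembled array.
    echo "mdadm --stop $md"
    # Inspect the superblocks (event counts, roles) before forcing anything.
    echo "mdadm --examine $*"
    # Start the array degraded from the good members and the clone.
    echo "mdadm --assemble --force $md $*"
    # Add the fresh drive, which starts the rebuild.
    echo "mdadm --manage $md --add $fresh"
    # Watch the rebuild progress.
    echo "cat /proc/mdstat"
}

# Example for a 5-disk RAID 5: three good members plus the clone, then the
# fresh replacement drive (all names are placeholders):
reassemble_cmds /dev/md127 /dev/sdNEW /dev/sdA /dev/sdB /dev/sdC /dev/sdCLONE
```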
Note that it would probably be worthwhile to ddrescue the initial drive
that you took out (if it is still good enough to do so) in case the
second drive cannot be recovered correctly or is missing some data.
Regards,
Ben.
> Thanks!
> Andreas
>
> (please cc me, not subscribed)
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>