Re: Failed during rebuild (raid5)


From: Benjamin ESTRABAUD <be@mpstor.com>
To: linux-raid@vger.kernel.org
Cc: Andreas Boman <aboman@midgaard.us>
Subject: Re: Failed during rebuild (raid5)
Date: Fri, 03 May 2013 12:38:47 +0100
Message-ID: <5183A1C7.5000905@mpstor.com>
In-Reply-To: <51839E4F.7050102@midgaard.us>

On 03/05/13 12:23, Andreas Boman wrote:
> I have (had?) a raid 5 array, with 5 disks (1.5TB/EA), smartd warned 
> one was getting bad, so I replaced it with an identical disk.
> I issued mdadm --manage --add /dev/md127 /dev/sdX
>
> The array seemed to be rebuilding, was at around 15% when I went to bed.
>
> This morning I came up to see the array degraded with two missing 
> drives, another failed during the rebuild.
>
> I powered the system down, and since I still have the disk smartd 
> flagged as bad, I tried to just plug that in and power up, hoping to 
> see the array come back up. No such luck (not enough disks).
>
Unfortunately this happens far too often: RAID members degrade silently 
over time. They develop bad blocks, and you won't know about it until 
you try to read or write one of them. When that happens, the disk gets 
kicked out of the array. You then replace it, not knowing that areas on 
the other RAID members have also gone bad, and the rebuild, which has to 
read every remaining member in full, trips over them. The only sensible 
options are to run RAID 6, which dramatically reduces the potential for 
a double failure, or to run RAID 5 with a check of the entire array at 
least weekly, carefully monitoring the SMART-reported changes after each 
run (reading the entire array causes bad blocks to be detected and 
reallocated, if there are any).
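
A periodic check like that could look roughly like the sketch below. It 
assumes the md sysfs interface (sync_action and mismatch_cnt under 
/sys/block/<array>/md/); the array name md127 is taken from your mail, 
and the SYSFS_ROOT variable is only there so the functions can be tried 
against a scratch directory first.

```shell
#!/bin/sh
# Sketch: trigger an md "check" pass and report its state.
# MD_DEV and SYSFS_ROOT are assumptions; adjust them for your system.
MD_DEV="${MD_DEV:-md127}"
SYSFS_ROOT="${SYSFS_ROOT:-/sys/block}"

start_check() {
    # Writing "check" asks md to read every stripe of the array and
    # verify parity, which forces all members to be read in full.
    echo check > "$SYSFS_ROOT/$MD_DEV/md/sync_action"
}

check_status() {
    # sync_action goes back to "idle" once the pass has finished.
    printf 'sync_action: %s\n' "$(cat "$SYSFS_ROOT/$MD_DEV/md/sync_action")"
    printf 'mismatch_cnt: %s\n' "$(cat "$SYSFS_ROOT/$MD_DEV/md/mismatch_cnt")"
}
```

Run start_check as root from a weekly cron job, then look at 
check_status and at "smartctl -A /dev/sdX" for each member once the 
pass is done, watching for growth in the reallocated and pending 
sector counts.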
> I powered the system down again, and now I'm trying to evaluate my 
> best options to recover. Hoping to have some good advice in my inbox 
> when I get back from the office. I'll be able to boot the thing up and 
> get log info this afternoon.
>
I have had to recover an array like that twice. The most important thing 
is probably to limit the data loss on the second drive, the one that is 
failing right now, by "ddrescueing" all of its data onto another drive 
before it gets more damaged (the longer the failing drive stays online, 
the lower your chances). Use GNU ddrescue for that purpose.

Once you have rescued the failing drive onto a new one, you could then 
try to add that new recovered drive in place of the failing one and 
start the resync as you did before.
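
In mdadm terms that could look something like the sketch below. This is 
an assumption about the exact procedure, not something checked against 
your array: it supposes the ddrescue copy carries the md superblock of 
the failing member, so the array can be force-assembled degraded from 
the healthy members plus the copy, after which the replacement disk is 
added again. All device names are hypothetical, and the helper prints 
the commands unless RUN=1 is set.

```shell
#!/bin/sh
# Sketch: bring the array back with the rescued copy, then re-add the
# replacement disk. Device names are hypothetical. Commands are printed
# by default; set RUN=1 to actually execute them.
run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

reassemble() {
    md="$1"; shift
    # Stop whatever partial array the kernel auto-assembled at boot.
    run mdadm --stop "$md"
    # Force assembly from the healthy members plus the rescued copy.
    run mdadm --assemble --force "$md" "$@"
}

add_replacement() {
    # Adding the new disk to the degraded array starts the rebuild.
    run mdadm --manage "$1" --add "$2"
}
```

Running "mdadm --examine" on each member beforehand and comparing the 
event counts helps confirm which disks belong in the assemble list.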

Note that it would probably be worthwhile to ddrescue the initial drive 
that you took out (if it is still good enough to do so) in case the 
second drive cannot be recovered correctly or is missing some data.

Regards,
Ben.

> Thanks!
> Andreas
>
> (please cc me, not subscribed)
