Re: RAID 5 : recovery after failure

From: Robin Hill <robin@robinhill.me.uk>
To: Guillaume Betous <guillaume.betous@gmail.com>
Cc: Mikael Abrahamsson <swmike@swm.pp.se>,
	linux-raid <linux-raid@vger.kernel.org>
Subject: Re: RAID 5 : recovery after failure
Date: Wed, 9 Oct 2013 10:14:48 +0100
Message-ID: <20131009091448.GA13760@cthulhu.home.robinhill.me.uk>
In-Reply-To: <CAPbD+Ret3-2M1ir=0G5YP9VdGBptMAPLzCZRFga0jjv0Y6aEhQ@mail.gmail.com>

On Wed Oct 09, 2013 at 10:54:09AM +0200, Guillaume Betous wrote:

> I don't know if /dev/sdb is still usable, or if this was only a
> desynchro failure.
> How to know ?
> 
As sdb1 has already been marked spare, it'll need rebuilding anyway, so
it doesn't really matter. If there's a real issue with it then it'll
fail during the recovery process anyway. You can do a full read test on
it (either a long SMART test, a simple dd from it, or a read-only
badblocks test) if you want to check for issues though.
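
A rough sketch of those three read tests follows. DISK is a placeholder,
and the DRY_RUN switch and helper names are conventions of this sketch
rather than part of any tool. By default it only prints the command it
would run. All three options only read from the disk.

```shell
#!/bin/sh
# Sketch: full read test of a suspect disk (read-only in every variant).
DISK="${DISK:-/dev/sdb}"    # assumption: the disk to be tested
DRY_RUN="${DRY_RUN:-1}"     # 1 = print commands only (default), 0 = run them

run() {
    # Print the command in dry-run mode, execute it otherwise.
    if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

read_test() {
    case "$1" in
        # Long SMART self-test; read the result later with: smartctl -l selftest
        smart)     run smartctl -t long "$DISK" ;;
        # Plain sequential read; dd stops at the first read error
        dd)        run dd if="$DISK" of=/dev/null bs=1M ;;
        # badblocks is read-only unless -n or -w is given
        badblocks) run badblocks -sv "$DISK" ;;
        *)         echo "usage: read_test smart|dd|badblocks" >&2; return 2 ;;
    esac
}
```

For example, "read_test dd" prints the dd command for review, and
setting DRY_RUN=0 beforehand runs it for real.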

> P.S. The /proc/mdstat file currently contains the following:
> 
> Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
> md127 : active raid5 sde1[2] sdb1[5](F) sdc1[0](F) sdd1[6] sdf1[4]
>       5860535808 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/2] [__UU]
>       [============>........]  recovery = 62.0%
> (1212358192/1953511936) finish=854.7min speed=14451K/sec
> 
It looks like recovery was kicked off onto sde1, but sdc has failed
again during the rebuild. This would suggest a read error on sdc1
somewhere - dmesg should show some indication of what's happened.
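
To see at a glance which members the kernel has flagged as failed, a
small filter over /proc/mdstat may help. This is only a sketch: the
function name is made up, and it assumes the usual "name : state level
members..." layout of the array line.

```shell
#!/bin/sh
# Sketch: list the members marked (F) for one array in mdstat-style input.
failed_members() {
    # $1 = array name (e.g. md127); mdstat text is read from stdin
    awk -v md="$1" '$1 == md && $2 == ":" {
        for (i = 3; i <= NF; i++)
            if ($i ~ /\(F\)$/) {       # member flagged as failed
                sub(/\[.*/, "", $i)    # strip "[n](F)", keep the device name
                print $i
            }
    }'
}

# Usage (both read-only):
#   failed_members md127 < /proc/mdstat
#   dmesg | grep -i -e sdc -e md127
```

Fed the md127 line quoted above, it prints sdb1 and sdc1, one per line.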

You'll need to stop the array and sort out sdc before you can get it
going again. Use GNU ddrescue to image it onto another disk (preferably
one that wasn't originally a member of the array) - it may be able to
get all the data read (it tries somewhat harder than normal processes),
or you'll at least see how much is unreadable.
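
The stop-and-image step could look something like the sketch below.
/dev/sdX and the mapfile name are placeholders, the two-pass approach is
a common way of driving ddrescue rather than something specific to this
case, and DRY_RUN is a convention of the sketch (it prints the commands
unless set to 0). Note that -f lets ddrescue overwrite the target
device, so the target needs checking very carefully.

```shell
#!/bin/sh
# Sketch: stop the array, then image the failing disk with GNU ddrescue.
MD="${MD:-/dev/md127}"
SRC="${SRC:-/dev/sdc}"      # failing member disk
DST="${DST:-/dev/sdX}"      # placeholder: spare disk, not an array member
MAP="${MAP:-sdc.map}"       # hypothetical mapfile name; lets ddrescue resume
DRY_RUN="${DRY_RUN:-1}"     # 1 = print commands only (default)

run() { if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi; }

image_disk() {
    run mdadm --stop "$MD" || return
    # Pass 1: copy everything that reads cleanly, skip scraping (-n).
    run ddrescue -f -n "$SRC" "$DST" "$MAP" || return
    # Pass 2: retry the bad areas with direct access (-d), 3 retry passes.
    run ddrescue -f -d -r3 "$SRC" "$DST" "$MAP"
}
```

ddrescue reports how much it could not read when it finishes, which
feeds into the decision described below.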

If it's all read okay then you can just re-run the force assembly using
that disk instead of sdc (make sure you explicitly list the devices to
use in the assembly command). Then add one of the other disks and wait
for the rebuild to complete (there may be no real issue with sdc - you
do sometimes get read errors on disks which are solved by simply
rewriting the data).
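
As a sketch of that sequence (the function names are invented, and the
device list has to be filled in from your own array, so none is
hard-coded here):

```shell
#!/bin/sh
# Sketch: forced assembly from an explicit device list, then add a disk
# for the rebuild. Prints the commands unless DRY_RUN=0.
MD="${MD:-/dev/md127}"
DRY_RUN="${DRY_RUN:-1}"

run() { if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi; }

# reassemble DEV... : the ddrescue copy (in place of sdc1) plus the
# members that are still good, all listed explicitly.
reassemble() {
    run mdadm --assemble --force "$MD" "$@"
}

# start_rebuild DEV : add one of the other disks so recovery can start.
start_rebuild() {
    run mdadm "$MD" --add "$1"
}
```

Progress of the rebuild then shows up in /proc/mdstat as before.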

If not then you have to make a decision about whether there are few enough
unreadable blocks to continue with assembly (as above) and possibly end
up with some corrupt files, or whether you want to risk re-creating the
array using the other original member (I'd suggest doing a full read
test on that disk first though, as it may be in the same state).

If you're wanting to do a re-create then we'll need to revisit your
original array details to see what parameters would be needed (and which
mdadm version you'll need to get the correct data offsets).
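
If it comes to that, it's worth saving the current superblock details
from every member first. A sketch of a filter that keeps just the
fields a re-create would have to match (the function name is made up,
and the field names are those printed by mdadm --examine for 1.x
metadata):

```shell
#!/bin/sh
# Sketch: reduce "mdadm --examine" output to the parameters that matter
# for a re-create. Reads from stdin, changes nothing on disk.
recreate_params() {
    grep -E '^ *(Version|Raid Level|Raid Devices|Chunk Size|Layout|Data Offset|Device Role) :'
}

# Usage (read-only), once per member:
#   mdadm --examine /dev/sde1 | recreate_params
#   mdadm --version
```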

Once everything's back up and running, you really need to:
 - make sure the timeouts/ERC are set correctly at every boot
 - schedule array checks on a regular basis to pick up any read errors
   while they can still be corrected
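
A minimal sketch of both points, under some assumptions: the 7 second
ERC value (70, in units of 0.1s) and the 180 second kernel timeout are
commonly used numbers rather than anything derived from your setup, and
the function names and DRY_RUN switch are inventions of the sketch. By
default it only prints what it would do.

```shell
#!/bin/sh
# Sketch: per-boot ERC/timeout setup and a manually triggered array check.
DRY_RUN="${DRY_RUN:-1}"     # 1 = print only (default), 0 = apply

# set_timeouts DISK... : disk names without /dev/, e.g. sdb sdc
set_timeouts() {
    for disk in "$@"; do
        if [ "$DRY_RUN" = 1 ]; then
            echo "smartctl -l scterc,70,70 /dev/$disk || echo 180 > /sys/block/$disk/device/timeout"
        # Try to enable ERC; if the drive refuses, raise the kernel
        # command timeout instead so a slow read does not kick the disk.
        elif ! smartctl -l scterc,70,70 "/dev/$disk" >/dev/null; then
            echo 180 > "/sys/block/$disk/device/timeout"
        fi
    done
}

# start_check MD : md name without /dev/, e.g. md127
start_check() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "echo check > /sys/block/$1/md/sync_action"
    else
        echo check > "/sys/block/$1/md/sync_action"
    fi
}
```

set_timeouts would be called from a boot script, since neither setting
is guaranteed to survive a power cycle, and start_check from a periodic
cron job (many distributions already ship one for md arrays).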

Cheers,
    Robin
-- 
     ___        
    ( ' }     |       Robin Hill        <robin@robinhill.me.uk> |
   / / )      | Little Jim says ....                            |
  // !!       |      "He fallen in de water !!"                 |
