Re: mdadm raid5 single drive fail, single drive out of sync terror

From: Robin Hill <robin@robinhill.me.uk>
To: Jon Robison <narfman0@gmail.com>
Cc: linux-raid@vger.kernel.org
Subject: Re: mdadm raid5 single drive fail, single drive out of sync terror
Date: Wed, 26 Nov 2014 15:49:22 +0000
Message-ID: <20141126154922.GA12222@cthulhu.home.robinhill.me.uk>
In-Reply-To: <5475ECDC.6070309@gmail.com>

On Wed Nov 26, 2014 at 10:08:12AM -0500, Jon Robison wrote:

> Hi all!
> 
> I upgraded to mdadm-3.3-7.fc20.x86_64, and my RAID5 array (normally 
> /dev/sd[b-f]1) would no longer recognize /dev/sdb1. I ran 
> `mdadm --detail --scan`, which showed a degraded array, then added 
> /dev/sdb1, and it started rebuilding happily until 25% or so, when 
> another failure seemed to occur.
> 
> I am convinced the data is fine on /dev/sd[c-f]1, and that somehow I 
> just need to inform mdadm about that, but they got out of sync and 
> /dev/sde1 thinks the array is AAAAA while the others think it's AAA.. . 
> The drives also seem to think e is bad because f said e was bad or some 
> weird stuff, and sde1 is behind by ~50 events or so. That error hasn't 
> shown itself recently. I fear sdb is bad and sde is going to go soon.
> 
> Results of `mdadm --examine /dev/sd[b-f]1` are here 
> http://dpaste.com/2Z7CPVY
> 
> I'm scared and alone. Everything is off and sitting as above, though e 
> is 50 events behind and out of sync. New drives are coming Friday and 
> the backup is of course a bit old. I'm petrified to execute `mdadm --create 
> --assume-clean --level=5 --raid-devices=5 /dev/md0 /dev/sdf1 /dev/sdd1 
> /dev/sdc1 /dev/sde1 missing`, but that seems my next option unless y'all 
> know better. I tried `mdadm --assemble -f /dev/md0 /dev/sdf1 /dev/sdd1 
> /dev/sdc1 /dev/sde1` and it said something like can't start with only 3 
> devices (which I wouldn't expect because examine still shows 4, just 
> that they are out of sync and I thought that was -f's express purpose in 
> assemble mode). Anyone have any suggestions? Thanks!

This looks like a bug in mdadm 3.3 (the commit logs mention something
similar, anyway). I'd advise getting 3.3.1 or 3.3.2 and retrying the
forced assembly.
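
For reference, a retry of the forced assembly might look roughly like
the sketch below. It assumes the array is /dev/md0 and that the four
members to use are /dev/sd[c-f]1, as in your mail. The `run` helper is
purely illustrative (it is not part of mdadm): it prints each command
unless RUN=1 is set, so nothing touches the disks until you've read it
through.

```shell
#!/bin/sh
# Dry-run helper (illustrative only): prints the command unless RUN=1
# is set, in which case it executes it (needs root).
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# Confirm the upgraded mdadm is the one being picked up (3.3.1 or later).
run mdadm --version

# Stop any half-assembled, inactive array before retrying.
run mdadm --stop /dev/md0

# Forced assembly from the four members believed good; sdb1 is left out.
run mdadm --assemble --force /dev/md0 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1

# See what state the array came up in.
run cat /proc/mdstat
```

Run it once as-is to review the commands, then again with RUN=1 as root.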

If it failed during the rebuild, though, that suggests there's an
unreadable block on sde, which means you'll hit the same issue again
when you try to rebuild sdb. You'll need to:
    - image sde to a new disk (via ddrescue)
    - assemble the array, using the copy in place of sde
    - add another new disk so the array rebuilds onto it
    - once the rebuild has completed, force a fsck on the array
      (fsck -f /dev/md0), as the unreadable block may have caused some
      filesystem corruption. It may also have corrupted file contents,
      but that's not something that can be easily checked.
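
The steps above could be sketched roughly as follows. The device names
for the new disks (/dev/sdg, /dev/sdh1) and the map file name are
assumptions for illustration, so substitute whatever the new drives
actually show up as. The `run` helper is again just an illustrative
dry-run wrapper: it prints the commands unless RUN=1 is set. This also
assumes the new disk is at least as large as sde, since the whole disk
(partition table included) is imaged.

```shell
#!/bin/sh
# Dry-run helper (illustrative only): prints the command unless RUN=1
# is set, in which case it executes it (needs root).
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

SRC=/dev/sde          # disk with the suspected unreadable block
COPY=/dev/sdg         # ASSUMED name of the new disk receiving the image
SPARE=/dev/sdh1       # ASSUMED partition on the second new disk
MAP=sde-rescue.map    # ddrescue map file (hypothetical file name)

# 1. Image sde. The first pass skips scraping of bad areas (-n); the
#    second goes back and retries them up to 3 times (-r3). -f is
#    needed because the output is a block device.
run ddrescue -f -n "$SRC" "$COPY" "$MAP"
run ddrescue -f -r3 "$SRC" "$COPY" "$MAP"

# 2. Assemble with the copy standing in for sde1. Disconnect the
#    original sde first so two devices don't carry the same superblock.
run mdadm --assemble --force /dev/md0 /dev/sdc1 /dev/sdd1 "${COPY}1" /dev/sdf1

# 3. Add the second new disk; the rebuild should start on its own.
run mdadm /dev/md0 --add "$SPARE"
run cat /proc/mdstat

# 4. Once the rebuild has completed, with the filesystem unmounted:
run fsck -f /dev/md0
```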

These read errors can be picked up and fixed by running regular array
checks (echo check > /sys/block/md0/md/sync_action). Most distributions
have these set up in cron, so make sure that's in there and enabled.
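
If you want to kick off a check by hand and see what it found,
something along these lines should do. It's a minimal sketch assuming
the array is md0; the function names and the MD_DIR variable are just
made up for illustration.

```shell
#!/bin/sh
# MD_DIR is the array's sysfs directory (assumes md0 unless overridden).
MD_DIR="${MD_DIR:-/sys/block/md0/md}"

# Ask md to read every stripe and verify parity (needs root).
start_check() {
    echo check > "$MD_DIR/sync_action"
}

# Print the current action ("check" while running, "idle" when done)
# and the mismatch count md reports.
check_status() {
    printf 'action=%s mismatches=%s\n' \
        "$(cat "$MD_DIR/sync_action")" \
        "$(cat "$MD_DIR/mismatch_cnt")"
}

# Typical use, as root:
#   start_check
#   check_status      # repeat, or watch /proc/mdstat, until idle
```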

The failed disks may actually be okay (sde particularly), so I'd advise
checking SMART stats and running full badblocks write tests on them
(these overwrite the whole disk, so only once the array no longer needs
them). If the badblocks tests run okay and there's no increase in
reallocated sectors reported in SMART, they should be perfectly okay
for re-use.
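
A rough way to compare the reallocated sector count before and after a
badblocks run is sketched below. It assumes smartmontools reports the
attribute under the name Reallocated_Sector_Ct (the usual name for
attribute 5, though some drives label it differently), and the
`realloc_count` function name is my own invention.

```shell
#!/bin/sh
# Pull the raw Reallocated_Sector_Ct value out of `smartctl -A` output
# read from stdin (the raw value is the 10th column of the table).
realloc_count() {
    awk '$2 == "Reallocated_Sector_Ct" { print $10 }'
}

# Typical use, as root, on a disk that is no longer part of the array:
#   before=$(smartctl -A /dev/sde | realloc_count)
#   badblocks -wsv /dev/sde     # DESTRUCTIVE: overwrites the whole disk
#   after=$(smartctl -A /dev/sde | realloc_count)
#   [ "$before" = "$after" ] && echo "no new reallocated sectors"
```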

Cheers,
    Robin
-- 
     ___        
    ( ' }     |       Robin Hill        <robin@robinhill.me.uk> |
   / / )      | Little Jim says ....                            |
  // !!       |      "He fallen in de water !!"                 |
