Re: Help raid10 recovery from 2 disks removed

Linux RAID subsystem development

From: Dag Nygren <dag@newtech.fi>
To: Mikael Abrahamsson <swmike@swm.pp.se>
Cc: yuji_touya@yokogawa-digital.com, linux-raid@vger.kernel.org
Subject: Re: Help raid10 recovery from 2 disks removed
Date: Fri, 25 Oct 2013 10:27:42 +0300
Message-ID: <2614300.keHkAfZ48K@eseries.newtech.fi>
In-Reply-To: <alpine.DEB.2.02.1310241441280.1838@uplift.swm.pp.se>

On Thursday 24 October 2013 14:44:14 Mikael Abrahamsson wrote:
> On Thu, 24 Oct 2013, yuji_touya@yokogawa-digital.com wrote:
> 
> > Here's syslog entries about raid10 and smartctl output.
> > sdb seems to have too many bad blocks. Is that the reason why sdb was kicked out?
> 
> Most likely.
> 
> > I'm going to copy files from /dev/md0 to anywhere else as soon as possible.
> > Should I repair filesystem before copying? (like xfs_repair /dev/md0)
> 
> What you need to do now is to use dd_rescue or equivalent to copy the data 
> off of sdb to a good drive. Stop the array first. This means you'll lose 
> data on the bad blocks. After this is done, and you have assembled the 
> array with the good drive with (most of) the data from sdb, start the 
> array, then hot-add in sdc and let things sync up. You should now have 
> redundancy.

Hi all!

I just had a fight with this myself, also using Seagate drives,
and I don't think he needs to lose any data or use ddrescue here.

Enabling scterc (which is disabled by default and is disabled again
after the drive is powered down), setting the kernel command timeout
and then running a repair on the array fixed it for me. md was smart
enough to try to rewrite the sector(s) that had failed, and with scterc
enabled the drive then reallocated the failed sector(s).
(I thought I had already set this up, but a syntax error in my script
had prevented it from working... :-( )

The working script I ran for this was:
=============================
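# Set up RAID drive timeouts (run as root)
for x in b c d e
do
        # Enable SCT ERC with 7.0 s read/write limits (units of 100 ms)
        smartctl -l scterc,70,70 "/dev/sd$x"
        # Set the kernel's command timeout for the drive to 180 s
        echo 180 > "/sys/block/sd$x/device/timeout"
done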
==============================
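
To check that the timeout part of the script took effect, something
along these lines may help. This is only a sketch: show_timeouts and
its sysfs-root argument are names made up for illustration, and the
drive letters are the same assumption as in the script above.

```shell
# Sketch: print the kernel command timeout for each listed drive.
# show_timeouts is an illustrative name, not part of smartctl or md.
# Usage: show_timeouts SYSFS_ROOT LETTER...   (SYSFS_ROOT is required)
show_timeouts() {
    root="$1"
    shift
    for x in "$@"
    do
        f="$root/sd$x/device/timeout"
        if [ -r "$f" ]; then
            echo "sd$x timeout: $(cat "$f") s"
        else
            echo "sd$x timeout: not available"
        fi
    done
}
```

Called as "show_timeouts /sys/block b c d e" it prints one line per
drive. The drive's own ERC setting can be read back with
"smartctl -l scterc /dev/sdb" (no values given).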

After that, run "echo repair >/sys/block/md0/md/sync_action".
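
The repair runs in the background. A minimal sketch for waiting until
md reports the array idle again, assuming the array is md0 as in the
command above; wait_for_repair and its two optional arguments are
illustrative names, not an md interface.

```shell
# Sketch: poll sync_action until it reads "idle", then show mismatch_cnt.
# wait_for_repair is an illustrative name.
# $1: md sysfs directory (default /sys/block/md0/md)
# $2: seconds between polls (default 60)
wait_for_repair() {
    md_dir="${1:-/sys/block/md0/md}"
    poll_secs="${2:-60}"
    # Bail out instead of looping forever if the array is not there
    [ -r "$md_dir/sync_action" ] || return 1
    while [ "$(cat "$md_dir/sync_action")" != "idle" ]
    do
        sleep "$poll_secs"
    done
    echo "mismatch_cnt: $(cat "$md_dir/mismatch_cnt")"
}
```

Progress while it runs can also be followed in /proc/mdstat.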

This should move the 112 count for your "Pending" sectors to "Reallocated_Sector_Ct"
in the smartctl output and fix your array.
After that you should re-add the drive that has been missing almost since
the initialization of the array, and keep a close eye on the error counts there.

You should also keep an eye on the Reallocated_Sector_Ct for sdb, though.
Your 112 is still below Seagate's health limit (200), but it is
fairly high and indicates a "not so good" drive.
If the count goes over 200, Seagate will replace the drive.
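
To watch those two counters without reading the whole attribute table,
a small filter like the following might do. It is a sketch only:
smart_counts is a made-up name, and it assumes the usual "smartctl -A"
layout with the attribute name in column 2 and the raw value in
column 10.

```shell
# Sketch: print the raw values of the pending and reallocated counters.
# smart_counts is an illustrative name; it reads "smartctl -A" output
# on stdin and assumes name in column 2, raw value in column 10.
smart_counts() {
    awk '$2 == "Current_Pending_Sector" || $2 == "Reallocated_Sector_Ct" {
        print $2 ": " $10
    }'
}
```

Usage would be "smartctl -A /dev/sdb | smart_counts", before and after
the repair.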

If someone with more insight has objections to the procedure above, please
tell me. But this worked for me.

> Also check why you didn't get notification that sdc wasn't part of the 
> array, usually mdmon or equivalent will send email about these events.

Good advice! Set up the smartd email address, too!
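
A quick way to see whether mdadm has a notification address at all is
to look for a MAILADDR line in its config file. The sketch below is
only that: has_mailaddr is a made-up name, the default path is an
assumption (some distributions use /etc/mdadm.conf), and it only
matches the keyword written out in full.

```shell
# Sketch: report whether an mdadm.conf-style file sets MAILADDR.
# has_mailaddr is an illustrative name; the default path is an assumption.
has_mailaddr() {
    conf="${1:-/etc/mdadm/mdadm.conf}"
    # Match the keyword at the start of a line, ignoring case
    if grep -qi '^[[:space:]]*MAILADDR[[:space:]]' "$conf" 2>/dev/null; then
        echo "MAILADDR set in $conf"
    else
        echo "no MAILADDR in $conf"
        return 1
    fi
}
```

smartd takes its address separately, via the -m directive in
smartd.conf.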

Best
Dag
