From: Phil Turmel <philip@turmel.org>
To: Karel Walters <karel.walters@gmail.com>
Cc: linux-raid <linux-raid@vger.kernel.org>
Subject: Re: [Recovery] RAID10 hdd failureS help requested
Date: Tue, 24 Sep 2013 13:09:55 -0400
Message-ID: <5241C763.6040306@turmel.org>
In-Reply-To: <CAB4fJqerQy7PJzK4+WSNAh7YCcHmwoAqB5vMrXeSYqzWawAS+A@mail.gmail.com>

Hi Karel,

On 09/24/2013 12:28 PM, Karel Walters wrote:
> Will find a way to do proper scrubbing and alter the timeouts on startup.
>> for x in /sys/block/sd[d-h]/device/timeout ; do echo 180 >$x ; done
> done!

Good.

>> { In the future, buy drives that wake up with ERC enabled (like your WD
>> Reds), or at least capable of enabling ERC (at every powerup). }
> Reds are on the desk next to me and will replace the raid array.

Very Good.  Mind you, the Seagates are good enough drives, they just
aren't suited to raid arrays.  Changing the driver timeouts will get you
by, but when you do encounter an error, the three minute pause will kick
many applications in the teeth.  I have a few Seagates like this kicking
around that I use for offsite backups.
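
A minimal sketch of the per-boot setup discussed above: try to turn on
7-second ERC with smartctl, and fall back to the 180-second driver
timeout when the drive refuses.  The function name and the SYSFS
override are hypothetical, and the check assumes that smartctl reports
active ERC values with the word "seconds"; verify that against the
output of "smartctl -l scterc" on your own drives.

```shell
#!/bin/sh
# Sketch only: enable ERC where possible, else lengthen the driver timeout.
# SYSFS can be overridden for dry runs; it defaults to the real /sys/block.
set_erc_or_timeout() {
    dev="$1"
    sysfs="${SYSFS:-/sys/block}"

    # scterc values are in tenths of a second: 70,70 = 7.0 s read / 7.0 s write.
    smartctl -l scterc,70,70 "/dev/$dev" >/dev/null 2>&1 || true

    # Assumption: an active setting is printed as e.g. "Read: 70 (7.0 seconds)".
    if smartctl -l scterc "/dev/$dev" 2>/dev/null | grep -q 'seconds'; then
        echo "$dev: ERC active"
    else
        # No ERC: give the drive three minutes before the driver gives up.
        echo 180 > "$sysfs/$dev/device/timeout"
        echo "$dev: no ERC, driver timeout set to 180 s"
    fi
}
```

Run as root from a boot script, this would be called once per member,
e.g. "for d in sdd sde sdf sdg sdh ; do set_erc_or_timeout $d ; done".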

>> Next, you will have to figure out which of the bumped drives belongs in
>> which slot in the array.  An old dmesg (from before the failures) or an
>> archived "mdadm --detail" would tell us that.  This is important,
>> because you *will* need to use --create --assume-clean as the drives are
>> now marked as spare--the info needed for forced assembly is gone.
> 
> This is a problem for me and maybe a harsh lesson, I added an old
> dmesg output at the end but I' m not to sure about it.

Yes, that dmesg did the trick.  The drive that failed first was #3, and
the drive that failed second was #4.  You should create a list of which
drive serial number corresponds to which raid device role, with a third
column showing the current device name.
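
One possible way to start that list is sketched here.  It assumes an
lsblk that supports the SERIAL column; the function name is
hypothetical.  The role column cannot be read back from the drives any
more, so it is printed as "?" and has to be filled in by hand from the
old dmesg.

```shell
#!/bin/sh
# Sketch only: print one "serial / role / device" row per drive given.
list_drives() {
    printf '%-24s %-6s %s\n' SERIAL ROLE DEVICE
    for dev in "$@"; do
        # -d: whole disk only, -n: no header, -o SERIAL: serial number column
        serial=$(lsblk -dno SERIAL "$dev" 2>/dev/null)
        # The role is unknown to the tools now; fill it in from the old dmesg.
        printf '%-24s %-6s %s\n' "${serial:-unknown}" "?" "$dev"
    done
}
```

For the drives in this thread the call would be
"list_drives /dev/sd[d-h]".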

Then we can construct an "mdadm --create --assume-clean" command that
generates the correct order.  And I would leave the partially synced
spare out entirely.
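
As a rough sketch of how the finished list turns into a command, the
helper below only prints an "mdadm --create --assume-clean" line and
never runs it.  The helper name, /dev/md0 and the partition names are
hypothetical, and the member count is a guess.  The chunk size, layout
and metadata version of the original array are not stated here and
would have to be added to match it exactly before running anything.

```shell
#!/bin/sh
# Sketch only: build (not execute) the re-create command from devices given
# in raid device role order, i.e. role 0 first.
build_create_cmd() {
    md="$1"; shift
    printf 'mdadm --create %s --assume-clean --level=10 --raid-devices=%d' \
        "$md" "$#"
    for dev in "$@"; do
        # mdadm also accepts the word "missing" in place of an absent member.
        printf ' %s' "$dev"
    done
    printf '\n'
}

# Hypothetical names in role order; the partially synced spare is not listed.
build_create_cmd /dev/md0 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1
```

Printing the command first leaves room to check the order against the
serial number list before anything is written to the drives.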

Then, to deal with the large number of pending events, you'll need to do
a "check" scrub with a very low speed limit, to keep you from exceeding
the 10/hour read error limit in the MD kernel driver.
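
A throttled scrub along those lines might look like the sketch below,
using the per-array sysfs files sync_speed_min, sync_speed_max and
sync_action.  The function name and the example figure of 1000 KiB/s
are assumptions; the right limit depends on how densely the bad sectors
are packed and will need tuning.

```shell
#!/bin/sh
# Sketch only: cap the per-device sync speed, then start a "check" scrub.
start_slow_check() {
    md_sysfs="$1"     # e.g. /sys/block/md0/md
    limit_kib="$2"    # speed cap in KiB/s per device

    # Set the floor as well, so the guaranteed minimum is not above the cap.
    echo "$limit_kib" > "$md_sysfs/sync_speed_min"
    echo "$limit_kib" > "$md_sysfs/sync_speed_max"

    # "check" reads every stripe and rewrites sectors that return read errors.
    echo check > "$md_sysfs/sync_action"
}
```

As root, "start_slow_check /sys/block/md0/md 1000" would start the
scrub, and /proc/mdstat shows its progress.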

{ Or you can scrub at full speed until it kicks drives out, then force
assemble and restart the scrub.  Many times over in your case. }

Phil
