Re: need a little help rebuilding a raid 10

From: Phil Turmel <philip@turmel.org>
To: Greg Freemyer <greg.freemyer@gmail.com>
Cc: Linux RAID <linux-raid@vger.kernel.org>
Subject: Re: need a little help rebuilding a raid 10
Date: Tue, 06 Dec 2011 09:52:11 -0500
Message-ID: <4EDE2C1B.8000008@turmel.org>
In-Reply-To: <CAGpXXZKWB7qpRtWK7GuohuF-OOuAGQ_tSODCnLCouS+Z-SWZDA@mail.gmail.com>

Hi Greg,

On 12/06/2011 09:11 AM, Greg Freemyer wrote:
> Hmm...
> 
> My rebuild failed.  At first glance I had both a failed drive and a failed slot?
> 
> What I don't understand is why I have I/O errors in /var/log/messages
> from when the rebuild failed overnight.

Something in your system is untrustworthy.

> But this morning, hdparm --read-sector is reading the "bad" sectors fine.

What does smartctl say about your drives (all of them)?

> I already tried replacing the drive and the replacement drive also
> reported media errors during the rebuild, that's why I came to believe
> I had a bad slot.
> 
> Now I have non-repeatable media errors.
> 
> FYI: I have the problem drive connected via eSATA now, so it's on a
> totally different controller than where it was when the failure first
> occurred.

Are the errors in /var/log/messages only from that drive?  If so, then that
drive is probably toast.

> Any thoughts?

Your prior e-mail said that you re-created the array.  I didn't see that you
had definitively nailed down the problem at that point, so re-creating was
probably not a good idea.  In particular, "--create" destroys all prior
metadata on the array members.  If you didn't keep the output of "mdadm -E"
for each drive, that information is now lost.
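
A minimal sketch of keeping that "mdadm -E" output before any destructive
step.  The function name, the output directory, the device names, and the
MDADM override variable (handy for a dry run) are all hypothetical and not
from this thread:

```shell
#!/bin/sh
# Save "mdadm -E" (--examine) output for each array member to its own file.
# Set MDADM="echo mdadm" to see what would be run without touching anything.
save_examine() {
    outdir=$1; shift
    mkdir -p "$outdir" || return 1
    for dev in "$@"; do
        # /dev/sda1 -> sda1.examine.txt
        name=$(basename "$dev")
        ${MDADM:-mdadm} -E "$dev" > "$outdir/$name.examine.txt" 2>&1
    done
}

# Example (as root; device names are placeholders):
#   save_examine /root/md-metadata /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
```

Keeping the files somewhere other than the array itself is the point; they
record the layout that a later "--create" would have to reproduce.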

In general, "--create" is a last resort, and only to be used for recovery
when you have absolute confidence you understand the layout (mdadm -E
printouts of the original array).  "--assemble --force" is the proper step
after "--assemble" fails.
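
The order described above could be sketched roughly as follows.  The
function name, the MDADM override variable, and the array and member names
are hypothetical; only "--assemble" and "--force" come from the text:

```shell
#!/bin/sh
# Try a plain assemble first and fall back to --force only if it fails.
# Set MDADM="echo mdadm" for a dry run that just prints the command.
assemble_array() {
    md=$1; shift
    if ${MDADM:-mdadm} --assemble "$md" "$@"; then
        return 0
    fi
    echo "plain --assemble failed; retrying with --force" >&2
    ${MDADM:-mdadm} --assemble --force "$md" "$@"
}

# Example (as root; names are placeholders):
#   assemble_array /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
```

If the failed first attempt leaves a partially assembled, inactive array
behind, it may need to be stopped with "mdadm --stop" before the retry.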

I would completely scrub the questionable drive with random data, run a long
smartctl test on it, and replace it if it reports any re-allocated sectors at
that point.
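
One possible way to do the scrub and start the long test, as a sketch only.
It assumes dd from /dev/urandom is an acceptable source of random data;
/dev/sdX is a placeholder, and the function name and the DD and SMARTCTL
override variables are hypothetical.  This destroys everything on the
target device:

```shell
#!/bin/sh
# DESTRUCTIVE: overwrites the entire device given as $1.
# Set DD="echo dd" and SMARTCTL="echo smartctl" for a dry run.
scrub_and_test() {
    dev=$1
    # Overwrite every sector with random data.  dd stops with a
    # "No space left on device" error at the end of the disk, which is
    # expected, so its exit status is deliberately not checked here.
    ${DD:-dd} if=/dev/urandom of="$dev" bs=1M
    # Ask the drive to start its extended (long) self-test.  The test runs
    # inside the drive and this command returns immediately.
    ${SMARTCTL:-smartctl} -t long "$dev"
}

# Once the self-test has finished:
#   smartctl -l selftest /dev/sdX   # self-test log
#   smartctl -A /dev/sdX            # attribute table, incl. reallocated sectors
```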

I would also run long smartctl tests on the other drives, looking for pending
sectors or re-allocated sectors.  If any show up, I would plan on replacing
those drives as well, and would try to validate the content of your files.
You do have a backup to compare against, after all.
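
A rough sketch of pulling the relevant counters out of "smartctl -A" for
several drives.  It assumes ATA drives that report the attribute names
Reallocated_Sector_Ct and Current_Pending_Sector with the raw value in the
tenth column, which is common but not universal; the function names, the
SMARTCTL override variable, and the device names are hypothetical:

```shell
#!/bin/sh
# Read "smartctl -A" output on stdin and print "<attribute> <raw value>"
# for the reallocated and pending sector counters.
smart_sector_counts() {
    awk '$2 == "Reallocated_Sector_Ct" || $2 == "Current_Pending_Sector" {
             print $2, $10
         }'
}

# Print the counters for each drive given on the command line.
check_drives() {
    for dev in "$@"; do
        echo "== $dev"
        ${SMARTCTL:-smartctl} -A "$dev" | smart_sector_counts
    done
}

# Example (as root; device names are placeholders):
#   check_drives /dev/sda /dev/sdb /dev/sdc /dev/sdd
```

Any non-zero raw value on either line is what the advice above treats as a
reason to plan a replacement.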

If you are running a Debian-based distro, and the array contains your rootfs,
you might find "debsums" useful.
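
For example, a small wrapper around "debsums -c", which lists installed
files whose checksums no longer match the package's recorded ones.  The
function name and the DEBSUMS override variable are hypothetical:

```shell
#!/bin/sh
# List packaged files that differ from their recorded checksums.
# Set DEBSUMS="echo debsums" for a dry run.
list_changed_files() {
    ${DEBSUMS:-debsums} -c
}

# Example (as root, so unreadable files can be checked too):
#   list_changed_files > /root/changed-files.txt
```

By default this skips configuration files, and it says nothing about files
that no package owns, so it complements a comparison against the backup
rather than replacing it.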

HTH,

Phil
