From: Andreas Klauer <Andreas.Klauer@metamorpher.de>
To: Alexander Shenkin <al@shenkin.org>
Cc: linux-raid@vger.kernel.org
Subject: Re: recovering failed raid5
Date: Fri, 28 Oct 2016 15:33:04 +0200
Message-ID: <20161028133304.GA11564@metamorpher.de>
In-Reply-To: <715b259f-1e56-9606-edc4-3e5c4d57744b@shenkin.org>

On Fri, Oct 28, 2016 at 01:22:31PM +0100, Alexander Shenkin wrote:
> One remaining question: is sdc definitely toast?

In my opinion, a drive is toast from the very first reallocated, pending, 
or uncorrectable sector. Your drive has several of those, and those are 
only the ones the drive already knows about; there may be more.
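The "first bad sector" rule can be checked mechanically. Below is a minimal sketch, assuming the drive is an ATA disk whose `smartctl -A` output uses the usual attribute table (ID, name, flag, value, worst, thresh, type, updated, when_failed, raw_value) and that attributes 5, 197 and 198 are the relevant counters. The function names and the sample values are hypothetical and illustrative, not taken from the drive in this thread.

```python
import re

# SMART attributes commonly associated with bad sectors on ATA drives
# (names as printed by `smartctl -A`).
BAD_SECTOR_ATTRS = {
    5: "Reallocated_Sector_Ct",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}


def bad_sector_counts(smartctl_output: str) -> dict[str, int]:
    """Extract raw values of the bad-sector attributes from `smartctl -A` text."""
    counts = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # this skips the header row and any other text.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attr_id = int(fields[0])
        if attr_id in BAD_SECTOR_ATTRS:
            # RAW_VALUE is the 10th column; keep only its leading integer.
            m = re.match(r"\d+", fields[9])
            if m:
                counts[BAD_SECTOR_ATTRS[attr_id]] = int(m.group())
    return counts


def is_toast(smartctl_output: str) -> bool:
    """Hypothetical policy: any nonzero bad-sector count means replace the drive."""
    return any(v > 0 for v in bad_sector_counts(smartctl_output).values())


# Illustrative sample, not real output from the drive discussed here.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       8
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       16
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       16
"""

print(is_toast(SAMPLE))  # True
```

Note that these counters only cover sectors the drive has already found, which is why a clean result is not proof of a healthy disk.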

> Or, is it possible that the Timeout Mismatch (as mentioned by Robin Hill; 
> thanks Robin) is flagging the drive as failed, when something else is at 
> play and perhaps the drive is actually fine?

I don't believe in timeout mismatches either. The timeouts are generous 
(the kernel's default SCSI command timeout is 30 seconds). Waiting for a 
disk to spin up from standby is not a problem, and that already takes ages. 
If a disk gets stuck even longer in error-correction limbo and is kicked 
because of it, IMHO that's the right call.
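For reference, the "timeout mismatch" argument compares two numbers: the kernel's per-device command timeout (in seconds, readable from /sys/block/<dev>/device/timeout, 30 by default) and the drive's internal error recovery limit (SCT ERC, reported by `smartctl -l scterc` in units of 100 ms). A minimal sketch of that comparison follows; it assumes smartctl prints a line such as "Read:     70 (7.0 seconds)" when ERC is set and "Read: Disabled" otherwise, and the function names are hypothetical.

```python
import re
from typing import Optional


def parse_scterc_read(smartctl_output: str) -> Optional[int]:
    """Return the read ERC limit in deciseconds from `smartctl -l scterc`
    output, or None if ERC is disabled or not reported (assumed format)."""
    m = re.search(r"Read:\s+(\d+)", smartctl_output)
    return int(m.group(1)) if m else None


def timeout_mismatch(erc_deciseconds: Optional[int], kernel_timeout_s: int = 30) -> bool:
    """True if the drive may still be retrying a bad sector when the kernel
    gives up on the command, i.e. the mismatch Robin described."""
    if erc_deciseconds is None:
        # No ERC limit: the drive alone decides how long to retry.
        return True
    return erc_deciseconds / 10 >= kernel_timeout_s


erc = parse_scterc_read("SCT Error Recovery Control:\n"
                        "           Read:     70 (7.0 seconds)\n"
                        "          Write:     70 (7.0 seconds)\n")
print(erc, timeout_mismatch(erc))  # 70 False
```

Whichever side of the argument one takes, the check only tells you how a failing disk will be treated; it says nothing about whether the disk has errors.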

A disk that is unable to read its data, refuses to write data, or needs 
help from the RAID layer to correct its errors should be kicked, because 
it's not able to pull its own weight.

You need drives that work without errors, without outside help, because 
during a rebuild, when the RAID is already degraded, there won't be any 
outside help. Either the disks work or your RAID is dead.
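Why a degraded RAID5 has no outside help left can be shown with single XOR parity. The sketch below is a toy model, not md's implementation: a stripe is a list of equal-length byte blocks plus one parity block, an unreadable block is None, and the function names are hypothetical. With one block missing the rebuild works; a second unreadable block on a "surviving" disk cannot be recovered.

```python
from functools import reduce
from typing import Optional


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks: list[bytes]) -> list[bytes]:
    """Return the data blocks followed by one XOR parity block (RAID5-style)."""
    return data_blocks + [xor_blocks(data_blocks)]


def reconstruct(stripe: list[Optional[bytes]]) -> list[bytes]:
    """Rebuild a stripe in which unreadable blocks are None.
    Single XOR parity can recover at most one missing block."""
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) > 1:
        raise ValueError(f"{len(missing)} blocks unreadable; single parity cannot recover")
    if not missing:
        return list(stripe)
    present = [block for block in stripe if block is not None]
    rebuilt = list(stripe)
    # The missing block is the XOR of every other block in the stripe.
    rebuilt[missing[0]] = xor_blocks(present)
    return rebuilt


stripe = make_stripe([b"\x01\x02", b"\x10\x20", b"\xaa\xbb"])

# One disk gone, all others read cleanly: the rebuild succeeds.
print(reconstruct([None] + stripe[1:]) == stripe)  # True

# One disk gone AND an unreadable sector on a "surviving" disk: data is lost.
try:
    reconstruct([None, None] + stripe[2:])
except ValueError as err:
    print(err)  # 2 blocks unreadable; single parity cannot recover
```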

RAID redundancy is supposed to allow disks to be replaced (mdadm --replace). 
If you instead use it to keep fixing errors on other disks, there is no 
real redundancy left. In a RAID, if one of your disks has errors, get rid 
of it as soon as possible.
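A replacement with mdadm is done in two steps in manage mode: add the new disk as a spare, then ask md to copy the failing member onto it with `--replace ... --with ...`. The old disk stays in service during the copy, so the array keeps its redundancy, and md marks it faulty once the copy completes. The sketch below only builds and prints the commands for review rather than running them; /dev/md0 and /dev/sde1 are hypothetical names, and the helper function is illustrative.

```python
import shlex


def replace_commands(array: str, failing: str, new: str) -> list[list[str]]:
    """Build (but do not run) mdadm invocations that migrate a failing
    member onto a new disk without degrading the array."""
    return [
        # 1. Add the new disk; it joins the array as a spare.
        ["mdadm", array, "--add", new],
        # 2. Copy the failing member onto that spare. The old disk stays
        #    active until the copy finishes, then md marks it faulty.
        ["mdadm", array, "--replace", failing, "--with", new],
    ]


for argv in replace_commands("/dev/md0", "/dev/sdc1", "/dev/sde1"):
    print(shlex.join(argv))
# mdadm /dev/md0 --add /dev/sde1
# mdadm /dev/md0 --replace /dev/sdc1 --with /dev/sde1
```

After the copy finishes, the old member (now marked faulty) can be taken out with `mdadm /dev/md0 --remove /dev/sdc1`.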

Whether or not timeouts played a part in your RAID failing is not important. 
It failed because you didn't notice the broken disks in time, and you had 
two of them. What is important is testing, monitoring, and actually acting 
on the first error.
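The usual monitoring tools are `mdadm --monitor` and smartd. As an illustration of what "noticing in time" means at the md level, here is a minimal, hypothetical check of /proc/mdstat text. It assumes the common layout where failed members are tagged "(F)" on the array line and the status line ends in "[total/active] [U_U]"; the sample is made up, not taken from this thread.

```python
import re


def degraded_arrays(mdstat: str) -> dict[str, dict]:
    """Report md arrays with failed members or missing devices."""
    problems: dict[str, dict] = {}
    current = None
    for line in mdstat.splitlines():
        header = re.match(r"^(md\w+) : ", line)
        if header:
            current = header.group(1)
            # Members marked (F) have been kicked out as faulty.
            failed = re.findall(r"(\w+)\[\d+\]\(F\)", line)
            if failed:
                problems.setdefault(current, {})["failed"] = failed
            continue
        # Status line, e.g. "... [3/2] [U_U]": wanted vs. active devices.
        status = re.search(r"\[(\d+)/(\d+)\] \[([U_]+)\]", line)
        if current and status:
            want, have = int(status.group(1)), int(status.group(2))
            if have < want:
                problems.setdefault(current, {})["missing"] = want - have
            current = None
    return problems


SAMPLE_MDSTAT = """\
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdd1[3] sdc1[1](F) sdb1[0]
      5860270080 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [U_U]

unused devices: <none>
"""

print(degraded_arrays(SAMPLE_MDSTAT))  # {'md0': {'failed': ['sdc1'], 'missing': 1}}
```

A check like this only sees disks md has already kicked; catching the first bad sector still needs SMART monitoring and regular scrubs.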

People have different opinions on this, and someone might argue otherwise. 
It's up to you what risks to take.

Regards
Andreas Klauer
