From: Wols Lists <antlists@youngman.org.uk>
To: Nix <nix@esperi.org.uk>, Chris Murphy <lists@colorremedies.com>
Cc: David Brown <david.brown@hesbynett.no>,
"Ravi (Tom) Hale" <ravi@hale.ee>,
Linux-RAID <linux-raid@vger.kernel.org>
Subject: Re: Fault tolerance with badblocks
Date: Tue, 9 May 2017 21:52:39 +0100
Message-ID: <59122C17.8010801@youngman.org.uk>
In-Reply-To: <87fugd4wdv.fsf@esperi.org.uk>
On 09/05/17 21:18, Nix wrote:
> (Neil said: "Similarly a RAID6 with inconsistent P and Q could well not
> be able to identify a single block which is "wrong" and even if it could
> there is a small possibility that the identified block isn't wrong, but
> the other blocks are all inconsistent in such a way as to accidentally
> point to it. The probability of this is rather small, but it is
> non-zero". As far as I can tell the probability of this is exactly the
> same as that of multiple read errors in a single stripe -- possibly far
> lower, if you need not only multiple wrong P and Q values but *precisely
> mis-chosen* ones. If that wasn't acceptably rare, you wouldn't be using
> RAID-6 to begin with.
This to me is the crux of the argument.
What is the probability of CORRECTLY identifying a single-disk error?
What is the probability of MISTAKING a multi-disk error for a
single-disk error?
My gut instinct is that the second scenario is much less likely. So, in
that case, the current setup DELIBERATELY CORRUPTS recoverable data
because of the TINY risk that we might have got it wrong. Picking
probabilities at random, let's say the first probability is 99 in a
hundred, the second is one in a thousand.
On a four-disk raid-6, that means we're throwing away about 500 chances
of recovering the correct data, so that on one occasion we can avoid
corruption. To me that's an insane trade-off.
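
To make the trade-off concrete, here is a small sketch of the arithmetic using the made-up probabilities above. The function name and the event counts are assumptions for illustration only; the message does not say how often single-disk and multi-disk errors occur relative to each other, so the sketch takes those rates as inputs.

```python
def recoveries_per_miscorrection(p_correct, p_wrong, single_events, multi_events):
    """Ratio of correct repairs to wrong "repairs".

    p_correct     -- chance a single-disk error is correctly identified
    p_wrong       -- chance a multi-disk error is mistaken for a single-disk one
    single_events -- assumed number of single-disk error events
    multi_events  -- assumed number of multi-disk error events
    """
    good = p_correct * single_events   # data we could have recovered
    bad = p_wrong * multi_events       # data we would have wrongly "fixed"
    return good / bad


# Assumption: both kinds of event are equally frequent.
ratio = recoveries_per_miscorrection(0.99, 0.001, 1000, 1000)
print(f"about {ratio:.0f} correct repairs per wrong one")
```

With equal event rates the ratio comes out near 990 rather than 500, and it grows further if multi-disk errors are rarer than single-disk ones. Either way the argument is the same: many recoverable stripes are given up to avoid one bad repair.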
Neil goes on about "what if a write fails? What if the power goes down?
What if what if?" Those are the wrong questions!!! The correct question
is "can we tell the difference between a single-disk failure and a
multi-disk failure?". We don't care what *caused* that failure.
If the power goes down and only the first disk in a stripe is written,
we can correct it back to what it was. If only the last disk failed to
be written, we can correct it back to what it should have been. If at
least two disks are written and at least two disks are not, CAN WE
DETECT THAT? Surely we can - we don't care how many disks are or aren't
written - in that scenario surely all the parities mess up. In which
case we give up and say "corrupt data". Which is no different from at
present other than at present we fix the parity and pretend nothing is
wrong :-(
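
The check described above can be sketched with the usual RAID-6 arithmetic: P is the XOR of the data blocks and Q is a weighted sum in GF(2^8) (polynomial 0x11d, generator 2). This is a minimal illustration, not md's code; the function names and the tuple-style verdicts are made up for the example. For each byte position it compares stored and recomputed P and Q. If only one of them is off, that parity block is blamed; if both are off, the ratio of the two differences points at a disk index, and an index outside the array means more than one disk must be wrong.

```python
# GF(2^8) log/antilog tables: polynomial 0x11d, generator 2.
EXP = [0] * 510
LOG = [0] * 256
_x = 1
for _i in range(255):
    EXP[_i] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11d
for _i in range(255, 510):
    EXP[_i] = EXP[_i - 255]


def gf_mul(a, b):
    """Multiply two bytes in GF(2^8)."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def compute_pq(blocks):
    """Compute the P and Q blocks for a list of equal-length data blocks."""
    n = len(blocks[0])
    p, q = bytearray(n), bytearray(n)
    for i, blk in enumerate(blocks):
        for pos, d in enumerate(blk):
            p[pos] ^= d
            q[pos] ^= gf_mul(EXP[i], d)   # weight of disk i is 2**i
    return bytes(p), bytes(q)


def classify_stripe(blocks, p, q):
    """Return ("ok"|"P"|"Q"|"data"|"multi", disk index or None)."""
    calc_p, calc_q = compute_pq(blocks)
    verdicts = set()
    for pos in range(len(p)):
        ps = p[pos] ^ calc_p[pos]         # P difference at this byte
        qs = q[pos] ^ calc_q[pos]         # Q difference at this byte
        if ps == 0 and qs == 0:
            continue                      # this byte is consistent
        if qs == 0:
            verdicts.add(("P", None))     # only P disagrees
        elif ps == 0:
            verdicts.add(("Q", None))     # only Q disagrees
        else:
            z = (LOG[qs] - LOG[ps]) % 255 # candidate data disk index
            if z < len(blocks):
                verdicts.add(("data", z))
            else:
                verdicts.add(("multi", None))
    if not verdicts:
        return ("ok", None)
    if len(verdicts) == 1:
        return verdicts.pop()
    return ("multi", None)                # bytes disagree about the culprit


# Four-disk RAID-6: two data blocks plus P and Q.
blocks = [bytes([10, 20, 30]), bytes([40, 50, 60])]
p, q = compute_pq(blocks)

one_bad = [blocks[0], bytes([40 ^ 7, 50, 60 ^ 9])]
print(classify_stripe(one_bad, p, q))     # ('data', 1)

two_bad = [bytes([10 ^ 1, 20, 30]), bytes([40 ^ 2, 50, 60])]
print(classify_stripe(two_bad, p, q))     # ('multi', None)
```

A ("data", z) verdict can be repaired by XORing the P difference into block z. The check is not foolproof, which is Neil's point: if both data blocks above are hit with the same error (say both XORed with 1), the P difference cancels out and the stripe is blamed on Q. The open question is how often that kind of coincidence happens compared with a genuine single-disk error.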
The problem is that at present we fix the parity and pretend nothing is
wrong when the reality is we *could* have corrected the data, if we
could have been bothered.
So we have to write an mdfsck. Okay. So we have to make sure that no
filesystems on the array are mounted. Okay, that's a bit harder. So we
have to assume that sysadmins are sensible beings who don't screw things
up - okay that's a lot harder :-) But we shouldn't be throwing away LOTS
of data that's easy to recover, because we MIGHT "recover" data that's
wrong.
Yes, yes, I know - code welcome ... :-)
Cheers,
Wol