From: Phil Turmel <philip@turmel.org>
To: Chris Murphy <lists@colorremedies.com>
Cc: "linux-raid@vger.kernel.org List" <linux-raid@vger.kernel.org>
Subject: Re: Questions about bitrot and RAID 5/6
Date: Fri, 24 Jan 2014 12:03:43 -0500
Message-ID: <52E29CEF.7030408@turmel.org>
In-Reply-To: <0466C42D-F51E-41D1-B220-F38EB43C1A38@colorremedies.com>
On 01/24/2014 11:11 AM, Chris Murphy wrote:
>
> On Jan 24, 2014, at 6:22 AM, Phil Turmel <philip@turmel.org> wrote:
>>
>> No, they aren't improbable. That's my point. For consumer drives, you
>> can expect a new URE every 12T or so read, on average.
>
> - Define URE.
Unrecoverable Read Error. Also known as a non-recoverable read error or
an uncorrectable read.
> Western Digital, HGST, and Seagate don't use the term URE/unrecoverable read error. They use, respectively:
>
> non-recoverable read error per bits read
> error rate, non-recoverable, per bits read
> nonrecoverable Read Errors per Bits Read, Max
>
> These are all identical terms?
These are statements about *rates* of UREs. But yes, identical.
> - How does the URE manifest? That is, does the drive always report a read error such as this?
>
> ata3.00: cmd c8/00:08:55:e8:8d/00:00:00:00:00/e2 tag 0 dma 4096 in
> es 51/40:00:56:e8:8d/00:00:00:00:00/02 Emask 0x9 (media error)
> ata3.00: status: { DRDY ERR }
> ata3.00: error: { UNC }
Yes. I'm not sure if { DRDY ERR } is always present.
> Or does URE include silent data corruption, and disk failure?
No, and no.
> - How many bits of loss occur with one URE?
Complete physical sector. The error correction codes on the market
operate on entire physical sectors. Once the correcting capacity of the
code is exceeded, the math involved can no longer identify which bits in
the sector were corrupted, so the whole sector must be declared unknown.
Google "Reed-Solomon" for an introduction to such codes.
>> Your comments suggest you've completely discounted the fact that
>> published URE rates are now close to, or within, drive capacities.
>>
>> Spend some time with the math and you will be very concerned.
>
> Yeah I tried that a year ago and when it came to really super basic questions, no one was willing to answer them and the thread died as if we don't actually know what we're talking about. So I think some rather basic definitions are in order and an agreement that we don't get to redefine mathematics by saying a max error rate is a mean.
>
> http://www.spinics.net/lists/raid/msg41669.html
I participated in that thread. Some of your comments there imply that
the math is simple. It's not (unless you are a whiz with statistics).
Look at the Poisson distribution I referenced and the computation
examples I gave.
Note that a statement about the rate of a randomly occurring error is
implicitly stating an average. The specification sheets state that the
rate (an average) will not exceed (max) a certain value within the
warranted life of the drive. Two UREs occurring much less than 10^14
bits apart don't violate the spec. A long series of UREs averaging out
to less than 10^14 bits apart would be a violation.
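As a rough illustration of the Poisson reasoning, here is a minimal
Python sketch. It assumes UREs arrive independently at exactly the
spec-sheet rate of one per 10^14 bits read, which is a simplification,
and the function names are mine, purely for illustration.

```python
import math

# Assumption: the spec-sheet figure of one URE per 10^14 bits read,
# treated as the mean rate of a Poisson process.
SPEC_BITS_PER_URE = 1e14


def expected_ures(bytes_read, bits_per_ure=SPEC_BITS_PER_URE):
    """Mean number of UREs expected while reading bytes_read bytes."""
    return bytes_read * 8 / bits_per_ure


def p_at_least_one_ure(bytes_read, bits_per_ure=SPEC_BITS_PER_URE):
    """Poisson probability of one or more UREs: 1 - exp(-mean)."""
    return -math.expm1(-expected_ures(bytes_read, bits_per_ure))


def p_gap_shorter_than(gap_bits, bits_per_ure=SPEC_BITS_PER_URE):
    """Probability that two consecutive UREs fall within gap_bits.

    In a Poisson process the gaps between events are exponentially
    distributed, so short gaps are unusual but not forbidden.
    """
    return -math.expm1(-gap_bits / bits_per_ure)


if __name__ == "__main__":
    for terabytes in (1, 4, 12.5):
        p = p_at_least_one_ure(terabytes * 1e12)
        print(f"{terabytes:>5} TB read: P(at least one URE) = {p:.3f}")
    p_close = p_gap_shorter_than(1e13)
    print(f"P(next URE within 10^13 bits) = {p_close:.3f}")
```

Under these assumptions, reading 12.5 TB (10^14 bits) gives a mean of
one URE and roughly a 63% chance of hitting at least one. About one gap
in ten would be shorter than 10^13 bits while the long-run average
still meets the spec.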
Note that the rate does change over time. A brand new drive in good
condition can have a rate much less than the per 10^14 bits spec. But a
drive that is approaching or past its warranty life can be expected to
be close to it. (Otherwise, marketing pressure would push the
manufacturers to claim the better performance.)
Regards,
Phil