* On the subject of RAID-6 corruption recovery
@ 2007-12-28 2:58 H. Peter Anvin
2007-12-28 14:38 ` Bill Davidsen
2008-01-04 23:59 ` Thiemo Nagel
0 siblings, 2 replies; 13+ messages in thread
From: H. Peter Anvin @ 2007-12-28 2:58 UTC (permalink / raw)
To: Linux RAID Mailing List
I got a private email a while ago from Thiemo Nagel claiming that some
of the conclusions in my RAID-6 paper were incorrect. This was combined
with a "proof" which was plain wrong, and could easily be disproven
using basic entropy accounting (i.e. how much information is around to
play with).
However, it did cause me to clarify the text portion a little bit. In
particular, *in practice* it may be possible to *probabilistically*
detect multidisk corruption. Probabilistic detection means that the
detection is not guaranteed, but it can be taken advantage of
opportunistically.
In particular, if you follow the algorithm of section 4 of my paper, you
end up with a corrupt disk number, but the result is a vector, not a
scalar. This is because the algorithm is executed on the P* and Q*
error vectors on a byte by byte basis.
In the common case of a single disk corruption, what you will typically
see is an error pattern that has a consistent value interrupted by
correct bytes (P* = Q* = {00}); these are bytes where the corrupted data
happened to match the correct value by chance. For the z values which
can be computed (recall, z is only well-defined if both P* and Q* are
!= {00}), they should match.
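As a rough illustration of the per-byte step described above, here is a minimal Python sketch. It assumes the field GF(2^8) with polynomial 0x11d and generator {02}, and that a single corrupt data disk z produces Q* = g^z * P*; the names `build_log_table` and `classify_byte` are made up for this example and are not taken from the paper or from any existing code.

```python
# Assumption: GF(2^8) with polynomial 0x11d and generator {02}.
GF_POLY = 0x11D

def build_log_table():
    """Return log[x] such that {02}**log[x] == x for x in 1..255."""
    log = [None] * 256
    x = 1
    for i in range(255):
        log[x] = i
        x <<= 1              # multiply by the generator {02}
        if x & 0x100:
            x ^= GF_POLY     # reduce modulo the field polynomial
    return log

GF_LOG = build_log_table()

def classify_byte(p_star, q_star):
    """Classify one byte position from its P* and Q* error bytes.

    Returns "none", "P", "Q", or an integer z (the implied data disk).
    """
    if p_star == 0 and q_star == 0:
        return "none"
    if q_star == 0:
        return "P"           # only P differs: P itself is suspect
    if p_star == 0:
        return "Q"           # only Q differs: Q itself is suspect
    # Both nonzero: Q*/P* = g**z, so z = log(Q*) - log(P*) mod 255.
    return (GF_LOG[q_star] - GF_LOG[p_star]) % 255
```

Running `classify_byte` over every byte position of a sector gives the vector of results mentioned above, one entry per byte.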
There are two patterns which are likely to indicate multi-disk
corruption and where recovery software should trip out and raise hell:
* z >= n: the computed error disk doesn't exist.
Obviously, if "the corrupt disk" is a disk that can't exist, we
have a bigger problem.
This is probabilistic, since as n approaches 255, the
probability of detection goes to zero.
* Inconsistent z numbers (or spurious P and Q references)
If the calculation for which disk is corrupt jumps around
within a single sector, there is likely a problem.
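The two patterns above could be checked along the following lines. This is only a sketch: it assumes each byte position of the sector has already been classified as "none", "P", "Q", or an integer z, takes n to be the number of data disks, and uses a hypothetical function name and verdict strings.

```python
def check_sector(outcomes, n):
    """Summarize the per-byte outcomes of one sector.

    outcomes: iterable of "none", "P", "Q", or an int z per byte position.
    n: number of data disks (assumed meaning of n in the z >= n test).
    """
    # Bytes with P* = Q* = {00} carry no information about the fault.
    seen = {o for o in outcomes if o != "none"}
    if not seen:
        return "clean"
    # Pattern 1: the computed error disk does not exist.
    if any(isinstance(o, int) and o >= n for o in seen):
        return "multi-disk suspected: z >= n"
    # Pattern 2: z jumps around, or P/Q references are mixed in.
    if len(seen) > 1:
        return "multi-disk suspected: inconsistent"
    return "single fault: " + str(next(iter(seen)))
```

A verdict of "single fault" here is still only probabilistic; the check can miss multi-disk corruption, it just cannot confirm a single fault with certainty.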
It's worth noting in all of this that there are 258 possible outcomes of
the complete error analysis algorithm - 255 possible D errors (z
values), P error, Q error, and no error. If these are to be analyzed as
an array, it can't be solely a byte array.
That this set is complete is shown by the fact that out of 65536
possible (P, Q) states, this corresponds to:
1 state no error
255 states P error (the 256th state is a no-error state!)
255 states Q error
255*255 states D error (n = 255 is maximum for byte-oriented RAID-6)
... for a total of 65536 states.
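The state count above can be double-checked by brute force. The short Python sketch below simply enumerates all (P*, Q*) byte pairs and buckets them by which of the two is zero; the bucket labels are illustrative only.

```python
from collections import Counter

counts = Counter()
for p_star in range(256):
    for q_star in range(256):
        if p_star == 0 and q_star == 0:
            counts["no error"] += 1
        elif q_star == 0:
            counts["P error"] += 1   # P* nonzero, Q* zero
        elif p_star == 0:
            counts["Q error"] += 1   # Q* nonzero, P* zero
        else:
            counts["D error"] += 1   # both nonzero: some z in 0..254

total = sum(counts.values())
```

The four buckets come out as 1, 255, 255 and 255*255 = 65025, which add up to 65536.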
-hpa
* Re: On the subject of RAID-6 corruption recovery
2007-12-28 2:58 On the subject of RAID-6 corruption recovery H. Peter Anvin
@ 2007-12-28 14:38 ` Bill Davidsen
2007-12-28 17:34 ` H. Peter Anvin
2008-01-04 23:59 ` Thiemo Nagel
1 sibling, 1 reply; 13+ messages in thread
From: Bill Davidsen @ 2007-12-28 14:38 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: Linux RAID Mailing List
H. Peter Anvin wrote:
> I got a private email a while ago from Thiemo Nagel claiming that some
> of the conclusions in my RAID-6 paper were incorrect. This was
> combined with a "proof" which was plain wrong, and could easily be
> disproven using basic entropy accounting (i.e. how much information
> is around to play with).
>
> However, it did cause me to clarify the text portion a little bit. In
> particular, *in practice* it may be possible to *probabilistically*
> detect multidisk corruption. Probabilistic detection means that the
> detection is not guaranteed, but it can be taken advantage of
> opportunistically.
If this means that there can be no false positives for multidisk
corruption but may be false negatives, fine. If it means something else,
please restate one more time.
--
Bill Davidsen <davidsen@tmr.com>
"Woe unto the statesman who makes war without a reason that will still
be valid when the war is over..." Otto von Bismarck
* Re: On the subject of RAID-6 corruption recovery
2007-12-28 14:38 ` Bill Davidsen
@ 2007-12-28 17:34 ` H. Peter Anvin
0 siblings, 0 replies; 13+ messages in thread
From: H. Peter Anvin @ 2007-12-28 17:34 UTC (permalink / raw)
To: Bill Davidsen; +Cc: Linux RAID Mailing List
Bill Davidsen wrote:
> H. Peter Anvin wrote:
>> I got a private email a while ago from Thiemo Nagel claiming that some
>> of the conclusions in my RAID-6 paper were incorrect. This was
>> combined with a "proof" which was plain wrong, and could easily be
>> disproven using basic entropy accounting (i.e. how much information
>> is around to play with).
>>
>> However, it did cause me to clarify the text portion a little bit. In
>> particular, *in practice* it may be possible to *probabilistically*
>> detect multidisk corruption. Probabilistic detection means that the
>> detection is not guaranteed, but it can be taken advantage of
>> opportunistically.
>
> If this means that there can be no false positives for multidisk
> corruption but may be false negatives, fine. If it means something else,
> please restate one more time.
>
Pretty much. False negatives are quite serious, since they will imply a
course of action which will introduce further corruption.
-hpa
* Re: On the subject of RAID-6 corruption recovery
2007-12-28 2:58 On the subject of RAID-6 corruption recovery H. Peter Anvin
2007-12-28 14:38 ` Bill Davidsen
@ 2008-01-04 23:59 ` Thiemo Nagel
2008-01-05 0:03 ` H. Peter Anvin
1 sibling, 1 reply; 13+ messages in thread
From: Thiemo Nagel @ 2008-01-04 23:59 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: Linux RAID Mailing List
Dear hpa,
H. Peter Anvin wrote:
> I got a private email a while ago from Thiemo Nagel claiming that
> some of the conclusions in my RAID-6 paper were incorrect. This was
> combined with a "proof" which was plain wrong, and could easily be
> disproven using basic entropy accounting (i.e. how much information
> is around to play with).
>
> However, it did cause me to clarify the text portion a little bit. In
> particular, *in practice* it may be possible to *probabilistically*
> detect multidisk corruption. Probabilistic detection means that the
> detection is not guaranteed, but it can be taken advantage of
> opportunistically.
Thank you very much for setting me straight concerning some of my
misconceptions about raid 6. Yet, the point that I was trying to make
was that the statement "multidisc corruption cannot be detected" -- while
correct in a mathematical sense -- is misleading when considering
practical application, and I feel confirmed in that by your reply.
> There are two patterns which are likely to indicate multi-disk
> corruption and where recovery software should trip out and raise
> hell:
>
> * z >= n: the computed error disk doesn't exist.
>
> Obviously, if "the corrupt disk" is a disk that can't exist, we have
> a bigger problem.
>
> This is probabilistic, since as n approaches 255, the probability of
> detection goes to zero.
>
> * Inconsistent z numbers (or spurious P and Q references)
>
> If the calculation for which disk is corrupt jumps around within a
> single sector, there is likely a problem.
Inverting your argumentation, that means when we don't see z >= n or
inconsistent z numbers, multidisc corruption can be excluded statistically.
For errors occurring on the level of hard disk blocks (signature: most
bytes of the block have D errors, all with same z), the probability for
multidisc corruption to go undetected is ((n-1)/256)**512. This might
pose a problem in the limiting case of n=255, however for practical
applications, this probability is negligible as it drops off
exponentially with decreasing n:
n=255 p=1.8%
n=250 p=6.8e-7
n=240 p=5.3e-16
n=10 p=3.6e-745
So it seems to me that for that case, implementing recovery would be
safe (maybe limit it to n<240).
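The figures in the table can be reproduced with a few lines of Python. Since ((n-1)/256)**512 underflows a double for small n, the sketch below works in log10 instead; the function names are illustrative, and the formula is taken as given, including its assumption of randomly distributed data.

```python
import math

def log10_undetected(n, sector_bytes=512):
    """log10 of ((n-1)/256)**sector_bytes, computed without underflow."""
    return sector_bytes * math.log10((n - 1) / 256)

def as_sci(log10_p):
    """Format 10**log10_p as a mantissa/exponent string."""
    exponent = math.floor(log10_p)
    mantissa = 10 ** (log10_p - exponent)
    return f"{mantissa:.1f}e{exponent}"

for n in (255, 250, 240, 10):
    print(f"n={n} p={as_sci(log10_undetected(n))}")
```

Working in logarithms also makes it easy to compare the result against a chosen threshold, for example requiring log10_undetected(n) below some negative bound before attempting recovery.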
For errors occurring on the byte level (signature: only one byte of a
sector has a D error, all other bytes have no error), multidisc corruption
is highly unlikely for a different reason: since 511 out of 512 bytes
are OK, it can be concluded that errors in this specific sector are not
correlated between the individual disks. That means that the
probability of double corruption is approximately 8*(n-1)*BER, and the
bit error rate (BER) should be low. (For comparison: some vendors
specify 1e-15 as the probability of an unrecoverable read error per bit
read. I'd assume that the probability of silent read errors is much
lower, at least for the disk itself; however, additional errors might
be introduced in the (S)ATA transfer or in the controller.)
For that case, too, it seems to me that implementing recovery could do
no harm.
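To put a number on the byte-level estimate, here is a tiny Python sketch of 8*(n-1)*BER. The BER of 1e-15 used in the example is only the vendor figure for unrecoverable reads mentioned above, plugged in as a placeholder; the real rate of silent errors is not known here, and the function name is made up.

```python
def double_corruption_estimate(n, ber):
    """Approximate chance that the same byte position is also corrupt
    on one of the other n-1 disks: 8 bits * (n-1) disks * BER."""
    return 8 * (n - 1) * ber

# Placeholder inputs: 10 disks, BER of 1e-15 per bit read.
estimate = double_corruption_estimate(10, 1e-15)
```

With these placeholder inputs the estimate is on the order of 1e-13, and it scales linearly with both n and the assumed BER.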
Kind regards,
Thiemo Nagel
* Re: On the subject of RAID-6 corruption recovery
2008-01-04 23:59 ` Thiemo Nagel
@ 2008-01-05 0:03 ` H. Peter Anvin
2008-01-05 0:41 ` Thiemo Nagel
0 siblings, 1 reply; 13+ messages in thread
From: H. Peter Anvin @ 2008-01-05 0:03 UTC (permalink / raw)
To: Thiemo Nagel; +Cc: Linux RAID Mailing List
Thiemo Nagel wrote:
>
> Inverting your argumentation, that means when we don't see z >= n or
> inconsistent z numbers, multidisc corruption can be excluded statistically.
>
> For errors occurring on the level of hard disk blocks (signature: most
> bytes of the block have D errors, all with same z), the probability for
> multidisc corruption to go undetected is ((n-1)/256)**512. This might
> pose a problem in the limiting case of n=255, however for practical
> applications, this probability is negligible as it drops off
> exponentially with decreasing n:
>
That assumes fully random data distribution, which is almost certainly a
false assumption.
-hpa
* Re: On the subject of RAID-6 corruption recovery
2008-01-05 0:03 ` H. Peter Anvin
@ 2008-01-05 0:41 ` Thiemo Nagel
2008-01-05 0:45 ` H. Peter Anvin
0 siblings, 1 reply; 13+ messages in thread
From: Thiemo Nagel @ 2008-01-05 0:41 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: Linux RAID Mailing List
>> For errors occurring on the level of hard disk blocks (signature: most
>> bytes of the block have D errors, all with same z), the probability for
>> multidisc corruption to go undetected is ((n-1)/256)**512. This might
>> pose a problem in the limiting case of n=255, however for practical
>> applications, this probability is negligible as it drops off
>> exponentially with decreasing n:
>>
>
> That assumes fully random data distribution, which is almost certainly a
> false assumption.
Agreed. This means, that the formula only serves to specify a lower limit
to the probability. However, is there an argumentation, why a pathologic
case would be probable, i.e. why the probability would be likely to
*vastly* deviate from the theoretical limit? And if there is, would that
argumentation not apply to other raid 6 operations (like "check") also?
And would it help to use different Galois field generators at different
positions in a sector instead of using a uniform generator?
Kind regards,
Thiemo Nagel
* Re: On the subject of RAID-6 corruption recovery
2008-01-05 0:41 ` Thiemo Nagel
@ 2008-01-05 0:45 ` H. Peter Anvin
2008-01-05 1:25 ` Thiemo Nagel
2008-01-07 9:28 ` Thiemo Nagel
0 siblings, 2 replies; 13+ messages in thread
From: H. Peter Anvin @ 2008-01-05 0:45 UTC (permalink / raw)
To: Thiemo Nagel; +Cc: Linux RAID Mailing List
Thiemo Nagel wrote:
>>> For errors occurring on the level of hard disk blocks (signature: most
>>> bytes of the block have D errors, all with same z), the probability for
>>> multidisc corruption to go undetected is ((n-1)/256)**512. This might
>>> pose a problem in the limiting case of n=255, however for practical
>>> applications, this probability is negligible as it drops off
>>> exponentially with decreasing n:
>>>
>> That assumes fully random data distribution, which is almost certainly a
>> false assumption.
>
> Agreed. This means, that the formula only serves to specify a lower limit
> to the probability. However, is there an argumentation, why a pathologic
> case would be probable, i.e. why the probability would be likely to
> *vastly* deviate from the theoretical limit? And if there is, would that
> argumentation not apply to other raid 6 operations (like "check") also?
> And would it help to use different Galois field generators at different
> positions in a sector instead of using a uniform generator?
>
What you call "pathologic" cases when it comes to real-world data are
very common. It is not at all unusual to find sectors filled with only
a constant (usually zero, but not always), in which case your **512
becomes **1.
It doesn't mean it's not worthwhile, but don't try to claim it is
anything other than opportunistic.
-hpa
* Re: On the subject of RAID-6 corruption recovery
2008-01-05 0:45 ` H. Peter Anvin
@ 2008-01-05 1:25 ` Thiemo Nagel
2008-01-05 1:49 ` H. Peter Anvin
2008-01-07 9:28 ` Thiemo Nagel
1 sibling, 1 reply; 13+ messages in thread
From: Thiemo Nagel @ 2008-01-05 1:25 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: Linux RAID Mailing List
> Thiemo Nagel wrote:
>>>> For errors occurring on the level of hard disk blocks (signature: most
>>>> bytes of the block have D errors, all with same z), the probability
>>>> for
>>>> multidisc corruption to go undetected is ((n-1)/256)**512. This might
>>>> pose a problem in the limiting case of n=255, however for practical
>>>> applications, this probability is negligible as it drops off
>>>> exponentially with decreasing n:
>>>>
>>> That assumes fully random data distribution, which is almost certainly
>>> a
>>> false assumption.
>>
>> Agreed. This means, that the formula only serves to specify a lower
>> limit
>> to the probability. However, is there an argumentation, why a
>> pathologic
>> case would be probable, i.e. why the probability would be likely to
>> *vastly* deviate from the theoretical limit? And if there is, would
>> that
>> argumentation not apply to other raid 6 operations (like "check") also?
>> And would it help to use different Galois field generators at different
>> positions in a sector instead of using a uniform generator?
>>
>
> What you call "pathologic" cases when it comes to real-world data are
> very common. It is not at all unusual to find sectors filled with only
> a constant (usually zero, but not always), in which case your **512
> becomes **1.
That's why I was asking about the generator. Theoretically, this
situation might be countered by using a (pseudo-)random pattern of
generators for the different bytes of a sector, though I'm not sure
whether it is worth the effort.
Kind regards,
Thiemo Nagel
* Re: On the subject of RAID-6 corruption recovery
2008-01-05 1:25 ` Thiemo Nagel
@ 2008-01-05 1:49 ` H. Peter Anvin
0 siblings, 0 replies; 13+ messages in thread
From: H. Peter Anvin @ 2008-01-05 1:49 UTC (permalink / raw)
To: Thiemo Nagel; +Cc: Linux RAID Mailing List
Thiemo Nagel wrote:
>
> That's why I was asking about the generator. Theoretically, this
> situation might be countered by using a (pseudo-)random pattern of
> generators for the different bytes of a sector, though I'm not sure
> whether it is worth the effort.
>
Changing the generator is mathematically equivalent to changing the
order of the drives, so no, that wouldn't help (and would make the
common computations a lot more expensive.)
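One way to see this is sketched below in Python, under the assumption of GF(2^8) with polynomial 0x11d and generator g = {02}. Any other generator of the multiplicative group is a power g**k with k coprime to 255, so disk i under g**k gets the same coefficient that slot (k*i) mod 255 would get under g. The example uses k = 2, i.e. generator {04}; the helper names are illustrative, and changing the field polynomial itself is not covered.

```python
GF_POLY = 0x11D  # assumed field polynomial

def gf_mul(a, b):
    """Multiply in GF(2^8) modulo GF_POLY (shift-and-add)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= GF_POLY
        b >>= 1
    return result

def gf_pow(base, exponent):
    """Raise base to a non-negative integer power in GF(2^8)."""
    result = 1
    for _ in range(exponent):
        result = gf_mul(result, base)
    return result

def slot_under_g(i, k):
    """Slot whose coefficient under g equals disk i's under g**k."""
    return (k * i) % 255

g, k = 2, 2
g_alt = gf_pow(g, k)  # the alternative generator {04}

# Every coefficient under g_alt is a coefficient under g at a remapped slot.
remapped = all(
    gf_pow(g_alt, i) == gf_pow(g, slot_under_g(i, k)) for i in range(255)
)
# The remapping is a permutation of the 255 possible slots.
is_permutation = sorted(slot_under_g(i, k) for i in range(255)) == list(range(255))
```

Note that for fewer than 255 disks the remapped slots can land beyond n, so the equivalence is to a reordering within the full set of 255 possible disk positions.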
-hpa
* Re: On the subject of RAID-6 corruption recovery
2008-01-05 0:45 ` H. Peter Anvin
2008-01-05 1:25 ` Thiemo Nagel
@ 2008-01-07 9:28 ` Thiemo Nagel
2008-01-07 9:58 ` Mattias Wadenstein
1 sibling, 1 reply; 13+ messages in thread
From: Thiemo Nagel @ 2008-01-07 9:28 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: Linux RAID Mailing List
> What you call "pathologic" cases when it comes to real-world data are
> very common. It is not at all unusual to find sectors filled with only
> a constant (usually zero, but not always), in which case your **512
> becomes **1.
Of course it would be easy to check how many of the 512 Bytes are really
different on a case-by-case basis and correct the exponent accordingly,
and only perform the recovery when the corrected probability of
introducing an error is sufficiently low.
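A possible shape for such a check is sketched below in Python. It is only one reading of the idea: it counts the distinct nonzero (P*, Q*) error pairs in a sector and uses that count in place of 512 as the exponent. Whether this is the right measure of "really different" bytes is an assumption, as is the function name.

```python
import math

def corrected_log10_bound(p_star, q_star, n):
    """Return (effective_exponent, log10 of ((n-1)/256)**effective_exponent).

    p_star, q_star: per-byte error values for one sector.
    Identical (P*, Q*) pairs yield the same z, so they are counted once.
    """
    distinct = {(p, q) for p, q in zip(p_star, q_star) if p and q}
    exponent = len(distinct)
    return exponent, exponent * math.log10((n - 1) / 256)

# A sector corrupted with a constant pattern gives an exponent of 1.
constant_case = corrected_log10_bound([5] * 512, [9] * 512, 10)
```

Recovery would then be attempted only when the second value falls below a chosen threshold, which a constant-filled sector will typically fail.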
Kind regards,
Thiemo Nagel
* Re: On the subject of RAID-6 corruption recovery
2008-01-07 9:28 ` Thiemo Nagel
@ 2008-01-07 9:58 ` Mattias Wadenstein
2008-01-07 11:10 ` Thiemo Nagel
2008-01-07 17:20 ` H. Peter Anvin
0 siblings, 2 replies; 13+ messages in thread
From: Mattias Wadenstein @ 2008-01-07 9:58 UTC (permalink / raw)
To: Thiemo Nagel; +Cc: H. Peter Anvin, Linux RAID Mailing List
On Mon, 7 Jan 2008, Thiemo Nagel wrote:
>> What you call "pathologic" cases when it comes to real-world data are very
>> common. It is not at all unusual to find sectors filled with only a
>> constant (usually zero, but not always), in which case your **512 becomes
>> **1.
>
> Of course it would be easy to check how many of the 512 Bytes are really
> different on a case-by-case basis and correct the exponent accordingly, and
> only perform the recovery when the corrected probability of introducing an
> error is sufficiently low.
What is the alternative to recovery, really? Just erroring out and letting
the admin deal with it, or blindly assume that the parity is wrong?
/Mattias Wadenstein
* Re: On the subject of RAID-6 corruption recovery
2008-01-07 9:58 ` Mattias Wadenstein
@ 2008-01-07 11:10 ` Thiemo Nagel
2008-01-07 17:20 ` H. Peter Anvin
1 sibling, 0 replies; 13+ messages in thread
From: Thiemo Nagel @ 2008-01-07 11:10 UTC (permalink / raw)
To: Mattias Wadenstein; +Cc: H. Peter Anvin, Linux RAID Mailing List
Mattias Wadenstein wrote:
> On Mon, 7 Jan 2008, Thiemo Nagel wrote:
>
>>> What you call "pathologic" cases when it comes to real-world data are
>>> very common. It is not at all unusual to find sectors filled with
>>> only a constant (usually zero, but not always), in which case your
>>> **512 becomes **1.
>>
>> Of course it would be easy to check how many of the 512 Bytes are
>> really different on a case-by-case basis and correct the exponent
>> accordingly, and only perform the recovery when the corrected
>> probability of introducing an error is sufficiently low.
>
> What is the alternative to recovery, really? Just erroring out and
> letting the admin deal with it, or blindly assume that the parity is wrong?
Currently, 'repair' does blind recalculation of parity. The only
benefit of that is (correct me if I'm wrong) to ensure that repeated
reads return identical data.
The last time I checked, there was not even a warning message.
Kind regards,
Thiemo Nagel
* Re: On the subject of RAID-6 corruption recovery
2008-01-07 9:58 ` Mattias Wadenstein
2008-01-07 11:10 ` Thiemo Nagel
@ 2008-01-07 17:20 ` H. Peter Anvin
1 sibling, 0 replies; 13+ messages in thread
From: H. Peter Anvin @ 2008-01-07 17:20 UTC (permalink / raw)
To: Mattias Wadenstein; +Cc: Thiemo Nagel, Linux RAID Mailing List
Mattias Wadenstein wrote:
> On Mon, 7 Jan 2008, Thiemo Nagel wrote:
>
>>> What you call "pathologic" cases when it comes to real-world data are
>>> very common. It is not at all unusual to find sectors filled with
>>> only a constant (usually zero, but not always), in which case your
>>> **512 becomes **1.
>>
>> Of course it would be easy to check how many of the 512 Bytes are
>> really different on a case-by-case basis and correct the exponent
>> accordingly, and only perform the recovery when the corrected
>> probability of introducing an error is sufficiently low.
>
> What is the alternative to recovery, really? Just erroring out and
> letting the admin deal with it, or blindly assume that the parity is wrong?
>
Erroring out. Only thing to do at that point.
-hpa