On the subject of RAID-6 corruption recovery

linux-raid.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* On the subject of RAID-6 corruption recovery
@ 2007-12-28  2:58 H. Peter Anvin
  2007-12-28 14:38 ` Bill Davidsen
  2008-01-04 23:59 ` Thiemo Nagel
  0 siblings, 2 replies; 13+ messages in thread
From: H. Peter Anvin @ 2007-12-28  2:58 UTC (permalink / raw)
  To: Linux RAID Mailing List

I got a private email a while ago from Thiemo Nagel claiming that some 
of the conclusions in my RAID-6 paper was incorrect.  This was combined 
with a "proof" which was plain wrong, and could easily be disproven 
using basic enthropy accounting (i.e. how much information is around to 
play with.)

However, it did cause me to clarify the text portion a little bit.  In 
particular, *in practice* in may be possible to *probabilistically* 
detect multidisk corruption.  Probabilistic detection means that the 
detection is not guaranteed, but it can be taken advantage of 
opportunistically.

In particular, if you follow the algorithm of section 4 of my paper, you 
end up with a corrupt disk number, but the result is a vector, not a 
scalar.  This is because the algorithm is executed on the P* and Q* 
error vectors on a byte by byte basis.

In the common case of a single disk corruption, what you will typically 
see is an error pattern that has a consistent value interrupted by 
correct bytes (P* = Q* = {00}); this is due to bytes which still had the 
random value by chance.  For the z values which can be computed (recall, 
z is only well-defined if P* and Q* are != {00}), they should match.

There are two patterns which are likely to indicate multi-disk 
corruption and where recovery software should trip out and raise hell:

* z >= n: the computed error disk doesn't exist.

	Obviously, if "the corrupt disk" is a disk that can't exist, we
	have a bigger problem.

	This is probabilistic, since as n approaches 255, the
	probability of detection goes to zero.

* Inconsistent z numbers (or spurious P and Q references)

	If the calculation for which disk is corrupt jumps around
	within a single sector, there is likely a problem.

It's worth noting in all of this that there is 258 possible outcomes of 
the complete error analysis algorithm - 255 possible D errors (z 
values), P error, Q error, and no error.  If these are to be analyzed as 
an array, it can't be solely a byte array.

That this set is complete is shown by the fact that out of 65536 
possible (P, Q) states, this corresponds to:

1 state no error
255 states P error (the 256th state is a no-error state!)
255 states Q error
255*255 states D error (n = 255 is maximum for byte-oriented RAID-6)

... for a total of 65536 states.

	-hpa

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: On the subject of RAID-6 corruption recovery
  2007-12-28  2:58 On the subject of RAID-6 corruption recovery H. Peter Anvin
@ 2007-12-28 14:38 ` Bill Davidsen
  2007-12-28 17:34   ` H. Peter Anvin
  2008-01-04 23:59 ` Thiemo Nagel
  1 sibling, 1 reply; 13+ messages in thread
From: Bill Davidsen @ 2007-12-28 14:38 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Linux RAID Mailing List

H. Peter Anvin wrote:
> I got a private email a while ago from Thiemo Nagel claiming that some 
> of the conclusions in my RAID-6 paper was incorrect.  This was 
> combined with a "proof" which was plain wrong, and could easily be 
> disproven using basic enthropy accounting (i.e. how much information 
> is around to play with.)
>
> However, it did cause me to clarify the text portion a little bit.  In 
> particular, *in practice* in may be possible to *probabilistically* 
> detect multidisk corruption.  Probabilistic detection means that the 
> detection is not guaranteed, but it can be taken advantage of 
> opportunistically.

If this means that there can be no false positives for multidisk 
corruption but may be false negatives, fine. If it means something else, 
please restate one more time.

-- 
Bill Davidsen <davidsen@tmr.com>
  "Woe unto the statesman who makes war without a reason that will still
  be valid when the war is over..." Otto von Bismark 



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: On the subject of RAID-6 corruption recovery
  2007-12-28 14:38 ` Bill Davidsen
@ 2007-12-28 17:34   ` H. Peter Anvin
  0 siblings, 0 replies; 13+ messages in thread
From: H. Peter Anvin @ 2007-12-28 17:34 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: Linux RAID Mailing List

Bill Davidsen wrote:
> H. Peter Anvin wrote:
>> I got a private email a while ago from Thiemo Nagel claiming that some 
>> of the conclusions in my RAID-6 paper was incorrect.  This was 
>> combined with a "proof" which was plain wrong, and could easily be 
>> disproven using basic enthropy accounting (i.e. how much information 
>> is around to play with.)
>>
>> However, it did cause me to clarify the text portion a little bit.  In 
>> particular, *in practice* in may be possible to *probabilistically* 
>> detect multidisk corruption.  Probabilistic detection means that the 
>> detection is not guaranteed, but it can be taken advantage of 
>> opportunistically.
> 
> If this means that there can be no false positives for multidisk 
> corruption but may be false negatives, fine. If it means something else, 
> please restate one more time.
> 

Pretty much.  False negatives are quite serious, since they will imply a 
course of action which will introduce further corruption.

	-hpa


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: On the subject of RAID-6 corruption recovery
  2007-12-28  2:58 On the subject of RAID-6 corruption recovery H. Peter Anvin
  2007-12-28 14:38 ` Bill Davidsen
@ 2008-01-04 23:59 ` Thiemo Nagel
  2008-01-05  0:03   ` H. Peter Anvin
  1 sibling, 1 reply; 13+ messages in thread
From: Thiemo Nagel @ 2008-01-04 23:59 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Linux RAID Mailing List

Dear hpa,

H. Peter Anvin wrote:
> I got a private email a while ago from Thiemo Nagel claiming that
> some of the conclusions in my RAID-6 paper was incorrect.  This was
> combined with a "proof" which was plain wrong, and could easily be
> disproven using basic enthropy accounting (i.e. how much information
> is around to play with.)
>
> However, it did cause me to clarify the text portion a little bit. In
> particular, *in practice* in may be possible to *probabilistically*
> detect multidisk corruption.  Probabilistic detection means that the
> detection is not guaranteed, but it can be taken advantage of
> opportunistically.

Thank you very much for setting me straight concerning some of my
misconceptions about raid 6.  Yet, the point that I was trying to make
was that the statement "multidisc corruption cannot be detected" -- while
correct in a mathematical sense -- is misleading when considering
practical application, and I feel confirmed in that by your reply.

> There are two patterns which are likely to indicate multi-disk
> corruption and where recovery software should trip out and raise
> hell:
>
> * z >= n: the computed error disk doesn't exist.
>
> Obviously, if "the corrupt disk" is a disk that can't exist, we have
> a bigger problem.
>
> This is probabilistic, since as n approaches 255, the probability of
> detection goes to zero.
>
> * Inconsistent z numbers (or spurious P and Q references)
>
> If the calculation for which disk is corrupt jumps around within a
> single sector, there is likely a problem.

Inverting your argumentation, that means when we don't see z >= n or
inconsistent z numbers, multidisc corruption can be excluded statistically.

For errors occurring on the level of hard disk blocks (signature: most
bytes of the block have D errors, all with same z), the probability for
multidisc corruption to go undetected is ((n-1)/256)**512.  This might
pose a problem in the limiting case of n=255, however for practical
applications, this probability is negligible as it drops off
exponentially with decreasing n:

n=255  p=1.8%
n=250  p=6.8e-7
n=240  p=5.3e-16
n=10   p=3.6e-745

So it seems to me that for that case, implementing recovery would be
safe (maybe limit it to n<240).

For errors occurring on the byte level (signature: only one byte of a
sector has D error, all other bytes have no error), multidisc corruption
is highly unlikely due to a different argumentation:  Since 511 out of
512 bytes are ok, it can be concluded, that for errors in this specific
sector, there is no correlation between the individual disks.  That
means, that the probability for double corruption is approximately
8*(n-1)*BER, and the bit error rate (BER) should be low.  (For
comparison:  Some vendors specify 1e-15 as probability of unrecoverable
read error (per bit that is read).  I'd assume that the probability of
silent read errors is much lower, at least for the disk itself; however
additional errors might be introduced in (S)ATA transfer or in the
controller.)

For that case, too, it seems to me that implementing recovery could do
no harm.

Kind regards,

Thiemo Nagel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: On the subject of RAID-6 corruption recovery
  2008-01-04 23:59 ` Thiemo Nagel
@ 2008-01-05  0:03   ` H. Peter Anvin
  2008-01-05  0:41     ` Thiemo Nagel
  0 siblings, 1 reply; 13+ messages in thread
From: H. Peter Anvin @ 2008-01-05  0:03 UTC (permalink / raw)
  To: Thiemo Nagel; +Cc: Linux RAID Mailing List

Thiemo Nagel wrote:
> 
> Inverting your argumentation, that means when we don't see z >= n or
> inconsistent z numbers, multidisc corruption can be excluded statistically.
> 
> For errors occurring on the level of hard disk blocks (signature: most
> bytes of the block have D errors, all with same z), the probability for
> multidisc corruption to go undetected is ((n-1)/256)**512.  This might
> pose a problem in the limiting case of n=255, however for practical
> applications, this probability is negligible as it drops off
> exponentially with decreasing n:
> 

That assumes fully random data distribution, which is almost certainly a 
false assumption.

	-hpa

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: On the subject of RAID-6 corruption recovery
  2008-01-05  0:03   ` H. Peter Anvin
@ 2008-01-05  0:41     ` Thiemo Nagel
  2008-01-05  0:45       ` H. Peter Anvin
  0 siblings, 1 reply; 13+ messages in thread
From: Thiemo Nagel @ 2008-01-05  0:41 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Linux RAID Mailing List

>> For errors occurring on the level of hard disk blocks (signature: most
>> bytes of the block have D errors, all with same z), the probability for
>> multidisc corruption to go undetected is ((n-1)/256)**512.  This might
>> pose a problem in the limiting case of n=255, however for practical
>> applications, this probability is negligible as it drops off
>> exponentially with decreasing n:
>>
>
> That assumes fully random data distribution, which is almost certainly a
> false assumption.

Agreed.  This means, that the formula only serves to specify a lower limit
to the probability.  However, is there an argumentation, why a pathologic
case would be probable, i.e. why the probability would be likely to
*vastly* deviate from the theoretical limit?  And if there is, would that
argumentation not apply to other raid 6 operations (like "check") also? 
And would it help to use different Galois field generators at different
positions in a sector instead of using a uniform generator?

Kind regards,

Thiemo Nagel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: On the subject of RAID-6 corruption recovery
  2008-01-05  0:41     ` Thiemo Nagel
@ 2008-01-05  0:45       ` H. Peter Anvin
  2008-01-05  1:25         ` Thiemo Nagel
  2008-01-07  9:28         ` Thiemo Nagel
  0 siblings, 2 replies; 13+ messages in thread
From: H. Peter Anvin @ 2008-01-05  0:45 UTC (permalink / raw)
  To: Thiemo Nagel; +Cc: Linux RAID Mailing List

Thiemo Nagel wrote:
>>> For errors occurring on the level of hard disk blocks (signature: most
>>> bytes of the block have D errors, all with same z), the probability for
>>> multidisc corruption to go undetected is ((n-1)/256)**512.  This might
>>> pose a problem in the limiting case of n=255, however for practical
>>> applications, this probability is negligible as it drops off
>>> exponentially with decreasing n:
>>>
>> That assumes fully random data distribution, which is almost certainly a
>> false assumption.
> 
> Agreed.  This means, that the formula only serves to specify a lower limit
> to the probability.  However, is there an argumentation, why a pathologic
> case would be probable, i.e. why the probability would be likely to
> *vastly* deviate from the theoretical limit?  And if there is, would that
> argumentation not apply to other raid 6 operations (like "check") also? 
> And would it help to use different Galois field generators at different
> positions in a sector instead of using a uniform generator?
> 

What you call "pathologic" cases when it comes to real-world data are 
very common.  It is not at all unusual to find sectors filled with only 
a constant (usually zero, but not always), in which case your **512 
becomes **1.

It doesn't mean it's not worthwhile, but don't try to claim it is 
anything other than opportunistic.

	-hpa

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: On the subject of RAID-6 corruption recovery
  2008-01-05  0:45       ` H. Peter Anvin
@ 2008-01-05  1:25         ` Thiemo Nagel
  2008-01-05  1:49           ` H. Peter Anvin
  2008-01-07  9:28         ` Thiemo Nagel
  1 sibling, 1 reply; 13+ messages in thread
From: Thiemo Nagel @ 2008-01-05  1:25 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Linux RAID Mailing List

> Thiemo Nagel wrote:
>>>> For errors occurring on the level of hard disk blocks (signature: most
>>>> bytes of the block have D errors, all with same z), the probability
>>>> for
>>>> multidisc corruption to go undetected is ((n-1)/256)**512.  This might
>>>> pose a problem in the limiting case of n=255, however for practical
>>>> applications, this probability is negligible as it drops off
>>>> exponentially with decreasing n:
>>>>
>>> That assumes fully random data distribution, which is almost certainly
>>> a
>>> false assumption.
>>
>> Agreed.  This means, that the formula only serves to specify a lower
>> limit
>> to the probability.  However, is there an argumentation, why a
>> pathologic
>> case would be probable, i.e. why the probability would be likely to
>> *vastly* deviate from the theoretical limit?  And if there is, would
>> that
>> argumentation not apply to other raid 6 operations (like "check") also?
>> And would it help to use different Galois field generators at different
>> positions in a sector instead of using a uniform generator?
>>
>
> What you call "pathologic" cases when it comes to real-world data are
> very common.  It is not at all unusual to find sectors filled with only
> a constant (usually zero, but not always), in which case your **512
> becomes **1.

That's why I was asking about the generator.  Theoretically, this
situation might be countered by using a (pseudo-)random pattern of
generators for the different bytes of a sector, though I'm not sure
whether it is worth the effort.

Kind regards,

Thiemo Nagel


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: On the subject of RAID-6 corruption recovery
  2008-01-05  1:25         ` Thiemo Nagel
@ 2008-01-05  1:49           ` H. Peter Anvin
  0 siblings, 0 replies; 13+ messages in thread
From: H. Peter Anvin @ 2008-01-05  1:49 UTC (permalink / raw)
  To: Thiemo Nagel; +Cc: Linux RAID Mailing List

Thiemo Nagel wrote:
> 
> That's why I was asking about the generator.  Theoretically, this
> situation might be countered by using a (pseudo-)random pattern of
> generators for the different bytes of a sector, though I'm not sure
> whether it is worth the effort.
> 

Changing the generator is mathematically equivalent to changing the 
order of the drives, so no, that wouldn't help (and would make the 
common computations a lot more expensive.)

	-hpa

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: On the subject of RAID-6 corruption recovery
  2008-01-05  0:45       ` H. Peter Anvin
  2008-01-05  1:25         ` Thiemo Nagel
@ 2008-01-07  9:28         ` Thiemo Nagel
  2008-01-07  9:58           ` Mattias Wadenstein
  1 sibling, 1 reply; 13+ messages in thread
From: Thiemo Nagel @ 2008-01-07  9:28 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Linux RAID Mailing List

> What you call "pathologic" cases when it comes to real-world data are 
> very common.  It is not at all unusual to find sectors filled with only 
> a constant (usually zero, but not always), in which case your **512 
> becomes **1.

Of course it would be easy to check how many of the 512 Bytes are really 
different on a case-by-case basis and correct the exponent accordingly, 
and only perform the recovery when the corrected probability of 
introducing an error is sufficiently low.

Kind regards,

Thiemo Nagel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: On the subject of RAID-6 corruption recovery
  2008-01-07  9:28         ` Thiemo Nagel
@ 2008-01-07  9:58           ` Mattias Wadenstein
  2008-01-07 11:10             ` Thiemo Nagel
  2008-01-07 17:20             ` H. Peter Anvin
  0 siblings, 2 replies; 13+ messages in thread
From: Mattias Wadenstein @ 2008-01-07  9:58 UTC (permalink / raw)
  To: Thiemo Nagel; +Cc: H. Peter Anvin, Linux RAID Mailing List

On Mon, 7 Jan 2008, Thiemo Nagel wrote:

>> What you call "pathologic" cases when it comes to real-world data are very 
>> common.  It is not at all unusual to find sectors filled with only a 
>> constant (usually zero, but not always), in which case your **512 becomes 
>> **1.
>
> Of course it would be easy to check how many of the 512 Bytes are really 
> different on a case-by-case basis and correct the exponent accordingly, and 
> only perform the recovery when the corrected probability of introducing an 
> error is sufficiently low.

What is the alternative to recovery, really? Just erroring out and letting 
the admin deal with it, or blindly assume that the parity is wrong?

/Mattias Wadenstein

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: On the subject of RAID-6 corruption recovery
  2008-01-07  9:58           ` Mattias Wadenstein
@ 2008-01-07 11:10             ` Thiemo Nagel
  2008-01-07 17:20             ` H. Peter Anvin
  1 sibling, 0 replies; 13+ messages in thread
From: Thiemo Nagel @ 2008-01-07 11:10 UTC (permalink / raw)
  To: Mattias Wadenstein; +Cc: H. Peter Anvin, Linux RAID Mailing List

Mattias Wadenstein wrote:
> On Mon, 7 Jan 2008, Thiemo Nagel wrote:
> 
>>> What you call "pathologic" cases when it comes to real-world data are 
>>> very common.  It is not at all unusual to find sectors filled with 
>>> only a constant (usually zero, but not always), in which case your 
>>> **512 becomes **1.
>>
>> Of course it would be easy to check how many of the 512 Bytes are 
>> really different on a case-by-case basis and correct the exponent 
>> accordingly, and only perform the recovery when the corrected 
>> probability of introducing an error is sufficiently low.
> 
> What is the alternative to recovery, really? Just erroring out and 
> letting the admin deal with it, blindly assume that the parity is wrong?

Currently, 'repair' does blind recalculation of parity.  The only 
benefit of that is (correct me if I'm wrong) to ascertain repeated reads 
return identical data.

The last time I checked, there was not even a warning message.

Kind regards,

Thiemo Nagel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: On the subject of RAID-6 corruption recovery
  2008-01-07  9:58           ` Mattias Wadenstein
  2008-01-07 11:10             ` Thiemo Nagel
@ 2008-01-07 17:20             ` H. Peter Anvin
  1 sibling, 0 replies; 13+ messages in thread
From: H. Peter Anvin @ 2008-01-07 17:20 UTC (permalink / raw)
  To: Mattias Wadenstein; +Cc: Thiemo Nagel, Linux RAID Mailing List

Mattias Wadenstein wrote:
> On Mon, 7 Jan 2008, Thiemo Nagel wrote:
> 
>>> What you call "pathologic" cases when it comes to real-world data are 
>>> very common.  It is not at all unusual to find sectors filled with 
>>> only a constant (usually zero, but not always), in which case your 
>>> **512 becomes **1.
>>
>> Of course it would be easy to check how many of the 512 Bytes are 
>> really different on a case-by-case basis and correct the exponent 
>> accordingly, and only perform the recovery when the corrected 
>> probability of introducing an error is sufficiently low.
> 
> What is the alternative to recovery, really? Just erroring out and 
> letting the admin deal with it, or blindly assume that the parity is wrong?
> 

Erroring out.  Only thing to do at that point.

	-hpa

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2008-01-07 17:20 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-12-28  2:58 On the subject of RAID-6 corruption recovery H. Peter Anvin
2007-12-28 14:38 ` Bill Davidsen
2007-12-28 17:34   ` H. Peter Anvin
2008-01-04 23:59 ` Thiemo Nagel
2008-01-05  0:03   ` H. Peter Anvin
2008-01-05  0:41     ` Thiemo Nagel
2008-01-05  0:45       ` H. Peter Anvin
2008-01-05  1:25         ` Thiemo Nagel
2008-01-05  1:49           ` H. Peter Anvin
2008-01-07  9:28         ` Thiemo Nagel
2008-01-07  9:58           ` Mattias Wadenstein
2008-01-07 11:10             ` Thiemo Nagel
2008-01-07 17:20             ` H. Peter Anvin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).