From: Thiemo Nagel <thiemo.nagel@ph.tum.de>
To: Eyal Lebedinsky <eyal@eyal.emu.id.au>, Neil Brown <neilb@suse.de>
Cc: linux-raid@vger.kernel.org
Subject: Re: raid6 check/repair
Date: Fri, 30 Nov 2007 15:42:08 +0100
Message-ID: <47502140.1080601@ph.tum.de>
In-Reply-To: <474F4880.8080300@eyal.emu.id.au>
Dear Neil and Eyal,
Eyal Lebedinsky wrote:
> Neil Brown wrote:
>> It would seem that either you or Peter Anvin is mistaken.
>>
>> On page 9 of
>> http://www.kernel.org/pub/linux/kernel/people/hpa/raid6.pdf
>> at the end of section 4 it says:
>>
>> Finally, as a word of caution it should be noted that RAID-6 by
>> itself cannot even detect, never mind recover from, dual-disk
>> corruption. If two disks are corrupt in the same byte positions,
>> the above algorithm will in general introduce additional data
>> corruption by corrupting a third drive.
>
> The above a/b/c cases are not correct for raid6. While we can detect
> 0, 1 or 2 errors, any higher number of errors will be misidentified as
> one of these.
>
> The cases we will always see are:
> a) no errors - nothing to do
> b) one error - correct it
> c) two errors -report? take the raid down? recalc syndromes?
> and any other case will always appear as *one* of these (not as [c]).
I still don't agree. I'll explain the error-handling algorithm that I
have in mind; maybe you can point out where I'm mistaken.
We have n data blocks D1...Dn and two parities, P (XOR) and Q
(Reed-Solomon). I assume the existence of two functions to calculate
the parities:
    P = calc_P(D1, ..., Dn)
    Q = calc_Q(D1, ..., Dn)
and two functions to recover a missing data block Dx using either parity:
    Dx = recover_P(x, D1, ..., Dx-1, Dx+1, ..., Dn, P)
    Dx = recover_Q(x, D1, ..., Dx-1, Dx+1, ..., Dn, Q)
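For concreteness, here is a minimal sketch of how the four functions above
could look for a single byte per disk. It assumes the construction from the
paper cited above: arithmetic in GF(2^8) with reduction polynomial 0x11d,
generator g = 2, and Q = g^0*D1 ^ g^1*D2 ^ ... ^ g^(n-1)*Dn. The helper names
gf_mul, gf_pow2 and gf_inv, the array-based signatures, and the 0-based
indexing (d[0] holds D1) are illustrative choices, not anything from the md
driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Multiply two elements of GF(2^8); reduction polynomial 0x11d (assumed). */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
    uint8_t r = 0;
    while (b) {
        if (b & 1)
            r ^= a;
        /* multiply a by x and reduce modulo the field polynomial */
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1d : 0));
        b >>= 1;
    }
    return r;
}

/* g^e for the generator g = 2. */
static uint8_t gf_pow2(unsigned e)
{
    uint8_t r = 1;
    while (e--)
        r = gf_mul(r, 2);
    return r;
}

/* Brute-force multiplicative inverse; a must be nonzero. */
static uint8_t gf_inv(uint8_t a)
{
    assert(a != 0);
    for (unsigned b = 1; b < 256; b++)
        if (gf_mul(a, (uint8_t)b) == 1)
            return (uint8_t)b;
    return 0; /* not reached for nonzero a */
}

/* P = D1 ^ ... ^ Dn */
static uint8_t calc_P(const uint8_t *d, size_t n)
{
    uint8_t p = 0;
    for (size_t i = 0; i < n; i++)
        p ^= d[i];
    return p;
}

/* Q = g^0*D1 ^ g^1*D2 ^ ... ^ g^(n-1)*Dn */
static uint8_t calc_Q(const uint8_t *d, size_t n)
{
    uint8_t q = 0;
    for (size_t i = 0; i < n; i++)
        q ^= gf_mul(gf_pow2((unsigned)i), d[i]);
    return q;
}

/* Rebuild d[x] from P and the other data bytes; d[x] itself is ignored. */
static uint8_t recover_P(size_t x, const uint8_t *d, size_t n, uint8_t p)
{
    for (size_t i = 0; i < n; i++)
        if (i != x)
            p ^= d[i];
    return p;
}

/* Rebuild d[x] from Q and the other data bytes; d[x] itself is ignored. */
static uint8_t recover_Q(size_t x, const uint8_t *d, size_t n, uint8_t q)
{
    for (size_t i = 0; i < n; i++)
        if (i != x)
            q ^= gf_mul(gf_pow2((unsigned)i), d[i]);
    /* what remains is g^x * Dx, so divide by g^x */
    return gf_mul(q, gf_inv(gf_pow2((unsigned)x)));
}
```

A real block is handled by applying the same arithmetic to every byte
position independently.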
This pseudo-code should distinguish between a), b) and c) and properly
repair case b):
    P' = calc_P(D1, ..., Dn);
    Q' = calc_Q(D1, ..., Dn);
    if (P' == P && Q' == Q) {
        /* case a): zero errors */
        return;
    }
    if (P' == P && Q' != Q) {
        /* case b1): Q is bad, can be fixed */
        Q = Q';
        return;
    }
    if (P' != P && Q' == Q) {
        /* case b2): P is bad, can be fixed */
        P = P';
        return;
    }
    /* both parities are bad, so we try whether the problem can
       be fixed by repairing data blocks */
    for (i = 1; i <= n; i++) {
        /* assume only Di is bad, use P parity to repair */
        D' = recover_P(i, D1, ..., Di-1, Di+1, ..., Dn, P);
        /* use Q parity to check assumption */
        Q' = calc_Q(D1, ..., Di-1, D', Di+1, ..., Dn);
        if (Q == Q') {
            /* case b3): Q parity is ok, that means the assumption was
               correct and we can fix the problem */
            Di = D';
            return;
        }
    }
    /* case c): when we get here, we have excluded cases a) and b),
       so now we really have a problem */
    report_unrecoverable_error();
    return;
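The pseudo-code above can be turned into compilable C roughly as follows.
This is only a sketch under assumptions: blocks are BLK bytes long, Q is
computed in GF(2^8) with polynomial 0x11d and generator 2 as in the paper
cited above, and the names check_repair, calc_PQ, gf_mul and the result enum
are made up for illustration. Instead of calling a separate recover_P, it
uses the identity recover_P(i, ...) = Di ^ P ^ P', where P' is the XOR of
all data blocks as currently stored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { BLK = 16 };  /* bytes per block; tiny, for illustration only */

enum result { NO_ERROR, FIXED_Q, FIXED_P, FIXED_DATA, UNRECOVERABLE };

/* Multiply in GF(2^8) modulo 0x11d (assumed field). */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
    uint8_t r = 0;
    while (b) {
        if (b & 1)
            r ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1d : 0));
        b >>= 1;
    }
    return r;
}

/* Compute both parities: p = XOR of all blocks, q = XOR of g^i * d[i]. */
static void calc_PQ(uint8_t d[][BLK], size_t n, uint8_t *p, uint8_t *q)
{
    uint8_t coef = 1;  /* g^i with g = 2 */
    memset(p, 0, BLK);
    memset(q, 0, BLK);
    for (size_t i = 0; i < n; i++) {
        for (size_t k = 0; k < BLK; k++) {
            p[k] ^= d[i][k];
            q[k] ^= gf_mul(coef, d[i][k]);
        }
        coef = gf_mul(coef, 2);
    }
}

/* Check one stripe and repair it in place if exactly one block is bad. */
static enum result check_repair(uint8_t d[][BLK], size_t n,
                                uint8_t *p, uint8_t *q)
{
    uint8_t p2[BLK], q2[BLK], ptmp[BLK], saved[BLK];

    assert(n >= 1 && n <= 255);  /* g^i must differ for every block */
    calc_PQ(d, n, p2, q2);
    int p_ok = memcmp(p, p2, BLK) == 0;
    int q_ok = memcmp(q, q2, BLK) == 0;

    if (p_ok && q_ok)
        return NO_ERROR;                         /* case a) */
    if (p_ok) {
        memcpy(q, q2, BLK);                      /* case b1): Q was bad */
        return FIXED_Q;
    }
    if (q_ok) {
        memcpy(p, p2, BLK);                      /* case b2): P was bad */
        return FIXED_P;
    }
    for (size_t i = 0; i < n; i++) {
        /* assume only block i is bad and rebuild it from P */
        memcpy(saved, d[i], BLK);
        for (size_t k = 0; k < BLK; k++)
            d[i][k] = saved[k] ^ p[k] ^ p2[k];
        /* use Q to check the assumption */
        calc_PQ(d, n, ptmp, q2);
        if (memcmp(q, q2, BLK) == 0)
            return FIXED_DATA;                   /* case b3) */
        memcpy(d[i], saved, BLK);                /* wrong guess, undo */
    }
    return UNRECOVERABLE;                        /* case c) */
}
```

Recomputing Q from scratch for every candidate block mirrors the pseudo-code
and costs O(n^2) block operations per stripe; that is acceptable for a sketch
because the loop only runs when both parities already mismatch.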
Concerning misidentification: one can imagine a situation in which two
or more simultaneous corruptions occur in such a special way that case
b3) is diagnosed by accident. While that is not impossible, I'd assume
its probability to be negligible, comparable to that of undetectable
corruption in a RAID 5 setup.
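A rough estimate of that probability can be written down under some
simplifying assumptions that are not part of the argument above: exactly
two data blocks a and b are corrupted, at the same k byte positions, with
independent and uniformly distributed nonzero error bytes e_a and e_b, and
Q uses a distinct coefficient g^i per block in GF(2^8). The symbols S_P,
S_Q, z and k are introduced here only for illustration.

```latex
% Per byte position, the two syndromes are
S_P = e_a \oplus e_b, \qquad S_Q = g^{a} e_a \oplus g^{b} e_b .
% A single bad block z is (wrongly) diagnosed iff S_Q = g^{z} S_P, i.e.
(g^{a} \oplus g^{z})\, e_a = (g^{b} \oplus g^{z})\, e_b .
% For z \in \{a, b\} this would force the other error to be zero, which is
% excluded. For each of the remaining n-2 blocks it fixes e_b as one
% particular nonzero multiple of e_a, which happens with probability 1/255.
% All k positions must agree on the same z, so
\Pr[\text{false case b3}] = (n-2) \cdot 255^{-k} .
```

Under these assumptions the probability is indeed negligible when whole
sectors are corrupted (k in the hundreds), but it is (n-2)/255 when only a
single byte position differs on both disks.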
Kind regards,
Thiemo