From: Eyal Lebedinsky <eyal@eyal.emu.id.au>
Cc: linux-raid@vger.kernel.org
Subject: Re: raid6 check/repair
Date: Fri, 30 Nov 2007 10:17:20 +1100
Message-ID: <474F4880.8080300@eyal.emu.id.au>
In-Reply-To: <18254.21949.441607.134763@notabene.brown>

Neil Brown wrote:
> On Thursday November 22, thiemo.nagel@ph.tum.de wrote:
>> Dear Neil,
>>
>> thank you very much for your detailed answer.
>>
>> Neil Brown wrote:
>>> While it is possible to use the RAID6 P+Q information to deduce which
>>> data block is wrong if it is known that either 0 or 1 datablocks is 
>>> wrong, it is *not* possible to deduce which block or blocks are wrong
>>> if it is possible that more than 1 data block is wrong.
>> If I'm not mistaken, this is only partly correct.  Using P+Q redundancy,
>> it *is* possible, to distinguish three cases:
>> a) exactly zero bad blocks
>> b) exactly one bad block
>> c) more than one bad block
>>
>> Of course, it is only possible to recover from b), but one *can* tell,
>> whether the situation is a) or b) or c) and act accordingly.
> 
> It would seem that either you or Peter Anvin is mistaken.
> 
> On page 9 of 
>   http://www.kernel.org/pub/linux/kernel/people/hpa/raid6.pdf
> at the end of section 4 it says:
> 
>       Finally, as a word of caution it should be noted that RAID-6 by
>       itself cannot even detect, never mind recover from, dual-disk
>       corruption. If two disks are corrupt in the same byte positions,
>       the above algorithm will in general introduce additional data
>       corruption by corrupting a third drive.

The above a/b/c cases are not correct for raid6. While we can detect
0, 1 or 2 errors, any higher number of errors will be misidentified as
one of these.

The cases we will always see are:
	a) no errors - nothing to do
	b) one error - correct it
	c) two errors - report? take the raid down? recalculate the syndromes?
and any other case will always appear as *one* of these (not as [c]).
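
For reference, here is a minimal sketch of the per-byte check as I read
section 4 of the paper quoted above. It assumes the field GF(2^8) with
polynomial 0x11d and generator 2, as in that paper. The names `syndromes`
and `classify` and the status strings are made up for illustration and are
not taken from the md code. It works on one byte position across the data
disks only.

```python
# GF(2^8) log/antilog tables for polynomial 0x11d, generator g = 2.
EXP = [0] * 510
LOG = [0] * 256
x = 1
for i in range(255):
    EXP[i] = x
    LOG[x] = i
    x <<= 1
    if x & 0x100:
        x ^= 0x11d
for i in range(255, 510):
    EXP[i] = EXP[i - 255]


def gf_mul(a, b):
    """Multiply two field elements using the log tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def syndromes(data):
    """Return (P, Q) for one byte position: P = XOR of D_i, Q = XOR of g^i * D_i."""
    p = q = 0
    for i, d in enumerate(data):
        p ^= d
        q ^= gf_mul(EXP[i], d)
    return p, q


def classify(data, p_stored, q_stored):
    """Compare recomputed P/Q with the stored ones and guess what is wrong."""
    p_calc, q_calc = syndromes(data)
    ps = p_calc ^ p_stored
    qs = q_calc ^ q_stored
    if ps == 0 and qs == 0:
        return ("ok", None)        # nothing looks wrong
    if ps == 0:
        return ("one", "Q")        # only Q disagrees
    if qs == 0:
        return ("one", "P")        # only P disagrees
    # A single bad data disk z gives qs = g^z * ps, so z = log(qs) - log(ps).
    z = (LOG[qs] - LOG[ps]) % 255
    if z < len(data):
        return ("one", z)          # consistent with one bad data disk
    return ("multiple", None)      # z is not a valid disk number
```

With four data bytes and their correct P and Q, flipping bits in `data[2]`
makes `classify` return `("one", 2)`. The `"multiple"` result only arises
when the computed index is not a valid disk number.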

Case [c] is where different users will want to do different things. If my data
is highly critical (would I really use raid6 here and not a higher redundancy
level?) I could consider doing some investigation. For example, I could pick
each pair of disks in turn as the faulty ones, reconstruct them and check
whether my data looks good (fsck? inspect the data visually?) until one choice
of pair gives good data.
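
The investigation above could be sketched roughly as follows. This is only
an outline: `reconstruct` and `looks_good` are hypothetical callbacks that
the user would have to supply (rebuilding a stripe with two disks treated as
erased, and running whatever sanity check is trusted), and
`find_plausible_pairs` is a made-up name.

```python
from itertools import combinations


def find_plausible_pairs(n_disks, reconstruct, looks_good):
    """Try every pair of disks as the presumed-faulty ones.

    reconstruct(i, j): hypothetical, returns the data rebuilt with
                       disks i and j treated as erased.
    looks_good(data):  hypothetical, returns True if the user's own
                       check (fsck, visual inspection, ...) accepts it.
    """
    hits = []
    for i, j in combinations(range(n_disks), 2):
        candidate = reconstruct(i, j)
        if looks_good(candidate):
            hits.append((i, j))
    return hits
```

For n disks this tries n*(n-1)/2 pairs, and more than one pair may pass the
check, so the result is a list of candidates rather than a single answer.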

<may be OT>

The quote, saying that two errors may not be detected, is not how I understand
ECC schemes to work. Does anyone have other papers that point this out?

Also, is it the case that the raid6 algorithm detects a failed disk (strip),
or is it actually detecting failed bits, so that the correction is done
to the whole stripe? In other words, are the values in all failed locations
fixed (when only 1-error cases are present), and not just those in one
strip? This would mean that we do not necessarily identify the bad disk, and
neither do we need to.

-- 
Eyal Lebedinsky	(eyal@eyal.emu.id.au)

