Re: raid6 check/repair - Bill Davidsen

From: Bill Davidsen <davidsen@tmr.com>
To: Neil Brown <neilb@suse.de>
Cc: thiemo.nagel@ph.tum.de, linux-raid@vger.kernel.org
Subject: Re: raid6 check/repair
Date: Thu, 29 Nov 2007 14:30:36 -0500
Message-ID: <474F135C.2000703@tmr.com>
In-Reply-To: <18254.21949.441607.134763@notabene.brown>

Neil Brown wrote:
> On Thursday November 22, thiemo.nagel@ph.tum.de wrote:
>   
>> Dear Neil,
>>
>> thank you very much for your detailed answer.
>>
>> Neil Brown wrote:
>>     
>>> While it is possible to use the RAID6 P+Q information to deduce which
>>> data block is wrong if it is known that either 0 or 1 data blocks are
>>> wrong, it is *not* possible to deduce which block or blocks are wrong
>>> if it is possible that more than 1 data block is wrong.
>>>       
>> If I'm not mistaken, this is only partly correct.  Using P+Q redundancy,
>> it *is* possible to distinguish three cases:
>> a) exactly zero bad blocks
>> b) exactly one bad block
>> c) more than one bad block
>>
>> Of course, it is only possible to recover from b), but one *can* tell
>> whether the situation is a) or b) or c) and act accordingly.
>>     
>
> It would seem that either you or Peter Anvin is mistaken.
>
> On page 9 of 
>   http://www.kernel.org/pub/linux/kernel/people/hpa/raid6.pdf
> at the end of section 4 it says:
>
>       Finally, as a word of caution it should be noted that RAID-6 by
>       itself cannot even detect, never mind recover from, dual-disk
>       corruption. If two disks are corrupt in the same byte positions,
>       the above algorithm will in general introduce additional data
>       corruption by corrupting a third drive.
>
>   
>> The point that I'm trying to make is that there does exist a specific
>> case, in which recovery is possible, and that implementing recovery for
>> that case will not hurt in any way.
>>     
>
> Assuming that is true (maybe hpa got it wrong), what specific
> conditions would lead to one drive having corrupt data, and would
> correcting it on an occasional 'repair' pass be an appropriate
> response?
>
> Does the value justify the cost of extra code complexity?
>
>   
>>> RAID is not designed to protect against bad RAM, bad cables, chipset
>>> bugs, driver bugs, etc.  It is only designed to protect against drive
>>> failure, where the drive failure is apparent.  i.e. a read must 
>>> return either the same data that was last written, or a failure 
>>> indication. Anything else is beyond the design parameters for RAID.
>>>       
>> I'm taking a more pragmatic approach here.  In my opinion, RAID should
>> "just protect my data", against drive failure, yes, of course, but if it
>> can help me in case of occasional data corruption, I'd happily take
>> that, too, especially if it doesn't cost extra... ;-)
>>     
>
> Everything costs extra.  Code uses bytes of memory, requires
> maintenance, and possibly introduces new bugs.  I'm not convinced the
> failure mode that you are considering actually happens with a
> meaningful frequency.
>   

People accept the hardware and performance costs of raid-6 in return for 
the better security of their data. If I run a check and find that I have 
an error, right now I have to treat that the same way as an 
unrecoverable failure, because the "repair" function doesn't fix the 
data; it just makes the symptom go away by recomputing the P and Q values.
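
To make the distinction concrete, here is a minimal sketch of the check 
being argued about, not the md driver's actual code. It assumes the 
GF(2^8) arithmetic described in hpa's paper (polynomial 0x11d, generator 
2); the function names syndromes() and classify() are made up for 
illustration. It sorts a stripe into the three cases quoted above. Note 
that "single" only means the stripe is consistent with one bad block: two 
corrupt disks can in principle mimic that pattern, which is hpa's caution.

```python
# Build log/antilog tables for GF(2^8), polynomial 0x11d, generator 2
# (assumed here to match the field used in hpa's raid6.pdf).
EXP = [0] * 510
LOG = [0] * 256
_x = 1
for _i in range(255):
    EXP[_i] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11D
for _i in range(255, 510):
    EXP[_i] = EXP[_i - 255]


def gf_mul(a, b):
    """Multiply two field elements using the log tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def syndromes(data):
    """Return (P, Q) for a list of equal-length data blocks (bytes)."""
    n = len(data[0])
    p, q = bytearray(n), bytearray(n)
    for i, block in enumerate(data):
        for k, byte in enumerate(block):
            p[k] ^= byte                    # P = XOR of all D_i
            q[k] ^= gf_mul(EXP[i], byte)    # Q = XOR of g^i * D_i
    return bytes(p), bytes(q)


def classify(data, p, q):
    """Return ("ok", None), ("single", where) or ("multiple", None)."""
    cp, cq = syndromes(data)
    ps = bytes(a ^ b for a, b in zip(p, cp))   # P mismatch per byte
    qs = bytes(a ^ b for a, b in zip(q, cq))   # Q mismatch per byte
    if not any(ps) and not any(qs):
        return ("ok", None)
    if not any(ps):
        return ("single", "Q")    # only Q disagrees: Q block is suspect
    if not any(qs):
        return ("single", "P")    # only P disagrees: P block is suspect
    z = None
    for a, b in zip(ps, qs):
        if a == 0 and b == 0:
            continue              # this byte position is consistent
        if a == 0 or b == 0:
            return ("multiple", None)
        # For one bad data disk z: qs = g^z * ps, so z = log(qs) - log(ps).
        cand = (LOG[b] - LOG[a]) % 255
        if cand >= len(data) or (z is not None and cand != z):
            return ("multiple", None)
        z = cand
    return ("single", z)
```

In the ("single", z) case with z a data disk, XORing the P mismatch into 
block z would restore it; recomputing P and Q instead, as "repair" does, 
leaves the bad block in place.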

This makes the naive user think the problem is solved, when in fact 
it's now worse: he has corrupt data with no indication of a problem. The 
fact that (most) people who read this list are advanced enough to 
understand the issue does not protect the majority of users from their 
ignorance. If that sounds elitist, many of the people on this list are 
the elite, and even knowing that you need to learn and understand more 
is a big plus in my book. It's the people who run repair and assume the 
problem is fixed who get hurt by the current behavior.

If you won't fix the recoverable case by recovering, then maybe for 
raid-6 you could print an error message like
  can't recover data, fix parity and hide the problem (y/N)?
or require a --force flag, and at least give a heads up to the people 
who just picked the "most reliable raid level" because they're trying to 
do it right, but need a clue that they have a real and serious problem, 
and that a "repair" alone can't fix it.
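
As a rough sketch of that kind of guard, assuming the md sysfs files 
mismatch_cnt and sync_action under /sys/block/<dev>/md: the function 
names, the force argument and the device name md0 below are hypothetical 
and are not an existing mdadm option.

```python
from pathlib import Path


def decide(mismatch_cnt, force=False):
    """Hypothetical policy: never silently rewrite parity over a mismatch."""
    if mismatch_cnt == 0:
        return "nothing-to-do"
    if not force:
        return "refuse"            # tell the user instead of hiding it
    return "rewrite-parity"


def guarded_repair(dev="md0", force=False, sysfs=Path("/sys/block")):
    """Read mismatch_cnt (after a 'check' pass) and only start 'repair'
    when the caller has explicitly asked for it."""
    md = sysfs / dev / "md"
    count = int((md / "mismatch_cnt").read_text())
    action = decide(count, force)
    if action == "refuse":
        print(f"{dev}: mismatch_cnt is {count}; 'repair' would only "
              "recompute parity, not recover data. Pass force=True "
              "to do that anyway.")
    elif action == "rewrite-parity":
        (md / "sync_action").write_text("repair\n")
    return action
```

The point of the design is only that the default answer is "no": the user 
has to ask for the parity rewrite knowingly.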

Recovering a filesystem full of "just files" is pretty easy; that's what 
backups with CRC are for. A large database recovery, though, often takes 
hours to restore and replay journal files. I personally consider it the 
job of the kernel to do recovery when it is possible; absent that, I 
would like the tools to tell me clearly that I have a problem and what 
it is.

-- 
Bill Davidsen <davidsen@tmr.com>
  "Woe unto the statesman who makes war without a reason that will still
  be valid when the war is over..." Otto von Bismarck
