linux-raid.vger.kernel.org archive mirror
From: Robert Buchholz <robert.buchholz@goodpoint.de>
To: linux-raid@vger.kernel.org
Subject: Find mismatch in data blocks during raid6 repair
Date: Wed, 20 Jun 2012 19:41:05 +0200
Message-ID: <10900468.MPSjVn2C3J@peanut>

Hello list,

I have been looking into the repair functionality of the raid6
implementation in recent kernels to find out how the md driver handles
parity mismatches. As I understand it, the handle_parity_checks6
function simply regenerates P and Q from the data blocks if they do not
match. While this makes perfect sense when only one of the two parities
mismatches, a mismatch in both may indicate an error in the data blocks.

When repairing a full raid6 with no missing drives (raid-devices=n+2), a
single inconsistent data block could be located as follows: for each of
the n data blocks in turn, assume its device is missing, recover the
block from P and the remaining data blocks, generate Q' from the result
and compare it with the actual Q. If there is exactly one block for
which Q' equals Q, rewrite that data block.*
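
To make the procedure above concrete, here is a minimal user-space
sketch in Python. It is not the md driver's code. It assumes the usual
raid6 arithmetic (GF(2^8) with polynomial 0x11d and generator 2), and
all function names (gen_p, gen_q, recover_from_p, find_bad_data_block)
are made up for this illustration.

```python
# Sketch only: locate a single inconsistent data block in one raid6 stripe.
# Assumes GF(2^8) with polynomial 0x11d and generator g = 2.

def gf_mul2(x):
    """Multiply a field element by the generator g = 2."""
    x <<= 1
    if x & 0x100:
        x ^= 0x11d
    return x

def gf_mul(a, b):
    """Multiply two field elements (shift-and-add)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a = gf_mul2(a)
        b >>= 1
    return r

def gen_p(blocks):
    """P is the byte-wise XOR of all data blocks."""
    p = bytearray(len(blocks[0]))
    for blk in blocks:
        for i, byte in enumerate(blk):
            p[i] ^= byte
    return bytes(p)

def gen_q(blocks):
    """Q is the XOR of g^k * D_k over all data blocks D_k."""
    q = bytearray(len(blocks[0]))
    coeff = 1  # g^0
    for blk in blocks:
        for i, byte in enumerate(blk):
            q[i] ^= gf_mul(coeff, byte)
        coeff = gf_mul2(coeff)
    return bytes(q)

def recover_from_p(blocks, p, missing):
    """Rebuild block `missing` from P and the other data blocks."""
    rec = bytearray(p)
    for idx, blk in enumerate(blocks):
        if idx != missing:
            for i, byte in enumerate(blk):
                rec[i] ^= byte
    return bytes(rec)

def find_bad_data_block(blocks, p, q):
    """Return (index, repaired_block) if exactly one candidate makes
    Q' equal Q; return None otherwise."""
    blocks = list(blocks)
    if gen_p(blocks) == p or gen_q(blocks) == q:
        return None  # not the "both P and Q mismatch" case
    hits = []
    for k in range(len(blocks)):
        candidate = recover_from_p(blocks, p, k)
        trial = blocks[:k] + [candidate] + blocks[k + 1:]
        if gen_q(trial) == q:
            hits.append((k, candidate))
    return hits[0] if len(hits) == 1 else None

# Usage: four data blocks, then damage block 2 behind the array's back.
data = [bytes([i * 16 + j for j in range(8)]) for i in range(4)]
p, q = gen_p(data), gen_q(data)
damaged = list(data)
damaged[2] = bytes(b ^ 0x5A for b in data[2])
print(find_bad_data_block(damaged, p, q))
```

The sketch returns None whenever zero or several candidates fit, in
which case falling back to regenerating P and Q would presumably be the
only option.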

I understand the usual failure mode is to remove a drive from the array, 
or use IO errors from the kernel to identify incorrect data blocks. 
However, this assumes we recognize the error at the time and thus know 
which data block is incorrect. But that is not always the case:
The drives could be inconsistent after a multi-drive failure, unclean 
shutdown, bit rot or because one raid drive was replaced outside the 
realm of md using dd(rescue).

Is there a reason this approach is not currently used? The performance
impact seems low: there is no additional I/O (in fact, it may decrease,
since write-back drops by up to a factor of 2), and the number of
parity calculations increases by a factor of n only in the error case
(both the frequency of errors and n can be assumed to be low).


Cheers

Robert


