Re: Find mismatch in data blocks during raid6 repair

From: Robert Buchholz <robert.buchholz@goodpoint.de>
To: Piergiorgio Sartor <piergiorgio.sartor@nexgo.de>
Cc: John Robinson <john.robinson@anonymous.org.uk>,
	linux-raid@vger.kernel.org
Subject: Re: Find mismatch in data blocks during raid6 repair
Date: Tue, 03 Jul 2012 21:10:41 +0200
Message-ID: <2116390.poR22k1RqP@peanut>
In-Reply-To: <20120630114831.GA3034@lazy.lzy>

Hey Piergiorgio,

On Saturday, June 30, 2012 01:48:31 PM Piergiorgio Sartor wrote:
> > the tool currently can detect failure of a single slot, and it
> > could automatically repair that, I chose to make repair an
> > explicit action. In fact, even the slice number and the two slots
> > to repair are given via the command line.
> > 
> > So for example, given this output of raid6check (check mode):
> > Error detected at 1: possible failed disk slot: 5 --> /dev/sda1
> > Error detected at 2: possible failed disk slot: 3 --> /dev/sdb1
> > Error detected at 3: disk slot unknown
> > 
> > To regenerate 1 and 2, run:
> > raid6check /dev/md0 repair 1 5 3
> > raid6check /dev/md0 repair 2 5 3
> > (the repair arguments require you to always rebuild two blocks,
> > one of which should result in a noop in these cases)
> 
> Why always two blocks?

The reason is simply to have fewer cases to handle in the code. There
are already three ways to regenerate two blocks (D&D, D/P&Q and D&P),
and there would be two more cases if only one block were to be
repaired. With the original patch, being able to repair two blocks
also lets you repair a single one (with one other block rewritten in
addition).
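
To make the three cases concrete, here is a minimal Python sketch of
two-block regeneration for one stripe. It is not the raid6check code:
it assumes the usual RAID6 field (GF(2^8), polynomial 0x11d, generator
2), addresses data blocks by logical index rather than by disk slot,
and ignores layout rotation. The names syndromes() and repair_two() are
made up for this illustration.

```python
# GF(2^8) log/antilog tables (assumed: polynomial 0x11d, generator 2).
EXP = [0] * 510
LOG = [0] * 256
_v = 1
for _i in range(255):
    EXP[_i] = _v
    LOG[_v] = _i
    _v <<= 1
    if _v & 0x100:
        _v ^= 0x11D
for _i in range(255, 510):
    EXP[_i] = EXP[_i - 255]


def gmul(a, b):
    """Multiply two field elements."""
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def gdiv(a, b):
    """Divide a by a non-zero field element b."""
    return 0 if a == 0 else EXP[(LOG[a] - LOG[b]) % 255]


def syndromes(data):
    """Return (P, Q) for a list of equal-length data blocks."""
    size = len(data[0])
    p, q = bytearray(size), bytearray(size)
    for i, block in enumerate(data):
        for k, byte in enumerate(block):
            p[k] ^= byte
            q[k] ^= gmul(EXP[i], byte)
    return bytes(p), bytes(q)


def repair_two(data, p, q, a, b):
    """Regenerate slots a and b; each is a data index or "P" / "Q"."""
    data = [bytes(d) for d in data]
    size = len(p)
    lost = {a, b}
    dl = sorted(s for s in lost if s not in ("P", "Q"))
    for s in dl:
        data[s] = bytes(size)          # treat lost data blocks as zero
    pz, qz = syndromes(data)           # partial syndromes without them
    if len(dl) == 2:                   # D&D: solve two equations
        x, y = dl
        gyx = EXP[y - x]
        denom = gyx ^ 1
        ca = gdiv(gyx, denom)
        cb = gdiv(EXP[(255 - x) % 255], denom)
        dx = bytes(gmul(ca, p[k] ^ pz[k]) ^ gmul(cb, q[k] ^ qz[k])
                   for k in range(size))
        data[x] = dx
        data[y] = bytes(p[k] ^ pz[k] ^ dx[k] for k in range(size))
    elif len(dl) == 1 and "P" in lost:  # D&P: data block from Q
        x = dl[0]
        inv = EXP[(255 - x) % 255]
        data[x] = bytes(gmul(inv, q[k] ^ qz[k]) for k in range(size))
    elif len(dl) == 1:                  # D&Q: data block from P
        x = dl[0]
        data[x] = bytes(p[k] ^ pz[k] for k in range(size))
    new_p, new_q = syndromes(data)      # P&Q, or whichever parity is lost
    return data, (new_p if "P" in lost else p), (new_q if "Q" in lost else q)
```

If one of the two named slots was already correct, the sketch writes
back the same bytes for it, which is the noop mentioned in the quoted
text above.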

> > Since for stripe 3, two slots must be wrong, the admin has to
> > provide a
> Well, "unknown" means it is not possible to detect
> which one(s).
> It could be there are more than 2 corrupted.
> The "unknown" case means that the only reasonable thing
> would be to rebuild the parities, but nothing more can
> be said about the status of the array.
> 
> Nevertheless, there is a possibility which I was thinking
> about, but I never had time to implement (even if the
> software has some already built-in infrastructure for it).
> Specifically, a "vertical" statistic.
> That is, if there are mismatches, and, for example, 90% of
> them belong to /dev/sdX, and the rest 10% are "unknown",
> then it could be possible to extrapolate that, for the
> "unknown", /dev/sdX must be fixed anyway and then re-check
> if the status is still "unknown" or some other disk shows
> up. If one disk is reported, then it could be fixed.
> Other cases, the parity must be adjusted, whatever this
> means in terms of data recovery.
> 
> Of course, this is just a statistical assumption, which
> means a second, "aggressive", option will have to be
> available, with all the warnings of the case.

As you point out, it is impossible to determine which slots are in
error once two of them have failed. I would leave that decision to the
admin, but offering one or more hints may be a nice idea.
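
Such a hint could come from the "vertical" statistic described above.
A minimal sketch in Python, assuming the check-mode output looks
exactly like the three lines quoted earlier in this thread. The
function names and the 90% threshold are illustrative and not part of
raid6check.

```python
import re
from collections import Counter

# Matches the two line shapes quoted earlier in this thread.
LINE = re.compile(
    r"Error detected at (\d+): "
    r"(?:possible failed disk slot: (\d+)|disk slot unknown)"
)


def parse_report(text):
    """Return a list of (stripe, slot) pairs; slot is None for 'unknown'."""
    result = []
    for line in text.splitlines():
        m = LINE.search(line)
        if m:
            slot = int(m.group(2)) if m.group(2) is not None else None
            result.append((int(m.group(1)), slot))
    return result


def suggest_slot(report, threshold=0.9):
    """Return the slot blamed for at least `threshold` of all mismatches.

    Returns None when no single slot dominates, in which case no hint
    should be given for the 'unknown' stripes.
    """
    known = Counter(slot for _, slot in report if slot is not None)
    if not known:
        return None
    slot, hits = known.most_common(1)[0]
    return slot if hits / len(report) >= threshold else None
```

For the three-line example above the result is None, since slots 5 and
3 are each blamed only once. The value is a hint for the admin only; it
does not prove anything about the "unknown" stripes.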

Personally, I am recovering from a simultaneous three-disk failure on a 
backup storage. My best hope was to ddrescue "most" from all three disks 
onto fresh ones, and I lost a total of a few KB on each disk. Using the 
ddrescue log, I can even say which sectors of each disk were damaged. 
Interestingly, two disks of the same model failed on the very same 
sector (even though they were produced at different times), so I now 
have "unknown" slot errors in some stripes. But with context 
information, I am certain I know which slots need to be repaired.
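
As a rough illustration of how such context can be used, the Python
sketch below maps the unrecovered areas listed in a ddrescue log to
stripe numbers. It rests on several assumptions: the log uses the
"pos size status" block lines with hexadecimal values and "+" marking
rescued areas, the image starts at the same offset as the array member
device, and a stripe number is simply the chunk index counted from the
member's data offset. The function name, data_offset and chunk_size
are placeholders that have to be taken from the actual array.

```python
def bad_stripes(mapfile_text, data_offset, chunk_size, unread="?*/-"):
    """Return the set of stripe numbers touched by unrecovered areas.

    mapfile_text: contents of a ddrescue log (assumed format, see above)
    data_offset:  byte offset of the array data on the member device
    chunk_size:   chunk size of the array in bytes
    unread:       status characters counted as "not rescued"
    """
    stripes = set()
    for line in mapfile_text.splitlines():
        fields = line.split()
        # Skip comments, the current-position line and rescued areas.
        if line.startswith("#") or len(fields) != 3 or fields[2] not in unread:
            continue
        pos, size = int(fields[0], 16), int(fields[1], 16)
        last = pos + size - 1
        if size == 0 or last < data_offset:
            continue                    # nothing inside the data area
        first = max(pos, data_offset)
        lo = (first - data_offset) // chunk_size
        hi = (last - data_offset) // chunk_size
        stripes.update(range(lo, hi + 1))
    return stripes
```

Intersecting the sets of two disks, for example bad_stripes(log_a, off,
chunk) & bad_stripes(log_b, off, chunk), lists the stripes in which both
of those slots are known to be damaged. These are the candidates for an
explicit two-slot repair.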

> > guess (and could iterate guesses, provided proper stripe backups):
> > raid6check /dev/md0 repair 3 5 3
> 
> Actually, this could also be an improvement, I mean
> the possibility to backup stripes, so that other,
> advanced, recovery could be tried and reverted, if
> necessary.

That is true. I was thinking about this too. Unfortunately, as far as
I remember, the functions to save and restore stripes in restripe.c do
not save P and Q, which we would need in order to redo the data block
calculation. But with stripe backups, one could even imagine doing
verifications on upper layers, such as verifying file(system)
checksums. I may send another patch implementing this, but I wanted to
get general feedback on inclusion of such changes first (Neil?).

> Finally, someone should consider to use the optimized
> raid6 code, from the kernel module (can we link that
> code directly?), in order to speed up the check/repair.

I am a big supporter of getting it to work first, then making it fast.
Since a full raid check takes on the order of hours anyway, I do not
mind that repairing blocks from user space takes five minutes when it
could be done in three. That said, I think the faster code in the
kernel is warranted (as it needs this calculation very often when a
disk has failed), and if it is easy to reuse, we certainly should.

Cheers,

Robert
