Re: Find mismatch in data blocks during raid6 repair

linux-raid.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Piergiorgio Sartor <piergiorgio.sartor@nexgo.de>
To: Robert Buchholz <robert.buchholz@goodpoint.de>
Cc: Piergiorgio Sartor <piergiorgio.sartor@nexgo.de>,
	John Robinson <john.robinson@anonymous.org.uk>,
	linux-raid@vger.kernel.org
Subject: Re: Find mismatch in data blocks during raid6 repair
Date: Tue, 3 Jul 2012 22:27:34 +0200	[thread overview]
Message-ID: <20120703202734.GA10087@lazy.lzy> (raw)
In-Reply-To: <2116390.poR22k1RqP@peanut>

Hi Robert,

On Tue, Jul 03, 2012 at 09:10:41PM +0200, Robert Buchholz wrote:
[...]
> > Why always two blocks?
> 
> The reason is simply to have less cases to handle in the code. There's 
> already three ways to regenerate regenerate two blocks (D&D, D/P&Q and 
> D&P), and there would be two more cases if only one block was to be 
> repaired. With the original patch, if you can repair two blocks, that 
> allows you to repair one (and one other in addition) as well.

sorry, I express myself not clearly.

I mean, a two parities Reed-Solomon system can
only detect one incorrect slot position, so I would
expect to have the possibility to fix only one, not
two slots.

So, I did not understand why two. I mean, I understand
that a RAID-6 can correct exact up two incorrect slots,
but the "unknown" case might have more and correcting
will mean no correction or, maybe, even more damage.

I would prefer, if you agree, to simply tell "raid6check"
to fix a single slot, or the (single) wrong slots it finds
during the check.

Does it make sense to you, or, maybe, you're considering
something I'm missing?

> > Of course, this is just a statistical assumption, which
> > means a second, "aggressive", option will have to be
> > available, with all the warnings of the case.
> 
> As you point out, it is impossible to determine which of two failed 
> slots are in error. I would leave such decision to an admin, but giving 
> one or more "advices" may be a nice idea.

That would be exactly the background.
For example, considering that "raid6check" processes
stripes, but the check is done per byte, already
knowing how many bytes per stripe (or block) need
to be corrected (per device) will hint a lot about
the overall status of the storage.

> Personally, I am recovering from a simultaneous three-disk failure on a 
> backup storage. My best hope was to ddrescue "most" from all three disks 
> onto fresh ones, and I lost a total of a few KB on each disk. Using the 
> ddrescue log, I can even say which sectors of each disk were damaged. 
> Interestingly, two disks of the same model failed on the very same 
> sector (even though they were produced at different times), so I now 
> have "unknown" slot errors in some stripes. But with context 
> information, I am certain I know which slots need to be repaired.

That's good!
Did you use "raid6check" for a verification?

[...]
> checksums. I may send another patch implementing this, but I wanted to 
> get general feedback on inclusion of such changes first (Neil?).

Yeah, last time Neil mentioned he needs re-triggering :-),
I guess you'll have to add "[PATCH]" tag to the message too...

> I am a big supporter of getting it to work, then make it fast. Since a 
> full raid check takes the magnitude of hours anyway, I do not mind that 
> repairing blocks from the user space will take five minutes when it 
> could be done in 3. That said, I think the faster code in the kernel is 
> warranted (as it needs this calculation very often when a disk is 
> failed), and if it is possible to reuse easily, we sure should.

The check is pretty slow, also due to the terminal
print out, which is a bit too verbose, I think.

Anyhow, I'm really happy someone has interest in
improving "raid6check", I hope you'll be able to
improve it and, maybe, someone else will join
the bandwagon... :-)

bye,

-- 

piergiorgio

next prev parent reply	other threads:[~2012-07-03 20:27 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-06-20 17:41 Find mismatch in data blocks during raid6 repair Robert Buchholz
2012-06-21 12:38 ` John Robinson
2012-06-21 14:58   ` Robert Buchholz
2012-06-21 18:23     ` Piergiorgio Sartor
2012-06-29 18:16       ` Robert Buchholz
2012-06-30 11:48         ` Piergiorgio Sartor
2012-07-03 19:10           ` Robert Buchholz
2012-07-03 20:27             ` Piergiorgio Sartor [this message]
2012-07-09  3:43               ` NeilBrown
2012-07-20 10:40                 ` [PATCH] " Robert Buchholz
2012-07-20 14:14                   ` Robert Buchholz
2012-07-20 10:53               ` Robert Buchholz
2012-07-21 16:00                 ` Piergiorgio Sartor

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120703202734.GA10087@lazy.lzy \
    --to=piergiorgio.sartor@nexgo.de \
    --cc=john.robinson@anonymous.org.uk \
    --cc=linux-raid@vger.kernel.org \
    --cc=robert.buchholz@goodpoint.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).