All of lore.kernel.org
 help / color / mirror / Atom feed
From: NeilBrown <neilb@suse.de>
To: Piergiorgio Sartor <piergiorgio.sartor@nexgo.de>
Cc: Robert Buchholz <robert.buchholz@goodpoint.de>,
	John Robinson <john.robinson@anonymous.org.uk>,
	linux-raid@vger.kernel.org
Subject: Re: Find mismatch in data blocks during raid6 repair
Date: Mon, 9 Jul 2012 13:43:08 +1000	[thread overview]
Message-ID: <20120709134308.0ea5b18d@notabene.brown> (raw)
In-Reply-To: <20120703202734.GA10087@lazy.lzy>

[-- Attachment #1: Type: text/plain, Size: 4224 bytes --]

On Tue, 3 Jul 2012 22:27:34 +0200 Piergiorgio Sartor
<piergiorgio.sartor@nexgo.de> wrote:

> Hi Robert,
> 
> On Tue, Jul 03, 2012 at 09:10:41PM +0200, Robert Buchholz wrote:
> [...]
> > > Why always two blocks?
> > 
> > The reason is simply to have less cases to handle in the code. There's 
> > already three ways to regenerate regenerate two blocks (D&D, D/P&Q and 
> > D&P), and there would be two more cases if only one block was to be 
> > repaired. With the original patch, if you can repair two blocks, that 
> > allows you to repair one (and one other in addition) as well.
> 
> sorry, I express myself not clearly.
> 
> I mean, a two parities Reed-Solomon system can
> only detect one incorrect slot position, so I would
> expect to have the possibility to fix only one, not
> two slots.
> 
> So, I did not understand why two. I mean, I understand
> that a RAID-6 can correct exact up two incorrect slots,
> but the "unknown" case might have more and correcting
> will mean no correction or, maybe, even more damage.
> 
> I would prefer, if you agree, to simply tell "raid6check"
> to fix a single slot, or the (single) wrong slots it finds
> during the check.


I think this is a sensible feature to offer.  Maybe if "autorepair" is given
in place of "repair", then it should choose which block to repair, and do
that one?


> 
> Does it make sense to you, or, maybe, you're considering
> something I'm missing?
> 
> > > Of course, this is just a statistical assumption, which
> > > means a second, "aggressive", option will have to be
> > > available, with all the warnings of the case.
> > 
> > As you point out, it is impossible to determine which of two failed 
> > slots are in error. I would leave such decision to an admin, but giving 
> > one or more "advices" may be a nice idea.
> 
> That would be exactly the background.
> For example, considering that "raid6check" processes
> stripes, but the check is done per byte, already
> knowing how many bytes per stripe (or block) need
> to be corrected (per device) will hint a lot about
> the overall status of the storage.
>  
> > Personally, I am recovering from a simultaneous three-disk failure on a 
> > backup storage. My best hope was to ddrescue "most" from all three disks 
> > onto fresh ones, and I lost a total of a few KB on each disk. Using the 
> > ddrescue log, I can even say which sectors of each disk were damaged. 
> > Interestingly, two disks of the same model failed on the very same 
> > sector (even though they were produced at different times), so I now 
> > have "unknown" slot errors in some stripes. But with context 
> > information, I am certain I know which slots need to be repaired.
> 
> That's good!
> Did you use "raid6check" for a verification?
>  
> [...]
> > checksums. I may send another patch implementing this, but I wanted to 
> > get general feedback on inclusion of such changes first (Neil?).
> 
> Yeah, last time Neil mentioned he needs re-triggering :-),
> I guess you'll have to add "[PATCH]" tag to the message too...

:-)

I have applied the patches, though with some fairly minor editing (wrapping
lines, moving variable declarations before any statements, removing
tab-at-the-end-of-the-line).  They probably won't appear in my public .git
for a little while I I have some other patches that I need to sort through
and organise first.


Thanks,
NeilBrown


> 
> > I am a big supporter of getting it to work, then make it fast. Since a 
> > full raid check takes the magnitude of hours anyway, I do not mind that 
> > repairing blocks from the user space will take five minutes when it 
> > could be done in 3. That said, I think the faster code in the kernel is 
> > warranted (as it needs this calculation very often when a disk is 
> > failed), and if it is possible to reuse easily, we sure should.
> 
> The check is pretty slow, also due to the terminal
> print out, which is a bit too verbose, I think.
> 
> Anyhow, I'm really happy someone has interest in
> improving "raid6check", I hope you'll be able to
> improve it and, maybe, someone else will join
> the bandwagon... :-)
> 
> bye,
> 


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

  reply	other threads:[~2012-07-09  3:43 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-06-20 17:41 Find mismatch in data blocks during raid6 repair Robert Buchholz
2012-06-21 12:38 ` John Robinson
2012-06-21 14:58   ` Robert Buchholz
2012-06-21 18:23     ` Piergiorgio Sartor
2012-06-29 18:16       ` Robert Buchholz
2012-06-30 11:48         ` Piergiorgio Sartor
2012-07-03 19:10           ` Robert Buchholz
2012-07-03 20:27             ` Piergiorgio Sartor
2012-07-09  3:43               ` NeilBrown [this message]
2012-07-20 10:40                 ` [PATCH] " Robert Buchholz
2012-07-20 14:14                   ` Robert Buchholz
2012-07-20 10:53               ` Robert Buchholz
2012-07-21 16:00                 ` Piergiorgio Sartor

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120709134308.0ea5b18d@notabene.brown \
    --to=neilb@suse.de \
    --cc=john.robinson@anonymous.org.uk \
    --cc=linux-raid@vger.kernel.org \
    --cc=piergiorgio.sartor@nexgo.de \
    --cc=robert.buchholz@goodpoint.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.