linux-raid.vger.kernel.org archive mirror
From: NeilBrown <neilb@suse.de>
To: David Brown <david@westcontrol.com>
Cc: linux-raid@vger.kernel.org
Subject: Re: misunderstanding of spare and raid devices? - and one question more
Date: Fri, 1 Jul 2011 23:02:19 +1000
Message-ID: <20110701230219.51604317@notabene.brown>
In-Reply-To: <iukfjs$34e$1@dough.gmane.org>

On Fri, 01 Jul 2011 14:45:00 +0200 David Brown <david@westcontrol.com> wrote:

> On 01/07/2011 13:29, Robin Hill wrote:
> > On Fri Jul 01, 2011 at 12:18:22PM +0200, David Brown wrote:
> >
> >> On 01/07/2011 10:50, Robin Hill wrote:
> >>> On Fri Jul 01, 2011 at 09:23:43 +0200, David Brown wrote:
> >>>
> >>>> What's the difference between a "resync" and a "recovery"?  Is it that a
> >>>> "resync" will read the whole stripe, check if it is valid, and if it is
> >>>> not it then generates the parity, while a "recovery" will always
> >>>> generate the parity?
> >>>>
> >>>    From the names, recovery would mean that it's reading from N-1 disks,
> >>> and recreating data/parity to rebuild the final disk (as when it
> >>> recovers from a drive failure), whereas resync will be reading from all
> >>> N disks and checking/recreating the parity (as when you're running a
> >>> repair on the array).
> >>>
> >>> The main reason I can see for doing a resync on RAID6 rather than a
> >>> recovery is if the data reconstruction from the Q parity is far slower
> >>> than the construction of the Q parity itself (I've no idea how the
> >>> mathematics works out for this).
> >>>
> >>
> >> Well, data reconstruction from Q parity /is/ more demanding than
> >> constructing the Q parity in the first place (the mathematics is the
> >> part that I know about).  That's why a two-disk degraded raid6 array is
> >> significantly slower (or, more accurately, significantly more
> >> cpu-intensive) than a one-disk degraded raid6 array.
> >>
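
(To make the asymmetry concrete, here is a minimal Python sketch of the
GF(2^8) arithmetic raid6 uses; purely illustrative, the helper names are
invented and this is nothing like md's optimised implementation.)

   def gf_mul(a, b):
       # multiply in GF(2^8) with the RAID6 polynomial 0x11d
       r = 0
       while b:
           if b & 1:
               r ^= a
           a <<= 1
           if a & 0x100:
               a ^= 0x11d
           b >>= 1
       return r

   def gf_pow2(i):
       # 2**i in GF(2^8): the per-disk coefficient used for Q
       r = 1
       for _ in range(i):
           r = gf_mul(r, 2)
       return r

   def gf_inv(a):
       # a**254 == a**-1 (the multiplicative group has order 255)
       r = 1
       for _ in range(254):
           r = gf_mul(r, a)
       return r

   def make_pq(data):
       # generating parity is one xor (P) and one multiply-accumulate (Q)
       # per data byte
       p = q = 0
       for i, d in enumerate(data):
           p ^= d
           q ^= gf_mul(gf_pow2(i), d)
       return p, q

   def rebuild_two_data(data, x, y, p, q):
       # rebuilding two lost *data* bytes (slots x and y; their entries in
       # 'data' are ignored) needs the same pass plus a GF division per
       # stripe, which is the extra cost referred to above
       for i, d in enumerate(data):
           if i not in (x, y):
               p ^= d
               q ^= gf_mul(gf_pow2(i), d)
       dx = gf_mul(q ^ gf_mul(gf_pow2(y), p), gf_inv(gf_pow2(x) ^ gf_pow2(y)))
       return dx, p ^ dx
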
> >> But that doesn't make a difference here - you are rebuilding one or two
> >> disks, so you have to use the data you've got whether you are doing a
> >> resync or a recovery.
> >>
> > Yes, but in a resync all the data you have available is the data
> > blocks, and you're reconstructing all the P and Q parity blocks. With a
> > recovery, the data you have available is some of the data blocks and some
> > of the P & Q parity blocks, so for some stripes you'll be reconstructing
> > the parity and for others you'll be regenerating the data using the
> > parity (and for some you'll be doing one of each).
> >
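
(Where the parity lives moves from stripe to stripe, so which kind of block a
recovery has to regenerate varies per stripe.  A hypothetical sketch of a
rotating layout, not necessarily md's exact default:)

   def parity_slots(stripe, ndisks):
       # P and Q shift one slot to the left on each successive stripe,
       # wrapping around; every other slot holds data
       p = (ndisks - 1 - stripe) % ndisks
       q = (p + 1) % ndisks
       return p, q
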
> 
> If it were that simple, then the resync (as used when creating a RAID6
> array) would not be so much slower than the recovery used in a RAID5 build...
> 
> With a resync, you first check if the parity blocks are correct (by 
> generating them from the data blocks and comparing them to the read 
> parity blocks).  If they are not correct, you write out the parity 
> blocks.  With a recovery, you /know/ that one block is incorrect and 
> re-generate that (from the data blocks if it is a parity block, or using 
> the parities if it is a data block).
> 
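
(A minimal sketch of the two per-stripe operations for the xor-only raid5
case, working on single byte values rather than real blocks; illustrative
only:)

   from functools import reduce

   def resync_stripe(data, stored_parity):
       # read everything, recompute the parity, write it back only on mismatch
       p = reduce(lambda a, b: a ^ b, data, 0)
       return None if p == stored_parity else p   # value to write, if any

   def recover_stripe(surviving):
       # the missing block (data or parity alike) is the xor of the n-1
       # others, and is always written out
       return reduce(lambda a, b: a ^ b, surviving, 0)
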
> Consider the two cases raid5 and raid6 separately.
> 
> When you build your raid5 array, there is nothing worth keeping in the 
> data - the aim is simply to make the stripes consistent.  There are two 
> possible routes - consider the data blocks to be "correct" and do a 
> resync to make sure the parity blocks match, or consider the first n-1 
> disks to be "correct" and do a recovery to make sure the n'th disk 
> matches.  For recovery, that means reading n-1 blocks in a stripe, doing 
> a big xor, and writing out the remaining block (whether it is data or 
> parity).  For a resync, it means reading all n blocks, and checking the 
> xor.  If there is no match (which will be the norm when building an 
> array), then the correct parity is calculated and written out.  Thus a 
> resync takes longer than a recovery, and a recovery is used.
> 
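As a worked example, take a hypothetical 5-disk raid5: recovery reads 4
blocks and writes 1 per stripe, while a resync reads all 5 and, on a freshly
created array where the parity almost never matches, still ends up writing 1,
so it costs one extra read per stripe for the same amount written.
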
> When you build your raid6 array, you have the same two choices.  For a 
> resync, you have to read all n blocks, calculate P and Q, compare them, 
> then (as there will be no match) write out P and Q.  In comparison to 
> the raid5 recovery, you've done a couple of unnecessary block reads and 
> compares, and the time-consuming Q calculation and write.  But if you 
> chose recovery, then you'd be assuming the first n-2 blocks are correct 
> and re-calculating the last two blocks.  This avoids the extra reads and 
> compares, but if the two parity blocks are within the first n-2 blocks 
> read, then the recovery calculations will be much slower.  Hence a 
> resync is faster for raid6.
> 
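(Putting rough numbers on it for a hypothetical 6-disk raid6: a resync reads
6 blocks and writes P and Q on every stripe; rebuilding the "last two" slots
reads 4 and writes 2, but because the parity rotates, on most stripes one or
both of the blocks being rebuilt are data blocks, which means the expensive
reconstruction path sketched earlier rather than a plain parity calculation.)
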
> I suppose the raid6 build could be optimised a little by skipping the 
> extra reads when you know in advance that they will not match.  But 
> either that is already being done, or it is considered a small issue 
> that is not worth changing (since it only has an effect during the 
> initial build).
> 

Almost everything you say is correct.
However I'm not convinced that a raid6 resync is faster than a raid6 recovery
(on devices where P and Q are not mostly correct).  I suspect it is just an
historical oversight that RAID6 doesn't force a recovery for the initial
create.

If anyone would like to test, it is easy to force a recovery by specifying
missing devices:

   mdadm -C /dev/md0 -l6 -n6 /dev/sd[abcd] missing missing -x2 /dev/sd[ef]

and easy to force a resync by using --force:

   mdadm -C /dev/md0 -l5 -n5 /dev/sd[abcde] --force

It is only really a valid test if you know that the P and Q that resync will
read are not going to be correct most of the time.
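
(Watching the speed that /proc/mdstat reports while each build runs should be
enough to see whether there is a measurable difference.)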

NeilBrown

Thread overview: 19+ messages
2011-06-30 10:51 misunderstanding of spare and raid devices? Karsten Römke
2011-06-30 10:58 ` Robin Hill
2011-06-30 13:09   ` Karsten Römke
2011-06-30 11:30 ` John Robinson
2011-06-30 12:32   ` Phil Turmel
2011-06-30 12:52     ` misunderstanding of spare and raid devices? - and one question more Karsten Römke
2011-06-30 13:34       ` Phil Turmel
2011-06-30 14:05         ` Karsten Römke
2011-06-30 14:21         ` Karsten Römke
2011-06-30 14:44           ` Phil Turmel
2011-07-02  8:34             ` Karsten Römke
2011-07-02  9:42               ` David Brown
2011-06-30 21:28           ` NeilBrown
2011-07-01  7:23             ` David Brown
2011-07-01  8:50               ` Robin Hill
2011-07-01 10:18                 ` David Brown
2011-07-01 11:29                   ` Robin Hill
2011-07-01 12:45                     ` David Brown
2011-07-01 13:02                       ` NeilBrown [this message]
