From: NeilBrown <neilb@suse.de>
To: David Brown <david@westcontrol.com>
Cc: linux-raid@vger.kernel.org
Subject: Re: misunderstanding of spare and raid devices? - and one question more
Date: Fri, 1 Jul 2011 23:02:19 +1000
Message-ID: <20110701230219.51604317@notabene.brown>
In-Reply-To: <iukfjs$34e$1@dough.gmane.org>
On Fri, 01 Jul 2011 14:45:00 +0200 David Brown <david@westcontrol.com> wrote:
> On 01/07/2011 13:29, Robin Hill wrote:
> > On Fri Jul 01, 2011 at 12:18:22PM +0200, David Brown wrote:
> >
> >> On 01/07/2011 10:50, Robin Hill wrote:
> >>> On Fri Jul 01, 2011 at 09:23:43 +0200, David Brown wrote:
> >>>
> >>>> What's the difference between a "resync" and a "recovery"? Is it that a
> >>>> "resync" will read the whole stripe, check if it is valid, and if it is
> >>>> not it then generates the parity, while a "recovery" will always
> >>>> generate the parity?
> >>>>
> >>> From the names, recovery would mean that it's reading from N-1 disks,
> >>> and recreating data/parity to rebuild the final disk (as when it
> >>> recovers from a drive failure), whereas resync will be reading from all
> >>> N disks and checking/recreating the parity (as when you're running a
> >>> repair on the array).
> >>>
> >>> The main reason I can see for doing a resync on RAID6 rather than a
> >>> recovery is if the data reconstruction from the Q parity is far slower
> >>> than the construction of the Q parity itself (I've no idea how the
> >>> mathematics works out for this).
> >>>
> >>
> >> Well, data reconstruction from Q parity /is/ more demanding than
> >> constructing the Q parity in the first place (the mathematics is the
> >> part that I know about). That's why a two-disk degraded raid6 array is
> >> significantly slower (or, more accurately, significantly more
> >> cpu-intensive) than a one-disk degraded raid6 array.
> >>
> >> But that doesn't make a difference here - you are rebuilding one or two
> >> disks, so you have to use the data you've got whether you are doing a
> >> resync or a recovery.
> >>
> > Yes, but in a resync all the data you have available is the data
> > blocks, and you're reconstructing all the P and Q parity blocks. With a
> > recovery, the data you have available is some of the data blocks and some
> > of the P & Q parity blocks, so for some stripes you'll be reconstructing
> > the parity and for others you'll be regenerating the data using the
> > parity (and for some you'll be doing one of each).
> >
>
> If it were that simple, then the resync (as used by RAID6 creates) would
> not be so much slower than the recovery used in a RAID5 build...
>
> With a resync, you first check if the parity blocks are correct (by
> generating them from the data blocks and comparing them to the read
> parity blocks). If they are not correct, you write out the parity
> blocks. With a recovery, you /know/ that one block is incorrect and
> re-generate that (from the data blocks if it is a parity block, or using
> the parities if it is a data block).
>
> Consider the two cases raid5 and raid6 separately.
>
> When you build your raid5 array, there is nothing worth keeping in the
> data - the aim is simply to make the stripes consistent. There are two
> possible routes - consider the data blocks to be "correct" and do a
> resync to make sure the parity blocks match, or consider the first n-1
> disks to be "correct" and do a recovery to make sure the n'th disk
> matches. For recovery, that means reading n-1 blocks in a stripe, doing
> a big xor, and writing out the remaining block (whether it is data or
> parity). For resync, it means reading all n blocks, and checking the
> xor. If there is no match (which will be the norm when building an
> array), then the correct parity is calculated and written out. Thus a
> resync takes longer than a recovery, and a recovery is used.
>
> When you build your raid6 array, you have the same two choices. For a
> resync, you have to read all n blocks, calculate P and Q, compare them,
> then (as there will be no match) write out P and Q. In comparison to
> the raid5 recovery, you've done a couple of unnecessary block reads and
> compares, and the time-consuming Q calculation and write. But if you
> chose recovery, then you'd be assuming the first n-2 blocks are correct
> and re-calculating the last two blocks. This avoids the extra reads and
> compares, but if the two parity blocks are within the first n-2 blocks
> read, then the recovery calculations will be much slower. Hence a
> resync is faster for raid6.
>
> I suppose the raid6 build could be optimised a little by skipping the
> extra reads when you know in advance that they will not match. But
> either that is already being done, or it is considered a small issue
> that is not worth changing (since it only has an effect during the
> initial build).
>
Almost everything you say is correct.
However I'm not convinced that a raid6 resync is faster than a raid6 recovery
(on devices where P and Q are not mostly correct). I suspect it is just an
historical oversight that RAID6 doesn't force a recovery for the initial
create.
If anyone would like to test, it is easy to force a recovery by specifying
missing devices:
   mdadm -C /dev/md0 -l6 -n6 /dev/sd[abcd] missing missing -x2 /dev/sd[ef]
and easy to force a resync by using --force
   mdadm -C /dev/md0 -l5 -n5 /dev/sd[abcde] --force
It is only really a valid test if you know that the P and Q that resync will
read are not going to be correct most of the time.
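When running such a test it helps to confirm which operation md actually started. A minimal sketch, assuming the array exposes the md sysfs attribute md/sync_action (which reports values such as "resync", "recover" or "idle"); the helper name md_sync_action and its optional second argument (an alternative sysfs root) are made up for illustration and are not part of mdadm:

```shell
# Sketch only: print the current sync operation of an md array.
# md_sync_action is a hypothetical helper, not an mdadm or kernel tool.
# $1 = array name (e.g. md0), $2 = sysfs root (defaults to /sys/block).
md_sync_action() {
    dev=$1
    root=${2:-/sys/block}
    file="$root/$dev/md/sync_action"
    if [ -r "$file" ]; then
        # The kernel writes a single word here, e.g. "resync" or "recover".
        cat "$file"
    else
        # No such array, or the attribute is not readable.
        echo "unknown"
        return 1
    fi
}
```

Calling "md_sync_action md0" just after each create should show whether the array went into a recovery or a resync.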
NeilBrown