From: David Brown <david@westcontrol.com>
To: linux-raid@vger.kernel.org
Subject: Re: misunderstanding of spare and raid devices? - and one question more
Date: Fri, 01 Jul 2011 14:45:00 +0200
Message-ID: <iukfjs$34e$1@dough.gmane.org>
In-Reply-To: <20110701112915.GB22611@cthulhu.home.robinhill.me.uk>
On 01/07/2011 13:29, Robin Hill wrote:
> On Fri Jul 01, 2011 at 12:18:22PM +0200, David Brown wrote:
>
>> On 01/07/2011 10:50, Robin Hill wrote:
>>> On Fri Jul 01, 2011 at 09:23:43 +0200, David Brown wrote:
>>>
>>>> What's the difference between a "resync" and a "recovery"? Is it that a
>>>> "resync" will read the whole stripe, check if it is valid, and if it is
>>>> not it then generates the parity, while a "recovery" will always
>>>> generate the parity?
>>>>
>>> From the names, recovery would mean that it's reading from N-1 disks,
>>> and recreating data/parity to rebuild the final disk (as when it
>>> recovers from a drive failure), whereas resync will be reading from all
>>> N disks and checking/recreating the parity (as when you're running a
>>> repair on the array).
>>>
>>> The main reason I can see for doing a resync on RAID6 rather than a
>>> recovery is if the data reconstruction from the Q parity is far slower
>>> than the construction of the Q parity itself (I've no idea how the
>>> mathematics works out for this).
>>>
>>
>> Well, data reconstruction from Q parity /is/ more demanding than
>> constructing the Q parity in the first place (the mathematics is the
>> part that I know about). That's why a two-disk degraded raid6 array is
>> significantly slower (or, more accurately, significantly more
>> cpu-intensive) than a one-disk degraded raid6 array.
>>
>> But that doesn't make a difference here - you are rebuilding one or two
>> disks, so you have to use the data you've got whether you are doing a
>> resync or a recovery.
>>
> Yes, but in a resync all the data you have available is the data
> blocks, and you're reconstructing all the P and Q parity blocks. With a
> recovery, the data you have available is some of the data blocks and some
> of the P & Q parity blocks, so for some stripes you'll be reconstructing
> the parity and for others you'll be regenerating the data using the
> parity (and for some you'll be doing one of each).
>
If it were that simple, then the resync (as used when creating a RAID6
array) would not be so much slower than the recovery used in a RAID5
build...

With a resync, you first check whether the parity blocks are correct (by
generating them from the data blocks and comparing the result to the
parity blocks read from disk). If they are not correct, you write out
the regenerated parity blocks. With a recovery, you /know/ that one
block is incorrect and re-generate that (from the data blocks if it is a
parity block, or using the parities if it is a data block).

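To make the distinction concrete, here is a minimal sketch for a single
xor-parity stripe. It is only an illustration of the idea, not the md
driver's code; the function names (xor_blocks, resync_stripe,
recover_block) are made up for this example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def resync_stripe(data_blocks, parity_on_disk):
    """Resync: read everything, check the parity, rewrite only on mismatch.

    Returns (parity, needs_write).
    """
    expected = xor_blocks(data_blocks)
    if expected == parity_on_disk:
        return parity_on_disk, False
    return expected, True


def recover_block(surviving_blocks):
    """Recovery: one block is known to be bad, so rebuild it from the rest.

    With xor parity the same operation works whether the missing block
    held data or parity.
    """
    return xor_blocks(surviving_blocks)
```

Note that resync_stripe needs every block in the stripe, while
recover_block never reads the block it is about to overwrite.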
Consider the two cases, raid5 and raid6, separately.

When you build your raid5 array, there is nothing worth keeping in the
data - the aim is simply to make the stripes consistent. There are two
possible routes - consider the data blocks to be "correct" and do a
resync to make sure the parity blocks match, or consider the first n-1
disks to be "correct" and do a recovery to make sure the n'th disk
matches. For recovery, that means reading n-1 blocks in a stripe, doing
a big xor, and writing out the remaining block (whether it is data or
parity). For resync, it means reading all n blocks, and checking the
xor. If there is no match (which will be the norm when building an
array), then the correct parity is calculated and written out. Thus a
resync takes longer than a recovery, and a recovery is used.

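As a rough per-stripe tally of that argument, here is a small sketch. It
is a simplified model that counts block reads and writes only (it
ignores caching, read-ahead, and whatever the md driver really does);
raid5_build_io is a hypothetical name used just for this example.

```python
def raid5_build_io(n, parity_matches=False):
    """Count block reads/writes per stripe for an n-disk raid5 build.

    recovery: read the n-1 "correct" blocks, write the remaining one.
    resync:   read all n blocks, write parity only if the check fails.
    """
    recovery = {"reads": n - 1, "writes": 1}
    resync = {"reads": n, "writes": 0 if parity_matches else 1}
    return {"recovery": recovery, "resync": resync}
```

On a freshly created array the parity check will almost always fail, so
under this model the resync pays for one extra read per stripe and still
has to do the write.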
When you build your raid6 array, you have the same two choices. For a
resync, you have to read all n blocks, calculate P and Q, compare them,
then (as there will be no match) write out P and Q. In comparison to
the raid5 recovery, you've done a couple of unnecessary block reads and
compares, and the time-consuming Q calculation and write. But if you
chose recovery, then you'd be assuming the first n-2 blocks are correct
and re-calculating the last two blocks. This avoids the extra reads and
compares, but if the two parity blocks are within the first n-2 blocks
read, then the recovery calculations will be much slower. Hence a
resync is faster for raid6.

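To show why regenerating data from Q costs more than generating Q, here
is a minimal byte-at-a-time sketch. It assumes the usual raid6 field,
GF(2^8) with polynomial 0x11d and generator 2, so that
Q = D0 + 2*D1 + 4*D2 + ... (with + meaning xor). The function names are
made up for this example, and real implementations use lookup tables and
SIMD rather than loops like these.

```python
def gf_mul2(a):
    """Multiply by the generator 2 in GF(2^8), polynomial 0x11d."""
    a <<= 1
    return (a ^ 0x11D) & 0xFF if a & 0x100 else a


def gf_mul(a, b):
    """General GF(2^8) multiply by shift-and-add."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = gf_mul2(a)
        b >>= 1
    return result


def gf_pow2(exponent):
    """Return 2**exponent in GF(2^8)."""
    value = 1
    for _ in range(exponent):
        value = gf_mul2(value)
    return value


def gf_inv(a):
    """Multiplicative inverse of a non-zero element, computed as a**254."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def compute_pq(data_blocks):
    """Generate P (plain xor) and Q (Horner's rule, multiply-by-2 only)."""
    size = len(data_blocks[0])
    p = bytearray(size)
    q = bytearray(size)
    for block in reversed(data_blocks):
        for i, byte in enumerate(block):
            p[i] ^= byte
            q[i] = gf_mul2(q[i]) ^ byte
    return bytes(p), bytes(q)


def recover_data_from_q(data_blocks, missing, q):
    """Rebuild data_blocks[missing] using Q only (as if P were lost too)."""
    zeroed = list(data_blocks)
    zeroed[missing] = bytes(len(q))          # treat the lost block as zeros
    _, q_partial = compute_pq(zeroed)        # Q of the surviving data
    coeff = gf_inv(gf_pow2(missing))         # undo the 2**missing factor
    return bytes(gf_mul(a ^ b, coeff) for a, b in zip(q, q_partial))
```

Generating Q needs nothing more than xor and multiply-by-2. Getting a
data block back out of Q needs a second pass over the surviving data
plus a general multiplication by an inverse, which is the extra cpu cost
referred to above.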
I suppose the raid6 build could be optimised a little by skipping the
extra reads when you know in advance that they will not match. But
either that is already being done, or it is considered a small issue
that is not worth changing (since it only has an effect during the
initial build).