From: David Brown <david.brown@hesbynett.no>
To: linux-raid@vger.kernel.org
Subject: Re: [md PATCH 17/34] md/raid5: unite handle_stripe_dirtying5 and handle_stripe_dirtying6
Date: Tue, 26 Jul 2011 11:40:42 +0200
Message-ID: <j0m22q$rur$1@dough.gmane.org>
In-Reply-To: <20110726115238.63fec583@notabene.brown>
On 26/07/11 03:52, NeilBrown wrote:
> On Fri, 22 Jul 2011 18:10:33 +0900 Namhyung Kim<namhyung@gmail.com> wrote:
>
>> NeilBrown<neilb@suse.de> writes:
>>
>>> RAID6 is only allowed to choose 'reconstruct-write' while RAID5 may
>>> also choose 'read-modify-write'.
>>> Apart from this difference, handle_stripe_dirtying[56] are nearly
>>> identical. So resolve these differences and create just one function.
>>>
>>> Signed-off-by: NeilBrown<neilb@suse.de>
>>
>> Reviewed-by: Namhyung Kim<namhyung@gmail.com>
>>
>> BTW, here is a question:
>> Why doesn't RAID6 allow read-modify-write? I don't think it is
>> impossible, so what prevents doing that? Performance? Complexity?
>> Or is it actually impossible? Why? :)
>
>
> The primary reason in my mind is that when Peter Anvin wrote this code he
> didn't implement read-modify-write and I have never seen a need to consider
> changing that.
>
> You would need a largish array - at least 7 devices - before RMW could
> possibly be a win, but some people do have arrays larger than that.
>
> The computation to "subtract" from the Q-syndrome might be a bit complex - I
> don't know.
>
The "subtract from Q" isn't difficult in theory, but it involves more
cpu time than the "subtract from P". It should be an overall win in the
same way as for RAID5. However, the code would be more complex if you
allow for RMW on more than one block.
With raid5, if you have a set of data blocks D0, D1, ..., Dn and a
parity block P, and you want to change Di_old to Di_new, you can do this:
Read Di_old and P_old
Calculate P_new = P_old xor Di_old xor Di_new
Write Di_new and P_new
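For concreteness, here is a minimal C sketch of that byte-wise update
(the function name and BLOCK_SIZE are my own illustration - the real md
code works a page at a time with its own optimised xor routines):

  #include <stddef.h>
  #include <stdint.h>

  #define BLOCK_SIZE 4096  /* illustrative; md works per page */

  /* RAID5 single-block RMW: P_new = P_old xor Di_old xor Di_new,
   * applied independently to every byte of the block. */
  static void raid5_rmw_parity(uint8_t *p, const uint8_t *d_old,
                               const uint8_t *d_new)
  {
          size_t k;

          for (k = 0; k < BLOCK_SIZE; k++)
                  p[k] ^= d_old[k] ^ d_new[k];
  }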
You can easily extend this to change several data blocks without
reading in the whole old stripe. I don't know if the Linux raid5
implementation does that or not (I understand the theory here, but I am
far from an expert on the implementation). There is no doubt a balance
between the speed gains of multiple-block RMW and the code complexity.
As long as you are changing fewer than half the blocks in the stripe, the
cpu time will be less than it would be for a whole-stripe write. For RMW
writes that are more than half a stripe, the "best" choice depends on
the balance between cpu time and disk bandwidth - a balance that is very
different on today's servers compared to those when Linux raid5 was
first written.
With raid6, the procedure is similar:
Read Di_old, P_old and Q_old.
Calculate P_new = P_old xor Di_old xor Di_new
Calculate Q_new = Q_old xor (2^i . Di_old) xor (2^i . Di_new)
= Q_old xor (2^i . (Di_old xor Di_new))
Write Di_new, P_new and Q_new
The difference is simply that (Di_old xor Di_new) needs to be multiplied
(over the GF field - not normal multiplication) by 2^i. You would do
this by repeatedly applying the multiply-by-two function that already
exists in the raid6 implementation.
I don't know whether or not it is worth using the common subexpression
(Di_old xor Di_new), which turns up twice.
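As a rough C sketch (my own illustration, reusing BLOCK_SIZE from the
sketch above - the kernel's raid6 code has much faster, vectorised
versions): the per-byte doubling reduces by the RAID6 polynomial 0x11d,
and the common subexpression is computed once per byte:

  /* Multiply a byte by 2 in GF(2^8), reducing by the RAID6
   * polynomial 0x11d. */
  static uint8_t gf_mul2(uint8_t x)
  {
          return (uint8_t)((x << 1) ^ ((x & 0x80) ? 0x1d : 0));
  }

  /* RAID6 single-block RMW for data block i: compute the common
   * subexpression delta = Di_old xor Di_new once per byte, then
   * P gets delta and Q gets 2^i . delta. */
  static void raid6_rmw(uint8_t *p, uint8_t *q, const uint8_t *d_old,
                        const uint8_t *d_new, unsigned int i)
  {
          size_t k;
          unsigned int m;

          for (k = 0; k < BLOCK_SIZE; k++) {
                  uint8_t delta = d_old[k] ^ d_new[k];

                  p[k] ^= delta;            /* update P */
                  for (m = 0; m < i; m++)   /* delta *= 2^i over GF */
                          delta = gf_mul2(delta);
                  q[k] ^= delta;            /* update Q */
          }
  }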
Multi-block raid6 RMW is similar, but you have to keep track of how many
times you should multiply by 2. For a general case, the code would
probably be too messy - it's easier to simply handle the whole stripe.
But the case of consecutive blocks is easier, and likely to be far more
common in practice. If you want to change blocks Di through Dj, you can do:
Read Di_old, D(i+1)_old, ..., Dj_old, P_old and Q_old.
Calculate P_new = P_old xor Di_old xor Di_new
xor D(i+1)_old xor D(i+1)_new
...
xor Dj_old xor Dj_new
Calculate Q_new = Q_old xor (2^i     . (Di_old     xor Di_new))
                        xor (2^(i+1) . (D(i+1)_old xor D(i+1)_new))
                        ...
                        xor (2^j     . (Dj_old     xor Dj_new))
                = Q_old xor 2^i . (        (Di_old     xor Di_new)
                            xor 2^1     . (D(i+1)_old xor D(i+1)_new)
                            xor 2^2     . (D(i+2)_old xor D(i+2)_new)
                            ...
                            xor 2^(j-i) . (Dj_old     xor Dj_new) )
Write Di_new, D(i+1)_new, ..., Dj_new, P_new and Q_new
The algorithm above looks a little messy (ASCII emails are not the best
medium for mathematics), but it's not hard to see the pattern, and the
loops needed. It should also be possible to merge such routines with
the main raid6 parity calculation functions.
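For illustration, a rough C sketch of that consecutive-block loop
(again my own, reusing gf_mul2() and BLOCK_SIZE from the sketches
above; d_old[b] and d_new[b] hold block i+b, nblocks = j - i + 1):

  static void raid6_rmw_range(uint8_t *p, uint8_t *q,
                              const uint8_t *const *d_old,
                              const uint8_t *const *d_new,
                              unsigned int i, unsigned int nblocks)
  {
          size_t k;
          unsigned int b, m;

          for (k = 0; k < BLOCK_SIZE; k++) {
                  uint8_t acc = 0;

                  /* Horner-style loop from block j down to block i:
                   * doubling acc between blocks leaves each delta
                   * multiplied by 2^(its offset within the range). */
                  for (b = nblocks; b-- > 0; ) {
                          uint8_t delta = d_old[b][k] ^ d_new[b][k];

                          p[k] ^= delta;
                          acc = gf_mul2(acc) ^ delta;
                  }
                  for (m = 0; m < i; m++)   /* acc *= 2^i over GF */
                          acc = gf_mul2(acc);
                  q[k] ^= acc;
          }
  }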
Best regards,
David
> Peter: maybe you have a reason that I haven't mentioned?
>
> Thanks,
> NeilBrown
>