From: Namhyung Kim <namhyung@gmail.com>
To: David Brown <david.brown@hesbynett.no>
Cc: linux-raid@vger.kernel.org
Subject: Re: [md PATCH 17/34] md/raid5: unite handle_stripe_dirtying5 and handle_stripe_dirtying6
Date: Tue, 26 Jul 2011 22:23:27 +0900
Message-ID: <8739htfa5s.fsf@gmail.com>
In-Reply-To: <j0m22q$rur$1@dough.gmane.org> (David Brown's message of "Tue, 26 Jul 2011 11:40:42 +0200")

David Brown <david.brown@hesbynett.no> writes:

> On 26/07/11 03:52, NeilBrown wrote:
>> On Fri, 22 Jul 2011 18:10:33 +0900 Namhyung Kim<namhyung@gmail.com>  wrote:
>>
>>> NeilBrown<neilb@suse.de>  writes:
>>>
>>>> RAID6 is only allowed to choose 'reconstruct-write' while RAID5 is
>>>> also allowed to use 'read-modify-write'.
>>>> Apart from this difference, handle_stripe_dirtying[56] are nearly
>>>> identical.  So resolve these differences and create just one function.
>>>>
>>>> Signed-off-by: NeilBrown<neilb@suse.de>
>>>
>>> Reviewed-by: Namhyung Kim<namhyung@gmail.com>
>>>
>>> BTW, here is a question:
>>>    Why doesn't RAID6 allow read-modify-write? I don't think it is
>>>    impossible, so what prevents doing that? Performance?
>>>    Complexity? Why? :)
>>
>>
>> The primary reason in my mind is that when Peter Anvin wrote this code he
>> didn't implement read-modify-write and I have never seen a need to consider
>> changing that.
>>
>> You would need a largish array - at least 7 devices - before RMW could
>> possibly be a win, but some people do have arrays larger than that.
>>
>> The computation to "subtract" from the Q-syndrome might be a bit complex - I
>> don't know.
>>
>
> The "subtract from Q" isn't difficult in theory, but it involves more
> cpu time than the "subtract from P".  It should be an overall win in
> the same way as for RAID5.  However, the code would be more complex if
> you allow for RMW on more than one block.
>
> With raid5, if you have a set of data blocks D0, D1, ..., Dn and a
> parity block P, and you want to change Di_old to Di_new, you can do
> this:
>
> Read Di_old and P_old
> Calculate P_new = P_old xor Di_old xor Di_new
> Write Di_new and P_new
>
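> For concreteness, here is a minimal C sketch of that single-block
> update (names invented for illustration; the real md code works on
> whole pages with optimised xor routines rather than a byte loop):
>
>   #include <stddef.h>
>   #include <stdint.h>
>
>   /* raid5 single-block RMW: fold the change in Di into P in place,
>    * i.e. P_new = P_old xor Di_old xor Di_new, byte by byte. */
>   static void raid5_rmw_parity(uint8_t *p, const uint8_t *d_old,
>                                const uint8_t *d_new, size_t len)
>   {
>           size_t off;
>
>           for (off = 0; off < len; off++)
>                   p[off] ^= d_old[off] ^ d_new[off];
>   }
>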
> You can easily extend this to changing several data blocks without
> reading in the whole old stripe.  I don't know if the Linux raid5
> implementation does that or not (I understand the theory here, but I
> am far from an expert on the implementation).  There is no doubt a
> balance between the speed gains of multiple-block RMW and the code
> complexity.  As long as you are changing fewer than half the blocks
> in the stripe, the cpu time will be less than it would be for a
> whole-stripe write.  For RMW writes that cover more than half a
> stripe, the "best" choice depends on the balance between cpu time and
> disk bandwidth - a balance that is very different on today's servers
> compared to those when Linux raid5 was first written.
>
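> As a rough sketch of that balance (just the idea, not the actual
> handle_stripe logic), one could compare how many blocks each strategy
> has to read before it can write:
>
>   /* RMW pre-reads the old contents of each block being written plus
>    * the old parity; reconstruct-write pre-reads every data block
>    * that is NOT being written.  Prefer whichever needs fewer reads. */
>   static int prefer_rmw(int blocks_to_write, int data_disks)
>   {
>           int rmw_reads = blocks_to_write + 1;          /* old data + P */
>           int rcw_reads = data_disks - blocks_to_write; /* the others  */
>
>           return rmw_reads < rcw_reads;
>   }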
>
>
> With raid6, the procedure is similar:
>
> Read Di_old, P_old and Q_old.
> Calculate P_new = P_old xor Di_old xor Di_new
> Calculate Q_new = Q_old xor (2^i . Di_old) xor (2^i . Di_new)
>                 = Q_old xor (2^i . (Di_old xor Di_new))
> Write Di_new, P_new and Q_new
>
> The difference is simply that (Di_old xor Di_new) needs to be
> multiplied (over the GF field - not normal multiplication) by 2^i.
> You would do this by repeatedly applying the multiply-by-two function
> that already exists in the raid6 implementation.
>
> I don't know whether or not it is worth using the common subexpression
> (Di_old xor Di_new), which turns up twice.
>
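> In code the multiply-by-two step is cheap.  A hedged sketch using the
> raid6 field polynomial 0x11d (function names invented; the kernel's
> raid6 library carries optimised vector versions of the same
> operation):
>
>   /* GF(2^8) multiply-by-two for the raid6 polynomial 0x11d. */
>   static inline uint8_t gf_mul2(uint8_t x)
>   {
>           return (uint8_t)((x << 1) ^ ((x & 0x80) ? 0x1d : 0));
>   }
>
>   /* Single-block raid6 RMW for data disk index i, reusing the common
>    * subexpression delta = Di_old xor Di_new mentioned above. */
>   static void raid6_rmw_one(uint8_t *p, uint8_t *q,
>                             const uint8_t *d_old, const uint8_t *d_new,
>                             int i, size_t len)
>   {
>           size_t off;
>           int k;
>
>           for (off = 0; off < len; off++) {
>                   uint8_t delta = d_old[off] ^ d_new[off];
>
>                   p[off] ^= delta;                /* P_new */
>                   for (k = 0; k < i; k++)
>                           delta = gf_mul2(delta); /* 2^i . delta */
>                   q[off] ^= delta;                /* Q_new */
>           }
>   }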
>
>
> Multi-block raid6 RMW is similar, but you have to keep track of how
> many times you should multiply by 2.  For a general case, the code
> would probably be too messy - it's easier to simply handle the whole
> stripe. But the case of consecutive blocks is easier, and likely to be
> far more common in practice.  If you want to change blocks Di through
> Dj, you can do:
>
> Read Di_old, D(i+1)_old, ..., Dj_old, P_old and Q_old.
> Calculate P_new = P_old xor Di_old xor Di_new
>                         xor D(i+1)_old xor D(i+1)_new
>                         ...
>                         xor Dj_old xor Dj_new
>
>
> Calculate Q_new = Q_old xor (2^i . (Di_old xor Di_new))
>                         xor (2^(i+1) . (D(i+1)_old xor D(i+1)_new))
>                         ...
>                         xor (2^j . (Dj_old xor Dj_new))
>                 = Q_old xor (2^i . (
>                              (Di_old xor Di_new)
>                              xor (2^1 . (D(i+1)_old xor D(i+1)_new))
>                              xor (2^2 . (D(i+2)_old xor D(i+2)_new))
>                              ...
>                              xor (2^(j-i) . (Dj_old xor Dj_new))
>                          ))
>
> Write Di_new, D(i+1)_new, ..., Dj_new, P_new and Q_new
>
>
> The algorithm above looks a little messy (ASCII emails are not the
> best medium for mathematics), but it's not hard to see the pattern
> and the loops needed.  It should also be possible to merge such
> routines with the main raid6 parity calculation functions.
>
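> To make the pattern and loops explicit, here is a sketch of the
> consecutive-block case, evaluating the factored sum above
> Horner-style and building on the gf_mul2() sketch earlier (again
> illustrative, not kernel code):
>
>   /* raid6 RMW for consecutive data blocks i..j. */
>   static void raid6_rmw_range(uint8_t *p, uint8_t *q,
>                               uint8_t *const d_old[],
>                               uint8_t *const d_new[],
>                               int i, int j, size_t len)
>   {
>           size_t off;
>           int k;
>
>           for (off = 0; off < len; off++) {
>                   /* Horner: start from Dj's delta, then double and
>                    * fold in the next lower delta down to block i. */
>                   uint8_t acc = d_old[j][off] ^ d_new[j][off];
>
>                   for (k = j - 1; k >= i; k--) {
>                           acc = gf_mul2(acc);
>                           acc ^= d_old[k][off] ^ d_new[k][off];
>                   }
>                   for (k = 0; k < i; k++)        /* scale by 2^i */
>                           acc = gf_mul2(acc);
>                   q[off] ^= acc;                 /* Q_new */
>
>                   for (k = i; k <= j; k++)       /* P just xors deltas */
>                           p[off] ^= d_old[k][off] ^ d_new[k][off];
>           }
>   }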
>
> mvh.,
>
> David
>

Thanks a lot for your detailed explanation, David!

I think it would be good if we implemented RMW for small
(e.g. single-disk) writes on RAID6 too.


-- 
Regards,
Namhyung Kim
