Re: [PATCH] raid5: add support for rmw writes in raid6

All of lore.kernel.org
 help / color / mirror / Atom feed

From: David Brown <david.brown@hesbynett.no>
To: Kumar Sundararajan <kumar@fb.com>
Cc: NeilBrown <neilb@suse.de>, Dan Williams <djbw@fb.com>,
	linux-raid <linux-raid@vger.kernel.org>
Subject: Re: [PATCH] raid5: add support for rmw writes in raid6
Date: Mon, 29 Apr 2013 21:28:24 +0200	[thread overview]
Message-ID: <517EC9D8.8000805@hesbynett.no> (raw)
In-Reply-To: <BA07AEDCFD3B5B4DAB0FF1D7B35E42DC5544BCFA@PRN-MBX01-4.TheFacebook.com>

On 29/04/13 18:11, Kumar Sundararajan wrote:
>
>
> On 4/29/13 12:10 AM, "David Brown" <david.brown@hesbynett.no> wrote:
>
>> On 29/04/13 03:29, NeilBrown wrote:
>>> On Fri, 26 Apr 2013 14:35:27 -0700 Dan Williams <djbw@fb.com> wrote:
>>>
>>>> On Sun, Apr 21, 2013 at 9:22 PM, NeilBrown <neilb@suse.de> wrote:
>>>>> On Wed, 17 Apr 2013 17:56:56 -0700 Dan Williams <djbw@fb.com> wrote:
>>>>>
>>>>>> From: Kumar Sundararajan <kumar@fb.com>
>>>>>>
>>>>>> Add support for rmw writes in raid6. This improves small write
>>>>>> performance for most workloads.
>>>>>> Since there may be some configurations and workloads where rcw
>>>>>> always outperforms rmw,
>>>>>> this feature is disabled by default. It can be enabled via sysfs.
>>>>>> For example,
>>>>>>
>>>>>>        #echo 1 > /sys/block/md0/md/enable_rmw
>>>>>>
>>>>>> Signed-off-by: Kumar Sundararajan <kumar@fb.com>
>>>>>> Signed-off-by: Dan Williams <djbw@fb.com>
>>>>>> ---
>>>>>> Hi Neil,
>>>>>>
>>>>>> We decided to leave the enable in for the few cases where forced-rcw
>>>>>> outperformed rmw and there may be other cases out there.
>>>>>>
>>>>>> Thoughts?
>>>>>
>>>>> - More commentary would help.  The text at the top should explain
>>>>> enough so
>>>>>    when I read the code I am just verifying the text at the top, not
>>>>> trying to
>>>>>    figure out how it is supposed to work.
>>>>
>>>> Ok, along the lines of:
>>>>
>>>> "raid5 has supported sub-stripe writes by computing:
>>>>
>>>> P_new = P_old + D0_old + D0_new
>>>
>>> The '0' looks out of place.  All 'D's are treated the same, so I would
>>> write
>>> it
>>>     P_new = P_old + D_old + D_new
>>> though that is a small point.
>>>
>>>>
>>>> For raid6 we add the following:
>>>>
>>>> P_new = P_old + D0_old + D0_new
>>
>> Correct (though I agree with Neil that "0" should be replaced by "i").
>>
>>>>
>>>> Q_sub = Q_old + syndrome(D0_old)
>>
>> Wrong - using this terminology, Q_sub = syndrome(D0_old)
>>
>>>> Q_sub = Q_old + g^0*D0_old
>>>> Q_new = Q_old + Q_sub + g^0*D0_new
>>>
>>> But here the different Ds are different, so I would be using "Di" or
>>> similar.
>>>
>>>    P_new = P_old + Di_old + Di_new
>>
>> Correct.
>>
>>>
>>> and I assume the two 'Q_sub' lines are meant to mean the same thing, so
>>> let's
>>> take the second:
>>>
>>>    Q_sub = Q_old + g^i * Di_old
>>>    Q_new = Q_old + Q_sub + g^i * Di_new
>>>
>>
>> You've still got one "Q_old" too many, which means your algebra is wrong
>> (Q_new reduces here to g^i * Di_new, which is clearly wrong).  But
>> you've jumped to almost the right conclusion on the next line, proving
>> that two wrongs sometimes do make a right!
>>
>>> which looks odd as when that expands out, we add Q_old to Q_old and as
>>> '+'
>>> is 'xor', it disappears.  Maybe you mean:
>>>
>>>    Q_new = Q_sub + g^i * Di_new
>>> ??
>>
>> Q_sub = g^i * Di_old
>> Q_new = Q_old - Q_sub + g^i * Di_new
>>       = Q_old + g^i * Di_old + g^i * Di_new   (since "-" = "+")
>>       = Q_old + g^i * (Di_old + Di_new)
>>
>> I can't say for sure (I know the maths, not the code), but I would
>> expect this final version do be the most efficient for the calculations.
>> By xor'ing Di_old and Di_new first, then multiplying by g^i, you'll
>> half the number of costly g^i multiplications.
>>
>>
>
> Yes, I did consider this. But this scheme is harder to implement with how
> the rest of the code works,
> especially with writes covering more than one Di.
>
>

For each data block you are changing, you will need to remove the old 
g^i * Di_old then add in the new g^i * Di_new, so you can still use this 
simplification to reduce the number of multiplies.  If you want to 
change blocks "i" and "j", you thus do:

Q_new = Q_old + g^i * (Di_old + Di_new) + g^j * (Dj_old + Dj_new)

But as I say, I only know the maths - not the code.

mvh.,

David

next prev parent reply	other threads:[~2013-04-29 19:28 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-04-18  0:56 [PATCH] raid5: add support for rmw writes in raid6 Dan Williams
2013-04-18 18:40 ` Dan Williams
2013-04-22  4:22 ` NeilBrown
2013-04-26 21:35   ` Dan Williams
2013-04-29  1:29     ` NeilBrown
2013-04-29  7:10       ` David Brown
2013-04-29 16:11         ` Kumar Sundararajan
2013-04-29 19:28           ` David Brown [this message]
2013-04-30  0:05             ` Dan Williams
2013-04-30  6:48               ` David Brown
2013-04-30 16:01                 ` Kumar Sundararajan
2013-04-30 16:10                   ` Kumar Sundararajan
2013-04-30 18:39                     ` David Brown
2013-04-29 17:54       ` Kumar Sundararajan
2013-04-30 21:32       ` Dan Williams
2013-05-01 13:57         ` David Brown
2013-05-08 17:42       ` Dan Williams

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=517EC9D8.8000805@hesbynett.no \
    --to=david.brown@hesbynett.no \
    --cc=djbw@fb.com \
    --cc=kumar@fb.com \
    --cc=linux-raid@vger.kernel.org \
    --cc=neilb@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.