From: David Brown <david.brown@hesbynett.no>
To: NeilBrown <neilb@suse.de>
Cc: stan@hardwarefreak.com, Michael Tokarev <mjt@tls.msk.ru>,
	Miquel van Smoorenburg <mikevs@xs4all.net>,
	Linux RAID <linux-raid@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: O_DIRECT to md raid 6 is slow
Date: Mon, 20 Aug 2012 09:47:39 +0200
Message-ID: <5031EB9B.5010400@hesbynett.no>
In-Reply-To: <20120820100134.22b2b056@notabene.brown>

On 20/08/2012 02:01, NeilBrown wrote:
> On Sun, 19 Aug 2012 18:34:28 -0500 Stan Hoeppner <stan@hardwarefreak.com>
> wrote:
>
>
> Since we are trying to set the record straight....
>
>> md/RAID6 must read all devices in a RMW cycle.
>
> md/RAID6 must read all data devices (i.e. not parity devices) which it is not
> going to write to, in an RMW cycle (which the code actually calls RCW -
> reconstruct-write).
>
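A minimal Python sketch of which devices a RAID-6 reconstruct-write has to read, as Neil describes it. The function name `rcw_reads` is hypothetical and the data disks are assumed to be numbered 0..n-1; this is not md's code.

```python
def rcw_reads(n_data: int, to_write: set[int]) -> list[int]:
    """Data devices a RAID-6 reconstruct-write (RCW) has to read.

    Every data device that is NOT being overwritten must be read, so that
    P and Q can be recomputed from the complete new stripe.  The old P and Q
    are never read, since they are regenerated from scratch.
    """
    return [i for i in range(n_data) if i not in to_write]

# Usage: 8 data disks (plus P and Q), writing data disks 2 and 3.
print(rcw_reads(8, {2, 3}))  # [0, 1, 4, 5, 6, 7]
```

So a small write to a wide array still costs nearly a full stripe of reads; only a full-stripe write avoids them entirely.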
>>
>> md/RAID5 takes a shortcut for single block writes, and must only read
>> one drive for the RMW cycle.
>
> md/RAID5 uses an alternate mechanism when the number of data blocks that need
> to be written is less than half the number of data blocks in a stripe.  In
> this alternate mechanism (which the code calls RMW - read-modify-write),
> md/RAID5 reads all the blocks that it is about to write to, plus the parity
> block.  It then computes the new parity and writes it out along with the new
> data.
>
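A rough Python sketch of the RAID-5 choice between RMW and RCW. It simply picks whichever needs fewer reads (RMW reads the k target blocks plus P, RCW reads the n - k untouched data blocks), which roughly matches Neil's "less than half" description. That rule, and the name `raid5_write`, are assumptions for illustration; the real md code may weigh other factors.

```python
from functools import reduce

def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise xor of any number of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def raid5_write(data: list[bytes], parity: bytes,
                new: dict[int, bytes]) -> tuple[str, bytes]:
    """Return (mode, new_parity) for writing `new` into a RAID-5 stripe."""
    rmw_reads = len(new) + 1          # old copies of the target blocks + P
    rcw_reads = len(data) - len(new)  # every data block not being written
    if rmw_reads < rcw_reads:
        # RMW: P' = P xor old xor new, for each block being written
        p = parity
        for i, blk in new.items():
            p = xor_blocks(p, data[i], blk)
        return "rmw", p
    # RCW: rebuild the complete new stripe and recompute P from scratch
    stripe = [new.get(i, d) for i, d in enumerate(data)]
    return "rcw", xor_blocks(*stripe)
```

With 8 data disks, writing one block needs 2 reads via RMW against 7 via RCW; writing six blocks flips the choice to RCW.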

I've learned something here too - I thought this mechanism was only used 
for a single block write.  Thanks for the correction, Neil.

If you (or anyone else) are ever interested in implementing the same 
thing in raid6, the maths is not actually too bad (now that I've thought 
about it).  (I understand the theory here, but I'm afraid I don't have 
the experience with kernel programming to do the implementation.)

To change a few data blocks, you need to read in the old data blocks 
(Da, Db, etc.) and the old parities (P, Q).

Calculate the xor differences Xa = Da + D'a, Xb = Db + D'b, etc. (in 
GF(2^8), "+" is simply xor).

The new P parity is P' = P + Xa + Xb +...

The new Q parity is Q' = Q + (g^a).Xa + (g^b).Xb + ...
where a, b, etc. are the indices of the changed data disks and g is the 
raid6 generator.  The weighted sum there is just the normal raid6 
Q-parity calculation with most entries set to 0, and the Xa, Xb, etc. in 
the appropriate spots.
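To check the maths, here is a small Python sketch of the RMW update, compared against a full recomputation. It assumes the usual RAID-6 conventions: GF(2^8) with the polynomial x^8+x^4+x^3+x^2+1 (0x11D), g = 2, and Q = sum of g^i.Di over data disks i = 0..n-1. The function names are made up for illustration.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x^2 + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return r

def g_pow(n: int) -> int:
    """g^n with g = 2."""
    r = 1
    for _ in range(n):
        r = gf_mul(r, 2)
    return r

def xor_blocks(x: bytes, y: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(x, y))

def scale(c: int, block: bytes) -> bytes:
    """Multiply every byte of a block by the field constant c."""
    return bytes(gf_mul(c, b) for b in block)

def full_parity(data: list[bytes]) -> tuple[bytes, bytes]:
    """Compute P and Q from scratch (what a reconstruct-write does)."""
    p = q = bytes(len(data[0]))
    for i, d in enumerate(data):
        p = xor_blocks(p, d)
        q = xor_blocks(q, scale(g_pow(i), d))
    return p, q

def rmw_parity(p: bytes, q: bytes, old: dict[int, bytes],
               new: dict[int, bytes]) -> tuple[bytes, bytes]:
    """Update P and Q using only the changed blocks and the old parities."""
    for i, d_new in new.items():
        x = xor_blocks(old[i], d_new)          # Xi = Di + D'i
        p = xor_blocks(p, x)                   # P' = P + Xi + ...
        q = xor_blocks(q, scale(g_pow(i), x))  # Q' = Q + (g^i).Xi + ...
    return p, q

# Usage: change disks 1 and 4 of a 6-disk stripe, then compare with a
# full recomputation over the updated stripe.
data = [bytes([i * 17 + j for j in range(4)]) for i in range(6)]
p, q = full_parity(data)
new = {1: b"\xde\xad\xbe\xef", 4: b"\x00\x01\x02\x03"}
old = {i: data[i] for i in new}
updated = [new.get(i, d) for i, d in enumerate(data)]
assert rmw_parity(p, q, old, new) == full_parity(updated)
```

The RMW path reads only the changed blocks plus P and Q, i.e. k + 2 reads instead of the n - k needed for RCW.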

If the raid6 Q-parity function already has shortcuts for handling zero 
entries (I haven't looked, but the mechanism might be in place to 
slightly speed up dual-failure recovery), then all the building blocks 
are already in place.
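Because Q is linear, running the ordinary Q calculation over a "delta" stripe (zero everywhere except Xa, Xb, ...) gives exactly Q + Q'. The Python sketch below shows that, plus one possible zero-entry shortcut: disks above the highest changed index are skipped, and zero entries below it only need the multiply-by-g step. Same field assumptions as usual for raid6 (0x11D, g = 2); the names are hypothetical and this is not the kernel's lib/raid6 code.

```python
def mul_g(a: int) -> int:
    """Multiply one byte by g = 2 in GF(2^8) modulo 0x11D."""
    a <<= 1
    return a ^ 0x11D if a & 0x100 else a

def q_syndrome(blocks: list[bytes]) -> bytes:
    """Ordinary Q over all blocks, by Horner's rule from the highest disk."""
    q = bytes(len(blocks[0]))
    for d in reversed(blocks):
        q = bytes(mul_g(x) ^ y for x, y in zip(q, d))
    return q

def q_syndrome_sparse(size: int, deltas: dict[int, bytes]) -> bytes:
    """Same Q for a stripe that is zero except at the indices in `deltas`.

    Disks above the highest changed index contribute nothing and are skipped;
    zero entries below it only get the multiply-by-g step, with no xor.
    """
    q = bytes(size)
    for i in range(max(deltas), -1, -1):
        q = bytes(mul_g(x) for x in q)
        if i in deltas:
            q = bytes(x ^ y for x, y in zip(q, deltas[i]))
    return q

# Usage: Q of the delta stripe is the xor of the old and new Q.
data = [bytes([(7 * i + j) % 256 for j in range(4)]) for i in range(5)]
new = list(data)
new[1] = b"\x10\x20\x30\x40"
new[3] = b"\xff\x00\xff\x00"
delta = [bytes(a ^ b for a, b in zip(o, n)) for o, n in zip(data, new)]
qd = q_syndrome(delta)
assert q_syndrome(new) == bytes(a ^ b for a, b in zip(q_syndrome(data), qd))
assert q_syndrome_sparse(4, {1: delta[1], 3: delta[3]}) == qd
```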


