From: Dave Chinner <david@fromorbit.com>
To: Stan Hoeppner <stan@hardwarefreak.com>
Cc: NeilBrown <neilb@suse.de>, David Brown <david.brown@hesbynett.no>,
Michael Tokarev <mjt@tls.msk.ru>,
Miquel van Smoorenburg <mikevs@xs4all.net>,
Linux RAID <linux-raid@vger.kernel.org>,
LKML@dastard
Subject: Re: O_DIRECT to md raid 6 is slow
Date: Mon, 20 Aug 2012 15:19:51 +1000
Message-ID: <20120820051951.GC19235@dastard>
In-Reply-To: <5031C0A9.60803@hardwarefreak.com>
On Sun, Aug 19, 2012 at 11:44:25PM -0500, Stan Hoeppner wrote:
> I'm copying Dave C. as he apparently misunderstood the behavior of
> md/RAID6 as well. My statement was based largely on Dave's information.
> See [1] below.
Not sure what I'm supposed to have misunderstood...
> On 8/19/2012 7:01 PM, NeilBrown wrote:
> > On Sun, 19 Aug 2012 18:34:28 -0500 Stan Hoeppner <stan@hardwarefreak.com>
> > wrote:
>
> > Since we are trying to set the record straight....
>
> Thank you for finally jumping in Neil--had hoped to see your
> authoritative information sooner.
>
> > md/RAID6 must read all data devices (i.e. not parity devices) which it is not
> > going to write to, in an RMW cycle (which the code actually calls RCW -
> > reconstruct-write).
That's a RMW cycle from an IO point of view, i.e. a synchronous read
must take place before the data can be modified and written...
> > md/RAID5 uses an alternate mechanism when the number of data blocks that need
> > to be written is less than half the number of data blocks in a stripe. In
> > this alternate mechanism (which the code calls RMW - read-modify-write),
> > md/RAID5 reads all the blocks that it is about to write to, plus the parity
> > block. It then computes the new parity and writes it out along with the new
> > data.
And by the same definition, that's also a RMW cycle.
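To make the two mechanisms quoted above concrete, here is a rough
sketch that only counts chunk reads and writes for one partial-stripe
write. It is not the md code: the function names and the example
geometry are made up for illustration, and letting "rmw" take more than
one parity chunk is my assumption, since the description above only
covers RAID5 with a single parity block.

```python
def stripe_io(n_data, n_parity, k_written, strategy):
    """Return (chunks_read, chunks_written) for one partial-stripe write.

    n_data    -- data chunks per stripe (hypothetical geometry)
    n_parity  -- parity chunks per stripe (1 for RAID5, 2 for RAID6)
    k_written -- data chunks the write actually touches
    strategy  -- "rcw" (reconstruct-write) or "rmw" (read-modify-write)
    """
    if not 0 < k_written <= n_data:
        raise ValueError("k_written must be between 1 and n_data")
    if strategy == "rcw":
        # Read every data chunk that is NOT being written, then
        # recompute parity from the full set of data chunks.
        reads = n_data - k_written
    elif strategy == "rmw":
        # Read the old contents of the chunks being written plus the
        # old parity, then update parity from the difference.
        reads = k_written + n_parity
    else:
        raise ValueError("strategy must be 'rcw' or 'rmw'")
    # Either way, the new data and the new parity go to disk.
    writes = k_written + n_parity
    return reads, writes


def raid5_strategy(n_data, k_written):
    """Pick the mechanism per the rule quoted above: RMW when fewer than
    half of the data chunks in the stripe are being written."""
    return "rmw" if k_written < n_data / 2 else "rcw"


if __name__ == "__main__":
    # Hypothetical RAID6 stripe with 12 data chunks, one chunk written.
    print(stripe_io(12, 2, 1, "rcw"))
    # Hypothetical RAID5 stripe with 4 data chunks, one chunk written.
    print(stripe_io(4, 1, 1, raid5_strategy(4, 1)))
```

Whichever branch is taken, the read count is non-zero for any partial
stripe write, and those reads have to complete before the writes can
be issued.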
> >> [1] The only thing that's not clear at this point is if md/RAID6 also
> >> always writes back all chunks during RMW, or only the chunk that has
> >> changed.
>
> > Do you seriously imagine anyone would write code to write out data which it
> > is known has not changed? Sad. :-)
Two words: media scrubbing.
> On 6/25/2012 9:30 PM, Dave Chinner wrote:
> > IOWs, every time you do a small isolated write, the MD RAID volume
> > will do a RMW cycle, reading 11MB and writing 12MB of data to disk.
Oh, you're probably complaining about that write number. All I was
trying to do was demonstrate what a worst case RMW cycle looks like.
So by the above, that occurs when you have a small isolated write to
each chunk of the stripe. A single write is read 11MB, write 1.5MB
(data + 2 parity). It doesn't really change the IO latency or load,
though, you've still got the same read-all, modify, write-multiple
IO pattern....
> > Given that most workloads are not doing lots and lots of large
> > sequential writes this is, IMO, a pretty bad default given typical
> > RAID5/6 volume configurations we see....
Either way, the point I was making in the original post stands -
RAID6 sucks balls for most workloads as they only do small writes in
comparison to the stripe width of the volume....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com