From: Dave Chinner <david@fromorbit.com>
To: Stan Hoeppner <stan@hardwarefreak.com>
Cc: NeilBrown <neilb@suse.de>, David Brown <david.brown@hesbynett.no>,
Michael Tokarev <mjt@tls.msk.ru>,
Miquel van Smoorenburg <mikevs@xs4all.net>,
Linux RAID <linux-raid@vger.kernel.org>,
LKML@dastard
Subject: Re: O_DIRECT to md raid 6 is slow
Date: Mon, 20 Aug 2012 15:19:51 +1000
Message-ID: <20120820051951.GC19235@dastard>
In-Reply-To: <5031C0A9.60803@hardwarefreak.com>
On Sun, Aug 19, 2012 at 11:44:25PM -0500, Stan Hoeppner wrote:
> I'm copying Dave C. as he apparently misunderstood the behavior of
> md/RAID6 as well. My statement was based largely on Dave's information.
> See [1] below.
Not sure what I'm supposed to have misunderstood...
> On 8/19/2012 7:01 PM, NeilBrown wrote:
> > On Sun, 19 Aug 2012 18:34:28 -0500 Stan Hoeppner <stan@hardwarefreak.com>
> > wrote:
>
> > Since we are trying to set the record straight....
>
> Thank you for finally jumping in Neil--had hoped to see your
> authoritative information sooner.
>
> > md/RAID6 must read all data devices (i.e. not parity devices) which it is not
> > going to write to, in an RMW cycle (which the code actually calls RCW -
> > reconstruct-write).
That's a RMW cycle from an IO point of view, i.e. a synchronous read
must take place before the data can be modified and written...
> > md/RAID5 uses an alternate mechanism when the number of data blocks that need
> > to be written is less than half the number of data blocks in a stripe. In
> > this alternate mechanism (which the code calls RMW - read-modify-write),
> > md/RAID5 reads all the blocks that it is about to write to, plus the parity
> > block. It then computes the new parity and writes it out along with the new
> > data.
And by the same definition, that's also a RMW cycle.
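To make the two paths concrete, here is a rough sketch of the single-parity
(RAID5-style XOR) case described above. It is not md code: the names
xor_blocks, rcw_parity, rmw_parity, rcw_reads and rmw_reads are made up for
illustration, the chunk sizes are toy values, and RAID6's second (Q) parity,
which needs Galois-field arithmetic, is left out. The read counts simply
restate the description quoted above.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def rcw_parity(new_stripe: list[bytes]) -> bytes:
    """Reconstruct-write: parity recomputed from every data chunk in the stripe."""
    return xor_blocks(*new_stripe)


def rmw_parity(old_parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
    """Read-modify-write: parity updated from the old parity and the changed chunk."""
    return xor_blocks(old_parity, old_data, new_data)


def rcw_reads(data_chunks: int, chunks_written: int) -> int:
    """RCW reads the data chunks that are NOT being written."""
    return data_chunks - chunks_written


def rmw_reads(chunks_written: int) -> int:
    """RMW (single parity) reads the chunks being written plus the parity chunk."""
    return chunks_written + 1


# Toy stripe: four data chunks of four bytes each (hypothetical geometry).
stripe = [bytes([i] * 4) for i in (1, 2, 3, 4)]
parity = rcw_parity(stripe)

# Overwrite the first chunk and compute the new parity both ways.
new_chunk = bytes([9] * 4)
updated = [new_chunk] + stripe[1:]
assert rmw_parity(parity, stripe[0], new_chunk) == rcw_parity(updated)
```

Both paths produce the same parity and both have to read from disk before
they can write; they differ only in which chunks get read.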
> >> [1] The only thing that's not clear at this point is if md/RAID6 also
> >> always writes back all chunks during RMW, or only the chunk that has
> >> changed.
>
> > Do you seriously imagine anyone would write code to write out data which it
> > is known has not changed? Sad. :-)
Two words: media scrubbing.
> On 6/25/2012 9:30 PM, Dave Chinner wrote:
> > IOWs, every time you do a small isolated write, the MD RAID volume
> > will do a RMW cycle, reading 11MB and writing 12MB of data to disk.
Oh, you're probably complaining about that write number. All I was
trying to do was demonstrate what a worst case RMW cycle looks like.
So by the above, that occurs when you have a small isolated write to
each chunk of the stripe. A single write is read 11MB, write 1.5MB
(data + 2 parity). It doesn't really change the IO latency or load,
though, you've still got the same read-all, modify, write-multiple
IO pattern....
> > Given that most workloads are not doing lots and lots of large
> > sequential writes this is, IMO, a pretty bad default given typical
> > RAID5/6 volume configurations we see....
Either way, the point I was making in the original post stands -
RAID6 sucks balls for most workloads as they only do small writes in
comparison to the stripe width of the volume....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com