linux-raid.vger.kernel.org archive mirror
* Abysmal performance of O_DIRECT write on parity raid
@ 2010-12-31  4:35 Spelic
  2010-12-31  5:36 ` Doug Dumitru
  0 siblings, 1 reply; 3+ messages in thread
From: Spelic @ 2010-12-31  4:35 UTC (permalink / raw)
  To: linux-raid

Hi all linux raiders

On kernel 2.6.36.2, and probably others, O_DIRECT write performance is
abysmal on parity raid compared to non-parity raid.

And this is apparently NOT due to RMW (read-modify-write)! (see below)
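For context on why RMW would be the first suspect: on raid5, a write that does not cover a whole stripe cannot compute parity from the new data alone. Below is a minimal sketch of that geometry check, using the array tested in this message (6-disk raid5, 1024k chunk) and the two dd block sizes. The function name `raid5_stripe_info` is hypothetical, and the check ignores everything except size and alignment.

```python
KiB = 1024
MiB = 1024 * KiB

def raid5_stripe_info(n_disks, chunk_bytes, write_bytes, offset=0):
    """Return (full_stripe_bytes, is_full_stripe_write) for a raid5 array.

    raid5 stores one parity chunk per stripe, so each stripe holds
    (n_disks - 1) data chunks.
    """
    data_disks = n_disks - 1
    full_stripe = data_disks * chunk_bytes
    # A write avoids touching old data/parity only if it starts on a
    # stripe boundary and covers a whole number of full stripes.
    is_full = (
        write_bytes > 0
        and offset % full_stripe == 0
        and write_bytes % full_stripe == 0
    )
    return full_stripe, is_full

for bs in (1 * MiB, 4 * KiB):
    full, is_full = raid5_stripe_info(6, 1024 * KiB, bs)
    print(bs, full // MiB, is_full)
# prints:
# 1048576 5 False
# 4096 5 False
```

Under these assumptions a full stripe is 5 MiB of data, so neither bs=1M nor bs=4k is a full-stripe write.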

With dd bs=1M to the bare MD device, I get 2.1MB/sec on a 6-disk raid5
with 1024k chunk (even with stripe_cache_size set to the max), while the
same test on a 4-disk raid10 runs at 160MB/sec (about 76 times faster).
Non-direct writes to the arrays run at about 250MB/sec for raid5 and
about 180MB/sec for raid10.
With bs=4k direct I/O it's 205KB/sec on the raid5 vs 28MB/sec on the
raid10 (136 times faster).
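The measurements above came from dd. For anyone who wants to reproduce them without dd, here is a rough Python sketch of the same kind of sequential write test. It is an approximation, not the exact command used: the function name `write_throughput` and the device path in the usage comment are hypothetical, `os.O_DIRECT` is Linux-only, and the block size must be a multiple of the device's logical sector size.

```python
import mmap
import os
import time

def write_throughput(path, block_size, count, direct=True):
    """Write `count` blocks of `block_size` bytes to `path`; return MB/s."""
    flags = os.O_WRONLY | os.O_CREAT
    if direct:
        flags |= os.O_DIRECT  # bypass the page cache (Linux only)
    # O_DIRECT needs an aligned buffer; an anonymous mmap is page-aligned
    # and zero-filled.
    buf = mmap.mmap(-1, block_size)
    fd = os.open(path, flags, 0o600)
    try:
        start = time.monotonic()
        for _ in range(count):
            written = os.write(fd, buf)
            if written != block_size:
                raise OSError("short write: %d of %d bytes" % (written, block_size))
        os.fsync(fd)  # include the flush in the timing for buffered runs
        elapsed = max(time.monotonic() - start, 1e-9)
    finally:
        os.close(fd)
        buf.close()
    return block_size * count / elapsed / 1e6

# Hypothetical usage (DESTROYS data on the target device):
#   print(write_throughput("/dev/md0", 1024 * 1024, 100, direct=True))
#   print(write_throughput("/dev/md0", 4096, 1000, direct=True))
```

Running it with `direct=False` gives the non-direct figure for comparison.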

This does NOT seem to be due to RMW, because from the second run on, MD
does *not* read from the disks anymore (checked with iostat -x 1).
(BTW, how do you clear that cache? echo 3 > /proc/sys/vm/drop_caches
does not appear to work.)
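The "no reads" observation can also be checked without iostat by comparing the sectors-read counters in /proc/diskstats before and after a run. A small sketch follows; the helper names and the member disk names (sdb, sdc) are hypothetical, and the parsing assumes the standard layout where the sixth whitespace-separated field is sectors read (in 512-byte units).

```python
def sectors_read(diskstats_text, devices):
    """Map device name -> sectors read, parsed from /proc/diskstats text."""
    result = {}
    for line in diskstats_text.splitlines():
        fields = line.split()
        # fields: major, minor, name, reads completed, reads merged,
        # sectors read, ...
        if len(fields) >= 6 and fields[2] in devices:
            result[fields[2]] = int(fields[5])
    return result

def read_deltas(before, after):
    """Sectors read per device between two snapshots."""
    return {dev: after[dev] - before[dev] for dev in before if dev in after}

# Hypothetical usage around a test run:
#   members = {"sdb", "sdc"}
#   before = sectors_read(open("/proc/diskstats").read(), members)
#   ... run the dd test ...
#   after = sectors_read(open("/proc/diskstats").read(), members)
#   print(read_deltas(before, after))  # all zeros means no reads hit the disks
```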

It's so bad it looks like a bug. Could you please have a look at this?
A lot of important software uses O_DIRECT, in particular:
- LVM, I think, especially pvmove and mirror creation, which are
impossibly slow on parity raid
- Databases (OK, I understand we should use raid10, but the difference
should not be SO great!)
- Virtualization. E.g. KVM wants bare devices for high performance and
wants to do direct I/O. Go figure.

With such a bad worst case for O_DIRECT, we seriously risk having to
abandon MD parity raid completely.
Please have a look.

Thank you


end of thread, other threads:[~2011-01-05 11:51 UTC | newest]

Thread overview: 3+ messages
2010-12-31  4:35 Abysmal performance of O_DIRECT write on parity raid Spelic
2010-12-31  5:36 ` Doug Dumitru
2011-01-05 11:51   ` Spelic
