Re: RAID 5 doesn't scale

From: Stan Hoeppner <stan@hardwarefreak.com>
To: Peter Landmann <sfrazt@googlemail.com>
Cc: linux-raid@vger.kernel.org
Subject: Re: RAID 5 doesn't scale
Date: Wed, 03 Apr 2013 08:18:52 -0500
Message-ID: <515C2C3C.3020400@hardwarefreak.com>
In-Reply-To: <loom.20130403T122905-373@post.gmane.org>

On 4/3/2013 6:00 AM, Peter Landmann wrote:

You didn't mention your stripe_cache_size value.  It'll make a lot of
difference.  Make sure it's at least 4096.  The default is 256.

~$ /bin/echo 4096 > /sys/block/md[X]/md/stripe_cache_size
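
A larger stripe cache costs RAM.  A rough sketch of the cost, using the
formula from the kernel md documentation (memory_consumed =
system_page_size * nr_disks * stripe_cache_size).  The 5-drive count and
the 4 KiB page size below are assumptions for illustration; substitute
your own values.

```shell
#!/bin/sh
# Estimate the RAM pinned by the md stripe cache.
# Formula from the kernel md documentation:
#   memory_consumed = system_page_size * nr_disks * stripe_cache_size
# Arguments: <stripe_cache_size> <nr_disks> [page_size_bytes]
stripe_cache_bytes() {
    entries=$1
    disks=$2
    page=${3:-4096}   # assumption: 4 KiB pages unless told otherwise
    echo $(( entries * disks * page ))
}

# Example: 4096 entries on an assumed 5-drive array with 4 KiB pages.
bytes=$(stripe_cache_bytes 4096 5 4096)
echo "$bytes bytes ($(( bytes / 1048576 )) MiB)"
```

Under those assumptions the cache grows from about 5 MiB at the default
of 256 to about 80 MiB at 4096.  `getconf PAGESIZE` reports the real
page size on the machine.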

> FIO settings:
> bs=4096
> iodepth=248
> direct=1
> continue_on_error=1
> rw=randwrite
> ioengine=libaio
> norandommap
> refill_buffers
> group_reporting

> numjobs=1

^^^^^^^^^^^  Even with AIO, a single thread still serializes IO
submission, regardless of queue depth.  Thus there is non-trivial
latency between IO operations.  Retest with only these global parameters
to get some concurrency.  Along with a larger stripe cache, your numbers
should go up substantially.  This test runs 4 threads/core to ensure you
saturate md with IO.

[global]
zero_buffers
numjobs=24
thread
group_reporting
blocksize=4096
ioengine=libaio
iodepth=16
direct=1
size=8G
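
The [global] section above still needs at least one job section before
fio will run it.  A minimal sketch that prints a complete job file: the
job section (its name, rw= and filename=) and the /dev/md0 target are
assumptions for illustration, not part of the parameters above.

```shell
#!/bin/sh
# Print a fio job file: the [global] section from this message plus one
# job section.
# Arguments: <target device or file> <rw mode, e.g. randwrite or randread>
# The job section (name, rw=, filename=) is an assumption for illustration.
make_fio_job() {
    target=$1
    mode=$2
    cat <<EOF
[global]
zero_buffers
numjobs=24
thread
group_reporting
blocksize=4096
ioengine=libaio
iodepth=16
direct=1
size=8G

[$mode]
rw=$mode
filename=$target
EOF
}

# /dev/md0 is a placeholder.  Writing to a raw md device destroys its
# contents, so only point this at a scratch array.
make_fio_job /dev/md0 randwrite
```

Redirect the output to a file, e.g. md-randwrite.fio, and run it with
`fio md-randwrite.fio`.  Pass randread as the second argument for the
read test.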

> So you have an idea why the real performance is only 50% of the theoretical 
> performance? 

Three reasons:  IO latency, a limited stripe_cache_size, and parity
read-modify-write (RMW).

> No cpu core is at its limits.

Because you're not cycle limited but latency limited.  With this FIO
test your CPU burn should increase a bit.

> As i said in my other post. I would be interested to solve the problem but i 
> have problems to identify it.

Note also that you're doing 4KB random writes against RAID5.  This is
going to generate substantial RMW cycles.  The Intel X25-M G2 is not a
speed demon.  Its published max 4KB IOPS throughput is for purely
random writes, not the read+write pattern created by parity RMW.  So
while your random read should get a nice jump with this test, your
random write may not improve as much.  The limitation here is a function
of the SSD controller on the X25-M G2, not md/RAID5.  If you test 5
drives in md/RAID0 you'll see a bump in random write IOPS.
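
A back-of-envelope sketch of why RAID0 should come out ahead.  A small
RAID5 write done as RMW reads the old data and old parity, then writes
the new data and new parity: 4 device IOs per host write.  The per-drive
IOPS figure below is a made-up placeholder, not an X25-M G2 spec, and
the estimate assumes the drive sustains that rate under a mixed
read+write load, which is optimistic for the reason given above.

```shell
#!/bin/sh
# Rough ceilings for 4 KB random writes, in host IOPS.
# Arguments for both functions: <drives> <per-drive 4 KB IOPS>

# RAID5 read-modify-write: read old data + old parity, write new data +
# new parity, so each host write costs 4 device IOs.
raid5_rmw_write_iops() {
    echo $(( $1 * $2 / 4 ))
}

# RAID0: each host write is exactly one device write.
raid0_write_iops() {
    echo $(( $1 * $2 ))
}

# 8000 is a hypothetical per-drive figure, chosen only for illustration.
echo "RAID5: $(raid5_rmw_write_iops 5 8000)  RAID0: $(raid0_write_iops 5 8000)"
```

Whatever the real per-drive number is, the 4x gap between the two
layouts is the part that carries over.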

-- 
Stan
