From: Stan Hoeppner <stan@hardwarefreak.com>
To: Adam Goryachev <mailinglists@websitemanagers.com.au>
Cc: linux-raid@vger.kernel.org
Subject: Re: RAID performance
Date: Thu, 07 Feb 2013 02:11:31 -0600
Message-ID: <511361B3.8060204@hardwarefreak.com>
In-Reply-To: <51134E43.7090508@websitemanagers.com.au>
On 2/7/2013 12:48 AM, Adam Goryachev wrote:
> I'm trying to resolve a significant performance issue (not arbitrary dd
> tests, etc but real users complaining, real workload performance).
It's difficult to analyze your situation without even a basic
description of the workload(s). What is the file access pattern? What
types of files?
> I'm currently using 5 x 480GB SSD's in a RAID5 as follows:
> md1 : active raid5 sdf1[0] sdc1[4] sdb1[5] sdd1[3] sde1[1]
> 1863535104 blocks super 1.2 level 5, 64k chunk, algorithm 2 [5/5]
...
> Each drive is set to the deadline scheduler.
Switching to noop may help a little, as may disabling NCQ, i.e. putting
the driver in native IDE mode, or setting queue depth to 1.
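Before changing anything it helps to confirm what each drive is actually set to. The sketch below parses the line read from /sys/block/<dev>/queue/scheduler, where the kernel wraps the active scheduler in square brackets. The helper name `active_scheduler` and the sample string are illustrative assumptions, not output from the system under discussion.

```python
import re


def active_scheduler(sysfs_line: str) -> str:
    """Return the scheduler marked active in a sysfs scheduler line.

    The kernel lists every available scheduler on one line and wraps
    the one currently in use in square brackets.
    """
    match = re.search(r"\[([^\]]+)\]", sysfs_line)
    if match is None:
        raise ValueError(f"no active scheduler marked in: {sysfs_line!r}")
    return match.group(1)


# Assumed sample line, typical of kernels that ship noop/deadline/cfq.
print(active_scheduler("noop [deadline] cfq"))
```

Switching is done as root by writing the scheduler name to that same file, and the per-device queue depth is exposed at /sys/block/<dev>/device/queue_depth.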
> Drives are:
> Intel 520s MLC 480G SATA3
> Supposedly Read 550M/Write 520M
> I think the workload being generated is simply too much for the
> underlying drives.
Not possible. With an effective spindle width of 4, these SSDs can do
~80K random read/write IOPS sustained. To put that into perspective,
you would need a ~$150,000 high end FC SAN array controller with 270 15K
SAS drives in RAID0 to get the same IOPS.
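The arithmetic behind that comparison can be checked directly. This short sketch uses only the figures quoted above (80K IOPS, 270 drives, spindle width of 4); the variable names are illustrative, and the per-drive numbers are what the comparison implies rather than measured values.

```python
# Figures quoted in the message above.
ssd_array_iops = 80_000   # ~80K sustained random IOPS for the SSD array
sas_drive_count = 270     # 15K SAS drives in the comparison array
spindle_width = 4         # effective data drives in a 5-drive RAID5

# Per-drive rate that the SAS comparison implies.
implied_iops_per_sas_drive = ssd_array_iops / sas_drive_count

# Share of the total each SSD would have to sustain.
iops_per_ssd = ssd_array_iops / spindle_width

print(round(implied_iops_per_sas_drive))  # 296
print(iops_per_ssd)                       # 20000.0
```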
The problem is not the SSDs. Probably not the controller either.
> I've been collecting the information from
> /sys/block/<drive>/stat every 10 seconds for each drive. What makes me
> think the drives are overworked is that the backlog value gets very high
> at the same time the users complain about performance.
What is "very high"? Since you mention "backlog" I'll assume you're
referring to field #11. If so, note that on my idle server (field #9 is
0), it is currently showing 434045280 for field #11. That's apparently
a weighted value of milliseconds. And apparently it's not reliable as a
diagnostic value.
What you should be looking at is field #9, which simply tells you how
many IOs are in progress. But even if this number is high, which it can
be with SSDs, it doesn't tell you whether the drive is performing
properly or not. What you should be using is iotop or something
similar. But this still isn't going to be all that informative.
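For anyone collecting /sys/block/<dev>/stat samples as described, a minimal sketch of pulling out field #9 and field #11 follows. Field #11 is a cumulative counter, so only the difference between two samples says anything about a given interval. The function names and both sample lines are made up for illustration; only the value 434045280 comes from the message above.

```python
from typing import Dict


def parse_stat(line: str) -> Dict[str, int]:
    """Extract field #9 (in flight) and field #11 (weighted ms in queue)."""
    fields = [int(x) for x in line.split()]
    if len(fields) < 11:
        raise ValueError("expected at least 11 fields in a stat line")
    return {"in_flight": fields[8], "time_in_queue_ms": fields[10]}


def avg_queue_depth(prev: Dict[str, int], curr: Dict[str, int],
                    interval_s: float) -> float:
    """Average queue depth over the interval, from the field #11 delta."""
    delta_ms = curr["time_in_queue_ms"] - prev["time_in_queue_ms"]
    return delta_ms / (interval_s * 1000.0)


# Hypothetical samples taken 10 seconds apart; the first mirrors an idle
# device with nothing in flight but a large cumulative field #11.
first = parse_stat("1000 0 8000 500 2000 0 16000 900 0 1200 434045280")
second = parse_stat("1010 0 8080 505 2050 0 16400 950 3 1300 434046280")

print(first["in_flight"], first["time_in_queue_ms"])
print(avg_queue_depth(first, second, interval_s=10))
```

This is why a huge raw field #11 on an idle machine is not alarming: it only reflects everything accumulated since boot.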
> The load is a bunch of windows VM's, which were working fine until
> recently when I migrated the main fileserver/domain controller on
> (previously it was a single SCSI Ultra320 disk on a standalone machine).
> Hence, this also seems to indicate a lack of performance.
You just typed 4 lines and told us nothing of how this relates to the
problem you wish us to help you solve. Please be detailed.
> Currently the SSD's are connected to the onboard SATA ports (only SATA II):
> 00:1f.2 SATA controller: Intel Corporation Cougar Point 6 port SATA AHCI
> Controller (rev 05)
Unless this Southbridge has a bug (I don't have time to research it),
then this isn't the problem.
> There is one additional SSD which is just the OS drive also connected,
> but it is mostly idle (all it does is log the stats/etc).
Irrelevant.
> Assuming the issue is underlying hardware
It's not.
> 1) Get a battery backed RAID controller card (which should improve
> latency because the OS can pretend it is written while the card deals
> with writing it to disk).
[BB/FB]WC is basically useless with SSDs. LSI has the best boards, and
the "FastPath" option for SSDs basically disables the onboard cache to
get it out of the way. Enterprise SSDs have extra capacitance allowing
for cache flushing on power loss so battery/flash protection on the RAID
card isn't necessary. The write cache on the SSDs themselves is faster
in aggregate than the RAID card's ASIC and cache RAM interface, thus
having BBWC on the card enabled with SSDs actually slows you down.
So, in short, this isn't the answer to your problem, either.
> 2) Move from a 5 disk RAID5 to a 8 disk RAID10, giving better data
> protection (can lose up to four drives) and hopefully better performance
> (main concern right now), and same capacity as current.
You've got plenty of hardware performance. Moving to RAID10 will simply
cost more money with no performance gain. Here's why:
md/RAID5 and md/RAID10 both rely on a single write thread. If you've
been paying attention on this list you know that patches are in the
works to fix this but are not, AFAIK, in mainline yet, and a long way
from being in distro kernels. So, you've got maximum possible read
performance now, but your *write performance is limited to a single CPU
core* with both of these RAID drivers. If your problem is write
performance, your only solution at this time with md is to use a layered
RAID, such as RAID0 over RAID1 pairs, or linear over RAID1 pairs. This
puts all of your cores in play for writes.
The reason this is an issue is that even a small number of SSDs can
overwhelm a single md thread, which is limited to one core of
throughput. This has also been discussed thoroughly here recently.
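To make the layered layout concrete, here is a rough sketch that only builds the mdadm command lines for RAID1 pairs with a RAID0 or linear array on top. It runs nothing. The device names, md numbers, and the function name `layered_raid_commands` are all hypothetical, an even number of drives is assumed, and creating arrays destroys existing data, so treat this as an outline of the idea rather than a recipe.

```python
from typing import List, Sequence, Tuple


def layered_raid_commands(pairs: Sequence[Tuple[str, str]],
                          top_level: str = "0",
                          first_md: int = 10,
                          top_md: int = 20) -> List[List[str]]:
    """Build mdadm argument lists: one RAID1 per pair, one array over them."""
    commands: List[List[str]] = []
    mirrors: List[str] = []
    for offset, (dev_a, dev_b) in enumerate(pairs):
        md_name = f"/dev/md{first_md + offset}"
        mirrors.append(md_name)
        commands.append(["mdadm", "--create", md_name, "--level=1",
                         "--raid-devices=2", dev_a, dev_b])
    # top_level is "0" for striping or "linear" for concatenation.
    commands.append(["mdadm", "--create", f"/dev/md{top_md}",
                     f"--level={top_level}",
                     f"--raid-devices={len(mirrors)}", *mirrors])
    return commands


# Hypothetical partitions; print the commands instead of executing them.
cmds = layered_raid_commands([("/dev/sdb1", "/dev/sdc1"),
                              ("/dev/sdd1", "/dev/sde1")])
for cmd in cmds:
    print(" ".join(cmd))
```

Each RAID1 pair gets its own md thread, which is what spreads write work across cores.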
> The real questions are:
> 1) Is this data enough to say that the performance issue is due to
> underlying hardware as opposed to a mis-configuration?
No, it's not. We really need to have more specific workload data.
> 2) If so, any suggestions on specific hardware which would help?
It's not a hardware problem. Given that it's a VM consolidation host,
I'd guess it's a hypervisor configuration problem.
> 3) Would removing the bitmap make an improvement to the performance?
I can't say this any more emphatically. You have 5 of Intel's best
consumer SSDs and an Intel mainboard. The problem is not your hardware.
> Motherboard is Intel S1200BTLR Serverboard - 6xSATAII / Raid 0,1,10,5
>
> It is possibly to wipe the array and re-create that would help.......
Unless you're write IOPS starved due to md/RAID5 as I described above,
blowing away the array and creating a new one isn't going to help. You
simply need to investigate further.
And if you would like continued assistance, you'll need to provide much
greater detail of the hardware and workload. You didn't mention your
CPU(s) model/freq. This matters greatly with RAID5 and SSD. Nor RAM
type/capacity, network topology, nor number of users and what
applications they're running when they report the performance problem.
Nor did you mention which hypervisor kernel/distro you're using, how
many Windows VMs you're running, and the primary workload of each, etc,
etc, etc.
> Any comments, suggestions, advice greatly received.
More information, please.
--
Stan