From: Adam Goryachev <mailinglists@websitemanagers.com.au>
To: Dave Cundiff <syshackmin@gmail.com>
Cc: linux-raid@vger.kernel.org
Subject: Re: RAID performance - new kernel results
Date: Sun, 17 Feb 2013 20:52:14 +1100
Message-ID: <5120A84E.4020702@websitemanagers.com.au>
In-Reply-To: <51150475.2020803@websitemanagers.com.au>
On 09/02/13 00:58, Adam Goryachev wrote:
> On 08/02/13 02:32, Dave Cundiff wrote:
>> On Thu, Feb 7, 2013 at 7:49 AM, Adam Goryachev
>> <mailinglists@websitemanagers.com.au> wrote:
>>>> I definitely see that. See below for a FIO run I just did on one of my RAID10s
> OK, some fio results.
>
> Firstly, this is done against /tmp which is on the single standalone
> Intel SSD used for the rootfs (shows some performance level of the
> chipset I presume):
>
> root@san1:/tmp/testing# fio /root/test.fio
> seq-read: (g=0): rw=read, bs=64K-64K/64K-64K, ioengine=libaio, iodepth=32
> seq-write: (g=1): rw=write, bs=64K-64K/64K-64K, ioengine=libaio, iodepth=32
> Starting 2 processes
> seq-read: Laying out IO file(s) (1 file(s) / 4096MB)
> Jobs: 1 (f=1): [_W] [100.0% done] [0K/137M /s] [0/2133 iops] [eta 00m:00s]
> seq-read: (groupid=0, jobs=1): err= 0: pid=4932
> read : io=4096MB, bw=518840KB/s, iops=8106, runt= 8084msec
> seq-write: (groupid=1, jobs=1): err= 0: pid=5138
> write: io=4096MB, bw=136405KB/s, iops=2131, runt= 30749msec
> Run status group 0 (all jobs):
> READ: io=4096MB, aggrb=518840KB/s, minb=531292KB/s, maxb=531292KB/s,
> mint=8084msec, maxt=8084msec
>
> Run status group 1 (all jobs):
> WRITE: io=4096MB, aggrb=136404KB/s, minb=139678KB/s, maxb=139678KB/s,
> mint=30749msec, maxt=30749msec
>
> Disk stats (read/write):
> sda: ios=66570/66363, merge=10297/10453, ticks=259152/993304,
> in_queue=1252592, util=99.34%
>
>
> PS, I'm assuming I should omit the extra output similar to what you
> did.... If I should include all info, I can re-run and provide...
>
> This seems to indicate a read speed of 531M and write of 139M, which to
> me says something is wrong. I thought write speed is slower, but not
> that much slower?
>
> Moving on, I've stopped the secondary DRBD, created a new LV (testlv) of
> 15G, and formatted with ext4, mounted it, and re-run the test:
>
> seq-read: (groupid=0, jobs=1): err= 0: pid=19578
> read : io=4096MB, bw=640743KB/s, iops=10011, runt= 6546msec
> seq-write: (groupid=1, jobs=1): err= 0: pid=19997
> write: io=4096MB, bw=208765KB/s, iops=3261, runt= 20091msec
> Run status group 0 (all jobs):
> READ: io=4096MB, aggrb=640743KB/s, minb=656120KB/s, maxb=656120KB/s,
> mint=6546msec, maxt=6546msec
>
> Run status group 1 (all jobs):
> WRITE: io=4096MB, aggrb=208765KB/s, minb=213775KB/s, maxb=213775KB/s,
> mint=20091msec, maxt=20091msec
>
> Disk stats (read/write):
> dm-14: ios=65536/64841, merge=0/0, ticks=206920/469464,
> in_queue=676580, util=98.89%, aggrios=0/0, aggrmerge=0/0, aggrticks=0/0,
> aggrin_queue=0, aggrutil=0.00%
> drbd2: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=-nan%
>
> dm-14 is the testlv
>
> So, this indicates a max read speed of 656M and write of 213M, again,
> write is very slow (about 30%).
>
> With these figures, just 2 x 1Gbps links would saturate the write
> performance of this RAID5 array.
>
> Finally, changing the fio config file to point filename=/dev/vg0/testlv
> (ie, raw LV, no filesystem):
> seq-read: (groupid=0, jobs=1): err= 0: pid=10986
> read : io=4096MB, bw=652607KB/s, iops=10196, runt= 6427msec
> seq-write: (groupid=1, jobs=1): err= 0: pid=11177
> write: io=4096MB, bw=202252KB/s, iops=3160, runt= 20738msec
> Run status group 0 (all jobs):
> READ: io=4096MB, aggrb=652606KB/s, minb=668269KB/s, maxb=668269KB/s,
> mint=6427msec, maxt=6427msec
>
> Run status group 1 (all jobs):
> WRITE: io=4096MB, aggrb=202252KB/s, minb=207106KB/s, maxb=207106KB/s,
> mint=20738msec, maxt=20738msec
>
> Not much difference, which I didn't really expect...
>
> So, should I be concerned about these results? Do I need to try to
> re-run these tests at a lower layer (ie, remove DRBD and/or LVM from the
> picture)? Are these meaningless and I should be running a different
> test/set of tests/etc ?
OK, I've upgraded to:
Linux san1 3.2.0-0.bpo.4-amd64 #1 SMP Debian 3.2.35-2~bpo60+1 x86_64 GNU/Linux
I also upgraded to iscsitarget from testing, as there seemed to be a few
fixes there, even though not the one I might have liked:
ii  iscsitarget       1.4.20.2-10.1  iSCSI Enterprise Target userland tools
ii  iscsitarget-dkms  1.4.20.2-10.1  iSCSI Enterprise Target kernel module source - dkms version
Then I re-ran the fio tests from above and here is what I get when
testing against an LV which has a snapshot against it:
seq-read: (groupid=0, jobs=1): err= 0: pid=10168
read : io=4096MB, bw=1920MB/s, iops=30724, runt= 2133msec
seq-write: (groupid=1, jobs=1): err= 0: pid=10169
write: io=2236MB, bw=38097KB/s, iops=595, runt= 60094msec
Run status group 0 (all jobs):
READ: io=4096MB, aggrb=1920MB/s, minb=1966MB/s, maxb=1966MB/s,
mint=2133msec, maxt=2133msec
Run status group 1 (all jobs):
WRITE: io=2236MB, aggrb=38097KB/s, minb=39011KB/s, maxb=39011KB/s,
mint=60094msec, maxt=60094msec
So, 1920MB/s read. That sounds good to me, almost 3 times faster than
before. However, the write performance is pretty dismal :(
After removing the snapshot, here is another look:
seq-read: (groupid=0, jobs=1): err= 0: pid=10222
read : io=4096MB, bw=2225MB/s, iops=35598, runt= 1841msec
seq-write: (groupid=1, jobs=1): err= 0: pid=10223
write: io=4096MB, bw=111666KB/s, iops=1744, runt= 37561msec
Run status group 0 (all jobs):
READ: io=4096MB, aggrb=2225MB/s, minb=2278MB/s, maxb=2278MB/s,
mint=1841msec, maxt=1841msec
Run status group 1 (all jobs):
WRITE: io=4096MB, aggrb=111666KB/s, minb=114346KB/s, maxb=114346KB/s,
mint=37561msec, maxt=37561msec
A big improvement, 111MB/s write, and even better reads. However, this
write speed still seems pretty slow.
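As a rough cross-check of the two write results above, here is a minimal Python sketch that pulls the bw= field out of the fio summary lines and compares them. The parse_bw_kb helper and the FIO_LINES labels are made up for illustration; the regex covers only the KB/s and MB/s spellings seen in this thread, and it assumes 1 MB = 1024 KB, which is what this fio output appears to use.

```python
import re

# Per-job "write" summary lines copied verbatim from the two fio runs above.
FIO_LINES = {
    "with snapshot": "write: io=2236MB, bw=38097KB/s, iops=595, runt= 60094msec",
    "snapshot removed": "write: io=4096MB, bw=111666KB/s, iops=1744, runt= 37561msec",
}


def parse_bw_kb(line):
    """Extract bandwidth from a fio summary line, normalised to KB/s.

    Handles only the two unit spellings seen in this thread (KB/s, MB/s)
    and assumes 1 MB = 1024 KB (an assumption, not taken from fio docs).
    """
    match = re.search(r"bw=([0-9.]+)(KB|MB)/s", line)
    if match is None:
        raise ValueError("no bw= field in line: " + line)
    value = float(match.group(1))
    return value * 1024 if match.group(2) == "MB" else value


bw = {name: parse_bw_kb(line) for name, line in FIO_LINES.items()}
speedup = bw["snapshot removed"] / bw["with snapshot"]

for name, kb in bw.items():
    print(f"{name:>17}: {kb / 1024:6.1f} MB/s")
print(f"write speedup after removing the snapshot: {speedup:.2f}x")
```

Under that 1024 assumption the second run works out nearer 109 MB/s than 111, but the ratio between the two runs is the same whichever unit convention applies.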
Another run after stopping the secondary DRBD sync:
seq-read: (groupid=0, jobs=1): err= 0: pid=10708
read : io=4096MB, bw=2242MB/s, iops=35870, runt= 1827msec
seq-write: (groupid=1, jobs=1): err= 0: pid=10709
write: io=4096MB, bw=560661KB/s, iops=8760, runt= 7481msec
Run status group 0 (all jobs):
READ: io=4096MB, aggrb=2242MB/s, minb=2296MB/s, maxb=2296MB/s,
mint=1827msec, maxt=1827msec
Run status group 1 (all jobs):
WRITE: io=4096MB, aggrb=560660KB/s, minb=574116KB/s, maxb=574116KB/s,
mint=7481msec, maxt=7481msec
Now THAT is what I was hoping to see.... 2,242MB/s read, enough to
saturate 18 x 1Gbps ports... and 560MB/s write, enough for 4.5 x 1Gbps,
which is more than the maximum from 2 machines. So as long as I have the
secondary DRBD disconnected during the day (I do), and don't have any
LVM snapshots (I don't due to performance), then things should be a lot
better.
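The port counts above come from converting fio's bytes-per-second figures into bits per second. A minimal Python sketch of that arithmetic follows; links_needed is a hypothetical helper, bytes_per_kb=1024 is an assumption about how this fio version counts a KB, and iSCSI/TCP/Ethernet overhead is ignored, so real links would saturate somewhat earlier.

```python
def links_needed(kb_per_s, bytes_per_kb=1024, link_bits_per_s=1e9):
    """Return how many links of the given raw line rate kb_per_s would fill.

    bytes_per_kb=1024 is an assumption about how this fio version counts a KB;
    pass 1000 for decimal units. Protocol overhead is ignored.
    """
    return kb_per_s * bytes_per_kb * 8 / link_bits_per_s


# Figures from the run with the secondary DRBD disconnected.
READ_KB_PER_S = 2242 * 1024   # fio printed bw=2242MB/s
WRITE_KB_PER_S = 560661       # fio printed bw=560661KB/s

for label, kb in (("read", READ_KB_PER_S), ("write", WRITE_KB_PER_S)):
    binary = links_needed(kb)
    decimal = links_needed(kb, bytes_per_kb=1000)
    print(f"{label}: {decimal:.1f} to {binary:.1f} x 1Gbps links")
```

Whichever unit convention is right, the results stay close to the roughly 18 and 4.5 ports quoted above.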
Now looking back at all this, I think I was probably suffering from a
whole bunch of problems:
1) Write cache enabled on Windows
2) iSCSI not configured to properly deal with intermittent/slow
responses, queue forever instead of returning an error
3) Not using multipath IO
4) Server storage performance too slow to keep up (due to a kernel bug
in Debian stable, squeeze/2.6.32)
5) Using LVM snapshots which degraded performance
6) Using DRBD during the day with spinning disks on the secondary
(couldn't keep up, slowed down the primary)
7) Sharing a single Ethernet link for user traffic and SAN traffic,
allowing one protocol to flood/block the other
8) Using RR bonding with more ports on the SAN than the client, causing
flooding, 802.3x pause frames, etc
I can't say that any one of the above fixed the problem; it has been
getting progressively better as each item has been addressed. I'd like
to think that it's very close to done now.
The only thing I still need to do is get rid of the bond0 on the SAN,
change to use 8 individual IPs, and configure the clients to talk to
two of the IPs on the SAN, but only one over each Ethernet interface.
I'd again like to say thanks to all the people who've helped out with
this drama. I did forget to take those photos, but I'll take some next
time I'm in. I think I did a pretty good job overall, and it looks
reasonably neat (by my standards anyway :)
Regards,
Adam
--
Adam Goryachev
Website Managers
www.websitemanagers.com.au