From: Mark Nelson <mark.nelson@inktank.com>
To: Stefan Priebe - Profihost AG <s.priebe@profihost.ag>
Cc: "ceph-devel@vger.kernel.org" <ceph-devel@vger.kernel.org>
Subject: Re: poor OSD performance using kernel 3.4
Date: Tue, 29 May 2012 17:25:31 -0500
Message-ID: <4FC54CDB.1000506@inktank.com>
In-Reply-To: <4FBE415E.8030702@profihost.ag>

On 05/24/2012 09:10 AM, Stefan Priebe - Profihost AG wrote:
> Hi list,
>
> today while testing btrfs i discovered a very poor osd performance using
> kernel 3.4.
>
> Underlying FS is XFS but it is the same with btrfs.
>
> 3.0.30:
> ~# rados -p data bench 10 write -t 16
> Maintaining 16 concurrent writes of 4194304 bytes for at least 10 seconds.
> sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
> 0 0 0 0 0 0 - 0
> 1 16 41 25 99.9767 100 0.586984 0.447293
> 2 16 71 55 109.979 120 0.934388 0.488375
> 3 16 99 83 110.647 112 1.15982 0.503111
> 4 16 130 114 113.981 124 1.05952 0.516925
> 5 16 159 143 114.382 116 0.149313 0.510734
> 6 16 188 172 114.649 116 0.287166 0.52203
> 7 16 215 199 113.697 108 0.151784 0.531461
> 8 16 242 226 112.984 108 0.623478 0.539896
> 9 16 265 249 110.651 92 0.50354 0.538504
> 10 16 296 280 111.984 124 0.155048 0.542846
> Total time run: 10.776153
> Total writes made: 297
> Write size: 4194304
> Bandwidth (MB/sec): 110.243
>
> Average Latency: 0.577534
> Max latency: 1.85499
> Min latency: 0.091473
>
>
> 3.4:
> ~# rados -p data bench 10 write -t 16
> Maintaining 16 concurrent writes of 4194304 bytes for at least 10 seconds.
> sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
> 0 0 0 0 0 0 - 0
> 1 16 40 24 95.9794 96 0.393196 0.455936
> 2 16 68 52 103.983 112 0.835652 0.517297
> 3 16 85 69 91.9849 68 1.00535 0.493058
> 4 16 96 80 79.9869 44 0.096564 0.577948
> 5 16 103 87 69.5879 28 0.092722 0.589147
> 6 16 117 101 67.3216 56 0.222175 0.675334
> 7 16 130 114 65.1321 52 0.15677 0.623806
> 8 16 144 128 63.9896 56 0.089157 0.56746
> 9 16 144 128 56.8794 0 - 0.56746
> 10 16 144 128 51.1912 0 - 0.56746
> 11 16 144 128 46.5373 0 - 0.56746
> 12 16 144 128 42.6591 0 - 0.56746
> 13 16 144 128 39.3776 0 - 0.56746
> 14 16 144 128 36.5649 0 - 0.56746
> 15 16 144 128 34.1272 0 - 0.56746
> 16 16 145 129 32.2443 0.5 11.3422 0.650985
> Total time run: 16.193871
> Total writes made: 145
> Write size: 4194304
> Bandwidth (MB/sec): 35.816
>
> Average Latency: 1.78467
> Max latency: 14.4744
> Min latency: 0.088753
>
> Stefan

I set up some tests today to try to replicate your findings (and also
to check the results against some previous ones I've done). I don't
think I'm seeing exactly the same results as you, but I definitely see
xfs performing worse than btrfs in this specific test. I've included the
results here.

Distro: Ubuntu Oneiric (i.e. no syncfs in glibc)
Ceph: 0.47.2
Kernel 3.4.0-ceph (autobuild-ceph@gitbuilder-kernel-amd64)
Network: 10GbE
1 Client node
3 Mon nodes
2 OSD nodes with 1 OSD each, mounted on a 7200rpm SAS drive. H700 RAID
controller with each drive in a single-disk RAID0. Journals are on
partitions of a separate drive. OSD data disks are using WT (write-through)
cache while journals are using WB (write-back).
btrfs created with -l 64k -n64k, mounted using noatime.
xfs created with -f -d su=64k,sw=1 -i size=2048, mounted using noatime.
rados bench invocation: rados -p data bench 300 write -t 16 -b 4194304
btrfs:
Total time run: 300.413696
Total writes made: 7582
Write size: 4194304
Bandwidth (MB/sec): 100.954
Average Latency: 0.633932
Max latency: 3.78661
Min latency: 0.065734
xfs:
Total time run: 304.435966
Total writes made: 5023
Write size: 4194304
Bandwidth (MB/sec): 65.997
Average Latency: 0.96965
Max latency: 36.4993
Min latency: 0.07516
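
As a sanity check on the two summaries above, the reported bandwidth can be
recomputed from the total writes, the write size, and the total run time.
The following is a minimal Python sketch, not part of rados itself; the
function name bench_bandwidth is hypothetical, and the assumption that
"MB/sec" means 2**20 bytes per second is inferred from the numbers above
rather than taken from Ceph documentation.

```python
MIB = 1024 * 1024  # assumption: rados bench "MB" is 2**20 bytes


def bench_bandwidth(total_writes, write_size_bytes, total_seconds):
    """Return average write bandwidth in MB/sec for a rados bench run."""
    return total_writes * write_size_bytes / total_seconds / MIB


# Figures copied from the btrfs and xfs summaries above.
btrfs = bench_bandwidth(7582, 4194304, 300.413696)
xfs = bench_bandwidth(5023, 4194304, 304.435966)

print(f"btrfs: {btrfs:.3f} MB/sec")
print(f"xfs:   {xfs:.3f} MB/sec")
print(f"xfs / btrfs: {xfs / btrfs:.2f}")
```

Both recomputed values agree with the "Bandwidth (MB/sec)" lines to within
rounding, so the difference between the two runs comes from the number of
writes completed in roughly the same wall-clock time.
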
Full results are available here:
http://nhm.ceph.com/results/mailinglist-tests/
I created seekwatcher movies by running blktrace on the underlying OSD
data disks during the tests. These show throughput over time,
seeks/sec, and visual representation of where the disk is being written
to for each OSD. You can see them here:
http://nhm.ceph.com/movies/mailinglist-tests/
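
To make the seeks/sec figure more concrete, here is a simplified Python
sketch of one way to count seeks in a sequence of block requests. It is an
illustration only and is not how seekwatcher or blktrace compute their
numbers; the function name count_seeks and the (start_sector, num_sectors)
request format are assumptions made for this example.

```python
def count_seeks(requests):
    """Count requests that do not start where the previous one ended.

    requests: iterable of (start_sector, num_sectors) in dispatch order.
    A request is treated as a seek when its start sector differs from the
    end of the previous request (a deliberately simple definition).
    """
    seeks = 0
    expected_next = None
    for start, length in requests:
        if expected_next is not None and start != expected_next:
            seeks += 1
        expected_next = start + length
    return seeks


# Hypothetical request streams, not taken from the traces above.
sequential = [(0, 8), (8, 8), (16, 8), (24, 8)]
scattered = [(0, 8), (4096, 8), (8, 8), (8192, 8)]

print("sequential seeks:", count_seeks(sequential))
print("scattered seeks: ", count_seeks(scattered))
```

Under this definition a purely sequential stream produces no seeks, while
a stream that alternates between distant regions of the disk produces one
seek per request after the first.
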

As you can see, at least for the quick tests I did this afternoon, the
performance of the underlying OSD disk is highly correlated with the
number of seeks being done. These results may improve with syncfs
support in Ubuntu 12.04. If you have your journals on the same disks as
the OSDs, that will cause even more seeks (in addition to the greater
throughput demands). These are things that we are actively investigating
and hopefully will be able to improve over the coming months.

Thanks,
Mark