From: Mark Nelson <mark.nelson@inktank.com>
To: Stefan Priebe - Profihost AG <s.priebe@profihost.ag>
Cc: "ceph-devel@vger.kernel.org" <ceph-devel@vger.kernel.org>
Subject: Re: poor OSD performance using kernel 3.4
Date: Tue, 29 May 2012 17:25:31 -0500
Message-ID: <4FC54CDB.1000506@inktank.com>
In-Reply-To: <4FBE415E.8030702@profihost.ag>

On 05/24/2012 09:10 AM, Stefan Priebe - Profihost AG wrote:
> Hi list,
>
> today while testing btrfs i discovered a very poor osd performance using
> kernel 3.4.
>
> Underlying FS is XFS but it is the same with btrfs.
>
> 3.0.30:
> ~# rados -p data bench 10 write -t 16
> Maintaining 16 concurrent writes of 4194304 bytes for at least 10 seconds.
>    sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
>      0       0         0         0         0         0         -         0
>      1      16        41        25   99.9767       100  0.586984  0.447293
>      2      16        71        55   109.979       120  0.934388  0.488375
>      3      16        99        83   110.647       112   1.15982  0.503111
>      4      16       130       114   113.981       124   1.05952  0.516925
>      5      16       159       143   114.382       116  0.149313  0.510734
>      6      16       188       172   114.649       116  0.287166   0.52203
>      7      16       215       199   113.697       108  0.151784  0.531461
>      8      16       242       226   112.984       108  0.623478  0.539896
>      9      16       265       249   110.651        92   0.50354  0.538504
>     10      16       296       280   111.984       124  0.155048  0.542846
> Total time run:        10.776153
> Total writes made:     297
> Write size:            4194304
> Bandwidth (MB/sec):    110.243
>
> Average Latency:       0.577534
> Max latency:           1.85499
> Min latency:           0.091473
>
>
> 3.4:
> ~# rados -p data bench 10 write -t 16
> Maintaining 16 concurrent writes of 4194304 bytes for at least 10 seconds.
>    sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
>      0       0         0         0         0         0         -         0
>      1      16        40        24   95.9794        96  0.393196  0.455936
>      2      16        68        52   103.983       112  0.835652  0.517297
>      3      16        85        69   91.9849        68   1.00535  0.493058
>      4      16        96        80   79.9869        44  0.096564  0.577948
>      5      16       103        87   69.5879        28  0.092722  0.589147
>      6      16       117       101   67.3216        56  0.222175  0.675334
>      7      16       130       114   65.1321        52   0.15677  0.623806
>      8      16       144       128   63.9896        56  0.089157   0.56746
>      9      16       144       128   56.8794         0         -   0.56746
>     10      16       144       128   51.1912         0         -   0.56746
>     11      16       144       128   46.5373         0         -   0.56746
>     12      16       144       128   42.6591         0         -   0.56746
>     13      16       144       128   39.3776         0         -   0.56746
>     14      16       144       128   36.5649         0         -   0.56746
>     15      16       144       128   34.1272         0         -   0.56746
>     16      16       145       129   32.2443       0.5   11.3422  0.650985
> Total time run:        16.193871
> Total writes made:     145
> Write size:            4194304
> Bandwidth (MB/sec):    35.816
>
> Average Latency:       1.78467
> Max latency:           14.4744
> Min latency:           0.088753
>
> Stefan

I set up some tests today to try to replicate your findings (and also 
to check the results against some previous ones I've done).  I don't 
think I'm seeing exactly the same results as you, but I definitely see 
xfs performing worse than btrfs in this specific test.  I've included 
the results here.

Distro: Ubuntu Oneiric (i.e. no syncfs in glibc)
Ceph: 0.47.2
Kernel 3.4.0-ceph (autobuild-ceph@gitbuilder-kernel-amd64)
Network: 10GbE

1 Client node
3 Mon nodes
2 OSD nodes with 1 OSD each, mounted on a 7200rpm SAS drive.  H700 RAID 
controller with each drive in a 1-disk RAID0.  Journals are partitioned 
on a separate drive.  OSD data disks are using write-through (WT) cache 
while journals are using write-back (WB).
btrfs created with -l 64k -n64k, mounted using noatime.
xfs created with -f -d su=64k,sw=1 -i size=2048, mounted using noatime.
rados bench invocation: rados -p data bench 300 write -t 16 -b 4194304

btrfs:

Total time run:        300.413696
Total writes made:     7582
Write size:            4194304
Bandwidth (MB/sec):    100.954

Average Latency:       0.633932
Max latency:           3.78661
Min latency:           0.065734

xfs:

Total time run:        304.435966
Total writes made:     5023
Write size:            4194304
Bandwidth (MB/sec):    65.997

Average Latency:       0.96965
Max latency:           36.4993
Min latency:           0.07516
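
As a sanity check on the two summaries above, the reported bandwidth can 
be recomputed from the total writes, write size, and run time.  This is 
only a sketch: the function name is made up for illustration, and the 
assumption that "MB" means 2^20 bytes is inferred from the fact that the 
numbers above only match under that reading.

```python
# Recompute the "Bandwidth (MB/sec)" line of a rados bench summary.
# Assumption: "MB" here is 2**20 bytes; the figures quoted in this mail
# only reproduce under that interpretation.
MIB = 1024 * 1024


def bandwidth_mb_per_sec(total_writes, write_size_bytes, total_time_sec):
    """Average write bandwidth over the whole run (hypothetical helper)."""
    return total_writes * write_size_bytes / MIB / total_time_sec


# Values copied from the btrfs and xfs summaries in this mail.
btrfs = bandwidth_mb_per_sec(7582, 4194304, 300.413696)
xfs = bandwidth_mb_per_sec(5023, 4194304, 304.435966)

print("btrfs: %.3f MB/sec" % btrfs)
print("xfs:   %.3f MB/sec" % xfs)
print("xfs/btrfs ratio: %.2f" % (xfs / btrfs))
```

With these inputs, xfs comes out at roughly two thirds of the btrfs 
throughput.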

Full results are available here:

http://nhm.ceph.com/results/mailinglist-tests/

I created seekwatcher movies by running blktrace on the underlying OSD 
data disks during the tests.  These show throughput over time, 
seeks/sec, and visual representation of where the disk is being written 
to for each OSD.  You can see them here:

http://nhm.ceph.com/movies/mailinglist-tests/

As you can see, at least for the quick tests I did this afternoon, the 
performance of the underlying OSD disk is highly correlated with the 
number of seeks being done.  These results may improve with syncfs 
support in Ubuntu 12.04.  If you have your journals on the same disks as 
the OSDs, that will cause even more seeks (in addition to the greater 
throughput demands).  These are things that we are actively 
investigating and hopefully will be able to improve over the coming 
months.

Thanks,
Mark

