Re: rbd_cache, limiting read on high iops around 40k


From: Mark Nelson <mnelson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
To: Alexandre DERUMIER
	<aderumier-U/x3PoR4x10AvxtiuMwx3w@public.gmane.org>,
	pushpesh sharma
	<pushpesh.eck-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Cc: ceph-devel <ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	ceph-users <ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org>
Subject: Re: rbd_cache, limiting read on high iops around 40k
Date: Tue, 09 Jun 2015 06:36:31 -0500
Message-ID: <5576CFBF.1070405@redhat.com>
In-Reply-To: <1897614581.1694878.1433838989184.JavaMail.zimbra-M8QNeUgB6UTyG1zEObXtfA@public.gmane.org>

Hi All,

In the past we've hit some performance issues with RBD cache that we've
fixed, but we've never really tried pushing a single VM beyond 40K read
IOPS in testing (or at least I never have).  I suspect there are a couple
of possibilities as to why it might be slower, but perhaps joshd can
chime in, as he's more familiar with what that code looks like.

Frankly, I'm a little impressed that without RBD cache we can hit 80K 
IOPS from 1 VM!  How fast are the SSDs in those 3 OSDs?

Mark

On 06/09/2015 03:36 AM, Alexandre DERUMIER wrote:
> It seems that the limit mainly appears at high queue depths (roughly > 16).
>
> Here are the results in IOPS with 1 client, 4k randread, 3 OSDs, at different queue depths.
> With queue depth < 16, rbd_cache gives almost the same results as no cache.
>
>
> cache
> -----
> qd1: 1651
> qd2: 3482
> qd4: 7958
> qd8: 17912
> qd16: 36020
> qd32: 42765
> qd64: 46169
>
> no cache
> --------
> qd1: 1748
> qd2: 3570
> qd4: 8356
> qd8: 17732
> qd16: 41396
> qd32: 78633
> qd64: 79063
> qd128: 79550
>
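The two tables above can be compared per queue depth. The following is a minimal Python sketch that uses only the figures quoted in the message (the variable names are illustrative) and computes the ratio of cached to uncached IOPS:

```python
# IOPS per queue depth, copied from the tables above
# (1 client, 4k randread, 3 OSDs).
cache_iops = {1: 1651, 2: 3482, 4: 7958, 8: 17912, 16: 36020, 32: 42765, 64: 46169}
nocache_iops = {1: 1748, 2: 3570, 4: 8356, 8: 17732, 16: 41396,
                32: 78633, 64: 79063, 128: 79550}

# Ratio of cached to uncached throughput, for every queue depth
# that was measured in both runs (qd128 has no cached result).
ratios = {
    qd: cache_iops[qd] / nocache_iops[qd]
    for qd in sorted(cache_iops)
    if qd in nocache_iops
}

for qd, ratio in ratios.items():
    print(f"qd{qd}: cache/no-cache = {ratio:.2f}")
```

With these numbers the ratio stays above 0.94 up to qd8, drops to about 0.87 at qd16, and is about 0.54 at qd32 and 0.58 at qd64.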
>
> ----- Original Message -----
> From: "aderumier" <aderumier@odiso.com>
> To: "pushpesh sharma" <pushpesh.eck@gmail.com>
> Cc: "ceph-devel" <ceph-devel@vger.kernel.org>, "ceph-users" <ceph-users@lists.ceph.com>
> Sent: Tuesday, 9 June 2015 09:28:21
> Subject: Re: [ceph-users] rbd_cache, limiting read on high iops around 40k
>
> Hi,
>
>>> We tried adding more RBDs to single VM, but no luck.
>
> If you want to scale with more disks in a single QEMU VM, you need to use QEMU's iothread feature and assign one iothread per disk (this works with virtio-blk).
> It works for me: I can scale by adding more disks.
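The one-iothread-per-disk layout described above can be sketched as a small Python helper that builds the corresponding QEMU command-line arguments. This is only an illustration: the helper name, pool name, and image names are hypothetical, and the message does not show the actual VM configuration that was used.

```python
def iothread_args(disks):
    """Build QEMU arguments that give each virtio-blk disk its own iothread.

    `disks` is a list of (pool, image) pairs; the names used below are
    placeholders, not taken from the thread.
    """
    args = []
    for i, (pool, image) in enumerate(disks):
        args += [
            # One dedicated iothread object per disk.
            "-object", f"iothread,id=iothread{i}",
            # The RBD image as a backend drive, not attached to a bus yet.
            "-drive", f"file=rbd:{pool}/{image},format=raw,if=none,id=drive{i}",
            # A virtio-blk device bound to the drive and to its own iothread.
            "-device", f"virtio-blk-pci,drive=drive{i},iothread=iothread{i}",
        ]
    return args


example = iothread_args([("rbd", "vm-disk-0"), ("rbd", "vm-disk-1")])
print(" ".join(example))
```

Each disk gets a distinct `iothread` id, so I/O for different disks is not serialized on a single thread.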
>
>
> My benchmarks here are run with fio-rbd on the host.
> I can scale up to 400k IOPS with 10 clients and rbd_cache=off on a single host, and to around 250k IOPS with 10 clients and rbd_cache=on.
>
>
> I just wonder why I don't see a performance decrease around 30k IOPS with 1 OSD.
>
> I'm going to check whether this tracker issue
> http://tracker.ceph.com/issues/11056
>
> could be the cause.
>
> (My master build was done some weeks ago.)
>
>
>
> ----- Original Message -----
> From: "pushpesh sharma" <pushpesh.eck@gmail.com>
> To: "aderumier" <aderumier@odiso.com>
> Cc: "ceph-devel" <ceph-devel@vger.kernel.org>, "ceph-users" <ceph-users@lists.ceph.com>
> Sent: Tuesday, 9 June 2015 09:21:04
> Subject: Re: rbd_cache, limiting read on high iops around 40k
>
> Hi Alexandre,
>
> We have also seen something very similar on Hammer (0.94-1). We were benchmarking VMs hosted on a hypervisor (QEMU-KVM, openstack-juno). Each Ubuntu VM has one RBD as its root disk and one RBD as additional storage. For some strange reason we could not scale 4K random-read (RR) IOPS on each VM beyond 35-40k. We tried adding more RBDs to a single VM, but no luck. However, increasing the number of VMs to 4 on a single hypervisor did scale to some extent. Beyond that, we got little benefit from adding more VMs.
>
> Here is the trend we have seen; the x-axis is the number of hypervisors, each hypervisor has 4 VMs, and each VM has 1 RBD:
>
> VDbench was used as the benchmarking tool. We were not saturating the network or the CPUs on the OSD nodes. We were not able to saturate the CPUs on the hypervisors either, and that is where we suspected some throttling effect. However, we haven't set any such limits on the nova or KVM side. We tried some CPU pinning and other KVM-related tuning as well, but no luck.
>
> We tried the same experiment on bare metal. There, 4K RR IOPS scaled from 40K (1 RBD) to 180K (4 RBDs). But beyond that point, rather than scaling further, the numbers actually degraded. (Single pipe, more congestion effect.)
>
> We never suspected that enabling rbd cache could be detrimental to performance. It would be nice to root-cause the problem if that is the case.
>
> On Tue, Jun 9, 2015 at 11:21 AM, Alexandre DERUMIER < aderumier@odiso.com > wrote:
>
>
> Hi,
>
> I'm running a benchmark (Ceph master branch) with 4k randread at qdepth=32,
> and rbd_cache=true seems to limit IOPS to around 40k.
>
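The exact fio job file is not included in the message. As a rough sketch, the Python helper below renders a job file for fio's `rbd` ioengine with the parameters that are visible in the output further down (randread, 4k blocks, iodepth 32, job name `rbd_iodepth32-test`). The pool name, image name, and client name are assumptions, and `rbd_cache` itself is a librbd client option that is normally set in `ceph.conf` rather than in the fio job.

```python
def render_fio_job(pool, image, client="admin", iodepth=32):
    """Render a fio job file for the rbd ioengine.

    pool, image and client are placeholders; the thread does not state
    which values were actually used.
    """
    lines = [
        "[global]",
        "ioengine=rbd",            # use librbd directly, no kernel block device
        f"clientname={client}",    # Ceph client name, without the 'client.' prefix
        f"pool={pool}",
        f"rbdname={image}",
        "rw=randread",
        "bs=4k",
        "",
        f"[rbd_iodepth{iodepth}-test]",
        f"iodepth={iodepth}",
    ]
    return "\n".join(lines) + "\n"


job_text = render_fio_job("rbd", "fio_test")
print(job_text)
```

Writing `job_text` to a file and passing that file to `fio` would run one job at the chosen queue depth; switching the cache on or off would then be done on the Ceph client side between runs.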
>
> no cache
> --------
> 1 client - rbd_cache=false - 1osd : 38300 iops
> 1 client - rbd_cache=false - 2osd : 69073 iops
> 1 client - rbd_cache=false - 3osd : 78292 iops
>
>
> cache
> -----
> 1 client - rbd_cache=true - 1osd : 38100 iops
> 1 client - rbd_cache=true - 2osd : 42457 iops
> 1 client - rbd_cache=true - 3osd : 45823 iops
>
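To make the scaling difference in the two tables above explicit, this short Python sketch (names are illustrative; the figures are the ones quoted above) computes the speedup relative to the 1-OSD result for each configuration:

```python
# IOPS by number of OSDs, copied from the tables above (1 client, qdepth=32).
nocache = {1: 38300, 2: 69073, 3: 78292}
cache = {1: 38100, 2: 42457, 3: 45823}


def speedup(results):
    """Return throughput relative to the single-OSD result."""
    base = results[1]
    return {osds: iops / base for osds, iops in results.items()}


nocache_speedup = speedup(nocache)
cache_speedup = speedup(cache)

for osds in sorted(nocache):
    print(f"{osds} osd: no cache x{nocache_speedup[osds]:.2f}, "
          f"cache x{cache_speedup[osds]:.2f}")
```

Going from 1 to 3 OSDs roughly doubles throughput without the cache (about 2.04x) but adds only about 20% with the cache enabled (about 1.20x).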
>
>
> Is this expected?
>
>
>
> fio result rbd_cache=false 3 osd
> --------------------------------
> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=32
> fio-2.1.11
> Starting 1 process
> rbd engine: RBD version: 0.1.9
> Jobs: 1 (f=1): [r(1)] [100.0% done] [307.5MB/0KB/0KB /s] [78.8K/0/0 iops] [eta 00m:00s]
> rbd_iodepth32-test: (groupid=0, jobs=1): err= 0: pid=113548: Tue Jun 9 07:48:42 2015
> read : io=10000MB, bw=313169KB/s, iops=78292, runt= 32698msec
> slat (usec): min=5, max=530, avg=11.77, stdev= 6.77
> clat (usec): min=70, max=2240, avg=336.08, stdev=94.82
> lat (usec): min=101, max=2247, avg=347.84, stdev=95.49
> clat percentiles (usec):
> | 1.00th=[ 173], 5.00th=[ 209], 10.00th=[ 231], 20.00th=[ 262],
> | 30.00th=[ 282], 40.00th=[ 302], 50.00th=[ 322], 60.00th=[ 346],
> | 70.00th=[ 370], 80.00th=[ 402], 90.00th=[ 454], 95.00th=[ 506],
> | 99.00th=[ 628], 99.50th=[ 692], 99.90th=[ 860], 99.95th=[ 948],
> | 99.99th=[ 1176]
> bw (KB /s): min=238856, max=360448, per=100.00%, avg=313402.34, stdev=25196.21
> lat (usec) : 100=0.01%, 250=15.94%, 500=78.60%, 750=5.19%, 1000=0.23%
> lat (msec) : 2=0.03%, 4=0.01%
> cpu : usr=74.48%, sys=13.25%, ctx=703225, majf=0, minf=12452
> IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.8%, 16=87.0%, 32=12.1%, >=64=0.0%
> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> complete : 0=0.0%, 4=91.6%, 8=3.4%, 16=4.5%, 32=0.4%, 64=0.0%, >=64=0.0%
> issued : total=r=2560000/w=0/d=0, short=r=0/w=0/d=0
> latency : target=0, window=0, percentile=100.00%, depth=32
>
> Run status group 0 (all jobs):
> READ: io=10000MB, aggrb=313169KB/s, minb=313169KB/s, maxb=313169KB/s, mint=32698msec, maxt=32698msec
>
> Disk stats (read/write):
> dm-0: ios=0/45, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=0/24, aggrmerge=0/21, aggrticks=0/0, aggrin_queue=0, aggrutil=0.00%
> sda: ios=0/24, merge=0/21, ticks=0/0, in_queue=0, util=0.00%
>
>
>
>
> fio result rbd_cache=true 3osd
> ------------------------------
>
> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=32
> fio-2.1.11
> Starting 1 process
> rbd engine: RBD version: 0.1.9
> Jobs: 1 (f=1): [r(1)] [100.0% done] [171.6MB/0KB/0KB /s] [43.1K/0/0 iops] [eta 00m:00s]
> rbd_iodepth32-test: (groupid=0, jobs=1): err= 0: pid=113389: Tue Jun 9 07:47:30 2015
> read : io=10000MB, bw=183296KB/s, iops=45823, runt= 55866msec
> slat (usec): min=7, max=805, avg=21.26, stdev=15.84
> clat (usec): min=101, max=4602, avg=478.55, stdev=143.73
> lat (usec): min=123, max=4669, avg=499.80, stdev=146.03
> clat percentiles (usec):
> | 1.00th=[ 227], 5.00th=[ 274], 10.00th=[ 306], 20.00th=[ 350],
> | 30.00th=[ 390], 40.00th=[ 430], 50.00th=[ 470], 60.00th=[ 506],
> | 70.00th=[ 548], 80.00th=[ 596], 90.00th=[ 660], 95.00th=[ 724],
> | 99.00th=[ 844], 99.50th=[ 908], 99.90th=[ 1112], 99.95th=[ 1288],
> | 99.99th=[ 2192]
> bw (KB /s): min=115280, max=204416, per=100.00%, avg=183315.10, stdev=15079.93
> lat (usec) : 250=2.42%, 500=55.61%, 750=38.48%, 1000=3.28%
> lat (msec) : 2=0.19%, 4=0.01%, 10=0.01%
> cpu : usr=60.27%, sys=12.01%, ctx=2995393, majf=0, minf=14100
> IO depths : 1=0.1%, 2=0.1%, 4=0.2%, 8=13.5%, 16=81.0%, 32=5.3%, >=64=0.0%
> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> complete : 0=0.0%, 4=95.0%, 8=0.1%, 16=1.0%, 32=4.0%, 64=0.0%, >=64=0.0%
> issued : total=r=2560000/w=0/d=0, short=r=0/w=0/d=0
> latency : target=0, window=0, percentile=100.00%, depth=32
>
> Run status group 0 (all jobs):
> READ: io=10000MB, aggrb=183295KB/s, minb=183295KB/s, maxb=183295KB/s, mint=55866msec, maxt=55866msec
>
> Disk stats (read/write):
> dm-0: ios=0/61, merge=0/0, ticks=0/8, in_queue=8, util=0.01%, aggrios=0/29, aggrmerge=0/32, aggrticks=0/8, aggrin_queue=8, aggrutil=0.01%
> sda: ios=0/29, merge=0/32, ticks=0/8, in_queue=8, util=0.01%
>
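The two fio summaries above can be cross-checked with simple arithmetic. The sketch below (illustrative names; the numbers are copied from the `read :` and `lat (usec)` lines) applies Little's law, mean requests in flight = IOPS × mean latency, and also recomputes bandwidth as IOPS × 4 KB:

```python
# Figures copied from the two fio runs above (3 OSDs, iodepth=32).
runs = {
    "rbd_cache=false": {"iops": 78292, "lat_usec": 347.84, "bw_kb": 313169},
    "rbd_cache=true": {"iops": 45823, "lat_usec": 499.80, "bw_kb": 183296},
}

BLOCK_KB = 4  # bs=4K in both runs


def in_flight(iops, lat_usec):
    """Little's law: mean outstanding requests = throughput x mean latency."""
    return iops * lat_usec / 1_000_000


outstanding = {name: in_flight(r["iops"], r["lat_usec"]) for name, r in runs.items()}
expected_bw = {name: r["iops"] * BLOCK_KB for name, r in runs.items()}

for name, r in runs.items():
    print(f"{name}: ~{outstanding[name]:.1f} requests in flight, "
          f"{expected_bw[name]} KB/s expected vs {r['bw_kb']} KB/s reported")
```

This gives about 27.2 requests in flight without the cache and about 22.9 with it, both below the configured iodepth of 32, and the recomputed bandwidth matches the reported values to within a few KB/s.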