From: Mark Nelson <mark.nelson@inktank.com>
To: Stefan Priebe - Profihost AG <s.priebe@profihost.ag>
Cc: "ceph-devel@vger.kernel.org" <ceph-devel@vger.kernel.org>
Subject: Re: RBD speed vs threads
Date: Fri, 15 Jun 2012 07:03:44 -0500
Message-ID: <4FDB24A0.8000401@inktank.com>
In-Reply-To: <4FDACEAA.8080306@profihost.ag>
On 06/15/2012 12:56 AM, Stefan Priebe - Profihost AG wrote:
> Hello list,
>
> i still don't understand why the speed of the rados bench depends so
> heavily on the threads.
>
> Right now i get around 100MB/s per thread. So 1 thread is 100MB/s, 4
> Threads 400MB/s and 16 threads results an about 1100MB/s.
>
> So 1100MB/s is great but i still don't get why 1 thread gets "only"
> 100MB/s.
>
> Total time run: 30.037374
> Total writes made: 8326
> Write size: 4194304
> Bandwidth (MB/sec): 1108.752
>
> Stddev Bandwidth: 47.5612
> Max bandwidth (MB/sec): 1152
> Min bandwidth (MB/sec): 948
> Average Latency: 0.0577107
> Stddev Latency: 0.020784
> Max latency: 0.382413
> Min latency: 0.026057
>
> Stefan
Hi Stefan,
Let me preface this by saying that I haven't specifically read through
the rados bench code. Having said that, the basic idea here is that you
have a pipeline where a request is sent from the client to an OSD. If
you specify "-t 1", the client will only send a single request at a
time, which means that the entire process is serial and you are entirely
latency bound. Now think about what happens when the client sends the
request. Before the client gets an acknowledgement, the request must:
1) Go through client side processing.
2) Travel over the IP network to the destination OSD.
3) Go through all of the queue processing code on the OSD.
4a) Write the data to the journal (or to whichever of the journal and
data disk finishes first when using btrfs). Note: journal writes may
stall if the data disk is too slow and the journal has gotten
sufficiently ahead of it.
4b) Complete replication to other OSDs based on the pool's replication
level and the placement group the data gets put in (basically steps
1, 2, 3, 4a and 5 all over again, with the OSD as the client).
5) Send the ack back to the client over the IP network.
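As a rough check on the latency-bound argument, here is a minimal Python sketch. It is not taken from the rados bench code, and the function names are hypothetical. It assumes that with "-t 1" the next write is only issued once the previous ack has arrived, so bandwidth is simply the write size divided by the round-trip time through the steps above. The 100 MB/s figure is Stefan's approximate number, and "MB" is taken as 2^20 bytes, which is consistent with the totals in his quoted output.

```python
MIB = 1024 * 1024  # rados bench "MB" assumed to be 2**20 bytes


def serial_bandwidth_mb_s(write_size_bytes, round_trip_s):
    """Bandwidth when exactly one request is in flight at a time."""
    return write_size_bytes / MIB / round_trip_s


def implied_round_trip_s(write_size_bytes, bandwidth_mb_s):
    """Round-trip time implied by an observed single-request bandwidth."""
    return write_size_bytes / MIB / bandwidth_mb_s


write_size = 4194304  # "Write size" from the quoted output
latency = implied_round_trip_s(write_size, 100.0)  # ~100 MB/s at one thread
print(f"implied round trip: {latency * 1000:.0f} ms")
```

Under these assumptions a single 4 MB write takes about 40 ms to pass through all of the steps, and no amount of extra disk or network bandwidth changes the result unless that round trip gets shorter.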
If only one request is sent at a time, most of the hardware will sit
idle while the request is making its way through the pipeline. If you
have multiple concurrent requests, the OSD(s) can better utilize all of
the hardware (i.e. some requests can be coming in over the network while
others are writing to disk and others are replicating).
You can probably imagine that once you have multiple OSDs on multiple
nodes, having concurrent requests in flight helps you even more.
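The effect of concurrency can be sanity-checked against the numbers in the quoted output. The sketch below applies Little's law (completions per second = requests in flight / average latency); the function name is hypothetical, and it assumes the quoted run used 16 concurrent requests, which the original message suggests but the output itself does not state.

```python
MIB = 1024 * 1024  # rados bench "MB" assumed to be 2**20 bytes


def estimated_bandwidth_mb_s(in_flight, write_size_bytes, avg_latency_s):
    """Little's law: ops/s = requests in flight / average latency."""
    ops_per_s = in_flight / avg_latency_s
    return ops_per_s * write_size_bytes / MIB


# Write size and average latency come from the quoted rados bench output;
# in_flight=16 is an assumption based on the "16 threads" remark.
estimate = estimated_bandwidth_mb_s(16, 4194304, 0.0577107)
reported = 1108.752
print(f"estimated {estimate:.0f} MB/s, reported {reported:.0f} MB/s")
```

The estimate comes out at roughly 1109 MB/s, in line with the reported bandwidth. It also hints at why 16 requests do not give 16 times the single-request rate: the average latency here is about 58 ms, while 100 MB/s with one 4 MB request in flight implies about 40 ms, so each request waits longer once the pipeline is busy.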
Mark