* Re: RBD speed vs threads
2012-06-15 12:03 ` Mark Nelson
@ 2012-06-15 16:33 ` Sage Weil
2012-06-15 20:29 ` Stefan Priebe
2012-06-15 20:28 ` Stefan Priebe
1 sibling, 1 reply; 5+ messages in thread
From: Sage Weil @ 2012-06-15 16:33 UTC (permalink / raw)
To: Mark Nelson; +Cc: Stefan Priebe - Profihost AG, ceph-devel@vger.kernel.org
On Fri, 15 Jun 2012, Mark Nelson wrote:
> On 06/15/2012 12:56 AM, Stefan Priebe - Profihost AG wrote:
> > Hello list,
> >
> > I still don't understand why the speed of rados bench depends so
> > heavily on the number of threads.
> >
> > Right now I get around 100MB/s per thread. So 1 thread gives 100MB/s, 4
> > threads 400MB/s, and 16 threads result in about 1100MB/s.
> >
> > So 1100MB/s is great, but I still don't get why 1 thread gets "only"
> > 100MB/s.
The one other thing worth mentioning here is that "thread" is really a
misnomer. Rados bench is actually dispatching its IO asynchronously from
a single thread, and the -t option really controls the number of IOs in
flight. That is more or less what you would get with N threads each
doing a single synchronous IO at a time, which is why the option is
called that.
sage
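
A minimal sketch of the "one thread, N IOs in flight" pattern described
above. This is not the rados bench source; it is a simulation, and the
names run_bench, one_write and op_latency are made up for illustration.
The sleep stands in for the round trip to the OSD, and the semaphore
plays the role of the -t limit.

```python
import asyncio


async def run_bench(total_ops, depth, op_latency=0.001):
    """Issue total_ops fake writes from one thread, at most `depth` at once."""
    slots = asyncio.Semaphore(depth)   # plays the role of "-t depth"
    in_flight = 0
    max_in_flight = 0
    completed = 0

    async def one_write():
        nonlocal in_flight, max_in_flight, completed
        try:
            in_flight += 1
            max_in_flight = max(max_in_flight, in_flight)
            await asyncio.sleep(op_latency)   # stand-in for the OSD round trip
            in_flight -= 1
            completed += 1
        finally:
            slots.release()                   # free the slot for the next IO

    tasks = []
    for _ in range(total_ops):
        await slots.acquire()                 # wait until a slot is free
        tasks.append(asyncio.create_task(one_write()))
    await asyncio.gather(*tasks)
    return completed, max_in_flight


completed, peak = asyncio.run(run_bench(total_ops=20, depth=4))
print(completed, peak)
```

The dispatcher never blocks on an individual write, only on the slot
count, so a single thread is enough to keep `depth` writes outstanding.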
> >
> > Total time run: 30.037374
> > Total writes made: 8326
> > Write size: 4194304
> > Bandwidth (MB/sec): 1108.752
> >
> > Stddev Bandwidth: 47.5612
> > Max bandwidth (MB/sec): 1152
> > Min bandwidth (MB/sec): 948
> > Average Latency: 0.0577107
> > Stddev Latency: 0.020784
> > Max latency: 0.382413
> > Min latency: 0.026057
> >
> > Stefan
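
The numbers quoted above can be cross-checked with Little's Law
(throughput = IOs in flight / average latency). The sketch below assumes
the output came from the 16-in-flight run and that rados bench counts
1 MB as 1048576 bytes; the function name is made up for illustration.

```python
def expected_bandwidth_mb_s(in_flight, write_size_bytes, avg_latency_s):
    """Little's Law: throughput = concurrency * size / latency."""
    mb_per_op = write_size_bytes / (1024 * 1024)   # assumes 1 MB = 1048576 bytes
    return in_flight * mb_per_op / avg_latency_s


# Values taken from the benchmark output above; 16 in flight is an assumption.
estimate = expected_bandwidth_mb_s(16, 4194304, 0.0577107)
print(round(estimate, 1))
```

The estimate lands within about 1 MB/s of the reported 1108.752 MB/sec.
By the same arithmetic, 100MB/s with a single 4 MB write in flight
corresponds to a latency of roughly 0.04 s per write, so the single
"thread" result is a latency figure rather than a bandwidth limit.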
>
> Hi Stefan,
>
> Let me preface this by saying that I haven't specifically read through the
> rados bench code. Having said that, the basic idea here is that you have a
> pipeline where a request is sent from the client to an OSD. If you specify
> "-t 1", the client will only send a single request at a time, which means that
> the entire process is serial and you are entirely latency bound. Now think
> about what happens when the client sends the request. Before the client gets an
> acknowledgement, the request must:
>
> 1) Go through client side processing.
> 2) Travel over the IP network to the destination OSD.
> 3) Go through all of the queue processing code on the OSD.
> 4a) Write the data to the journal (Or the faster of the journal/data disk when
> using btrfs. Note: The journal writes may stall if the data disk is too slow
> and the journal has gotten sufficiently ahead of it)
> 4b) Complete replication to other OSDs based on the pool's replication level
> and the placement group the data gets put in. (basically steps 1,2,3,4a and 5
> all over again with the OSD as the client).
> 5) Send the Ack back to the client over the IP network
>
> If only one request is sent at a time, most of the hardware will sit idle
> while the request is making its way through the pipeline. If you have
> multiple concurrent requests, the OSD(s) can better utilize all of the
> hardware (i.e. some requests can be coming in over the network, while others can
> be writing to disk, while others can be replicating).
>
> You can probably imagine that once you have multiple OSDs on multiple nodes,
> having concurrent requests in flight helps you even more.
>
> Mark
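
The pipeline argument above can be sketched as a toy model. Everything
here is an assumption made for illustration: each stage handles one
request at a time, the per-stage costs are invented, and replication
(step 4b) is folded into a single stage rather than modelled as a nested
pipeline. It is not a model of the real OSD code.

```python
def total_time(stage_ms, depth, n_requests):
    """Time to finish n_requests (>= 1) with at most `depth` in flight."""
    stage_free = [0] * len(stage_ms)   # when each stage is next idle
    done = []                          # completion time of each request
    for i in range(n_requests):
        # Closed loop: request i is issued once request i - depth completes.
        t = done[i - depth] if i >= depth else 0
        for s, cost in enumerate(stage_ms):
            start = max(t, stage_free[s])   # wait if the stage is busy
            t = start + cost
            stage_free[s] = t
        done.append(t)
    return done[-1]


# Hypothetical costs in ms: client, network, OSD queue, journal, ack.
stages = [1, 2, 3, 2, 1]
print(total_time(stages, depth=1, n_requests=10))   # serial, latency bound
print(total_time(stages, depth=4, n_requests=10))   # stages overlap
```

With one request in flight, every request pays the full sum of the stage
costs. With several in flight, the total approaches the cost of the
slowest stage per request, which is the overlap described above.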
* Re: RBD speed vs threads
2012-06-15 12:03 ` Mark Nelson
2012-06-15 16:33 ` Sage Weil
@ 2012-06-15 20:28 ` Stefan Priebe
1 sibling, 0 replies; 5+ messages in thread
From: Stefan Priebe @ 2012-06-15 20:28 UTC (permalink / raw)
To: Mark Nelson; +Cc: ceph-devel@vger.kernel.org
On 15.06.2012 14:03, Mark Nelson wrote:
> On 06/15/2012 12:56 AM, Stefan Priebe - Profihost AG wrote:
> Let me preface this by saying that I haven't specifically read through
> the rados bench code. Having said that, the basic idea here is that you
> have a pipeline where a request is sent from the client to an OSD. If
> you specify "-t 1", the client will only send a single request at a
> time, which means that the entire process is serial and you are entirely
> latency bound. Now think about what happens when the client sends the
> request. Before the client gets an acknowledgement, the request must:
>
> 1) Go through client side processing.
> 2) Travel over the IP network to the destination OSD.
> 3) Go through all of the queue processing code on the OSD.
> 4a) Write the data to the journal (Or the faster of the journal/data
> disk when using btrfs. Note: The journal writes may stall if the data
> disk is too slow and the journal has gotten sufficiently ahead of it)
> 4b) Complete replication to other OSDs based on the pool's replication
> level and the placement group the data gets put in. (basically steps
> 1,2,3,4a and 5 all over again with the OSD as the client).
> 5) Send the Ack back to the client over the IP network
>
> If only one request is sent at a time, most of the hardware will sit
> idle while the request is making its way through the pipeline. If you
> have multiple concurrent requests, the OSD(s) can better utilize all of
> the hardware (i.e. some requests can be coming in over the network, while
> others can be writing to disk, while others can be replicating).
>
> You can probably imagine that once you have multiple OSDs on multiple
> nodes, having concurrent requests in flight helps you even more.
Thanks for your explanation.
Stefan