RBD speed vs threads


* RBD speed vs threads
@ 2012-06-15  5:56 Stefan Priebe - Profihost AG
  2012-06-15 12:03 ` Mark Nelson
  0 siblings, 1 reply; 5+ messages in thread
From: Stefan Priebe - Profihost AG @ 2012-06-15  5:56 UTC
  To: ceph-devel@vger.kernel.org

Hello list,

I still don't understand why the speed of rados bench depends so 
heavily on the number of threads.

Right now I get around 100MB/s per thread: 1 thread gives 100MB/s, 4 
threads 400MB/s, and 16 threads result in about 1100MB/s.

So 1100MB/s is great, but I still don't get why 1 thread gets "only" 100MB/s.

Total time run:         30.037374
Total writes made:      8326
Write size:             4194304
Bandwidth (MB/sec):     1108.752

Stddev Bandwidth:       47.5612
Max bandwidth (MB/sec): 1152
Min bandwidth (MB/sec): 948
Average Latency:        0.0577107
Stddev Latency:         0.020784
Max latency:            0.382413
Min latency:            0.026057
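The output above is internally consistent, and Little's law (mean requests in flight = throughput × mean latency) ties its bandwidth and latency lines together. A quick check in Python, assuming rados bench's "MB" means MiB (2^20 bytes), which the 4194304-byte write size and the reported bandwidth agree with. The 100MB/s single-thread figure is taken from the text above; nothing else is added.

```python
# Figures copied from the rados bench output above.
total_time_s = 30.037374
total_writes = 8326
write_size_bytes = 4194304       # 4 MiB per object write
avg_latency_s = 0.0577107

MIB = 1024 * 1024
write_size_mib = write_size_bytes / MIB          # 4.0

# Bandwidth = data written / elapsed time, in MiB/s.
bandwidth = total_writes * write_size_mib / total_time_s

# Little's law: operations in flight = throughput (ops/s) * mean latency (s).
ops_per_s = total_writes / total_time_s
in_flight = ops_per_s * avg_latency_s

# "~100MB/s per thread" with 4 MiB writes implies roughly this much latency
# per write when only one write is outstanding.
single_op_latency_s = write_size_mib / 100.0

print(f"bandwidth:           {bandwidth:.1f} MB/s")
print(f"implied in flight:   {in_flight:.1f} writes")
print(f"latency at 1 thread: {single_op_latency_s * 1000:.0f} ms")
```

This prints about 1108.8 MB/s, 16.0 writes in flight and 40 ms. The implied ~16 writes in flight matches the 16-thread run, and the ~58 ms average latency under that load is higher than the ~40 ms implied at one thread, which would explain why 16 threads reach about 1100MB/s rather than 16 × 100MB/s.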

Stefan

* Re: RBD speed vs threads
  2012-06-15  5:56 RBD speed vs threads Stefan Priebe - Profihost AG
@ 2012-06-15 12:03 ` Mark Nelson
  2012-06-15 16:33   ` Sage Weil
  2012-06-15 20:28   ` Stefan Priebe
  0 siblings, 2 replies; 5+ messages in thread
From: Mark Nelson @ 2012-06-15 12:03 UTC
  To: Stefan Priebe - Profihost AG; +Cc: ceph-devel@vger.kernel.org

On 06/15/2012 12:56 AM, Stefan Priebe - Profihost AG wrote:
> Hello list,
>
> I still don't understand why the speed of rados bench depends so
> heavily on the number of threads.
>
> Right now I get around 100MB/s per thread: 1 thread gives 100MB/s, 4
> threads 400MB/s, and 16 threads result in about 1100MB/s.
>
> So 1100MB/s is great, but I still don't get why 1 thread gets "only"
> 100MB/s.

Hi Stefan,

Let me preface this by saying that I haven't specifically read through 
the rados bench code.  Having said that, the basic idea here is that you 
have a pipeline where a request is sent from the client to an OSD.  If 
you specify "-t 1", the client will only send a single request at a 
time, which means that the entire process is serial and you are entirely 
latency bound.  Now think about what happens when the client sends the 
request.  Before the client gets an acknowledgement, the request must:

1) Go through client side processing.
2) Travel over the IP network to the destination OSD.
3) Go through all of the queue processing code on the OSD.
4a) Write the data to the journal (or the faster of the journal/data 
disk when using btrfs.  Note: the journal writes may stall if the data 
disk is too slow and the journal has gotten sufficiently ahead of it).
4b) Complete replication to other OSDs based on the pool's replication 
level and the placement group the data gets put in (basically steps 
1, 2, 3, 4a and 5 all over again, with the OSD as the client).
5) Send the ack back to the client over the IP network.
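With "-t 1" the client cannot issue the next write until the ack for the current one returns, so throughput is simply write size divided by the end-to-end latency of the steps above. A minimal Python sketch, assuming (from the 4a/4b numbering) that the journal write and replication overlap and the ack waits for whichever finishes last. The function names and every stage time are hypothetical placeholders, picked only so that the total is 40 ms, roughly what ~100MB/s per thread implies for 4 MiB writes; they are not measurements.

```python
def request_latency(client_s, network_s, osd_queue_s, journal_s,
                    replication_s, ack_s):
    """End-to-end latency of one write through steps 1-5.

    Assumption: steps 4a (journal write) and 4b (replication) run in
    parallel, so the ack (step 5) waits for the slower of the two.
    """
    return (client_s + network_s + osd_queue_s
            + max(journal_s, replication_s) + ack_s)


def serial_throughput_mib_s(write_mib, latency_s):
    """With one request outstanding, each write must wait for the previous one."""
    return write_mib / latency_s


# Hypothetical stage times in seconds; NOT measurements from any cluster.
lat = request_latency(client_s=0.002, network_s=0.008, osd_queue_s=0.002,
                      journal_s=0.015, replication_s=0.025, ack_s=0.003)
print(f"latency {lat * 1000:.0f} ms -> {serial_throughput_mib_s(4, lat):.0f} MB/s")
```

Shaving time off any single stage helps the "-t 1" case only in proportion to that stage's share of the total, which is why a serial client stays latency bound however fast the disks are.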

If only one request is sent at a time, most of the hardware will sit 
idle while the request is making its way through the pipeline.  If you 
have multiple concurrent requests, the OSD(s) can better utilize all of 
the hardware (i.e. some requests can be coming in over the network, while 
others are being written to disk, while others are being replicated).
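To make the pipelining argument concrete, here is a toy closed-loop model in Python (standard library only). Each stage is a single FIFO server with a fixed, hypothetical service time, step 4 is collapsed into one stage, and `in_flight` plays the role of "-t". This is a simplification chosen for illustration: a real cluster has many disks and OSDs working in parallel, so it shows the shape of the curve, not Ceph's actual numbers. The function name and stage times are hypothetical.

```python
import heapq


def closed_pipeline_throughput(service_times_s, in_flight, n_requests):
    """Requests/s for a toy pipeline of single-server FIFO stages.

    Each of the in_flight slots issues a new request as soon as its
    previous request has passed through every stage (i.e. the ack is back).
    """
    stage_free_at = [0.0] * len(service_times_s)  # when each stage is next idle
    slots = [0.0] * in_flight                      # when each slot can issue again
    heapq.heapify(slots)
    last_done = 0.0
    for _ in range(n_requests):
        t = heapq.heappop(slots)                   # earliest available slot issues
        for i, service in enumerate(service_times_s):
            start = max(t, stage_free_at[i])       # queue if the stage is busy
            t = start + service
            stage_free_at[i] = t
        heapq.heappush(slots, t)                   # slot frees once the request completes
        last_done = max(last_done, t)
    return n_requests / last_done


# Hypothetical stage times (client, network, OSD queue, journal, ack), seconds.
stages = [0.002, 0.004, 0.001, 0.010, 0.003]

for t in (1, 2, 4, 16):
    rps = closed_pipeline_throughput(stages, in_flight=t, n_requests=10_000)
    print(f"-t {t:>2}: {rps:6.1f} requests/s")
```

Throughput climbs from 1/sum(stages) = 50 requests/s at "-t 1" toward 1/max(stages) = 100 requests/s. Once the slowest stage is busy all the time, extra requests in flight only add queueing delay, which is consistent with the higher average latency seen at 16 threads.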

You can probably imagine that once you have multiple OSDs on multiple 
nodes, having concurrent requests in flight helps you even more.

Mark

* Re: RBD speed vs threads
  2012-06-15 12:03 ` Mark Nelson
@ 2012-06-15 16:33   ` Sage Weil
  2012-06-15 20:29     ` Stefan Priebe
  2012-06-15 20:28   ` Stefan Priebe
  1 sibling, 1 reply; 5+ messages in thread
From: Sage Weil @ 2012-06-15 16:33 UTC
  To: Mark Nelson; +Cc: Stefan Priebe - Profihost AG, ceph-devel@vger.kernel.org

On Fri, 15 Jun 2012, Mark Nelson wrote:
> On 06/15/2012 12:56 AM, Stefan Priebe - Profihost AG wrote:
> > Hello list,
> > 
> > I still don't understand why the speed of rados bench depends so
> > heavily on the number of threads.
> > 
> > Right now I get around 100MB/s per thread: 1 thread gives 100MB/s, 4
> > threads 400MB/s, and 16 threads result in about 1100MB/s.
> > 
> > So 1100MB/s is great, but I still don't get why 1 thread gets "only"
> > 100MB/s.

The one other thing worth mentioning here is that "thread" is really a 
misnomer.  Rados bench actually dispatches its IO asynchronously from 
a single thread, and the -t option really controls the number of 
IOs in flight.  That is more or less what you get if you have N threads 
each doing a single synchronous IO, which is why the option is called 
that.
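The "one thread, N IOs in flight" pattern can be sketched with Python's asyncio. This is a stand-in for illustration only, not librados's asynchronous API or rados bench's actual code: `fake_write` and `bench` are hypothetical names, and `asyncio.sleep` pretends to be the OSD round trip. A single event-loop thread keeps at most `max_in_flight` writes outstanding, which is the quantity "-t" controls.

```python
import asyncio
import time


async def fake_write(delay_s: float, stats: dict) -> None:
    """Hypothetical stand-in for one asynchronous object write."""
    stats["in_flight"] += 1
    stats["peak"] = max(stats["peak"], stats["in_flight"])
    await asyncio.sleep(delay_s)          # pretend OSD round trip
    stats["in_flight"] -= 1
    stats["done"] += 1


async def bench(total_ops: int, max_in_flight: int, delay_s: float = 0.001) -> dict:
    """Issue total_ops writes from ONE thread, at most max_in_flight at a time."""
    stats = {"in_flight": 0, "peak": 0, "done": 0}
    limit = asyncio.Semaphore(max_in_flight)

    async def one() -> None:
        async with limit:                 # wait for a free "-t" slot
            await fake_write(delay_s, stats)

    await asyncio.gather(*(one() for _ in range(total_ops)))
    return stats


if __name__ == "__main__":
    for t in (1, 4, 16):
        start = time.perf_counter()
        stats = asyncio.run(bench(total_ops=64, max_in_flight=t))
        elapsed = time.perf_counter() - start
        print(f"-t {t:>2}: peak in flight {stats['peak']}, "
              f"{stats['done'] / elapsed:.0f} ops/s")
```

The semaphore caps outstanding IOs the same way N threads each blocked on one synchronous write would, but without N threads.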

sage


* Re: RBD speed vs threads
  2012-06-15 16:33   ` Sage Weil
@ 2012-06-15 20:29     ` Stefan Priebe
  0 siblings, 0 replies; 5+ messages in thread
From: Stefan Priebe @ 2012-06-15 20:29 UTC
  To: Sage Weil; +Cc: Mark Nelson, ceph-devel@vger.kernel.org

On 15.06.2012 18:33, Sage Weil wrote:
> The one other thing worth mentioning here is that "thread" is really a
> misnomer.  Rados bench actually dispatches its IO asynchronously from
> a single thread, and the -t option really controls the number of
> IOs in flight.  That is more or less what you get if you have N threads
> each doing a single synchronous IO, which is why the option is called
> that.

THX

Stefan

* Re: RBD speed vs threads
  2012-06-15 12:03 ` Mark Nelson
  2012-06-15 16:33   ` Sage Weil
@ 2012-06-15 20:28   ` Stefan Priebe
  1 sibling, 0 replies; 5+ messages in thread
From: Stefan Priebe @ 2012-06-15 20:28 UTC
  To: Mark Nelson; +Cc: ceph-devel@vger.kernel.org

On 15.06.2012 14:03, Mark Nelson wrote:

Thanks for your explanation.

Stefan

Thread overview: 5+ messages
2012-06-15  5:56 RBD speed vs threads Stefan Priebe - Profihost AG
2012-06-15 12:03 ` Mark Nelson
2012-06-15 16:33   ` Sage Weil
2012-06-15 20:29     ` Stefan Priebe
2012-06-15 20:28   ` Stefan Priebe
