Re: RBD vs RADOS benchmark performance

* Re: RBD vs RADOS benchmark performance
       [not found] ` <1368423516.6771.2.camel@localhost>
@ 2013-05-13 12:26   ` Greg
  2013-05-13 13:55     ` [ceph-users] " Mark Nelson
       [not found]     ` <5190DBD9.9070500-xVucS5mfmt0AvxtiuMwx3w@public.gmane.org>
  0 siblings, 2 replies; 6+ messages in thread
From: Greg @ 2013-05-13 12:26 UTC (permalink / raw)
  To: ceph-devel-u79uwXL29TY76Z2rM5mHXA; +Cc: ceph-users-Qp0mS5GaXlQ

On 13/05/2013 07:38, Olivier Bonvalet wrote:
> On Friday 10 May 2013 at 19:16 +0200, Greg wrote:
>> Hello folks,
>>
>> I'm in the process of testing Ceph and RBD. I have set up a small
>> cluster of hosts, each running a MON and an OSD, with both journal and
>> data on the same SSD (OK, this is stupid, but it makes it simple to verify
>> that the disks are not the bottleneck for 1 client). All nodes are connected
>> to a 1Gb network (no dedicated network for OSDs, shame on me :).
>>
>> Summary: RBD performance is poor compared to the benchmark.
>>
>> A 5-second sequential read benchmark shows something like this:
>>>    sec  Cur ops  started  finished  avg MB/s  cur MB/s  last lat   avg lat
>>>      0        0        0         0         0         0         -         0
>>>      1       16       39        23   91.9586        92  0.966117  0.431249
>>>      2       16       64        48   95.9602       100  0.513435   0.53849
>>>      3       16       90        74   98.6317       104   0.25631   0.55494
>>>      4       11       95        84   83.9735        40   1.80038   0.58712
>>> Total time run:        4.165747
>>> Total reads made:      95
>>> Read size:             4194304
>>> Bandwidth (MB/sec):    91.220
>>>
>>> Average Latency:       0.678901
>>> Max latency:           1.80038
>>> Min latency:           0.104719
>> 91 MB/s read performance, quite good!
>>
>> Now the RBD performance:
>>> root@client:~# dd if=/dev/rbd1 of=/dev/null bs=4M count=100
>>> 100+0 records in
>>> 100+0 records out
>>> 419430400 bytes (419 MB) copied, 13.0568 s, 32.1 MB/s
>> There is a 3x performance difference (same for writes: ~60 MB/s with the
>> benchmark, ~20 MB/s with dd on the block device).
>>
>> The network is OK, and the CPU is also OK on all OSDs.
>> Ceph is Bobtail 0.56.4, Linux is 3.8.1 on ARM (vanilla release + some
>> patches for the SoC being used).
>>
>> Can you show me a starting point for digging into this?
> You should try increasing read_ahead to 512K instead of the default
> 128K (/sys/block/*/queue/read_ahead_kb). I have seen a huge difference
> on reads with that.
>
Olivier,

Thanks a lot for pointing this out, it indeed makes a *huge* difference!
> # dd if=/mnt/t/1 of=/dev/zero bs=4M count=100
> 100+0 records in
> 100+0 records out
> 419430400 bytes (419 MB) copied, 5.12768 s, 81.8 MB/s
(caches dropped before each test, of course)
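
For anyone repeating this, a minimal sketch of the steps described above: raise the read-ahead value for one block device, drop the caches, then re-run the read test. The helper names (ra_path, set_read_ahead, drop_caches) are made up for illustration, and the device name rbd1 is only assumed from the earlier dd command. The sysfs setting is not persistent, so it would need to be reapplied after a reboot.

```shell
# Path of the read-ahead knob for a block device name such as "rbd1".
ra_path() {
    printf '/sys/block/%s/queue/read_ahead_kb\n' "$1"
}

# Write a new read-ahead value (in KiB) to the given knob file.
# $1 = knob file, $2 = value in KiB. Needs root for real sysfs files.
set_read_ahead() {
    printf '%s\n' "$2" > "$1"
}

# Flush dirty pages, then drop page cache, dentries and inodes (needs root).
drop_caches() {
    sync
    echo 3 > /proc/sys/vm/drop_caches
}

# Example run as root (rbd1 is an assumption, adjust to your mapped device):
#   set_read_ahead "$(ra_path rbd1)" 512
#   drop_caches
#   dd if=/dev/rbd1 of=/dev/null bs=4M count=100
```

Passing the knob file as an argument keeps the write itself easy to try against a scratch file before touching a live device.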

Mark, this is probably something you will want to investigate and 
explain in a "tweaking" topic of the documentation.

Regards,
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

* Re: [ceph-users] RBD vs RADOS benchmark performance
  2013-05-13 12:26   ` RBD vs RADOS benchmark performance Greg
@ 2013-05-13 13:55     ` Mark Nelson
  2013-05-13 14:52       ` Greg
       [not found]     ` <5190DBD9.9070500-xVucS5mfmt0AvxtiuMwx3w@public.gmane.org>
  1 sibling, 1 reply; 6+ messages in thread
From: Mark Nelson @ 2013-05-13 13:55 UTC (permalink / raw)
  To: Greg; +Cc: ceph-devel, Olivier Bonvalet, ceph-users

On 05/13/2013 07:26 AM, Greg wrote:
> Olivier,
>
> Thanks a lot for pointing this out, it indeed makes a *huge* difference!
>> # dd if=/mnt/t/1 of=/dev/zero bs=4M count=100
>> 100+0 records in
>> 100+0 records out
>> 419430400 bytes (419 MB) copied, 5.12768 s, 81.8 MB/s
> (caches dropped before each test of course)
>
> Mark, this is probably something you will want to investigate and
> explain in a "tweaking" topic of the documentation.
>
> Regards,

Out of curiosity, has your rados bench performance improved as well?
We've also seen improvements in sequential read throughput when
increasing read_ahead_kb (though it may decrease random IOPS in some
cases!). The reason I didn't think to mention it here is that I was
focused on the difference between rados bench and rbd. It would be
interesting to know whether rbd has improved more dramatically than
rados bench.
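
The rbd side of that comparison can be worked out from the figures already quoted in this thread. The sketch below is only back-of-the-envelope arithmetic with two caveats. First, dd reports decimal MB (10^6 bytes), while the rados bench figure of 91.220 comes out of 95 reads of 4194304 bytes in 4.165747 s only when divided by 2^20, so it is converted first. Second, the 81.8 MB/s run read a file on a mounted filesystem rather than the raw device, so the before/after ratio is indicative at best. The function names are hypothetical.

```shell
# Convert MiB/s (2^20 bytes) to decimal MB/s (10^6 bytes), two decimals.
mib_to_mb() {
    awk -v x="$1" 'BEGIN { printf "%.2f\n", x * 1048576 / 1000000 }'
}

# Print a / b with two decimals.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

# Figures taken from the thread:
#   rados bench seq read : 91.220 (MiB/s)
#   dd on /dev/rbd1      : 32.1 MB/s (read_ahead_kb at its default)
#   dd after the change  : 81.8 MB/s
#
#   bench=$(mib_to_mb 91.220)
#   ratio "$bench" 32.1     # gap between rados bench and rbd before the change
#   ratio 81.8 32.1         # improvement on the rbd side
#   ratio 81.8 "$bench"     # fraction of the rados bench figure reached afterwards
```

With the units aligned, the initial gap comes out at roughly the 3x reported at the start of the thread.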

Mark
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

* Re: [ceph-users] RBD vs RADOS benchmark performance
  2013-05-13 13:55     ` [ceph-users] " Mark Nelson
@ 2013-05-13 14:52       ` Greg
       [not found]         ` <5190FE49.1030307-xVucS5mfmt0AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Greg @ 2013-05-13 14:52 UTC (permalink / raw)
  To: Mark Nelson; +Cc: ceph-devel, Olivier Bonvalet, ceph-users

On 13/05/2013 15:55, Mark Nelson wrote:
> Out of curiosity, has your rados bench performance improved as well? 
> We've also seen improvements for sequential read throughput when 
> increasing read_ahead_kb. (it may decrease random iops in some cases 
> though!)  The reason I didn't think to mention it here though is 
> because I was just focused on the difference between rados bench and 
> rbd.  It would be interesting to know if rbd has improved more 
> dramatically than rados bench.
Mark, the read ahead is set on the RBD block device (on the client), so 
it doesn't improve benchmark results as the benchmark doesn't use the 
block layer.
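
A quick way to check which host actually carries the setting is to list the read-ahead value of every block device there. This is a sketch that assumes the standard sysfs layout mentioned earlier in the thread; show_read_ahead is a made-up name, and the optional argument exists only so the function can be pointed at a test directory.

```shell
# Print "<device> <read_ahead_kb>" for every block device under a sysfs root.
# $1 = sysfs block directory (defaults to /sys/block).
show_read_ahead() {
    root=${1:-/sys/block}
    for f in "$root"/*/queue/read_ahead_kb; do
        [ -r "$f" ] || continue      # skip if the glob matched nothing
        dev=${f#"$root"/}            # strip the leading directory
        dev=${dev%%/*}               # keep only the device name
        printf '%s %s\n' "$dev" "$(cat "$f")"
    done
}

# On the client, the mapped rbd device should show the raised value:
#   show_read_ahead | grep '^rbd'
```

Running the same function on an OSD host shows the values for the disks behind the OSDs, which are separate from the client-side rbd setting.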

One question remains: why did I get poor performance with a single
writing thread?

Regards,

* Re: RBD vs RADOS benchmark performance
       [not found]         ` <5190FE49.1030307-xVucS5mfmt0AvxtiuMwx3w@public.gmane.org>
@ 2013-05-13 15:17           ` Mark Nelson
  0 siblings, 0 replies; 6+ messages in thread
From: Mark Nelson @ 2013-05-13 15:17 UTC (permalink / raw)
  To: Greg; +Cc: ceph-devel-u79uwXL29TY76Z2rM5mHXA, ceph-users-Qp0mS5GaXlQ

On 05/13/2013 09:52 AM, Greg wrote:
> Mark, the read ahead is set on the RBD block device (on the client), so
> it doesn't improve benchmark results as the benchmark doesn't use the
> block layer.

Ah, I was thinking you had increased it on the OSDs (which can also
help). On the OSD side, if you are targeting spinning disks, the benefit
can depend a lot on how much data is stored per track and on the cost of
head switches and track switches.

>
> 1 question remains : why did I have poor performance with 1 single
> writing thread ?

In general, parallelism is really helpful because it hides latency and
also helps you spread the load over all of your OSDs. Even on a single
disk, having concurrent requests lets the scheduler/controller do a
better job of ordering requests. Even on high-performance distributed
file systems like Lustre, you generally do best with lots of
I/O nodes reading/writing multiple files.
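
The latency-hiding effect can be illustrated with the read figures quoted earlier in this thread. If requests are kept in flight continuously, throughput is roughly concurrency times request size divided by per-request latency. This is a rough estimate, not a model of Ceph internals: it reuses the rados bench read latencies (no write latencies were posted), and est_throughput is a made-up name.

```shell
# Estimate throughput in MiB/s from the number of requests in flight,
# the request size in MiB and the per-request latency in seconds.
est_throughput() {
    awk -v n="$1" -v size="$2" -v lat="$3" \
        'BEGIN { printf "%.1f\n", n * size / lat }'
}

# 16 concurrent 4 MiB reads at the reported average latency of 0.678901 s:
#   est_throughput 16 4 0.678901
# A single 4 MiB read at a time, even at the best reported latency of 0.104719 s:
#   est_throughput 1 4 0.104719
```

The first estimate lands close to the 91 MB/sec that rados bench reported with 16 operations in flight. The second is far lower, in the same range as the single-stream dd result, which is consistent with one outstanding request leaving most of the cluster idle.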

>
> Regards,

* Re: RBD vs RADOS benchmark performance
       [not found]     ` <5190DBD9.9070500-xVucS5mfmt0AvxtiuMwx3w@public.gmane.org>
@ 2013-05-13 15:01       ` Gandalf Corvotempesta
       [not found]         ` <CAJH6TXhcgNOLE53eJoJamwE3i-FSfBf9LzpRACHwp_hEriH5zA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Gandalf Corvotempesta @ 2013-05-13 15:01 UTC (permalink / raw)
  To: Greg
  Cc: ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	ceph-users-Qp0mS5GaXlQ

2013/5/13 Greg <itooo-xVucS5mfmt0AvxtiuMwx3w@public.gmane.org>:
> thanks a lot for pointing this out, it indeed makes a *huge* difference !
>>
>> # dd if=/mnt/t/1 of=/dev/zero bs=4M count=100
>>
>> 100+0 records in
>> 100+0 records out
>> 419430400 bytes (419 MB) copied, 5.12768 s, 81.8 MB/s
>
> (caches dropped before each test of course)

What if you set it to 1024 or a greater value?
Is bandwidth proportional to the read-ahead size?

* Re: RBD vs RADOS benchmark performance
       [not found]         ` <CAJH6TXhcgNOLE53eJoJamwE3i-FSfBf9LzpRACHwp_hEriH5zA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2013-05-13 15:10           ` Greg
  0 siblings, 0 replies; 6+ messages in thread
From: Greg @ 2013-05-13 15:10 UTC (permalink / raw)
  To: Gandalf Corvotempesta
  Cc: ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	ceph-users-Qp0mS5GaXlQ

On 13/05/2013 17:01, Gandalf Corvotempesta wrote:
> 2013/5/13 Greg <itooo-xVucS5mfmt0AvxtiuMwx3w@public.gmane.org>:
>> thanks a lot for pointing this out, it indeed makes a *huge* difference !
>>> # dd if=/mnt/t/1 of=/dev/zero bs=4M count=100
>>>
>>> 100+0 records in
>>> 100+0 records out
>>> 419430400 bytes (419 MB) copied, 5.12768 s, 81.8 MB/s
>> (caches dropped before each test of course)
> What if you set 1024 or greater value ?
> Is bandwidth relative to the read ahead size?
Setting the value too high degrades performance, especially random I/O
performance.
You have to determine the right value for your usage.
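
One way to find that value is to sweep a few candidates and measure each. The sketch below is an assumption-laden helper, not a tested benchmark procedure: sweep_read_ahead is a made-up name, the measurement command is supplied by the caller, and the device name in the usage comment is only taken from earlier in the thread.

```shell
# Try several read-ahead values and run a measurement command for each.
# $1 = read_ahead_kb file, $2 = measurement command (run through sh -c),
# remaining arguments = values in KiB to try.
# The value found at the start is written back at the end.
sweep_read_ahead() {
    knob=$1
    measure=$2
    shift 2
    saved=$(cat "$knob")
    for kb in "$@"; do
        printf '%s\n' "$kb" > "$knob"
        printf 'read_ahead_kb=%s\n' "$kb"
        sh -c "$measure"
    done
    printf '%s\n' "$saved" > "$knob"
}

# Example run as root (rbd1 is an assumption):
#   sweep_read_ahead /sys/block/rbd1/queue/read_ahead_kb \
#     'sync; echo 3 > /proc/sys/vm/drop_caches;
#      dd if=/dev/rbd1 of=/dev/null bs=4M count=100 2>&1 | tail -n 1' \
#     128 256 512 1024 2048
```

The dd command above only exercises sequential reads. Since the concern is random I/O, the same sweep should be repeated with a random-read workload that resembles your real usage before settling on a value.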

Cheers,
