From: Mark Nelson <mark.nelson-4GqslpFJ+cxBDgjK7y7TUQ@public.gmane.org>
To: Greg <itooo-xVucS5mfmt0AvxtiuMwx3w@public.gmane.org>
Cc: ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
ceph-users-Qp0mS5GaXlQ@public.gmane.org
Subject: Re: RBD vs RADOS benchmark performance
Date: Mon, 13 May 2013 10:17:20 -0500
Message-ID: <51910400.9080607@inktank.com>
In-Reply-To: <5190FE49.1030307-xVucS5mfmt0AvxtiuMwx3w@public.gmane.org>
On 05/13/2013 09:52 AM, Greg wrote:
> On 13/05/2013 15:55, Mark Nelson wrote:
>> On 05/13/2013 07:26 AM, Greg wrote:
>>> On 13/05/2013 07:38, Olivier Bonvalet wrote:
>>>> On Friday 10 May 2013 at 19:16 +0200, Greg wrote:
>>>>> Hello folks,
>>>>>
>>>>> I'm in the process of testing Ceph and RBD. I have set up a small
>>>>> cluster of hosts, each running a MON and an OSD with both journal
>>>>> and data on the same SSD (ok, this is stupid, but it is a simple way
>>>>> to verify that the disks are not the bottleneck for 1 client). All
>>>>> nodes are connected on a 1Gb network (no dedicated network for OSDs,
>>>>> shame on me :).
>>>>>
>>>>> Summary : the RBD performance is poor compared to benchmark
>>>>>
>>>>> A 5 seconds seq read benchmark shows something like this :
>>>>>> sec  Cur ops  started  finished  avg MB/s  cur MB/s  last lat   avg lat
>>>>>>   0        0        0         0         0         0         -         0
>>>>>>   1       16       39        23   91.9586        92  0.966117  0.431249
>>>>>>   2       16       64        48   95.9602       100  0.513435   0.53849
>>>>>>   3       16       90        74   98.6317       104   0.25631   0.55494
>>>>>>   4       11       95        84   83.9735        40   1.80038   0.58712
>>>>>> Total time run: 4.165747
>>>>>> Total reads made: 95
>>>>>> Read size: 4194304
>>>>>> Bandwidth (MB/sec): 91.220
>>>>>>
>>>>>> Average Latency: 0.678901
>>>>>> Max latency: 1.80038
>>>>>> Min latency: 0.104719
>>>>> 91 MB/s read performance, quite good !
>>>>>
>>>>> Now the RBD performance :
>>>>>> root@client:~# dd if=/dev/rbd1 of=/dev/null bs=4M count=100
>>>>>> 100+0 records in
>>>>>> 100+0 records out
>>>>>> 419430400 bytes (419 MB) copied, 13.0568 s, 32.1 MB/s
>>>>> That is roughly a 3x difference (same for write: ~60 MB/s in the
>>>>> benchmark, ~20 MB/s with dd on the block device)
>>>>>
>>>>> The network is ok, the CPU is also ok on all OSDs.
>>>>> CEPH is Bobtail 0.56.4, linux is 3.8.1 arm (vanilla release + some
>>>>> patches for the SoC being used)
>>>>>
>>>>> Can you show me the starting point for digging into this ?
>>>> You should try to increase read_ahead to 512K instead of the default
>>>> 128K (/sys/block/*/queue/read_ahead_kb). I have seen a huge difference
>>>> on reads with that.
>>>>
>>> Olivier,
>>>
>>> thanks a lot for pointing this out, it indeed makes a *huge*
>>> difference !
>>>> # dd if=/mnt/t/1 of=/dev/zero bs=4M count=100
>>>> 100+0 records in
>>>> 100+0 records out
>>>> 419430400 bytes (419 MB) copied, 5.12768 s, 81.8 MB/s
>>> (caches dropped before each test of course)
>>>
>>> Mark, this is probably something you will want to investigate and
>>> explain in a "tweaking" topic of the documentation.
>>>
>>> Regards,
>>
>> Out of curiosity, has your rados bench performance improved as well?
>> We've also seen improvements for sequential read throughput when
>> increasing read_ahead_kb. (it may decrease random iops in some cases
>> though!) The reason I didn't think to mention it here though is
>> because I was just focused on the difference between rados bench and
>> rbd. It would be interesting to know if rbd has improved more
>> dramatically than rados bench.
> Mark, the read ahead is set on the RBD block device (on the client), so
> it doesn't improve benchmark results as the benchmark doesn't use the
> block layer.
Ah, I was thinking you had increased it on the OSDs (which can also
help). On the OSD side, if you are targeting spinning disks, it can
depend a lot on how much data is stored per track and the cost of head
switches and track switches.
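
For reference, a minimal sketch of adjusting the read-ahead value
discussed above. The sysfs path and the 128K/512K values come from this
thread; the function name set_read_ahead and the SYSFS_ROOT override are
hypothetical and exist only so the snippet can be tried against a scratch
directory without root. The right value depends on the workload, and a
value written this way does not survive a reboot.

```shell
#!/bin/sh
# Hypothetical helper: print the current read_ahead_kb of a block device
# and set a new value. SYSFS_ROOT defaults to the real sysfs location.
SYSFS_ROOT="${SYSFS_ROOT:-/sys/block}"

set_read_ahead() {
    dev="$1"   # device name under /sys/block, e.g. rbd1 on the client
    kb="$2"    # new read-ahead size in KB, e.g. 512
    f="$SYSFS_ROOT/$dev/queue/read_ahead_kb"
    if [ ! -w "$f" ]; then
        echo "cannot write $f" >&2
        return 1
    fi
    echo "$dev: $(cat "$f") KB -> $kb KB"
    echo "$kb" > "$f"
}
```

On the client this would be called (as root) as "set_read_ahead rbd1 512";
on an OSD host the device name would be that of the data disk instead.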
>
> 1 question remains : why did I have poor performance with 1 single
> writing thread ?
In general, parallelism is really helpful because it hides latency and
also helps you spread the load over all of your OSDs. Even on a single
disk, having concurrent requests lets the scheduler/controller do a
better job of ordering requests. Even on high-performance distributed
file systems like Lustre, you are generally going to do best with lots
of IO nodes reading/writing multiple files.
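
One rough way to see the effect of concurrency is to run several dd
readers at once, each starting at a different offset, and compare the
wall-clock time with a single dd over the same amount of data. This is
only a sketch: the function name parallel_read is hypothetical, the
bs=4M block size is taken from the dd runs earlier in the thread, and it
assumes GNU dd.

```shell
#!/bin/sh
# Hypothetical helper: start several dd readers in parallel against the
# same source, each reading its own contiguous region of 4M blocks.
parallel_read() {
    src="$1"     # device or file to read, e.g. /dev/rbd1
    jobs="$2"    # number of concurrent readers
    count="$3"   # number of 4M blocks read by each reader
    i=0
    while [ "$i" -lt "$jobs" ]; do
        # skip= is in units of bs, so reader i starts at block i*count
        dd if="$src" of=/dev/null bs=4M count="$count" \
            skip=$((i * count)) 2>/dev/null &
        i=$((i + 1))
    done
    wait   # block until every background reader has finished
}
```

For example, "time parallel_read /dev/rbd1 4 25" reads the same 100
blocks as the single dd above, split across four readers. Caches should
be dropped before each run so the comparison is fair.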
>
> Regards,