* Ceph benchmarks
@ 2012-08-27 20:47 Sébastien Han
  2012-08-27 20:59 ` Andrey Korolyov
                   ` (5 more replies)
  0 siblings, 6 replies; 12+ messages in thread
From: Sébastien Han @ 2012-08-27 20:47 UTC
  To: ceph-devel

Hi community,

For those of you who are interested, I performed several benchmarks of
RADOS and RBD on different types of hardware and use cases.
You can find my results here:
http://www.sebastien-han.fr/blog/2012/08/26/ceph-benchmarks/

Hope it helps :)

Feel free to comment and criticize... :)

Cheers!

* Re: Ceph benchmarks
  2012-08-27 20:47 Ceph benchmarks Sébastien Han
@ 2012-08-27 20:59 ` Andrey Korolyov
  2012-08-28  1:40 ` Alexandre DERUMIER
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Andrey Korolyov @ 2012-08-27 20:59 UTC
  To: Sébastien Han; +Cc: ceph-devel

My two cents: with an ultrafast journal (tmpfs), it matters which TCP
congestion control algorithm you use. With the default CUBIC, the
aggregated sixteen-OSD write speed is about 450MBps, but with DCTCP it
rises to 550MBps. For a device such as an SLC disk (ext4, ^O journal,
commit=100) there is no observable difference: in both cases the
aggregated speed measured about 330MBps. I have not tried H(S)TCP yet;
it should behave the same as DCTCP. On networks with lower delays than
regular gigabit Ethernet, the difference between congestion control
algorithms should be bigger, though.
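A back-of-the-envelope view of the tmpfs-journal figures above, as a minimal Python sketch. It assumes the aggregate write speed is spread evenly across the sixteen OSDs (real load is rarely perfectly even) and computes the relative gain of DCTCP over CUBIC. Ceph daemons use the system-wide default set through the `net.ipv4.tcp_congestion_control` sysctl; the hypothetical `connect_with_congestion_control` helper only shows the per-socket equivalent (`TCP_CONGESTION`, Linux-only) for ad-hoc testing with your own tools.

```python
import socket


def per_osd_throughput(aggregate_mbps: float, num_osds: int) -> float:
    """Average write throughput per OSD, assuming the load is spread evenly."""
    return aggregate_mbps / num_osds


def relative_gain(before: float, after: float) -> float:
    """Fractional improvement of `after` over `before`."""
    return (after - before) / before


def connect_with_congestion_control(host: str, port: int, algo: str) -> socket.socket:
    """Hypothetical helper: open a TCP connection with a chosen congestion control.

    Linux-only: socket.TCP_CONGESTION exists on Linux with Python 3.6+, and the
    algorithm must be listed in /proc/sys/net/ipv4/tcp_available_congestion_control.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, algo.encode())
    sock.connect((host, port))
    return sock


# Figures quoted above: sixteen OSDs, tmpfs journal.
cubic, dctcp = 450.0, 550.0
print(f"CUBIC: {per_osd_throughput(cubic, 16):.1f} MBps per OSD")
print(f"DCTCP: {per_osd_throughput(dctcp, 16):.1f} MBps per OSD")
print(f"DCTCP gain over CUBIC: {relative_gain(cubic, dctcp):.0%}")
```

Setting the option per socket rather than system-wide keeps an experiment from affecting every other connection on the host.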

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

* Re: Ceph benchmarks
  2012-08-27 20:47 Ceph benchmarks Sébastien Han
  2012-08-27 20:59 ` Andrey Korolyov
@ 2012-08-28  1:40 ` Alexandre DERUMIER
  2012-08-28  2:18 ` Mark Nelson
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Alexandre DERUMIER @ 2012-08-28  1:40 UTC
  To: Sébastien Han; +Cc: ceph-devel

Hi,
Nice benchmark!

It would be great if you could add some fio benchmarks.

I'm interested in random IOPS values, as I have never been able to reach more than 8000 IOPS with an RBD cluster.

random read: (iops)
fio --filename=/dev/[device] --direct=1 --rw=randread --bs=4k --size=1G --iodepth=100 --runtime=120 --group_reporting --name=file1 --ioengine=libaio

random write: (iops)
fio --filename=/dev/[device] --direct=1 --rw=randwrite --bs=4k --size=1G --iodepth=100 --runtime=120 --group_reporting --name=file1 --ioengine=libaio

seq read: (bandwidth)
fio --filename=/dev/[device] --direct=1 --rw=read --bs=4M --size=1G --iodepth=100 --runtime=120 --group_reporting --name=file1 --ioengine=libaio

seq write: (bandwidth)
fio --filename=/dev/[device] --direct=1 --rw=write --bs=4M --size=1G --iodepth=100 --runtime=120 --group_reporting --name=file1 --ioengine=libaio
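The four commands above differ only in `--rw` and `--bs`, so they can be generated and their results collected in one place. A minimal Python sketch, assuming fio's `--output-format=json` report, in which (with `--group_reporting`) the aggregated figures sit in `jobs[0]` and `bw` is expressed in KiB/s; check this against the fio version you run. Note that the write workloads overwrite whatever is on the target device.

```python
import json
import subprocess
from typing import Dict, List

# The four workloads suggested above: (label, fio --rw mode, block size).
WORKLOADS = [
    ("random read", "randread", "4k"),
    ("random write", "randwrite", "4k"),
    ("seq read", "read", "4M"),
    ("seq write", "write", "4M"),
]


def fio_command(device: str, rw: str, bs: str) -> List[str]:
    """Build the fio argument list used in the commands above, plus JSON output."""
    return [
        "fio", f"--filename={device}", "--direct=1", f"--rw={rw}",
        f"--bs={bs}", "--size=1G", "--iodepth=100", "--runtime=120",
        "--group_reporting", "--name=file1", "--ioengine=libaio",
        "--output-format=json",
    ]


def summarize(fio_json: str) -> Dict[str, float]:
    """Extract IOPS and bandwidth (KiB/s) from fio's JSON report.

    With --group_reporting, the aggregated result is the first entry in "jobs".
    """
    job = json.loads(fio_json)["jobs"][0]
    return {
        "read_iops": job["read"]["iops"], "read_bw_kib": job["read"]["bw"],
        "write_iops": job["write"]["iops"], "write_bw_kib": job["write"]["bw"],
    }


def run_all(device: str) -> None:
    """Run every workload against `device` (the write tests destroy its data)."""
    for label, rw, bs in WORKLOADS:
        result = subprocess.run(fio_command(device, rw, bs), capture_output=True,
                                text=True, check=True)
        print(label, summarize(result.stdout))
```

Usage, with a hypothetical mapped RBD device: `run_all("/dev/rbd0")`.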


Regards,

Alexandre

-- 
Alexandre Derumier
Systems and Network Engineer

Phone: 03 20 68 88 85
Fax: 03 20 68 90 88

45 Bvd du Général Leclerc 59100 Roubaix
12 rue Marivaux 75002 Paris

* Re: Ceph benchmarks
  2012-08-27 20:47 Ceph benchmarks Sébastien Han
  2012-08-27 20:59 ` Andrey Korolyov
  2012-08-28  1:40 ` Alexandre DERUMIER
@ 2012-08-28  2:18 ` Mark Nelson
  2012-08-28  4:27   ` Mark Kirkwood
  2012-08-28 11:51 ` Plaetinck, Dieter
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: Mark Nelson @ 2012-08-28  2:18 UTC
  To: Sébastien Han; +Cc: ceph-devel

On 08/27/2012 03:47 PM, Sébastien Han wrote:
> Hi community,
>

Hi!

> For those of you who are interested, I performed several benchmarks of
> RADOS and RBD on different types of hardware and use case.
> You can find my results here:
> http://www.sebastien-han.fr/blog/2012/08/26/ceph-benchmarks/
>
> Hope it helps :)
>
> Feel free to comment, critic... :)

A couple of thoughts:

1) With so few OSDs going from 1000 to 10000 pgs shouldn't make too much 
of a difference.  It would be concerning if it did!

2) Were the commodity results with SSDs using replication of 3?  Also, 
was that test with the flusher on or off?  I'd hope that with 15k drives 
you'd see a bit better throughput with journals on the SSDs.

3) It would be interesting to try these tests without the raid1 and see 
if you can max out the bonded interface.

4) I think the R520 backplane is using SAS expanders like in the R515s 
we have.  We've had some performance problems caused either by them or 
by something goofy going on with our H700 controllers.

5) rados bench tests with smaller requests could be interesting on 15k 
drives.  I typically see about 1-2MB/s per OSD for 4k requests with 
7200rpm SATA disks.
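The arithmetic behind points 1, 2 and 5 can be sketched in a few lines of Python. The OSD counts and disk and SSD speeds below are hypothetical, not taken from the benchmarks. The PG part assumes each PG is stored on `replicas` OSDs and that placement behaves roughly like independent random draws, so the relative spread of PGs per OSD is about 1/sqrt(mean). The write estimate assumes the journal receives a full copy of every write, so a journal on the data disk halves that disk's usable bandwidth, while SSD journals move that cost onto the SSDs. MB is taken as 2^20 bytes.

```python
import math


def pgs_per_osd(pg_num: int, replicas: int, num_osds: int) -> float:
    """Average number of PG copies each OSD holds."""
    return pg_num * replicas / num_osds


def pg_spread(pg_num: int, replicas: int, num_osds: int) -> float:
    """Approximate relative std. deviation of PGs per OSD (random-placement model)."""
    return 1 / math.sqrt(pgs_per_osd(pg_num, replicas, num_osds))


def client_write_mbps(num_disks: int, disk_mbps: float, replicas: int,
                      num_ssds: int = 0, ssd_mbps: float = 0.0) -> float:
    """Rough ceiling on cluster-wide client write throughput, ignoring network and CPU.

    Every client byte is written to `replicas` OSDs, and each OSD writes it
    once to its journal and once to its data disk.
    """
    if num_ssds == 0:
        # Journal co-located with data: each data disk absorbs two writes.
        return num_disks * disk_mbps / (2 * replicas)
    data_limit = num_disks * disk_mbps / replicas
    journal_limit = num_ssds * ssd_mbps / replicas
    return min(data_limit, journal_limit)


def mbps_to_iops(mb_per_s: float, block_bytes: int = 4096) -> float:
    """Convert a throughput in MB/s (2^20 bytes) to requests per second."""
    return mb_per_s * 1024 * 1024 / block_bytes


# Point 1, hypothetical cluster: 6 OSDs, 2 replicas.
for pgs in (1000, 10000):
    print(pgs, round(pgs_per_osd(pgs, 2, 6)), f"{pg_spread(pgs, 2, 6):.1%}")

# Point 2, hypothetical: 6 x 150 MB/s disks, replication 3, without/with 2 SSD journals.
print(client_write_mbps(6, 150.0, 3))
print(client_write_mbps(6, 150.0, 3, num_ssds=2, ssd_mbps=250.0))

# Point 5: 1-2 MB/s per OSD with 4 KiB requests.
print(mbps_to_iops(1), mbps_to_iops(2))
```

With these made-up numbers, both PG counts already give hundreds of PGs per OSD, and the SSD journals rather than the data disks become the write ceiling.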

Mark

* Re: Ceph benchmarks
  2012-08-28  2:18 ` Mark Nelson
@ 2012-08-28  4:27   ` Mark Kirkwood
  2012-08-28  8:32     ` Sébastien Han
  0 siblings, 1 reply; 12+ messages in thread
From: Mark Kirkwood @ 2012-08-28  4:27 UTC
  To: Mark Nelson; +Cc: Sébastien Han, ceph-devel

+1 to that. I've been seeing 4-6 MB/s for 4K writes for 1 OSD with 1 SSD 
for journal and another for data [1]. Interestingly I did see some nice 
scaling with 4K random reads: 2-4 MB/s per thread for up to 8 threads 
(looked like it plateaued thereafter).

Cheers

Mark

[1] FYI, not on the box I posted about before, but on a more modern PC
with 6Gbit/s SATA.

On 28/08/12 14:18, Mark Nelson wrote:
> 5) rados bench tests with smaller requests could be interesting on 15k 
> drives.  I typically see about 1-2MB/s per OSD for 4k requests with 
> 7200rpm SATA disks.


* Re: Ceph benchmarks
  2012-08-28  4:27   ` Mark Kirkwood
@ 2012-08-28  8:32     ` Sébastien Han
  2012-08-28 11:46       ` Mark Nelson
  0 siblings, 1 reply; 12+ messages in thread
From: Sébastien Han @ 2012-08-28  8:32 UTC
  To: Mark Kirkwood; +Cc: Mark Nelson, ceph-devel

@Alexandre: I don't have all the machines anymore, I'll see what I can
do :). Only the commodity cluster remains.

@Mark Nelson: 2) Which bench? The RADOS one?
3) Sorry, the RAID controller doesn't support JBOD...
5) I still have the commodity cluster, I'll run some small rados benchmarks.

Cheers!

* Re: Ceph benchmarks
  2012-08-28  8:32     ` Sébastien Han
@ 2012-08-28 11:46       ` Mark Nelson
  0 siblings, 0 replies; 12+ messages in thread
From: Mark Nelson @ 2012-08-28 11:46 UTC
  To: Sébastien Han; +Cc: Mark Kirkwood, ceph-devel

On 08/28/2012 03:32 AM, Sébastien Han wrote:
> @Alexandre: I don't have all the machines anymore, I'll see what I can
> do :). Only the commodity cluster remains
>
> @Mark Nelson: 2) Which bench? The RADOS one?
> 3) Sorry the RAID controller doesn't support JBOD...
> 5) I still have the commodity cluster, I'll perform some little rados benchmarks

Ah, that's the problem we have with our H700s.  We do single drive 
raid0s to get around it, but it's not ideal.  Do you have two drives in 
a raid1 or just a single drive?

Some other things we've noticed on our Dell machines:

- Writeback cache is faster than writethrough cache in pretty much all
of our tests, even sequential writes.
- Concurrent writers to a single raid group seem to tank performance.  I
still don't know why this is, but it makes buffered IO, and any direct
IO with more than one writer, top out at about 95MB/s regardless of the
number of drives in the raid group.  (And the more writers, the slower
the performance.)
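One way to reproduce the second observation is to time several concurrent writers against the same RAID group and see how aggregate throughput changes with the writer count. A minimal Python sketch: each thread writes its own file with buffered IO and then calls `fsync`, so the timing includes the flush to the device. It does not use O_DIRECT (which needs aligned buffers), and the function name and mount point in the usage line are hypothetical.

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor


def concurrent_write_mbps(directory: str, writers: int, mib_per_writer: int = 64,
                          chunk_kib: int = 1024) -> float:
    """Aggregate throughput in MiB/s of `writers` threads writing concurrently.

    Each thread writes its own file in `directory` using buffered IO and then
    fsyncs it, so the measured time includes flushing to the device.
    """
    chunk = b"\0" * (chunk_kib * 1024)
    chunks_per_writer = mib_per_writer * 1024 // chunk_kib

    def worker(idx: int) -> None:
        path = os.path.join(directory, f"writer-{idx}.dat")
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            for _ in range(chunks_per_writer):
                view = memoryview(chunk)
                while view:  # os.write may write fewer bytes than requested
                    view = view[os.write(fd, view):]
            os.fsync(fd)
        finally:
            os.close(fd)

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=writers) as pool:
        list(pool.map(worker, range(writers)))  # re-raises any worker exception
    elapsed = time.perf_counter() - start
    return writers * mib_per_writer / elapsed
```

Usage: `for n in (1, 2, 4, 8): print(n, concurrent_write_mbps("/mnt/raid", n))`. If aggregate throughput stays flat or drops as `n` grows, the behaviour described above is being reproduced.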

Mark

* Re: Ceph benchmarks
  2012-08-27 20:47 Ceph benchmarks Sébastien Han
                   ` (2 preceding siblings ...)
  2012-08-28  2:18 ` Mark Nelson
@ 2012-08-28 11:51 ` Plaetinck, Dieter
  2012-08-28 13:11 ` Tommi Virtanen
  2012-09-08 18:16 ` Ceph benchmarks / ceph osd tell X bench Dieter Kasper
  5 siblings, 0 replies; 12+ messages in thread
From: Plaetinck, Dieter @ 2012-08-28 11:51 UTC
  To: Sébastien Han; +Cc: ceph-devel

Sébastien Han <han.sebastien@gmail.com> wrote:

> Just as a reminder the system maintains 2 caches facilities:
> * disk write cache
> * page cache

The page cache is the one commonly referred to as the block cache, right (i.e. in the block layer, below the filesystem layer in the kernel)?
What do you mean by disk write cache? The one where IO commands are held so they can be reordered? I always thought that was the same as the block cache. Also, hard disks sometimes do this themselves, buffering in a few megabytes of memory on the disk itself?

Dieter

* Re: Ceph benchmarks
  2012-08-27 20:47 Ceph benchmarks Sébastien Han
                   ` (3 preceding siblings ...)
  2012-08-28 11:51 ` Plaetinck, Dieter
@ 2012-08-28 13:11 ` Tommi Virtanen
  2012-08-28 22:16   ` Sébastien Han
  2012-09-08 18:16 ` Ceph benchmarks / ceph osd tell X bench Dieter Kasper
  5 siblings, 1 reply; 12+ messages in thread
From: Tommi Virtanen @ 2012-08-28 13:11 UTC
  To: Sébastien Han; +Cc: ceph-devel

On Mon, Aug 27, 2012 at 1:47 PM, Sébastien Han <han.sebastien@gmail.com> wrote:
> For those of you who are interested, I performed several benchmarks of
> RADOS and RBD on different types of hardware and use case.
> You can find my results here:
> http://www.sebastien-han.fr/blog/2012/08/26/ceph-benchmarks/

Nice!

Minor nit: "sudo echo 3 | tee /proc/sys/vm/drop_caches && sudo sync"
you probably want something like "echo 3 | sudo tee ... && sync"
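The point of the nit is that `sudo` must apply to the process that opens `/proc/sys/vm/drop_caches` (here `tee`), not to `echo`. The kernel documentation also notes that dirty objects are not freed by drop_caches and suggests running `sync` first, i.e. `sync && echo 3 | sudo tee /proc/sys/vm/drop_caches`. The same steps as a minimal Python sketch (run as root); the `path` parameter is a hypothetical addition that only exists so the function can be exercised against an ordinary file.

```python
import os

DROP_CACHES = "/proc/sys/vm/drop_caches"


def drop_caches(level: int = 3, path: str = DROP_CACHES) -> None:
    """Write back dirty data, then ask the kernel to drop clean caches.

    level 1 frees the page cache, 2 frees dentries and inodes, 3 frees both.
    Writing to /proc/sys/vm/drop_caches requires root.
    """
    if level not in (1, 2, 3):
        raise ValueError("level must be 1, 2 or 3")
    os.sync()  # dirty pages cannot be dropped, so flush them first
    with open(path, "w") as f:
        f.write(f"{level}\n")
```

Calling it between benchmark runs makes each run start with a cold page cache.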

* Re: Ceph benchmarks
  2012-08-28 13:11 ` Tommi Virtanen
@ 2012-08-28 22:16   ` Sébastien Han
  0 siblings, 0 replies; 12+ messages in thread
From: Sébastien Han @ 2012-08-28 22:16 UTC
  To: Tommi Virtanen; +Cc: ceph-devel

@Mark Nelson: thanks for the details, I'll keep that in mind the next
time I build an array. It was a RAID1 with 2 disks (no broken array).

@Plaetinck, Dieter: Sorry, I made a small mistake. I was referring to
the system cache (page cache), which considers write operations to the
storage system complete once the data has been copied into it, and
secondly to the disk write cache, the cache on the hard drive itself.
I'm going to make the sentence clearer and remove the disk write cache
part.
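The page-cache behaviour described here can be seen from user space: `write()` returns once the data has been copied into the page cache, and only `fsync()` waits for it to reach the storage device. Whether `fsync()` also empties the drive's own volatile write cache depends on the filesystem issuing cache flushes (write barriers, enabled by default on ext4 and XFS). A minimal Python sketch; `write_durably` is a hypothetical helper name.

```python
import os


def write_durably(path: str, data: bytes) -> None:
    """Write `data` to `path` and wait until the kernel has flushed it to disk."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # Returns as soon as the bytes are copied into the page cache.
            written = os.write(fd, view)
            view = view[written:]
        # Blocks until the dirty pages (and file metadata) reach the device.
        os.fsync(fd)
    finally:
        os.close(fd)
```

Benchmarks that skip the `fsync()` step mostly measure how fast the kernel can copy into memory, which is why write results can look far better than the disks allow.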

@Jerker Nyberg: I took some measurements on each system during the
writes, but they're more in my head than on paper. As far as I can tell,
the commodity cluster was struggling during the writes. The other
machines barely showed any load; even when I deactivated all but 1 or 2
cores, it was OK. The CPU load from the OSDs wasn't that high.

@Tommi Virtanen: nice catch! I'm gonna update the article :)

Thank you for all the feedback, guys. I'll try to run some of the tests
you mentioned above :)

* Re: Ceph benchmarks / ceph osd tell X bench
  2012-08-27 20:47 Ceph benchmarks Sébastien Han
                   ` (4 preceding siblings ...)
  2012-08-28 13:11 ` Tommi Virtanen
@ 2012-09-08 18:16 ` Dieter Kasper
  2012-09-10 17:07   ` Sébastien Han
  5 siblings, 1 reply; 12+ messages in thread
From: Dieter Kasper @ 2012-09-08 18:16 UTC
  To: Sébastien Han; +Cc: ceph-devel

Hi Sébastien,

when running 'ceph osd tell $i bench',
where will I see results like these:
osd.0 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 19.109900 sec at 54870 KB/sec
osd.1 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 20.755279 sec at 50520 KB/sec
osd.2 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 19.347267 sec at 54197 KB/sec

Which logging do I have to activate for it?

Thanks,
-Dieter


* Re: Ceph benchmarks / ceph osd tell X bench
  2012-09-08 18:16 ` Ceph benchmarks / ceph osd tell X bench Dieter Kasper
@ 2012-09-10 17:07   ` Sébastien Han
  0 siblings, 0 replies; 12+ messages in thread
From: Sébastien Han @ 2012-09-10 17:07 UTC
  To: Dieter Kasper; +Cc: ceph-devel

Hi Dieter,

Simply run a "ceph -w" and wait for the output.
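Since the bench reports show up in the output of `ceph -w`, they can also be extracted from a saved copy of that output. A minimal Python sketch; the regular expression is modeled only on the three lines quoted above and may need adjusting for other Ceph versions. `re.search` skips over any timestamp prefix on each line. The names `parse_bench_line` and `collect` are hypothetical.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Modeled on lines such as:
# osd.0 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 19.109900 sec at 54870 KB/sec
BENCH_RE = re.compile(
    r"(?P<osd>osd\.\d+) \[INF\] bench: wrote (?P<mb>\d+) MB in blocks of "
    r"(?P<block_kb>\d+) KB in (?P<sec>[\d.]+) sec at (?P<kbps>\d+) KB/sec"
)


class BenchResult(NamedTuple):
    osd: str
    total_mb: int
    block_kb: int
    seconds: float
    kb_per_sec: int


def parse_bench_line(line: str) -> Optional[BenchResult]:
    """Return the parsed result, or None if the line is not an OSD bench report."""
    m = BENCH_RE.search(line)
    if m is None:
        return None
    return BenchResult(m["osd"], int(m["mb"]), int(m["block_kb"]),
                       float(m["sec"]), int(m["kbps"]))


def collect(lines: Iterable[str]) -> List[BenchResult]:
    """Keep only the bench reports from a saved log of `ceph -w` output."""
    return [r for r in map(parse_bench_line, lines) if r is not None]
```

Because `ceph -w` keeps running, a script reading it live would call `parse_bench_line` on each line as it arrives; `collect` suits a log captured to a file. In the quoted lines, the reported rate equals the size divided by the elapsed time (for example, 1024 × 1024 KB / 19.1099 s ≈ 54870 KB/s).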

Cheers.


end of thread (newest: 2012-09-10 17:08 UTC)

Thread overview: 12+ messages
2012-08-27 20:47 Ceph benchmarks Sébastien Han
2012-08-27 20:59 ` Andrey Korolyov
2012-08-28  1:40 ` Alexandre DERUMIER
2012-08-28  2:18 ` Mark Nelson
2012-08-28  4:27   ` Mark Kirkwood
2012-08-28  8:32     ` Sébastien Han
2012-08-28 11:46       ` Mark Nelson
2012-08-28 11:51 ` Plaetinck, Dieter
2012-08-28 13:11 ` Tommi Virtanen
2012-08-28 22:16   ` Sébastien Han
2012-09-08 18:16 ` Ceph benchmarks / ceph osd tell X bench Dieter Kasper
2012-09-10 17:07   ` Sébastien Han
