Ceph write performance


* Ceph write performance
@ 2012-07-20 10:24 George Shuklin
       [not found] ` <20120720104150.GA16630@oder.kd-bie.de>
                   ` (3 more replies)
  0 siblings, 4 replies; 31+ messages in thread
From: George Shuklin @ 2012-07-20 10:24 UTC
  To: ceph-devel

Good day.

I've started to play with Ceph, and I've found some rather strange
performance issues. I'm not sure whether this is due to a Ceph
limitation or my bad setup.

Setup:

osd - xfs on ramdisk (only one osd)
mds - raid0 on 10 disks
mon - second raid0 on 10 disks

I mounted the Ceph share on localhost and ran fio (randwrite, 4k, iodepth=32).

What I got: 1900 IOPS on writes (4k block, 1 GB span).

Normally fio shows about 200k IOPS when writing to the ramdisk.

Why is it so slow? I did the setup exactly as described here:
http://ceph.com/docs/master/start/quick-start/#start-the-ceph-cluster
(but with one osd).
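
For scale, the gap reported above can be worked out in a few lines of Python. This is only a back-of-the-envelope sketch using the two IOPS figures from this message; the variable and function names are made up for illustration, and the 4k block is assumed to be 4096 bytes.

```python
# Figures quoted in the message above; names are illustrative only.
ceph_iops = 1_900        # fio randwrite, 4k, on the mounted Ceph share
ramdisk_iops = 200_000   # "about 200k IOPS" on the bare ramdisk
block_size = 4 * 1024    # assumption: 4k means 4096 bytes


def throughput_mib_s(iops: int, block_bytes: int) -> float:
    """Convert an IOPS figure into MiB/s for a fixed block size."""
    return iops * block_bytes / (1024 * 1024)


slowdown = ramdisk_iops / ceph_iops
ceph_mib_s = throughput_mib_s(ceph_iops, block_size)
ramdisk_mib_s = throughput_mib_s(ramdisk_iops, block_size)

print(f"slowdown factor: {slowdown:.0f}x")
print(f"Ceph share: {ceph_mib_s:.2f} MiB/s, ramdisk: {ramdisk_mib_s:.2f} MiB/s")
```

In other words, the Ceph mount delivers roughly one hundredth of the small-write rate of the device underneath it, which is why the journal and filestore layers are the first suspects.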

Thanks.


* Re: Ceph write performance
       [not found] ` <20120720104150.GA16630@oder.kd-bie.de>
@ 2012-07-20 10:48   ` George Shuklin
  2012-07-20 11:49     ` Mark Nelson
  0 siblings, 1 reply; 31+ messages in thread
From: George Shuklin @ 2012-07-20 10:48 UTC
  To: Dieter Kasper (KD), ceph-devel

On 20.07.2012 14:41, Dieter Kasper (KD) wrote:

Good day.

Thank you for your attention.

ramdisk size ~70 GB (modprobe brd rd_size=70000000)
journal seems to be on the same device as the storage
size of OSD was unchanged (I created it manually and did not make any
specific changes)

During the test I watched the IO load closely; IO on the MDS/MON was
insignificant (zero most of the time, with a few very mild peaks).

Just in case, configs:

ceph.conf:

[osd]
         osd journal size = 1000
         filestore xattr use omap = true

[mon.a]
         host = srv1
         mon addr = 192.168.0.1:6789

[osd.0]
         host = srv1

[mds.a]
         host = srv1

fio.ini:
[test]
blocksize=4k
filename=/media/test
size=16g
fallocate=posix
rw=randread
direct=1
buffered=0
ioengine=libaio
iodepth=32


Thanks for the advice; I'll recheck with the new settings.

> George,
>
> please share more details of your config:
> - RAM size of your system
> - location of the journal
> - size of your OSD
>
> Can you try (just for the 1st test) to
> .. put the journal on RAM disk	
> .. put the MDS on RAM disk
> .. put the MON on RAM disk
> .. use btrfs for OSD
>
> As an alternative to isolate the bottleneck you can try to
> - run without a journal
> - use RBD instead of Ceph-FS
>    + create a File System on top of the /dev/rbd0
>
> Regards,
> Dieter Kasper



* Re: Ceph write performance
  2012-07-20 10:48   ` George Shuklin
@ 2012-07-20 11:49     ` Mark Nelson
  2012-07-20 20:36       ` Ceph write performance on RAM-DISK Dieter Kasper
  0 siblings, 1 reply; 31+ messages in thread
From: Mark Nelson @ 2012-07-20 11:49 UTC
  To: George Shuklin; +Cc: Dieter Kasper (KD), ceph-devel

Hi George,

I think you may find that the limitation is in the filestore.  It's
one of the things I've been trying to track down, as I've also seen
low performance on SSDs with small request sizes.  You can use
test_filestore_workloadgen to test the filestore code specifically
with small requests if you'd like.  I'm not sure whether it is included
with the binary distribution, but it can be compiled if you download
the source.  I think it's "make test_filestore_workloadgen" in the src
directory.

Mark





* Re: Ceph write performance
  2012-07-20 10:24 Ceph write performance George Shuklin
       [not found] ` <20120720104150.GA16630@oder.kd-bie.de>
@ 2012-07-20 15:53 ` Matthew Richardson
  2012-07-20 16:37 ` Gregory Farnum
  2012-08-28 17:48 ` RBD performance - tuning hints Dieter Kasper
  3 siblings, 0 replies; 31+ messages in thread
From: Matthew Richardson @ 2012-07-20 15:53 UTC
  To: ceph-devel


On 20/07/12 11:24, George Shuklin wrote:
> Good day.
> 
> I've start to play with Ceph... And I found some kinda strange
> performance issues. I'm not sure if this is due ceph limitation or my
> bad setup.

I'm seeing a similar problem which looks like a potential bug, which
someone else seems to have already reported

(http://www.spinics.net/lists/ceph-devel/msg07335.html and
http://www.spinics.net/lists/ceph-devel/msg07691.html)

The problem only seems to hit for me when I do random writes - can you
try fio with sequential writes (rw=write) and see if your problem also
disappears?  It might help confirm this as an issue.

Thanks,

Matthew






* Re: Ceph write performance
  2012-07-20 10:24 Ceph write performance George Shuklin
       [not found] ` <20120720104150.GA16630@oder.kd-bie.de>
  2012-07-20 15:53 ` Ceph write performance Matthew Richardson
@ 2012-07-20 16:37 ` Gregory Farnum
  2012-08-28 17:48 ` RBD performance - tuning hints Dieter Kasper
  3 siblings, 0 replies; 31+ messages in thread
From: Gregory Farnum @ 2012-07-20 16:37 UTC
  To: George Shuklin; +Cc: ceph-devel

On Fri, Jul 20, 2012 at 3:24 AM, George Shuklin <shuklin@selectel.ru> wrote:
> Good day.
>
> I've start to play with Ceph... And I found some kinda strange performance
> issues. I'm not sure if this is due ceph limitation or my bad setup.
>
> Setup:
>
> osd - xfs on ramdisk (only one osd)
> mds - raid0 on 10 disks
> mon - second raid0 on 10 disks

I'm not going to butt in on the performance discussion, but just FYI,
the MDS does not use any local storage — it puts everything on the
OSDs. :)
-Greg
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: Ceph write performance on RAM-DISK
  2012-07-20 11:49     ` Mark Nelson
@ 2012-07-20 20:36       ` Dieter Kasper
  2012-07-20 21:28         ` Mark Nelson
  0 siblings, 1 reply; 31+ messages in thread
From: Dieter Kasper @ 2012-07-20 20:36 UTC
  To: Mark Nelson; +Cc: George Shuklin, ceph-devel, Dieter Kasper (KD)

[-- Attachment #1: Type: text/plain, Size: 5220 bytes --]

Hi Mark, George,

I can observe similarly poor performance on my system with fio on /dev/rbd1:

#--- seq. write RBD
RX37-0:~ # dd if=/dev/zero of=/dev/rbd1 bs=1024k count=10000
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 41.1819 s, 255 MB/s

#--- seq. read RBD
RX37-0:~ # dd of=/dev/zero if=/dev/rbd1 bs=1024k count=10000
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 40.9595 s, 256 MB/s

#--- seq. read /dev/ramX
RX37-0:~ # dd of=/dev/zero if=/dev/ram0 bs=1024k count=10000
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 4.68389 s, 2.2 GB/s

Does ceph-osd/filestore 'eat' 90% of my resources/bandwidth/latency ?
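
The "90%" estimate can be checked against the three dd runs above. The following is a rough sketch that only reuses the byte count and elapsed times printed by dd; the dictionary keys and the helper name are made up for illustration, and the ramdisk read is taken as the baseline because no ramdisk write was measured.

```python
BYTES_COPIED = 10_485_760_000  # 10000 blocks of 1024k, as reported by dd

# Elapsed seconds copied from the dd output above.
elapsed = {
    "rbd1 seq. write": 41.1819,
    "rbd1 seq. read": 40.9595,
    "ram0 seq. read": 4.68389,
}


def mb_per_s(seconds: float) -> float:
    """Decimal MB/s, the unit dd prints."""
    return BYTES_COPIED / seconds / 1e6


rates = {name: mb_per_s(t) for name, t in elapsed.items()}
baseline = rates["ram0 seq. read"]

for name, rate in rates.items():
    lost = 1 - rate / baseline
    print(f"{name:16s} {rate:8.1f} MB/s  ({lost:.1%} below the ramdisk read)")
```

Both RBD runs come out a little under 89% below the ramdisk read, so "about 90%" is a fair summary for large sequential I/O.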


RX37-0:~ # fio --filename=/dev/rbd1 --direct=1 --rw=randwrite --bs=4k --size=5G --numjobs=64 --runtime=30 --group_reporting --name=file1
(...)
  write: io=461592KB, bw=15371KB/s, iops=3842 , runt= 30030msec
  write: io=5120.0MB, bw=893927KB/s, iops=223481 , runt=  5865msec (on /dev/ram0)


RX37-0:~ # fio --filename=/dev/rbd1 --direct=1 --rw=randread --bs=4k --size=5G --numjobs=64 --runtime=30 --group_reporting --name=file1
(...)
  read : io=698356KB, bw=23240KB/s, iops=5809 , runt= 30050msec
  read : io=5120.0MB, bw=1631.1MB/s, iops=417559 , runt=  3139msec (on /dev/ram0)


RX37-0:~ # fio --filename=/dev/rbd1 --direct=1 --rw=randwrite --bs=1m --size=5G --numjobs=4 --runtime=10 --group_reporting --name=file1
(...)
  write: io=6377.0MB, bw=217125KB/s, iops=212 , runt= 30075msec
  write: io=5120.0MB, bw=2114.9MB/s, iops=2114 , runt=  2421msec (on /dev/ram0)
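
To put the three fio runs side by side, the summary lines can be parsed and compared. This is a small sketch, not a general fio parser: it assumes the old one-line summary format shown above, and the job labels ("randwrite 4k" and so on) as well as the function name are added here purely for illustration.

```python
import re

# Summary lines copied from the fio output above. In each pair the first line
# is /dev/rbd1 and the second is the same job run against /dev/ram0.
# The label lines are not fio output; they were added for this example.
FIO_OUTPUT = """\
randwrite 4k
  write: io=461592KB, bw=15371KB/s, iops=3842 , runt= 30030msec
  write: io=5120.0MB, bw=893927KB/s, iops=223481 , runt=  5865msec (on /dev/ram0)
randread 4k
  read : io=698356KB, bw=23240KB/s, iops=5809 , runt= 30050msec
  read : io=5120.0MB, bw=1631.1MB/s, iops=417559 , runt=  3139msec (on /dev/ram0)
randwrite 1m
  write: io=6377.0MB, bw=217125KB/s, iops=212 , runt= 30075msec
  write: io=5120.0MB, bw=2114.9MB/s, iops=2114 , runt=  2421msec (on /dev/ram0)
"""

IOPS_RE = re.compile(r"iops=\s*(\d+)")


def slowdowns(text: str) -> dict[str, float]:
    """Return ram0 IOPS divided by rbd1 IOPS for each labelled job."""
    result = {}
    label, rbd_iops = None, None
    for line in text.splitlines():
        match = IOPS_RE.search(line)
        if match is None:
            label, rbd_iops = line.strip(), None        # a job label line
        elif rbd_iops is None:
            rbd_iops = int(match.group(1))              # first result: /dev/rbd1
        else:
            result[label] = int(match.group(1)) / rbd_iops  # second: /dev/ram0
    return result


for job, factor in slowdowns(FIO_OUTPUT).items():
    print(f"{job:13s} ram0 is {factor:.0f}x faster than rbd1")
```

The gap is far larger for 4k requests than for 1m requests, which points at per-request overhead rather than raw bandwidth.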


Where is the bottleneck?
What is filestore doing?
How can I disable the journal and write only to the btrfs OSDs (as if they were SSDs)?
How can I get better performance?


Regards,
Dieter 

P.S. I will try to get the "test_filestore_workloadgen" 



[-- Attachment #2: ceph.conf --]
[-- Type: text/plain, Size: 864 bytes --]

[global]
        pid file = /var/run/ceph/$name.pid
        debug ms = 0
        auth supported = cephx
        keyring = /etc/ceph/keyring.client
[mon]
        mon data = /tmp/mon$id
[mon.a]
        host = localhost
        mon addr = 127.0.0.1:6789

[osd]
        journal dio = false
        osd data = /data/$name
        osd journal = /mnt/osd.journal/$name/journal
        osd journal size = 1000

        keyring = /etc/ceph/keyring.$name
        # debug osd = 20
        # debug ms = 1         ; message traffic
        # debug filestore = 20 ; local object storage
        # debug journal = 20   ; local journaling
        # debug monc = 5      ; monitor interaction, startup

[osd.0]
        host = localhost
        btrfs devs = /dev/ram0

[osd.1]
        host = localhost
        btrfs devs = /dev/ram1

[osd.2]
        host = localhost
        btrfs devs = /dev/ram2

[mds.a]
        host = localhost


* Re: Ceph write performance on RAM-DISK
  2012-07-20 20:36       ` Ceph write performance on RAM-DISK Dieter Kasper
@ 2012-07-20 21:28         ` Mark Nelson
  0 siblings, 0 replies; 31+ messages in thread
From: Mark Nelson @ 2012-07-20 21:28 UTC
  To: Dieter Kasper; +Cc: George Shuklin, ceph-devel

On 07/20/2012 03:36 PM, Dieter Kasper wrote:
> Hi Mark, George,
>
> I can observe a similar (poor) Performance on my system with fio on /dev/rbd1
>
> #--- seq. write RBD
> RX37-0:~ # dd if=/dev/zero of=/dev/rbd1 bs=1024k count=10000
> 10000+0 records in
> 10000+0 records out
> 10485760000 bytes (10 GB) copied, 41.1819 s, 255 MB/s
>
> #--- seq. read RBD
> RX37-0:~ # dd of=/dev/zero if=/dev/rbd1 bs=1024k count=10000
> 10000+0 records in
> 10000+0 records out
> 10485760000 bytes (10 GB) copied, 40.9595 s, 256 MB/s
>
> #--- seq. read /dev/ramX
> RX37-0:~ # dd of=/dev/zero if=/dev/ram0 bs=1024k count=10000
> 10000+0 records in
> 10000+0 records out
> 10485760000 bytes (10 GB) copied, 4.68389 s, 2.2 GB/s
>
> Does ceph-osd/filestore 'eat' 90% of my resources/bandwidth/latency ?
>

Well, there are multiple layers involved here, so it's possible that 
some of the code for RBD is playing a part in this too.  I have 
specifically seen slow performance with smaller requests with the 
filestore though, so that is where I'm focusing my energy right now.

>
> RX37-0:~ # fio --filename=/dev/rbd1 --direct=1 --rw=randwrite --bs=4k --size=5G --numjobs=64 --runtime=30 --group_reporting --name=file1
> (...)
>    write: io=461592KB, bw=15371KB/s, iops=3842 , runt= 30030msec
>    write: io=5120.0MB, bw=893927KB/s, iops=223481 , runt=  5865msec (on /dev/ram0)
>
>
> RX37-0:~ # fio --filename=/dev/rbd1 --direct=1 --rw=randread --bs=4k --size=5G --numjobs=64 --runtime=30 --group_reporting --name=file1
> (...)
>    read : io=698356KB, bw=23240KB/s, iops=5809 , runt= 30050msec
>    read : io=5120.0MB, bw=1631.1MB/s, iops=417559 , runt=  3139msec (on /dev/ram0)
>
>
> RX37-0:~ # fio --filename=/dev/rbd1 --direct=1 --rw=randwrite --bs=1m --size=5G --numjobs=4 --runtime=10 --group_reporting --name=file1
> (...)
>    write: io=6377.0MB, bw=217125KB/s, iops=212 , runt= 30075msec
>    write: io=5120.0MB, bw=2114.9MB/s, iops=2114 , runt=  2421msec (on /dev/ram0)
>
>
> Where is the bottleneck ?
> What is filestore doing ?
> How can I disable the journal and write only to the btrfs OSDs ? (like as they would be SSDs)
> How can I get better performance ?

Not yet sure where the bottleneck is, but we are actively looking into
it.  Sadly the process has been complicated by a potential bottleneck in
our test hardware that could be masking real issues in the code.


-- 
Mark Nelson
Performance Engineer
Inktank


* RBD performance - tuning hints
  2012-07-20 10:24 Ceph write performance George Shuklin
                   ` (2 preceding siblings ...)
  2012-07-20 16:37 ` Gregory Farnum
@ 2012-08-28 17:48 ` Dieter Kasper
  2012-08-28 18:53   ` Smart Weblications GmbH - Florian Wiessner
  2012-08-29  8:50   ` Alexandre DERUMIER
  3 siblings, 2 replies; 31+ messages in thread
From: Dieter Kasper @ 2012-08-28 17:48 UTC
  To: ceph-devel@vger.kernel.org; +Cc: Dieter Kasper (KD)

[-- Attachment #1: Type: text/plain, Size: 1527 bytes --]

Hi,

On my 4-node system (SSD + 10GbE, see bench-config.txt for details)
I observe pretty good rados bench performance
(see bench-rados.txt for details):

Bandwidth (MB/sec):     961.710 
Max bandwidth (MB/sec): 1040
Min bandwidth (MB/sec): 772


Also the bandwidth performance generated with
  fio --filename=/dev/rbd1 --direct=1 --rw=$io --bs=$bs --size=2G --iodepth=$threads --ioengine=libaio --runtime=60 --group_reporting --name=file1 --output=fio_${io}_${bs}_${threads}

.... is acceptable, e.g.
fio_write_4m_16		795 MB/s
fio_randwrite_8m_128	717 MB/s
fio_randwrite_8m_16	714 MB/s
fio_randwrite_2m_32	692 MB/s


But the write IOPS seem to be limited to around 19k (all values below are IOPS):
RBD                     4M      64k (= optimal_io_size)
fio_randread_512_128    53286   55925
fio_randread_4k_128     51110   44382
fio_randread_8k_128     30854   29938
fio_randwrite_512_128   18888    2386
fio_randwrite_512_64    18844    2582
fio_randwrite_8k_64     17350    2445
(...)
fio_read_4k_128         10073   53151
fio_read_4k_64           9500   39757
fio_read_4k_32           9220   23650
(...)
fio_read_4k_16           9122   14322
fio_write_4k_128         2190   14306
fio_read_8k_32            706   13894
fio_write_4k_64          2197   12297
fio_write_8k_64          3563   11705
fio_write_8k_128         3444   11219
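
To relate these IOPS figures to the bandwidth numbers further up, each row can be converted using the block size encoded in the test name (fio_${io}_${bs}_${threads}). The sketch below is only illustrative: the helper names are invented, a handful of rows from the 4M column are used, and fio's default 1024-based "k"/"m" suffixes are assumed.

```python
UNITS = {"k": 1024, "m": 1024 ** 2}  # assumption: fio's default 1024-based suffixes


def block_bytes(bs: str) -> int:
    """Turn a fio block-size token such as '512', '4k' or '8m' into bytes."""
    suffix = bs[-1].lower()
    if suffix in UNITS:
        return int(bs[:-1]) * UNITS[suffix]
    return int(bs)


def bandwidth_mb_s(test_name: str, iops: int) -> float:
    """Bandwidth in decimal MB/s implied by an IOPS figure for a named test."""
    _prefix, _io, bs, _depth = test_name.split("_")
    return iops * block_bytes(bs) / 1e6


# (test name, IOPS in the 4M column) copied from the table above.
rows = [
    ("fio_randwrite_512_128", 18888),
    ("fio_randwrite_8k_64", 17350),
    ("fio_write_4k_128", 2190),
    ("fio_randread_4k_128", 51110),
]
for name, iops in rows:
    print(f"{name:24s} {iops:6d} IOPS = {bandwidth_mb_s(name, iops):7.1f} MB/s")
```

Even the best small-block write row stays below 150 MB/s, far under the roughly 700 MB/s seen with 2m to 8m blocks, so the limit here is the number of operations per second rather than bandwidth.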


Any hints for tuning the IOPS (read and/or write) would be appreciated.

How can I configure when the journal data has to go to the OSD (after X seconds and/or when Y % full)?


Kind Regards,
-Dieter

[-- Attachment #2: bench-rados.txt --]
[-- Type: text/plain, Size: 1746 bytes --]

rados bench -p pbench 60 write
 Maintaining 16 concurrent writes of 4194304 bytes for at least 60 seconds.
   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
     0       0         0         0         0         0         -         0
     1      16       228       212   847.857       848  0.042984 0.0684383
     2      16       451       435    869.88       892  0.084162 0.0700566
     3      16       695       679   905.223       976  0.057677 0.0695337
     4      16       942       926   925.894       988  0.038117 0.0685357
     5      16      1162      1146     916.7       880  0.042098 0.0693864
     6      16      1400      1384   922.569       952  0.063983 0.0689167
     7      16      1644      1628   930.189       976  0.065745 0.0684646
     8      16      1895      1879   939.404      1004  0.051277 0.0677953
     9      16      2145      2129   946.127      1000  0.055165  0.067354
(...)
    57      16     13704     13688    960.47       996  0.082716 0.0665862
    58      16     13954     13938    961.15      1000  0.041879 0.0665307
    59      16     14194     14178   961.129       960  0.046657 0.0664642
2012-08-28 17:32:18.620060min lat: 0.030234 max lat: 3.17834 avg lat: 0.0664676
   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
    60      16     14446     14430   961.909      1008  0.051635 0.0664676
 Total time run:         60.084612
Total writes made:      14446
Write size:             4194304
Bandwidth (MB/sec):     961.710 

Stddev Bandwidth:       54.0809
Max bandwidth (MB/sec): 1040
Min bandwidth (MB/sec): 772
Average Latency:        0.0665337
Stddev Latency:         0.0800225
Max latency:            3.17834
Min latency:            0.030234
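
As a sanity check, the summary figures above are consistent with each other. The sketch below recomputes the bandwidth two ways: from the totals, and from the concurrency and average latency via Little's law. The variable names are illustrative; the numbers are copied from the output above, and the result matches the reported figure only if "MB" is read as 1024*1024 bytes.

```python
# Values copied from the rados bench summary above.
total_time_s = 60.084612
total_writes = 14446
write_size = 4194304          # bytes per write
concurrency = 16              # "Maintaining 16 concurrent writes"
avg_latency_s = 0.0665337

MIB = 1024 * 1024

# Bandwidth from the totals: bytes written divided by elapsed time.
bw_from_totals = total_writes * write_size / total_time_s / MIB

# Little's law: ops in flight = throughput * latency, so
# throughput = concurrency / average latency.
ops_per_s = concurrency / avg_latency_s
bw_from_latency = ops_per_s * write_size / MIB

print(f"from totals:  {bw_from_totals:.3f} MB/s")
print(f"from latency: {bw_from_latency:.3f} MB/s ({ops_per_s:.1f} writes/s)")
```

With 16 writes in flight and about 66.5 ms per write, roughly 240 writes of 4 MB complete per second, so at this queue depth the throughput is set by per-write latency.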

[-- Attachment #3: bench-config.txt --]
[-- Type: text/plain, Size: 26557 bytes --]

--- RX37-3c --------------------------------------------------------------------
ceph version 0.48.1argonaut (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
Linux RX37-3 3.0.41-5.1-default #1 SMP Wed Aug 22 00:54:03 UTC 2012 (9c63123) x86_64 x86_64 x86_64 GNU/Linux

model name	: Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz
Logial CPUs: 12
  current CPU frequency is 2.30 GHz (asserted by call to hardware).
MemTotal:       32856332 kB
Disk /dev/ram0: 2048 MB, 2048000000 bytes
Disk /dev/ram1: 2048 MB, 2048000000 bytes
Disk /dev/ram2: 2048 MB, 2048000000 bytes
Disk /dev/ram3: 2048 MB, 2048000000 bytes
Disk /dev/ram4: 2048 MB, 2048000000 bytes
Disk /dev/ram5: 2048 MB, 2048000000 bytes
Disk /dev/ram6: 2048 MB, 2048000000 bytes
Disk /dev/ram7: 2048 MB, 2048000000 bytes
[10:0:0:0]   disk    INTEL(R)  SSD 910 200GB   a411  /dev/sdm 
[10:0:1:0]   disk    INTEL(R)  SSD 910 200GB   a411  /dev/sdn 
[10:0:2:0]   disk    INTEL(R)  SSD 910 200GB   a411  /dev/sdo 
[10:0:3:0]   disk    INTEL(R)  SSD 910 200GB   a411  /dev/sdp 
[11:0:0:0]   disk    INTEL(R)  SSD 910 200GB   a411  /dev/sdq 
[11:0:1:0]   disk    INTEL(R)  SSD 910 200GB   a411  /dev/sdr 
[11:0:2:0]   disk    INTEL(R)  SSD 910 200GB   a411  /dev/sds 
[11:0:3:0]   disk    INTEL(R)  SSD 910 200GB   a411  /dev/sdt 
Device: INTEL(R)  SSD 910 200GB   Version: a411
Current Drive Temperature:     37 C
  Blocks sent to initiator = 198232151949312
Device: INTEL(R)  SSD 910 200GB   Version: a411
Current Drive Temperature:     39 C
  Blocks sent to initiator = 188127268306944
Device: INTEL(R)  SSD 910 200GB   Version: a411
Current Drive Temperature:     42 C
  Blocks sent to initiator = 241646771896320
Device: INTEL(R)  SSD 910 200GB   Version: a411
Current Drive Temperature:     33 C
  Blocks sent to initiator = 202151376715776
Device: INTEL(R)  SSD 910 200GB   Version: a411
Current Drive Temperature:     34 C
  Blocks sent to initiator = 186279543177216
Device: INTEL(R)  SSD 910 200GB   Version: a411
Current Drive Temperature:     36 C
  Blocks sent to initiator = 200414079221760
Device: INTEL(R)  SSD 910 200GB   Version: a411
Current Drive Temperature:     40 C
  Blocks sent to initiator = 301595287879680
Device: INTEL(R)  SSD 910 200GB   Version: a411
Current Drive Temperature:     30 C
  Blocks sent to initiator = 190686448058368
optimal_io_size: scheduler:       [noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
/dev/sdm on /data/osd.30 type btrfs (rw,noatime)
/dev/sdn on /data/osd.31 type btrfs (rw,noatime)
/dev/sdo on /data/osd.32 type btrfs (rw,noatime)
/dev/sdp on /data/osd.33 type btrfs (rw,noatime)
/dev/sdq on /data/osd.34 type btrfs (rw,noatime)
/dev/sdr on /data/osd.35 type btrfs (rw,noatime)
/dev/sds on /data/osd.36 type btrfs (rw,noatime)
/dev/sdt on /data/osd.37 type btrfs (rw,noatime)
--- RX37-4c --------------------------------------------------------------------
ceph version 0.48.1argonaut (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
Linux RX37-4 3.0.36-10-default #1 SMP Mon Jul 9 14:42:03 UTC 2012 (595894d) x86_64 x86_64 x86_64 GNU/Linux

model name	: Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz
Logial CPUs: 12
  current CPU frequency is 2.30 GHz (asserted by call to hardware).
MemTotal:       32856432 kB
Disk /dev/ram0: 2048 MB, 2048000000 bytes
Disk /dev/ram1: 2048 MB, 2048000000 bytes
Disk /dev/ram2: 2048 MB, 2048000000 bytes
Disk /dev/ram3: 2048 MB, 2048000000 bytes
Disk /dev/ram4: 2048 MB, 2048000000 bytes
Disk /dev/ram5: 2048 MB, 2048000000 bytes
Disk /dev/ram6: 2048 MB, 2048000000 bytes
Disk /dev/ram7: 2048 MB, 2048000000 bytes
[10:0:0:0]   disk    INTEL(R)  SSD 910 200GB   a411  /dev/sdd 
[10:0:1:0]   disk    INTEL(R)  SSD 910 200GB   a411  /dev/sde 
[10:0:2:0]   disk    INTEL(R)  SSD 910 200GB   a411  /dev/sdf 
[10:0:3:0]   disk    INTEL(R)  SSD 910 200GB   a411  /dev/sdg 
[11:0:0:0]   disk    INTEL(R)  SSD 910 200GB   a411  /dev/sdh 
[11:0:1:0]   disk    INTEL(R)  SSD 910 200GB   a411  /dev/sdi 
[11:0:2:0]   disk    INTEL(R)  SSD 910 200GB   a411  /dev/sdj 
[11:0:3:0]   disk    INTEL(R)  SSD 910 200GB   a411  /dev/sdk 
Device: INTEL(R)  SSD 910 200GB   Version: a411
Current Drive Temperature:     33 C
  Blocks sent to initiator = 326270260871168
Device: INTEL(R)  SSD 910 200GB   Version: a411
Current Drive Temperature:     29 C
  Blocks sent to initiator = 230247207272448
Device: INTEL(R)  SSD 910 200GB   Version: a411
Current Drive Temperature:     34 C
  Blocks sent to initiator = 168513041858560
Device: INTEL(R)  SSD 910 200GB   Version: a411
Current Drive Temperature:     37 C
  Blocks sent to initiator = 171904673513472
Device: INTEL(R)  SSD 910 200GB   Version: a411
Current Drive Temperature:     30 C
  Blocks sent to initiator = 175995797635072
Device: INTEL(R)  SSD 910 200GB   Version: a411
Current Drive Temperature:     36 C
  Blocks sent to initiator = 206814587125760
Device: INTEL(R)  SSD 910 200GB   Version: a411
Current Drive Temperature:     26 C
  Blocks sent to initiator = 239652363567104
Device: INTEL(R)  SSD 910 200GB   Version: a411
Current Drive Temperature:     32 C
  Blocks sent to initiator = 221954917269504
optimal_io_size: scheduler:       [noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
/dev/sdd on /data/osd.40 type btrfs (rw,noatime)
/dev/sde on /data/osd.41 type btrfs (rw,noatime)
/dev/sdf on /data/osd.42 type btrfs (rw,noatime)
/dev/sdg on /data/osd.43 type btrfs (rw,noatime)
/dev/sdh on /data/osd.44 type btrfs (rw,noatime)
/dev/sdi on /data/osd.45 type btrfs (rw,noatime)
/dev/sdj on /data/osd.46 type btrfs (rw,noatime)
/dev/sdk on /data/osd.47 type btrfs (rw,noatime)
--- RX37-5c --------------------------------------------------------------------
ceph version 0.48.1argonaut (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
Linux RX37-5 3.0.36-10-default #1 SMP Mon Jul 9 14:42:03 UTC 2012 (595894d) x86_64 x86_64 x86_64 GNU/Linux

model name	: Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz
Logial CPUs: 12
  current CPU frequency is 2.30 GHz (asserted by call to hardware).
MemTotal:       74226012 kB
Disk /dev/ram0: 2048 MB, 2048000000 bytes
Disk /dev/ram1: 2048 MB, 2048000000 bytes
Disk /dev/ram2: 2048 MB, 2048000000 bytes
Disk /dev/ram3: 2048 MB, 2048000000 bytes
Disk /dev/ram4: 2048 MB, 2048000000 bytes
Disk /dev/ram5: 2048 MB, 2048000000 bytes
Disk /dev/ram6: 2048 MB, 2048000000 bytes
Disk /dev/ram7: 2048 MB, 2048000000 bytes
[10:0:0:0]   disk    INTEL(R)  SSD 910 200GB   a411  /dev/sdo 
[10:0:1:0]   disk    INTEL(R)  SSD 910 200GB   a411  /dev/sdp 
[10:0:2:0]   disk    INTEL(R)  SSD 910 200GB   a411  /dev/sdq 
[10:0:3:0]   disk    INTEL(R)  SSD 910 200GB   a411  /dev/sdr 
[11:0:0:0]   disk    INTEL(R)  SSD 910 200GB   a411  /dev/sds 
[11:0:1:0]   disk    INTEL(R)  SSD 910 200GB   a411  /dev/sdt 
[11:0:2:0]   disk    INTEL(R)  SSD 910 200GB   a411  /dev/sdu 
[11:0:3:0]   disk    INTEL(R)  SSD 910 200GB   a411  /dev/sdv 
Device: INTEL(R)  SSD 910 200GB   Version: a411
Current Drive Temperature:     36 C
  Blocks sent to initiator = 195550280417280
Device: INTEL(R)  SSD 910 200GB   Version: a411
Current Drive Temperature:     37 C
  Blocks sent to initiator = 177656960122880
Device: INTEL(R)  SSD 910 200GB   Version: a411
Current Drive Temperature:     41 C
  Blocks sent to initiator = 238550402465792
Device: INTEL(R)  SSD 910 200GB   Version: a411
Current Drive Temperature:     31 C
  Blocks sent to initiator = 226579741409280
Device: INTEL(R)  SSD 910 200GB   Version: a411
Current Drive Temperature:     33 C
  Blocks sent to initiator = 186652383248384
Device: INTEL(R)  SSD 910 200GB   Version: a411
Current Drive Temperature:     34 C
  Blocks sent to initiator = 219684389519360
Device: INTEL(R)  SSD 910 200GB   Version: a411
Current Drive Temperature:     39 C
  Blocks sent to initiator = 223471107833856
Device: INTEL(R)  SSD 910 200GB   Version: a411
Current Drive Temperature:     29 C
  Blocks sent to initiator = 190300723085312
optimal_io_size: scheduler:       [noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
/dev/sdo on /data/osd.50 type btrfs (rw,noatime)
/dev/sdp on /data/osd.51 type btrfs (rw,noatime)
/dev/sdq on /data/osd.52 type btrfs (rw,noatime)
/dev/sdr on /data/osd.53 type btrfs (rw,noatime)
/dev/sds on /data/osd.54 type btrfs (rw,noatime)
/dev/sdt on /data/osd.55 type btrfs (rw,noatime)
/dev/sdu on /data/osd.56 type btrfs (rw,noatime)
/dev/sdv on /data/osd.57 type btrfs (rw,noatime)
--- RX37-6c --------------------------------------------------------------------
ceph version 0.48.1argonaut (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
Linux RX37-6 3.0.36-10-default #1 SMP Mon Jul 9 14:42:03 UTC 2012 (595894d) x86_64 x86_64 x86_64 GNU/Linux

model name	: Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz
Logial CPUs: 12
  current CPU frequency is 2.30 GHz (asserted by call to hardware).
MemTotal:       32856344 kB
Disk /dev/ram0: 2048 MB, 2048000000 bytes
Disk /dev/ram1: 2048 MB, 2048000000 bytes
Disk /dev/ram2: 2048 MB, 2048000000 bytes
Disk /dev/ram3: 2048 MB, 2048000000 bytes
Disk /dev/ram4: 2048 MB, 2048000000 bytes
Disk /dev/ram5: 2048 MB, 2048000000 bytes
Disk /dev/ram6: 2048 MB, 2048000000 bytes
Disk /dev/ram7: 2048 MB, 2048000000 bytes
[10:0:0:0]   disk    INTEL(R)  SSD 910 200GB   a411  /dev/sdn 
[10:0:1:0]   disk    INTEL(R)  SSD 910 200GB   a411  /dev/sdo 
[10:0:2:0]   disk    INTEL(R)  SSD 910 200GB   a411  /dev/sdp 
[10:0:3:0]   disk    INTEL(R)  SSD 910 200GB   a411  /dev/sdq 
[11:0:0:0]   disk    INTEL(R)  SSD 910 200GB   a411  /dev/sdr 
[11:0:1:0]   disk    INTEL(R)  SSD 910 200GB   a411  /dev/sds 
[11:0:2:0]   disk    INTEL(R)  SSD 910 200GB   a411  /dev/sdt 
[11:0:3:0]   disk    INTEL(R)  SSD 910 200GB   a411  /dev/sdu 
Device: INTEL(R)  SSD 910 200GB   Version: a411
Current Drive Temperature:     41 C
  Blocks sent to initiator = 195597608943616
Device: INTEL(R)  SSD 910 200GB   Version: a411
Current Drive Temperature:     36 C
  Blocks sent to initiator = 197325225984000
Device: INTEL(R)  SSD 910 200GB   Version: a411
Current Drive Temperature:     42 C
  Blocks sent to initiator = 182463498289152
Device: INTEL(R)  SSD 910 200GB   Version: a411
Current Drive Temperature:     45 C
  Blocks sent to initiator = 250870398713856
Device: INTEL(R)  SSD 910 200GB   Version: a411
Current Drive Temperature:     37 C
  Blocks sent to initiator = 209343584665600
Device: INTEL(R)  SSD 910 200GB   Version: a411
Current Drive Temperature:     33 C
  Blocks sent to initiator = 226728102330368
Device: INTEL(R)  SSD 910 200GB   Version: a411
Current Drive Temperature:     43 C
  Blocks sent to initiator = 213839006138368
Device: INTEL(R)  SSD 910 200GB   Version: a411
Current Drive Temperature:     38 C
  Blocks sent to initiator = 179503745728512
optimal_io_size: scheduler:       [noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
/dev/sdn on /data/osd.60 type btrfs (rw,noatime)
/dev/sdo on /data/osd.61 type btrfs (rw,noatime)
/dev/sdp on /data/osd.62 type btrfs (rw,noatime)
/dev/sdq on /data/osd.63 type btrfs (rw,noatime)
/dev/sdr on /data/osd.64 type btrfs (rw,noatime)
/dev/sds on /data/osd.65 type btrfs (rw,noatime)
/dev/sdt on /data/osd.66 type btrfs (rw,noatime)
/dev/sdu on /data/osd.67 type btrfs (rw,noatime)
--- RX37-7c --------------------------------------------------------------------
ceph version 0.48.1argonaut (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
Linux RX37-7 3.0.36-10-default #1 SMP Mon Jul 9 14:42:03 UTC 2012 (595894d) x86_64 x86_64 x86_64 GNU/Linux

model name	: Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz
Logial CPUs: 12
  current CPU frequency is 2.30 GHz (asserted by call to hardware).
MemTotal:       32856344 kB
optimal_io_size: 4194304
65536
scheduler:       [noop] deadline cfq 
noop deadline [cfq] 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
--- RX37-8c --------------------------------------------------------------------
ceph version 0.48.1argonaut (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
Linux RX37-8 3.0.36-16-default #1 SMP Wed Jul 18 00:18:54 UTC 2012 (544e41f) x86_64 x86_64 x86_64 GNU/Linux

model name	: Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz
Logial CPUs: 12
  current CPU frequency is 2.30 GHz (asserted by call to hardware).
MemTotal:       65952088 kB
optimal_io_size: scheduler:       [noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
--------------------------------------------------------------------------------

dumped osdmap epoch 15
epoch 15
fsid 7ab4662b-0575-4875-b59d-3bef85bb918d
created 2012-08-26 15:10:43.529294
modifed 2012-08-26 15:11:09.537529
flags 

pool 0 'data' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 4352 pgp_num 4352 last_change 1 owner 0 crash_replay_interval 45
pool 1 'metadata' rep size 2 crush_ruleset 1 object_hash rjenkins pg_num 4352 pgp_num 4352 last_change 1 owner 0
pool 2 'rbd' rep size 2 crush_ruleset 2 object_hash rjenkins pg_num 4352 pgp_num 4352 last_change 1 owner 0

max_osd 68
osd.30 up   in  weight 1 up_from 2 up_thru 11 down_at 0 last_clean_interval [0,0) 192.168.113.52:6800/7884 192.168.114.52:6800/7884 192.168.114.52:6801/7884 exists,up f1912b6b-2abf-4eef-83e0-8657d78e48f8
osd.31 up   in  weight 1 up_from 4 up_thru 11 down_at 0 last_clean_interval [0,0) 192.168.113.52:6801/8057 192.168.114.52:6802/8057 192.168.114.52:6803/8057 exists,up 2a254612-5242-4ae8-8ba7-3fe2eaa3eec5
osd.32 up   in  weight 1 up_from 3 up_thru 11 down_at 0 last_clean_interval [0,0) 192.168.113.52:6802/8225 192.168.114.52:6804/8225 192.168.114.52:6805/8225 exists,up d41508ee-131c-47b8-9218-8f81bc7f7716
osd.33 up   in  weight 1 up_from 3 up_thru 11 down_at 0 last_clean_interval [0,0) 192.168.113.52:6803/8415 192.168.114.52:6806/8415 192.168.114.52:6807/8415 exists,up 2e5a96be-ca3a-4c7d-8895-b61c07d858ac
osd.34 up   in  weight 1 up_from 5 up_thru 11 down_at 0 last_clean_interval [0,0) 192.168.113.52:6804/8588 192.168.114.52:6808/8588 192.168.114.52:6809/8588 exists,up 214d8253-ad9b-4268-ba67-365ae9bc612a
osd.35 up   in  weight 1 up_from 5 up_thru 11 down_at 0 last_clean_interval [0,0) 192.168.113.52:6805/8777 192.168.114.52:6810/8777 192.168.114.52:6811/8777 exists,up 9d328117-581a-4fdb-bee8-e373e74ee013
osd.36 up   in  weight 1 up_from 5 up_thru 11 down_at 0 last_clean_interval [0,0) 192.168.113.52:6806/8966 192.168.114.52:6812/8966 192.168.114.52:6813/8966 exists,up 0d046c45-ddd3-4c24-814c-36ace0632167
osd.37 up   in  weight 1 up_from 5 up_thru 11 down_at 0 last_clean_interval [0,0) 192.168.113.52:6807/9155 192.168.114.52:6814/9155 192.168.114.52:6815/9155 exists,up 2265a65a-624c-4729-bf64-47850270b4a9
osd.40 up   in  weight 1 up_from 5 up_thru 11 down_at 0 last_clean_interval [0,0) 192.168.113.53:6800/14455 192.168.114.53:6800/14455 192.168.114.53:6801/14455 exists,up e782364f-c5ee-4181-98ba-8e8009a789db
osd.41 up   in  weight 1 up_from 5 up_thru 11 down_at 0 last_clean_interval [0,0) 192.168.113.53:6801/14639 192.168.114.53:6802/14639 192.168.114.53:6803/14639 exists,up 3154b1e5-e49a-417a-9b80-d64995afb2c8
osd.42 up   in  weight 1 up_from 5 up_thru 11 down_at 0 last_clean_interval [0,0) 192.168.113.53:6802/14816 192.168.114.53:6804/14816 192.168.114.53:6805/14816 exists,up a7cab833-70b2-4067-83a3-a8a7b7ccb1c2
osd.43 up   in  weight 1 up_from 5 up_thru 11 down_at 0 last_clean_interval [0,0) 192.168.113.53:6803/15013 192.168.114.53:6806/15013 192.168.114.53:6807/15013 exists,up 5afeea03-5a5d-4643-bbde-aaadda1bde01
osd.44 up   in  weight 1 up_from 5 up_thru 11 down_at 0 last_clean_interval [0,0) 192.168.113.53:6804/15190 192.168.114.53:6808/15190 192.168.114.53:6809/15190 exists,up 5b1a90a2-596d-40d4-b33d-cf74142f7e96
osd.45 up   in  weight 1 up_from 5 up_thru 11 down_at 0 last_clean_interval [0,0) 192.168.113.53:6805/15420 192.168.114.53:6810/15420 192.168.114.53:6811/15420 exists,up e4d85019-c8d4-4dc8-bec3-ceaddab60b99
osd.46 up   in  weight 1 up_from 5 up_thru 11 down_at 0 last_clean_interval [0,0) 192.168.113.53:6806/15623 192.168.114.53:6812/15623 192.168.114.53:6813/15623 exists,up 0a1b6a02-1b70-457f-9602-8f02e00d7ae1
osd.47 up   in  weight 1 up_from 5 up_thru 11 down_at 0 last_clean_interval [0,0) 192.168.113.53:6807/15826 192.168.114.53:6814/15826 192.168.114.53:6815/15826 exists,up 7be9d381-8c38-440c-ae22-fc29a9349351
osd.50 up   in  weight 1 up_from 5 up_thru 12 down_at 0 last_clean_interval [0,0) 192.168.113.54:6800/1915 192.168.114.54:6800/1915 192.168.114.54:6801/1915 exists,up 7653343d-5602-4a6e-ac69-a278dab28c8c
osd.51 up   in  weight 1 up_from 5 up_thru 11 down_at 0 last_clean_interval [0,0) 192.168.113.54:6801/2155 192.168.114.54:6802/2155 192.168.114.54:6803/2155 exists,up a58bfbfb-8f21-4939-8ca1-b8209be68a30
osd.52 up   in  weight 1 up_from 5 up_thru 11 down_at 0 last_clean_interval [0,0) 192.168.113.54:6802/2322 192.168.114.54:6804/2322 192.168.114.54:6805/2322 exists,up 81daeb73-23f4-4f68-b56b-7d5a1b95e7e0
osd.53 up   in  weight 1 up_from 5 up_thru 11 down_at 0 last_clean_interval [0,0) 192.168.113.54:6803/2515 192.168.114.54:6806/2515 192.168.114.54:6807/2515 exists,up b3978c52-f689-45e8-9ee2-681e3bdeeeb2
osd.54 up   in  weight 1 up_from 5 up_thru 11 down_at 0 last_clean_interval [0,0) 192.168.113.54:6804/2702 192.168.114.54:6808/2702 192.168.114.54:6809/2702 exists,up 205b59d3-176a-4048-84c5-81dd181a8e71
osd.55 up   in  weight 1 up_from 5 up_thru 11 down_at 0 last_clean_interval [0,0) 192.168.113.54:6805/2889 192.168.114.54:6810/2889 192.168.114.54:6811/2889 exists,up cd4d82de-0da8-48b0-a54f-d1372b611958
osd.56 up   in  weight 1 up_from 6 up_thru 14 down_at 0 last_clean_interval [0,0) 192.168.113.54:6806/3082 192.168.114.54:6812/3082 192.168.114.54:6813/3082 exists,up b82b38a6-64ad-487a-899b-6c62ebe6bb13
osd.57 up   in  weight 1 up_from 6 up_thru 14 down_at 0 last_clean_interval [0,0) 192.168.113.54:6807/3269 192.168.114.54:6814/3269 192.168.114.54:6815/3269 exists,up c155cf46-d287-4439-a39e-ff80c22e0caa
osd.60 up   in  weight 1 up_from 7 up_thru 14 down_at 0 last_clean_interval [0,0) 192.168.113.55:6800/30607 192.168.114.55:6800/30607 192.168.114.55:6801/30607 exists,up ab8370bf-c722-4eab-9842-498b6dfef765
osd.61 up   in  weight 1 up_from 7 up_thru 14 down_at 0 last_clean_interval [0,0) 192.168.113.55:6801/30801 192.168.114.55:6802/30801 192.168.114.55:6803/30801 exists,up a189a254-efcd-4129-867e-384cd0765d19
osd.62 up   in  weight 1 up_from 8 up_thru 14 down_at 0 last_clean_interval [0,0) 192.168.113.55:6802/30946 192.168.114.55:6804/30946 192.168.114.55:6805/30946 exists,up 2ddc9000-a5be-4c7f-9362-2c525b93db7f
osd.63 up   in  weight 1 up_from 9 up_thru 14 down_at 0 last_clean_interval [0,0) 192.168.113.55:6803/31139 192.168.114.55:6806/31139 192.168.114.55:6807/31139 exists,up 5c4661fb-4c6c-411d-bf46-b4ead15a019a
osd.64 up   in  weight 1 up_from 9 up_thru 14 down_at 0 last_clean_interval [0,0) 192.168.113.55:6804/31332 192.168.114.55:6808/31332 192.168.114.55:6809/31332 exists,up b67f9e9b-d0f6-41b9-ac7f-0c355950316f
osd.65 up   in  weight 1 up_from 10 up_thru 14 down_at 0 last_clean_interval [0,0) 192.168.113.55:6805/31525 192.168.114.55:6810/31525 192.168.114.55:6811/31525 exists,up 9e179b5f-b0ca-4799-8b02-13fc3a78eda5
osd.66 up   in  weight 1 up_from 10 up_thru 14 down_at 0 last_clean_interval [0,0) 192.168.113.55:6806/31814 192.168.114.55:6812/31814 192.168.114.55:6813/31814 exists,up e300060b-ac96-4ed0-9670-ffe3d7547a18
osd.67 up   in  weight 1 up_from 11 up_thru 14 down_at 0 last_clean_interval [0,0) 192.168.113.55:6807/32063 192.168.114.55:6814/32063 192.168.114.55:6815/32063 exists,up f87f78b3-61ba-403a-b012-ddd055ced47f



ceph.conf
---content---
# global
[global]
	# authentication is disabled here; set to cephx to enable secure authentication
	auth supported = none

        # allow ourselves to open a lot of files
        #max open files = 1100000
        max open files = 131072

        # set log file
        log file = /ceph/log/$name.log
        # log_to_syslog = true        # uncomment this line to log to syslog

        # set up pid files
        pid file = /var/run/ceph/$name.pid

        # If you want to run an IPv6 cluster, set this to true. Dual-stack isn't possible
        #ms bind ipv6 = true
	public network = 192.168.113.0/24
	cluster network = 192.168.114.0/24

# monitors
#  You need at least one.  You need at least three if you want to
#  tolerate any node failures.  Always create an odd number.
[mon]
        mon data = /ceph/$name

        # If you are using, for example, the RADOS Gateway and want newly created
        # pools to have a higher replication level, you can set a default
        #osd pool default size = 3

        # You can also specify a CRUSH rule for new pools
        # Wiki: http://ceph.newdream.net/wiki/Custom_data_placement_with_CRUSH
        #osd pool default crush rule = 0

        # Timing is critical for monitors, but if you want to allow the clocks to drift a
        # bit more, you can specify the max drift.
        #mon clock drift allowed = 1

        # Tell the monitor to backoff from this warning for 30 seconds
        #mon clock drift warn backoff = 30

	# logging, for debugging monitor crashes, in order of
	# their likelihood of being helpful :)
	#debug ms = 1
	#debug mon = 20
	#debug paxos = 20
	#debug auth = 20
	debug optracker = 0

[mon.0]
	host = RX37-3c
	mon addr = 192.168.113.52:6789
[mon.1]
	host = RX37-7c
	mon addr = 192.168.113.56:6789
[mon.2]
	host = RX37-8c
	mon addr = 192.168.113.57:6789
	
# mds
#  You need at least one.  Define two to get a standby.
[mds]
#        mds data = /ceph/$name
	# where the mds keeps its secret encryption keys
	#keyring = /data/keyring.$name

	# mds logging to debug issues.
	#debug ms = 1
	#debug mds = 20
	debug optracker = 0

[mds.0]
        host = RX37-8c

# osd
#  You need at least one.  Two if you want data to be replicated.
#  Define as many as you like.
[osd]
	# This is where the btrfs volume will be mounted.
	osd data = /data/$name

#        journal dio = true
#        osd op threads = 24
#        osd disk threads = 24
#        filestore op threads = 6
#        filestore queue max ops = 24

	# Ideally, make this a separate disk or partition.  A few
 	# hundred MB should be enough; more if you have fast or many
 	# disks.  You can use a file under the osd data dir if need be
 	# (e.g. /data/$name/journal), but it will be slower than a
 	# separate disk or partition.

        # This is an example of a file-based journal.
	# osd journal = /ceph/$name/journal
	# osd journal size = 2048 
	# journal size, in megabytes

        # If you want to run the journal on a tmpfs, disable DirectIO
        #journal dio = false

        # You can change the number of recovery operations to speed up recovery
        # or slow it down if your machines can't handle it
        # osd recovery max active = 3

	# osd logging to debug osd issues, in order of likelihood of being
	# helpful
	#debug ms = 1
	#debug osd = 20
	#debug filestore = 20
	#debug journal = 20
	debug optracker = 0
	fstype = btrfs

[osd.30]
	host = RX37-3c
	devs = /dev/sdm
	osd journal = /dev/ram0
[osd.31]
	host = RX37-3c
	devs = /dev/sdn
	osd journal = /dev/ram1
[osd.32]
	host = RX37-3c
	devs = /dev/sdo
	osd journal = /dev/ram2
[osd.33]
	host = RX37-3c
	devs = /dev/sdp
	osd journal = /dev/ram3
[osd.34]
	host = RX37-3c
	devs = /dev/sdq
	osd journal = /dev/ram4
[osd.35]
	host = RX37-3c
	devs = /dev/sdr
	osd journal = /dev/ram5
[osd.36]
	host = RX37-3c
	devs = /dev/sds
	osd journal = /dev/ram6
[osd.37]
	host = RX37-3c
	devs = /dev/sdt
	osd journal = /dev/ram7
[osd.40]
	host = RX37-4c
	devs = /dev/sdd
	osd journal = /dev/ram0
[osd.41]
	host = RX37-4c
	devs = /dev/sde
	osd journal = /dev/ram1
[osd.42]
	host = RX37-4c
	devs = /dev/sdf
	osd journal = /dev/ram2
[osd.43]
	host = RX37-4c
	devs = /dev/sdg
	osd journal = /dev/ram3
[osd.44]
	host = RX37-4c
	devs = /dev/sdh
	osd journal = /dev/ram4
[osd.45]
	host = RX37-4c
	devs = /dev/sdi
	osd journal = /dev/ram5
[osd.46]
	host = RX37-4c
	devs = /dev/sdj
	osd journal = /dev/ram6
[osd.47]
	host = RX37-4c
	devs = /dev/sdk
	osd journal = /dev/ram7
[osd.50]
	host = RX37-5c
	devs = /dev/sdo
	osd journal = /dev/ram0
[osd.51]
	host = RX37-5c
	devs = /dev/sdp
	osd journal = /dev/ram1
[osd.52]
	host = RX37-5c
	devs = /dev/sdq
	osd journal = /dev/ram2
[osd.53]
	host = RX37-5c
	devs = /dev/sdr
	osd journal = /dev/ram3
[osd.54]
	host = RX37-5c
	devs = /dev/sds
	osd journal = /dev/ram4
[osd.55]
	host = RX37-5c
	devs = /dev/sdt
	osd journal = /dev/ram5
[osd.56]
	host = RX37-5c
	devs = /dev/sdu
	osd journal = /dev/ram6
[osd.57]
	host = RX37-5c
	devs = /dev/sdv
	osd journal = /dev/ram7
[osd.60]
	host = RX37-6c
	devs = /dev/sdn
	osd journal = /dev/ram0
[osd.61]
	host = RX37-6c
	devs = /dev/sdo
	osd journal = /dev/ram1
[osd.62]
	host = RX37-6c
	devs = /dev/sdp
	osd journal = /dev/ram2
[osd.63]
	host = RX37-6c
	devs = /dev/sdq
	osd journal = /dev/ram3
[osd.64]
	host = RX37-6c
	devs = /dev/sdr
	osd journal = /dev/ram4
[osd.65]
	host = RX37-6c
	devs = /dev/sds
	osd journal = /dev/ram5
[osd.66]
	host = RX37-6c
	devs = /dev/sdt
	osd journal = /dev/ram6
[osd.67]
	host = RX37-6c
	devs = /dev/sdu
	osd journal = /dev/ram7
	devs = /dev/sdc

[client.01]
	client hostname = RX37-7c



* Re: RBD performance - tuning hints
  2012-08-28 17:48 ` RBD performance - tuning hints Dieter Kasper
@ 2012-08-28 18:53   ` Smart Weblications GmbH - Florian Wiessner
  2012-08-28 19:04     ` Dieter Kasper
  2012-08-29  8:50   ` Alexandre DERUMIER
  1 sibling, 1 reply; 31+ messages in thread
From: Smart Weblications GmbH - Florian Wiessner @ 2012-08-28 18:53 UTC (permalink / raw)
  To: Dieter Kasper, ceph-devel

On 28.08.2012 19:48, Dieter Kasper wrote:
> Hi,
> 
> on my 4-node system (SSD + 10GbE, see bench-config.txt for details)
> I can observe a pretty nice rados bench performance 
> (see bench-rados.txt for details):

I'd like to know which 10GbE switch you used. Do you use 10GBASE-T?




-- 

Kind regards,

Florian Wiessner

Smart Weblications GmbH
Martinsberger Str. 1
D-95119 Naila

phone: +49 9282 9638 200
fax:   +49 9282 9638 205
24/7:  +49 900 144 000 00 - 0.99 EUR/min*
http://www.smart-weblications.de

--
Registered office: Naila
Managing director: Florian Wiessner
Commercial register: HRB 3840, Amtsgericht Hof
*from German landlines; prices from mobile networks may differ
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: RBD performance - tuning hints
  2012-08-28 18:53   ` Smart Weblications GmbH - Florian Wiessner
@ 2012-08-28 19:04     ` Dieter Kasper
  0 siblings, 0 replies; 31+ messages in thread
From: Dieter Kasper @ 2012-08-28 19:04 UTC (permalink / raw)
  To: Smart Weblications GmbH - Florian Wiessner; +Cc: ceph-devel@vger.kernel.org

On Tue, Aug 28, 2012 at 08:53:46PM +0200, Smart Weblications GmbH - Florian Wiessner wrote:
> On 28.08.2012 19:48, Dieter Kasper wrote:
> > Hi,
> > 
> > on my 4-node system (SSD + 10GbE, see bench-config.txt for details)
> > I can observe a pretty nice rados bench performance 
> > (see bench-rados.txt for details):
> 
> i'd like to know which 10GE Switch you have used? Do you use 10GE-Base-T?
http://www.brocade.com/products/all/switches/product-details/turboiron-24x-switch/index.page

Kind regards,
Dieter Kasper



* Re: RBD performance - tuning hints
  2012-08-28 17:48 ` RBD performance - tuning hints Dieter Kasper
  2012-08-28 18:53   ` Smart Weblications GmbH - Florian Wiessner
@ 2012-08-29  8:50   ` Alexandre DERUMIER
  2012-08-29 17:37     ` Josh Durgin
  2012-08-30 14:56     ` RBD performance - tuning hints Dieter Kasper
  1 sibling, 2 replies; 31+ messages in thread
From: Alexandre DERUMIER @ 2012-08-29  8:50 UTC (permalink / raw)
  To: Dieter Kasper; +Cc: ceph-devel

Nice results!
(Can you run the same benchmark from a qemu-kvm guest with the virtio driver?
I ran some benchmarks a few months ago with Stephan Priebe, and we were never able to get more than 20000 IOPS with a full-SSD 3-node cluster.)

>> How can I set the variables that control when the journal data has to go to the OSD? (after X seconds and/or when Y % full)
I think you can try tuning these values:

filestore max sync interval = 30
filestore min sync interval = 29
filestore flusher = false
filestore queue max ops = 10000
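
For reference, options like these would normally live in the [osd] section of ceph.conf (the config posted earlier in the thread keeps its commented-out filestore settings there). Below is a minimal Python sketch that renders such a fragment with the standard configparser module. The values are simply the ones suggested above, i.e. untested starting points, and the helper name render_osd_section is hypothetical.

```python
import configparser
import io

# Values suggested above; treat them as starting points to experiment with,
# not as verified optimal settings.
SUGGESTED = {
    "filestore max sync interval": "30",
    "filestore min sync interval": "29",
    "filestore flusher": "false",
    "filestore queue max ops": "10000",
}


def render_osd_section(options):
    """Render the options as an [osd] section in ceph.conf (INI) syntax."""
    parser = configparser.ConfigParser()
    parser["osd"] = options
    buf = io.StringIO()
    parser.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    # Prints the fragment; merging it into a real ceph.conf is left to the admin.
    print(render_osd_section(SUGGESTED), end="")
```

The printed fragment can be merged by hand into the existing [osd] block; the OSDs presumably need a restart to pick up the change.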



----- Original message -----

From: "Dieter Kasper" <d.kasper@kabelmail.de>
To: ceph-devel@vger.kernel.org
Cc: "Dieter Kasper (KD)" <d.kasper@kabelmail.de>
Sent: Tuesday, 28 August 2012 19:48:42
Subject: RBD performance - tuning hints

Hi, 

On my 4-node system (SSD + 10GbE, see bench-config.txt for details)
I observe quite good rados bench performance
(see bench-rados.txt for details):

Bandwidth (MB/sec): 961.710 
Max bandwidth (MB/sec): 1040 
Min bandwidth (MB/sec): 772 


The bandwidth measured with
fio --filename=/dev/rbd1 --direct=1 --rw=$io --bs=$bs --size=2G --iodepth=$threads --ioengine=libaio --runtime=60 --group_reporting --name=file1 --output=fio_${io}_${bs}_${threads}

... is also acceptable, e.g.
fio_write_4m_16 795 MB/s 
fio_randwrite_8m_128 717 MB/s 
fio_randwrite_8m_16 714 MB/s 
fio_randwrite_2m_32 692 MB/s 
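
The fio invocation above is parameterised by $io, $bs and $threads, which suggests it was run in a loop over I/O mode, block size and queue depth. The wrapper script itself is not shown, so the following is only a sketch of how such a sweep could be generated in Python. The three value lists are assumptions inferred from result names such as fio_randwrite_8m_128, and the function names are hypothetical.

```python
import itertools
import shlex

# Assumed sweep values, inferred from the result names in this mail;
# the original script and its exact lists are not shown in the thread.
IO_MODES = ["read", "write", "randread", "randwrite"]
BLOCK_SIZES = ["512", "4k", "8k", "2m", "4m", "8m"]
IO_DEPTHS = [16, 32, 64, 128]


def fio_command(io, bs, depth, device="/dev/rbd1"):
    """Build the argument list for one fio run, mirroring the command quoted above."""
    return [
        "fio",
        f"--filename={device}",
        "--direct=1",
        f"--rw={io}",
        f"--bs={bs}",
        "--size=2G",
        f"--iodepth={depth}",
        "--ioengine=libaio",
        "--runtime=60",
        "--group_reporting",
        "--name=file1",
        f"--output=fio_{io}_{bs}_{depth}",
    ]


def sweep():
    """Yield one fio command per (mode, block size, queue depth) combination."""
    for io, bs, depth in itertools.product(IO_MODES, BLOCK_SIZES, IO_DEPTHS):
        yield fio_command(io, bs, depth)


if __name__ == "__main__":
    # Print only; pass each list to subprocess.run() to actually execute it.
    for cmd in sweep():
        print(shlex.join(cmd))
```

The sketch only prints the command lines, so it is safe to run on a machine without fio or an RBD device. Note that writing to /dev/rbd1 with --direct=1 destroys whatever is on that device.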


But the write IOPS seem to be limited to around 19k ...
RBD                         4M      64k  (= optimal_io_size)
fio_randread_512_128     53286    55925
fio_randread_4k_128      51110    44382
fio_randread_8k_128      30854    29938
fio_randwrite_512_128    18888     2386
fio_randwrite_512_64     18844     2582
fio_randwrite_8k_64      17350     2445
(...)
fio_read_4k_128          10073    53151
fio_read_4k_64            9500    39757
fio_read_4k_32            9220    23650
(...)
fio_read_4k_16            9122    14322
fio_write_4k_128          2190    14306
fio_read_8k_32             706    13894
fio_write_4k_64           2197    12297
fio_write_8k_64           3563    11705
fio_write_8k_128          3444    11219
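
To put the small-block numbers next to the bandwidth figures earlier in this mail, IOPS multiplied by block size gives throughput. A short Python sketch of that arithmetic follows, using a few values from the first numeric column (the one headed 4M). It assumes 1 MB = 10^6 bytes and fio's convention of k = 1024 bytes; the helper names are hypothetical.

```python
# fio-style size suffixes as used in the result names (512, 4k, 8k, 2m, ...).
UNITS = {"k": 1024, "m": 1024 * 1024}


def block_size_bytes(bs):
    """Convert a fio-style block size such as '512', '4k' or '8m' to bytes."""
    suffix = bs[-1].lower()
    if suffix in UNITS:
        return int(bs[:-1]) * UNITS[suffix]
    return int(bs)


def throughput_mb_s(iops, bs):
    """Throughput in MB/s (1 MB = 10**6 bytes, an assumption) for an IOPS figure."""
    return iops * block_size_bytes(bs) / 1e6


if __name__ == "__main__":
    # (test name, block size, IOPS taken from the table above)
    samples = [
        ("fio_randwrite_512_128", "512", 18888),
        ("fio_randwrite_8k_64", "8k", 17350),
        ("fio_randread_4k_128", "4k", 51110),
    ]
    for name, bs, iops in samples:
        print(f"{name}: {throughput_mb_s(iops, bs):.1f} MB/s")
```

With these inputs the 512-byte random-write case comes out below 10 MB/s, far under the roughly 700 to 800 MB/s seen with large blocks, which would be consistent with a per-operation limit rather than a bandwidth limit.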


Any hints for tuning the IOPS (read and/or write) would be appreciated. 

How can I set the variables that control when the journal data has to go to the OSD? (after X seconds and/or when Y % full)


Kind Regards, 
-Dieter 



-- 
Alexandre Derumier
Systems and Network Engineer

Landline: 03 20 68 88 85
Fax: 03 20 68 90 88

45 Bvd du Général Leclerc 59100 Roubaix
12 rue Marivaux 75002 Paris


* Re: RBD performance - tuning hints
  2012-08-29  8:50   ` Alexandre DERUMIER
@ 2012-08-29 17:37     ` Josh Durgin
  2012-08-29 19:29       ` RBD performance - tuning hints / parameter doc Dieter Kasper
  2012-08-30 14:56     ` RBD performance - tuning hints Dieter Kasper
  1 sibling, 1 reply; 31+ messages in thread
From: Josh Durgin @ 2012-08-29 17:37 UTC (permalink / raw)
  To: Alexandre DERUMIER; +Cc: Dieter Kasper, ceph-devel

On 08/29/2012 01:50 AM, Alexandre DERUMIER wrote:
> Nice results !
> (can you make same benchmark from a qemu-kvm guest with virtio-driver ?
> I have made some bench some month ago with stephan priebe, and we never be able to have more than 20000iops, with a full ssd 3nodes cluster)
>
>>> How can I set the variables when the Journal data have go to the OSD ? (after X seconds and/or when Y %-full)
> I think you can try to tune these values
>
> filestore max sync interval = 30
> filestore min sync interval = 29
> filestore flusher = false
> filestore queue max ops = 10000

Increasing filestore_op_threads might help as well.

> ----- Original message -----
>
> From: "Dieter Kasper" <d.kasper@kabelmail.de>
> To: ceph-devel@vger.kernel.org
> Cc: "Dieter Kasper (KD)" <d.kasper@kabelmail.de>
> Sent: Tuesday, 28 August 2012 19:48:42
> Subject: RBD performance - tuning hints
>
> Hi,
>
> on my 4-node system (SSD + 10GbE, see bench-config.txt for details)
> I can observe a pretty nice rados bench performance
> (see bench-rados.txt for details):
>
> Bandwidth (MB/sec): 961.710
> Max bandwidth (MB/sec): 1040
> Min bandwidth (MB/sec): 772
>
>
> Also the bandwidth performance generated with
> fio --filename=/dev/rbd1 --direct=1 --rw=$io --bs=$bs --size=2G --iodepth=$threads --ioengine=libaio --runtime=60 --group_reporting --name=file1 --output=fio_${io}_${bs}_${threads}
>
> .... is acceptable, e.g.
> fio_write_4m_16 795 MB/s
> fio_randwrite_8m_128 717 MB/s
> fio_randwrite_8m_16 714 MB/s
> fio_randwrite_2m_32 692 MB/s
>
>
> But, the write IOPS seems to be limited around 19k ...
> RBD 4M 64k (= optimal_io_size)
> fio_randread_512_128 53286 55925
> fio_randread_4k_128 51110 44382
> fio_randread_8k_128 30854 29938
> fio_randwrite_512_128 18888 2386
> fio_randwrite_512_64 18844 2582
> fio_randwrite_8k_64 17350 2445
> (...)
> fio_read_4k_128 10073 53151
> fio_read_4k_64 9500 39757
> fio_read_4k_32 9220 23650
> (...)
> fio_read_4k_16 9122 14322
> fio_write_4k_128 2190 14306
> fio_read_8k_32 706 13894
> fio_write_4k_64 2197 12297
> fio_write_8k_64 3563 11705
> fio_write_8k_128 3444 11219
>
>
> Any hints for tuning the IOPS (read and/or write) would be appreciated.
>
> How can I set the variables when the Journal data have go to the OSD ? (after X seconds and/or when Y %-full)
>
>
> Kind Regards,
> -Dieter
>
>
>



* Re: RBD performance - tuning hints / parameter doc
  2012-08-29 17:37     ` Josh Durgin
@ 2012-08-29 19:29       ` Dieter Kasper
  2012-08-29 22:34         ` Samuel Just
  0 siblings, 1 reply; 31+ messages in thread
From: Dieter Kasper @ 2012-08-29 19:29 UTC (permalink / raw)
  To: Josh Durgin
  Cc: Alexandre DERUMIER, ceph-devel@vger.kernel.org,
	Dieter Kasper (KD)

Hi Josh,

Thanks for the hint.
Could you please say a few words about the meaning of these parameters?
- filestore min/max sync interval = 	int or float? seconds? of what?
- filestore flusher = false
- filestore queue max ops = 10000	
	what is 'one op'? queue in front of what?
- filestore op threads =	
	what are useful values here?

- journal dio = true/false
- osd op threads = 
- osd disk threads = 


Kind Regards,
-Dieter



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: RBD performance - tuning hints / parameter doc
  2012-08-29 19:29       ` RBD performance - tuning hints / parameter doc Dieter Kasper
@ 2012-08-29 22:34         ` Samuel Just
  2012-08-30 15:08           ` Dieter Kasper
  0 siblings, 1 reply; 31+ messages in thread
From: Samuel Just @ 2012-08-29 22:34 UTC (permalink / raw)
  To: Dieter Kasper; +Cc: Josh Durgin, Alexandre DERUMIER, ceph-devel@vger.kernel.org

filestore [min|max] sync interval:

Periodically, the filestore needs to quiesce writes and do a syncfs in
order to create a consistent commit point up to which it can free
journal entries.  Syncing more frequently tends to reduce the time
required to do the sync, and reduces the amount of data that needs to
remain in the journal.  Less frequent syncs would allow the backing
filesystem to better coalesce small writes and metadata updates,
hopefully resulting in more efficient syncs.  'filestore max sync
interval' defines the maximum time period between syncs, 'filestore min
sync interval' defines the minimum time period between syncs.

filestore flusher:

The filestore flusher forces data from large writes to be written out
using sync_file_range before the sync in order to (hopefully) reduce the
cost of the eventual sync.  In practice, disabling 'filestore flusher'
seems to improve performance in some cases.

filestore queue max ops:

'filestore queue max ops' defines the number of in-progress ops the
filestore will accept before blocking on queueing new ones.  This mostly
shouldn't have much of an effect on performance and should probably be
ignored.

filestore op threads:

'filestore op threads' defines the number of threads used to submit
filesystem operations in parallel.

journal dio:

'journal dio' enables using O_DIRECT for writing to the journal.  This
should usually be enabled.  If possible, 'journal aio' should also be
enabled to allow use of libaio to do asynchronous writes.

osd op threads:

'osd op threads' defines the size of the thread pool used to service OSD
operations such as client requests.  Increasing this may increase the
rate of request processing.

osd disk threads:

'osd disk threads' defines the number of threads used to perform
background disk-intensive OSD operations such as scrubbing and snap
trimming.
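
For reference, the options discussed in this thread could be collected
in a single [osd] section. The sketch below is only an illustration, not
a recommendation: the four filestore values are the ones Alexandre
suggested, the two journal settings follow the "should usually be
enabled" advice above, and the thread-count options are left out because
no values were given for them. Python's configparser is used purely as a
sanity check (it is not Ceph's own parser), and reading the intervals as
numbers of seconds is an assumption.

```python
import configparser

# Example fragment; values come from the thread, not from testing.
CEPH_CONF_FRAGMENT = """\
[osd]
filestore max sync interval = 30
filestore min sync interval = 29
filestore flusher = false
filestore queue max ops = 10000
journal dio = true
journal aio = true
"""


def load_osd_options(text):
    """Parse the fragment and return its [osd] section."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser["osd"]


def sync_intervals_consistent(osd):
    """True if the minimum sync interval does not exceed the maximum."""
    low = osd.getfloat("filestore min sync interval")
    high = osd.getfloat("filestore max sync interval")
    return low <= high


if __name__ == "__main__":
    osd = load_osd_options(CEPH_CONF_FRAGMENT)
    print("intervals consistent:", sync_intervals_consistent(osd))
    print("flusher enabled:", osd.getboolean("filestore flusher"))
```

The check only catches the obvious mistake of setting the minimum
interval above the maximum; it says nothing about which values perform
well on a given cluster.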


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: RBD performance - tuning hints
  2012-08-29  8:50   ` Alexandre DERUMIER
  2012-08-29 17:37     ` Josh Durgin
@ 2012-08-30 14:56     ` Dieter Kasper
  2012-08-30 15:28       ` Alexandre DERUMIER
  1 sibling, 1 reply; 31+ messages in thread
From: Dieter Kasper @ 2012-08-30 14:56 UTC (permalink / raw)
  To: Alexandre DERUMIER; +Cc: ceph-devel@vger.kernel.org

Hi Alexandre,

with the 4 filestore parameters below, some fio values could be increased:
  filestore max sync interval = 30
  filestore min sync interval = 29
  filestore flusher = false
  filestore queue max ops = 10000

###### IOPS 
fio_read_4k_64:              9373 
fio_read_4k_128:             9939 
fio_randwrite_8k_16:        12376 
fio_randwrite_4k_16:        13315 
fio_randwrite_512_32:       13660 
fio_randwrite_8k_32:        17318 
fio_randwrite_4k_32:        18057 
fio_randwrite_8k_64:        19693 
fio_randwrite_512_64:       20015 	<<<
fio_randwrite_4k_64:        20024 	<<<
fio_randwrite_8k_128:       20547 	<<<
fio_randwrite_4k_128:       20839 	<<<
fio_randwrite_512_128:      21417 	<<<
fio_randread_8k_128:        48872 
fio_randread_4k_128:        50002 
fio_randread_512_128:       51202 

###### MB/s 
fio_randread_2m_32:           628 
fio_read_4m_64:               630 
fio_randread_8m_32:           633 
fio_read_2m_32:               637 
fio_read_4m_16:               640 
fio_randread_4m_16:           652 
fio_write_2m_32:              660 
fio_randread_4m_32:           677 
fio_read_4m_32:               678 
(...)
fio_write_4m_64:              771 
fio_randwrite_2m_64:          789 
fio_write_8m_128:             796 
fio_write_4m_32:              802 
fio_randwrite_4m_128:         807 	<<<
fio_randwrite_2m_32:          811 	<<<
fio_write_2m_128:             833 	<<<
fio_write_8m_64:              901 	<<<
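
To put the IOPS figures above next to the first run posted earlier in
the thread, the relative change can be computed as sketched below. The
"before" numbers are taken from the RBD 4M column of the first result
table; pairing the two runs this way is an assumption, and single runs
like these carry no information about run-to-run noise.

```python
# (before, after) IOPS for the tests that appear in both result sets.
# "before" = RBD 4M column of the first run, "after" = the run with the
# four filestore settings listed above.
BEFORE_AFTER_IOPS = {
    "fio_randwrite_512_128": (18888, 21417),
    "fio_randwrite_512_64": (18844, 20015),
    "fio_randwrite_8k_64": (17350, 19693),
    "fio_randread_512_128": (53286, 51202),
    "fio_randread_4k_128": (51110, 50002),
    "fio_randread_8k_128": (30854, 48872),
    "fio_read_4k_128": (10073, 9939),
    "fio_read_4k_64": (9500, 9373),
}


def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


def summarize(results):
    """Map each test name to its percent change, rounded to one decimal."""
    return {
        name: round(percent_change(before, after), 1)
        for name, (before, after) in results.items()
    }


if __name__ == "__main__":
    for name, change in summarize(BEFORE_AFTER_IOPS).items():
        print(f"{name:24s} {change:+6.1f} %")
```

With these pairs, random-write IOPS comes out roughly 6 to 14 percent
higher, while most read results move by only a few percent.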

Best Regards,
-Dieter



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: RBD performance - tuning hints / parameter doc
  2012-08-29 22:34         ` Samuel Just
@ 2012-08-30 15:08           ` Dieter Kasper
  2012-08-30 20:39             ` Samuel Just
  0 siblings, 1 reply; 31+ messages in thread
From: Dieter Kasper @ 2012-08-30 15:08 UTC (permalink / raw)
  To: Samuel Just; +Cc: Josh Durgin, Alexandre DERUMIER, ceph-devel@vger.kernel.org

Samuel,

thank you very much for this explicit description!

As far as I understand, the journal acts as a ring buffer in front of the OSD.
Using time as the parameter to trigger a sync might not be the best choice for
a dynamic storage subsystem. Under a high workload, e.g. 10/20 for min/max
might be optimal for 4 nodes with 10 OSDs each,
but not after adding 4 additional nodes.

Are there parameters to trigger the syncs to the OSD
in relation to the fill level of the journal ?
e.g.
filestore [min|max] sync percent:

Do not sync before min-% full; sync after max-% full

What would happen if I set "filestore [min|max] sync interval" to 999999 ?
Will the journal sync start at 100% full or at X% ?
What is 'X' by default ?
How can I set 'X' ?
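
The scaling concern can be illustrated with a little arithmetic. The
sketch below estimates how much of one journal is written between two
syncs; the 40 and 80 OSD counts and the 20 s interval are the example
from this mail, while the client write rate, replica count and journal
size are hypothetical values chosen only for illustration. It also
assumes that writes spread evenly over all OSDs and that every replica
is journaled once.

```python
def journal_fill_fraction(client_mb_per_s, replicas, num_osds,
                          sync_interval_s, journal_size_mb):
    """Fraction of one OSD journal written between two syncs.

    Simplified model: load is spread evenly across OSDs and each
    replica write passes through exactly one journal.
    """
    per_osd_mb_per_s = client_mb_per_s * replicas / num_osds
    return per_osd_mb_per_s * sync_interval_s / journal_size_mb


if __name__ == "__main__":
    # Hypothetical inputs: 800 MB/s client writes, 2 replicas,
    # 2048 MB journals; 20 s is the max interval from the example.
    for num_osds in (40, 80):
        fill = journal_fill_fraction(800, 2, num_osds, 20, 2048)
        print(f"{num_osds} OSDs: {fill:.0%} of the journal per interval")
```

Under this model, doubling the OSD count at a constant client load
halves the fill level reached per interval, which is why a time-based
trigger tuned for one cluster size may not suit another.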

Best Regards,
-Dieter



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: RBD performance - tuning hints
  2012-08-30 14:56     ` RBD performance - tuning hints Dieter Kasper
@ 2012-08-30 15:28       ` Alexandre DERUMIER
  2012-08-30 15:33         ` Dieter Kasper
  0 siblings, 1 reply; 31+ messages in thread
From: Alexandre DERUMIER @ 2012-08-30 15:28 UTC (permalink / raw)
  To: Dieter Kasper; +Cc: ceph-devel

Thanks for the report !

Compared with your first benchmark, was this run with RBD 4M or 64K ?

(How many SSDs per node ?)







-- 
Alexandre Derumier
Systems and Network Engineer

Phone: 03 20 68 88 85
Fax: 03 20 68 90 88

45 Bvd du Général Leclerc 59100 Roubaix
12 rue Marivaux 75002 Paris


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: RBD performance - tuning hints
  2012-08-30 15:28       ` Alexandre DERUMIER
@ 2012-08-30 15:33         ` Dieter Kasper
  2012-08-30 15:46           ` Alexandre DERUMIER
  0 siblings, 1 reply; 31+ messages in thread
From: Dieter Kasper @ 2012-08-30 15:33 UTC (permalink / raw)
  To: Alexandre DERUMIER; +Cc: ceph-devel

[-- Attachment #1: Type: text/plain, Size: 5048 bytes --]

On Thu, Aug 30, 2012 at 05:28:02PM +0200, Alexandre DERUMIER wrote:
> Thanks for the report !
> 
> vs your first benchmark, it's with RBD 4M or 64K ?
with 4MB (see attached config info)

Cheers,
-Dieter

> 
> (how much ssd by node?)
8x SSD, 200GB each


[-- Attachment #2: hwconf.txt --]
[-- Type: text/plain, Size: 26784 bytes --]

--- RX37-3c --------------------------------------------------------------------
ceph version 0.48.1argonaut (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
Linux RX37-3 3.0.41-5.1-default #1 SMP Wed Aug 22 00:54:03 UTC 2012 (9c63123) x86_64 x86_64 x86_64 GNU/Linux

model name	: Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz
Logical CPUs: 12
  current CPU frequency is 2.30 GHz (asserted by call to hardware).
MemTotal:       32856332 kB
Disk /dev/ram0: 2048 MB, 2048000000 bytes
Disk /dev/ram1: 2048 MB, 2048000000 bytes
Disk /dev/ram2: 2048 MB, 2048000000 bytes
Disk /dev/ram3: 2048 MB, 2048000000 bytes
Disk /dev/ram4: 2048 MB, 2048000000 bytes
Disk /dev/ram5: 2048 MB, 2048000000 bytes
Disk /dev/ram6: 2048 MB, 2048000000 bytes
Disk /dev/ram7: 2048 MB, 2048000000 bytes
[10:0:0:0]   disk    INTEL(R)  SSD 910 200GB   a411  /dev/sdm 
[10:0:1:0]   disk    INTEL(R)  SSD 910 200GB   a411  /dev/sdn 
[10:0:2:0]   disk    INTEL(R)  SSD 910 200GB   a411  /dev/sdo 
[10:0:3:0]   disk    INTEL(R)  SSD 910 200GB   a411  /dev/sdp 
[11:0:0:0]   disk    INTEL(R)  SSD 910 200GB   a411  /dev/sdq 
[11:0:1:0]   disk    INTEL(R)  SSD 910 200GB   a411  /dev/sdr 
[11:0:2:0]   disk    INTEL(R)  SSD 910 200GB   a411  /dev/sds 
[11:0:3:0]   disk    INTEL(R)  SSD 910 200GB   a411  /dev/sdt 
Device: INTEL(R)  SSD 910 200GB   Version: a411
Current Drive Temperature:     38 C
  Blocks sent to initiator = 257379169992704
Device: INTEL(R)  SSD 910 200GB   Version: a411
Current Drive Temperature:     40 C
  Blocks sent to initiator = 238453816033280
Device: INTEL(R)  SSD 910 200GB   Version: a411
Current Drive Temperature:     43 C
  Blocks sent to initiator = 297650494636032
Device: INTEL(R)  SSD 910 200GB   Version: a411
Current Drive Temperature:     34 C
  Blocks sent to initiator = 254438979665920
Device: INTEL(R)  SSD 910 200GB   Version: a411
Current Drive Temperature:     35 C
  Blocks sent to initiator = 238876987752448
Device: INTEL(R)  SSD 910 200GB   Version: a411
Current Drive Temperature:     37 C
  Blocks sent to initiator = 259011676995584
Device: INTEL(R)  SSD 910 200GB   Version: a411
Current Drive Temperature:     41 C
  Blocks sent to initiator = 359638046343168
Device: INTEL(R)  SSD 910 200GB   Version: a411
Current Drive Temperature:     31 C
  Blocks sent to initiator = 247008082264064
optimal_io_size:
scheduler:       [noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
/dev/sdm on /data/osd.30 type xfs (rw,noatime)
/dev/sdn on /data/osd.31 type xfs (rw,noatime)
/dev/sdo on /data/osd.32 type xfs (rw,noatime)
/dev/sdp on /data/osd.33 type xfs (rw,noatime)
/dev/sdq on /data/osd.34 type xfs (rw,noatime)
/dev/sdr on /data/osd.35 type xfs (rw,noatime)
/dev/sds on /data/osd.36 type xfs (rw,noatime)
/dev/sdt on /data/osd.37 type xfs (rw,noatime)
--- RX37-4c --------------------------------------------------------------------
ceph version 0.48.1argonaut (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
Linux RX37-4 3.0.36-10-default #1 SMP Mon Jul 9 14:42:03 UTC 2012 (595894d) x86_64 x86_64 x86_64 GNU/Linux

model name	: Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz
Logical CPUs: 12
  current CPU frequency is 2.30 GHz (asserted by call to hardware).
MemTotal:       32856432 kB
Disk /dev/ram0: 2048 MB, 2048000000 bytes
Disk /dev/ram1: 2048 MB, 2048000000 bytes
Disk /dev/ram2: 2048 MB, 2048000000 bytes
Disk /dev/ram3: 2048 MB, 2048000000 bytes
Disk /dev/ram4: 2048 MB, 2048000000 bytes
Disk /dev/ram5: 2048 MB, 2048000000 bytes
Disk /dev/ram6: 2048 MB, 2048000000 bytes
Disk /dev/ram7: 2048 MB, 2048000000 bytes
[10:0:0:0]   disk    INTEL(R)  SSD 910 200GB   a411  /dev/sdd 
[10:0:1:0]   disk    INTEL(R)  SSD 910 200GB   a411  /dev/sde 
[10:0:2:0]   disk    INTEL(R)  SSD 910 200GB   a411  /dev/sdf 
[10:0:3:0]   disk    INTEL(R)  SSD 910 200GB   a411  /dev/sdg 
[11:0:0:0]   disk    INTEL(R)  SSD 910 200GB   a411  /dev/sdh 
[11:0:1:0]   disk    INTEL(R)  SSD 910 200GB   a411  /dev/sdi 
[11:0:2:0]   disk    INTEL(R)  SSD 910 200GB   a411  /dev/sdj 
[11:0:3:0]   disk    INTEL(R)  SSD 910 200GB   a411  /dev/sdk 
Device: INTEL(R)  SSD 910 200GB   Version: a411
Current Drive Temperature:     34 C
  Blocks sent to initiator = 389173798240256
Device: INTEL(R)  SSD 910 200GB   Version: a411
Current Drive Temperature:     30 C
  Blocks sent to initiator = 286249688498176
Device: INTEL(R)  SSD 910 200GB   Version: a411
Current Drive Temperature:     35 C
  Blocks sent to initiator = 220455000604672
Device: INTEL(R)  SSD 910 200GB   Version: a411
Current Drive Temperature:     38 C
  Blocks sent to initiator = 223169319272448
Device: INTEL(R)  SSD 910 200GB   Version: a411
Current Drive Temperature:     31 C
  Blocks sent to initiator = 232096593346560
Device: INTEL(R)  SSD 910 200GB   Version: a411
Current Drive Temperature:     36 C
  Blocks sent to initiator = 264802534424576
Device: INTEL(R)  SSD 910 200GB   Version: a411
Current Drive Temperature:     27 C
  Blocks sent to initiator = 288896512425984
Device: INTEL(R)  SSD 910 200GB   Version: a411
Current Drive Temperature:     32 C
  Blocks sent to initiator = 282331621359616
optimal_io_size:
scheduler:       [noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
/dev/sdd on /data/osd.40 type xfs (rw,noatime)
/dev/sde on /data/osd.41 type xfs (rw,noatime)
/dev/sdf on /data/osd.42 type xfs (rw,noatime)
/dev/sdg on /data/osd.43 type xfs (rw,noatime)
/dev/sdh on /data/osd.44 type xfs (rw,noatime)
/dev/sdi on /data/osd.45 type xfs (rw,noatime)
/dev/sdj on /data/osd.46 type xfs (rw,noatime)
/dev/sdk on /data/osd.47 type xfs (rw,noatime)
--- RX37-5c --------------------------------------------------------------------
ceph version 0.48.1argonaut (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
Linux RX37-5 3.0.36-10-default #1 SMP Mon Jul 9 14:42:03 UTC 2012 (595894d) x86_64 x86_64 x86_64 GNU/Linux

model name	: Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz
Logical CPUs: 12
  current CPU frequency is 2.30 GHz (asserted by call to hardware).
MemTotal:       74226012 kB
Disk /dev/ram0: 2048 MB, 2048000000 bytes
Disk /dev/ram1: 2048 MB, 2048000000 bytes
Disk /dev/ram2: 2048 MB, 2048000000 bytes
Disk /dev/ram3: 2048 MB, 2048000000 bytes
Disk /dev/ram4: 2048 MB, 2048000000 bytes
Disk /dev/ram5: 2048 MB, 2048000000 bytes
Disk /dev/ram6: 2048 MB, 2048000000 bytes
Disk /dev/ram7: 2048 MB, 2048000000 bytes
[10:0:0:0]   disk    INTEL(R)  SSD 910 200GB   a411  /dev/sdo 
[10:0:1:0]   disk    INTEL(R)  SSD 910 200GB   a411  /dev/sdp 
[10:0:2:0]   disk    INTEL(R)  SSD 910 200GB   a411  /dev/sdq 
[10:0:3:0]   disk    INTEL(R)  SSD 910 200GB   a411  /dev/sdr 
[11:0:0:0]   disk    INTEL(R)  SSD 910 200GB   a411  /dev/sds 
[11:0:1:0]   disk    INTEL(R)  SSD 910 200GB   a411  /dev/sdt 
[11:0:2:0]   disk    INTEL(R)  SSD 910 200GB   a411  /dev/sdu 
[11:0:3:0]   disk    INTEL(R)  SSD 910 200GB   a411  /dev/sdv 
Device: INTEL(R)  SSD 910 200GB   Version: a411
Current Drive Temperature:     36 C
  Blocks sent to initiator = 247461838848000
Device: INTEL(R)  SSD 910 200GB   Version: a411
Current Drive Temperature:     38 C
  Blocks sent to initiator = 231320898764800
Device: INTEL(R)  SSD 910 200GB   Version: a411
Current Drive Temperature:     41 C
  Blocks sent to initiator = 290086906232832
Device: INTEL(R)  SSD 910 200GB   Version: a411
Current Drive Temperature:     32 C
  Blocks sent to initiator = 287719053852672
Device: INTEL(R)  SSD 910 200GB   Version: a411
Current Drive Temperature:     33 C
  Blocks sent to initiator = 243922265702400
Device: INTEL(R)  SSD 910 200GB   Version: a411
Current Drive Temperature:     35 C
  Blocks sent to initiator = 272285122428928
Device: INTEL(R)  SSD 910 200GB   Version: a411
Current Drive Temperature:     40 C
  Blocks sent to initiator = 279561266790400
Device: INTEL(R)  SSD 910 200GB   Version: a411
Current Drive Temperature:     29 C
  Blocks sent to initiator = 247978778427392
optimal_io_size:
scheduler:       [noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
/dev/sdo on /data/osd.50 type xfs (rw,noatime)
/dev/sdp on /data/osd.51 type xfs (rw,noatime)
/dev/sdq on /data/osd.52 type xfs (rw,noatime)
/dev/sdr on /data/osd.53 type xfs (rw,noatime)
/dev/sds on /data/osd.54 type xfs (rw,noatime)
/dev/sdt on /data/osd.55 type xfs (rw,noatime)
/dev/sdu on /data/osd.56 type xfs (rw,noatime)
/dev/sdv on /data/osd.57 type xfs (rw,noatime)
--- RX37-6c --------------------------------------------------------------------
ceph version 0.48.1argonaut (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
Linux RX37-6 3.0.36-10-default #1 SMP Mon Jul 9 14:42:03 UTC 2012 (595894d) x86_64 x86_64 x86_64 GNU/Linux

model name	: Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz
Logical CPUs: 12
  current CPU frequency is 2.30 GHz (asserted by call to hardware).
MemTotal:       32856344 kB
Disk /dev/ram0: 2048 MB, 2048000000 bytes
Disk /dev/ram1: 2048 MB, 2048000000 bytes
Disk /dev/ram2: 2048 MB, 2048000000 bytes
Disk /dev/ram3: 2048 MB, 2048000000 bytes
Disk /dev/ram4: 2048 MB, 2048000000 bytes
Disk /dev/ram5: 2048 MB, 2048000000 bytes
Disk /dev/ram6: 2048 MB, 2048000000 bytes
Disk /dev/ram7: 2048 MB, 2048000000 bytes
[10:0:0:0]   disk    INTEL(R)  SSD 910 200GB   a411  /dev/sdn 
[10:0:1:0]   disk    INTEL(R)  SSD 910 200GB   a411  /dev/sdo 
[10:0:2:0]   disk    INTEL(R)  SSD 910 200GB   a411  /dev/sdp 
[10:0:3:0]   disk    INTEL(R)  SSD 910 200GB   a411  /dev/sdq 
[11:0:0:0]   disk    INTEL(R)  SSD 910 200GB   a411  /dev/sdr 
[11:0:1:0]   disk    INTEL(R)  SSD 910 200GB   a411  /dev/sds 
[11:0:2:0]   disk    INTEL(R)  SSD 910 200GB   a411  /dev/sdt 
[11:0:3:0]   disk    INTEL(R)  SSD 910 200GB   a411  /dev/sdu 
Device: INTEL(R)  SSD 910 200GB   Version: a411
Current Drive Temperature:     41 C
  Blocks sent to initiator = 259148495192064
Device: INTEL(R)  SSD 910 200GB   Version: a411
Current Drive Temperature:     36 C
  Blocks sent to initiator = 250183472381952
Device: INTEL(R)  SSD 910 200GB   Version: a411
Current Drive Temperature:     43 C
  Blocks sent to initiator = 232864704626688
Device: INTEL(R)  SSD 910 200GB   Version: a411
Current Drive Temperature:     46 C
  Blocks sent to initiator = 313614921629696
Device: INTEL(R)  SSD 910 200GB   Version: a411
Current Drive Temperature:     37 C
  Blocks sent to initiator = 269851218149376
Device: INTEL(R)  SSD 910 200GB   Version: a411
Current Drive Temperature:     34 C
  Blocks sent to initiator = 278551060283392
Device: INTEL(R)  SSD 910 200GB   Version: a411
Current Drive Temperature:     43 C
  Blocks sent to initiator = 267839076302848
Device: INTEL(R)  SSD 910 200GB   Version: a411
Current Drive Temperature:     39 C
  Blocks sent to initiator = 233988811653120
optimal_io_size:
scheduler:       [noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
/dev/sdn on /data/osd.60 type xfs (rw,noatime)
/dev/sdo on /data/osd.61 type xfs (rw,noatime)
/dev/sdp on /data/osd.62 type xfs (rw,noatime)
/dev/sdq on /data/osd.63 type xfs (rw,noatime)
/dev/sdr on /data/osd.64 type xfs (rw,noatime)
/dev/sds on /data/osd.65 type xfs (rw,noatime)
/dev/sdt on /data/osd.66 type xfs (rw,noatime)
/dev/sdu on /data/osd.67 type xfs (rw,noatime)
--- RX37-7c --------------------------------------------------------------------
ceph version 0.48.1argonaut (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
Linux RX37-7 3.0.36-10-default #1 SMP Mon Jul 9 14:42:03 UTC 2012 (595894d) x86_64 x86_64 x86_64 GNU/Linux

model name	: Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz
Logical CPUs: 12
  current CPU frequency is 1.20 GHz (asserted by call to hardware).
MemTotal:       32856344 kB
optimal_io_size: 4194304
4194304
4194304
scheduler:       [noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
noop deadline [cfq] 
noop deadline [cfq] 
noop deadline [cfq] 
noop deadline [cfq] 
noop deadline [cfq] 
noop deadline [cfq] 
noop deadline [cfq] 
noop deadline [cfq] 
noop deadline [cfq] 
noop deadline [cfq] 
noop deadline [cfq] 
noop deadline [cfq] 
noop deadline [cfq] 
--- RX37-8c --------------------------------------------------------------------
ceph version 0.48.1argonaut (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
Linux RX37-8 3.0.36-16-default #1 SMP Wed Jul 18 00:18:54 UTC 2012 (544e41f) x86_64 x86_64 x86_64 GNU/Linux

model name	: Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz
Logical CPUs: 12
  current CPU frequency is 2.30 GHz (asserted by call to hardware).
MemTotal:       65952088 kB
optimal_io_size:
scheduler:       [noop] deadline cfq 
[noop] deadline cfq 
[noop] deadline cfq 
--------------------------------------------------------------------------------

dumped osdmap epoch 19
epoch 19
fsid 31dc8e8c-45cb-4b94-b581-a9258964f1a6
created 2012-08-29 22:08:58.870313
modifed 2012-08-29 22:09:50.084564
flags 

pool 0 'data' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 4352 pgp_num 4352 last_change 1 owner 0 crash_replay_interval 45
pool 1 'metadata' rep size 2 crush_ruleset 1 object_hash rjenkins pg_num 4352 pgp_num 4352 last_change 1 owner 0
pool 2 'rbd' rep size 2 crush_ruleset 2 object_hash rjenkins pg_num 4352 pgp_num 4352 last_change 1 owner 0
pool 3 'pbench' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 768 pgp_num 768 last_change 18 owner 0

max_osd 68
osd.30 up   in  weight 1 up_from 3 up_thru 18 down_at 0 last_clean_interval [0,0) 192.168.113.52:6800/24876 192.168.114.52:6800/24876 192.168.114.52:6801/24876 exists,up 0a9a6db3-1c0d-4d66-ac99-bd900076c42c
osd.31 up   in  weight 1 up_from 3 up_thru 18 down_at 0 last_clean_interval [0,0) 192.168.113.52:6801/25090 192.168.114.52:6802/25090 192.168.114.52:6803/25090 exists,up 0adab61b-c1c3-479f-b58e-42bec92bd5b0
osd.32 up   in  weight 1 up_from 3 up_thru 18 down_at 0 last_clean_interval [0,0) 192.168.113.52:6802/25276 192.168.114.52:6804/25276 192.168.114.52:6805/25276 exists,up 331bf096-d785-4ae8-b790-d746a0abb694
osd.33 up   in  weight 1 up_from 4 up_thru 18 down_at 0 last_clean_interval [0,0) 192.168.113.52:6803/25464 192.168.114.52:6806/25464 192.168.114.52:6807/25464 exists,up a1f9ea5b-e0db-474c-b7bc-6cb3d3a213a4
osd.34 up   in  weight 1 up_from 4 up_thru 18 down_at 0 last_clean_interval [0,0) 192.168.113.52:6804/25650 192.168.114.52:6808/25650 192.168.114.52:6809/25650 exists,up dcbe68e7-fef3-430d-a857-560db28de27f
osd.35 up   in  weight 1 up_from 2 up_thru 18 down_at 0 last_clean_interval [0,0) 192.168.113.52:6805/25838 192.168.114.52:6810/25838 192.168.114.52:6811/25838 exists,up ab1589d0-e725-4484-8f5d-f65bc5c64643
osd.36 up   in  weight 1 up_from 3 up_thru 18 down_at 0 last_clean_interval [0,0) 192.168.113.52:6806/26026 192.168.114.52:6812/26026 192.168.114.52:6813/26026 exists,up 2eea079f-bcfe-48a4-abb5-a15c7daf80ba
osd.37 up   in  weight 1 up_from 4 up_thru 18 down_at 0 last_clean_interval [0,0) 192.168.113.52:6807/26218 192.168.114.52:6814/26218 192.168.114.52:6815/26218 exists,up 9822d872-79a6-4cd3-898f-2e905fbce44a
osd.40 up   in  weight 1 up_from 4 up_thru 18 down_at 0 last_clean_interval [0,0) 192.168.113.53:6800/18525 192.168.114.53:6800/18525 192.168.114.53:6801/18525 exists,up 0f0c61ea-4d78-429c-9928-b3422ad2dec7
osd.41 up   in  weight 1 up_from 5 up_thru 18 down_at 0 last_clean_interval [0,0) 192.168.113.53:6801/18750 192.168.114.53:6802/18750 192.168.114.53:6803/18750 exists,up 3935c6a7-61ff-4c97-88b9-472051ba8b6c
osd.42 up   in  weight 1 up_from 4 up_thru 18 down_at 0 last_clean_interval [0,0) 192.168.113.53:6802/18946 192.168.114.53:6804/18946 192.168.114.53:6805/18946 exists,up 3efc6383-5097-4e95-9af2-e0e7bc9ddc10
osd.43 up   in  weight 1 up_from 4 up_thru 18 down_at 0 last_clean_interval [0,0) 192.168.113.53:6803/19154 192.168.114.53:6806/19154 192.168.114.53:6807/19154 exists,up cdb8cf82-077b-40c2-adbc-fae29ba41645
osd.44 up   in  weight 1 up_from 4 up_thru 18 down_at 0 last_clean_interval [0,0) 192.168.113.53:6804/19350 192.168.114.53:6808/19350 192.168.114.53:6809/19350 exists,up 5ab69e45-a73a-4cd4-9837-2d54fb4ea4ec
osd.45 up   in  weight 1 up_from 4 up_thru 18 down_at 0 last_clean_interval [0,0) 192.168.113.53:6805/19546 192.168.114.53:6810/19546 192.168.114.53:6811/19546 exists,up ec3d2118-6f46-4ef8-a431-553710f33a18
osd.46 up   in  weight 1 up_from 5 up_thru 18 down_at 0 last_clean_interval [0,0) 192.168.113.53:6806/19766 192.168.114.53:6812/19766 192.168.114.53:6813/19766 exists,up dcd94df3-b679-46a6-b670-5269a29913c1
osd.47 up   in  weight 1 up_from 5 up_thru 18 down_at 0 last_clean_interval [0,0) 192.168.113.53:6807/19968 192.168.114.53:6814/19968 192.168.114.53:6815/19968 exists,up 41019d97-c4f3-4c8d-9189-bae642c31678
osd.50 up   in  weight 1 up_from 5 up_thru 18 down_at 0 last_clean_interval [0,0) 192.168.113.54:6800/3848 192.168.114.54:6800/3848 192.168.114.54:6801/3848 exists,up 0b9ebe8e-9cb8-440d-948e-d4c8aa16b407
osd.51 up   in  weight 1 up_from 5 up_thru 18 down_at 0 last_clean_interval [0,0) 192.168.113.54:6801/4061 192.168.114.54:6802/4061 192.168.114.54:6803/4061 exists,up 3c2e8031-d01d-4bf9-965e-1b77563d5f8f
osd.52 up   in  weight 1 up_from 5 up_thru 18 down_at 0 last_clean_interval [0,0) 192.168.113.54:6802/4248 192.168.114.54:6804/4248 192.168.114.54:6805/4248 exists,up 4d641c3c-0a7a-4b20-b047-9042b61685bb
osd.53 up   in  weight 1 up_from 5 up_thru 18 down_at 0 last_clean_interval [0,0) 192.168.113.54:6803/4446 192.168.114.54:6806/4446 192.168.114.54:6807/4446 exists,up e335a6e9-9c32-48c6-8f15-11aa84a6287d
osd.54 up   in  weight 1 up_from 5 up_thru 18 down_at 0 last_clean_interval [0,0) 192.168.113.54:6804/4632 192.168.114.54:6808/4632 192.168.114.54:6809/4632 exists,up 16f3955c-9eee-442b-86d8-cbbc5938efbf
osd.55 up   in  weight 1 up_from 6 up_thru 18 down_at 0 last_clean_interval [0,0) 192.168.113.54:6805/4836 192.168.114.54:6810/4836 192.168.114.54:6811/4836 exists,up 83e59145-9ff8-4c0b-b066-2b2e4e9c9953
osd.56 up   in  weight 1 up_from 6 up_thru 18 down_at 0 last_clean_interval [0,0) 192.168.113.54:6806/5029 192.168.114.54:6812/5029 192.168.114.54:6813/5029 exists,up dfdeb186-5c96-4466-b4d3-5f32fa712792
osd.57 up   in  weight 1 up_from 7 up_thru 18 down_at 0 last_clean_interval [0,0) 192.168.113.54:6807/5351 192.168.114.54:6814/5351 192.168.114.54:6815/5351 exists,up adf7a484-b0f1-4bf7-a8e7-2c1e64dfb77f
osd.60 up   in  weight 1 up_from 7 up_thru 18 down_at 0 last_clean_interval [0,0) 192.168.113.55:6800/31038 192.168.114.55:6800/31038 192.168.114.55:6801/31038 exists,up e9b949c8-1b47-4749-9408-1e9f7b89b0e6
osd.61 up   in  weight 1 up_from 8 up_thru 18 down_at 0 last_clean_interval [0,0) 192.168.113.55:6801/31257 192.168.114.55:6802/31257 192.168.114.55:6803/31257 exists,up 19fcad53-d951-4645-a6d5-7dad1deba6fb
osd.62 up   in  weight 1 up_from 8 up_thru 18 down_at 0 last_clean_interval [0,0) 192.168.113.55:6802/31449 192.168.114.55:6804/31449 192.168.114.55:6805/31449 exists,up 7e98db0e-2ae2-473d-9b03-798ec472b29b
osd.63 up   in  weight 1 up_from 9 up_thru 18 down_at 0 last_clean_interval [0,0) 192.168.113.55:6803/31641 192.168.114.55:6806/31641 192.168.114.55:6807/31641 exists,up 9abc714c-06e4-40ba-8afe-8465209e0272
osd.64 up   in  weight 1 up_from 9 up_thru 18 down_at 0 last_clean_interval [0,0) 192.168.113.55:6804/31937 192.168.114.55:6808/31937 192.168.114.55:6809/31937 exists,up 6a20e4b1-d1e9-4f69-b903-b403136ddb1d
osd.65 up   in  weight 1 up_from 10 up_thru 18 down_at 0 last_clean_interval [0,0) 192.168.113.55:6805/32175 192.168.114.55:6810/32175 192.168.114.55:6811/32175 exists,up e95ad5b2-6866-4161-8060-781a31d7ece2
osd.66 up   in  weight 1 up_from 10 up_thru 18 down_at 0 last_clean_interval [0,0) 192.168.113.55:6806/32487 192.168.114.55:6812/32487 192.168.114.55:6813/32487 exists,up f3126979-ecd6-45de-b0bf-54cb2b0af042
osd.67 up   in  weight 1 up_from 11 up_thru 18 down_at 0 last_clean_interval [0,0) 192.168.113.55:6807/32679 192.168.114.55:6814/32679 192.168.114.55:6815/32679 exists,up 37d3f121-b6f4-4c6f-ac9b-30533e8fa60a



ceph.conf
---content---
# global
[global]
	# authentication is disabled here ("none"); use "cephx" for secure authentication
	auth supported = none

        # allow ourselves to open a lot of files
        #max open files = 1100000
        max open files = 131072

        # set log file
        log file = /ceph/log/$name.log
        # log_to_syslog = true        # uncomment this line to log to syslog

        # set up pid files
        pid file = /var/run/ceph/$name.pid

        # If you want to run an IPv6 cluster, set this to true. Dual-stack isn't possible
        #ms bind ipv6 = true
	public network = 192.168.113.0/24
	cluster network = 192.168.114.0/24

# monitors
#  You need at least one.  You need at least three if you want to
#  tolerate any node failures.  Always create an odd number.
[mon]
        mon data = /ceph/$name

        # If you are using, for example, the RADOS Gateway and want newly created
        # pools to have a higher replication level, you can set a default
        #osd pool default size = 3

        # You can also specify a CRUSH rule for new pools
        # Wiki: http://ceph.newdream.net/wiki/Custom_data_placement_with_CRUSH
        #osd pool default crush rule = 0

        # Timing is critical for monitors, but if you want to allow the clocks to drift a
        # bit more, you can specify the max drift.
        #mon clock drift allowed = 1

        # Tell the monitor to backoff from this warning for 30 seconds
        #mon clock drift warn backoff = 30

	# logging, for debugging monitor crashes, in order of
	# their likelihood of being helpful :)
	#debug ms = 1
	#debug mon = 20
	#debug paxos = 20
	#debug auth = 20
	debug optracker = 0

[mon.0]
	host = RX37-3c
	mon addr = 192.168.113.52:6789
[mon.1]
	host = RX37-7c
	mon addr = 192.168.113.56:6789
[mon.2]
	host = RX37-8c
	mon addr = 192.168.113.57:6789
	
# mds
#  You need at least one.  Define two to get a standby.
[mds]
#        mds data = /ceph/$name
	# where the mds keeps its secret encryption keys
	#keyring = /data/keyring.$name

	# mds logging to debug issues.
	#debug ms = 1
	#debug mds = 20
	debug optracker = 0

[mds.0]
        host = RX37-8c

# osd
#  You need at least one.  Two if you want data to be replicated.
#  Define as many as you like.
[osd]
	# This is where the OSD data file system (XFS in this setup) will be mounted.
	osd data = /data/$name

#        journal dio = true
#        osd op threads = 24
#        osd disk threads = 24
#        filestore op threads = 6
#        filestore queue max ops = 24
	filestore max sync interval = 30
	filestore min sync interval = 29
	filestore flusher = false
	filestore queue max ops = 10000

	# Ideally, make this a separate disk or partition.  A few
 	# hundred MB should be enough; more if you have fast or many
 	# disks.  You can use a file under the osd data dir if need be
 	# (e.g. /data/$name/journal), but it will be slower than a
 	# separate disk or partition.

        # This is an example of a file-based journal.
	# osd journal = /ceph/$name/journal
	# osd journal size = 2048 
	# journal size, in megabytes

        # If you want to run the journal on a tmpfs, disable DirectIO
        #journal dio = false

        # You can change the number of recovery operations to speed up recovery
        # or slow it down if your machines can't handle it
        # osd recovery max active = 3

	# osd logging to debug osd issues, in order of likelihood of being
	# helpful
	#debug ms = 1
	#debug osd = 20
	#debug filestore = 20
	#debug journal = 20
	debug optracker = 0
	fstype = xfs

[osd.30]
	host = RX37-3c
	devs = /dev/sdm
	osd journal = /dev/ram0
[osd.31]
	host = RX37-3c
	devs = /dev/sdn
	osd journal = /dev/ram1
[osd.32]
	host = RX37-3c
	devs = /dev/sdo
	osd journal = /dev/ram2
[osd.33]
	host = RX37-3c
	devs = /dev/sdp
	osd journal = /dev/ram3
[osd.34]
	host = RX37-3c
	devs = /dev/sdq
	osd journal = /dev/ram4
[osd.35]
	host = RX37-3c
	devs = /dev/sdr
	osd journal = /dev/ram5
[osd.36]
	host = RX37-3c
	devs = /dev/sds
	osd journal = /dev/ram6
[osd.37]
	host = RX37-3c
	devs = /dev/sdt
	osd journal = /dev/ram7
[osd.40]
	host = RX37-4c
	devs = /dev/sdd
	osd journal = /dev/ram0
[osd.41]
	host = RX37-4c
	devs = /dev/sde
	osd journal = /dev/ram1
[osd.42]
	host = RX37-4c
	devs = /dev/sdf
	osd journal = /dev/ram2
[osd.43]
	host = RX37-4c
	devs = /dev/sdg
	osd journal = /dev/ram3
[osd.44]
	host = RX37-4c
	devs = /dev/sdh
	osd journal = /dev/ram4
[osd.45]
	host = RX37-4c
	devs = /dev/sdi
	osd journal = /dev/ram5
[osd.46]
	host = RX37-4c
	devs = /dev/sdj
	osd journal = /dev/ram6
[osd.47]
	host = RX37-4c
	devs = /dev/sdk
	osd journal = /dev/ram7
[osd.50]
	host = RX37-5c
	devs = /dev/sdo
	osd journal = /dev/ram0
[osd.51]
	host = RX37-5c
	devs = /dev/sdp
	osd journal = /dev/ram1
[osd.52]
	host = RX37-5c
	devs = /dev/sdq
	osd journal = /dev/ram2
[osd.53]
	host = RX37-5c
	devs = /dev/sdr
	osd journal = /dev/ram3
[osd.54]
	host = RX37-5c
	devs = /dev/sds
	osd journal = /dev/ram4
[osd.55]
	host = RX37-5c
	devs = /dev/sdt
	osd journal = /dev/ram5
[osd.56]
	host = RX37-5c
	devs = /dev/sdu
	osd journal = /dev/ram6
[osd.57]
	host = RX37-5c
	devs = /dev/sdv
	osd journal = /dev/ram7
[osd.60]
	host = RX37-6c
	devs = /dev/sdn
	osd journal = /dev/ram0
[osd.61]
	host = RX37-6c
	devs = /dev/sdo
	osd journal = /dev/ram1
[osd.62]
	host = RX37-6c
	devs = /dev/sdp
	osd journal = /dev/ram2
[osd.63]
	host = RX37-6c
	devs = /dev/sdq
	osd journal = /dev/ram3
[osd.64]
	host = RX37-6c
	devs = /dev/sdr
	osd journal = /dev/ram4
[osd.65]
	host = RX37-6c
	devs = /dev/sds
	osd journal = /dev/ram5
[osd.66]
	host = RX37-6c
	devs = /dev/sdt
	osd journal = /dev/ram6
[osd.67]
	host = RX37-6c
	devs = /dev/sdu
	osd journal = /dev/ram7
	devs = /dev/sdc

[client.01]
	client hostname = RX37-7c
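
A rough sizing check for the journal settings in this ceph.conf, as a sketch only: with "filestore max sync interval = 30", each 2048 MB ramdisk journal has to absorb whatever its OSD receives between two syncs. The function name below is hypothetical, and the calculation assumes (none of this is stated in the thread) that writes spread evenly over the 32 OSDs, that every client byte is journaled once per replica (rep size 2), that there is no journal overhead, and that fio's MB/s figures are decimal megabytes.

```python
def journal_fill_mb(client_mb_s: float, replicas: int, num_osds: int,
                    sync_interval_s: float) -> float:
    """MB written to one OSD journal between two filestore syncs.

    Assumes an even spread of writes across OSDs and no journal overhead.
    """
    per_osd_mb_s = client_mb_s * replicas / num_osds
    return per_osd_mb_s * sync_interval_s


# Values taken from this thread: fio_write_8m_64 = 901 MB/s, rep size 2,
# 32 OSDs (osd.30-37, 40-47, 50-57, 60-67), max sync interval 30 s,
# journal on /dev/ramN with 2048000000 bytes.
JOURNAL_MB = 2048000000 / 1e6

if __name__ == "__main__":
    fill = journal_fill_mb(901, replicas=2, num_osds=32, sync_interval_s=30)
    print(f"journal fill per sync interval: {fill:.0f} MB "
          f"of {JOURNAL_MB:.0f} MB ({fill / JOURNAL_MB:.0%})")
```

Under those assumptions the journal would be roughly four-fifths full by the time a 30 s sync is due, so a much longer interval would probably not fit into a 2 GB journal at this write rate.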


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: RBD performance - tuning hints
  2012-08-30 15:33         ` Dieter Kasper
@ 2012-08-30 15:46           ` Alexandre DERUMIER
  2012-08-30 16:02             ` Dieter Kasper
  0 siblings, 1 reply; 31+ messages in thread
From: Alexandre DERUMIER @ 2012-08-30 15:46 UTC (permalink / raw)
  To: Dieter Kasper; +Cc: ceph-devel

Thanks

>> 8x SSD, 200GB each 

20000 IOPS seems pretty low, no?


For Inktank:

Is there a bottleneck somewhere in Ceph?

I ask because I would like to know whether it scales when adding new nodes.

Has Inktank already done any random-IOPS benchmarks? (I only ever see sequential throughput benchmarks on the mailing list.)


----- Original Message -----

From: "Dieter Kasper" <d.kasper@kabelmail.de>
To: "Alexandre DERUMIER" <aderumier@odiso.com>
Cc: ceph-devel@vger.kernel.org
Sent: Thursday, 30 August 2012 17:33:42
Subject: Re: RBD performance - tuning hints

On Thu, Aug 30, 2012 at 05:28:02PM +0200, Alexandre DERUMIER wrote: 
> Thanks for the report ! 
> 
> Compared with your first benchmark, is this with RBD 4M or 64K?
with 4MB (see attached config info) 

Cheers, 
-Dieter 

> 
> (How many SSDs per node?)
8x SSD, 200GB each 

> 
> 
> 
> ----- Original Message -----
>
> From: "Dieter Kasper" <d.kasper@kabelmail.de>
> To: "Alexandre DERUMIER" <aderumier@odiso.com>
> Cc: ceph-devel@vger.kernel.org
> Sent: Thursday, 30 August 2012 16:56:34
> Subject: Re: RBD performance - tuning hints
> 
> Hi Alexandre, 
> 
> with the 4 filestore parameters below, some fio values could be increased:
> filestore max sync interval = 30 
> filestore min sync interval = 29 
> filestore flusher = false 
> filestore queue max ops = 10000 
> 
> ###### IOPS 
> fio_read_4k_64: 9373 
> fio_read_4k_128: 9939 
> fio_randwrite_8k_16: 12376 
> fio_randwrite_4k_16: 13315 
> fio_randwrite_512_32: 13660 
> fio_randwrite_8k_32: 17318 
> fio_randwrite_4k_32: 18057 
> fio_randwrite_8k_64: 19693 
> fio_randwrite_512_64: 20015 <<< 
> fio_randwrite_4k_64: 20024 <<< 
> fio_randwrite_8k_128: 20547 <<< 
> fio_randwrite_4k_128: 20839 <<< 
> fio_randwrite_512_128: 21417 <<< 
> fio_randread_8k_128: 48872 
> fio_randread_4k_128: 50002 
> fio_randread_512_128: 51202 
> 
> ###### MB/s 
> fio_randread_2m_32: 628 
> fio_read_4m_64: 630 
> fio_randread_8m_32: 633 
> fio_read_2m_32: 637 
> fio_read_4m_16: 640 
> fio_randread_4m_16: 652 
> fio_write_2m_32: 660 
> fio_randread_4m_32: 677 
> fio_read_4m_32: 678 
> (...) 
> fio_write_4m_64: 771 
> fio_randwrite_2m_64: 789 
> fio_write_8m_128: 796 
> fio_write_4m_32: 802 
> fio_randwrite_4m_128: 807 <<< 
> fio_randwrite_2m_32: 811 <<< 
> fio_write_2m_128: 833 <<< 
> fio_write_8m_64: 901 <<< 
> 
> Best Regards, 
> -Dieter 
> 
> 
> On Wed, Aug 29, 2012 at 10:50:12AM +0200, Alexandre DERUMIER wrote: 
> > Nice results ! 
> > (Can you run the same benchmark from a qemu-kvm guest with the virtio driver?
> > I ran some benchmarks a few months ago with Stephan Priebe, and we were never able to get more than 20000 IOPS with a full-SSD 3-node cluster.)
> > 
> > >>How can I set the variables that control when the journal data has to go to the OSD? (after X seconds and/or when Y % full)
> > I think you can try to tune these values 
> > 
> > filestore max sync interval = 30 
> > filestore min sync interval = 29 
> > filestore flusher = false 
> > filestore queue max ops = 10000 
> > 
> > 
> > 
> > ----- Original Message -----
> >
> > From: "Dieter Kasper" <d.kasper@kabelmail.de>
> > To: ceph-devel@vger.kernel.org
> > Cc: "Dieter Kasper (KD)" <d.kasper@kabelmail.de>
> > Sent: Tuesday, 28 August 2012 19:48:42
> > Subject: RBD performance - tuning hints
> > 
> > Hi, 
> > 
> > on my 4-node system (SSD + 10GbE, see bench-config.txt for details) 
> > I can observe a pretty nice rados bench performance 
> > (see bench-rados.txt for details): 
> > 
> > Bandwidth (MB/sec): 961.710 
> > Max bandwidth (MB/sec): 1040 
> > Min bandwidth (MB/sec): 772 
> > 
> > 
> > Also the bandwidth performance generated with 
> > fio --filename=/dev/rbd1 --direct=1 --rw=$io --bs=$bs --size=2G --iodepth=$threads --ioengine=libaio --runtime=60 --group_reporting --name=file1 --output=fio_${io}_${bs}_${threads} 
> > 
> > .... is acceptable, e.g. 
> > fio_write_4m_16 795 MB/s 
> > fio_randwrite_8m_128 717 MB/s 
> > fio_randwrite_8m_16 714 MB/s 
> > fio_randwrite_2m_32 692 MB/s 
> > 
> > 
> > But the write IOPS seem to be limited to around 19k ...
> > test                     RBD 4M   RBD 64k (= optimal_io_size)
> > fio_randread_512_128      53286     55925
> > fio_randread_4k_128       51110     44382
> > fio_randread_8k_128       30854     29938
> > fio_randwrite_512_128     18888      2386
> > fio_randwrite_512_64      18844      2582
> > fio_randwrite_8k_64       17350      2445
> > (...)
> > fio_read_4k_128           10073     53151
> > fio_read_4k_64             9500     39757
> > fio_read_4k_32             9220     23650
> > (...)
> > fio_read_4k_16             9122     14322
> > fio_write_4k_128           2190     14306
> > fio_read_8k_32              706     13894
> > fio_write_4k_64            2197     12297
> > fio_write_8k_64            3563     11705
> > fio_write_8k_128           3444     11219
> > 
> > 
> > Any hints for tuning the IOPS (read and/or write) would be appreciated. 
> > 
> > How can I set the variables that control when the journal data has to go to the OSD? (after X seconds and/or when Y % full)
> > 
> > 
> > Kind Regards, 
> > -Dieter 



-- 
Alexandre Derumier
Systems and Network Engineer

Phone: 03 20 68 88 85
Fax: 03 20 68 90 88

45 Bvd du Général Leclerc 59100 Roubaix
12 rue Marivaux 75002 Paris
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: RBD performance - tuning hints
  2012-08-30 15:46           ` Alexandre DERUMIER
@ 2012-08-30 16:02             ` Dieter Kasper
  2012-08-30 16:12               ` Alexandre DERUMIER
  0 siblings, 1 reply; 31+ messages in thread
From: Dieter Kasper @ 2012-08-30 16:02 UTC (permalink / raw)
  To: Alexandre DERUMIER; +Cc: ceph-devel@vger.kernel.org, Andreas Bluemle

On Thu, Aug 30, 2012 at 05:46:35PM +0200, Alexandre DERUMIER wrote:
> Thanks
> 
> >> 8x SSD, 200GB each 
> 
> 20000 iops seem pretty low,no ?
Well, you have to compare
- a pure SSD (via PCIe or SAS-6G)	vs.
- the Ceph journal, which goes 2x over 10GbE with IP:
  Client -> primary-copy -> 2nd-copy
  (= redundancy over Ethernet distance)
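
One rough way to read the IOPS numbers in this thread is Little's law: with a
fixed number of requests in flight, throughput is bounded by queue depth
divided by mean per-request latency. The sketch below is only an estimate. It
assumes the fio iodepth is the number of requests really in flight, and the
function names are made up for illustration.

```python
def mean_latency_ms(iodepth: int, iops: float) -> float:
    """Little's law: mean latency = requests in flight / throughput."""
    return iodepth / iops * 1000.0


def iops_ceiling(iodepth: int, latency_ms: float) -> float:
    """Upper bound on IOPS for a given queue depth and mean latency."""
    return iodepth / (latency_ms / 1000.0)


# fio_randwrite_4k_128 from this thread: 20839 IOPS at iodepth=128
print(round(mean_latency_ms(128, 20839), 2))  # roughly 6.1 ms per write

# fio_randwrite_4k_16 from this thread: 13315 IOPS at iodepth=16
print(round(mean_latency_ms(16, 13315), 2))   # roughly 1.2 ms per write
```

Under these assumptions the per-write latency grows with queue depth while
IOPS flattens out, which would point at queueing somewhere in the path rather
than at the SSDs themselves.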

I'm curious about the answer from Inktank,

-Dieter

> 
> 
> for @intank:
> 
> Is their a bottleneck somewhere in ceph ?
Maybe the thread "SimpleMessenger dispatching: cause of performance problems?"
from Thu, 16 Aug 2012 18:08:39 +0200
by <andreas.bluemle@itxperts.de>
is part of the answer,
especially if only a small number of OSDs is used.

> 
> I said that, because I would like to know if it's scale by adding new nodes.
> 
> Does Intank have already done some random iops benchmark ? (I always see sequential throughput bench in the mailing list)
> 
> 
> ----- Mail original ----- 
> 
> De: "Dieter Kasper" <d.kasper@kabelmail.de> 
> À: "Alexandre DERUMIER" <aderumier@odiso.com> 
> Cc: ceph-devel@vger.kernel.org 
> Envoyé: Jeudi 30 Août 2012 17:33:42 
> Objet: Re: RBD performance - tuning hints 
> 
> On Thu, Aug 30, 2012 at 05:28:02PM +0200, Alexandre DERUMIER wrote: 
> > Thanks for the report ! 
> > 
> > vs your first benchmark, it's with RBD 4M or 64K ? 
> with 4MB (see attached config info) 
> 
> Cheers, 
> -Dieter 
> 
> > 
> > (how much ssd by node?) 
> 8x SSD, 200GB each 
> 
> > 
> > 
> > 
> > ----- Mail original ----- 
> > 
> > De: "Dieter Kasper" <d.kasper@kabelmail.de> 
> > À: "Alexandre DERUMIER" <aderumier@odiso.com> 
> > Cc: ceph-devel@vger.kernel.org 
> > Envoyé: Jeudi 30 Août 2012 16:56:34 
> > Objet: Re: RBD performance - tuning hints 
> > 
> > Hi Alexandre, 
> > 
> > with the 4 filestore parameter below some fio values could be increased: 
> > filestore max sync interval = 30 
> > filestore min sync interval = 29 
> > filestore flusher = false 
> > filestore queue max ops = 10000 
> > 
> > ###### IOPS 
> > fio_read_4k_64: 9373 
> > fio_read_4k_128: 9939 
> > fio_randwrite_8k_16: 12376 
> > fio_randwrite_4k_16: 13315 
> > fio_randwrite_512_32: 13660 
> > fio_randwrite_8k_32: 17318 
> > fio_randwrite_4k_32: 18057 
> > fio_randwrite_8k_64: 19693 
> > fio_randwrite_512_64: 20015 <<< 
> > fio_randwrite_4k_64: 20024 <<< 
> > fio_randwrite_8k_128: 20547 <<< 
> > fio_randwrite_4k_128: 20839 <<< 
> > fio_randwrite_512_128: 21417 <<< 
> > fio_randread_8k_128: 48872 
> > fio_randread_4k_128: 50002 
> > fio_randread_512_128: 51202 
> > 
> > ###### MB/s 
> > fio_randread_2m_32: 628 
> > fio_read_4m_64: 630 
> > fio_randread_8m_32: 633 
> > fio_read_2m_32: 637 
> > fio_read_4m_16: 640 
> > fio_randread_4m_16: 652 
> > fio_write_2m_32: 660 
> > fio_randread_4m_32: 677 
> > fio_read_4m_32: 678 
> > (...) 
> > fio_write_4m_64: 771 
> > fio_randwrite_2m_64: 789 
> > fio_write_8m_128: 796 
> > fio_write_4m_32: 802 
> > fio_randwrite_4m_128: 807 <<< 
> > fio_randwrite_2m_32: 811 <<< 
> > fio_write_2m_128: 833 <<< 
> > fio_write_8m_64: 901 <<< 
> > 
> > Best Regards, 
> > -Dieter 
> > 
> > 
> > On Wed, Aug 29, 2012 at 10:50:12AM +0200, Alexandre DERUMIER wrote: 
> > > Nice results ! 
> > > (can you make same benchmark from a qemu-kvm guest with virtio-driver ? 
> > > I have made some bench some month ago with stephan priebe, and we never be able to have more than 20000iops, with a full ssd 3nodes cluster) 
> > > 
> > > >>How can I set the variables when the Journal data have go to the OSD ? (after X seconds and/or when Y %-full) 
> > > I think you can try to tune these values 
> > > 
> > > filestore max sync interval = 30 
> > > filestore min sync interval = 29 
> > > filestore flusher = false 
> > > filestore queue max ops = 10000 
> > > 
> > > 
> > > 
> > > ----- Mail original ----- 
> > > 
> > > De: "Dieter Kasper" <d.kasper@kabelmail.de> 
> > > À: ceph-devel@vger.kernel.org 
> > > Cc: "Dieter Kasper (KD)" <d.kasper@kabelmail.de> 
> > > Envoyé: Mardi 28 Août 2012 19:48:42 
> > > Objet: RBD performance - tuning hints 
> > > 
> > > Hi, 
> > > 
> > > on my 4-node system (SSD + 10GbE, see bench-config.txt for details) 
> > > I can observe a pretty nice rados bench performance 
> > > (see bench-rados.txt for details): 
> > > 
> > > Bandwidth (MB/sec): 961.710 
> > > Max bandwidth (MB/sec): 1040 
> > > Min bandwidth (MB/sec): 772 
> > > 
> > > 
> > > Also the bandwidth performance generated with 
> > > fio --filename=/dev/rbd1 --direct=1 --rw=$io --bs=$bs --size=2G --iodepth=$threads --ioengine=libaio --runtime=60 --group_reporting --name=file1 --output=fio_${io}_${bs}_${threads} 
> > > 
> > > .... is acceptable, e.g. 
> > > fio_write_4m_16 795 MB/s 
> > > fio_randwrite_8m_128 717 MB/s 
> > > fio_randwrite_8m_16 714 MB/s 
> > > fio_randwrite_2m_32 692 MB/s 
> > > 
> > > 
> > > But, the write IOPS seems to be limited around 19k ... 
> > > RBD 4M 64k (= optimal_io_size) 
> > > fio_randread_512_128 53286 55925 
> > > fio_randread_4k_128 51110 44382 
> > > fio_randread_8k_128 30854 29938 
> > > fio_randwrite_512_128 18888 2386 
> > > fio_randwrite_512_64 18844 2582 
> > > fio_randwrite_8k_64 17350 2445 
> > > (...) 
> > > fio_read_4k_128 10073 53151 
> > > fio_read_4k_64 9500 39757 
> > > fio_read_4k_32 9220 23650 
> > > (...) 
> > > fio_read_4k_16 9122 14322 
> > > fio_write_4k_128 2190 14306 
> > > fio_read_8k_32 706 13894 
> > > fio_write_4k_64 2197 12297 
> > > fio_write_8k_64 3563 11705 
> > > fio_write_8k_128 3444 11219 
> > > 
> > > 
> > > Any hints for tuning the IOPS (read and/or write) would be appreciated. 
> > > 
> > > How can I set the variables when the Journal data have go to the OSD ? (after X seconds and/or when Y %-full) 
> > > 
> > > 
> > > Kind Regards, 
> > > -Dieter 
> > > 
> > > 
> > > 


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: RBD performance - tuning hints
  2012-08-30 16:02             ` Dieter Kasper
@ 2012-08-30 16:12               ` Alexandre DERUMIER
  2012-08-30 16:16                 ` Josh Durgin
  2012-08-30 16:48                 ` Dieter Kasper
  0 siblings, 2 replies; 31+ messages in thread
From: Alexandre DERUMIER @ 2012-08-30 16:12 UTC (permalink / raw)
  To: Dieter Kasper; +Cc: ceph-devel, Andreas Bluemle

>>well, you have to compare
>>- pure a SSD (via PCIe or SAS-6G)        vs.
>>- Ceph-Journal, which goes 2x over 10GbE with IP
>>  Client -> primary-copy -> 2nd-copy
>>  (= redundancy over Ethernet distance)

Sure, but I thought the first OSD acks to the client before replicating to the other OSDs:

Client -> primary-copy -> 2nd-copy
       <-ack
         primary-copy -> 2nd-copy
                      -> 3rd-copy

Or am I wrong?


----- Original message ----- 

From: "Dieter Kasper" <d.kasper@kabelmail.de> 
To: "Alexandre DERUMIER" <aderumier@odiso.com> 
Cc: ceph-devel@vger.kernel.org, "Andreas Bluemle" <andreas.bluemle@itxperts.de> 
Sent: Thursday, 30 August 2012 18:02:05 
Subject: Re: RBD performance - tuning hints 

On Thu, Aug 30, 2012 at 05:46:35PM +0200, Alexandre DERUMIER wrote: 
> Thanks 
> 
> >> 8x SSD, 200GB each 
> 
> 20000 iops seem pretty low,no ? 
well, you have to compare 
- pure a SSD (via PCIe or SAS-6G) vs. 
- Ceph-Journal, which goes 2x over 10GbE with IP 
Client -> primary-copy -> 2nd-copy 
(= redundancy over Ethernet distance) 

I'm curious about the answer from Inktank, 

-Dieter 

> 
> 
> for @intank: 
> 
> Is their a bottleneck somewhere in ceph ? 
Maybe "SimpleMessenger dispatching: cause of performance problems?" 
from Thu, 16 Aug 2012 18:08:39 +0200 
by <andreas.bluemle@itxperts.de> 
can be an answer. 
Especially if a small number of OSDs is used. 

> 
> I said that, because I would like to know if it's scale by adding new nodes. 
> 
> Does Intank have already done some random iops benchmark ? (I always see sequential throughput bench in the mailing list) 
> 
> 
> ----- Mail original ----- 
> 
> De: "Dieter Kasper" <d.kasper@kabelmail.de> 
> À: "Alexandre DERUMIER" <aderumier@odiso.com> 
> Cc: ceph-devel@vger.kernel.org 
> Envoyé: Jeudi 30 Août 2012 17:33:42 
> Objet: Re: RBD performance - tuning hints 
> 
> On Thu, Aug 30, 2012 at 05:28:02PM +0200, Alexandre DERUMIER wrote: 
> > Thanks for the report ! 
> > 
> > vs your first benchmark, it's with RBD 4M or 64K ? 
> with 4MB (see attached config info) 
> 
> Cheers, 
> -Dieter 
> 
> > 
> > (how much ssd by node?) 
> 8x SSD, 200GB each 
> 
> > 
> > 
> > 
> > ----- Mail original ----- 
> > 
> > De: "Dieter Kasper" <d.kasper@kabelmail.de> 
> > À: "Alexandre DERUMIER" <aderumier@odiso.com> 
> > Cc: ceph-devel@vger.kernel.org 
> > Envoyé: Jeudi 30 Août 2012 16:56:34 
> > Objet: Re: RBD performance - tuning hints 
> > 
> > Hi Alexandre, 
> > 
> > with the 4 filestore parameter below some fio values could be increased: 
> > filestore max sync interval = 30 
> > filestore min sync interval = 29 
> > filestore flusher = false 
> > filestore queue max ops = 10000 
> > 
> > ###### IOPS 
> > fio_read_4k_64: 9373 
> > fio_read_4k_128: 9939 
> > fio_randwrite_8k_16: 12376 
> > fio_randwrite_4k_16: 13315 
> > fio_randwrite_512_32: 13660 
> > fio_randwrite_8k_32: 17318 
> > fio_randwrite_4k_32: 18057 
> > fio_randwrite_8k_64: 19693 
> > fio_randwrite_512_64: 20015 <<< 
> > fio_randwrite_4k_64: 20024 <<< 
> > fio_randwrite_8k_128: 20547 <<< 
> > fio_randwrite_4k_128: 20839 <<< 
> > fio_randwrite_512_128: 21417 <<< 
> > fio_randread_8k_128: 48872 
> > fio_randread_4k_128: 50002 
> > fio_randread_512_128: 51202 
> > 
> > ###### MB/s 
> > fio_randread_2m_32: 628 
> > fio_read_4m_64: 630 
> > fio_randread_8m_32: 633 
> > fio_read_2m_32: 637 
> > fio_read_4m_16: 640 
> > fio_randread_4m_16: 652 
> > fio_write_2m_32: 660 
> > fio_randread_4m_32: 677 
> > fio_read_4m_32: 678 
> > (...) 
> > fio_write_4m_64: 771 
> > fio_randwrite_2m_64: 789 
> > fio_write_8m_128: 796 
> > fio_write_4m_32: 802 
> > fio_randwrite_4m_128: 807 <<< 
> > fio_randwrite_2m_32: 811 <<< 
> > fio_write_2m_128: 833 <<< 
> > fio_write_8m_64: 901 <<< 
> > 
> > Best Regards, 
> > -Dieter 
> > 
> > 
> > On Wed, Aug 29, 2012 at 10:50:12AM +0200, Alexandre DERUMIER wrote: 
> > > Nice results ! 
> > > (can you make same benchmark from a qemu-kvm guest with virtio-driver ? 
> > > I have made some bench some month ago with stephan priebe, and we never be able to have more than 20000iops, with a full ssd 3nodes cluster) 
> > > 
> > > >>How can I set the variables when the Journal data have go to the OSD ? (after X seconds and/or when Y %-full) 
> > > I think you can try to tune these values 
> > > 
> > > filestore max sync interval = 30 
> > > filestore min sync interval = 29 
> > > filestore flusher = false 
> > > filestore queue max ops = 10000 
> > > 
> > > 
> > > 
> > > ----- Mail original ----- 
> > > 
> > > De: "Dieter Kasper" <d.kasper@kabelmail.de> 
> > > À: ceph-devel@vger.kernel.org 
> > > Cc: "Dieter Kasper (KD)" <d.kasper@kabelmail.de> 
> > > Envoyé: Mardi 28 Août 2012 19:48:42 
> > > Objet: RBD performance - tuning hints 
> > > 
> > > Hi, 
> > > 
> > > on my 4-node system (SSD + 10GbE, see bench-config.txt for details) 
> > > I can observe a pretty nice rados bench performance 
> > > (see bench-rados.txt for details): 
> > > 
> > > Bandwidth (MB/sec): 961.710 
> > > Max bandwidth (MB/sec): 1040 
> > > Min bandwidth (MB/sec): 772 
> > > 
> > > 
> > > Also the bandwidth performance generated with 
> > > fio --filename=/dev/rbd1 --direct=1 --rw=$io --bs=$bs --size=2G --iodepth=$threads --ioengine=libaio --runtime=60 --group_reporting --name=file1 --output=fio_${io}_${bs}_${threads} 
> > > 
> > > .... is acceptable, e.g. 
> > > fio_write_4m_16 795 MB/s 
> > > fio_randwrite_8m_128 717 MB/s 
> > > fio_randwrite_8m_16 714 MB/s 
> > > fio_randwrite_2m_32 692 MB/s 
> > > 
> > > 
> > > But, the write IOPS seems to be limited around 19k ... 
> > > RBD 4M 64k (= optimal_io_size) 
> > > fio_randread_512_128 53286 55925 
> > > fio_randread_4k_128 51110 44382 
> > > fio_randread_8k_128 30854 29938 
> > > fio_randwrite_512_128 18888 2386 
> > > fio_randwrite_512_64 18844 2582 
> > > fio_randwrite_8k_64 17350 2445 
> > > (...) 
> > > fio_read_4k_128 10073 53151 
> > > fio_read_4k_64 9500 39757 
> > > fio_read_4k_32 9220 23650 
> > > (...) 
> > > fio_read_4k_16 9122 14322 
> > > fio_write_4k_128 2190 14306 
> > > fio_read_8k_32 706 13894 
> > > fio_write_4k_64 2197 12297 
> > > fio_write_8k_64 3563 11705 
> > > fio_write_8k_128 3444 11219 
> > > 
> > > 
> > > Any hints for tuning the IOPS (read and/or write) would be appreciated. 
> > > 
> > > How can I set the variables when the Journal data have go to the OSD ? (after X seconds and/or when Y %-full) 
> > > 
> > > 
> > > Kind Regards, 
> > > -Dieter 
> > > 
> > > 
> > > 


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: RBD performance - tuning hints
  2012-08-30 16:12               ` Alexandre DERUMIER
@ 2012-08-30 16:16                 ` Josh Durgin
  2012-08-31  7:46                   ` Alexandre DERUMIER
  2012-08-30 16:48                 ` Dieter Kasper
  1 sibling, 1 reply; 31+ messages in thread
From: Josh Durgin @ 2012-08-30 16:16 UTC (permalink / raw)
  To: Alexandre DERUMIER; +Cc: Dieter Kasper, ceph-devel, Andreas Bluemle

On 08/30/2012 09:12 AM, Alexandre DERUMIER wrote:
>>> well, you have to compare
>>> - pure a SSD (via PCIe or SAS-6G)        vs.
>>> - Ceph-Journal, which goes 2x over 10GbE with IP
>>>   Client -> primary-copy -> 2nd-copy
>>>   (= redundancy over Ethernet distance)
>
> Sure but the first osd ack to the client,before replicating to the others osd.
>
> Client -> primary-copy -> 2nd-copy
>         <-ack
>           primary-copy -> 2nd-copy
>                        -> 3st-copy
>
> Or I'm wrong ?

RBD waits for the data to be on disk on all replicas. It's pretty easy
to relax this to in memory on all replicas, but there's no option for
that right now.
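
To make the difference concrete, here is a toy latency model of the two
acknowledgement policies discussed above. It is only a sketch under simple
assumptions (one fixed network hop time, replicas written in parallel by the
primary), not a description of the actual OSD code path. All names and
numbers are hypothetical.

```python
def latency_all_replicas_ms(hop_ms, primary_commit_ms, replica_commit_ms):
    """Client is answered only after the primary and every replica committed.

    hop_ms: one-way network time between any two hosts (assumed equal).
    replica_commit_ms: list of commit times, one per replica.
    """
    # Each replica costs a round trip from the primary plus its own commit.
    replica_paths = [2 * hop_ms + c for c in replica_commit_ms]
    # The primary's local commit overlaps with the replica round trips,
    # so the slowest of them decides when the client can be answered.
    return 2 * hop_ms + max([primary_commit_ms] + replica_paths)


def latency_primary_only_ms(hop_ms, primary_commit_ms):
    """Hypothetical relaxed policy: answer after the primary alone commits."""
    return 2 * hop_ms + primary_commit_ms


# Illustrative numbers only: 0.1 ms per hop, 0.5 ms per commit, one replica.
print(latency_all_replicas_ms(0.1, 0.5, [0.5]))
print(latency_primary_only_ms(0.1, 0.5))
```

In this model every replicated write pays at least one extra network round
trip compared with a local device, whatever the speed of the disks.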

Josh

>
> ----- Mail original -----
>
> De: "Dieter Kasper" <d.kasper@kabelmail.de>
> À: "Alexandre DERUMIER" <aderumier@odiso.com>
> Cc: ceph-devel@vger.kernel.org, "Andreas Bluemle" <andreas.bluemle@itxperts.de>
> Envoyé: Jeudi 30 Août 2012 18:02:05
> Objet: Re: RBD performance - tuning hints
>
> On Thu, Aug 30, 2012 at 05:46:35PM +0200, Alexandre DERUMIER wrote:
>> Thanks
>>
>>>> 8x SSD, 200GB each
>>
>> 20000 iops seem pretty low,no ?
> well, you have to compare
> - pure a SSD (via PCIe or SAS-6G) vs.
> - Ceph-Journal, which goes 2x over 10GbE with IP
> Client -> primary-copy -> 2nd-copy
> (= redundancy over Ethernet distance)
>
> I'm curious about the answer from Inktank,
>
> -Dieter
>
>>
>>
>> for @intank:
>>
>> Is their a bottleneck somewhere in ceph ?
> Maybe "SimpleMessenger dispatching: cause of performance problems?"
> from Thu, 16 Aug 2012 18:08:39 +0200
> by <andreas.bluemle@itxperts.de>
> can be an answer.
> Especially if a small number of OSDs is used.
>
>>
>> I said that, because I would like to know if it's scale by adding new nodes.
>>
>> Does Intank have already done some random iops benchmark ? (I always see sequential throughput bench in the mailing list)
>>
>>
>> ----- Mail original -----
>>
>> De: "Dieter Kasper" <d.kasper@kabelmail.de>
>> À: "Alexandre DERUMIER" <aderumier@odiso.com>
>> Cc: ceph-devel@vger.kernel.org
>> Envoyé: Jeudi 30 Août 2012 17:33:42
>> Objet: Re: RBD performance - tuning hints
>>
>> On Thu, Aug 30, 2012 at 05:28:02PM +0200, Alexandre DERUMIER wrote:
>>> Thanks for the report !
>>>
>>> vs your first benchmark, it's with RBD 4M or 64K ?
>> with 4MB (see attached config info)
>>
>> Cheers,
>> -Dieter
>>
>>>
>>> (how much ssd by node?)
>> 8x SSD, 200GB each
>>
>>>
>>>
>>>
>>> ----- Mail original -----
>>>
>>> De: "Dieter Kasper" <d.kasper@kabelmail.de>
>>> À: "Alexandre DERUMIER" <aderumier@odiso.com>
>>> Cc: ceph-devel@vger.kernel.org
>>> Envoyé: Jeudi 30 Août 2012 16:56:34
>>> Objet: Re: RBD performance - tuning hints
>>>
>>> Hi Alexandre,
>>>
>>> with the 4 filestore parameter below some fio values could be increased:
>>> filestore max sync interval = 30
>>> filestore min sync interval = 29
>>> filestore flusher = false
>>> filestore queue max ops = 10000
>>>
>>> ###### IOPS
>>> fio_read_4k_64: 9373
>>> fio_read_4k_128: 9939
>>> fio_randwrite_8k_16: 12376
>>> fio_randwrite_4k_16: 13315
>>> fio_randwrite_512_32: 13660
>>> fio_randwrite_8k_32: 17318
>>> fio_randwrite_4k_32: 18057
>>> fio_randwrite_8k_64: 19693
>>> fio_randwrite_512_64: 20015 <<<
>>> fio_randwrite_4k_64: 20024 <<<
>>> fio_randwrite_8k_128: 20547 <<<
>>> fio_randwrite_4k_128: 20839 <<<
>>> fio_randwrite_512_128: 21417 <<<
>>> fio_randread_8k_128: 48872
>>> fio_randread_4k_128: 50002
>>> fio_randread_512_128: 51202
>>>
>>> ###### MB/s
>>> fio_randread_2m_32: 628
>>> fio_read_4m_64: 630
>>> fio_randread_8m_32: 633
>>> fio_read_2m_32: 637
>>> fio_read_4m_16: 640
>>> fio_randread_4m_16: 652
>>> fio_write_2m_32: 660
>>> fio_randread_4m_32: 677
>>> fio_read_4m_32: 678
>>> (...)
>>> fio_write_4m_64: 771
>>> fio_randwrite_2m_64: 789
>>> fio_write_8m_128: 796
>>> fio_write_4m_32: 802
>>> fio_randwrite_4m_128: 807 <<<
>>> fio_randwrite_2m_32: 811 <<<
>>> fio_write_2m_128: 833 <<<
>>> fio_write_8m_64: 901 <<<
>>>
>>> Best Regards,
>>> -Dieter
>>>
>>>
>>> On Wed, Aug 29, 2012 at 10:50:12AM +0200, Alexandre DERUMIER wrote:
>>>> Nice results !
>>>> (can you make same benchmark from a qemu-kvm guest with virtio-driver ?
>>>> I have made some bench some month ago with stephan priebe, and we never be able to have more than 20000iops, with a full ssd 3nodes cluster)
>>>>
>>>>>> How can I set the variables when the Journal data have go to the OSD ? (after X seconds and/or when Y %-full)
>>>> I think you can try to tune these values
>>>>
>>>> filestore max sync interval = 30
>>>> filestore min sync interval = 29
>>>> filestore flusher = false
>>>> filestore queue max ops = 10000
>>>>
>>>>
>>>>
>>>> ----- Mail original -----
>>>>
>>>> De: "Dieter Kasper" <d.kasper@kabelmail.de>
>>>> À: ceph-devel@vger.kernel.org
>>>> Cc: "Dieter Kasper (KD)" <d.kasper@kabelmail.de>
>>>> Envoyé: Mardi 28 Août 2012 19:48:42
>>>> Objet: RBD performance - tuning hints
>>>>
>>>> Hi,
>>>>
>>>> on my 4-node system (SSD + 10GbE, see bench-config.txt for details)
>>>> I can observe a pretty nice rados bench performance
>>>> (see bench-rados.txt for details):
>>>>
>>>> Bandwidth (MB/sec): 961.710
>>>> Max bandwidth (MB/sec): 1040
>>>> Min bandwidth (MB/sec): 772
>>>>
>>>>
>>>> Also the bandwidth performance generated with
>>>> fio --filename=/dev/rbd1 --direct=1 --rw=$io --bs=$bs --size=2G --iodepth=$threads --ioengine=libaio --runtime=60 --group_reporting --name=file1 --output=fio_${io}_${bs}_${threads}
>>>>
>>>> .... is acceptable, e.g.
>>>> fio_write_4m_16 795 MB/s
>>>> fio_randwrite_8m_128 717 MB/s
>>>> fio_randwrite_8m_16 714 MB/s
>>>> fio_randwrite_2m_32 692 MB/s
>>>>
>>>>
>>>> But, the write IOPS seems to be limited around 19k ...
>>>> RBD 4M 64k (= optimal_io_size)
>>>> fio_randread_512_128 53286 55925
>>>> fio_randread_4k_128 51110 44382
>>>> fio_randread_8k_128 30854 29938
>>>> fio_randwrite_512_128 18888 2386
>>>> fio_randwrite_512_64 18844 2582
>>>> fio_randwrite_8k_64 17350 2445
>>>> (...)
>>>> fio_read_4k_128 10073 53151
>>>> fio_read_4k_64 9500 39757
>>>> fio_read_4k_32 9220 23650
>>>> (...)
>>>> fio_read_4k_16 9122 14322
>>>> fio_write_4k_128 2190 14306
>>>> fio_read_8k_32 706 13894
>>>> fio_write_4k_64 2197 12297
>>>> fio_write_8k_64 3563 11705
>>>> fio_write_8k_128 3444 11219
>>>>
>>>>
>>>> Any hints for tuning the IOPS (read and/or write) would be appreciated.
>>>>
>>>> How can I set the variables when the Journal data have go to the OSD ? (after X seconds and/or when Y %-full)
>>>>
>>>>
>>>> Kind Regards,
>>>> -Dieter
>>>>
>>>>
>>>>


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: RBD performance - tuning hints
  2012-08-30 16:12               ` Alexandre DERUMIER
  2012-08-30 16:16                 ` Josh Durgin
@ 2012-08-30 16:48                 ` Dieter Kasper
  2012-08-30 18:10                   ` Gregory Farnum
  1 sibling, 1 reply; 31+ messages in thread
From: Dieter Kasper @ 2012-08-30 16:48 UTC (permalink / raw)
  To: Alexandre DERUMIER; +Cc: ceph-devel@vger.kernel.org, Andreas Bluemle

[-- Attachment #1: Type: text/plain, Size: 10043 bytes --]

On Thu, Aug 30, 2012 at 06:12:11PM +0200, Alexandre DERUMIER wrote:
> >>well, you have to compare
> >>- pure a SSD (via PCIe or SAS-6G)        vs.
> >>- Ceph-Journal, which goes 2x over 10GbE with IP
> >>  Client -> primary-copy -> 2nd-copy
> >>  (= redundancy over Ethernet distance)
> 
> Sure but the first osd ack to the client,before replicating to the others osd.
no 

> 
> Client -> primary-copy -> 2nd-copy
>        <-ack
>          primary-copy -> 2nd-copy
>                       -> 3st-copy
> 
> Or I'm wrong ?
Yes.
Please have a look at the attached file: ceph-replication-acks.png
The client usually continues on 'ACK' and does not wait for the 'commit'.

BTW, all my journals are in RAM (/dev/ramX):
32x 2GB journals = 32GB of data with replica 2x

If "filestore min/max sync interval" is set to 99999999,
data should 'never' be written to the OSD data store
('never' at least during the tests, as long as the written data is < 32GB).

In such a configuration only the Ceph code and the interconnect (10GbE/IP) would be the limiting factors.
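
The 32GB figure can be checked with a little arithmetic. The sketch below
assumes that "32x 2GB" means 32 journals of 2GB each, that every client byte
lands in two journals at 2x replication, that writes spread evenly over all
journals, and that nothing is flushed during the run. The helper names are
hypothetical.

```python
def client_data_capacity_gb(num_journals, journal_gb, replicas):
    """Client data that fits in the journals before any flush is needed."""
    return num_journals * journal_gb / replicas


def seconds_until_full(capacity_gb, write_mb_per_s):
    """Time to fill that capacity at a steady client write rate (decimal units)."""
    return capacity_gb * 1000 / write_mb_per_s


capacity = client_data_capacity_gb(32, 2, 2)
print(capacity)                            # 32.0 GB of client data
print(seconds_until_full(capacity, 800))   # 40.0 s at about 800 MB/s
```

With the fio runs in this thread limited to --size=2G, a single run stays far
below that limit under these assumptions.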

Cheers,
-Dieter


> 
> 
> ----- Mail original ----- 
> 
> De: "Dieter Kasper" <d.kasper@kabelmail.de> 
> À: "Alexandre DERUMIER" <aderumier@odiso.com> 
> Cc: ceph-devel@vger.kernel.org, "Andreas Bluemle" <andreas.bluemle@itxperts.de> 
> Envoyé: Jeudi 30 Août 2012 18:02:05 
> Objet: Re: RBD performance - tuning hints 
> 
> On Thu, Aug 30, 2012 at 05:46:35PM +0200, Alexandre DERUMIER wrote: 
> > Thanks 
> > 
> > >> 8x SSD, 200GB each 
> > 
> > 20000 iops seem pretty low,no ? 
> well, you have to compare 
> - pure a SSD (via PCIe or SAS-6G) vs. 
> - Ceph-Journal, which goes 2x over 10GbE with IP 
> Client -> primary-copy -> 2nd-copy 
> (= redundancy over Ethernet distance) 
> 
> I'm curious about the answer from Inktank, 
> 
> -Dieter 
> 
> > 
> > 
> > for @intank: 
> > 
> > Is their a bottleneck somewhere in ceph ? 
> Maybe "SimpleMessenger dispatching: cause of performance problems?" 
> from Thu, 16 Aug 2012 18:08:39 +0200 
> by <andreas.bluemle@itxperts.de> 
> can be an answer. 
> Especially if a small number of OSDs is used. 
> 
> > 
> > I said that, because I would like to know if it's scale by adding new nodes. 
> > 
> > Does Intank have already done some random iops benchmark ? (I always see sequential throughput bench in the mailing list) 
> > 
> > 
> > ----- Mail original ----- 
> > 
> > De: "Dieter Kasper" <d.kasper@kabelmail.de> 
> > À: "Alexandre DERUMIER" <aderumier@odiso.com> 
> > Cc: ceph-devel@vger.kernel.org 
> > Envoyé: Jeudi 30 Août 2012 17:33:42 
> > Objet: Re: RBD performance - tuning hints 
> > 
> > On Thu, Aug 30, 2012 at 05:28:02PM +0200, Alexandre DERUMIER wrote: 
> > > Thanks for the report ! 
> > > 
> > > vs your first benchmark, it's with RBD 4M or 64K ? 
> > with 4MB (see attached config info) 
> > 
> > Cheers, 
> > -Dieter 
> > 
> > > 
> > > (how much ssd by node?) 
> > 8x SSD, 200GB each 
> > 
> > > 
> > > 
> > > 
> > > ----- Mail original ----- 
> > > 
> > > De: "Dieter Kasper" <d.kasper@kabelmail.de> 
> > > À: "Alexandre DERUMIER" <aderumier@odiso.com> 
> > > Cc: ceph-devel@vger.kernel.org 
> > > Envoyé: Jeudi 30 Août 2012 16:56:34 
> > > Objet: Re: RBD performance - tuning hints 
> > > 
> > > Hi Alexandre, 
> > > 
> > > with the 4 filestore parameter below some fio values could be increased: 
> > > filestore max sync interval = 30 
> > > filestore min sync interval = 29 
> > > filestore flusher = false 
> > > filestore queue max ops = 10000 
> > > 
> > > ###### IOPS 
> > > fio_read_4k_64: 9373 
> > > fio_read_4k_128: 9939 
> > > fio_randwrite_8k_16: 12376 
> > > fio_randwrite_4k_16: 13315 
> > > fio_randwrite_512_32: 13660 
> > > fio_randwrite_8k_32: 17318 
> > > fio_randwrite_4k_32: 18057 
> > > fio_randwrite_8k_64: 19693 
> > > fio_randwrite_512_64: 20015 <<< 
> > > fio_randwrite_4k_64: 20024 <<< 
> > > fio_randwrite_8k_128: 20547 <<< 
> > > fio_randwrite_4k_128: 20839 <<< 
> > > fio_randwrite_512_128: 21417 <<< 
> > > fio_randread_8k_128: 48872 
> > > fio_randread_4k_128: 50002 
> > > fio_randread_512_128: 51202 
> > > 
> > > ###### MB/s 
> > > fio_randread_2m_32: 628 
> > > fio_read_4m_64: 630 
> > > fio_randread_8m_32: 633 
> > > fio_read_2m_32: 637 
> > > fio_read_4m_16: 640 
> > > fio_randread_4m_16: 652 
> > > fio_write_2m_32: 660 
> > > fio_randread_4m_32: 677 
> > > fio_read_4m_32: 678 
> > > (...) 
> > > fio_write_4m_64: 771 
> > > fio_randwrite_2m_64: 789 
> > > fio_write_8m_128: 796 
> > > fio_write_4m_32: 802 
> > > fio_randwrite_4m_128: 807 <<< 
> > > fio_randwrite_2m_32: 811 <<< 
> > > fio_write_2m_128: 833 <<< 
> > > fio_write_8m_64: 901 <<< 
> > > 
> > > Best Regards, 
> > > -Dieter 
> > > 
> > > 
> > > On Wed, Aug 29, 2012 at 10:50:12AM +0200, Alexandre DERUMIER wrote: 
> > > > Nice results ! 
> > > > (can you make same benchmark from a qemu-kvm guest with virtio-driver ? 
> > > > I have made some bench some month ago with stephan priebe, and we never be able to have more than 20000iops, with a full ssd 3nodes cluster) 
> > > > 
> > > > >>How can I set the variables when the Journal data have go to the OSD ? (after X seconds and/or when Y %-full) 
> > > > I think you can try to tune these values 
> > > > 
> > > > filestore max sync interval = 30 
> > > > filestore min sync interval = 29 
> > > > filestore flusher = false 
> > > > filestore queue max ops = 10000 
> > > > 
> > > > 
> > > > 
> > > > ----- Mail original ----- 
> > > > 
> > > > De: "Dieter Kasper" <d.kasper@kabelmail.de> 
> > > > À: ceph-devel@vger.kernel.org 
> > > > Cc: "Dieter Kasper (KD)" <d.kasper@kabelmail.de> 
> > > > Envoyé: Mardi 28 Août 2012 19:48:42 
> > > > Objet: RBD performance - tuning hints 
> > > > 
> > > > Hi, 
> > > > 
> > > > on my 4-node system (SSD + 10GbE, see bench-config.txt for details) 
> > > > I can observe a pretty nice rados bench performance 
> > > > (see bench-rados.txt for details): 
> > > > 
> > > > Bandwidth (MB/sec): 961.710 
> > > > Max bandwidth (MB/sec): 1040 
> > > > Min bandwidth (MB/sec): 772 
> > > > 
> > > > 
> > > > Also the bandwidth performance generated with 
> > > > fio --filename=/dev/rbd1 --direct=1 --rw=$io --bs=$bs --size=2G --iodepth=$threads --ioengine=libaio --runtime=60 --group_reporting --name=file1 --output=fio_${io}_${bs}_${threads} 
> > > > 
> > > > .... is acceptable, e.g. 
> > > > fio_write_4m_16 795 MB/s 
> > > > fio_randwrite_8m_128 717 MB/s 
> > > > fio_randwrite_8m_16 714 MB/s 
> > > > fio_randwrite_2m_32 692 MB/s 
> > > > 
> > > > 
> > > > But, the write IOPS seems to be limited around 19k ... 
> > > > RBD 4M 64k (= optimal_io_size) 
> > > > fio_randread_512_128 53286 55925 
> > > > fio_randread_4k_128 51110 44382 
> > > > fio_randread_8k_128 30854 29938 
> > > > fio_randwrite_512_128 18888 2386 
> > > > fio_randwrite_512_64 18844 2582 
> > > > fio_randwrite_8k_64 17350 2445 
> > > > (...) 
> > > > fio_read_4k_128 10073 53151 
> > > > fio_read_4k_64 9500 39757 
> > > > fio_read_4k_32 9220 23650 
> > > > (...) 
> > > > fio_read_4k_16 9122 14322 
> > > > fio_write_4k_128 2190 14306 
> > > > fio_read_8k_32 706 13894 
> > > > fio_write_4k_64 2197 12297 
> > > > fio_write_8k_64 3563 11705 
> > > > fio_write_8k_128 3444 11219 
> > > > 
> > > > 
> > > > Any hints for tuning the IOPS (read and/or write) would be appreciated. 
> > > > 
> > > > How can I set the variables when the Journal data have go to the OSD ? (after X seconds and/or when Y %-full) 
> > > > 
> > > > 
> > > > Kind Regards, 
> > > > -Dieter 
> > > > 

-- 
Principal Consultant, Data Center Storage Architecture and Technology
FTS CTO
FUJITSU TECHNOLOGY SOLUTIONS GMBH
Mies-van-der-Rohe-Straße 8 / 4F
80807 München
Germany

Telephone:      +49 89 62060     1898
Telefax:	+49 89 62060 329 1898
Mobile: 	+49 170 8563173
Email:  	dieter.kasper@ts.fujitsu.com
Internet:       http://ts.fujitsu.com
Company Details: http://ts.fujitsu.com/imprint.html

[-- Attachment #2: ceph-replication-acks.png --]
[-- Type: image/png, Size: 18144 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: RBD performance - tuning hints
  2012-08-30 16:48                 ` Dieter Kasper
@ 2012-08-30 18:10                   ` Gregory Farnum
  0 siblings, 0 replies; 31+ messages in thread
From: Gregory Farnum @ 2012-08-30 18:10 UTC (permalink / raw)
  To: Dieter Kasper
  Cc: Alexandre DERUMIER, ceph-devel@vger.kernel.org, Andreas Bluemle,
	Samuel Just

On Thu, Aug 30, 2012 at 9:48 AM, Dieter Kasper <d.kasper@kabelmail.de> wrote:
> On Thu, Aug 30, 2012 at 06:12:11PM +0200, Alexandre DERUMIER wrote:
>> >>well, you have to compare
>> >>- pure a SSD (via PCIe or SAS-6G)        vs.
>> >>- Ceph-Journal, which goes 2x over 10GbE with IP
>> >>  Client -> primary-copy -> 2nd-copy
>> >>  (= redundancy over Ethernet distance)
>>
>> Sure, but the first OSD acks to the client before replicating to the other OSDs.
> no
>
>>
>> Client -> primary-copy -> 2nd-copy
>>        <-ack
>>          primary-copy -> 2nd-copy
>>                       -> 3rd-copy
>>
>> Or am I wrong?
> yes,
> please have a look at the attached file: ceph-replication-acks.png
> The client usually will continue on 'ACK' and not wait for the 'commit'.
>
> BTW. all my journals are in RAM (/dev/ramX)
> 32x 2GB = 32GB of data with replica 2x
>
> If "filestore min/max sync interval" is set to 99999999
> data should 'never' be written to OSD
> ('never' at least during the tests if the written data is < 32GB)

I believe it actually will start syncing to disk when the journal is
half full (right, Sam?) — and even if it doesn't sync, there's a
reasonable chance that some of the data will be written out to disk in
the background (though that shouldn't slow anything down, of course).
:)
-Greg
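
A back-of-the-envelope sketch of what a half-full trigger would mean for the setup quoted above (32 journals of 2GB each, replica 2x). It assumes writes are spread evenly over all journals and ignores journal headers and per-entry overhead; the function name and the `full_fraction` parameter are made up for illustration and are not Ceph options.

```python
def client_data_before_sync_gb(journals, journal_gb, replicas, full_fraction=0.5):
    """Estimate how much client data (GB) fits before journals reach full_fraction.

    Assumption: writes are spread evenly, so all journals fill at the same rate,
    and every client byte is written to `replicas` journals.
    """
    total_journal_gb = journals * journal_gb
    return total_journal_gb * full_fraction / replicas


# 32 journals x 2GB, replica 2x, sync assumed to start at half full
print(client_data_before_sync_gb(32, 2, 2))
# Same setup if journals could fill completely (the 32GB figure quoted above)
print(client_data_before_sync_gb(32, 2, 2, full_fraction=1.0))
```

Under these assumptions a test would need to stay below roughly 16GB of written data, not 32GB, to avoid triggering a sync.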


>
> In such a configuration only the Ceph code and the interconnect (10GbE/IP) would be the limiting factors.
>
> Cheers,
> -Dieter
>
>
>>
>>
>> ----- Mail original -----
>>
>> De: "Dieter Kasper" <d.kasper@kabelmail.de>
>> À: "Alexandre DERUMIER" <aderumier@odiso.com>
>> Cc: ceph-devel@vger.kernel.org, "Andreas Bluemle" <andreas.bluemle@itxperts.de>
>> Envoyé: Jeudi 30 Août 2012 18:02:05
>> Objet: Re: RBD performance - tuning hints
>>
>> On Thu, Aug 30, 2012 at 05:46:35PM +0200, Alexandre DERUMIER wrote:
>> > Thanks
>> >
>> > >> 8x SSD, 200GB each
>> >
>> > 20000 iops seem pretty low,no ?
>> well, you have to compare
>> - pure a SSD (via PCIe or SAS-6G) vs.
>> - Ceph-Journal, which goes 2x over 10GbE with IP
>> Client -> primary-copy -> 2nd-copy
>> (= redundancy over Ethernet distance)
>>
>> I'm curious about the answer from Inktank,
>>
>> -Dieter
>>
>> >
>> >
>> > for @inktank:
>> >
>> > Is there a bottleneck somewhere in ceph ?
>> Maybe "SimpleMessenger dispatching: cause of performance problems?"
>> from Thu, 16 Aug 2012 18:08:39 +0200
>> by <andreas.bluemle@itxperts.de>
>> can be an answer.
>> Especially if a small number of OSDs is used.
>>
>> >
>> > I ask because I would like to know whether it scales by adding new nodes.
>> >
>> > Has Inktank already done some random IOPS benchmarks ? (I always see sequential throughput benchmarks on the mailing list)
>> >
>> >
>> > ----- Mail original -----
>> >
>> > De: "Dieter Kasper" <d.kasper@kabelmail.de>
>> > À: "Alexandre DERUMIER" <aderumier@odiso.com>
>> > Cc: ceph-devel@vger.kernel.org
>> > Envoyé: Jeudi 30 Août 2012 17:33:42
>> > Objet: Re: RBD performance - tuning hints
>> >
>> > On Thu, Aug 30, 2012 at 05:28:02PM +0200, Alexandre DERUMIER wrote:
>> > > Thanks for the report !
>> > >
>> > > vs your first benchmark, it's with RBD 4M or 64K ?
>> > with 4MB (see attached config info)
>> >
>> > Cheers,
>> > -Dieter
>> >
>> > >
>> > > (how much ssd by node?)
>> > 8x SSD, 200GB each
>> >
>> > >
>> > >
>> > >
>> > > ----- Mail original -----
>> > >
>> > > De: "Dieter Kasper" <d.kasper@kabelmail.de>
>> > > À: "Alexandre DERUMIER" <aderumier@odiso.com>
>> > > Cc: ceph-devel@vger.kernel.org
>> > > Envoyé: Jeudi 30 Août 2012 16:56:34
>> > > Objet: Re: RBD performance - tuning hints
>> > >
>> > > Hi Alexandre,
>> > >
>> > > with the 4 filestore parameter below some fio values could be increased:
>> > > filestore max sync interval = 30
>> > > filestore min sync interval = 29
>> > > filestore flusher = false
>> > > filestore queue max ops = 10000
>> > >
>> > > ###### IOPS
>> > > fio_read_4k_64: 9373
>> > > fio_read_4k_128: 9939
>> > > fio_randwrite_8k_16: 12376
>> > > fio_randwrite_4k_16: 13315
>> > > fio_randwrite_512_32: 13660
>> > > fio_randwrite_8k_32: 17318
>> > > fio_randwrite_4k_32: 18057
>> > > fio_randwrite_8k_64: 19693
>> > > fio_randwrite_512_64: 20015 <<<
>> > > fio_randwrite_4k_64: 20024 <<<
>> > > fio_randwrite_8k_128: 20547 <<<
>> > > fio_randwrite_4k_128: 20839 <<<
>> > > fio_randwrite_512_128: 21417 <<<
>> > > fio_randread_8k_128: 48872
>> > > fio_randread_4k_128: 50002
>> > > fio_randread_512_128: 51202
>> > >
>> > > ###### MB/s
>> > > fio_randread_2m_32: 628
>> > > fio_read_4m_64: 630
>> > > fio_randread_8m_32: 633
>> > > fio_read_2m_32: 637
>> > > fio_read_4m_16: 640
>> > > fio_randread_4m_16: 652
>> > > fio_write_2m_32: 660
>> > > fio_randread_4m_32: 677
>> > > fio_read_4m_32: 678
>> > > (...)
>> > > fio_write_4m_64: 771
>> > > fio_randwrite_2m_64: 789
>> > > fio_write_8m_128: 796
>> > > fio_write_4m_32: 802
>> > > fio_randwrite_4m_128: 807 <<<
>> > > fio_randwrite_2m_32: 811 <<<
>> > > fio_write_2m_128: 833 <<<
>> > > fio_write_8m_64: 901 <<<
>> > >
>> > > Best Regards,
>> > > -Dieter
>> > >
>> > >
>> > > On Wed, Aug 29, 2012 at 10:50:12AM +0200, Alexandre DERUMIER wrote:
>> > > > Nice results !
>> > > > (can you make same benchmark from a qemu-kvm guest with virtio-driver ?
>> > > > I have made some bench some month ago with stephan priebe, and we never be able to have more than 20000iops, with a full ssd 3nodes cluster)
>> > > >
>> > > > >>How can I set the variables when the Journal data have go to the OSD ? (after X seconds and/or when Y %-full)
>> > > > I think you can try to tune these values
>> > > >
>> > > > filestore max sync interval = 30
>> > > > filestore min sync interval = 29
>> > > > filestore flusher = false
>> > > > filestore queue max ops = 10000
>> > > >
>> > > >
>> > > >
>> > > > ----- Mail original -----
>> > > >
>> > > > De: "Dieter Kasper" <d.kasper@kabelmail.de>
>> > > > À: ceph-devel@vger.kernel.org
>> > > > Cc: "Dieter Kasper (KD)" <d.kasper@kabelmail.de>
>> > > > Envoyé: Mardi 28 Août 2012 19:48:42
>> > > > Objet: RBD performance - tuning hints
>> > > >
>> > > > Hi,
>> > > >
>> > > > on my 4-node system (SSD + 10GbE, see bench-config.txt for details)
>> > > > I can observe a pretty nice rados bench performance
>> > > > (see bench-rados.txt for details):
>> > > >
>> > > > Bandwidth (MB/sec): 961.710
>> > > > Max bandwidth (MB/sec): 1040
>> > > > Min bandwidth (MB/sec): 772
>> > > >
>> > > >
>> > > > Also the bandwidth performance generated with
>> > > > fio --filename=/dev/rbd1 --direct=1 --rw=$io --bs=$bs --size=2G --iodepth=$threads --ioengine=libaio --runtime=60 --group_reporting --name=file1 --output=fio_${io}_${bs}_${threads}
>> > > >
>> > > > .... is acceptable, e.g.
>> > > > fio_write_4m_16 795 MB/s
>> > > > fio_randwrite_8m_128 717 MB/s
>> > > > fio_randwrite_8m_16 714 MB/s
>> > > > fio_randwrite_2m_32 692 MB/s
>> > > >
>> > > >
>> > > > But, the write IOPS seems to be limited around 19k ...
>> > > > RBD 4M 64k (= optimal_io_size)
>> > > > fio_randread_512_128 53286 55925
>> > > > fio_randread_4k_128 51110 44382
>> > > > fio_randread_8k_128 30854 29938
>> > > > fio_randwrite_512_128 18888 2386
>> > > > fio_randwrite_512_64 18844 2582
>> > > > fio_randwrite_8k_64 17350 2445
>> > > > (...)
>> > > > fio_read_4k_128 10073 53151
>> > > > fio_read_4k_64 9500 39757
>> > > > fio_read_4k_32 9220 23650
>> > > > (...)
>> > > > fio_read_4k_16 9122 14322
>> > > > fio_write_4k_128 2190 14306
>> > > > fio_read_8k_32 706 13894
>> > > > fio_write_4k_64 2197 12297
>> > > > fio_write_8k_64 3563 11705
>> > > > fio_write_8k_128 3444 11219
>> > > >
>> > > >
>> > > > Any hints for tuning the IOPS (read and/or write) would be appreciated.
>> > > >
>> > > > How can I set the variables when the Journal data have go to the OSD ? (after X seconds and/or when Y %-full)
>> > > >
>> > > >
>> > > > Kind Regards,
>> > > > -Dieter
>> > > >

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: RBD performance - tuning hints / parameter doc
  2012-08-30 15:08           ` Dieter Kasper
@ 2012-08-30 20:39             ` Samuel Just
  0 siblings, 0 replies; 31+ messages in thread
From: Samuel Just @ 2012-08-30 20:39 UTC (permalink / raw)
  To: Dieter Kasper; +Cc: Josh Durgin, Alexandre DERUMIER, ceph-devel@vger.kernel.org

Ah, those are just min and max.  Sync is also triggered when the
journal hits the half-full mark.  We could make the percentage
configurable in the future.
-Sam
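
As a rough illustration of the rule described above (a toy model, not Ceph's actual code), the decision could be sketched as below. The function name `sync_reason` and its parameters are assumptions for this sketch, and the half-full threshold is written as a parameter only for readability. How 'filestore min sync interval' interacts with the half-full trigger is not stated in this thread, so it is deliberately left out.

```python
def sync_reason(elapsed_s, journal_used_bytes, journal_size_bytes,
                max_interval_s=30.0, full_fraction=0.5):
    """Return why a sync would start in this toy model, or None if it would not.

    Assumption: a sync starts either when the journal reaches the half-full
    mark or when the maximum interval since the last sync has elapsed.
    """
    if journal_used_bytes >= full_fraction * journal_size_bytes:
        return "journal-half-full"
    if elapsed_s >= max_interval_s:
        return "max-interval"
    return None


GIB = 2 ** 30
# A huge max interval does not prevent a sync once a 2GB journal holds 1GB.
print(sync_reason(5.0, 1 * GIB, 2 * GIB, max_interval_s=99999999))
# Below the half-full mark and before the max interval: no sync in this model.
print(sync_reason(5.0, GIB // 2, 2 * GIB, max_interval_s=99999999))
```

This is why setting the intervals to a very large value does not by itself keep data in the journal indefinitely.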

On Thu, Aug 30, 2012 at 8:08 AM, Dieter Kasper <d.kasper@kabelmail.de> wrote:
> Samuel,
>
> thank you very much for this explicit description!
>
> As far as I understand the journal acts as a ringbuffer in front of the OSD.
> Using time as a parameter to trigger sync might not be the best for
> a dynamic Storage subsystem. On a high workload e.g. 10/20 for min/max
> might be optimal for 4 nodes with 10 OSDs each,
> but not after adding 4 additional nodes.
>
> Are there parameters to trigger the syncs to OSD
> in relation to the fill grade of the journal ?
> e.g.
> filestore [min|max] sync percent:
>
> Do not sync before min-% full; sync after max-% full
>
> What would happen if I set "filestore [min|max] sync interval" to 999999 ?
> Will the journal sync start at 100% full or at X% ?
> What is 'X' by default ?
> How can I set 'X' ?
>
> Best Regards,
> -Dieter
>
>
> On Thu, Aug 30, 2012 at 12:34:43AM +0200, Samuel Just wrote:
>> filestore [min|max] sync interval:
>>
>> Periodically, the filestore needs to quiesce writes and do a syncfs in
>> order to create
>> a consistent commit point up to which it can free journal entries.  Syncing more
>> frequently tends to reduce the time required to do the sync, and
>> reduces the amount
>> of data that needs to remain in the journal.  Less frequent syncs
>> would allow the
>> backing filesystem to better coalesce small writes and metadata
>> updates hopefully
>> resulting in more efficient syncs.  'filestore max sync interval'
>> defines the maximum
>> time period between syncs, 'filestore min sync interval' defines the
>> minimum time
>> period between syncs.
>>
>> filestore flusher:
>>
>> The filestore flusher forces data from large writes to be written out
>> using sync_file_range
>> before the sync in order to (hopefully) reduce the cost of the
>> eventual sync.  In practice,
>> disabling 'filestore flusher' seems to improve performance in some cases.
>>
>> filestore queue max ops:
>>
>> 'filestore queue max ops' defines the number of in progress ops the
>> filestore will accept
>> before blocking on queueing new ones.  This mostly shouldn't have much
>> of an effect
>> on performance and should probably be ignored.
>>
>> filestore op threads:
>>
>> 'filestore op threads' defines the number of threads used to submit
>> filesystem operations
>> in parallel.
>>
>> journal dio:
>>
>> 'journal dio' enables using O_DIRECT for writing to the journal.  This
>> should usually
>> be enabled.  If possible, 'journal aio' should also be enabled to
>> allow use of libaio
>> to do asynchronous writes.
>>
>> osd op threads:
>>
>> 'osd op threads' defines the size of the thread pool used to service
>> OSD operations
>> such as client requests.  Increasing this may increase the rate of
>> request processing.
>>
>> osd disk threads:
>>
>> 'osd disk threads' defines the number of threads used to perform background disk
>> intensive osd operations such as scrubbing and snap trimming.
>>
>> On Wed, Aug 29, 2012 at 12:29 PM, Dieter Kasper <d.kasper@kabelmail.de> wrote:
>> > Hi Josh,
>> >
>> > thanks for the hint.
>> > Can you please spend a few words on the meaning of these parameters ?
>> > - filestore min/max sync interval =     int/float ?     seconds ? of what ?
>> > - filestore flusher = false
>> > - filestore queue max ops = 10000
>> >         what is 'one op' ?      queue in front of what ?
>> > - filestore op threads =
>> >         what are useful values here ?
>> >
>> > - journal dio = true/false
>> > - osd op threads =
>> > - osd disk threads =
>> >
>> >
>> > Kind Regards,
>> > -Dieter
>> >
>> >
>> > On Wed, Aug 29, 2012 at 07:37:36PM +0200, Josh Durgin wrote:
>> >> On 08/29/2012 01:50 AM, Alexandre DERUMIER wrote:
>> >> > Nice results !
>> >> > (can you make same benchmark from a qemu-kvm guest with virtio-driver ?
>> >> > I have made some bench some month ago with stephan priebe, and we never be able to have more than 20000iops, with a full ssd 3nodes cluster)
>> >> >
>> >> >>> How can I set the variables when the Journal data have go to the OSD ? (after X seconds and/or when Y %-full)
>> >> > I think you can try to tune these values
>> >> >
>> >> > filestore max sync interval = 30
>> >> > filestore min sync interval = 29
>> >> > filestore flusher = false
>> >> > filestore queue max ops = 10000
>> >>
>> >> Increasing filestore_op_threads might help as well.
>> >>
>> >> > ----- Mail original -----
>> >> >
>> >> > De: "Dieter Kasper" <d.kasper@kabelmail.de>
>> >> > À: ceph-devel@vger.kernel.org
>> >> > Cc: "Dieter Kasper (KD)" <d.kasper@kabelmail.de>
>> >> > Envoyé: Mardi 28 Août 2012 19:48:42
>> >> > Objet: RBD performance - tuning hints
>> >> >
>> >> > Hi,
>> >> >
>> >> > on my 4-node system (SSD + 10GbE, see bench-config.txt for details)
>> >> > I can observe a pretty nice rados bench performance
>> >> > (see bench-rados.txt for details):
>> >> >
>> >> > Bandwidth (MB/sec): 961.710
>> >> > Max bandwidth (MB/sec): 1040
>> >> > Min bandwidth (MB/sec): 772
>> >> >
>> >> >
>> >> > Also the bandwidth performance generated with
>> >> > fio --filename=/dev/rbd1 --direct=1 --rw=$io --bs=$bs --size=2G --iodepth=$threads --ioengine=libaio --runtime=60 --group_reporting --name=file1 --output=fio_${io}_${bs}_${threads}
>> >> >
>> >> > .... is acceptable, e.g.
>> >> > fio_write_4m_16 795 MB/s
>> >> > fio_randwrite_8m_128 717 MB/s
>> >> > fio_randwrite_8m_16 714 MB/s
>> >> > fio_randwrite_2m_32 692 MB/s
>> >> >
>> >> >
>> >> > But, the write IOPS seems to be limited around 19k ...
>> >> > RBD 4M 64k (= optimal_io_size)
>> >> > fio_randread_512_128 53286 55925
>> >> > fio_randread_4k_128 51110 44382
>> >> > fio_randread_8k_128 30854 29938
>> >> > fio_randwrite_512_128 18888 2386
>> >> > fio_randwrite_512_64 18844 2582
>> >> > fio_randwrite_8k_64 17350 2445
>> >> > (...)
>> >> > fio_read_4k_128 10073 53151
>> >> > fio_read_4k_64 9500 39757
>> >> > fio_read_4k_32 9220 23650
>> >> > (...)
>> >> > fio_read_4k_16 9122 14322
>> >> > fio_write_4k_128 2190 14306
>> >> > fio_read_8k_32 706 13894
>> >> > fio_write_4k_64 2197 12297
>> >> > fio_write_8k_64 3563 11705
>> >> > fio_write_8k_128 3444 11219
>> >> >
>> >> >
>> >> > Any hints for tuning the IOPS (read and/or write) would be appreciated.
>> >> >
>> >> > How can I set the variables when the Journal data have go to the OSD ? (after X seconds and/or when Y %-full)
>> >> >
>> >> >
>> >> > Kind Regards,
>> >> > -Dieter
>> >> >
>> >> >
>> >> >
>> >>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: RBD performance - tuning hints
  2012-08-30 16:16                 ` Josh Durgin
@ 2012-08-31  7:46                   ` Alexandre DERUMIER
  2012-08-31  8:11                     ` Dietmar Maurer
  0 siblings, 1 reply; 31+ messages in thread
From: Alexandre DERUMIER @ 2012-08-31  7:46 UTC (permalink / raw)
  To: Josh Durgin; +Cc: Dieter Kasper, ceph-devel, Andreas Bluemle

>>RBD waits for the data to be on disk on all replicas. It's pretty easy
>>to relax this to in memory on all replicas, but there's no option for
>>that right now.

OK, thanks, I missed that.

When you say disk, do you mean the journal?



----- Original Message -----

From: "Josh Durgin" <josh.durgin@inktank.com>
To: "Alexandre DERUMIER" <aderumier@odiso.com>
Cc: "Dieter Kasper" <d.kasper@kabelmail.de>, ceph-devel@vger.kernel.org, "Andreas Bluemle" <andreas.bluemle@itxperts.de>
Sent: Thursday, 30 August 2012 18:16:47
Subject: Re: RBD performance - tuning hints

On 08/30/2012 09:12 AM, Alexandre DERUMIER wrote: 
>>> well, you have to compare 
>>> - pure a SSD (via PCIe or SAS-6G) vs. 
>>> - Ceph-Journal, which goes 2x over 10GbE with IP 
>>> Client -> primary-copy -> 2nd-copy 
>>> (= redundancy over Ethernet distance) 
> 
> Sure, but the first OSD acks to the client before replicating to the other OSDs.
> 
> Client -> primary-copy -> 2nd-copy 
> <-ack 
> primary-copy -> 2nd-copy 
> -> 3rd-copy
>
> Or am I wrong?

RBD waits for the data to be on disk on all replicas. It's pretty easy 
to relax this to in memory on all replicas, but there's no option for 
that right now. 

Josh 

> 
> ----- Mail original ----- 
> 
> De: "Dieter Kasper" <d.kasper@kabelmail.de> 
> À: "Alexandre DERUMIER" <aderumier@odiso.com> 
> Cc: ceph-devel@vger.kernel.org, "Andreas Bluemle" <andreas.bluemle@itxperts.de> 
> Envoyé: Jeudi 30 Août 2012 18:02:05 
> Objet: Re: RBD performance - tuning hints 
> 
> On Thu, Aug 30, 2012 at 05:46:35PM +0200, Alexandre DERUMIER wrote: 
>> Thanks 
>> 
>>>> 8x SSD, 200GB each 
>> 
>> 20000 iops seem pretty low,no ? 
> well, you have to compare 
> - pure a SSD (via PCIe or SAS-6G) vs. 
> - Ceph-Journal, which goes 2x over 10GbE with IP 
> Client -> primary-copy -> 2nd-copy 
> (= redundancy over Ethernet distance) 
> 
> I'm curious about the answer from Inktank, 
> 
> -Dieter 
> 
>> 
>> 
>> for @inktank:
>>
>> Is there a bottleneck somewhere in ceph ?
> Maybe "SimpleMessenger dispatching: cause of performance problems?" 
> from Thu, 16 Aug 2012 18:08:39 +0200 
> by <andreas.bluemle@itxperts.de> 
> can be an answer. 
> Especially if a small number of OSDs is used. 
> 
>> 
>> I ask because I would like to know whether it scales by adding new nodes.
>>
>> Has Inktank already done some random IOPS benchmarks ? (I always see sequential throughput benchmarks on the mailing list)
>> 
>> 
>> ----- Mail original ----- 
>> 
>> De: "Dieter Kasper" <d.kasper@kabelmail.de> 
>> À: "Alexandre DERUMIER" <aderumier@odiso.com> 
>> Cc: ceph-devel@vger.kernel.org 
>> Envoyé: Jeudi 30 Août 2012 17:33:42 
>> Objet: Re: RBD performance - tuning hints 
>> 
>> On Thu, Aug 30, 2012 at 05:28:02PM +0200, Alexandre DERUMIER wrote: 
>>> Thanks for the report ! 
>>> 
>>> vs your first benchmark, it's with RBD 4M or 64K ? 
>> with 4MB (see attached config info) 
>> 
>> Cheers, 
>> -Dieter 
>> 
>>> 
>>> (how much ssd by node?) 
>> 8x SSD, 200GB each 
>> 
>>> 
>>> 
>>> 
>>> ----- Mail original ----- 
>>> 
>>> De: "Dieter Kasper" <d.kasper@kabelmail.de> 
>>> À: "Alexandre DERUMIER" <aderumier@odiso.com> 
>>> Cc: ceph-devel@vger.kernel.org 
>>> Envoyé: Jeudi 30 Août 2012 16:56:34 
>>> Objet: Re: RBD performance - tuning hints 
>>> 
>>> Hi Alexandre, 
>>> 
>>> with the 4 filestore parameter below some fio values could be increased: 
>>> filestore max sync interval = 30 
>>> filestore min sync interval = 29 
>>> filestore flusher = false 
>>> filestore queue max ops = 10000 
>>> 
>>> ###### IOPS 
>>> fio_read_4k_64: 9373 
>>> fio_read_4k_128: 9939 
>>> fio_randwrite_8k_16: 12376 
>>> fio_randwrite_4k_16: 13315 
>>> fio_randwrite_512_32: 13660 
>>> fio_randwrite_8k_32: 17318 
>>> fio_randwrite_4k_32: 18057 
>>> fio_randwrite_8k_64: 19693 
>>> fio_randwrite_512_64: 20015 <<< 
>>> fio_randwrite_4k_64: 20024 <<< 
>>> fio_randwrite_8k_128: 20547 <<< 
>>> fio_randwrite_4k_128: 20839 <<< 
>>> fio_randwrite_512_128: 21417 <<< 
>>> fio_randread_8k_128: 48872 
>>> fio_randread_4k_128: 50002 
>>> fio_randread_512_128: 51202 
>>> 
>>> ###### MB/s 
>>> fio_randread_2m_32: 628 
>>> fio_read_4m_64: 630 
>>> fio_randread_8m_32: 633 
>>> fio_read_2m_32: 637 
>>> fio_read_4m_16: 640 
>>> fio_randread_4m_16: 652 
>>> fio_write_2m_32: 660 
>>> fio_randread_4m_32: 677 
>>> fio_read_4m_32: 678 
>>> (...) 
>>> fio_write_4m_64: 771 
>>> fio_randwrite_2m_64: 789 
>>> fio_write_8m_128: 796 
>>> fio_write_4m_32: 802 
>>> fio_randwrite_4m_128: 807 <<< 
>>> fio_randwrite_2m_32: 811 <<< 
>>> fio_write_2m_128: 833 <<< 
>>> fio_write_8m_64: 901 <<< 
>>> 
>>> Best Regards, 
>>> -Dieter 
>>> 
>>> 
>>> On Wed, Aug 29, 2012 at 10:50:12AM +0200, Alexandre DERUMIER wrote: 
>>>> Nice results ! 
>>>> (can you make same benchmark from a qemu-kvm guest with virtio-driver ? 
>>>> I have made some bench some month ago with stephan priebe, and we never be able to have more than 20000iops, with a full ssd 3nodes cluster) 
>>>> 
>>>>>> How can I set the variables when the Journal data have go to the OSD ? (after X seconds and/or when Y %-full) 
>>>> I think you can try to tune these values 
>>>> 
>>>> filestore max sync interval = 30 
>>>> filestore min sync interval = 29 
>>>> filestore flusher = false 
>>>> filestore queue max ops = 10000 
>>>> 
>>>> 
>>>> 
>>>> ----- Mail original ----- 
>>>> 
>>>> De: "Dieter Kasper" <d.kasper@kabelmail.de> 
>>>> À: ceph-devel@vger.kernel.org 
>>>> Cc: "Dieter Kasper (KD)" <d.kasper@kabelmail.de> 
>>>> Envoyé: Mardi 28 Août 2012 19:48:42 
>>>> Objet: RBD performance - tuning hints 
>>>> 
>>>> Hi, 
>>>> 
>>>> on my 4-node system (SSD + 10GbE, see bench-config.txt for details) 
>>>> I can observe a pretty nice rados bench performance 
>>>> (see bench-rados.txt for details): 
>>>> 
>>>> Bandwidth (MB/sec): 961.710 
>>>> Max bandwidth (MB/sec): 1040 
>>>> Min bandwidth (MB/sec): 772 
>>>> 
>>>> 
>>>> Also the bandwidth performance generated with 
>>>> fio --filename=/dev/rbd1 --direct=1 --rw=$io --bs=$bs --size=2G --iodepth=$threads --ioengine=libaio --runtime=60 --group_reporting --name=file1 --output=fio_${io}_${bs}_${threads} 
>>>> 
>>>> .... is acceptable, e.g. 
>>>> fio_write_4m_16 795 MB/s 
>>>> fio_randwrite_8m_128 717 MB/s 
>>>> fio_randwrite_8m_16 714 MB/s 
>>>> fio_randwrite_2m_32 692 MB/s 
>>>> 
>>>> 
>>>> But, the write IOPS seems to be limited around 19k ... 
>>>> RBD 4M 64k (= optimal_io_size) 
>>>> fio_randread_512_128 53286 55925 
>>>> fio_randread_4k_128 51110 44382 
>>>> fio_randread_8k_128 30854 29938 
>>>> fio_randwrite_512_128 18888 2386 
>>>> fio_randwrite_512_64 18844 2582 
>>>> fio_randwrite_8k_64 17350 2445 
>>>> (...) 
>>>> fio_read_4k_128 10073 53151 
>>>> fio_read_4k_64 9500 39757 
>>>> fio_read_4k_32 9220 23650 
>>>> (...) 
>>>> fio_read_4k_16 9122 14322 
>>>> fio_write_4k_128 2190 14306 
>>>> fio_read_8k_32 706 13894 
>>>> fio_write_4k_64 2197 12297 
>>>> fio_write_8k_64 3563 11705 
>>>> fio_write_8k_128 3444 11219 
>>>> 
>>>> 
>>>> Any hints for tuning the IOPS (read and/or write) would be appreciated. 
>>>> 
>>>> How can I set the variables when the Journal data have go to the OSD ? (after X seconds and/or when Y %-full) 
>>>> 
>>>> 
>>>> Kind Regards, 
>>>> -Dieter 
>>>> 




-- 
Alexandre Derumier
Systems and Network Engineer

Landline: 03 20 68 88 85
Fax: 03 20 68 90 88

45 Bvd du Général Leclerc 59100 Roubaix
12 rue Marivaux 75002 Paris


^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: RBD performance - tuning hints
  2012-08-31  7:46                   ` Alexandre DERUMIER
@ 2012-08-31  8:11                     ` Dietmar Maurer
  2012-08-31  8:48                       ` Mark Kirkwood
  2012-08-31 10:58                       ` RBD performance - tuning hints Jerker Nyberg
  0 siblings, 2 replies; 31+ messages in thread
From: Dietmar Maurer @ 2012-08-31  8:11 UTC (permalink / raw)
  To: Alexandre DERUMIER, Josh Durgin
  Cc: Dieter Kasper, ceph-devel@vger.kernel.org, Andreas Bluemle

>>RBD waits for the data to be on disk on all replicas. It's pretty easy
>>to relax this to in memory on all replicas, but there's no option for
>>that right now.

I thought that is dangerous, because you can lose data?

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: RBD performance - tuning hints
  2012-08-31  8:11                     ` Dietmar Maurer
@ 2012-08-31  8:48                       ` Mark Kirkwood
  2012-08-31  9:49                         ` RBD performance - tuning hints / major slowdown effect(s) Dieter Kasper
  2012-08-31 10:58                       ` RBD performance - tuning hints Jerker Nyberg
  1 sibling, 1 reply; 31+ messages in thread
From: Mark Kirkwood @ 2012-08-31  8:48 UTC (permalink / raw)
  To: Dietmar Maurer
  Cc: Alexandre DERUMIER, Josh Durgin, Dieter Kasper,
	ceph-devel@vger.kernel.org, Andreas Bluemle

On 31/08/12 20:11, Dietmar Maurer wrote:
>>> RBD waits for the data to be on disk on all replicas. It's pretty easy
>>> to relax this to in memory on all replicas, but there's no option for
>>> that right now.
> I thought that is dangerous, because you can lose data?

And it is not immediately obvious that this is the bottleneck - from 
what I can see the 'sync' call being used (sync_file_range) is extremely 
fast and is *not* the major slowdown effect...

Regards

Mark


* Re: RBD performance - tuning hints / major slowdown effect(s)
  2012-08-31  8:48                       ` Mark Kirkwood
@ 2012-08-31  9:49                         ` Dieter Kasper
  2012-08-31 10:16                           ` Mark Kirkwood
  0 siblings, 1 reply; 31+ messages in thread
From: Dieter Kasper @ 2012-08-31  9:49 UTC (permalink / raw)
  To: Mark Kirkwood
  Cc: Dietmar Maurer, Alexandre DERUMIER, Josh Durgin,
	ceph-devel@vger.kernel.org, Andreas Bluemle

Mark, Inktank,

OK, it is very likely that 'sync_file_range' is not the major slowdown 'culprit'.

But which areas (design, current implementation, protocol, interconnect, tuning parameters, ...)
would you rate as 'major slowdown effect(s)'?

Best Regards,
-Dieter


On Fri, Aug 31, 2012 at 08:48:34PM +1200, Mark Kirkwood wrote:
> On 31/08/12 20:11, Dietmar Maurer wrote:
> >>>RBD waits for the data to be on disk on all replicas. It's pretty easy
> >>>to relax this to in memory on all replicas, but there's no option for
> >>>that right now.
> >I thought that is dangerous, because you can lose data?
> 
> And it is not immediately obvious that this is the bottleneck - from
> what I can see the 'sync' call being used (sync_file_range) is
> extremely fast and is *not* the major slowdown effect...
> 
> Regards
> 
> Mark
> 



* Re: RBD performance - tuning hints / major slowdown effect(s)
  2012-08-31  9:49                         ` RBD performance - tuning hints / major slowdown effect(s) Dieter Kasper
@ 2012-08-31 10:16                           ` Mark Kirkwood
  0 siblings, 0 replies; 31+ messages in thread
From: Mark Kirkwood @ 2012-08-31 10:16 UTC (permalink / raw)
  To: Dieter Kasper
  Cc: Dietmar Maurer, Alexandre DERUMIER, Josh Durgin,
	ceph-devel@vger.kernel.org, Andreas Bluemle

Sorry Dieter,

I'm not trying to say "you are wrong" or anything like that - just adding 
to the problem-solving body of knowledge: from what *I* have tried out, 
the 'sync' issue does not look to be the bad guy here, although more 
analysis is always welcome (usual story - my findings should be 
confirmable by others doing similar tests)!

regards

Mark

On 31/08/12 21:49, Dieter Kasper wrote:
> Mark, Inktank,
>
> OK, it is very likely that 'sync_file_range' is not the major slowdown 'culprit'.
>
> But which areas (design, current implementation, protocol, interconnect, tuning parameters, ...)
> would you rate as 'major slowdown effect(s)'?
>



* RE: RBD performance - tuning hints
  2012-08-31  8:11                     ` Dietmar Maurer
  2012-08-31  8:48                       ` Mark Kirkwood
@ 2012-08-31 10:58                       ` Jerker Nyberg
  1 sibling, 0 replies; 31+ messages in thread
From: Jerker Nyberg @ 2012-08-31 10:58 UTC (permalink / raw)
  To: ceph-devel@vger.kernel.org

On Fri, 31 Aug 2012, Dietmar Maurer wrote:

>>> RBD waits for the data to be on disk on all replicas. It's pretty easy
>>> to relax this to in memory on all replicas, but there's no option for
>>> that right now.
>
> I thought that is dangerous, because you can lose data?

If you put the journal on a tmpfs, data written to the journal does 
not hit disk. If all replicas fail, data will be lost.

For some use cases that might be OK, for example incremental backups, 
fast scratch space, or volatile virtual machines.
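
A tmpfs journal setup of this kind might look roughly like the fragment 
below. This is only a sketch under stated assumptions: it presumes a 
tmpfs is already mounted at /mnt/ceph-journal (a hypothetical path, not 
taken from this thread) and reuses the 1000 MB journal size from the 
config posted at the start of the thread. The "journal dio = false" line 
is also an assumption: tmpfs has traditionally not supported direct I/O, 
so the journal probably needs it disabled, but check this against your 
Ceph and kernel versions.

ceph.conf:

[osd]
         ; assumption: tmpfs mounted at /mnt/ceph-journal (hypothetical path);
         ; the journal contents vanish on reboot or power loss
         osd journal = /mnt/ceph-journal/osd.$id.journal
         osd journal size = 1000
         ; assumption: tmpfs lacks O_DIRECT support, so disable direct I/O
         journal dio = false

The journal size has to fit in RAM alongside everything else on the 
host, and an OSD whose journal has been lost should be treated as 
having lost its recent writes.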

Also see this previous discussion:

http://www.mail-archive.com/ceph-devel@vger.kernel.org/msg06070.html

--jerker

Thread overview: 31+ messages
2012-07-20 10:24 Ceph write performance George Shuklin
     [not found] ` <20120720104150.GA16630@oder.kd-bie.de>
2012-07-20 10:48   ` George Shuklin
2012-07-20 11:49     ` Mark Nelson
2012-07-20 20:36       ` Ceph write performance on RAM-DISK Dieter Kasper
2012-07-20 21:28         ` Mark Nelson
2012-07-20 15:53 ` Ceph write performance Matthew Richardson
2012-07-20 16:37 ` Gregory Farnum
2012-08-28 17:48 ` RBD performance - tuning hints Dieter Kasper
2012-08-28 18:53   ` Smart Weblications GmbH - Florian Wiessner
2012-08-28 19:04     ` Dieter Kasper
2012-08-29  8:50   ` Alexandre DERUMIER
2012-08-29 17:37     ` Josh Durgin
2012-08-29 19:29       ` RBD performance - tuning hints / parameter doc Dieter Kasper
2012-08-29 22:34         ` Samuel Just
2012-08-30 15:08           ` Dieter Kasper
2012-08-30 20:39             ` Samuel Just
2012-08-30 14:56     ` RBD performance - tuning hints Dieter Kasper
2012-08-30 15:28       ` Alexandre DERUMIER
2012-08-30 15:33         ` Dieter Kasper
2012-08-30 15:46           ` Alexandre DERUMIER
2012-08-30 16:02             ` Dieter Kasper
2012-08-30 16:12               ` Alexandre DERUMIER
2012-08-30 16:16                 ` Josh Durgin
2012-08-31  7:46                   ` Alexandre DERUMIER
2012-08-31  8:11                     ` Dietmar Maurer
2012-08-31  8:48                       ` Mark Kirkwood
2012-08-31  9:49                         ` RBD performance - tuning hints / major slowdown effect(s) Dieter Kasper
2012-08-31 10:16                           ` Mark Kirkwood
2012-08-31 10:58                       ` RBD performance - tuning hints Jerker Nyberg
2012-08-30 16:48                 ` Dieter Kasper
2012-08-30 18:10                   ` Gregory Farnum
