From: Mark Nelson <mark.nelson@inktank.com>
To: Dieter Kasper <d.kasper@kabelmail.de>
Cc: George Shuklin <shuklin@selectel.ru>, ceph-devel@vger.kernel.org
Subject: Re: Ceph write performance on RAM-DISK
Date: Fri, 20 Jul 2012 16:28:24 -0500
Message-ID: <5009CD78.7050301@inktank.com>
In-Reply-To: <20120720203646.GA6587@oder.kd-bie.de>

On 07/20/2012 03:36 PM, Dieter Kasper wrote:
> Hi Mark, George,
>
> I can observe similarly poor performance on my system with fio on /dev/rbd1
>
> #--- seq. write RBD
> RX37-0:~ # dd if=/dev/zero of=/dev/rbd1 bs=1024k count=10000
> 10000+0 records in
> 10000+0 records out
> 10485760000 bytes (10 GB) copied, 41.1819 s, 255 MB/s
>
> #--- seq. read RBD
> RX37-0:~ # dd of=/dev/zero if=/dev/rbd1 bs=1024k count=10000
> 10000+0 records in
> 10000+0 records out
> 10485760000 bytes (10 GB) copied, 40.9595 s, 256 MB/s
>
> #--- seq. read /dev/ramX
> RX37-0:~ # dd of=/dev/zero if=/dev/ram0 bs=1024k count=10000
> 10000+0 records in
> 10000+0 records out
> 10485760000 bytes (10 GB) copied, 4.68389 s, 2.2 GB/s
>
> Does ceph-osd/filestore 'eat' 90% of my resources/bandwidth/latency ?
>

Well, there are multiple layers involved here, so it's possible that 
some of the code for RBD is playing a part in this too.  I have 
specifically seen slow performance with smaller requests with the 
filestore though, so that is where I'm focusing my energy right now.
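
As a rough sanity check on the "90%" figure, the sequential dd read numbers quoted above can be compared directly. This is only a sketch of the arithmetic: the function names are made up for illustration, and the result says nothing about which layer is responsible for the gap.

```python
# Byte count and elapsed times copied from the dd output quoted above.
BYTES_COPIED = 10485760000
RBD_READ_SECONDS = 40.9595   # dd reading from /dev/rbd1
RAM_READ_SECONDS = 4.68389   # dd reading from /dev/ram0


def throughput_mb_s(nbytes, seconds):
    """Throughput in decimal MB/s, the unit dd uses in its summary line."""
    return nbytes / seconds / 1e6


def fraction_lost(slow_mb_s, fast_mb_s):
    """Share of the faster device's throughput that the slower path does not reach."""
    return 1.0 - slow_mb_s / fast_mb_s


rbd = throughput_mb_s(BYTES_COPIED, RBD_READ_SECONDS)
ram = throughput_mb_s(BYTES_COPIED, RAM_READ_SECONDS)

if __name__ == "__main__":
    print(f"rbd read : {rbd:8.1f} MB/s")
    print(f"ram read : {ram:8.1f} MB/s")
    print(f"lost     : {fraction_lost(rbd, ram):.1%}")
```

For these two runs the loss comes out just under 90%, so the estimate in the question is about right for sequential reads.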

>
> RX37-0:~ # fio --filename=/dev/rbd1 --direct=1 --rw=randwrite --bs=4k --size=5G --numjobs=64 --runtime=30 --group_reporting --name=file1
> (...)
>    write: io=461592KB, bw=15371KB/s, iops=3842 , runt= 30030msec
>    write: io=5120.0MB, bw=893927KB/s, iops=223481 , runt=  5865msec (on /dev/ram0)
>
>
> RX37-0:~ # fio --filename=/dev/rbd1 --direct=1 --rw=randread --bs=4k --size=5G --numjobs=64 --runtime=30 --group_reporting --name=file1
> (...)
>    read : io=698356KB, bw=23240KB/s, iops=5809 , runt= 30050msec
>    read : io=5120.0MB, bw=1631.1MB/s, iops=417559 , runt=  3139msec (on /dev/ram0)
>
>
> RX37-0:~ # fio --filename=/dev/rbd1 --direct=1 --rw=randwrite --bs=1m --size=5G --numjobs=4 --runtime=10 --group_reporting --name=file1
> (...)
>    write: io=6377.0MB, bw=217125KB/s, iops=212 , runt= 30075msec
>    write: io=5120.0MB, bw=2114.9MB/s, iops=2114 , runt=  2421msec (on /dev/ram0)
>
>
> Where is the bottleneck ?
> What is filestore doing ?
> How can I disable the journal and write only to the btrfs OSDs ? (as if they were SSDs)
> How can I get better performance ?

I'm not yet sure where the bottleneck is, but we are actively looking
into it.  Sadly, the process has been complicated by a potential
bottleneck in our test hardware that could be masking real issues in
the code.
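
One way to read the fio lines quoted above: bandwidth should be roughly IOPS times block size, and the ratio of RBD IOPS to ramdisk IOPS shows how far apart the two devices are at each request size. The following is a minimal sketch using only the numbers from the quoted output; the helper names are hypothetical.

```python
# (label, block size in KB, reported bw on /dev/rbd1 in KB/s,
#  IOPS on /dev/rbd1, IOPS on /dev/ram0), all copied from the quoted fio output.
RUNS = [
    ("randwrite 4k", 4, 15371, 3842, 223481),
    ("randread 4k", 4, 23240, 5809, 417559),
    ("randwrite 1m", 1024, 217125, 212, 2114),
]


def expected_bw_kb_s(iops, bs_kb):
    """Bandwidth implied by an IOPS figure at a fixed block size."""
    return iops * bs_kb


def rbd_share(rbd_iops, ram_iops):
    """Fraction of the ramdisk IOPS that the RBD device reached."""
    return rbd_iops / ram_iops


if __name__ == "__main__":
    for label, bs_kb, bw, rbd_iops, ram_iops in RUNS:
        implied = expected_bw_kb_s(rbd_iops, bs_kb)
        share = rbd_share(rbd_iops, ram_iops)
        print(f"{label:13s} reported {bw:7d} KB/s, implied {implied:7d} KB/s, "
              f"rbd/ram IOPS = {share:.2%}")
```

The implied and reported bandwidths agree to within rounding. In these runs the 4k tests reach under 2% of the ramdisk IOPS while the 1m test reaches about 10%, so the gap is largest with small requests.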

>
>
> Regards,
> Dieter
>
> P.S. I will try to get the "test_filestore_workloadgen"
>
>
> On Fri, Jul 20, 2012 at 06:49:30AM -0500, Mark Nelson wrote:
>> Hi George,
>>
>> I think you may find that the limitation is in the filestore.
>> It's one of the things I've been working on trying to track down as
>> I've seen low performance on SSDs with small request sizes as well.
>> You can use the test_filestore_workloadgen to specifically test the
>> filestore code with small requests if you'd like.  I'm not sure if
>> it is included with the binary distribution but it can be compiled
>> if you download the src.  I think it's "make
>> test_filestore_workloadgen" in the src directory.
>>
>> Mark
>>
>> On 7/20/12 5:48 AM, George Shuklin wrote:
>>> On 20.07.2012 14:41, Dieter Kasper (KD) wrote:
>>>
>>> Good day.
>>>
>>> Thank you for your attention.
>>>
>>> ramdisk size ~70Gb (modprobe brd rd_size=70000000)
>>> journal seems to be on the same device as storage
>>> size of OSD was unchanged (... meaning I created it manually and did
>>> not make any specific changes)
>>>
>>> During the test I watched the IO load closely; IO on MDS/MON was
>>> insignificant (most of the time zero, sometimes a few very mild peaks).
>>>
>>> Just in case, configs:
>>>
>>> ceph.conf:
>>>
>>> [osd]
>>>          osd journal size = 1000
>>>          filestore xattr use omap = true
>>>
>>> [mon.a]
>>>          host = srv1
>>>          mon addr = 192.168.0.1:6789
>>>
>>> [osd.0]
>>>          host = srv1
>>>
>>> [mds.a]
>>>          host = srv1
>>>
>>> fio.ini:
>>> [test]
>>> blocksize=4k
>>> filename=/media/test
>>> size=16g
>>> fallocate=posix
>>> rw=randread
>>> direct=1
>>> buffered=0
>>> ioengine=libaio
>>> iodepth=32
>>>
>>>
>>> Thanks for the advice, I'll recheck with the new settings.
>>>
>>>> George,
>>>>
>>>> please share more details of your config:
>>>> - RAM size of your system
>>>> - location of the journal
>>>> - size of your OSD
>>>>
>>>> Can you try (just for the 1st test) to
>>>> .. put the journal on RAM disk
>>>> .. put the MDS on RAM disk
>>>> .. put the MON on RAM disk
>>>> .. use btrfs for OSD
>>>>
>>>> As an alternative to isolate the bottleneck you can try to
>>>> - run without a journal
>>>> - use RBD instead of Ceph-FS
>>>>    + create a File System on top of the /dev/rbd0
>>>>
>>>> Regards,
>>>> Dieter Kasper
>>>>
>>>>
>>>> On Fri, Jul 20, 2012 at 12:24:15PM +0200, George Shuklin wrote:
>>>>> Good day.
>>>>>
>>>>> I've started to play with Ceph and found some rather strange
>>>>> performance issues. I'm not sure whether this is due to a Ceph
>>>>> limitation or my bad setup.
>>>>>
>>>>> Setup:
>>>>>
>>>>> osd - xfs on ramdisk (only one osd)
>>>>> mds - raid0 on 10 disks
>>>>> mon - second raid0 on 10 disks
>>>>>
>>>>> I've mounted the Ceph share on localhost and run fio (randwrite, 4k,
>>>>> iodepth=32).
>>>>>
>>>>> What I've got: 1900 IOPS on writing (4k block, 1Gb span).
>>>>>
>>>>> Normally fio shows about 200kIOPS writing on ramdisk.
>>>>>
>>>>> Why was it so slow? I did the setup exactly as described here:
>>>>> http://ceph.com/docs/master/start/quick-start/#start-the-ceph-cluster
>>>>> (but one osd).
>>>>>
>>>>> Thanks.
>>>>> --
>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>> the body of a message to majordomo@vger.kernel.org
>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>>
>>


-- 
Mark Nelson
Performance Engineer
Inktank
