From: Dieter Kasper <d.kasper@kabelmail.de>
To: Mark Nelson <mark.nelson@inktank.com>
Cc: George Shuklin <shuklin@selectel.ru>,
	ceph-devel@vger.kernel.org,
	"Dieter Kasper (KD)" <d.kasper@kabelmail.de>
Subject: Re: Ceph write performance on RAM-DISK
Date: Fri, 20 Jul 2012 22:36:46 +0200
Message-ID: <20120720203646.GA6587@oder.kd-bie.de>
In-Reply-To: <500945CA.8000406@inktank.com>

[-- Attachment #1: Type: text/plain, Size: 5220 bytes --]

Hi Mark, George,

I observe similarly poor performance on my system with dd and fio on /dev/rbd1:

#--- seq. write RBD
RX37-0:~ # dd if=/dev/zero of=/dev/rbd1 bs=1024k count=10000
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 41.1819 s, 255 MB/s

#--- seq. read RBD
RX37-0:~ # dd of=/dev/zero if=/dev/rbd1 bs=1024k count=10000
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 40.9595 s, 256 MB/s

#--- seq. read /dev/ramX
RX37-0:~ # dd of=/dev/zero if=/dev/ram0 bs=1024k count=10000
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 4.68389 s, 2.2 GB/s

Is ceph-osd/filestore 'eating' 90% of my resources/bandwidth/latency?
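The "90%" can be checked directly from the dd numbers above (dd prints MB as 10^6 bytes). A minimal sketch that recomputes each rate and expresses it as a share of the /dev/ram0 sequential read; no dd write to /dev/ram0 was measured, so the write comparison is only approximate:

```python
# Bytes transferred and elapsed seconds, copied from the dd runs above.
RUNS = {
    "rbd1 seq. write": (10_485_760_000, 41.1819),
    "rbd1 seq. read": (10_485_760_000, 40.9595),
    "ram0 seq. read": (10_485_760_000, 4.68389),
}

def mb_per_s(nbytes: int, seconds: float) -> float:
    """Throughput in MB/s, with MB = 10**6 bytes as dd reports it."""
    return nbytes / seconds / 1e6

baseline = mb_per_s(*RUNS["ram0 seq. read"])
for label, (nbytes, secs) in RUNS.items():
    rate = mb_per_s(nbytes, secs)
    print(f"{label:16s} {rate:7.1f} MB/s  {100 * rate / baseline:5.1f}% of ram0")
```

Both RBD directions land near 11% of the RAM disk's sequential read rate, i.e. roughly 89% of the bandwidth is lost somewhere between the RBD client and the OSD.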


RX37-0:~ # fio --filename=/dev/rbd1 --direct=1 --rw=randwrite --bs=4k --size=5G --numjobs=64 --runtime=30 --group_reporting --name=file1
(...)
  write: io=461592KB, bw=15371KB/s, iops=3842 , runt= 30030msec
  write: io=5120.0MB, bw=893927KB/s, iops=223481 , runt=  5865msec (on /dev/ram0)
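fio's bw and iops fields are tied by bw = iops × block size. With --direct=1 and no --ioengine or --iodepth given, each of the 64 jobs presumably keeps one synchronous request in flight, so Little's law (mean latency = requests in flight / IOPS) gives a rough per-request completion time. A minimal sketch, assuming exactly 64 outstanding requests and fio's default kb_base of 1024 (its "KB" is 1024 bytes):

```python
# Figures copied from the 4k randwrite runs above (fio's KB = 1024 bytes).
NUMJOBS = 64   # --numjobs=64; assumed: one synchronous request in flight per job
BS_KIB = 4     # --bs=4k
RUNS = {
    "/dev/rbd1": {"iops": 3842, "bw_kib_s": 15371},
    "/dev/ram0": {"iops": 223481, "bw_kib_s": 893927},
}

def implied_bw_kib_s(iops: float, bs_kib: float) -> float:
    """Bandwidth implied by an IOPS figure: bw = iops * block size."""
    return iops * bs_kib

def mean_latency_ms(in_flight: int, iops: float) -> float:
    """Little's law: mean latency = requests in flight / completions per second."""
    return in_flight / iops * 1000.0

for dev, r in RUNS.items():
    print(f"{dev}: implied bw {implied_bw_kib_s(r['iops'], BS_KIB)} KiB/s "
          f"(fio: {r['bw_kib_s']}), "
          f"mean latency ~{mean_latency_ms(NUMJOBS, r['iops']):.2f} ms")
```

Under that assumption each 4k write through RBD takes roughly 16.7 ms versus about 0.29 ms on /dev/ram0, which suggests per-request overhead rather than raw bandwidth is the limit.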


RX37-0:~ # fio --filename=/dev/rbd1 --direct=1 --rw=randread --bs=4k --size=5G --numjobs=64 --runtime=30 --group_reporting --name=file1
(...)
  read : io=698356KB, bw=23240KB/s, iops=5809 , runt= 30050msec
  read : io=5120.0MB, bw=1631.1MB/s, iops=417559 , runt=  3139msec (on /dev/ram0)


RX37-0:~ # fio --filename=/dev/rbd1 --direct=1 --rw=randwrite --bs=1m --size=5G --numjobs=4 --runtime=10 --group_reporting --name=file1
(...)
  write: io=6377.0MB, bw=217125KB/s, iops=212 , runt= 30075msec
  write: io=5120.0MB, bw=2114.9MB/s, iops=2114 , runt=  2421msec (on /dev/ram0)
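Within each pair the block size is the same on both devices, so the IOPS ratio equals the bandwidth ratio. A rough sketch tabulating the three pairs (note that the 1m run used 4 jobs instead of 64):

```python
# (rbd1 IOPS, ram0 IOPS) pairs copied from the three fio runs above.
RESULTS = {
    "4k randwrite": (3842, 223481),
    "4k randread": (5809, 417559),
    "1m randwrite": (212, 2114),
}

def fraction_of_ramdisk(rbd_iops: float, ram_iops: float) -> float:
    """Share of the RAM-disk rate that RBD reaches for the same workload."""
    return rbd_iops / ram_iops

for test, (rbd, ram) in RESULTS.items():
    share = fraction_of_ramdisk(rbd, ram)
    print(f"{test:13s} rbd1 reaches {100 * share:4.1f}% of ram0 ({ram / rbd:4.1f}x slower)")
```

The gap shrinks from roughly 58x to 72x at 4k to about 10x at 1m, which fits Mark's observation that small requests suffer most in the filestore.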


Where is the bottleneck?
What is filestore doing?
How can I disable the journal and write only to the btrfs OSDs (as if they were SSDs)?
How can I get better performance?


Regards,
Dieter 

P.S. I will try to get test_filestore_workloadgen running.


On Fri, Jul 20, 2012 at 06:49:30AM -0500, Mark Nelson wrote:
> Hi George,
> 
> I think you may find that the limitation is in the filestore.
> It's one of the things I've been working on trying to track down as
> I've seen low performance on SSDs with small request sizes as well.
> You can use the test_filestore_workloadgen to specifically test the
> filestore code with small requests if you'd like.  I'm not sure if
> it is included with the binary distribution but it can be compiled
> if you download the src.  I think it's "make
> test_filestore_workloadgen" in the src directory.
> 
> Mark
> 
> On 7/20/12 5:48 AM, George Shuklin wrote:
> >On 20.07.2012 14:41, Dieter Kasper (KD) wrote:
> >
> >Good day.
> >
> >Thank you for your attention.
> >
> >ramdisk size ~70 GB (modprobe brd rd_size=70000000)
> >the journal seems to be on the same device as the storage
> >the OSD size was left unchanged (I created it manually and made no
> >specific changes)
> >
> >During the test I watched the IO load closely; IO on the MDS/MON was
> >insignificant (zero most of the time, with a few very mild peaks).
> >
> >Just in case, configs:
> >
> >ceph.conf:
> >
> >[osd]
> >         osd journal size = 1000
> >         filestore xattr use omap = true
> >
> >[mon.a]
> >         host = srv1
> >         mon addr = 192.168.0.1:6789
> >
> >[osd.0]
> >         host = srv1
> >
> >[mds.a]
> >         host = srv1
> >
> >fio.ini:
> >[test]
> >blocksize=4k
> >filename=/media/test
> >size=16g
> >fallocate=posix
> >rw=randread
> >direct=1
> >buffered=0
> >ioengine=libaio
> >iodepth=32
> >
> >
> >Thanks for the advice; I'll recheck with the new settings.
> >
> >>George,
> >>
> >>please share more details of your config:
> >>- RAM size of your system
> >>- location of the journal
> >>- size of your OSD
> >>
> >>Can you try (just for the 1st test) to
> >>.. put the journal on RAM disk
> >>.. put the MDS on RAM disk
> >>.. put the MON on RAM disk
> >>.. use btrfs for OSD
> >>
> >>As an alternative to isolate the bottleneck you can try to
> >>- run without a journal
> >>- use RBD instead of Ceph-FS
> >>   + create a file system on top of /dev/rbd0
> >>
> >>Regards,
> >>Dieter Kasper
> >>
> >>
> >>On Fri, Jul 20, 2012 at 12:24:15PM +0200, George Shuklin wrote:
> >>>Good day.
> >>>
> >>>I've started playing with Ceph and found some rather strange
> >>>performance issues. I'm not sure whether this is due to a Ceph
> >>>limitation or to my own bad setup.
> >>>
> >>>Setup:
> >>>
> >>>osd - xfs on ramdisk (only one osd)
> >>>mds - raid0 on 10 disks
> >>>mon - second raid0 on 10 disks
> >>>
> >>>I've mounted the Ceph share on localhost and run fio (randwrite, 4k,
> >>>iodepth=32).
> >>>
> >>>Result: 1900 IOPS on writes (4k blocks, 1 GB span).
> >>>
> >>>Normally fio shows about 200k write IOPS on the ramdisk.
> >>>
> >>>Why is it so slow? I did the setup exactly as described here:
> >>>http://ceph.com/docs/master/start/quick-start/#start-the-ceph-cluster
> >>>(but with one OSD).
> >>>
> >>>Thanks.
> >>>--
> >>>To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >>>the body of a message to majordomo@vger.kernel.org
> >>>More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 

[-- Attachment #2: ceph.conf --]
[-- Type: text/plain, Size: 864 bytes --]

[global]
        pid file = /var/run/ceph/$name.pid
        debug ms = 0
        auth supported = cephx
        keyring = /etc/ceph/keyring.client
[mon]
        mon data = /tmp/mon$id
[mon.a]
	host = localhost
	mon addr = 127.0.0.1:6789

[osd]
        journal dio = false
        osd data = /data/$name
	osd journal = /mnt/osd.journal/$name/journal
        osd journal size = 1000

        keyring = /etc/ceph/keyring.$name
        # debug osd = 20
        # debug ms = 1         ; message traffic
        # debug filestore = 20 ; local object storage
        # debug journal = 20   ; local journaling
        # debug monc = 5      ; monitor interaction, startup

[osd.0]
	host = localhost
        btrfs devs = /dev/ram0

[osd.1]
	host = localhost
        btrfs devs = /dev/ram1

[osd.2]
	host = localhost
        btrfs devs = /dev/ram2

[mds.a]
	host = localhost
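In ceph.conf, $name expands to the daemon's type.id (e.g. osd.0) and $id to its id alone. A minimal sketch applying that substitution to the paths in this attachment, to show where each OSD keeps its data and journal; the expand helper is hypothetical and only mimics Ceph's metavariable expansion, and the config does not say whether /mnt/osd.journal itself is on a RAM disk:

```python
# Path templates and per-OSD devices copied from the attached ceph.conf.
OSD_DATA = "/data/$name"
OSD_JOURNAL = "/mnt/osd.journal/$name/journal"
MON_DATA = "/tmp/mon$id"
BTRFS_DEVS = {"osd.0": "/dev/ram0", "osd.1": "/dev/ram1", "osd.2": "/dev/ram2"}

def expand(template: str, daemon: str) -> str:
    """Hypothetical helper: expand $name, $type and $id for a daemon like 'osd.0'."""
    dtype, did = daemon.split(".", 1)
    return (template.replace("$name", daemon)
                    .replace("$type", dtype)
                    .replace("$id", did))

for osd, dev in BTRFS_DEVS.items():
    print(f"{osd}: data={expand(OSD_DATA, osd)} (btrfs devs={dev}), "
          f"journal={expand(OSD_JOURNAL, osd)}")
print(f"mon.a: data={expand(MON_DATA, 'mon.a')}")
```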
