From: Mark Nelson <mnelson@redhat.com>
To: kernel neophyte <neophyte.hacker001@gmail.com>
Cc: ceph-devel <ceph-devel@vger.kernel.org>
Subject: Re: newstore performance update
Date: Wed, 29 Apr 2015 08:08:42 -0500
Message-ID: <5540D7DA.2000503@redhat.com>
In-Reply-To: <CAFkUHxe=e+1Hk7ChtF+HHFkQ-RtBNBhzJ=VCu+x_aZF4MyMiRA@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 3368 bytes --]

Hi,

ceph.conf file attached.  It's a little ugly because I've been playing 
with various parameters.  You'll probably want to enable debug newstore 
= 30 if you plan to do any debugging.  Also, the code has been changing 
quickly so performance may have changed if you haven't tested within the 
last week.

Mark

On 04/28/2015 09:59 PM, kernel neophyte wrote:
> Hi Mark,
>
> I am trying to measure 4k RW performance on Newstore, and I am not
> anywhere close to the numbers you are getting!
>
> Could you share your ceph.conf for these tests?
>
> -Neo
>
> On Tue, Apr 28, 2015 at 5:07 PM, Mark Nelson <mnelson@redhat.com> wrote:
>> Nothing official, though roughly from memory:
>>
>> ~1.7GB/s and something crazy like 100K IOPS for the SSD.
>>
>> ~150MB/s and ~125-150 IOPS for the spinning disk.
>>
>> Mark
>>
>>
>> On 04/28/2015 07:00 PM, Venkateswara Rao Jujjuri wrote:
>>>
>>> Thanks for sharing; the newstore numbers look a lot better.
>>>
>>> I'm wondering if we have any baseline numbers to put things into
>>> perspective, like what they are on XFS or on librados?
>>>
>>> JV
>>>
>>> On Tue, Apr 28, 2015 at 4:25 PM, Mark Nelson <mnelson@redhat.com> wrote:
>>>>
>>>> Hi Guys,
>>>>
>>>> Sage has been furiously working away at fixing bugs in newstore and
>>>> improving performance.  Specifically, we've been focused on write
>>>> performance, as newstore was previously lagging filestore by quite a
>>>> bit.  A lot of work has gone into implementing libaio behind the scenes,
>>>> and as a result performance on spinning disks with an SSD WAL (and
>>>> SSD-backed rocksdb) has improved pretty dramatically.  It's now often
>>>> beating filestore:
>>>>
>>>> http://nhm.ceph.com/newstore/newstore-5d96fe6-no_overlay.pdf
>>>>
>>>> On the other hand, sequential writes are slower than random writes when
>>>> the OSD, DB, and WAL are all on the same device, be it a spinning disk
>>>> or an SSD.  In this situation newstore does better with random writes
>>>> and sometimes beats filestore (such as in the everything-on-spinning-disk
>>>> tests, and when IO sizes are small in the everything-on-SSD tests).
>>>>
>>>> Newstore is changing daily, so keep in mind that these results are almost
>>>> assuredly going to change.  An interesting area of investigation will be
>>>> why sequential writes are slower than random writes, and whether or not
>>>> we are being limited by rocksdb ingest speed, and how.
>>>>
>>>> I've also uploaded a quick perf call-graph I grabbed during the "all-SSD"
>>>> 32KB sequential write test to see if rocksdb was starving one of the
>>>> cores, but found something that looks quite a bit different:
>>>>
>>>> http://nhm.ceph.com/newstore/newstore-5d96fe6-no_overlay.pdf
>>>>
>>>> Mark
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html

[-- Attachment #2: ceph.conf.1osd --]
[-- Type: text/plain, Size: 4221 bytes --]

[global]
        osd pool default size = 1

        osd crush chooseleaf type = 0
        enable experimental unrecoverable data corrupting features = newstore rocksdb
        osd objectstore = newstore
#        newstore aio max queue depth = 4096 
#        newstore overlay max length = 8388608 
#        rocksdb wal dir = "/wal"
#        newstore db path = "/wal"
        newstore overlay max = 0
        newstore_wal_threads = 8
        rocksdb_write_buffer_size = 536870912
        rocksdb_write_buffer_num = 4
        rocksdb_min_write_buffer_number_to_merge = 2
        rocksdb_log = /home/nhm/tmp/cbt/ceph/log/rocksdb.log
        rocksdb_max_background_compactions = 4
        rocksdb_compaction_threads = 4
        rocksdb_level0_file_num_compaction_trigger = 4
        rocksdb_max_bytes_for_level_base = 104857600  # 100MB
        rocksdb_target_file_size_base = 10485760      # 10MB
        rocksdb_num_levels = 3
        rocksdb_compression = none

        keyring = /home/nhm/tmp/cbt/ceph/keyring
        osd pg bits = 8  
        osd pgp bits = 8
        auth supported = none
        log to syslog = false
        log file = /home/nhm/tmp/cbt/ceph/log/$name.log
        filestore xattr use omap = true
        auth cluster required = none
        auth service required = none
        auth client required = none

        public network = 192.168.10.0/24
        cluster network = 192.168.10.0/24
        rbd cache = true
        osd scrub load threshold = 0.01
        osd scrub min interval = 137438953472
        osd scrub max interval = 137438953472
        osd deep scrub interval = 137438953472
        osd max scrubs = 16

        filestore merge threshold = 40
        filestore split multiple = 8
        osd op threads = 8

        debug newstore = "0/0" 

        debug_lockdep = "0/0" 
        debug_context = "0/0"
        debug_crush = "0/0"
        debug_mds = "0/0"
        debug_mds_balancer = "0/0"
        debug_mds_locker = "0/0"
        debug_mds_log = "0/0"
        debug_mds_log_expire = "0/0"
        debug_mds_migrator = "0/0"
        debug_buffer = "0/0"
        debug_timer = "0/0"
        debug_filer = "0/0"
        debug_objecter = "0/0"
        debug_rados = "0/0"
        debug_rbd = "0/0"
        debug_journaler = "0/0"
        debug_objectcacher = "0/0"
        debug_client = "0/0"
        debug_osd = "0/0"
        debug_optracker = "0/0"
        debug_objclass = "0/0"
        debug_filestore = "0/0"
        debug_journal = "0/0"
        debug_ms = "0/0"
        debug_mon = "0/0"
        debug_monc = "0/0"
        debug_paxos = "0/0"
        debug_tp = "0/0"
        debug_auth = "0/0"
        debug_finisher = "0/0"
        debug_heartbeatmap = "0/0"
        debug_perfcounter = "0/0"
        debug_rgw = "0/0"
        debug_hadoop = "0/0"
        debug_asok = "0/0"
        debug_throttle = "0/0"

        mon pg warn max object skew = 100000
        mon pg warn min per osd = 0
        mon pg warn max per osd = 32768


#        debug optracker = 30
#        debug tp = 5
#        objecter inflight op bytes = 1073741824
#        objecter inflight ops = 8192
 
#        filestore wbthrottle enable = false
#        debug osd = 20

#        filestore wbthrottle xfs ios start flusher = 500
#        filestore wbthrottle xfs ios hard limit = 5000
#        filestore wbthrottle xfs inodes start flusher = 500
#        filestore wbthrottle xfs inodes hard limit = 5000
#        filestore wbthrottle xfs bytes start flusher = 41943040
#        filestore wbthrottle xfs bytes hard limit = 419430400

#        filestore wbthrottle btrfs ios start flusher = 500
#        filestore wbthrottle btrfs ios hard limit = 5000
#        filestore wbthrottle btrfs inodes start flusher = 500
#        filestore wbthrottle btrfs inodes hard limit = 5000
#        filestore wbthrottle btrfs bytes start flusher = 41943040
#        filestore wbthrottle btrfs bytes hard limit = 419430400

[mon]
        mon data = /home/nhm/tmp/cbt/ceph/mon.$id

[mon.a]
        host = burnupiX
        mon addr = 127.0.0.1:6789

[osd.0]
        host = burnupiX
        osd data = /home/nhm/tmp/cbt/mnt/osd-device-0-data
        osd journal = /dev/disk/by-partlabel/osd-device-0-journal
#        osd journal = /dev/sds1
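
For readers trying to reproduce these settings, the raw byte and second counts in the attached ceph.conf are easier to reason about in human units. The following is only a rough sketch of that arithmetic, not part of the original attachment. It assumes that rocksdb_write_buffer_num is the maximum number of memtables rocksdb may hold at once, and that the scrub intervals are expressed in seconds; the helper name summarize() is hypothetical.

```python
# Values copied from the attached ceph.conf.1osd.
MIB = 1024 ** 2
GIB = 1024 ** 3
SECONDS_PER_YEAR = 365 * 24 * 3600

write_buffer_size = 536870912   # rocksdb_write_buffer_size (bytes)
write_buffer_num = 4            # rocksdb_write_buffer_num
level_base = 104857600          # rocksdb_max_bytes_for_level_base (bytes)
target_file_size = 10485760     # rocksdb_target_file_size_base (bytes)
scrub_interval = 137438953472   # osd scrub min/max interval (assumed seconds)


def summarize():
    """Convert the raw config values into human-readable units."""
    return {
        "write_buffer_mib": write_buffer_size // MIB,
        # Upper bound on memtable memory, ASSUMING write_buffer_num is the
        # maximum number of write buffers rocksdb keeps in memory.
        "max_memtable_gib": write_buffer_size * write_buffer_num / GIB,
        "level_base_mib": level_base // MIB,
        "target_file_mib": target_file_size // MIB,
        # Whole years; a value this large effectively disables scrubbing.
        "scrub_interval_years": scrub_interval // SECONDS_PER_YEAR,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value}")
```

Under those assumptions, each write buffer is 512 MiB, so the four buffers could occupy up to 2 GiB of memory, and the scrub intervals work out to thousands of years, which keeps scrubbing from interfering with the benchmark.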

