From: Mark Nelson <mnelson@redhat.com>
To: kernel neophyte <neophyte.hacker001@gmail.com>
Cc: ceph-devel <ceph-devel@vger.kernel.org>
Subject: Re: newstore performance update
Date: Wed, 29 Apr 2015 08:08:42 -0500
Message-ID: <5540D7DA.2000503@redhat.com>
In-Reply-To: <CAFkUHxe=e+1Hk7ChtF+HHFkQ-RtBNBhzJ=VCu+x_aZF4MyMiRA@mail.gmail.com>

Hi,

ceph.conf file attached.  It's a little ugly because I've been playing 
with various parameters.  You'll probably want to enable debug newstore 
= 30 if you plan to do any debugging.  Also, the code has been changing 
quickly so performance may have changed if you haven't tested within the 
last week.

Mark

On 04/28/2015 09:59 PM, kernel neophyte wrote:
> Hi Mark,
>
> I am trying to measure 4k RW performance on Newstore, and I am not
> anywhere close to the numbers you are getting!
>
> Could you share your ceph.conf for these tests?
>
> -Neo
>
> On Tue, Apr 28, 2015 at 5:07 PM, Mark Nelson <mnelson@redhat.com> wrote:
>> Nothing official, though roughly from memory:
>>
>> ~1.7GB/s and something crazy like 100K IOPS for the SSD.
>>
>> ~150MB/s and ~125-150 IOPS for the spinning disk.
>>
>> Mark
>>
>>
>> On 04/28/2015 07:00 PM, Venkateswara Rao Jujjuri wrote:
>>>
>>> Thanks for sharing; newstore numbers look a lot better.
>>>
>>> Wondering if we have any baseline numbers to put things into perspective,
>>> like what is it on XFS or on librados?
>>>
>>> JV
>>>
>>> On Tue, Apr 28, 2015 at 4:25 PM, Mark Nelson <mnelson@redhat.com> wrote:
>>>>
>>>> Hi Guys,
>>>>
>>>> Sage has been furiously working away at fixing bugs in newstore and
>>>> improving performance.  Specifically we've been focused on write
>>>> performance, as newstore was previously lagging filestore by quite a
>>>> bit.  A lot of work has gone into implementing libaio behind the scenes
>>>> and as a result performance on spinning disks with SSD WAL (and SSD
>>>> backed rocksdb) has improved pretty dramatically.  It's now often
>>>> beating filestore:
>>>>
>>>> http://nhm.ceph.com/newstore/newstore-5d96fe6-no_overlay.pdf
>>>>
>>>> On the other hand, sequential writes are slower than random writes when
>>>> the OSD, DB, and WAL are all on the same device, be it a spinning disk
>>>> or SSD.  In this situation newstore does better with random writes and
>>>> sometimes beats filestore (such as in the everything-on-spinning-disk
>>>> tests, and when IO sizes are small in the everything-on-ssd tests).
>>>>
>>>> Newstore is changing daily, so keep in mind that these results are
>>>> almost assuredly going to change.  An interesting area of investigation
>>>> will be why sequential writes are slower than random writes, and
>>>> whether (and how) we are being limited by rocksdb ingest speed.
>>>>
>>>> I've also uploaded a quick perf call-graph I grabbed during the "all-SSD"
>>>> 32KB sequential write test to see if rocksdb was starving one of the
>>>> cores, but found something that looks quite a bit different:
>>>>
>>>> http://nhm.ceph.com/newstore/newstore-5d96fe6-no_overlay.pdf
>>>>
>>>> Mark
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html

[-- Attachment #2: ceph.conf.1osd --]
[-- Type: text/plain, Size: 4221 bytes --]

[global]
        osd pool default size = 1

        osd crush chooseleaf type = 0
        enable experimental unrecoverable data corrupting features = newstore rocksdb
        osd objectstore = newstore
#        newstore aio max queue depth = 4096 
#        newstore overlay max length = 8388608 
#        rocksdb wal dir = "/wal"
#        newstore db path = "/wal"
        newstore overlay max = 0
        newstore_wal_threads = 8
        rocksdb_write_buffer_size = 536870912
        rocksdb_write_buffer_num = 4
        rocksdb_min_write_buffer_number_to_merge = 2
        rocksdb_log = /home/nhm/tmp/cbt/ceph/log/rocksdb.log
        rocksdb_max_background_compactions = 4
        rocksdb_compaction_threads = 4
        rocksdb_level0_file_num_compaction_trigger = 4
        rocksdb_max_bytes_for_level_base = 104857600  # 100 MiB
        rocksdb_target_file_size_base = 10485760      # 10 MiB
        rocksdb_num_levels = 3
        rocksdb_compression = none

        keyring = /home/nhm/tmp/cbt/ceph/keyring
        osd pg bits = 8
        osd pgp bits = 8
        auth supported = none
        log to syslog = false
        log file = /home/nhm/tmp/cbt/ceph/log/$name.log
        filestore xattr use omap = true
        auth cluster required = none
        auth service required = none
        auth client required = none

        public network = 192.168.10.0/24
        cluster network = 192.168.10.0/24
        rbd cache = true
        osd scrub load threshold = 0.01
        osd scrub min interval = 137438953472
        osd scrub max interval = 137438953472
        osd deep scrub interval = 137438953472
        osd max scrubs = 16

        filestore merge threshold = 40
        filestore split multiple = 8
        osd op threads = 8

        debug newstore = "0/0" 

        debug_lockdep = "0/0" 
        debug_context = "0/0"
        debug_crush = "0/0"
        debug_mds = "0/0"
        debug_mds_balancer = "0/0"
        debug_mds_locker = "0/0"
        debug_mds_log = "0/0"
        debug_mds_log_expire = "0/0"
        debug_mds_migrator = "0/0"
        debug_buffer = "0/0"
        debug_timer = "0/0"
        debug_filer = "0/0"
        debug_objecter = "0/0"
        debug_rados = "0/0"
        debug_rbd = "0/0"
        debug_journaler = "0/0"
        debug_objectcacher = "0/0"
        debug_client = "0/0"
        debug_osd = "0/0"
        debug_optracker = "0/0"
        debug_objclass = "0/0"
        debug_filestore = "0/0"
        debug_journal = "0/0"
        debug_ms = "0/0"
        debug_mon = "0/0"
        debug_monc = "0/0"
        debug_paxos = "0/0"
        debug_tp = "0/0"
        debug_auth = "0/0"
        debug_finisher = "0/0"
        debug_heartbeatmap = "0/0"
        debug_perfcounter = "0/0"
        debug_rgw = "0/0"
        debug_hadoop = "0/0"
        debug_asok = "0/0"
        debug_throttle = "0/0"

        mon pg warn max object skew = 100000
        mon pg warn min per osd = 0
        mon pg warn max per osd = 32768


#        debug optracker = 30
#        debug tp = 5
#        objecter inflight op bytes = 1073741824
#        objecter inflight ops = 8192
 
#        filestore wbthrottle enable = false
#        debug osd = 20

#        filestore wbthrottle xfs ios start flusher = 500
#        filestore wbthrottle xfs ios hard limit = 5000
#        filestore wbthrottle xfs inodes start flusher = 500
#        filestore wbthrottle xfs inodes hard limit = 5000
#        filestore wbthrottle xfs bytes start flusher = 41943040
#        filestore wbthrottle xfs bytes hard limit = 419430400

#        filestore wbthrottle btrfs ios start flusher = 500
#        filestore wbthrottle btrfs ios hard limit = 5000
#        filestore wbthrottle btrfs inodes start flusher = 500
#        filestore wbthrottle btrfs inodes hard limit = 5000
#        filestore wbthrottle btrfs bytes start flusher = 41943040
#        filestore wbthrottle btrfs bytes hard limit = 419430400

[mon]
        mon data = /home/nhm/tmp/cbt/ceph/mon.$id

[mon.a]
        host = burnupiX
        mon addr = 127.0.0.1:6789

[osd.0]
        host = burnupiX
        osd data = /home/nhm/tmp/cbt/mnt/osd-device-0-data
        osd journal = /dev/disk/by-partlabel/osd-device-0-journal
#        osd journal = /dev/sds1
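
Several values in the attached ceph.conf are raw byte or second counts, which are hard to read at a glance. The short Python sketch below converts them into friendlier units. The numbers are copied from the config above; the helper names are hypothetical and only for illustration. Two readings are assumptions rather than anything stated in the thread: that rocksdb_write_buffer_num acts as an upper bound on the number of in-memory write buffers (so size times count is a rough memtable ceiling), and that the very large scrub intervals are there to keep scrubbing from running during benchmarks.

```python
# Values copied from the attached ceph.conf; helper names are illustrative.
MIB = 1024 * 1024

BYTE_SETTINGS = {
    "rocksdb_write_buffer_size": 536870912,
    "rocksdb_max_bytes_for_level_base": 104857600,
    "rocksdb_target_file_size_base": 10485760,
    "newstore overlay max length (commented out)": 8388608,
}


def to_mib(num_bytes: int) -> float:
    """Convert a byte count to mebibytes (MiB)."""
    return num_bytes / MIB


def memtable_budget_bytes(buffer_size: int, buffer_count: int) -> int:
    """Rough upper bound on memtable memory: buffer size times buffer count.

    Assumption: rocksdb_write_buffer_num caps the number of write buffers.
    """
    return buffer_size * buffer_count


def seconds_to_years(seconds: int) -> float:
    """Convert seconds to years, using 365.25 days per year."""
    return seconds / (365.25 * 24 * 3600)


if __name__ == "__main__":
    for name, value in BYTE_SETTINGS.items():
        print(f"{name}: {to_mib(value):g} MiB")

    # rocksdb_write_buffer_size * rocksdb_write_buffer_num
    budget = memtable_budget_bytes(536870912, 4)
    print(f"approximate memtable ceiling: {to_mib(budget):g} MiB")

    # osd scrub min/max interval and osd deep scrub interval (2**37 seconds)
    years = seconds_to_years(137438953472)
    print(f"scrub interval: about {years:.0f} years")
```

Under those assumptions, the write buffers alone could account for roughly 2 GiB of memory per OSD, which is worth keeping in mind when reusing this file on a smaller test machine.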

