From: Mark Nelson <mnelson@redhat.com>
To: kernel neophyte <neophyte.hacker001@gmail.com>
Cc: ceph-devel <ceph-devel@vger.kernel.org>
Subject: Re: newstore performance update
Date: Wed, 29 Apr 2015 08:08:42 -0500
Message-ID: <5540D7DA.2000503@redhat.com>
In-Reply-To: <CAFkUHxe=e+1Hk7ChtF+HHFkQ-RtBNBhzJ=VCu+x_aZF4MyMiRA@mail.gmail.com>
Hi,
ceph.conf file attached. It's a little ugly because I've been playing
with various parameters. You'll probably want to set
debug newstore = 30 if you plan to do any debugging. Also, the code has
been changing quickly, so performance may have changed if you haven't
tested within the last week.
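If you would rather script that change than edit the file by hand, here is a minimal sketch. The helper `set_global_option` is hypothetical (not part of Ceph); it rewrites or adds an option in the [global] section of ceph.conf-style text, relying on the fact that Ceph treats spaces and underscores in option names as equivalent. It leaves commented-out lines alone and does not try to handle every ceph.conf syntax feature.

```python
import re


def _norm(name: str) -> str:
    """Normalize an option name: Ceph treats spaces and underscores alike."""
    return re.sub(r"[\s_]+", "_", name.strip().lower())


def set_global_option(conf_text: str, key: str, value: str) -> str:
    """Set `key = value` in the [global] section of ceph.conf-style text.

    The first active line for the same option (spelled with spaces or
    underscores) is replaced; commented-out lines ('#' or ';') are kept.
    If the option is absent, it is appended at the end of [global].
    """
    target = _norm(key)
    new_line = f"{key} = {value}"
    out: list[str] = []
    section = None
    global_end = None  # position in `out` just after the last [global] line
    done = False

    for line in conf_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            section = stripped[1:-1].strip().lower()
        elif (section == "global" and not done and "=" in stripped
              and not stripped.startswith(("#", ";"))
              and _norm(stripped.split("=", 1)[0]) == target):
            line = new_line  # replace the existing setting in place
            done = True
        out.append(line)
        if section == "global":
            global_end = len(out)

    if not done:
        if global_end is None:
            out = ["[global]", new_line] + out  # no [global] section yet
        else:
            out.insert(global_end, new_line)
    return "\n".join(out) + "\n"
```

Run over the attached config, `set_global_option(text, "debug newstore", "30")` would replace the `debug newstore = "0/0"` line with `debug newstore = 30` and leave everything else as-is.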
Mark
On 04/28/2015 09:59 PM, kernel neophyte wrote:
> Hi Mark,
>
> I am trying to measure 4k RW performance on Newstore, and I am not
> anywhere close to the numbers you are getting!
>
> Could you share your ceph.conf for these tests?
>
> -Neo
>
> On Tue, Apr 28, 2015 at 5:07 PM, Mark Nelson <mnelson@redhat.com> wrote:
>> Nothing official, though roughly from memory:
>>
>> ~1.7GB/s and something crazy like 100K IOPS for the SSD.
>>
>> ~150MB/s and ~125-150 IOPS for the spinning disk.
>>
>> Mark
>>
>>
>> On 04/28/2015 07:00 PM, Venkateswara Rao Jujjuri wrote:
>>>
>>> Thanks for sharing; the newstore numbers look a lot better.
>>>
>>> I'm wondering if we have any baseline numbers to put things into
>>> perspective, like what the numbers are on XFS or on librados?
>>>
>>> JV
>>>
>>> On Tue, Apr 28, 2015 at 4:25 PM, Mark Nelson <mnelson@redhat.com> wrote:
>>>>
>>>> Hi Guys,
>>>>
>>>> Sage has been furiously working away at fixing bugs in newstore and
>>>> improving performance. Specifically, we've been focused on write
>>>> performance, as newstore was previously lagging filestore by quite a
>>>> bit. A lot of work has gone into implementing libaio behind the
>>>> scenes, and as a result performance on spinning disks with an SSD WAL
>>>> (and SSD-backed rocksdb) has improved pretty dramatically. It's now
>>>> often beating filestore:
>>>>
>>>> http://nhm.ceph.com/newstore/newstore-5d96fe6-no_overlay.pdf
>>>>
>>>> On the other hand, sequential writes are slower than random writes
>>>> when the OSD, DB, and WAL are all on the same device, be it a spinning
>>>> disk or an SSD. In this situation newstore does better with random
>>>> writes and sometimes beats filestore (such as in the
>>>> everything-on-spinning-disk tests, and when IO sizes are small in the
>>>> everything-on-SSD tests).
>>>>
>>>> Newstore is changing daily, so keep in mind that these results are
>>>> almost assuredly going to change. An interesting area of investigation
>>>> will be why sequential writes are slower than random writes, and
>>>> whether (and how) we are being limited by rocksdb ingest speed.
>>>>
>>>> I've also uploaded a quick perf call-graph I grabbed during the
>>>> "all-SSD" 32KB sequential write test to see if rocksdb was starving
>>>> one of the cores, but found something that looks quite a bit different:
>>>>
>>>> http://nhm.ceph.com/newstore/newstore-5d96fe6-no_overlay.pdf
>>>>
>>>> Mark
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
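The rough figures quoted in this thread (~1.7GB/s and ~100K IOPS for the SSD, ~150MB/s and ~125-150 IOPS for the spinning disk) mix two metrics that are tied together only through the IO size: bandwidth = IOPS x IO size. The thread does not say which IO sizes produced each figure, so the sketch below simply assumes 4 KiB for the IOPS numbers (matching the 4k test mentioned above) and 4 MiB for the bandwidth numbers, and uses decimal MB for the output. It shows, for example, that 100K IOPS at 4 KiB is only about 410 MB/s, so the 1.7GB/s figure must come from larger IOs.

```python
KiB = 1024
MiB = 1024 * KiB


def throughput_bytes_per_s(iops: float, io_size_bytes: int) -> float:
    """Bandwidth implied by an IOPS figure at a fixed IO size."""
    return iops * io_size_bytes


def iops_for(bandwidth_bytes_per_s: float, io_size_bytes: int) -> float:
    """IOPS implied by a bandwidth figure at a fixed IO size."""
    return bandwidth_bytes_per_s / io_size_bytes


# Assumed IO sizes (not stated in the thread): 4 KiB for the IOPS figures,
# 4 MiB for the bandwidth figures.
for label, iops in [("SSD", 100_000), ("HDD", 150)]:
    mb_s = throughput_bytes_per_s(iops, 4 * KiB) / 1e6
    print(f"{label}: {iops} IOPS at 4 KiB ~ {mb_s:.1f} MB/s")

for label, bw in [("SSD", 1.7e9), ("HDD", 150e6)]:
    print(f"{label}: {bw / 1e6:.0f} MB/s at 4 MiB ~ {iops_for(bw, 4 * MiB):.0f} IOPS")
```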
[-- Attachment #2: ceph.conf.1osd --]
[-- Type: text/plain, Size: 4221 bytes --]
[global]
osd pool default size = 1
osd crush chooseleaf type = 0
enable experimental unrecoverable data corrupting features = newstore rocksdb
osd objectstore = newstore
# newstore aio max queue depth = 4096
# newstore overlay max length = 8388608
# rocksdb wal dir = "/wal"
# newstore db path = "/wal"
newstore overlay max = 0
newstore_wal_threads = 8
rocksdb_write_buffer_size = 536870912
rocksdb_write_buffer_num = 4
rocksdb_min_write_buffer_number_to_merge = 2
rocksdb_log = /home/nhm/tmp/cbt/ceph/log/rocksdb.log
rocksdb_max_background_compactions = 4
rocksdb_compaction_threads = 4
rocksdb_level0_file_num_compaction_trigger = 4
rocksdb_max_bytes_for_level_base = 104857600 # 100MB
rocksdb_target_file_size_base = 10485760 # 10MB
rocksdb_num_levels = 3
rocksdb_compression = none
keyring = /home/nhm/tmp/cbt/ceph/keyring
osd pg bits = 8
osd pgp bits = 8
auth supported = none
log to syslog = false
log file = /home/nhm/tmp/cbt/ceph/log/$name.log
filestore xattr use omap = true
auth cluster required = none
auth service required = none
auth client required = none
public network = 192.168.10.0/24
cluster network = 192.168.10.0/24
rbd cache = true
osd scrub load threshold = 0.01
osd scrub min interval = 137438953472
osd scrub max interval = 137438953472
osd deep scrub interval = 137438953472
osd max scrubs = 16
filestore merge threshold = 40
filestore split multiple = 8
osd op threads = 8
debug newstore = "0/0"
debug_lockdep = "0/0"
debug_context = "0/0"
debug_crush = "0/0"
debug_mds = "0/0"
debug_mds_balancer = "0/0"
debug_mds_locker = "0/0"
debug_mds_log = "0/0"
debug_mds_log_expire = "0/0"
debug_mds_migrator = "0/0"
debug_buffer = "0/0"
debug_timer = "0/0"
debug_filer = "0/0"
debug_objecter = "0/0"
debug_rados = "0/0"
debug_rbd = "0/0"
debug_journaler = "0/0"
debug_objectcacher = "0/0"
debug_client = "0/0"
debug_osd = "0/0"
debug_optracker = "0/0"
debug_objclass = "0/0"
debug_filestore = "0/0"
debug_journal = "0/0"
debug_ms = "0/0"
debug_mon = "0/0"
debug_monc = "0/0"
debug_paxos = "0/0"
debug_tp = "0/0"
debug_auth = "0/0"
debug_finisher = "0/0"
debug_heartbeatmap = "0/0"
debug_perfcounter = "0/0"
debug_rgw = "0/0"
debug_hadoop = "0/0"
debug_asok = "0/0"
debug_throttle = "0/0"
mon pg warn max object skew = 100000
mon pg warn min per osd = 0
mon pg warn max per osd = 32768
# debug optracker = 30
# debug tp = 5
# objecter inflight op bytes = 1073741824
# objecter inflight ops = 8192
# filestore wbthrottle enable = false
# debug osd = 20
# filestore wbthrottle xfs ios start flusher = 500
# filestore wbthrottle xfs ios hard limit = 5000
# filestore wbthrottle xfs inodes start flusher = 500
# filestore wbthrottle xfs inodes hard limit = 5000
# filestore wbthrottle xfs bytes start flusher = 41943040
# filestore wbthrottle xfs bytes hard limit = 419430400
# filestore wbthrottle btrfs ios start flusher = 500
# filestore wbthrottle btrfs ios hard limit = 5000
# filestore wbthrottle btrfs inodes start flusher = 500
# filestore wbthrottle btrfs inodes hard limit = 5000
# filestore wbthrottle btrfs bytes start flusher = 41943040
# filestore wbthrottle btrfs bytes hard limit = 419430400
[mon]
mon data = /home/nhm/tmp/cbt/ceph/mon.$id
[mon.a]
host = burnupiX
mon addr = 127.0.0.1:6789
[osd.0]
host = burnupiX
osd data = /home/nhm/tmp/cbt/mnt/osd-device-0-data
osd journal = /dev/disk/by-partlabel/osd-device-0-journal
# osd journal = /dev/sds1
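To put a few of the numbers in this config in perspective, here is a small sketch of the arithmetic they imply. It assumes that Ceph passes the rocksdb_* settings through to the RocksDB options of the same name (in particular, that rocksdb_write_buffer_num maps to RocksDB's max_write_buffer_number); the values themselves are copied from the [global] section above.

```python
MiB = 1024 * 1024
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Values copied from the [global] section of the attached ceph.conf.
# Assumption: rocksdb_write_buffer_num maps to RocksDB's max_write_buffer_number.
cfg = {
    "write_buffer_size": 536870912,
    "write_buffer_num": 4,
    "min_write_buffer_number_to_merge": 2,
    "max_bytes_for_level_base": 104857600,
    "target_file_size_base": 10485760,
    "osd_scrub_max_interval": 137438953472,  # seconds
}

# Worst case: every memtable is full at the same time.
memtable_budget = cfg["write_buffer_size"] * cfg["write_buffer_num"]
# Approximate size of a flush that merges min_write_buffer_number_to_merge full
# memtables (compression is off; keys overwritten across memtables shrink it).
merged_flush = cfg["write_buffer_size"] * cfg["min_write_buffer_number_to_merge"]
# Number of target-sized SST files that fit in L1.
l1_files = cfg["max_bytes_for_level_base"] // cfg["target_file_size_base"]
# The scrub intervals are set to a power of two, in seconds.
scrub_years = cfg["osd_scrub_max_interval"] / SECONDS_PER_YEAR

print(f"memtable budget: {memtable_budget // MiB} MiB")
print(f"merged flush:    {merged_flush // MiB} MiB")
print(f"L1 capacity:     {cfg['max_bytes_for_level_base'] // MiB} MiB (~{l1_files} files)")
print(f"scrub interval:  2**{cfg['osd_scrub_max_interval'].bit_length() - 1} s"
      f" ~ {scrub_years:,.0f} years")
```

In other words, with these settings the memtables alone may hold up to about 2 GiB, and the scrub intervals (2^37 seconds, roughly 4,355 years) are large enough that scheduled scrubbing effectively never runs during a benchmark.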