From mboxrd@z Thu Jan 1 00:00:00 1970
From: Mark Nelson
Subject: Re: newstore performance update
Date: Tue, 05 May 2015 12:43:11 -0500
Message-ID: <5549012F.3040407@redhat.com>
References: <554016E2.3000104@redhat.com> <6F3FA899187F0043BA1827A69DA2F7CC021E4894@shsmsx102.ccr.corp.intel.com> , <55422E0A.6010204@redhat.com> <554237F8.5070907@redhat.com> <5543923E.1020607@redhat.com> <5547B156.8060508@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Return-path:
Received: from mx1.redhat.com ([209.132.183.28]:37532 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755623AbbEERnR (ORCPT ); Tue, 5 May 2015 13:43:17 -0400
In-Reply-To:
Sender: ceph-devel-owner@vger.kernel.org
List-ID:
To: Sage Weil
Cc: "Chen, Xiaoxi" , "ceph-devel@vger.kernel.org"

On 05/04/2015 01:08 PM, Sage Weil wrote:
> On Mon, 4 May 2015, Mark Nelson wrote:
>> On 05/01/2015 07:33 PM, Sage Weil wrote:
>>
>> Ran through a bunch of tests on 0c728ccc over the weekend:
>>
>> http://nhm.ceph.com/newstore/5d96fe6f_vs_0c728ccc.pdf
>>
>> The good news is that sequential writes on spinning disks are looking
>> significantly better! We went from 40x slower than filestore for small
>> sequential IO to only about 30-40% slower and we become faster than filestore
>> at 64kb+ IO sizes.
>>
>> 128kb-2MB sequential writes with data on spinning disk and rocksdb on SSD
>> regressed. Newstore is no longer really any faster than filestore for those
>> IO sizes. We saw something similar for random IO, where spinning disk only
>> results improved and spinning disk + rocksdb on SSD regressed.
>>
>> With everything on SSD, we saw small sequential writes improve and nearly all
>> random writes regress. Not sure how much these regressions are due to
>> 0c728ccc vs other commits yet.
>
> That's surprising!
> I pushed a commit that makes this tunable,
>
> newstore sync submit transaction = false (default)
>
> Can you see if setting that to true (effectively reverting my last change)
> fixes the ssd regression?
>
> It may also be that this is a simple locking issue that we can fix in
> rocksdb. Again, the behavior I saw was that the db->submit_transaction()
> call would block until the sync commit (from kv_sync_thread) finished.
> I would expect rocksdb to be more careful about that, so maybe there is
> something else funny/subtle going on.
>
> sage
>

Ok, ran through new SSD tests and wasn't able to replicate the poor
random performance from 0c728ccc again.

http://nhm.ceph.com/newstore/sync_submit_transaction.pdf

Haven't dug into the blktrace or collectl data yet to see if there are
any interesting differences, but I'll try to look at that later if I get
a bit of free time.

The good news is that sync submit transaction = false seems to make a
pretty noticeable improvement with 8c8c5903 on an SSD backed newstore
OSD. At small IO sizes we appear to be doing better than filestore for
both random and sequential IO. Interestingly random writes still appear
to be faster than sequential writes when everything is on SSD! It looks
like the big remaining issue now is 64kb+ sized writes on SSD.

Mark