From mboxrd@z Thu Jan 1 00:00:00 1970
From: Mark Nelson
Subject: Re: Initial newstore vs filestore results
Date: Wed, 08 Apr 2015 09:38:46 -0500
Message-ID: <55253D76.1030409@redhat.com>
References: <5523F069.3000400@redhat.com> <55242D15.8080800@redhat.com> <55248856.1010808@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Return-path:
Received: from mx1.redhat.com ([209.132.183.28]:46262 "EHLO mx1.redhat.com"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S932180AbbDHOit (ORCPT ); Wed, 8 Apr 2015 10:38:49 -0400
In-Reply-To:
Sender: ceph-devel-owner@vger.kernel.org
List-ID:
To: Sage Weil
Cc: ceph-devel

On 04/07/2015 09:58 PM, Sage Weil wrote:
> On Tue, 7 Apr 2015, Mark Nelson wrote:
>> On 04/07/2015 02:16 PM, Mark Nelson wrote:
>>> On 04/07/2015 09:57 AM, Mark Nelson wrote:
>>>> Hi Guys,
>>>>
>>>> I ran some quick tests on Sage's newstore branch. So far given that
>>>> this is a prototype, things are looking pretty good imho. The 4MB
>>>> object rados bench read/write and small read performance looks
>>>> especially good. Keep in mind that this is not using the SSD journals
>>>> in any way, so 640MB/s sequential writes is actually really good
>>>> compared to filestore without SSD journals.
>>>>
>>>> small write performance appears to be fairly bad, especially in the RBD
>>>> case where it's small writes to larger objects. I'm going to sit down
>>>> and see if I can figure out what's going on. It's bad enough that I
>>>> suspect there's just something odd going on.
>>>>
>>>> Mark
>>>
>>> Seekwatcher/blktrace graphs of a 4 OSD cluster using newstore for those
>>> interested:
>>>
>>> http://nhm.ceph.com/newstore/
>>>
>>> Interestingly small object write/read performance with 4 OSDs was about
>>> 1/3-1/4 the speed of the same cluster with 36 OSDs.
>>>
>>> Note: Thanks Dan for fixing the directory column width!
>>>
>>> Mark
>>
>> New fio/librbd results using Sage's latest code that attempts to keep small
>> overwrite extents in the db. This is 4 OSD so not directly comparable to the
>> 36 OSD tests above, but does include seekwatcher graphs. Results in MB/s:
>>
>>          write    read     randw    randr
>> 4MB      57.9     319.6    55.2     285.9
>> 128KB    2.5      230.6    2.4      125.4
>> 4KB      0.46     55.65    1.11     3.56
>
> What would be very interesting would be to see the 4KB performance
> with the defaults (newstore overlay max = 32) vs overlays disabled
> (newstore overlay max = 0) and see if/how much it is helping.
>
> The latest branch also has open-by-handle. It's on by default (newstore
> open by handle = true). I think for most workloads it won't be very
> noticeable... I think there are two questions we need to answer though:
>
> 1) Does it have any impact on a creation workload (say, 4kb objects). It
> shouldn't, but we should confirm.

4KB objects via rados bench ok?

>
> 2) Does it impact small object random reads with a cold cache. I think to
> see the effect we'll probably need to pile a ton of objects into the
> store, drop caches, and then do random reads. In the best case the
> effect will be small, but hopefully noticeable: we should go from
> a directory lookup (1+ seeks) + inode lookup (1+ seek) + data
> read, to inode lookup (1+ seek) + data read. So, 3 -> 2 seeks best case?
> I'm not really sure what XFS is doing under the covers here...

So the above test process for RBD was basically:

1) create a configurable sized RBD volume (16GB in this case across 4 OSDs).
2) fill volume with 4MB writes to preallocate the blocks
3) repeat for each test:
   3a) drop cache and sync
   3b) Run the test

>
> sage
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
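
For anyone who wants to repeat the RBD test process described above (create the volume, prefill with 4MB writes, then sync/drop caches before each test), here is a rough sketch of how the run order could be scripted. This is only an illustration, not the harness that produced the numbers in this thread. The pool name, image name, and fio job names are made up, and it only builds and prints the command lines rather than running them. The cache drop would also need to happen on the OSD hosts, not just on the client, which this sketch does not handle.

```python
import shlex

# Hypothetical names; the thread does not say which pool or image was used.
POOL = "rbd"
IMAGE = "newstore-test"
VOLUME_MB = 16 * 1024  # 16GB volume, as in the test described above

# Block sizes and access patterns matching the rows/columns of the results table.
BLOCK_SIZES = ["4M", "128K", "4K"]
MODES = ["write", "read", "randwrite", "randread"]


def fio_cmd(name, rw, bs):
    """Build an fio command line that drives the image through librbd."""
    return [
        "fio", f"--name={name}", "--ioengine=rbd", "--clientname=admin",
        f"--pool={POOL}", f"--rbdname={IMAGE}",
        f"--rw={rw}", f"--bs={bs}",
    ]


def build_plan():
    """Return the ordered list of commands for one full benchmark pass."""
    plan = [
        # 1) create the volume (rbd takes --size in MB by default)
        ["rbd", "create", "--size", str(VOLUME_MB), f"{POOL}/{IMAGE}"],
        # 2) fill it with 4MB sequential writes so every block is allocated
        fio_cmd("prefill", "write", "4M"),
    ]
    # 3) for each test: sync and drop caches, then run the test
    for bs in BLOCK_SIZES:
        for rw in MODES:
            plan.append(["sh", "-c", "sync && echo 3 > /proc/sys/vm/drop_caches"])
            plan.append(fio_cmd(f"{rw}-{bs}", rw, bs))
    return plan


if __name__ == "__main__":
    # Dry run: print the commands in order instead of executing them.
    for argv in build_plan():
        print(shlex.join(argv))
```

Keeping the plan as a plain list of argv lists makes it easy to review the order before pointing it at a real cluster, and to swap the cache-drop step for something that runs on each OSD node.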