From mboxrd@z Thu Jan  1 00:00:00 1970
From: Mark Nelson <mark.nelson@inktank.com>
Subject: Re: [Share]Performance tunning on Ceph FileStore with SSD backend
Date: Wed, 09 Apr 2014 07:07:53 -0500
Message-ID: <53453819.1030100@inktank.com>
References: <CACJqLyb+D5n74OyjP4sRG2=G083GQ3BzEUt4i8iAdw6nEBfRJg@mail.gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Return-path: <ceph-devel-owner@vger.kernel.org>
Received: from mail-ie0-f181.google.com ([209.85.223.181]:35816 "EHLO
	mail-ie0-f181.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S932560AbaDIMHv (ORCPT
	<rfc822;ceph-devel@vger.kernel.org>); Wed, 9 Apr 2014 08:07:51 -0400
Received: by mail-ie0-f181.google.com with SMTP id tp5so2182674ieb.26
        for <ceph-devel@vger.kernel.org>; Wed, 09 Apr 2014 05:07:50 -0700 (PDT)
In-Reply-To: <CACJqLyb+D5n74OyjP4sRG2=G083GQ3BzEUt4i8iAdw6nEBfRJg@mail.gmail.com>
Sender: ceph-devel-owner@vger.kernel.org
List-ID: <ceph-devel.vger.kernel.org>
To: Haomai Wang <haomaiwang@gmail.com>, "ceph-devel@vger.kernel.org" <ceph-devel@vger.kernel.org>

On 04/09/2014 05:05 AM, Haomai Wang wrote:
> Hi all,
>

Hi Haomai!

> I would like to share some ideas about how to improve Ceph performance
> on SSDs. This is not very rigorous.

Aha, that's ok, but I'm going to pester you with lots of questions below. ;)

>
> Our SSDs are 500GB each, and each OSD owns one SSD (the journal is on
> the same SSD). The Ceph version is 0.67.5 (Dumpling).
>
> At first, we found three bottlenecks in FileStore:
> 1. fdcache_lock (changed in the Firefly release)
> 2. lfn_find in the omap_* methods
> 3. the DBObjectMap header
>
> According to my understanding and the docs in
> ObjectStore.h (https://github.com/ceph/ceph/blob/master/src/os/ObjectStore.h),
> I simply removed lfn_find from the omap_* methods and removed
> fdcache_lock. I'm not fully sure this change is correct, but it has
> worked well so far.

Yes, but I think it's interesting even if it's not safe!  Did you happen 
to test these things in isolation to see how much of a bottleneck each is?

>
> The DBObjectMap header patch is in the pull request queue and may be
> merged in the next feature merge window.
>
> With the changes above, we get a large improvement in disk
> utilization and benchmark results (3x-4x).

That's a pretty dramatic result!  What kind of tests did you perform 
where you observed the 3-4x difference?  Did you measure latency and 
iops/throughput?

>
> Next, we found that the fdcache size becomes the main bottleneck. For
> example, if the hot data range is 100GB, we need to cache 25000
> (100GB/4MB) fds. If the hot data range is 1TB, we need to cache 250000
> (1000GB/4MB) fds. As "filestore_fd_cache_size" increases, the cost of
> FDCache lookups and cache misses becomes too expensive to afford. The
> implementation of FDCache isn't O(1). So we only get high performance
> within the range the fdcache can cover (maybe 100GB with an fdcache
> size of 10240), and any data beyond the size of the fdcache is a
> disaster. If you want to cache more fds (an fdcache size of 102400),
> the FDCache implementation adds extra CPU cost to each op that can't
> be ignored.
>
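
To make the fd counts quoted above easy to rerun for other object sizes, here is a back-of-envelope sketch. It assumes one fd per object file and decimal units (1GB = 1000MB), which is what the 25000 and 250000 figures imply. The function name fds_needed is made up for illustration and is not a Ceph API.

```python
# Back-of-envelope only: decimal units, one cached fd per object file (assumption).
MB = 10**6
GB = 10**9


def fds_needed(hot_bytes, object_bytes):
    """Number of object files (and so cached fds) needed to cover the hot range."""
    # Ceiling division: a partially covered object still needs its own fd.
    return -(-hot_bytes // object_bytes)


for hot in (100 * GB, 1000 * GB):
    for obj in (4 * MB, 16 * MB):
        print(f"hot={hot // GB}GB object={obj // MB}MB fds={fds_needed(hot, obj)}")
```

With 4MB objects this reproduces the 25000 and 250000 figures; going to 16MB objects divides both by four.
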
> Because of the capacity of an SSD (several hundred GB), we tried
> increasing the RBD object size (to 16MB) so that fewer cached fds are
> needed. As for the FDCache implementation, we simply replaced
> SimpleLRU with RandomCache. Now we can set a much larger fdcache size
> (caching nearly all fds) with little overhead.
>
> With these, we achieve 3x-4x performance improvements on filestore with SSD.
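
For readers who have not seen a random-replacement cache, here is a minimal sketch of the general idea behind swapping an LRU for a random cache. It is not the RandomCache code from the patch; the class and method names are hypothetical. The point it illustrates is that a hit only reads a hash map and never reorders a recency list, and that eviction picks a random victim in O(1).

```python
import random


class RandomEvictionCache:
    """Bounded key -> value map that evicts a random entry when full.

    Illustration only; names are hypothetical, not taken from Ceph.
    """

    def __init__(self, capacity, rng=None):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._slots = {}  # key -> (index into _keys, value)
        self._keys = []   # dense list of keys, so a random victim is O(1)
        self._rng = rng if rng is not None else random.Random()

    def get(self, key):
        # A hit is a plain dict lookup: no recency bookkeeping to update.
        entry = self._slots.get(key)
        return None if entry is None else entry[1]

    def put(self, key, value):
        """Insert or update; returns the evicted (key, value) pair or None."""
        entry = self._slots.get(key)
        if entry is not None:
            self._slots[key] = (entry[0], value)
            return None
        evicted = None
        if len(self._keys) >= self.capacity:
            evicted = self._evict_random()
        self._slots[key] = (len(self._keys), value)
        self._keys.append(key)
        return evicted

    def _evict_random(self):
        idx = self._rng.randrange(len(self._keys))
        victim = self._keys[idx]
        last = self._keys.pop()
        if last != victim:
            # Move the last key into the hole left by the victim.
            self._keys[idx] = last
            self._slots[last] = (idx, self._slots[last][1])
        return victim, self._slots.pop(victim)[1]
```

In an fd cache the caller would close the fd returned by put() on eviction. The trade-off is that a random victim may be a hot entry, which matters less when the cache is sized to hold nearly all fds.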

I'm curious how much of an effect changing the RBD object size had 
before and after you applied the new FDCache implementation?

>
> Maybe there is something I missed or got wrong; I hope you can
> correct me. I hope this helps improve FileStore on SSDs and can be
> pushed into the master branch.
>