From: Mark Nelson <mark.nelson@inktank.com>
To: Haomai Wang <haomaiwang@gmail.com>,
	"ceph-devel@vger.kernel.org" <ceph-devel@vger.kernel.org>
Subject: Re: [Share]Performance tunning on Ceph FileStore with SSD backend
Date: Wed, 09 Apr 2014 07:07:53 -0500
Message-ID: <53453819.1030100@inktank.com>
In-Reply-To: <CACJqLyb+D5n74OyjP4sRG2=G083GQ3BzEUt4i8iAdw6nEBfRJg@mail.gmail.com>

On 04/09/2014 05:05 AM, Haomai Wang wrote:
> Hi all,
>

Hi Haomai!

> I would like to share some ideas about how to improve the performance of
> Ceph on SSDs. These notes are not very precise.

Aha, that's ok, but I'm going to pester you with lots of questions below. ;)

>
> Our SSDs are 500GB and each OSD owns one SSD (the journal is on the same SSD).
> The Ceph version is 0.67.5 (Dumpling).
>
> At first, we found three bottlenecks in FileStore:
> 1. fdcache_lock (changed in the Firefly release)
> 2. lfn_find in the omap_* methods
> 3. the DBObjectMap header
>
> According to my understanding and the docs in
> ObjectStore.h (https://github.com/ceph/ceph/blob/master/src/os/ObjectStore.h),
> I simply removed lfn_find from the omap_* methods and removed fdcache_lock.
> I'm not fully sure this change is correct, but it has worked well so far.

Yes, but I think it's interesting even if it's not safe!  Did you happen 
to test these things in isolation to see how much of a bottleneck each is?

>
> The DBObjectMap header patch is in the pull request queue and may be
> merged in the next feature merge window.
>
> With the changes above, we get a large performance improvement in disk
> utilization and benchmark results (3x-4x).

That's a pretty dramatic result!  What kind of tests did you perform
where you observed the 3-4x difference?  Did you measure latency and
IOPS/throughput?

>
> Next, we found that the fdcache size becomes the main bottleneck. For
> example, if the hot data range is 100GB, we need to cache 25000
> (100GB/4MB) fds. If the hot data range is 1TB, we need to cache 250000
> (1000GB/4MB) fds. As "filestore_fd_cache_size" increases, the cost of
> FDCache lookups and cache misses becomes too expensive to afford. The
> implementation of FDCache isn't O(1). So we only get high performance
> within the fdcache hit range (maybe 100GB with an fdcache size of 10240),
> and any data beyond the size of the fdcache is a disaster. If you want to
> cache more fds (an fdcache size of 102400), the FDCache implementation
> adds extra CPU cost (which can't be ignored) to each op.
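
A quick sanity check on the fd counts quoted above, as a minimal sketch. It assumes decimal units (1GB = 10^9 bytes, 1MB = 10^6 bytes), which is what the quoted 25000 and 250000 figures imply, and one fd per object file. The helper name `fds_needed` is hypothetical and is not part of Ceph.

```python
GB = 10**9  # decimal units, assumed to match the figures in the mail
MB = 10**6


def fds_needed(hot_bytes, object_bytes):
    """Object files (and so cached fds) needed to cover a hot data range.

    Rounds up, since a partially filled object still needs its own fd.
    """
    if object_bytes <= 0:
        raise ValueError("object_bytes must be positive")
    return -(-hot_bytes // object_bytes)  # ceiling division on integers


if __name__ == "__main__":
    for hot_gb in (100, 1000):
        for obj_mb in (4, 16):
            count = fds_needed(hot_gb * GB, obj_mb * MB)
            print(f"{hot_gb}GB hot range, {obj_mb}MB objects: {count} fds")
```

Under the same assumptions, moving from 4MB to 16MB objects cuts the fd count by a factor of four for the same hot range.
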
>
> Because of the capacity of the SSDs (several hundred GB), we tried
> increasing the RBD object size (to 16MB) so that fewer cached fds are
> needed. As for the FDCache implementation, we simply discarded SimpleLRU
> and introduced RandomCache. Now we can set a much larger fdcache size
> (caching nearly all fds) with little overhead.
>
> With these changes, we achieve a 3x-4x performance improvement on
> FileStore with SSDs.

I'm curious how much of an effect changing the RBD object size had 
before and after you applied the new FDCache implementation?
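
To make the cache discussion concrete, here is a minimal sketch of a fixed-capacity cache with random eviction, in the spirit of the RandomCache idea mentioned above. It is not Ceph's implementation: the class name, methods, and the dict-plus-list layout are all assumptions chosen for illustration. The point it shows is that lookups, inserts, and evictions can each be O(1) on average with no recency bookkeeping on a hit.

```python
import random


class RandomEvictionCache:
    """Hypothetical fixed-capacity cache that evicts a random entry when full."""

    def __init__(self, capacity, rng=None):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._rng = rng or random.Random()
        self._keys = []   # dense list of keys, so a random victim is O(1) to pick
        self._slots = {}  # key -> (index into _keys, cached value)

    def get(self, key):
        """Return the cached value or None; a hit updates no ordering state."""
        entry = self._slots.get(key)
        return None if entry is None else entry[1]

    def put(self, key, value):
        entry = self._slots.get(key)
        if entry is not None:
            # Key already cached: replace the value, keep its slot.
            self._slots[key] = (entry[0], value)
            return
        if len(self._keys) >= self.capacity:
            self._evict_random()
        self._slots[key] = (len(self._keys), value)
        self._keys.append(key)

    def _evict_random(self):
        # Swap the victim with the last key, then pop, so removal stays O(1).
        idx = self._rng.randrange(len(self._keys))
        victim = self._keys[idx]
        last = self._keys[-1]
        self._keys[idx] = last
        self._slots[last] = (idx, self._slots[last][1])
        self._keys.pop()
        del self._slots[victim]

    def __len__(self):
        return len(self._keys)
```

The trade-off in this sketch is that a hot entry can be evicted by chance, so it only makes sense when the capacity is close to the working set, which matches the "cache nearly all fds" setting described in the quoted mail.
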

>
> Maybe there is something I missed or got wrong; I hope you can
> correct me. I hope this helps improve FileStore on SSDs and can be
> pushed into the master branch.
>
