From: Mark Nelson
Subject: Re: [Share]Performance tunning on Ceph FileStore with SSD backend
Date: Wed, 09 Apr 2014 07:07:53 -0500
Message-ID: <53453819.1030100@inktank.com>
Sender: ceph-devel-owner@vger.kernel.org
To: Haomai Wang, "ceph-devel@vger.kernel.org"

On 04/09/2014 05:05 AM, Haomai Wang wrote:
> Hi all,
>

Hi Haomai!

> I would like to share some ideas about how to improve performance on
> ceph with SSD. They are not very precise.

Aha, that's ok, but I'm going to pester you with lots of questions
below. ;)

>
> Our SSD is 500GB and each OSD owns an SSD (the journal is on the same
> SSD). ceph version is 0.67.5 (Dumpling).
>
> At first, we found three bottlenecks in filestore:
> 1. fdcache_lock (changed in the Firefly release)
> 2. lfn_find in omap_* methods
> 3. DBObjectMap header
>
> According to my understanding and the docs in
> ObjectStore.h(https://github.com/ceph/ceph/blob/master/src/os/ObjectStore.h),
> I simply removed lfn_find in omap_* and fdcache_lock. I'm not fully
> sure of the correctness of this change, but it has worked well so far.

Yes, but I think it's interesting even if it's not safe! Did you happen
to test these things in isolation to see how much of a bottleneck each
is?

>
> The DBObjectMap header patch is in the pull request queue and may be
> merged in the next feature merge window.
>
> With the things above done, we get a large performance improvement in
> disk util and benchmark results (3x-4x).

That's a pretty dramatic result! What kind of tests did you perform
where you observed the 3-4x difference? Did you measure latency and
iops/throughput?

>
> Next, we found that the fdcache size becomes the main bottleneck. For
> example, if the hot data range is 100GB, we need 25000 (100GB/4MB) fds
> in the cache. If the hot data range is 1TB, we need 250000
> (1000GB/4MB) fds in the cache. As "filestore_fd_cache_size" increases,
> the cost of lookup (FDCache) and of cache misses becomes too expensive
> to afford. The implementation of FDCache isn't O(1). So we can only
> get high performance within the fdcache hit range (maybe 100GB with
> 10240 fdcache size), and any data exceeding the size of the fdcache
> will be a disaster. If you want to cache more fds (102400 fdcache
> size), the implementation of FDCache brings extra CPU cost (which
> can't be ignored) for each op.
>
> Because of the capacity of the SSD (several hundred GB), we tried to
> increase the size of the rbd object (16MB) so fewer cached fds are
> needed. As for the FDCache implementation, we simply discarded
> SimpleLRU and introduced RandomCache. Now we can set a much larger
> fdcache size (nearly caching all fds) with little overhead.
>
> With these, we achieve 3x-4x performance improvements on filestore
> with SSD.

I'm curious how much of an effect changing the RBD object size had
before and after you applied the new FDCache implementation?

>
> Maybe there is something I missed or got wrong; I hope you can
> correct me. I hope it can help to improve FileStore on SSD and be
> pushed into the master branch.
>
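
For reference, the fd arithmetic quoted above and the general idea of swapping an LRU for a random-eviction cache can be sketched roughly as below. This is only an illustration in Python, not the actual patch and not Ceph's FDCache code: `fds_needed` and `RandomEvictionCache` are hypothetical names, the sizes use 1 GB = 1000 MB to match the figures in the thread, and the cache assumes the caller closes whatever fd gets evicted.

```python
import random


def fds_needed(hot_data_gb, object_size_mb):
    """Object files (and so fds) needed to cover the hot data range.

    Assumes 1 GB = 1000 MB, matching the numbers quoted in the thread.
    """
    hot_mb = hot_data_gb * 1000
    return -(-hot_mb // object_size_mb)  # ceiling division


class RandomEvictionCache:
    """Hypothetical fixed-capacity cache that evicts a random entry.

    Lookup, insert and eviction are all O(1) on average: a dict maps
    each key to its slot in a flat list, and eviction swaps the victim
    with the last list element before popping it.
    """

    def __init__(self, capacity, rng=None):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._rng = rng or random.Random()
        self._slot = {}     # key -> index into self._entries
        self._entries = []  # list of [key, value] pairs

    def __len__(self):
        return len(self._entries)

    def get(self, key):
        """Return the cached value, or None on a miss."""
        i = self._slot.get(key)
        return None if i is None else self._entries[i][1]

    def put(self, key, value):
        """Insert or update; return the evicted (key, value) or None."""
        i = self._slot.get(key)
        if i is not None:
            self._entries[i][1] = value
            return None
        evicted = None
        if len(self._entries) >= self.capacity:
            evicted = self._evict_random()
        self._slot[key] = len(self._entries)
        self._entries.append([key, value])
        return evicted

    def _evict_random(self):
        i = self._rng.randrange(len(self._entries))
        victim = self._entries[i]
        last = self._entries.pop()
        if last is not victim:
            # Move the former last entry into the freed slot.
            self._entries[i] = last
            self._slot[last[0]] = i
        del self._slot[victim[0]]
        return tuple(victim)


if __name__ == "__main__":
    for size_mb in (4, 16):
        print(size_mb, "MB objects:", fds_needed(100, size_mb), "fds per 100GB")
```

Unlike an LRU, nothing is reordered on a hit, so a lookup is a single dict access. The trade-off is that a hot entry can be evicted by chance, which matters less when the cache is sized to hold nearly all fds.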