[Share]Performance tunning on Ceph FileStore with SSD backend


* [Share]Performance tunning on Ceph FileStore with SSD backend
@ 2014-04-09 10:05 Haomai Wang
  2014-04-09 12:07 ` Mark Nelson
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Haomai Wang @ 2014-04-09 10:05 UTC (permalink / raw)
  To: ceph-devel@vger.kernel.org

Hi all,

I would like to share some ideas about how to improve Ceph performance
on SSDs. These notes are not very rigorous.

Each of our SSDs is 500GB, and each OSD owns one SSD (the journal is on
the same SSD). The Ceph version is 0.67.5 (Dumpling).

At first, we found three bottlenecks in FileStore:
1. fdcache_lock (changed in the Firefly release)
2. lfn_find in the omap_* methods
3. The DBObjectMap header

Based on my understanding and the docs in
ObjectStore.h (https://github.com/ceph/ceph/blob/master/src/os/ObjectStore.h),
I simply removed lfn_find from the omap_* methods and removed
fdcache_lock. I'm not fully sure this change is correct, but it has
worked well so far.

The DBObjectMap header patch is in the pull request queue and may be
merged in the next feature merge window.

With the changes above, we get a large performance improvement in disk
utilization and benchmark results (3x-4x).

Next, we found that the fdcache size became the main bottleneck. For
example, if the hot data range is 100GB, we need to cache 25000
(100GB/4MB) fds. If the hot data range is 1TB, we need to cache 250000
(1000GB/4MB) fds. As "filestore_fd_cache_size" increases, the cost of
FDCache lookups and cache misses becomes too expensive to afford. The
FDCache implementation isn't O(1). So we only get high performance
within the range the fdcache covers (maybe 100GB with an fdcache size
of 10240), and once the hot data exceeds the fdcache size, performance
is a disaster. If you want to cache more fds (an fdcache size of
102400), the FDCache implementation adds extra CPU cost to each op
that can't be ignored.
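
The arithmetic above can be sketched as a small calculation. This is
only an illustration, assuming decimal units (1GB = 1000MB) as in the
figures quoted and that every object in the hot range is a full-size
file; `fds_needed` is a hypothetical helper, not a Ceph function.

```python
def fds_needed(hot_data_gb: int, object_size_mb: int = 4) -> int:
    """Return how many object files (and so cached fds) cover the hot data.

    Assumes decimal units (1 GB = 1000 MB), matching the figures in the
    message, and that each object in the hot range is a full-size file.
    """
    if hot_data_gb < 0 or object_size_mb <= 0:
        raise ValueError("sizes must be non-negative / positive")
    hot_data_mb = hot_data_gb * 1000
    # Round up so a partial trailing object still counts as one file.
    return -(-hot_data_mb // object_size_mb)


if __name__ == "__main__":
    for gb in (100, 1000):
        print(f"{gb} GB hot data -> {fds_needed(gb)} fds with 4 MB objects")
```

Passing a larger `object_size_mb` shows how the fd count shrinks in
proportion to the object size.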

Because of the SSD capacity (several hundred GB), we tried increasing
the rbd object size to 16MB so that fewer cached fds are needed. As
for the FDCache implementation, we simply dropped SimpleLRU and
introduced RandomCache. Now we can set a much larger fdcache size
(caching nearly all fds) with little overhead.
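
To illustrate the general idea of a random-eviction cache, here is a
minimal sketch. It is not the RandomCache code from the patch, which
is not shown in this thread; the class name and methods are
hypothetical. It assumes a dict plus a dense key list, so that lookup,
insert, and picking a random victim are all O(1) on average.

```python
import random


class RandomEvictionCache:
    """Fixed-capacity cache that evicts a randomly chosen entry when full.

    Illustrative only: a real fd cache would also have to close the
    evicted file descriptor and be made thread-safe.
    """

    def __init__(self, capacity, rng=None):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._values = {}  # key -> cached value
        self._index = {}   # key -> position of the key in self._keys
        self._keys = []    # dense list of keys, for O(1) random choice
        self._rng = rng or random.Random()

    def get(self, key):
        """Return the cached value, or None on a miss."""
        return self._values.get(key)

    def put(self, key, value):
        """Insert or update an entry, evicting a random one if full."""
        if key in self._values:
            self._values[key] = value
            return
        if len(self._keys) >= self.capacity:
            self._evict_random()
        self._index[key] = len(self._keys)
        self._keys.append(key)
        self._values[key] = value

    def _evict_random(self):
        pos = self._rng.randrange(len(self._keys))
        victim = self._keys[pos]
        last = self._keys.pop()
        if pos < len(self._keys):
            # Move the former last key into the hole left by the victim.
            self._keys[pos] = last
            self._index[last] = pos
        del self._index[victim]
        del self._values[victim]

    def __len__(self):
        return len(self._keys)
```

Unlike an LRU, nothing is reordered on a hit, so `get` is a plain
dictionary lookup. The trade-off is that a hot entry can occasionally
be evicted, which matters less when the cache is sized to hold nearly
all fds.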

With these changes, we achieved a 3x-4x performance improvement on
FileStore with SSDs.

I may have missed something or gotten something wrong; please correct
me if so. I hope this helps to improve FileStore on SSDs and can be
pushed into the master branch.

-- 

Best Regards,

Wheat


* Re: [Share]Performance tunning on Ceph FileStore with SSD backend
  2014-04-09 10:05 [Share]Performance tunning on Ceph FileStore with SSD backend Haomai Wang
@ 2014-04-09 12:07 ` Mark Nelson
  2014-04-09 12:08 ` Alexandre DERUMIER
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 15+ messages in thread
From: Mark Nelson @ 2014-04-09 12:07 UTC (permalink / raw)
  To: Haomai Wang, ceph-devel@vger.kernel.org

On 04/09/2014 05:05 AM, Haomai Wang wrote:
> Hi all,
>

Hi Haomai!

> I would like to share some ideas about how to improve performance on
> ceph with SSD. Not much preciseness.

Aha, that's ok, but I'm going to pester you with lots of questions below. ;)

>
> Our ssd is 500GB and each OSD own a SSD(journal is on the same SSD).
> ceph version is 0.67.5(Dumping)
>
> At first, we find three bottleneck on filestore:
> 1. fdcache_lock(changed in Firely release)
> 2. lfn_find in omap_* methods
> 3. DBObjectMap header
>
> According to my understanding and the docs in
> ObjectStore.h(https://github.com/ceph/ceph/blob/master/src/os/ObjectStore.h),
> I simply remove lfn_find in omap_* and fdcache_lock. I'm not fully
> sure the correctness of this change, but it works well still now.

Yes, but I think it's interesting even if it's not safe!  Did you happen 
to test these things in isolation to see how much of a bottleneck each is?

>
> DBObjectMap header patch is on the pull request queue and may be
> merged in the next feature merge window.
>
> With things above done, we get much performance improvement in disk
> util and benchmark results(3x-4x).

That's a pretty dramatic result!  What kind of tests did you perform 
where you observed the 3-4x difference?  Did you measure latency and 
iops/throughput?

>
> Next, we find fdcache size become the main bottleneck. For example, if
> hot data range is 100GB, we need 25000(100GB/4MB) fd to cache. If hot
> data range is 1TB, we need 250000(1000GB/4MB) fd to cache. With
> increase "filestore_fd_cache_size", the cost of lookup(FDCache) and
> cache miss is expensive and can't be afford. The implementation of
> FDCache isn't O(1). So we only can get high performance on fdcache hit
> range(maybe 100GB with 10240 fdcache size) and more data exceed the
> size of fdcaceh will be disaster. If you want to cache more fd(102400
> fdcache size), the implementation of FDCache will bring on extra CPU
> cost(can't be ignore) for each op.
>
> Because of the capacity of SSD(several hundreds GB), we try to
> increase the size of rbd object(16MB) so less fd cache is needed. As
> for FDCache implementation, we simply discard SimpleLRU but introduce
> RandomCache. Now we can set much larger fdcache size(near cache all
> fd) with little overload.
>
> With these, we achieve 3x-4x performance improvements on filestore with SSD.

I'm curious how much of an effect changing the RBD object size had 
before and after you applied the new FDCache implementation?

>
> Maybe it exists something I missed or something wrong, hope can
> correct me. I hope it can help to improve FileStore on SSD and push
> into master branch.
>


* Re: [Share]Performance tunning on Ceph FileStore with SSD backend
  2014-04-09 10:05 [Share]Performance tunning on Ceph FileStore with SSD backend Haomai Wang
  2014-04-09 12:07 ` Mark Nelson
@ 2014-04-09 12:08 ` Alexandre DERUMIER
  2014-04-09 14:10   ` Sebastien Han
  2014-04-09 14:15 ` Gregory Farnum
  2014-05-26 20:29 ` Stefan Priebe
  3 siblings, 1 reply; 15+ messages in thread
From: Alexandre DERUMIER @ 2014-04-09 12:08 UTC (permalink / raw)
  To: Haomai Wang; +Cc: ceph-devel

Hi,

Thanks for sharing!
(I'm also looking to build a full-SSD cluster, with 1TB SSDs.)

>>With these, we achieve 3x-4x performance improvements on filestore with SSD. 

Do you have IOPS benchmark numbers from before and after?


* Re: [Share]Performance tunning on Ceph FileStore with SSD backend
  2014-04-09 12:08 ` Alexandre DERUMIER
@ 2014-04-09 14:10   ` Sebastien Han
  0 siblings, 0 replies; 15+ messages in thread
From: Sebastien Han @ 2014-04-09 14:10 UTC (permalink / raw)
  To: Alexandre DERUMIER; +Cc: Haomai Wang, ceph-devel@vger.kernel.org


Hey Haomai,

Nice work! By any chance, do you have a branch that contains all the changes you’ve made,
so that people can try them themselves? :)

I look forward to reading more results :)

Thanks!

Cheers.
–––– 
Sébastien Han 
Cloud Engineer 

"Always give 100%. Unless you're giving blood."

Phone: +33 (0)1 49 70 99 72 
Mail: sebastien.han@enovance.com 
Address : 11 bis, rue Roquépine - 75008 Paris
Web : www.enovance.com - Twitter : @enovance 


* Re: [Share]Performance tunning on Ceph FileStore with SSD backend
  2014-04-09 10:05 [Share]Performance tunning on Ceph FileStore with SSD backend Haomai Wang
  2014-04-09 12:07 ` Mark Nelson
  2014-04-09 12:08 ` Alexandre DERUMIER
@ 2014-04-09 14:15 ` Gregory Farnum
  2014-04-11  6:04   ` Alexandre DERUMIER
  2014-05-26 20:29 ` Stefan Priebe
  3 siblings, 1 reply; 15+ messages in thread
From: Gregory Farnum @ 2014-04-09 14:15 UTC (permalink / raw)
  To: Haomai Wang; +Cc: ceph-devel@vger.kernel.org

On Wed, Apr 9, 2014 at 3:05 AM, Haomai Wang <haomaiwang@gmail.com> wrote:
> Hi all,
>
> I would like to share some ideas about how to improve performance on
> ceph with SSD. Not much preciseness.
>
> Our ssd is 500GB and each OSD own a SSD(journal is on the same SSD).
> ceph version is 0.67.5(Dumping)
>
> At first, we find three bottleneck on filestore:
> 1. fdcache_lock(changed in Firely release)
> 2. lfn_find in omap_* methods
> 3. DBObjectMap header
>
> According to my understanding and the docs in
> ObjectStore.h(https://github.com/ceph/ceph/blob/master/src/os/ObjectStore.h),
> I simply remove lfn_find in omap_* and fdcache_lock. I'm not fully
> sure the correctness of this change, but it works well still now.

"Simply remove"? I don't remember all the details, but I'm sure
there's more to it than that if you want things to behave.

> DBObjectMap header patch is on the pull request queue and may be
> merged in the next feature merge window.
>
> With things above done, we get much performance improvement in disk
> util and benchmark results(3x-4x).
>
> Next, we find fdcache size become the main bottleneck. For example, if
> hot data range is 100GB, we need 25000(100GB/4MB) fd to cache. If hot
> data range is 1TB, we need 250000(1000GB/4MB) fd to cache. With
> increase "filestore_fd_cache_size", the cost of lookup(FDCache) and
> cache miss is expensive and can't be afford. The implementation of
> FDCache isn't O(1). So we only can get high performance on fdcache hit
> range(maybe 100GB with 10240 fdcache size) and more data exceed the
> size of fdcaceh will be disaster. If you want to cache more fd(102400
> fdcache size), the implementation of FDCache will bring on extra CPU
> cost(can't be ignore) for each op.

From explorations we and others have done, I think what we really want
to do here is make it cheaper to lookup and open files. The FileStore
is very much not optimized for this; a single lookup involves
constructing the path from its components multiple times and I think
even does the lookups more than once.
Also, 250k or even 25k file descriptors is an awful lot to demand. ;)

-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


* Re: [Share]Performance tunning on Ceph FileStore with SSD backend
  2014-04-09 14:15 ` Gregory Farnum
@ 2014-04-11  6:04   ` Alexandre DERUMIER
  2014-04-11  8:41     ` Haomai Wang
  0 siblings, 1 reply; 15+ messages in thread
From: Alexandre DERUMIER @ 2014-04-11  6:04 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: ceph-devel, Haomai Wang

>>From explorations we and others have done, I think what we really want
>>to do here is make it cheaper to lookup and open files. The FileStore
>>is very much not optimized for this; a single lookup involves
>>constructing the path from its components multiple times and I think
>>even does the lookups more than once.
>>Also, 250k or even 25k file descriptors is an awful lot to demand. ;)

Could the upcoming LevelDB backend store help in this specific case?




* Re: [Share]Performance tunning on Ceph FileStore with SSD backend
  2014-04-11  6:04   ` Alexandre DERUMIER
@ 2014-04-11  8:41     ` Haomai Wang
  0 siblings, 0 replies; 15+ messages in thread
From: Haomai Wang @ 2014-04-11  8:41 UTC (permalink / raw)
  To: Alexandre DERUMIER; +Cc: Gregory Farnum, ceph-devel@vger.kernel.org

Not fully. An object header is also needed in KeyValueStore, but it is
more lightweight.

https://github.com/ceph/ceph/pull/1649



-- 
Best Regards,

Wheat

* Re: [Share]Performance tunning on Ceph FileStore with SSD backend
  2014-04-09 10:05 [Share]Performance tunning on Ceph FileStore with SSD backend Haomai Wang
                   ` (2 preceding siblings ...)
  2014-04-09 14:15 ` Gregory Farnum
@ 2014-05-26 20:29 ` Stefan Priebe
  2014-05-27  4:42   ` Haomai Wang
  2014-05-27  4:46   ` Haomai Wang
  3 siblings, 2 replies; 15+ messages in thread
From: Stefan Priebe @ 2014-05-26 20:29 UTC (permalink / raw)
  To: Haomai Wang, ceph-devel@vger.kernel.org

Hi Haomai,

Regarding the FDCache problems you're seeing: isn't this branch
interesting for you? Have you ever tested it?

http://lists.ceph.com/pipermail/ceph-commit-ceph.com/2014-January/007399.html

Greets,
Stefan


* Re: [Share]Performance tunning on Ceph FileStore with SSD backend
  2014-05-26 20:29 ` Stefan Priebe
@ 2014-05-27  4:42   ` Haomai Wang
  2014-05-27  6:05     ` Stefan Priebe - Profihost AG
  2014-05-27  4:46   ` Haomai Wang
  1 sibling, 1 reply; 15+ messages in thread
From: Haomai Wang @ 2014-05-27  4:42 UTC (permalink / raw)
  To: Stefan Priebe; +Cc: ceph-devel@vger.kernel.org

On Tue, May 27, 2014 at 4:29 AM, Stefan Priebe <s.priebe@profihost.ag> wrote:
> Hi Haomai,
>
> regarding the FDCache problems you're seeing. Isn't this branch interesting
> for you? Have you ever tested it?
>
> http://lists.ceph.com/pipermail/ceph-commit-ceph.com/2014-January/007399.html
>

Yes, I noticed it, but my main job is improving performance on version
0.67.5. Before this branch, my fix for this problem was to avoid
lfn_find in the omap* methods of the FileStore class
(https://www.mail-archive.com/ceph-devel@vger.kernel.org/msg18505.html).




-- 
Best Regards,

Wheat


* Re: [Share]Performance tunning on Ceph FileStore with SSD backend
  2014-05-26 20:29 ` Stefan Priebe
  2014-05-27  4:42   ` Haomai Wang
@ 2014-05-27  4:46   ` Haomai Wang
  1 sibling, 0 replies; 15+ messages in thread
From: Haomai Wang @ 2014-05-27  4:46 UTC (permalink / raw)
  To: Stefan Priebe; +Cc: ceph-devel@vger.kernel.org

On Tue, May 27, 2014 at 4:29 AM, Stefan Priebe <s.priebe@profihost.ag> wrote:
> Hi Haomai,
>
> regarding the FDCache problems you're seeing. Isn't this branch interesting
> for you? Have you ever tested it?
>
> http://lists.ceph.com/pipermail/ceph-commit-ceph.com/2014-January/007399.html
>

I haven't tested the performance improvements with this branch.

> Greets,
> Stefan
>
> Am 09.04.2014 12:05, schrieb Haomai Wang:
>
>> Hi all,
>>
>> I would like to share some ideas about how to improve performance on
>> Ceph with SSDs. The numbers here are not very precise.
>>
>> Each of our SSDs is 500GB and each OSD owns one SSD (the journal is on
>> the same SSD). The Ceph version is 0.67.5 (Dumpling).
>>
>> At first, we found three bottlenecks in FileStore:
>> 1. fdcache_lock (changed in the Firefly release)
>> 2. lfn_find in the omap_* methods
>> 3. The DBObjectMap header
>>
>> According to my understanding and the docs in
>> ObjectStore.h (https://github.com/ceph/ceph/blob/master/src/os/ObjectStore.h),
>> I simply removed lfn_find in omap_* and fdcache_lock. I'm not fully
>> sure about the correctness of this change, but it has worked well so far.
>>
>> The DBObjectMap header patch is in the pull request queue and may be
>> merged in the next feature merge window.
>>
>> With the changes above, we get a large performance improvement in disk
>> utilization and benchmark results (3x-4x).
>>
>> Next, we found that the fdcache size became the main bottleneck. For
>> example, if the hot data range is 100GB, we need to cache 25000
>> (100GB/4MB) fds. If the hot data range is 1TB, we need to cache 250000
>> (1000GB/4MB) fds. As "filestore_fd_cache_size" increases, the cost of
>> FDCache lookups and cache misses becomes too expensive to afford. The
>> implementation of FDCache isn't O(1). So we only get high performance
>> within the fdcache hit range (maybe 100GB with an fdcache size of
>> 10240), and any data beyond the size of the fdcache is a disaster. If
>> you want to cache more fds (an fdcache size of 102400), the FDCache
>> implementation adds extra CPU cost (which can't be ignored) to each op.
>>
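
As a rough illustration of the arithmetic in the paragraph above: the number
of fds needed to cover a hot data range is the range divided by the object
size. The helper name fds_needed and the use of decimal units (1GB taken as
1000MB, which is what yields the 25000 figure) are assumptions made for this
sketch; nothing here comes from the Ceph source.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not a Ceph function): how many objects, and hence
// cached fds, are needed to cover a hot data range. Both arguments are in
// MB; the result is rounded up so a partial object still counts as one fd.
std::uint64_t fds_needed(std::uint64_t hot_range_mb,
                         std::uint64_t object_size_mb) {
    assert(object_size_mb > 0);
    return (hot_range_mb + object_size_mb - 1) / object_size_mb;
}
```

With these assumptions, fds_needed(100 * 1000, 4) gives 25000 and
fds_needed(1000 * 1000, 4) gives 250000, matching the figures quoted above,
while a 16MB object size brings the 100GB case down to 6250.
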
>> Because of the capacity of an SSD (several hundred GB), we tried
>> increasing the rbd object size (to 16MB) so that fewer cached fds are
>> needed. As for the FDCache implementation, we simply dropped SimpleLRU
>> and introduced RandomCache. Now we can set a much larger fdcache size
>> (caching nearly all fds) with little overhead.
>>
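
To make the idea of a random-replacement cache concrete, here is a minimal
sketch. It is not the RandomCache patch discussed in this thread; the class
name RandomEvictionCache, its interface, and the choice of a hash map plus a
slot vector are all assumptions for illustration. The point it shows is that
a hit needs no ordering update (an LRU has to move the entry on every hit),
and eviction picks a random victim in O(1).

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical random-eviction cache; not the Ceph RandomCache class.
template <typename K, typename V>
class RandomEvictionCache {
public:
    explicit RandomEvictionCache(std::size_t capacity, unsigned seed = 0)
        : capacity_(capacity), rng_(seed) {
        assert(capacity_ > 0);
    }

    // A hit only reads the hash index; no recency list is touched.
    bool lookup(const K& key, V* out) const {
        auto it = index_.find(key);
        if (it == index_.end()) return false;
        *out = slots_[it->second].second;
        return true;
    }

    void insert(const K& key, V value) {
        auto it = index_.find(key);
        if (it != index_.end()) {  // key already cached: overwrite in place
            slots_[it->second].second = std::move(value);
            return;
        }
        if (slots_.size() >= capacity_) evict_random();
        index_[key] = slots_.size();
        slots_.emplace_back(key, std::move(value));
    }

    std::size_t size() const { return slots_.size(); }

private:
    void evict_random() {
        std::uniform_int_distribution<std::size_t> pick(0, slots_.size() - 1);
        const std::size_t victim = pick(rng_);
        index_.erase(slots_[victim].first);
        // Swap-and-pop: move the last slot into the hole so removal is O(1).
        if (victim != slots_.size() - 1) {
            slots_[victim] = std::move(slots_.back());
            index_[slots_[victim].first] = victim;
        }
        slots_.pop_back();
    }

    std::size_t capacity_;
    std::mt19937 rng_;
    std::vector<std::pair<K, V>> slots_;        // cached entries
    std::unordered_map<K, std::size_t> index_;  // key -> position in slots_
};
```

The trade-off in this sketch is that a hot entry can be evicted by chance,
which matters less when the cache is sized to hold nearly all fds, as
described above.
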
>> With these, we achieve a 3x-4x performance improvement on FileStore with
>> SSDs.
>>
>> Maybe there is something I missed or got wrong; I hope you can
>> correct me. I hope this can help improve FileStore on SSDs and be pushed
>> into the master branch.
>>
>



-- 
Best Regards,

Wheat


* Re: [Share]Performance tunning on Ceph FileStore with SSD backend
  2014-05-27  4:42   ` Haomai Wang
@ 2014-05-27  6:05     ` Stefan Priebe - Profihost AG
  2014-05-27  6:37       ` Haomai Wang
  0 siblings, 1 reply; 15+ messages in thread
From: Stefan Priebe - Profihost AG @ 2014-05-27  6:05 UTC (permalink / raw)
  To: Haomai Wang; +Cc: ceph-devel@vger.kernel.org

Am 27.05.2014 06:42, schrieb Haomai Wang:
> On Tue, May 27, 2014 at 4:29 AM, Stefan Priebe <s.priebe@profihost.ag> wrote:
>> Hi Haomai,
>>
>> regarding the FDCache problems you're seeing. Isn't this branch interesting
>> for you? Have you ever tested it?
>>
>> http://lists.ceph.com/pipermail/ceph-commit-ceph.com/2014-January/007399.html
>>
> 
> Yes, I noticed it. But my main job is improving performance on the 0.67.5
> version. Before this branch, my improvement for this problem was to avoid
> lfn_find in the omap* methods of the FileStore
> class (https://www.mail-archive.com/ceph-devel@vger.kernel.org/msg18505.html)

Does "avoid" mean you just removed them? Are they not needed? Do you have
a branch for this?


* Re: [Share]Performance tunning on Ceph FileStore with SSD backend
  2014-05-27  6:05     ` Stefan Priebe - Profihost AG
@ 2014-05-27  6:37       ` Haomai Wang
  2014-05-27  6:45         ` Stefan Priebe - Profihost AG
  0 siblings, 1 reply; 15+ messages in thread
From: Haomai Wang @ 2014-05-27  6:37 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG; +Cc: ceph-devel@vger.kernel.org

I'm not fully sure about the correctness of the changes, although they
seem OK to me. I have applied these changes to our production environment
and have seen no problems.

On Tue, May 27, 2014 at 2:05 PM, Stefan Priebe - Profihost AG
<s.priebe@profihost.ag> wrote:
> Am 27.05.2014 06:42, schrieb Haomai Wang:
>> On Tue, May 27, 2014 at 4:29 AM, Stefan Priebe <s.priebe@profihost.ag> wrote:
>>> Hi Haomai,
>>>
>>> regarding the FDCache problems you're seeing. Isn't this branch interesting
>>> for you? Have you ever tested it?
>>>
>>> http://lists.ceph.com/pipermail/ceph-commit-ceph.com/2014-January/007399.html
>>>
>>
>> Yes, I noticed it. But my main job is improving performance on the 0.67.5
>> version. Before this branch, my improvement for this problem was to avoid
>> lfn_find in the omap* methods of the FileStore
>> class (https://www.mail-archive.com/ceph-devel@vger.kernel.org/msg18505.html)
>
> Does "avoid" mean you just removed them? Are they not needed? Do you have
> a branch for this?
>



-- 
Best Regards,

Wheat


* Re: [Share]Performance tunning on Ceph FileStore with SSD backend
  2014-05-27  6:37       ` Haomai Wang
@ 2014-05-27  6:45         ` Stefan Priebe - Profihost AG
  2014-05-27 10:05           ` Haomai Wang
  0 siblings, 1 reply; 15+ messages in thread
From: Stefan Priebe - Profihost AG @ 2014-05-27  6:45 UTC (permalink / raw)
  To: Haomai Wang; +Cc: ceph-devel@vger.kernel.org

Am 27.05.2014 08:37, schrieb Haomai Wang:
> I'm not fully sure about the correctness of the changes, although they
> seem OK to me. I have applied these changes to our production environment
> and have seen no problems.

Do you have a branch in your yuyuyu github account for this?


* Re: [Share]Performance tunning on Ceph FileStore with SSD backend
  2014-05-27  6:45         ` Stefan Priebe - Profihost AG
@ 2014-05-27 10:05           ` Haomai Wang
  2014-05-27 16:32             ` Milosz Tanski
  0 siblings, 1 reply; 15+ messages in thread
From: Haomai Wang @ 2014-05-27 10:05 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG; +Cc: ceph-devel@vger.kernel.org

Not yet. I will try to push it to the master branch.

On Tue, May 27, 2014 at 2:45 PM, Stefan Priebe - Profihost AG
<s.priebe@profihost.ag> wrote:
> Am 27.05.2014 08:37, schrieb Haomai Wang:
>> I'm not fully sure about the correctness of the changes, although they
>> seem OK to me. I have applied these changes to our production environment
>> and have seen no problems.
>
> Do you have a branch in your yuyuyu github account for this?
>



-- 
Best Regards,

Wheat


* Re: [Share]Performance tunning on Ceph FileStore with SSD backend
  2014-05-27 10:05           ` Haomai Wang
@ 2014-05-27 16:32             ` Milosz Tanski
  0 siblings, 0 replies; 15+ messages in thread
From: Milosz Tanski @ 2014-05-27 16:32 UTC (permalink / raw)
  To: Haomai Wang; +Cc: Stefan Priebe - Profihost AG, ceph-devel@vger.kernel.org

If locking on something like the fdcache is a scalability bottleneck, why
not test using a spinlock instead of a mutex? It may (or may not) be
an easy and cheap win, esp. if the work done while holding the spin
lock (operating on the map and LRU) consists of pretty cheap operations.
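
A minimal sketch of how such a test could be set up, assuming nothing about
the actual FDCache code: the lock type is a template parameter, so the same
small critical section can be benchmarked with std::mutex and with a simple
test-and-set spinlock. SpinLock and FdTable are hypothetical names made up
for this example.

```cpp
#include <atomic>
#include <cassert>
#include <map>
#include <mutex>
#include <string>

// Minimal test-and-set spinlock. It provides lock()/unlock(), so it can be
// used with std::lock_guard just like std::mutex.
class SpinLock {
public:
    void lock() {
        while (flag_.test_and_set(std::memory_order_acquire)) {
            // Busy-wait until the current holder clears the flag.
        }
    }
    void unlock() { flag_.clear(std::memory_order_release); }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Hypothetical fd table (not Ceph's FDCache). Only the lock type differs
// between the two variants being compared.
template <typename Lock>
class FdTable {
public:
    void put(const std::string& name, int fd) {
        std::lock_guard<Lock> guard(lock_);
        fds_[name] = fd;
    }

    bool get(const std::string& name, int* fd) {
        assert(fd != nullptr);
        std::lock_guard<Lock> guard(lock_);
        auto it = fds_.find(name);
        if (it == fds_.end()) return false;
        *fd = it->second;
        return true;
    }

private:
    Lock lock_;
    std::map<std::string, int> fds_;
};
```

FdTable<std::mutex> and FdTable<SpinLock> can then be driven by the same
multi-threaded workload. A spinlock only tends to help when the critical
section is short and the lock holder is rarely descheduled, so it needs to
be measured rather than assumed.
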

Another fine option, if this is an issue that comes up with multiple
data structures in Ceph, is to look into the CDS project. They provide
a number of lock-free and wait-free data structures implemented in C++
that also happen to be portable to most operating systems (with
fallbacks and special implementations for each).

Obviously it's non-trivial to replace the current data structures with
different ones, but it's also easier than rolling your own. We've
started using CDS recently for other projects, and in a C++ code base
it's easier to use than liburcu and more portable (with Windows
support).

The library is located here:
http://libcds.sourceforge.net/

Best,
- Milosz

On Tue, May 27, 2014 at 6:05 AM, Haomai Wang <haomaiwang@gmail.com> wrote:
> Not yet. I will try to push it to the master branch.
>



-- 
Milosz Tanski
CTO
10 East 53rd Street, 37th floor
New York, NY 10022

p: 646-253-9055
e: milosz@adfin.com


end of thread, other threads:[~2014-05-27 16:32 UTC | newest]

Thread overview: 15+ messages
2014-04-09 10:05 [Share]Performance tunning on Ceph FileStore with SSD backend Haomai Wang
2014-04-09 12:07 ` Mark Nelson
2014-04-09 12:08 ` Alexandre DERUMIER
2014-04-09 14:10   ` Sebastien Han
2014-04-09 14:15 ` Gregory Farnum
2014-04-11  6:04   ` Alexandre DERUMIER
2014-04-11  8:41     ` Haomai Wang
2014-05-26 20:29 ` Stefan Priebe
2014-05-27  4:42   ` Haomai Wang
2014-05-27  6:05     ` Stefan Priebe - Profihost AG
2014-05-27  6:37       ` Haomai Wang
2014-05-27  6:45         ` Stefan Priebe - Profihost AG
2014-05-27 10:05           ` Haomai Wang
2014-05-27 16:32             ` Milosz Tanski
2014-05-27  4:46   ` Haomai Wang
