* Re: [ceph-users] keyvaluestore backend metadata overhead
From: Sage Weil @ 2015-01-29 22:51 UTC
To: Chris Pacejo; +Cc: ceph-devel, haomaiwang

Hi Chris,

[Moving this thread to ceph-devel, which is probably a bit more
appropriate.]

On Thu, 29 Jan 2015, Chris Pacejo wrote:
> Hi, we've been experimenting with the keyvaluestore backend, and have found
> that, on every object write (e.g. with `rados put`), a single transaction is
> issued containing an additional 9 KeyValueDB writes, beyond those which
> constitute the object data.  Given the key names, these are clearly all
> metadata of some sort, but this poses a problem when the objects themselves
> are very small.  Given the default strip block size of 4 KiB, with objects
> of size 36 KiB or less, half or more of all key-value store writes are
> metadata writes.  With objects of size 4 KiB or less, the metadata overhead
> grows to 90%+.
>
> Is there any way to reduce the number of metadata rows which must be written
> with each object?

There is a level (or two) of indirection in KeyValueStore's
GenericObjectMap that is there to allow object cloning.  I wonder if we
will want to facilitate a backend that doesn't implement clone and can
only be used for pools that disallow clone and snap operations.

There is also some key consolidation in the OSD layer we talked about in
the Wednesday performance call that will cut this down some!

> (Alternatively, if there is a way to convince the OSD to issue multiple
> concurrent write transactions, that would also help.  But even with
> "keyvaluestore op threads" set as high as 64, and `rados bench` issuing 64
> concurrent writes, we never see more than a single active write transaction
> on the (multithread-capable) backend.  Is there some other option we're
> missing?)

sage

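The overhead percentages in the original report follow from simple
counting. The sketch below is only an illustration of that arithmetic,
not code from Ceph: it assumes each 4 KiB strip costs exactly one
key-value write and that every object write carries the 9 extra metadata
writes reported above. The function name and constants are hypothetical.

```python
import math

STRIP_SIZE = 4 * 1024   # default strip size quoted in the thread (4 KiB)
METADATA_WRITES = 9     # extra KeyValueDB writes per object write, as reported


def metadata_fraction(object_size, strip_size=STRIP_SIZE,
                      metadata_writes=METADATA_WRITES):
    """Fraction of key-value writes in one transaction that are metadata.

    Assumption: one key-value write per strip, and at least one data
    write even for a tiny object.
    """
    data_writes = max(1, math.ceil(object_size / strip_size))
    return metadata_writes / (metadata_writes + data_writes)


if __name__ == "__main__":
    for size_kib in (4, 36, 1024):
        frac = metadata_fraction(size_kib * 1024)
        print(f"{size_kib:5d} KiB object: {frac:.0%} of writes are metadata")
```

Under these assumptions a 36 KiB object has 9 data writes against 9
metadata writes (half), and anything that fits in one strip has 1 data
write against 9 (90%), which matches the figures in the report.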
* Re: [ceph-users] keyvaluestore backend metadata overhead
From: Haomai Wang @ 2015-01-30 2:46 UTC
To: Sage Weil; +Cc: Chris Pacejo, ceph-devel@vger.kernel.org

Hi Chris,

For metadata overhead, we need to resolve it at the upper level;
keyvaluestore won't add extra metadata in normal I/O, except for the
rare header save, which only updates when the header changes.

As for active writes, why do you think there is only one active write
in the keyvaluestore threads? I just checked the runtime perf data
again, and it looks fine: multiple writes can submit transactions
concurrently.

On Fri, Jan 30, 2015 at 6:51 AM, Sage Weil <sage@newdream.net> wrote:
> Hi Chris,
>
> [Moving this thread to ceph-devel, which is probably a bit more
> appropriate.]
>
> On Thu, 29 Jan 2015, Chris Pacejo wrote:
>> Hi, we've been experimenting with the keyvaluestore backend, and have found
>> that, on every object write (e.g. with `rados put`), a single transaction is
>> issued containing an additional 9 KeyValueDB writes, beyond those which
>> constitute the object data.  Given the key names, these are clearly all
>> metadata of some sort, but this poses a problem when the objects themselves
>> are very small.  Given the default strip block size of 4 KiB, with objects
>> of size 36 KiB or less, half or more of all key-value store writes are
>> metadata writes.  With objects of size 4 KiB or less, the metadata overhead
>> grows to 90%+.
>>
>> Is there any way to reduce the number of metadata rows which must be written
>> with each object?
>
> There is a level (or two) of indirection in KeyValueStore's
> GenericObjectMap that is there to allow object cloning.  I wonder if we
> will want to facilitate a backend that doesn't implement clone and can
> only be used for pools that disallow clone and snap operations.
>
> There is also some key consolidation in the OSD layer we talked about in
> the Wednesday performance call that will cut this down some!
>
>> (Alternatively, if there is a way to convince the OSD to issue multiple
>> concurrent write transactions, that would also help.  But even with
>> "keyvaluestore op threads" set as high as 64, and `rados bench` issuing 64
>> concurrent writes, we never see more than a single active write transaction
>> on the (multithread-capable) backend.  Is there some other option we're
>> missing?)
>
> sage

--
Best Regards,

Wheat

* Re: [ceph-users] keyvaluestore backend metadata overhead
From: Chris Pacejo @ 2015-01-30 15:41 UTC
To: Haomai Wang; +Cc: Sage Weil, ceph-devel@vger.kernel.org

Hi Haomai,

On Thu, Jan 29, 2015 at 9:46 PM, Haomai Wang <haomaiwang@gmail.com> wrote:
> For metadata overhead, we need to resolve it at the upper level;
> keyvaluestore won't add extra metadata in normal I/O, except for the
> rare header save, which only updates when the header changes.

Unfortunately, our write workload is dominated by object creates.

> As for active writes, why do you think there is only one active write
> in the keyvaluestore threads? I just checked the runtime perf data
> again, and it looks fine: multiple writes can submit transactions
> concurrently.

We've implemented a MySQL backend for KeyValueDB in the hopes of
getting better performance than LevelDB (what we're currently seeing
is on par).  Internally, it uses a LIFO connection pool, from which
connections are leased for the duration of a transaction commit or
snapshot walk (to permit concurrent transactions).  Watching the
connection activity in MySQL using "SHOW PROCESSLIST", during most
runs, it's clear that, for the duration of the write benchmark, all
but two of the connections remain idle.  (During cleanup, I do see
more connections used, and I have on occasion seen more used during
writes.)  So while it's possible the transactions are being built
concurrently, they aren't (or are with a very low probability) being
submitted (via submit_transaction_sync()) concurrently.

(It's entirely possible that a bug in our code, or misdocumented
behavior in the MySQL client, excludes concurrent threads from using
open MySQL connections, but I *have* seen concurrent transaction
commits, only rarely.)

You mention "runtime perf data", is there a simple way to query the
OSD's idea of how many concurrent transaction submits it is issuing?
In the meantime I'll instrument our backend to track this value
itself.

Thanks!

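The reasoning about idle connections relies on how a LIFO pool behaves:
a released connection goes back on top of the stack, so strictly
sequential commits keep reusing the same connection, and additional
connections only become busy when leases overlap. The sketch below
illustrates that property. It is not the backend described above; the
class name, the `lease` method, and the string stand-ins for connections
are all hypothetical.

```python
import queue
from contextlib import contextmanager


class LifoPool:
    """Minimal LIFO connection pool (illustrative only)."""

    def __init__(self, connections):
        self._idle = queue.LifoQueue()
        for conn in connections:
            self._idle.put(conn)

    @contextmanager
    def lease(self):
        # Blocks until a connection is idle; the most recently
        # released connection is handed out first.
        conn = self._idle.get()
        try:
            yield conn
        finally:
            self._idle.put(conn)


if __name__ == "__main__":
    pool = LifoPool(["conn-0", "conn-1", "conn-2"])

    # Sequential leases: the same connection is reused every time.
    sequential = []
    for _ in range(5):
        with pool.lease() as conn:
            sequential.append(conn)
    print("sequential:", sorted(set(sequential)))

    # Overlapping leases: a second connection has to be taken.
    with pool.lease() as first, pool.lease() as second:
        print("overlapping:", first, second)
```

With such a pool, the number of distinct connections that ever show up
as busy is a rough indicator of peak commit concurrency, which is why
mostly idle connections suggest that commits rarely overlap.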
* Re: [ceph-users] keyvaluestore backend metadata overhead
From: Haomai Wang @ 2015-01-30 15:52 UTC
To: Chris Pacejo; +Cc: Sage Weil, ceph-devel@vger.kernel.org

On Fri, Jan 30, 2015 at 11:41 PM, Chris Pacejo <cpacejo@clearskydata.com> wrote:
> Hi Haomai,
>
> On Thu, Jan 29, 2015 at 9:46 PM, Haomai Wang <haomaiwang@gmail.com> wrote:
>> For metadata overhead, we need to resolve it at the upper level;
>> keyvaluestore won't add extra metadata in normal I/O, except for the
>> rare header save, which only updates when the header changes.
>
> Unfortunately, our write workload is dominated by object creates.
>
>
>> As for active writes, why do you think there is only one active write
>> in the keyvaluestore threads? I just checked the runtime perf data
>> again, and it looks fine: multiple writes can submit transactions
>> concurrently.
>
> We've implemented a MySQL backend for KeyValueDB in the hopes of
> getting better performance than LevelDB (what we're currently seeing
> is on par).  Internally, it uses a LIFO connection pool, from which
> connections are leased for the duration of a transaction commit or
> snapshot walk (to permit concurrent transactions).  Watching the
> connection activity in MySQL using "SHOW PROCESSLIST", during most
> runs, it's clear that, for the duration of the write benchmark, all
> but two of the connections remain idle.  (During cleanup, I do see
> more connections used, and I have on occasion seen more used during
> writes.)  So while it's possible the transactions are being built
> concurrently, they aren't (or are with a very low probability) being
> submitted (via submit_transaction_sync()) concurrently.
>
> (It's entirely possible that a bug in our code, or misdocumented
> behavior in the MySQL client, excludes concurrent threads from using
> open MySQL connections, but I *have* seen concurrent transaction
> commits, only rarely.)
>
> You mention "runtime perf data", is there a simple way to query the
> OSD's idea of how many concurrent transaction submits it is issuing?
> In the meantime I'll instrument our backend to track this value
> itself.
>
> Thanks!

It's really a surprise that you implemented a MySQL backend. May I ask
the purpose? I think it may not fit well with keyvaluestore.

You can simply calculate the sum of the time consumed by
submit_transaction_sync; it would be a multiple of the op thread
number.

--
Best Regards,

Wheat

* Re: [ceph-users] keyvaluestore backend metadata overhead
From: Chris Pacejo @ 2015-01-30 16:08 UTC
To: Haomai Wang; +Cc: Sage Weil, ceph-devel@vger.kernel.org

On Fri, Jan 30, 2015 at 10:52 AM, Haomai Wang <haomaiwang@gmail.com> wrote:
> It's really a surprise that you implemented a MySQL backend. May I ask
> the purpose? I think it may not fit well with keyvaluestore.

We've found it to perform better (in isolation) than LevelDB.  We were
able to map KeyValueDB's interface to it fairly painlessly, and I
believe correctly.  (The only major catch was that we needed to buffer
operations within a transaction and execute them all at once on
submit, to prevent MySQL unnecessarily holding locks for the duration
of long-lived transactions.)

> You can simply calculate the sum of the time consumed by
> submit_transaction_sync; it would be a multiple of the op thread
> number.

I will try this, thanks.

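The buffering approach mentioned above can be sketched as follows:
operations are only recorded in memory while the transaction is being
built, and the database transaction is opened, applied, and committed in
one short burst at submit time. This is a sketch under stated
assumptions, not the backend from the thread: Python's built-in sqlite3
stands in for MySQL so the example runs without a server, and the `kv`
table, the class, and its method names are hypothetical.

```python
import sqlite3


class BufferedTransaction:
    """Collects key-value operations and applies them all at submit."""

    def __init__(self):
        self._ops = []

    def set(self, prefix, key, value):
        self._ops.append(("set", prefix, key, value))

    def rmkey(self, prefix, key):
        self._ops.append(("rm", prefix, key, None))

    def submit(self, conn):
        # The database transaction only exists inside this block, so
        # locks are held for the commit itself, not for the time the
        # caller spent building the transaction.
        with conn:  # commits on success, rolls back on exception
            for op, prefix, key, value in self._ops:
                if op == "set":
                    conn.execute(
                        "INSERT OR REPLACE INTO kv (prefix, k, v) "
                        "VALUES (?, ?, ?)",
                        (prefix, key, value),
                    )
                else:
                    conn.execute(
                        "DELETE FROM kv WHERE prefix = ? AND k = ?",
                        (prefix, key),
                    )
        self._ops.clear()


def open_store():
    """Create an in-memory store with the hypothetical kv table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE kv (prefix TEXT, k TEXT, v BLOB, "
        "PRIMARY KEY (prefix, k))"
    )
    return conn


if __name__ == "__main__":
    conn = open_store()
    txn = BufferedTransaction()
    txn.set("meta", "a", b"1")
    txn.set("meta", "b", b"2")
    txn.rmkey("meta", "a")
    txn.submit(conn)
    print(conn.execute("SELECT prefix, k FROM kv").fetchall())
```

The trade-off of this design is that nothing is visible to the database
until submit, so reads within an unsubmitted transaction would not see
its own buffered writes.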
* Re: [ceph-users] keyvaluestore backend metadata overhead
From: Haomai Wang @ 2015-01-30 16:18 UTC
To: Chris Pacejo; +Cc: Sage Weil, ceph-devel@vger.kernel.org

Although I'm still a bit confused, I'm glad to see more attempts. More
test results are welcome!

On Sat, Jan 31, 2015 at 12:08 AM, Chris Pacejo <cpacejo@clearskydata.com> wrote:
> On Fri, Jan 30, 2015 at 10:52 AM, Haomai Wang <haomaiwang@gmail.com> wrote:
>> It's really a surprise that you implemented a MySQL backend. May I ask
>> the purpose? I think it may not fit well with keyvaluestore.
>
> We've found it to perform better (in isolation) than LevelDB.  We were
> able to map KeyValueDB's interface to it fairly painlessly, and I
> believe correctly.  (The only major catch was that we needed to buffer
> operations within a transaction and execute them all at once on
> submit, to prevent MySQL unnecessarily holding locks for the duration
> of long-lived transactions.)
>
>
>> You can simply calculate the sum of the time consumed by
>> submit_transaction_sync; it would be a multiple of the op thread
>> number.
>
> I will try this, thanks.

--
Best Regards,

Wheat

* RE: [ceph-users] keyvaluestore backend metadata overhead
From: Chen, Xiaoxi @ 2015-02-01 14:50 UTC
To: Haomai Wang, Chris Pacejo; +Cc: Sage Weil, ceph-devel@vger.kernel.org

We can always use a structured database in an unstructured way. I think
it's workable in theory, but why choose MySQL?

As discussed a while ago, any LSM-structured database design will
suffer in performance due to write amplification. Is the reason for
going to MySQL only to avoid LSM, or to try a B-tree-like structure? If
so, maybe LMDB is a better choice? (Although it's not yet proven as
production-ready.)

-----Original Message-----
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Haomai Wang
Sent: Saturday, January 31, 2015 12:18 AM
To: Chris Pacejo
Cc: Sage Weil; ceph-devel@vger.kernel.org
Subject: Re: [ceph-users] keyvaluestore backend metadata overhead

Although I'm still a bit confused, I'm glad to see more attempts. More
test results are welcome!

On Sat, Jan 31, 2015 at 12:08 AM, Chris Pacejo <cpacejo@clearskydata.com> wrote:
> On Fri, Jan 30, 2015 at 10:52 AM, Haomai Wang <haomaiwang@gmail.com> wrote:
>> It's really a surprise that you implemented a MySQL backend. May I ask
>> the purpose? I think it may not fit well with keyvaluestore.
>
> We've found it to perform better (in isolation) than LevelDB.  We were
> able to map KeyValueDB's interface to it fairly painlessly, and I
> believe correctly.  (The only major catch was that we needed to buffer
> operations within a transaction and execute them all at once on
> submit, to prevent MySQL unnecessarily holding locks for the duration
> of long-lived transactions.)
>
>
>> You can simply calculate the sum of the time consumed by
>> submit_transaction_sync; it would be a multiple of the op thread
>> number.
>
> I will try this, thanks.

--
Best Regards,

Wheat

* Re: [ceph-users] keyvaluestore backend metadata overhead
From: Chris Pacejo @ 2015-02-03 15:03 UTC
To: Chen, Xiaoxi; +Cc: Haomai Wang, Sage Weil, ceph-devel@vger.kernel.org

Hi Xiaoxi,

On Sun, Feb 1, 2015 at 9:50 AM, Chen, Xiaoxi <xiaoxi.chen@intel.com> wrote:
> We can always use a structured database in an unstructured way. I think
> it's workable in theory, but why choose MySQL?

In our internal performance tests, it performed better than LevelDB
and some others, and it's well-proven.  It's not our first choice, nor
are we done investigating other options.  But we'll check out LMDB,
thanks for the pointer.

Regardless, the issues we're seeing are equally applicable to any
key-value backend.

* Re: [ceph-users] keyvaluestore backend metadata overhead
From: Mark Nelson @ 2015-02-04 3:15 UTC
To: Chris Pacejo, Chen, Xiaoxi; +Cc: Haomai Wang, Sage Weil, ceph-devel@vger.kernel.org

On 02/03/2015 09:03 AM, Chris Pacejo wrote:
> Hi Xiaoxi,
>
> On Sun, Feb 1, 2015 at 9:50 AM, Chen, Xiaoxi <xiaoxi.chen@intel.com> wrote:
>> We can always use a structured database in an unstructured way. I think
>> it's workable in theory, but why choose MySQL?
>
> In our internal performance tests, it performed better than LevelDB
> and some others, and it's well-proven.  It's not our first choice, nor
> are we done investigating other options.  But we'll check out LMDB,
> thanks for the pointer.
>
> Regardless, the issues we're seeing are equally applicable to any
> key-value backend.

You may also wish to try the rocksdb backend with universal compaction
rather than leveled compaction.

* Re: [ceph-users] keyvaluestore backend metadata overhead
From: Chris Pacejo @ 2015-02-03 15:13 UTC
To: Haomai Wang; +Cc: Sage Weil, ceph-devel@vger.kernel.org

On Fri, Jan 30, 2015 at 10:52 AM, Haomai Wang <haomaiwang@gmail.com> wrote:
>>> As for active writes, why do you think there is only one active write
>>> in the keyvaluestore threads? I just checked the runtime perf data
>>> again, and it looks fine: multiple writes can submit transactions
>>> concurrently.
>
> You can simply calculate the sum of the time consumed by
> submit_transaction_sync; it would be a multiple of the op thread
> number.

I've instrumented submit_transaction to tick up/down an atomic
counter.  While in certain situations (resource-constrained VM; OSD
startup), I do see up to "keyvaluestore op threads" number of
concurrent transaction submits reported by this counter, on real
hardware, during `rados bench` with 2700-byte objects, I never see
more than 3 concurrent submits; on average, I see 2.  I don't know the
OSD's internals well enough to speculate on the cause, but it's worth
noting that the OSD processes consume a lot of CPU (170%+) during
these benchmarks (compared to 14% for 1 MiB objects).

We'll keep experimenting, but we're definitely excited about the
possibility of reducing metadata overhead.

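Both measurements discussed here can be taken by one small piece of
instrumentation: a counter that goes up on entry to a submit and down on
exit gives the peak concurrency, and summing the time spent inside
submits and dividing by wall-clock time gives an average. The sketch
below shows the technique only. It is not the instrumentation used in
the thread; `SubmitGauge`, `track`, and `run_demo` are hypothetical
names, and reading the summed-time suggestion as "busy time divided by
wall time" is an assumption.

```python
import threading
import time
from contextlib import contextmanager


class SubmitGauge:
    """Tracks current, peak, and time-summed concurrency of a call."""

    def __init__(self):
        self._lock = threading.Lock()
        self.current = 0
        self.peak = 0
        self.busy_seconds = 0.0

    @contextmanager
    def track(self):
        start = time.monotonic()
        with self._lock:
            self.current += 1
            self.peak = max(self.peak, self.current)
        try:
            yield
        finally:
            elapsed = time.monotonic() - start
            with self._lock:
                self.current -= 1
                self.busy_seconds += elapsed


def run_demo(threads=4):
    """Run fake submits that are forced to overlap; return the results."""
    gauge = SubmitGauge()
    barrier = threading.Barrier(threads)

    def fake_submit():
        with gauge.track():
            barrier.wait()    # every thread is inside track() here
            time.sleep(0.05)  # stand-in for the backend commit

    workers = [threading.Thread(target=fake_submit) for _ in range(threads)]
    wall_start = time.monotonic()
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    wall = time.monotonic() - wall_start
    average = gauge.busy_seconds / wall if wall > 0 else 0.0
    return gauge, average


if __name__ == "__main__":
    gauge, average = run_demo()
    print(f"peak concurrent submits: {gauge.peak}")
    print(f"average concurrency (busy time / wall time): {average:.2f}")
```

If every op thread were always inside a submit, the average would
approach the op thread count; a value near 2 with 16 op threads, as
reported above, would mean most threads are waiting somewhere else.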
* Re: [ceph-users] keyvaluestore backend metadata overhead
From: Chris Pacejo @ 2015-02-03 20:25 UTC
To: ceph-devel@vger.kernel.org

The below observations (including high CPU usage by the OSDs) hold
true when we (roughly) double the performance of the MySQL backend by
pointing it to SSDs instead of rotary media.  This causes us to
suspect that our current bottleneck is not the extra load placed on
the backend by the metadata; but rather something in the OSD which
causes it to be unable to saturate the backend.  Any thoughts?


On Tue, Feb 3, 2015 at 10:13 AM, Chris Pacejo <cpacejo@clearskydata.com> wrote:
> I've instrumented submit_transaction to tick up/down an atomic
> counter.  While in certain situations (resource-constrained VM; OSD
> startup), I do see up to "keyvaluestore op threads" number of
> concurrent transaction submits reported by this counter, on real
> hardware, during `rados bench` with 2700-byte objects, I never see
> more than 3 concurrent submits; on average, I see 2.  I don't know the
> OSD's internals well enough to speculate on the cause, but it's worth
> noting that the OSD processes consume a lot of CPU (170%+) during
> these benchmarks (compared to 14% for 1 MiB objects).

* Re: [ceph-users] keyvaluestore backend metadata overhead
From: Haomai Wang @ 2015-02-04 2:31 UTC
To: Chris Pacejo; +Cc: ceph-devel@vger.kernel.org

On Wed, Feb 4, 2015 at 4:25 AM, Chris Pacejo <cpacejo@clearskydata.com> wrote:
> The below observations (including high CPU usage by the OSDs) hold
> true when we (roughly) double the performance of the MySQL backend by
> pointing it to SSDs instead of rotary media.  This causes us to
> suspect that our current bottleneck is not the extra load placed on
> the backend by the metadata; but rather something in the OSD which
> causes it to be unable to saturate the backend.  Any thoughts?
>

Maybe more detailed numbers can help us a bit.

>
> On Tue, Feb 3, 2015 at 10:13 AM, Chris Pacejo <cpacejo@clearskydata.com> wrote:
>> I've instrumented submit_transaction to tick up/down an atomic
>> counter.  While in certain situations (resource-constrained VM; OSD
>> startup), I do see up to "keyvaluestore op threads" number of
>> concurrent transaction submits reported by this counter, on real
>> hardware, during `rados bench` with 2700-byte objects, I never see
>> more than 3 concurrent submits; on average, I see 2.  I don't know the
>> OSD's internals well enough to speculate on the cause, but it's worth
>> noting that the OSD processes consume a lot of CPU (170%+) during
>> these benchmarks (compared to 14% for 1 MiB objects).

--
Best Regards,

Wheat

* Re: [ceph-users] keyvaluestore backend metadata overhead
From: Chris Pacejo @ 2015-02-09 18:29 UTC
To: Haomai Wang; +Cc: ceph-devel@vger.kernel.org

On Tue, Feb 3, 2015 at 9:31 PM, Haomai Wang <haomaiwang@gmail.com> wrote:
> Maybe more detailed numbers can help us a bit.

Here's what we're testing with and what we observe:

Hardware:
  2x6-core hyperthreaded Xeon E5-2620 v2 2.10GHz CPU
  8x8 GiB DDR3 RAM
  4x4 TB 7200 RPM 8 ms 183 MB/s SAS rotary disks

Software:
  CentOS 7
  Ceph 0.91
  4 OSDs
  osd pool default size = 1 (just for testing!)
  keyvaluestore op threads = 16
  keyvaluestore backend = mysql (our own backend)
  one MySQL process per OSD, each writing to a separate disk

Test setup:
  rados bench on a fresh install
  256 concurrent writes
  360 seconds
  2700 byte objects, and 1 MiB objects
  measure throughput with rados bench
  measure CPU usage by observing top
  measure max concurrent transaction submits by instrumenting the
    KeyValueDB interface

With this setup, we observe that, with 2700 byte objects:

  7.4 MiB/s (~2900 ops/s) throughput,
  170%/170%/60%/60% OSD CPU usage,
  200%/200%/65%/65% MySQL CPU usage, and
  3/3/1/1 maximum concurrent transaction submits;

and with 1 MiB objects:

  50.7 MiB/s (~51 ops/s) throughput,
  14%/14%/4%/4% OSD CPU usage,
  50%/50%/15%/15% MySQL CPU usage, and
  3/3/1/1 maximum concurrent transaction submits.

We know that our transaction concurrency measurement is not buggy, as
it will consistently report up to `keyvaluestore op threads`
concurrent submits both on OSD startup on this same hardware, and
during benchmarking in a resource-constrained VM.  We are pretty sure
MySQL is not the bottleneck, since we've been able to throw much more
at it (concurrently); at least 10 kops/s per instance.  (Sequentially
it is not so good; hence our fixation on the low transaction
concurrency!)

Let me know if there are any other figures which would be helpful in
diagnosing why the OSDs are not issuing as many concurrent
transactions as we'd like, or why they are using so much CPU.  Thanks
for your help.

- Chris

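The ops/s figures above can be cross-checked from the reported
throughput and object size, and the 256 in-flight writes give a rough
mean latency per write via Little's law (in-flight = rate x latency).
This is only a back-of-the-envelope sketch: it assumes the benchmark
keeps all 256 writes in flight for the whole run, and the function names
are hypothetical.

```python
MIB = 1024 * 1024


def ops_per_second(throughput_mib_s, object_size_bytes):
    """Object writes per second implied by a throughput figure."""
    return throughput_mib_s * MIB / object_size_bytes


def mean_latency_seconds(in_flight, ops_s):
    """Little's law: in_flight = ops_s * latency, solved for latency.

    Assumption: the client keeps `in_flight` writes outstanding at all
    times, which the thread does not confirm.
    """
    return in_flight / ops_s


if __name__ == "__main__":
    for label, mib_s, size in (("2700 B", 7.4, 2700),
                               ("1 MiB", 50.7, MIB)):
        rate = ops_per_second(mib_s, size)
        latency = mean_latency_seconds(256, rate)
        print(f"{label:>6}: {rate:8.1f} ops/s, "
              f"~{latency * 1000:.0f} ms mean latency per write")
```

The small-object rate comes out close to the ~2900 ops/s quoted above,
and the 1 MiB rate equals the MiB/s figure by construction.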
* Re: [ceph-users] keyvaluestore backend metadata overhead
From: Haomai Wang @ 2015-02-10 6:33 UTC
To: Chris Pacejo; +Cc: ceph-devel@vger.kernel.org

On Tue, Feb 10, 2015 at 2:29 AM, Chris Pacejo <cpacejo@clearskydata.com> wrote:
> On Tue, Feb 3, 2015 at 9:31 PM, Haomai Wang <haomaiwang@gmail.com> wrote:
>> Maybe more detailed numbers can help us a bit.
>
> Here's what we're testing with and what we observe:
>
> Hardware:
>   2x6-core hyperthreaded Xeon E5-2620 v2 2.10GHz CPU
>   8x8 GiB DDR3 RAM
>   4x4 TB 7200 RPM 8 ms 183 MB/s SAS rotary disks
>
> Software:
>   CentOS 7
>   Ceph 0.91
>   4 OSDs
>   osd pool default size = 1 (just for testing!)
>   keyvaluestore op threads = 16
>   keyvaluestore backend = mysql (our own backend)
>   one MySQL process per OSD, each writing to a separate disk
>
> Test setup:
>   rados bench on a fresh install
>   256 concurrent writes
>   360 seconds
>   2700 byte objects, and 1 MiB objects
>   measure throughput with rados bench
>   measure CPU usage by observing top
>   measure max concurrent transaction submits by instrumenting the
>     KeyValueDB interface
>
> With this setup, we observe that, with 2700 byte objects:
>
>   7.4 MiB/s (~2900 ops/s) throughput,
>   170%/170%/60%/60% OSD CPU usage,
>   200%/200%/65%/65% MySQL CPU usage, and
>   3/3/1/1 maximum concurrent transaction submits;
>
> and with 1 MiB objects:
>
>   50.7 MiB/s (~51 ops/s) throughput,
>   14%/14%/4%/4% OSD CPU usage,
>   50%/50%/15%/15% MySQL CPU usage, and
>   3/3/1/1 maximum concurrent transaction submits.

It looks like the ops are a little unbalanced across the four OSDs?

>
> We know that our transaction concurrency measurement is not buggy, as
> it will consistently report up to `keyvaluestore op threads`
> concurrent submits both on OSD startup on this same hardware, and
> during benchmarking in a resource-constrained VM.  We are pretty sure
> MySQL is not the bottleneck, since we've been able to throw much more
> at it (concurrently); at least 10 kops/s per instance.  (Sequentially
> it is not so good; hence our fixation on the low transaction
> concurrency!)
>
> Let me know if there are any other figures which would be helpful in
> diagnosing why the OSDs are not issuing as many concurrent
> transactions as we'd like, or why they are using so much CPU.  Thanks
> for your help.

I think you can look at the perf dump result to see whether a throttle
queue, such as the keyvaluestore queue, is full.

Sorry, I still can't think of anything that may limit concurrency
above the objectstore backend; in most cases, the backend should be
the bottleneck.

>
> - Chris

--
Best Regards,

Wheat

* Re: [ceph-users] keyvaluestore backend metadata overhead
From: Chris Pacejo @ 2015-01-30 14:46 UTC
To: Sage Weil; +Cc: ceph-devel, haomaiwang

Hi Sage, thanks for the quick reply.

On Thu, Jan 29, 2015 at 5:51 PM, Sage Weil <sage@newdream.net> wrote:
> There is a level (or two) of indirection in KeyValueStore's
> GenericObjectMap that is there to allow object cloning.  I wonder if we
> will want to facilitate a backend that doesn't implement clone and can
> only be used for pools that disallow clone and snap operations.

That would be perfect for us.  We need neither cloning nor snapshots.

> There is also some key consolidation in the OSD layer we talked about in
> the Wednesday performance call that will cut this down some!

Awesome.  Each fewer key-value pair will be a huge performance boost for us!

end of thread, other threads:[~2015-02-10 6:33 UTC | newest]
Thread overview: 15+ messages
[not found] <CAC8iE5iHTEfSQL978paWpu9hSfUbE65OVT_dKi2P=yvWSQ5JhA@mail.gmail.com>
2015-01-29 22:51 ` [ceph-users] keyvaluestore backend metadata overhead Sage Weil
2015-01-30 2:46 ` Haomai Wang
2015-01-30 15:41 ` Chris Pacejo
2015-01-30 15:52 ` Haomai Wang
2015-01-30 16:08 ` Chris Pacejo
2015-01-30 16:18 ` Haomai Wang
2015-02-01 14:50 ` Chen, Xiaoxi
2015-02-03 15:03 ` Chris Pacejo
2015-02-04 3:15 ` Mark Nelson
2015-02-03 15:13 ` Chris Pacejo
2015-02-03 20:25 ` Chris Pacejo
2015-02-04 2:31 ` Haomai Wang
2015-02-09 18:29 ` Chris Pacejo
2015-02-10 6:33 ` Haomai Wang
2015-01-30 14:46 ` Chris Pacejo