* poor rbd performance
From: Wyllys Ingersoll @ 2014-08-22 12:55 UTC
To: stgt

I'm seeing some disappointing performance numbers using the bs_rbd backend
with a Ceph RBD pool over a 10Gb Ethernet link.

Read operations appear to max out at about 100 MB/second, regardless of
block size or amount of data being read, and write operations fare much
worse, maxing out somewhere in the 40 MB/second range. Any ideas why this
would be so limited?

I've tested using 'fio' as well as some other perf testing utilities. On
the same link, talking to the same Ceph pool/image, using librados directly
(either through the C or Python bindings), the read performance is 5-8x
faster and write performance is 2-3x faster.

Any suggestions as to how to tune the iSCSI or bs_rbd interface to perform
better?

thanks,
  Wyllys Ingersoll
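For a like-for-like comparison between the iSCSI path and direct library access, it helps to time both with the same block size and the same access pattern. The following is a minimal sketch of such a timing loop, not the harness used in this thread. `measure_read_throughput` and `fake_read` are hypothetical names; `fake_read` is an in-memory stand-in so the example runs without a cluster. With the Ceph Python bindings, one would presumably pass a bound `rbd.Image.read` (which takes an offset and a length) as `read_fn`; that wiring is an assumption and is not shown.

```python
import time
from typing import Callable, Tuple


def measure_read_throughput(read_fn: Callable[[int, int], bytes],
                            total_bytes: int,
                            block_size: int) -> Tuple[int, float]:
    """Read total_bytes sequentially in block_size chunks via read_fn(offset, length).

    Returns (bytes_read, megabytes_per_second). One request is outstanding
    at a time, so this measures queue-depth-1 throughput only.
    """
    bytes_read = 0
    offset = 0
    start = time.perf_counter()
    while offset < total_bytes:
        length = min(block_size, total_bytes - offset)
        data = read_fn(offset, length)
        if not data:  # backend returned nothing: stop rather than loop forever
            break
        bytes_read += len(data)
        offset += len(data)
    elapsed = time.perf_counter() - start
    mb_per_s = bytes_read / 1e6 / elapsed if elapsed > 0 else float("inf")
    return bytes_read, mb_per_s


def fake_read(offset: int, length: int) -> bytes:
    """Hypothetical stand-in backend: returns `length` zero bytes."""
    return bytes(length)


if __name__ == "__main__":
    n, rate = measure_read_throughput(fake_read, 64 * 1024 * 1024, 128 * 1024)
    print(f"read {n} bytes at {rate:.1f} MB/s")
```

Because the loop issues one request at a time, its result is only comparable to an fio run configured with the same block size and a queue depth of 1.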
* Re: poor rbd performance
From: Dan Mick @ 2014-08-22 22:17 UTC
To: Wyllys Ingersoll, stgt

Hello, name from a past life...

I wrote the original port to rbd, and there was very little attempt to
even consider performance, and certainly no study; it was and is a
proof-of-concept. I don't know offhand what may be at fault, but I know
it's a target-rich environment, because no one's ever gone hunting at
all to my knowledge.

Several have recommended making use of Ceph async interfaces; I don't
know how much of a win this would be, because stgt already has a pool of
worker threads for outstanding requests. I also don't know how hard it
is to monitor things like thread utilization inside stgt.

But I'm interested in the subject and can help answer Ceph questions if
you have them.
* Re: poor rbd performance
From: Wyllys Ingersoll @ 2014-08-23 13:46 UTC
To: Dan Mick; +Cc: stgt

Hey Dan, it's always good to hear from another former Sun eng.

I saw your blog posts about the RBD backend and it seems to work as
advertised. I don't know if the rados aio calls will make it better
or not. I do know that it is possible to get better throughput using
the standard IO functions (the ones you used), because all of the
non-iSCSI tests we've done use the standard rbd_read/rbd_write
calls, just like bs_rbd. It might be an architectural limitation in
tgtd, but I am not familiar enough with the tgtd code yet to know
where to look for the bottlenecks. I tried changing the timing
interval in 'work.c' to be much shorter, but it didn't make a
difference. When I enabled some debug logging, I saw that the maximum
data size that a tgtd read or write operation handles is only
128 KB, which may be worth tracking down; perhaps there's a
way to make it ask for bigger chunks of data. We've even tried
changing the replication on the Ceph pool to 1 (just for testing
purposes) to eliminate the duplication from the equation, but it's not
making much difference in this situation.

I might try modifying the code to use the aio calls to see how it
goes. It might yield some interesting results if I add some timing
measurements, and maybe it will end up being faster.

Any suggestions for further Ceph tuning or other areas in tgtd to look
at for possible problems?

thanks,
  Wyllys
* Re: poor rbd performance
From: FUJITA Tomonori @ 2014-08-24 23:25 UTC
To: wyllys.ingersoll; +Cc: dan.mick, stgt

Hello,

On Sat, 23 Aug 2014 09:46:37 -0400
Wyllys Ingersoll <wyllys.ingersoll@keepertech.com> wrote:

> Hey Dan, it's always good to hear from another former Sun eng.
>
> I saw your blog posts about the RBD backend and it seems to work as
> advertised. I don't know if the rados aio calls will make it better
> or not. I do know that it is possible to get better throughput using
> the standard IO functions (the ones you used), because all of the
> non-iSCSI tests we've done use the standard rbd_read/rbd_write
> calls, just like bs_rbd. It might be an architectural limitation in
> tgtd, but I am not familiar enough with the tgtd code yet to know
> where to look for the bottlenecks. I tried changing the timing
> interval in 'work.c' to be much shorter, but it didn't make a
> difference. When I enabled some debug logging, I saw that the maximum
> data size that a tgtd read or write operation handles is only
> 128 KB,

Do you mean that the read/write size from iSCSI initiators is 128K? If
so, it's probably due to your configuration. tgt doesn't have such a
limit. iSCSI has lots of negotiation parameters between an initiator
and a target, which affect performance. You need to configure
both properly to get good performance.

> which may be worth tracking down; perhaps there's a
> way to make it ask for bigger chunks of data. We've even tried
> changing the replication on the Ceph pool to 1 (just for testing
> purposes) to eliminate the duplication from the equation, but it's not
> making much difference in this situation.
>
> I might try modifying the code to use the aio calls to see how it
> goes. It might yield some interesting results if I add some timing
> measurements, and maybe it will end up being faster.
>
> Any suggestions for further Ceph tuning or other areas in tgtd to look
> at for possible problems?

I would suggest working with a simpler backend like bs_null to find
the best performance of tgt in your environment. I think the first step
is knowing where the bottleneck is.
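To illustrate why both ends have to be configured, here is a small sketch of how two of the iSCSI login parameters are resolved. In the iSCSI specification (RFC 3720 / RFC 7143), `MaxBurstLength` and `FirstBurstLength` use a "minimum" result function, so the smaller of the two sides' values wins, and `FirstBurstLength` must not exceed `MaxBurstLength`. The function name and the example offers are hypothetical, and treating an unstated value as the specification default is a simplification. This does not diagnose the 128K request size seen above, which may also depend on settings outside the iSCSI login (for example, the initiator's block layer).

```python
from typing import Dict

# Defaults defined by the iSCSI specification (RFC 3720 / RFC 7143), in bytes.
RFC_DEFAULTS: Dict[str, int] = {
    "MaxBurstLength": 262144,
    "FirstBurstLength": 65536,
}


def negotiate_session(initiator: Dict[str, int],
                      target: Dict[str, int]) -> Dict[str, int]:
    """Resolve 'minimum'-type login parameters for one session.

    Simplification: a side that does not state a value is treated as
    offering the specification default.
    """
    result = {}
    for key, default in RFC_DEFAULTS.items():
        result[key] = min(initiator.get(key, default), target.get(key, default))
    # The specification requires FirstBurstLength <= MaxBurstLength.
    result["FirstBurstLength"] = min(result["FirstBurstLength"],
                                     result["MaxBurstLength"])
    return result


if __name__ == "__main__":
    # Hypothetical offers: a generous initiator talking to a default target.
    initiator_offer = {"MaxBurstLength": 16776192, "FirstBurstLength": 262144}
    target_offer = {"MaxBurstLength": 262144}
    print(negotiate_session(initiator_offer, target_offer))
```

In this hypothetical case, raising the initiator's values alone changes nothing, because the target's smaller values determine the result.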
* Re: poor rbd performance
From: Wyllys Ingersoll @ 2014-08-25 3:38 UTC
To: FUJITA Tomonori; +Cc: Dan Mick, stgt

On Sun, Aug 24, 2014 at 7:25 PM, FUJITA Tomonori
<fujita.tomonori@lab.ntt.co.jp> wrote:
>> When I enabled some debug logging, I saw that the maximum
>> data size that a tgtd read or write operation handles is only
>> 128 KB,
>
> Do you mean that the read/write size from iSCSI initiators is 128K? If
> so, it's probably due to your configuration. tgt doesn't have such a
> limit. iSCSI has lots of negotiation parameters between an initiator
> and a target, which affect performance. You need to configure
> both properly to get good performance.

I believe the initiator is from open-iSCSI. I have tried changing
some settings but apparently haven't found the right ones that will
make a difference.

>> Any suggestions for further Ceph tuning or other areas in tgtd to look
>> at for possible problems?
>
> I would suggest working with a simpler backend like bs_null to find
> the best performance of tgt in your environment. I think the first step
> is knowing where the bottleneck is.

Thanks, I'll take a look.
* Re: poor rbd performance
From: Wyllys Ingersoll @ 2014-08-26 18:23 UTC
To: FUJITA Tomonori; +Cc: Dan Mick, stgt

I don't think using RBD AIO makes much difference. The bs worker thread
ultimately has to block until the backend request handler completes before
it can put the cmd structure back in the queue (see bs_thread_worker_fn
in bs.c). If the request handler function in bs_rbd returns before the
asynchronous call has actually completed, the data gets corrupted and it
eventually crashes.

So changing to the rados aio functions didn't matter, but bumping
nr_threads up from 16 to 64 made a significant difference. On a
multi-processor/multi-core system it might make sense to raise that value
much higher to maximize request handling.
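The effect of nr_threads can be estimated with a back-of-the-envelope bound. If each worker thread blocks on one request at a time, at most nr_threads requests are in flight, so throughput cannot exceed nr_threads × request size ÷ per-request latency. The sketch below applies that bound to the figures mentioned in this thread (16 threads, about 100 MB/s reads), assuming "128 KB" means 128 KiB requests and that every worker is kept busy. The function names are hypothetical and the per-request latency was never measured in the thread; it is only inferred here.

```python
def max_throughput_mb_s(nr_threads: int, request_bytes: int, latency_s: float) -> float:
    """Upper bound on throughput (MB/s) with one blocking request per worker."""
    return nr_threads * request_bytes / latency_s / 1e6


def implied_latency_s(nr_threads: int, request_bytes: int, throughput_mb_s: float) -> float:
    """Per-request latency implied by an observed throughput, if all workers are busy."""
    return nr_threads * request_bytes / (throughput_mb_s * 1e6)


if __name__ == "__main__":
    request_bytes = 128 * 1024  # assumption: "128 KB" read as 128 KiB

    # 16 workers at about 100 MB/s implies roughly 21 ms per request.
    latency = implied_latency_s(16, request_bytes, 100.0)
    print(f"implied latency: {latency * 1000:.1f} ms")

    # With the same latency, 64 workers raise the bound to about 400 MB/s.
    print(f"bound with 64 threads: {max_throughput_mb_s(64, request_bytes, latency):.0f} MB/s")
```

This is only an upper bound under those assumptions: it treats latency as constant when more requests are in flight, which a real cluster will not guarantee.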