poor rbd performance

* poor rbd performance
@ 2014-08-22 12:55 Wyllys Ingersoll
  2014-08-22 22:17 ` Dan Mick
  0 siblings, 1 reply; 6+ messages in thread
From: Wyllys Ingersoll @ 2014-08-22 12:55 UTC (permalink / raw)
  To: stgt

I'm seeing some disappointing performance numbers using the bs_rbd backend
with a Ceph RBD pool backend over a 10Gb Ethernet link.

Read operations appear to max out at about 100MB/second, regardless of
block size or amount of data being read, and write operations fare much
worse, maxing out somewhere in the 40MB/second range.  Any ideas why this
would be so limited?

I've tested using 'fio' as well as some other perf testing utilities.  On
the same link, talking to the same ceph pool/image, using librados directly
(either through the C or Python bindings), the read performance is 5-8x
faster and write performance is 2-3x faster.

Any suggestions as to how to tune the iSCSI or bs_rbd interface to perform
better?

thanks,
  Wyllys Ingersoll
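
One way to compare the two paths on equal terms is to time the same
sequential read pattern through each.  A minimal sketch in Python, assuming
a hypothetical read_at(offset, length) callable: with the Ceph Python
bindings it could wrap rbd.Image.read, and for the iSCSI path it could wrap
os.pread on the attached block device.  The helper names here are
illustrative and do not come from the thread.

```python
import time


def measure_read_throughput(read_at, total_bytes, block_size):
    """Read total_bytes sequentially in block_size chunks; return MB/s."""
    offset = 0
    start = time.perf_counter()
    while offset < total_bytes:
        length = min(block_size, total_bytes - offset)
        data = read_at(offset, length)
        if len(data) != length:
            raise IOError(f"short read at offset {offset}")
        offset += length
    # Guard against a zero interval on very fast in-memory reads.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return total_bytes / elapsed / 1e6


def fake_read_at(offset, length):
    """Hypothetical stand-in for a real backend read; returns zero bytes."""
    return bytes(length)


if __name__ == "__main__":
    rate = measure_read_throughput(fake_read_at, 64 * 1024 * 1024, 128 * 1024)
    print(f"{rate:.0f} MB/s")
```

Running the same function with the same block size against both backends
keeps the access pattern identical, so any difference comes from the path
and not from the test.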

* Re: poor rbd performance
  2014-08-22 12:55 poor rbd performance Wyllys Ingersoll
@ 2014-08-22 22:17 ` Dan Mick
  2014-08-23 13:46   ` Wyllys Ingersoll
  0 siblings, 1 reply; 6+ messages in thread
From: Dan Mick @ 2014-08-22 22:17 UTC (permalink / raw)
  To: Wyllys Ingersoll, stgt

Hello, name from a past life...

I wrote the original port to rbd, and there was very little attempt to
even consider performance, and certainly no study; it was and is a
proof-of-concept.  I don't know offhand what may be at fault, but I know
it's a target-rich environment, because no one's ever gone hunting at
all to my knowledge.

Several have recommended making use of Ceph async interfaces; I don't
know how much of a win this would be, because stgt already has a pool of
worker threads for outstanding requests.  I also don't know how hard it
is to monitor things like thread utilization inside stgt.

But I'm interested in the subject and can help answer Ceph questions if
you have them.

* Re: poor rbd performance
  2014-08-22 22:17 ` Dan Mick
@ 2014-08-23 13:46   ` Wyllys Ingersoll
  2014-08-24 23:25     ` FUJITA Tomonori
  0 siblings, 1 reply; 6+ messages in thread
From: Wyllys Ingersoll @ 2014-08-23 13:46 UTC (permalink / raw)
  To: Dan Mick; +Cc: stgt

Hey Dan, it's always good to hear from another former Sun eng.

I saw your blog posts about the RBD backend and it seems to work as
advertised.  I don't know if the rados aio calls will make it better
or not.  I do know that it is possible to get better throughput using
the standard IO functions (the ones you used) because all of the
non-iSCSI tests we've done just use the standard rbd_read/rbd_write
calls just like bs_rbd.  It might be an architectural limitation in
tgtd, but I am not familiar enough with the tgtd code yet to know
where to look for the bottlenecks.  I tried changing the timing
interval in 'work.c' to be much shorter, but it didn't make a
difference.  When I enabled some debug logging, I saw that the maximum
data size that a tgtd read or write operation handles is only 128 KB,
which may be worth tracking down; perhaps there's a way to make it ask
for bigger chunks of data.  We've even tried changing the replication
on the ceph pool to 1 (just for testing purposes) to eliminate the
duplication from the equation, but it's not making much difference in
this situation.

I might try modifying the code to use the aio calls to see how it
goes.  It might yield some interesting results if I add some timing
measurements, and maybe it will end up being faster.

Any suggestions for further ceph tuning or other areas in tgtd to look
at for possible problems?

thanks,
  Wyllys

* Re: poor rbd performance
  2014-08-23 13:46   ` Wyllys Ingersoll
@ 2014-08-24 23:25     ` FUJITA Tomonori
  2014-08-25  3:38       ` Wyllys Ingersoll
  0 siblings, 1 reply; 6+ messages in thread
From: FUJITA Tomonori @ 2014-08-24 23:25 UTC (permalink / raw)
  To: wyllys.ingersoll; +Cc: dan.mick, stgt

Hello,

On Sat, 23 Aug 2014 09:46:37 -0400
Wyllys Ingersoll <wyllys.ingersoll@keepertech.com> wrote:

> Hey Dan, it's always good to hear from another former Sun eng.
> 
> I saw your blog posts about the RBD backend and it seems to work as
> advertised.  I don't know if the rados aio calls will make it better
> or not.  I do know that it is possible to get better throughput using
> the standard IO functions (the ones you used) because all of the
> non-iSCSI tests we've done just use the standard rbd_read/rbd_write
> calls just like bs_rbd.  It might be an architectural limitation in
> tgtd, but I am not familiar enough with the tgtd code yet to know
> where to look for the bottlenecks.  I tried changing the timing
> interval in 'work.c' to be much shorter, but it didn't make a
> difference.  When I enabled some debug logging, I see that the maximum
> data size that the tgtd read or write operation handles is only
> 128 KB,

Do you mean that the read/write size from the iSCSI initiator is 128K? If
so, it's probably due to your configuration; tgt doesn't have such a
limit. iSCSI has lots of negotiation parameters between an initiator
and a target, which affect performance. You need to configure
both properly to get good performance.


> which is something that may be worth tracking down, perhaps there's a
> way to make it ask for bigger chunks of data.  We've even tried
> changing the replication on the ceph pool to 1 (just for testing
> purposes) to eliminate the duplication from the equation, but it's not
> making much difference in this situation.
> 
> I might try modifying the code to use the aio calls to see how it
> goes, it might yield some interesting results if I add some timing
> measurements, and maybe it will end up being faster.
> 
> Any suggestions for further ceph tuning or other areas in tgtd to look
> at for possible problems?

I would suggest testing with a simpler backend like bs_null to find
the best performance tgt can reach in your environment. I think the
first step is finding out where the bottleneck is.

* Re: poor rbd performance
  2014-08-24 23:25     ` FUJITA Tomonori
@ 2014-08-25  3:38       ` Wyllys Ingersoll
  2014-08-26 18:23         ` Wyllys Ingersoll
  0 siblings, 1 reply; 6+ messages in thread
From: Wyllys Ingersoll @ 2014-08-25  3:38 UTC (permalink / raw)
  To: FUJITA Tomonori; +Cc: Dan Mick, stgt

On Sun, Aug 24, 2014 at 7:25 PM, FUJITA Tomonori
<fujita.tomonori@lab.ntt.co.jp> wrote:
> Do you mean that the read/write size from the iSCSI initiator is 128K? If
> so, it's probably due to your configuration; tgt doesn't have such a
> limit. iSCSI has lots of negotiation parameters between an initiator
> and a target, which affect performance. You need to configure
> both properly to get good performance.



I believe the initiator is from open-iSCSI.  I have tried changing
some settings but apparently haven't found the right ones that will
make a difference.



>> Any suggestions for further ceph tuning or other areas in tgtd to look
>> at for possible problems?
>
> I would suggest testing with a simpler backend like bs_null to find
> the best performance tgt can reach in your environment. I think the
> first step is finding out where the bottleneck is.


Thanks, I'll take a look.

* Re: poor rbd performance
  2014-08-25  3:38       ` Wyllys Ingersoll
@ 2014-08-26 18:23         ` Wyllys Ingersoll
  0 siblings, 0 replies; 6+ messages in thread
From: Wyllys Ingersoll @ 2014-08-26 18:23 UTC (permalink / raw)
  To: FUJITA Tomonori; +Cc: Dan Mick, stgt

I don't think using RBD AIO makes much difference.  The bs worker
thread ultimately has to block until the backend request handler
completes before it can put the cmd structure back in the queue (see
bs_thread_worker_fn in bs.c).  If the request handler function in
bs_rbd returns before the asynchronous call has actually completed,
the data gets corrupted and it eventually crashes.
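
The constraint described here can be modelled without tgtd or Ceph at all.
The sketch below is a Python simulation, not tgtd code: each worker sleeps
for a fixed time to stand in for a synchronous backend call, so a worker
serves one request at a time and the total run time is set by the number of
workers.  The function names, latency, and request counts are made up for
illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_backend_request(latency_s):
    """Stand-in for a synchronous backend call; it only sleeps."""
    time.sleep(latency_s)


def run_requests(nr_threads, nr_requests, latency_s):
    """Return wall-clock seconds to finish nr_requests with nr_threads workers."""
    start = time.perf_counter()
    # Leaving the with-block waits for every submitted request to finish.
    with ThreadPoolExecutor(max_workers=nr_threads) as pool:
        for _ in range(nr_requests):
            pool.submit(blocking_backend_request, latency_s)
    return time.perf_counter() - start


if __name__ == "__main__":
    for n in (2, 8):
        elapsed = run_requests(n, nr_requests=48, latency_s=0.02)
        print(f"{n} workers: {elapsed:.2f} s")
```

Because every worker is blocked for the full latency of its request, adding
workers is the only way this model gets more requests in flight.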

So, switching to the rados aio functions didn't matter, but
bumping up nr_threads from 16 to 64 made a significant difference.
On a multi-processor/multi-core system it might make sense to raise
that value way up to maximize the request handling.
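
A rough back-of-envelope model of why the thread count matters, in Python.
It assumes each worker has exactly one blocking request in flight, that
requests are 128 KiB (the size seen in the debug logs earlier in the
thread), and that per-request latency stays constant as threads are added.
None of these were measured in the thread, so the figures are a ceiling
under those assumptions, not a prediction.

```python
REQUEST_BYTES = 128 * 1024  # assumed request size, from the debug-log observation


def implied_latency(nr_threads, request_bytes, throughput_bps):
    """Per-request latency implied by a throughput, if all threads stay busy."""
    return nr_threads * request_bytes / throughput_bps


def throughput_ceiling(nr_threads, request_bytes, latency_s):
    """Upper bound in bytes/second with one blocking request per thread."""
    return nr_threads * request_bytes / latency_s


if __name__ == "__main__":
    # 100 MB/s is the read rate reported at the start of the thread.
    latency = implied_latency(16, REQUEST_BYTES, 100e6)
    print(f"implied per-request latency: {latency * 1000:.1f} ms")
    for n in (16, 64):
        ceiling = throughput_ceiling(n, REQUEST_BYTES, latency)
        print(f"{n} threads: ceiling {ceiling / 1e6:.0f} MB/s")
```

Under these assumptions, 100 MB/s with 16 threads corresponds to roughly
21 ms per request, and the ceiling scales linearly with the thread count
until something else (network, OSDs, the initiator) becomes the limit.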
