* radosrgw performance problems
@ 2013-06-11 13:27 Jäger, Philipp
  2013-06-11 14:38 ` Mark Nelson
  0 siblings, 1 reply; 8+ messages in thread
From: Jäger, Philipp @ 2013-06-11 13:27 UTC (permalink / raw)
  To: ceph-devel@vger.kernel.org

Hello,

we have a performance problem with radosgw.
We get only 8-9 MB/s per upload, also when testing with s3cmd on the rgw host itself.
(2 uploads at the same time: 15 MB/s combined; 3 uploads at the same time: 21 MB/s combined.)
But when putting a file via rados rbd, we get 40 MB/s upload, so there is no network or other general problem.
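
The upload figures above can be put side by side with a little arithmetic. The sketch below assumes the quoted values are aggregate rates in MB/s (taking 8 MB/s, the lower end of the range, for the single upload) and only derives the average rate per upload; it is an illustration, not a new measurement.

```python
# Aggregate upload rates quoted in the mail, in MB/s, keyed by the
# number of parallel uploads. 8.0 is the lower end of the "8-9" figure.
aggregate_mb_s = {1: 8.0, 2: 15.0, 3: 21.0}

def per_upload_rate(aggregate, streams):
    """Average rate of a single upload when `streams` run in parallel."""
    return aggregate / streams

for streams, aggregate in sorted(aggregate_mb_s.items()):
    rate = per_upload_rate(aggregate, streams)
    print(f"{streams} upload(s): {aggregate:.1f} MB/s total, {rate:.1f} MB/s each")
```

Each upload stays at roughly 7-8 MB/s while the total grows almost linearly, which would be consistent with something limiting each connection rather than the gateway or the cluster as a whole.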

The speed is the same with the Inktank apache/fastcgi packages and with the original ones. The hardware is also fast enough. We use Ubuntu 12.04 LTS and Ceph 0.61.2.

Do you have any idea why the rgw is so slow? How can we identify where the problem is?

(I've heard that the rgw admin socket can be used to check perfcounters, but it seems that this is deprecated? When I type ceph --admin-daemon ... it says "unknown command", and I cannot find it in the Ceph documentation. I then wanted to benchmark with rest-bench, but it says "ERROR: failed to create bucket: XmlParseFailure -failed initializing benchmark", so I could not measure the speed.)

ceph.conf, rgw part:

[client.radosgw.connect2]
host = hcrgwko2
rgw socket path = /tmp/connect2.sock
log file = /var/log/ceph/connect2.log
rgw dns name =  FQDN

Thank you very much.


Regards

Philipp


* Re: radosrgw performance problems
  2013-06-11 13:27 radosrgw performance problems Jäger, Philipp
@ 2013-06-11 14:38 ` Mark Nelson
  2013-06-12  8:22   ` AW: " Jäger, Philipp
  0 siblings, 1 reply; 8+ messages in thread
From: Mark Nelson @ 2013-06-11 14:38 UTC (permalink / raw)
  To: "Jäger, Philipp"; +Cc: ceph-devel@vger.kernel.org

On 06/11/2013 08:27 AM, Jäger, Philipp wrote:
> Hello,
>
> we have a performance problem with radosrgw.
> Only 8mb/s-9 per upload, also tested with s3cmd on the rgw itself.
> (2 uploads at the same time: combined 15mb/s, 3 uploads at the same time: comb. 21mb/s)
> But when putting a file via rados rbd , we get 40mb/s upload, so no network or other problem in general.

One thing to check is to make sure that the rgw pool you are writing to 
has enough placement groups for your cluster.  The default may be 
extremely low.
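
As a rough cross-check for "enough placement groups", a commonly cited rule of thumb is about 100 PGs per OSD divided by the replica count, rounded up to a power of two. The sketch below assumes that rule; the 30 OSDs come from later in this thread, while the replica counts are assumptions, since the thread does not state them.

```python
def suggested_pg_count(num_osds, replicas, pgs_per_osd=100):
    """Rule-of-thumb PG count: (OSDs * target PGs per OSD) / replicas,
    rounded up to the next power of two."""
    raw = num_osds * pgs_per_osd / replicas
    power = 1
    while power < raw:
        power *= 2
    return power

# 30 OSDs is the cluster size mentioned in this thread; the replica
# counts are assumptions, since the thread does not state them.
for replicas in (2, 3):
    print(replicas, "replicas ->", suggested_pg_count(30, replicas), "PGs")
```

This is only a sizing heuristic; it says nothing about whether the PG count is the bottleneck in a given setup.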

>
> Same speed with the inktank apache/fastcgi and the original one. Hardware also fast enough. We use Ubuntu 12.04 lts, ceph 0.61.2
>
> So have you any idea why the rgw is so slow? How can we identify where the problem is?

RBD is pretty streamlined so you can get good performance with it.  On 
my test setup I'm seeing 80-90% of the performance of raw rados object 
writes/reads (and in some cases much faster with RBD cache enabled!). 
RGW, Apache, fastcgi, and simply the requirements of supporting the S3 
protocol itself add a lot of overhead.  MD5 calculations by themselves 
start chewing up a ton of CPU once you try to support high throughput 
scenarios and there is a non-trivial amount of extra latency added as 
well.  You may be able to improve things with some tweaks, but I 
wouldn't be surprised if RBD is always going to be faster to an extent.
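
To get a feel for the MD5 cost mentioned above, single-core hashing speed can be measured on the gateway host. This is a minimal sketch using Python's hashlib rather than the code path radosgw itself uses, so the result is only a ballpark; the buffer sizes are arbitrary choices.

```python
import hashlib
import time

def md5_throughput_mb_s(total_mb=64, chunk_mb=4):
    """Hash `total_mb` of zero bytes in `chunk_mb` pieces and return MB/s.
    `total_mb` should be a multiple of `chunk_mb`."""
    chunk = bytes(chunk_mb * 1024 * 1024)
    digest = hashlib.md5()
    start = time.perf_counter()
    for _ in range(total_mb // chunk_mb):
        digest.update(chunk)
    elapsed = time.perf_counter() - start
    return total_mb / elapsed

if __name__ == "__main__":
    print(f"MD5 on one core: {md5_throughput_mb_s():.0f} MB/s")
```

If the printed rate is far above the upload rate being observed, MD5 alone is unlikely to explain the gap.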

For folks who want really fast object storage I think directly utilizing 
rados is probably the way to go, but that requires modifying the app and 
it's not for everyone.

>
> (I've heard something about the rgw admin socket to check perfcounters, but it seems that this is deprecated? Because when i type ceph --admin-daemon ... it says unknown command and I cannot find it in the ceph docu. Then i wanted to bench via rest-bench, but it says "ERROR: failed to create bucket: XmlParseFailure -failed initializing benchmark", so I could not bench the speed.)

Connecting with the admin daemon should still be supported. 
Documentation is here:

http://ceph.com/docs/next/radosgw/troubleshooting/

If this doesn't work please let me know!

Also, I've created a bug for the rest-bench issue:

http://tracker.ceph.com/issues/5302

Personally I've been using swift-bench for most of my recent rgw testing.

Mark

>
> Ceph.conf- rgw part:
>
> [client.radosgw.connect2]
> host = hcrgwko2
> rgw socket path = /tmp/connect2.sock
> log file = /var/log/ceph/connect2.log
> rgw dns name =  FQDN
>
> Thank you very much.
>
>
> Regards
>
> Philipp
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>


* AW: radosrgw performance problems
  2013-06-11 14:38 ` Mark Nelson
@ 2013-06-12  8:22   ` Jäger, Philipp
  2013-06-12 10:52     ` Jäger, Philipp
  0 siblings, 1 reply; 8+ messages in thread
From: Jäger, Philipp @ 2013-06-12  8:22 UTC (permalink / raw)
  To: Mark Nelson; +Cc: ceph-devel@vger.kernel.org

Hello,

I've added my answers below.

Thanks

Regards

Philipp

-----Original Message-----
From: Mark Nelson [mailto:mark.nelson@inktank.com]
Sent: Tuesday, 11 June 2013 16:38
To: Jäger, Philipp
Cc: ceph-devel@vger.kernel.org
Subject: Re: radosrgw performance problems

On 06/11/2013 08:27 AM, Jäger, Philipp wrote:
> Hello,
>
> we have a performance problem with radosrgw.
> Only 8mb/s-9 per upload, also tested with s3cmd on the rgw itself.
> (2 uploads at the same time: combined 15mb/s, 3 uploads at the same 
> time: comb. 21mb/s) But when putting a file via rados rbd , we get 40mb/s upload, so no network or other problem in general.

One thing to check is to make sure that the rgw pool you are writing to has enough placement groups for your cluster.  The default may be extremely low.

[Philipp] We don't use the default pool; we created a new pool with 1500 PGs (30 OSDs) and see the same problem.

>
> Same speed with the inktank apache/fastcgi and the original one. Hardware also fast enough. We use Ubuntu 12.04 lts, ceph 0.61.2
>
> So have you any idea why the rgw is so slow? How can we identify where the problem is?

RBD is pretty streamlined so you can get good performance with it.  On 
my test setup I'm seeing 80-90% of the performance of raw rados object 
writes/reads (and in some cases much faster with RBD cache enabled!). 
RGW, Apache, fastcgi, and simply the requirements of supporting the S3 
protocol itself add a lot of overhead.  MD5 calculations by themselves 
start chewing up a ton of CPU once you try to support high throughput 
scenarios and there is a non-trivial amount of extra latency added as 
well.  You may be able to improve things with some tweaks, but I 
wouldn't be surprised if RBD is always going to be faster to an extent.

[Philipp] We are talking about 9 MB/s per rgw, which is less than 1/4 of rbd (rados put: 40 MB/s); with rados bench we actually get: Bandwidth (MB/sec):     171.744.
So I think this is not a matter of tweaking, but rather a general problem?


For folks who want really fast object storage I think directly utilizing 
rados is probably the way to go, but that requires modifying the app and 
it's not for everyone.

>
> (I've heard something about the rgw admin socket to check perfcounters, but it seems that this is deprecated? Because when i type ceph --admin-daemon ... it says unknown command and I cannot find it in the ceph docu. Then i wanted to bench via rest-bench, but it says "ERROR: failed to create bucket: XmlParseFailure -failed initializing benchmark", so I could not bench the speed.)

connecting with the admin daemon should still be supported. 
Documentation is here:

http://ceph.com/docs/next/radosgw/troubleshooting/

If this doesn't work please let me know!

[Philipp] How do you activate an rgw admin socket? I think we have to add an entry to ceph.conf? I assume the admin socket is not the same as the "rgw socket path"?


Also, I've created a bug for the rest-bench issue:

http://tracker.ceph.com/issues/5302

Personally I've been using swift-bench for most of my recent rgw testing.

Mark


* AW: radosrgw performance problems
  2013-06-12  8:22   ` AW: " Jäger, Philipp
@ 2013-06-12 10:52     ` Jäger, Philipp
  2013-06-12 14:53       ` Mark Nelson
  0 siblings, 1 reply; 8+ messages in thread
From: Jäger, Philipp @ 2013-06-12 10:52 UTC (permalink / raw)
  To: Mark Nelson; +Cc: ceph-devel@vger.kernel.org

Hello,

I have identified the problem.

When I deactivate SSL in the Apache config and connect via HTTP, I get 40 MB/s (with SSL: 8 MB/s).
Do you have experience with SSL? Is this normal?

Thanks

Regards




* Re: AW: radosrgw performance problems
  2013-06-12 10:52     ` Jäger, Philipp
@ 2013-06-12 14:53       ` Mark Nelson
  2013-06-12 15:14         ` AW: " Jäger, Philipp
  0 siblings, 1 reply; 8+ messages in thread
From: Mark Nelson @ 2013-06-12 14:53 UTC (permalink / raw)
  To: "Jäger, Philipp"; +Cc: ceph-devel@vger.kernel.org

Interesting.  Was Apache using excessive CPU?  Do your processors and 
libraries support AES-NI?  Seems strange that at this level that would 
be the limiting factor, but I've seen stranger things...  Glad you 
figured it out!
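
One quick way to answer the AES-NI question on the CPU side is to look for the aes flag in /proc/cpuinfo. The sketch below assumes an x86 Linux host; it does not tell you whether the installed OpenSSL build actually uses the instructions, and inside a VM it only shows what the hypervisor exposes to the guest.

```python
def has_aes_flag(cpuinfo_text):
    """Return True if any 'flags' line in /proc/cpuinfo-style text lists 'aes'."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags" and "aes" in value.split():
            return True
    return False

if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            print("AES-NI advertised by CPU:", has_aes_flag(handle.read()))
    except OSError:
        print("/proc/cpuinfo not available on this system")
```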

Mark


* AW: AW: radosrgw performance problems
  2013-06-12 14:53       ` Mark Nelson
@ 2013-06-12 15:14         ` Jäger, Philipp
  2013-06-12 16:42           ` Mark Nelson
  0 siblings, 1 reply; 8+ messages in thread
From: Jäger, Philipp @ 2013-06-12 15:14 UTC (permalink / raw)
  To: Mark Nelson; +Cc: ceph-devel@vger.kernel.org

No, not really:

30239 www-data  20   0  751m 7556 2036 S   10  0.4   0:03.67 apache2
 1955 root      20   0 2048m  10m 4352 S    6  0.5   2:14.54 radosgw

Apache was at 10% CPU usage and the load was 0.4. The vCenter performance graph also shows less than 15% usage.
We are setting up a physical server right now, because we also suspected missing CPU instruction sets.

Another perf question:

As I said, we can write about 170 MB/s with rados bench:
rados bench -p test 100 write:
Bandwidth (MB/sec):     171.744.

With rbd or rgw (without HTTPS) we get less than 40 MB/s:
(time rados -p connect put 600mb.iso 600mb.iso
real    0m15.846s
user    0m0.640s
sys     0m0.836s)
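
For reference, the timing above can be turned into a rate. This sketch assumes the file is exactly 600 MB, as its name suggests, and compares the result with the rados bench figure quoted in this mail.

```python
def parse_real_time(text):
    """Convert a `time` value such as '0m15.846s' into seconds."""
    minutes, _, seconds = text.partition("m")
    return int(minutes) * 60 + float(seconds.rstrip("s"))

size_mb = 600.0                      # assumed from the file name 600mb.iso
elapsed = parse_real_time("0m15.846s")
put_rate = size_mb / elapsed
bench_rate = 171.744                 # rados bench figure quoted above

print(f"rados put: {put_rate:.1f} MB/s")
print(f"share of rados bench: {put_rate / bench_rate:.0%}")
```

Under that size assumption the single rados put lands a little under 40 MB/s, which matches the figure given in the text.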

I think you can also close the bug! Because of HTTPS I have to specify the protocol, and then I get the message "bad protocol" (as in http://tracker.ceph.com/issues/3968), so it is not possible to benchmark with rest-bench at the moment.

I've configured the admin socket, but I don't know how to read the perfcounters output in a meaningful way...
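
One way to make the perfcounters output easier to read is to flatten it with a small script. The sketch below assumes the admin socket reply is JSON in which a counter is either a plain number or an object with avgcount and sum fields; the sample data, including the section name, counter names and all values, is made up for illustration and is not taken from this cluster.

```python
import json

# Made-up sample in the shape of a perf counter dump; the section and
# counter names and all values are illustrative, not real measurements.
SAMPLE = """
{
  "client.radosgw.example": {
    "put": 12,
    "put_b": 629145600,
    "put_initial_lat": {"avgcount": 12, "sum": 3.6}
  }
}
"""

def summarize(perf_dump):
    """Flatten a perf dump: plain counters stay as they are, and
    {"avgcount", "sum"} pairs become an average (sum / avgcount)."""
    summary = {}
    for section, counters in perf_dump.items():
        for name, value in counters.items():
            if isinstance(value, dict) and "avgcount" in value and "sum" in value:
                count = value["avgcount"]
                value = value["sum"] / count if count else 0.0
            summary[f"{section}.{name}"] = value
    return summary

for key, value in summarize(json.loads(SAMPLE)).items():
    print(key, "=", value)
```

Taking two dumps a known interval apart and comparing the flattened values would show which counters move during an upload.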


Thank you very much so far.

Philipp


* Re: AW: AW: radosrgw performance problems
  2013-06-12 15:14         ` AW: " Jäger, Philipp
@ 2013-06-12 16:42           ` Mark Nelson
  2013-06-13 11:04             ` AW: " Jäger, Philipp
  0 siblings, 1 reply; 8+ messages in thread
From: Mark Nelson @ 2013-06-12 16:42 UTC (permalink / raw)
  To: "Jäger, Philipp"; +Cc: ceph-devel@vger.kernel.org

I spoke to Yehuda (who develops RGW), and he mentioned that it may be 
latency due to SSL handshake.  How big are the objects you are writing?

With RBD, I can do much better than 40% of the rados throughput, but it 
takes a lot of concurrency.  I use fio with libaio, direct=1, 4MB 
writes, and a high iodepth on multiple volumes to get there.  Btw, rados 
bench by default is going to keep 16 objects in flight too.
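
The effect of keeping several objects in flight can be sketched with a simple model: throughput is roughly the amount of data in flight divided by the per-operation latency. The 16 in flight and the 171.744 MB/s come from this thread; the 4 MB object size is an assumption, and latency under load is not the same as the latency of a lone request, so this is only a rough illustration.

```python
def throughput_mb_s(in_flight, object_mb, latency_s):
    """Little's-law style estimate: data in flight divided by per-op latency."""
    return in_flight * object_mb / latency_s

def implied_latency_s(in_flight, object_mb, rate_mb_s):
    """Per-operation latency implied by a measured rate at a given depth."""
    return in_flight * object_mb / rate_mb_s

# 16 in flight is from the mail above; the 4 MB object size is an assumption.
latency = implied_latency_s(16, 4.0, 171.744)
print(f"implied latency per op: {latency:.3f} s")
print(f"same latency, 1 op in flight: {throughput_mb_s(1, 4.0, latency):.1f} MB/s")
```

The point of the model is only that a single stream cannot be compared directly with a benchmark that keeps 16 operations outstanding.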

Mark


* AW: AW: AW: radosrgw performance problems
  2013-06-12 16:42           ` Mark Nelson
@ 2013-06-13 11:04             ` Jäger, Philipp
  0 siblings, 0 replies; 8+ messages in thread
From: Jäger, Philipp @ 2013-06-13 11:04 UTC (permalink / raw)
  To: Mark Nelson; +Cc: ceph-devel@vger.kernel.org

Hello,

I am uploading one big ISO at about 8-9 MB/s, not thousands of small files. The cluster is still in a test environment.


Apache config:

FastCgiExternalServer /var/www/connect2.fcgi -socket /tmp/connect2.sock

<VirtualHost *:443>
        ServerName foo.cgm.ag
        ServerAlias *.foo.cgm.ag
        # hcrgwko2
        ServerAdmin foo
        DocumentRoot /var/www
        KeepAlive off
        SSLEngine on
        SSLCertificateFile /etc/apache2/ssl/foo.cert
        SSLCertificateKeyFile /etc/apache2/ssl/foo.key

        RewriteEngine On
        RewriteRule ^/([a-zA-Z0-9-_.]*)([/]?.*) /connect2.fcgi?page=$1&params=$2&%{QUERY_STRING} [E=HTTP_AUTHORIZATION:%{HTTP:Authorization},L]
        <IfModule mod_fastcgi.c>
                <Directory /var/www>
                        Options +ExecCGI
                        AllowOverride All
                        SetHandler fastcgi-script
                        Order allow,deny
                        Allow from all
                        AuthBasicAuthoritative Off
                </Directory>
        </IfModule>

        AllowEncodedSlashes On
        ErrorLog /var/log/apache2/error.log
        CustomLog /var/log/apache2/access.log combined
        ServerSignature Off
</VirtualHost>
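
For anyone reading the RewriteRule above, here is a minimal sketch of how its pattern splits a request path into the two captures that become page= and params=. It is written in Python rather than Apache config, so small differences between Python's re module and Apache's regex engine are possible. The function name and the bucket name "mybucket" are made up for illustration; only the pattern itself is taken from the config.

```python
import re

# Pattern copied verbatim from the RewriteRule in the Apache config.
REWRITE_PATTERN = re.compile(r"^/([a-zA-Z0-9-_.]*)([/]?.*)")


def split_request_path(path):
    """Return (page, params), i.e. what the rule captures as $1 and $2."""
    match = REWRITE_PATTERN.match(path)
    if match is None:
        raise ValueError(f"path does not match the rewrite pattern: {path!r}")
    return match.group(1), match.group(2)


# Hypothetical request: bucket "mybucket", object "600mb.iso".
page, params = split_request_path("/mybucket/600mb.iso")
print(f"page={page!r} params={params!r}")
```

The first capture stops at the first character outside the class (here the second slash), so the bucket-like prefix ends up in page and the rest of the path in params.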


Thanks

Philipp


-----Original Message-----
From: Mark Nelson [mailto:mark.nelson@inktank.com]
Sent: Wednesday, 12 June 2013 18:43
To: Jäger, Philipp
Cc: ceph-devel@vger.kernel.org
Subject: Re: AW: AW: radosrgw performance problems

I spoke to Yehuda (who develops RGW), and he mentioned that it may be latency due to SSL handshake.  How big are the objects you are writing?

With RBD, I can do much better than 40% of the rados throughput, but it takes a lot of concurrency.  I use fio with libaio, direct=1, 4MB writes, and a high iodepth on multiple volumes to get there.  Btw, rados bench by default keeps 16 objects in flight too.
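
A rough way to see why the number of requests in flight matters, sketched with Little's law under the assumption that per-request latency stays constant as concurrency changes (in practice it will not). The 16 in-flight operations are the rados bench default mentioned above, 4 MB is assumed as the rados bench default write size, and 171.744 MB/s is the rados bench figure reported in this thread. The function names are illustrative only.

```python
def per_op_latency_s(bandwidth_mb_s, in_flight, op_size_mb):
    """Little's law: latency = data in flight / throughput."""
    return in_flight * op_size_mb / bandwidth_mb_s


def expected_bandwidth_mb_s(latency_s, in_flight, op_size_mb):
    """Throughput if `in_flight` ops of `op_size_mb` each take `latency_s`."""
    return in_flight * op_size_mb / latency_s


# rados bench result from the thread: 16 ops in flight, assumed 4 MB each.
latency = per_op_latency_s(171.744, in_flight=16, op_size_mb=4)

# Same per-op latency, but only one request outstanding at a time.
single = expected_bandwidth_mb_s(latency, in_flight=1, op_size_mb=4)

print(f"latency per 4 MB op: {latency:.3f} s")
print(f"one op in flight:    {single:.1f} MB/s")
```

Under these assumptions a single outstanding request would reach only about 1/16 of the rados bench figure, so a one-stream upload and a 16-way benchmark are not directly comparable.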

Mark

On 06/12/2013 10:14 AM, Jäger, Philipp wrote:
> No, not really:
>
> 30239 www-data  20   0  751m 7556 2036 S   10  0.4   0:03.67 apache2
>   1955 root      20   0 2048m  10m 4352 S    6  0.5   2:14.54 radosgw
>
> 10% cpu usage apache, load was 0.4. Also less than 15% usage via vcenter performance graph.
> We are setting up a physical server right now, because we also suspected missing CPU instruction sets.
>
> Another perf question:
>
> As I said we can write about 170mb/s with the rados bench:
> rados bench -p test 100 write:
> Bandwidth (MB/sec):     171.744.
>
> With rbd or rgw (w/o https) we get less than 40mb/s:
> (time rados -p connect put 600mb.iso 600mb.iso
> real    0m15.846s
> user    0m0.640s
> sys     0m0.836s)
>
> I think you can also close the bug. Because of https I have to specify the protocol, and then I get the message "bad protocol" (as in http://tracker.ceph.com/issues/3968), so benchmarking with rest-bench is not possible at the moment.
>
> I've configured the admin socket, but I don't know how to "read" the perfcounters output in a meaningful way...
>
>
> Thank you very much so far.
>
> Philipp
>
> -----Original Message-----
> From: Mark Nelson [mailto:mark.nelson@inktank.com]
> Sent: Wednesday, 12 June 2013 16:53
> To: Jäger, Philipp
> Cc: ceph-devel@vger.kernel.org
> Subject: Re: AW: radosrgw performance problems
>
> Interesting.  Was Apache using excessive CPU?  Do your processors and libraries support AES-NI?  Seems strange that at this level that would be the limiting factor, but I've seen stranger things...  Glad you figured it out!
>
> Mark
>
> On 06/12/2013 05:52 AM, Jäger, Philipp wrote:
>> Hello,
>>
>> identified the problem.
>>
>> When I deactivate SSL in the Apache config and connect via http, I get
>> 40 MB/s (with SSL: 8 MB/s). Do you have experience with SSL? Is this normal?
>>
>> Thanks
>>
>> Regards
>>
>>
>>
>> -----Original Message-----
>> From: Jäger, Philipp
>> Sent: Wednesday, 12 June 2013 10:22
>> To: 'Mark Nelson'
>> Cc: ceph-devel@vger.kernel.org
>> Subject: AW: radosrgw performance problems
>>
>> Hello,
>>
>> I've added my answers below.
>>
>> Thanks
>>
>> Regards
>>
>> Philipp
>>
>> -----Original Message-----
>> From: Mark Nelson [mailto:mark.nelson@inktank.com]
>> Sent: Tuesday, 11 June 2013 16:38
>> To: Jäger, Philipp
>> Cc: ceph-devel@vger.kernel.org
>> Subject: Re: radosrgw performance problems
>>
>> On 06/11/2013 08:27 AM, Jäger, Philipp wrote:
>>> Hello,
>>>
>>> we have a performance problem with radosrgw.
>>> Only 8mb/s-9 per upload, also tested with s3cmd on the rgw itself.
>>> (2 uploads at the same time: combined 15mb/s, 3 uploads at the same
>>> time: comb. 21mb/s) But when putting a file via rados rbd , we get 40mb/s upload, so no network or other problem in general.
>>
>> One thing to check is to make sure that the rgw pool you are writing to has enough placement groups for your cluster.  The default may be extremely low.
>>
>> [Philipp] We don't use the default pool; we created a new pool with 1500 PGs (30 OSDs) and see the same problem.
>>
>>>
>>> Same speed with the inktank apache/fastcgi and the original one.
>>> Hardware also fast enough. We use Ubuntu 12.04 lts, ceph 0.61.2
>>>
>>> So have you any idea why the rgw is so slow? How can we identify where the problem is?
>>
>> RBD is pretty streamlined so you can get good performance with it.  On my test setup I'm seeing 80-90% of the performance of raw rados object writes/reads (and in some cases much faster with RBD cache enabled!).
>> RGW, Apache, fastcgi, and simply the requirements of supporting the S3 protocol itself add a lot of overhead.  MD5 calculations by themselves start chewing up a ton of CPU once you try to support high throughput scenarios and there is a non-trivial amount of extra latency added as well.  You may be able to improve things with some tweaks, but I wouldn't be surprised if RBD is always going to be faster to an extent.
>>
>> [Philipp] We are talking about 9 MB/s per rgw, which is less than 1/4 of rbd (rados put: 40 MB/s); with rados bench we actually get: Bandwidth (MB/sec):     171.744.
>> So I think we are not talking about tweaking, but rather a general problem?
>>
>>
>> For folks who want really fast object storage I think directly utilizing rados is probably the way to go, but that requires modifying the app and it's not for everyone.
>>
>>>
>>> (I've heard something about the rgw admin socket to check 
>>> perfcounters, but it seems that this is deprecated? Because when i 
>>> type ceph --admin-daemon ... it says unknown command and I cannot 
>>> find it in the ceph docu. Then i wanted to bench via rest-bench, but 
>>> it says "ERROR: failed to create bucket: XmlParseFailure -failed 
>>> initializing benchmark", so I could not bench the speed.)
>>
>> connecting with the admin daemon should still be supported.
>> Documentation is here:
>>
>> http://ceph.com/docs/next/radosgw/troubleshooting/
>>
>> If this doesn't work please let me know!
>>
>> [Philipp] How do you activate an rgw admin socket? I think we have to add an entry to ceph.conf? I assume the admin socket is not the same thing as the "rgw socket path"?
>>
>>
>> Also, I've created a bug for the rest-bench issue:
>>
>> http://tracker.ceph.com/issues/5302
>>
>> Personally I've been using swift-bench for most of my recent rgw testing.
>>
>> Mark
>>
>>>
>>> Ceph.conf- rgw part:
>>>
>>> [client.radosgw.connect2]
>>> host = hcrgwko2
>>> rgw socket path = /tmp/connect2.sock
>>> log file = /var/log/ceph/connect2.log
>>> rgw dns name =  FQDN
>>>
>>> Thank you very much.
>>>
>>>
>>> Regards
>>>
>>> Philipp
>>>
>>
>



end of thread, other threads:[~2013-06-13 11:04 UTC | newest]

Thread overview: 8+ messages
2013-06-11 13:27 radosrgw performance problems Jäger, Philipp
2013-06-11 14:38 ` Mark Nelson
2013-06-12  8:22   ` AW: " Jäger, Philipp
2013-06-12 10:52     ` Jäger, Philipp
2013-06-12 14:53       ` Mark Nelson
2013-06-12 15:14         ` AW: " Jäger, Philipp
2013-06-12 16:42           ` Mark Nelson
2013-06-13 11:04             ` AW: " Jäger, Philipp
