* radosrgw performance problems
@ 2013-06-11 13:27 Jäger, Philipp
2013-06-11 14:38 ` Mark Nelson
0 siblings, 1 reply; 8+ messages in thread
From: Jäger, Philipp @ 2013-06-11 13:27 UTC (permalink / raw)
To: ceph-devel@vger.kernel.org
Hello,
we have a performance problem with radosgw.
We get only 8-9 MB/s per upload; this was also tested with s3cmd on the rgw host itself.
(2 uploads at the same time: combined 15 MB/s; 3 uploads at the same time: combined 21 MB/s)
But when putting a file via rados/rbd, we get 40 MB/s upload, so there is no network or other general problem.
The speed is the same with the Inktank apache/fastcgi packages and the original ones. The hardware is also fast enough. We use Ubuntu 12.04 LTS and ceph 0.61.2.
So do you have any idea why the rgw is so slow? How can we identify where the problem is?
(I've heard something about using the rgw admin socket to check perfcounters, but it seems that this is deprecated? When I type ceph --admin-daemon ... it says unknown command, and I cannot find it in the ceph documentation. Then I wanted to benchmark via rest-bench, but it says "ERROR: failed to create bucket: XmlParseFailure -failed initializing benchmark", so I could not benchmark the speed.)
ceph.conf, rgw part:
[client.radosgw.connect2]
host = hcrgwko2
rgw socket path = /tmp/connect2.sock
log file = /var/log/ceph/connect2.log
rgw dns name = FQDN
Thank you very much.
Regards
Philipp
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: radosrgw performance problems
2013-06-11 13:27 radosrgw performance problems Jäger, Philipp
@ 2013-06-11 14:38 ` Mark Nelson
2013-06-12 8:22 ` AW: " Jäger, Philipp
0 siblings, 1 reply; 8+ messages in thread
From: Mark Nelson @ 2013-06-11 14:38 UTC (permalink / raw)
To: "Jäger, Philipp"; +Cc: ceph-devel@vger.kernel.org
On 06/11/2013 08:27 AM, Jäger, Philipp wrote:
> Hello,
>
> we have a performance problem with radosrgw.
> Only 8mb/s-9 per upload, also tested with s3cmd on the rgw itself.
> (2 uploads at the same time: combined 15mb/s, 3 uploads at the same time: comb. 21mb/s)
> But when putting a file via rados rbd , we get 40mb/s upload, so no network or other problem in general.
One thing to check is to make sure that the rgw pool you are writing to
has enough placement groups for your cluster. The default may be
extremely low.
>
> Same speed with the inktank apache/fastcgi and the original one. Hardware also fast enough. We use Ubuntu 12.04 lts, ceph 0.61.2
>
> So have you any idea why the rgw is so slow? How can we identify where the problem is?
RBD is pretty streamlined so you can get good performance with it. On
my test setup I'm seeing 80-90% of the performance of raw rados object
writes/reads (and in some cases much faster with RBD cache enabled!).
RGW, Apache, fastcgi, and simply the requirements of supporting the S3
protocol itself add a lot of overhead. MD5 calculations by themselves
start chewing up a ton of CPU once you try to support high throughput
scenarios and there is a non-trivial amount of extra latency added as
well. You may be able to improve things with some tweaks, but I
wouldn't be surprised if RBD is always going to be faster to an extent.
For folks who want really fast object storage I think directly utilizing
rados is probably the way to go, but that requires modifying the app and
it's not for everyone.
>
> (I've heard something about the rgw admin socket to check perfcounters, but it seems that this is deprecated? Because when i type ceph --admin-daemon ... it says unknown command and I cannot find it in the ceph docu. Then i wanted to bench via rest-bench, but it says "ERROR: failed to create bucket: XmlParseFailure -failed initializing benchmark", so I could not bench the speed.)
Connecting with the admin daemon should still be supported.
Documentation is here:
http://ceph.com/docs/next/radosgw/troubleshooting/
If this doesn't work please let me know!
Also, I've created a bug for the rest-bench issue:
http://tracker.ceph.com/issues/5302
Personally I've been using swift-bench for most of my recent rgw testing.
Mark
>
> Ceph.conf- rgw part:
>
> [client.radosgw.connect2]
> host = hcrgwko2
> rgw socket path = /tmp/connect2.sock
> log file = /var/log/ceph/connect2.log
> rgw dns name = FQDN
>
> Thank you very much.
>
>
> Regards
>
> Philipp
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* AW: radosrgw performance problems
2013-06-11 14:38 ` Mark Nelson
@ 2013-06-12 8:22 ` Jäger, Philipp
2013-06-12 10:52 ` Jäger, Philipp
0 siblings, 1 reply; 8+ messages in thread
From: Jäger, Philipp @ 2013-06-12 8:22 UTC (permalink / raw)
To: Mark Nelson; +Cc: ceph-devel@vger.kernel.org
Hello,
I've added my answers below.
Thanks
Regards
Philipp
-----Original Message-----
From: Mark Nelson [mailto:mark.nelson@inktank.com]
Sent: Tuesday, 11 June 2013 16:38
To: Jäger, Philipp
Cc: ceph-devel@vger.kernel.org
Subject: Re: radosrgw performance problems
On 06/11/2013 08:27 AM, Jäger, Philipp wrote:
> Hello,
>
> we have a performance problem with radosrgw.
> Only 8mb/s-9 per upload, also tested with s3cmd on the rgw itself.
> (2 uploads at the same time: combined 15mb/s, 3 uploads at the same
> time: comb. 21mb/s) But when putting a file via rados rbd , we get 40mb/s upload, so no network or other problem in general.
One thing to check is to make sure that the rgw pool you are writing to has enough placement groups for your cluster. The default may be extremely low.
[Philipp] We don't use the default pool; we created a new pool with 1500 PGs (30 OSDs) and see the same problem.
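As a rough cross-check of the placement group count, here is a minimal sketch of the commonly cited rule of thumb (about 100 PGs per OSD, divided by the replica count, rounded up to a power of two). The replica count is not stated in this thread, so both 2 and 3 are shown as assumptions; the function name is made up for illustration.

```python
def suggested_pg_count(num_osds: int, replicas: int, pgs_per_osd: int = 100) -> int:
    """Rule-of-thumb total PG count, rounded up to the next power of two."""
    raw = num_osds * pgs_per_osd / replicas
    power = 1
    while power < raw:
        power *= 2
    return power


if __name__ == "__main__":
    # 30 OSDs as reported above; the replica count is an assumption.
    for replicas in (2, 3):
        print(f"replicas={replicas}: {suggested_pg_count(30, replicas)} PGs")
```

With 30 OSDs and 2 replicas the unrounded value is exactly 1500, so the pool described above is in the expected range and a PG shortage is an unlikely cause.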
>
> Same speed with the inktank apache/fastcgi and the original one. Hardware also fast enough. We use Ubuntu 12.04 lts, ceph 0.61.2
>
> So have you any idea why the rgw is so slow? How can we identify where the problem is?
RBD is pretty streamlined so you can get good performance with it. On
my test setup I'm seeing 80-90% of the performance of raw rados object
writes/reads (and in some cases much faster with RBD cache enabled!).
RGW, Apache, fastcgi, and simply the requirements of supporting the S3
protocol itself add a lot of overhead. MD5 calculations by themselves
start chewing up a ton of CPU once you try to support high throughput
scenarios and there is a non-trivial amount of extra latency added as
well. You may be able to improve things with some tweaks, but I
wouldn't be surprised if RBD is always going to be faster to an extent.
[Philipp] We are talking about 9 MB/s per rgw, which is less than 1/4 of what a rados put achieves (40 MB/s); with rados bench we actually get: Bandwidth (MB/sec): 171.744.
So I think this is not a matter of tweaking, but rather a general problem?
For folks who want really fast object storage I think directly utilizing
rados is probably the way to go, but that requires modifying the app and
it's not for everyone.
>
> (I've heard something about the rgw admin socket to check perfcounters, but it seems that this is deprecated? Because when i type ceph --admin-daemon ... it says unknown command and I cannot find it in the ceph docu. Then i wanted to bench via rest-bench, but it says "ERROR: failed to create bucket: XmlParseFailure -failed initializing benchmark", so I could not bench the speed.)
connecting with the admin daemon should still be supported.
Documentation is here:
http://ceph.com/docs/next/radosgw/troubleshooting/
If this doesn't work please let me know!
[Philipp] How do you activate an rgw admin socket? I think we have to add an entry to ceph.conf? I assume the admin socket is not the same thing as the "rgw socket path"?
Also, I've created a bug for the rest-bench issue:
http://tracker.ceph.com/issues/5302
Personally I've been using swift-bench for most of my recent rgw testing.
Mark
>
> Ceph.conf- rgw part:
>
> [client.radosgw.connect2]
> host = hcrgwko2
> rgw socket path = /tmp/connect2.sock
> log file = /var/log/ceph/connect2.log
> rgw dns name = FQDN
>
> Thank you very much.
>
>
> Regards
>
> Philipp
^ permalink raw reply [flat|nested] 8+ messages in thread
* AW: radosrgw performance problems
2013-06-12 8:22 ` AW: " Jäger, Philipp
@ 2013-06-12 10:52 ` Jäger, Philipp
2013-06-12 14:53 ` Mark Nelson
0 siblings, 1 reply; 8+ messages in thread
From: Jäger, Philipp @ 2013-06-12 10:52 UTC (permalink / raw)
To: Mark Nelson; +Cc: ceph-devel@vger.kernel.org
Hello,
I identified the problem.
When I deactivate SSL in the Apache config and connect via http, I get 40 MB/s (with SSL: 8 MB/s).
Do you have experience with SSL? Is this normal?
Thanks
Regards
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: AW: radosrgw performance problems
2013-06-12 10:52 ` Jäger, Philipp
@ 2013-06-12 14:53 ` Mark Nelson
2013-06-12 15:14 ` AW: " Jäger, Philipp
0 siblings, 1 reply; 8+ messages in thread
From: Mark Nelson @ 2013-06-12 14:53 UTC (permalink / raw)
To: "Jäger, Philipp"; +Cc: ceph-devel@vger.kernel.org
Interesting. Was Apache using excessive CPU? Do your processors and
libraries support AES-NI? Seems strange that at this level that would
be the limiting factor, but I've seen stranger things... Glad you
figured it out!
Mark
^ permalink raw reply [flat|nested] 8+ messages in thread
* AW: AW: radosrgw performance problems
2013-06-12 14:53 ` Mark Nelson
@ 2013-06-12 15:14 ` Jäger, Philipp
2013-06-12 16:42 ` Mark Nelson
0 siblings, 1 reply; 8+ messages in thread
From: Jäger, Philipp @ 2013-06-12 15:14 UTC (permalink / raw)
To: Mark Nelson; +Cc: ceph-devel@vger.kernel.org
No, not really:
30239 www-data 20 0 751m 7556 2036 S 10 0.4 0:03.67 apache2
1955 root 20 0 2048m 10m 4352 S 6 0.5 2:14.54 radosgw
Apache used 10% CPU and the load was 0.4. The vCenter performance graph also shows less than 15% usage.
We are setting up a physical server right now, because we also suspected that CPU instruction sets might be missing.
Another performance question:
As I said, we can write about 170 MB/s with rados bench:
rados bench -p test 100 write:
Bandwidth (MB/sec): 171.744.
With rbd or rgw (w/o https) we get less than 40 MB/s:
(time rados -p connect put 600mb.iso 600mb.iso
real 0m15.846s
user 0m0.640s
sys 0m0.836s)
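The timing above can be turned into a throughput figure with a short calculation. This is only a sketch: it assumes the file is exactly 600 MB, which is inferred from the file name and not confirmed in the thread.

```python
def throughput_mb_per_s(size_mb: float, minutes: int, seconds: float) -> float:
    """Average throughput for a transfer that took minutes + seconds of wall-clock time."""
    return size_mb / (minutes * 60 + seconds)


# "real 0m15.846s" from the time output; 600 MB is an assumed file size.
rate = throughput_mb_per_s(600, 0, 15.846)
bench = 171.744  # MB/s reported by rados bench above

print(f"single rados put: {rate:.1f} MB/s")
print(f"share of rados bench bandwidth: {rate / bench:.0%}")
```

Under that assumption a single rados put reaches roughly 38 MB/s, a bit under a quarter of the rados bench bandwidth.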
I think you can also close the bug! Because of https I have to specify the protocol, and then I get the message "bad protocol" (like here: http://tracker.ceph.com/issues/3968), so it is not possible to benchmark with rest-bench at the moment.
I've configured the admin socket, but I don't know how to read the perfcounters output in a meaningful way...
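One way to make the counter output easier to read is to flatten the JSON and turn the avgcount/sum pairs into averages. The sketch below assumes the admin socket answers to "perf dump" and returns JSON; the exact command name for a given release can be checked with the socket's "help" command. The socket path and counter names in the usage part are placeholders, not values taken from this cluster.

```python
import json
import subprocess


def perf_dump(socket_path: str) -> dict:
    """Ask a daemon's admin socket for its perf counters (assumes 'perf dump' is available)."""
    out = subprocess.check_output(
        ["ceph", "--admin-daemon", socket_path, "perf", "dump"]
    )
    return json.loads(out)


def flatten(counters: dict, prefix: str = "") -> dict:
    """Turn nested counter sections into 'section.counter' keys."""
    flat = {}
    for key, value in counters.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


def average(flat: dict, name: str):
    """Average of a latency-style counter stored as an avgcount/sum pair."""
    count = flat.get(f"{name}.avgcount", 0)
    total = flat.get(f"{name}.sum", 0.0)
    return total / count if count else None


if __name__ == "__main__":
    # Placeholder path; use the "admin socket" value from your own ceph.conf.
    flat = flatten(perf_dump("/var/run/ceph/radosgw.asok"))
    for name in sorted(flat):
        print(name, flat[name])
```

Taking two dumps a few seconds apart during an upload and comparing the values shows which counters move, which is usually more informative than a single snapshot.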
Thank you very much so far.
Philipp
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: AW: AW: radosrgw performance problems
2013-06-12 15:14 ` AW: " Jäger, Philipp
@ 2013-06-12 16:42 ` Mark Nelson
2013-06-13 11:04 ` AW: " Jäger, Philipp
0 siblings, 1 reply; 8+ messages in thread
From: Mark Nelson @ 2013-06-12 16:42 UTC (permalink / raw)
To: "Jäger, Philipp"; +Cc: ceph-devel@vger.kernel.org
I spoke to Yehuda (who develops RGW), and he mentioned that it may be
latency due to SSL handshake. How big are the objects you are writing?
With RBD, I can do much better than 40% of the rados throughput, but it
takes a lot of concurrency. I use fio with libaio, direct=1, 4MB
writes, and a high iodepth on multiple volumes to get there. Btw, rados
bench by default is going to keep 16 objects in flight too.
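Why concurrency matters can be sketched with a toy model: throughput is roughly the number of operations in flight times the object size divided by the per-operation latency, capped by what the cluster can sustain. The 0.1 s latency below is a made-up value chosen only for illustration, and the ceiling reuses the rados bench figure from earlier in the thread; neither is a measurement of how a single client behaves.

```python
def modeled_throughput(in_flight: int, object_mb: float, latency_s: float,
                       ceiling_mb_s: float) -> float:
    """Little's-law style estimate: ops in flight * size / latency, capped by a ceiling."""
    return min(in_flight * object_mb / latency_s, ceiling_mb_s)


if __name__ == "__main__":
    # Hypothetical numbers: 4 MB objects, 0.1 s per write, 171.744 MB/s cluster ceiling.
    for depth in (1, 2, 4, 16):
        mb_s = modeled_throughput(depth, 4.0, 0.1, 171.744)
        print(f"{depth:2d} in flight: {mb_s:.1f} MB/s")
```

In this model a single writer is limited by latency, while 16 operations in flight are limited by the cluster itself, which is why a one-stream upload and rados bench are not directly comparable.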
Mark
^ permalink raw reply [flat|nested] 8+ messages in thread
* AW: AW: AW: radosrgw performance problems
2013-06-12 16:42 ` Mark Nelson
@ 2013-06-13 11:04 ` Jäger, Philipp
0 siblings, 0 replies; 8+ messages in thread
From: Jäger, Philipp @ 2013-06-13 11:04 UTC (permalink / raw)
To: Mark Nelson; +Cc: ceph-devel@vger.kernel.org
Hello,
I upload one big ISO at about 8-9 MB/s, so not thousands of small files. The cluster is still in a test environment.
Apache config:
FastCgiExternalServer /var/www/connect2.fcgi -socket /tmp/connect2.sock
<VirtualHost *:443 >
ServerName foo.cgm.ag
ServerAlias *.foo.cgm.ag
# hcrgwko2
ServerAdmin foo
DocumentRoot /var/www
KeepAlive off
SSLEngine on
SSLCertificateFile /etc/apache2/ssl/foo.cert
SSLCertificateKeyFile /etc/apache2/ssl/foo.key
RewriteEngine On
RewriteRule ^/([a-zA-Z0-9-_.]*)([/]?.*) /connect2.fcgi?page=$1&params=$2&%{QUERY_STRING} [E=HTTP_AUTHORIZATION:%{HTTP:Authorization},L]
<IfModule mod_fastcgi.c>
<Directory /var/www>
Options +ExecCGI
AllowOverride All
SetHandler fastcgi-script
Order allow,deny
Allow from all
AuthBasicAuthoritative Off
</Directory>
</IfModule>
AllowEncodedSlashes On
ErrorLog /var/log/apache2/error.log
CustomLog /var/log/apache2/access.log combined
ServerSignature Off
</VirtualHost>
Thanks
Philipp
-----Original Message-----
From: Mark Nelson [mailto:mark.nelson@inktank.com]
Sent: Wednesday, 12 June 2013 18:43
To: Jäger, Philipp
Cc: ceph-devel@vger.kernel.org
Subject: Re: AW: AW: radosrgw performance problems
I spoke to Yehuda (who develops RGW), and he mentioned that it may be latency due to the SSL handshake. How big are the objects you are writing?
With RBD, I can do much better than 40% of the rados throughput, but it takes a lot of concurrency. I use fio with libaio, direct=1, 4MB writes, and a high iodepth on multiple volumes to get there. By the way, rados bench keeps 16 objects in flight by default, too.
Mark
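To put rough numbers on the concurrency point, here is a small sketch that uses only figures quoted in this thread: the `rados put` timing (15.846 s real time) and the `rados bench` result (171.744 MB/s). It assumes the file is 600 MB, which is inferred from its name and not stated explicitly, and that the rados bench run used the default of 16 objects in flight.

```python
def mb_per_s(size_mb, seconds):
    """Average throughput of a single transfer."""
    return size_mb / seconds


PUT_SIZE_MB = 600.0       # assumption: inferred from the file name 600mb.iso
PUT_SECONDS = 15.846      # 'real' time of the quoted rados put
BENCH_MB_PER_S = 171.744  # bandwidth reported by the quoted rados bench run
BENCH_IN_FLIGHT = 16      # assumption: default rados bench concurrency

single = mb_per_s(PUT_SIZE_MB, PUT_SECONDS)
per_slot = BENCH_MB_PER_S / BENCH_IN_FLIGHT

print(f"single rados put:            {single:.1f} MB/s")
print(f"rados bench per in-flight op: {per_slot:.1f} MB/s")
print(f"single put / bench aggregate: {single / BENCH_MB_PER_S:.0%}")
```

Read this way, and only under the assumptions above, the single-stream `rados put` is not slow relative to the benchmark: the higher aggregate number comes from keeping many writes in flight at once.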
On 06/12/2013 10:14 AM, Jäger, Philipp wrote:
> No, not really:
>
> 30239 www-data 20 0 751m 7556 2036 S 10 0.4 0:03.67 apache2
> 1955 root 20 0 2048m 10m 4352 S 6 0.5 2:14.54 radosgw
>
> Apache used 10% CPU and the load was 0.4. The vCenter performance graph also shows less than 15% usage.
> We are setting up a physical server right now, because we also wondered whether the CPU is missing instruction sets.
>
> Another perf question:
>
> As I said, we can write about 170 MB/s with rados bench:
> rados bench -p test 100 write:
> Bandwidth (MB/sec): 171.744.
>
> With rbd or rgw (without HTTPS) we get less than 40 MB/s:
> (time rados -p connect put 600mb.iso 600mb.iso
> real 0m15.846s
> user 0m0.640s
> sys 0m0.836s)
>
> I think you can also close the bug! Because of HTTPS I have to specify the protocol, and then I get the message "bad protocol" (as in http://tracker.ceph.com/issues/3968), so it is not possible to bench with rest-bench at the moment.
>
> I've configured the admin socket, but I don't know how to "read" the output of the perfcounters in a meaningful way...
>
>
> Thank you very much so far.
>
> Philipp
>
> -----Original Message-----
> From: Mark Nelson [mailto:mark.nelson@inktank.com]
> Sent: Wednesday, 12 June 2013 16:53
> To: Jäger, Philipp
> Cc: ceph-devel@vger.kernel.org
> Subject: Re: AW: radosrgw performance problems
>
> Interesting. Was Apache using excessive CPU? Do your processors and libraries support AES-NI? Seems strange that at this level that would be the limiting factor, but I've seen stranger things... Glad you figured it out!
>
> Mark
>
> On 06/12/2013 05:52 AM, Jäger, Philipp wrote:
>> Hello,
>>
>> I have identified the problem.
>>
>> When I deactivate SSL in the Apache config and connect via HTTP, I get
>> the 40 MB/s (with SSL: 8 MB/s). Do you have experience with SSL? Is this normal?
>>
>> Thanks
>>
>> Regards
>>
>>
>>
>> -----Original Message-----
>> From: Jäger, Philipp
>> Sent: Wednesday, 12 June 2013 10:22
>> To: 'Mark Nelson'
>> Cc: ceph-devel@vger.kernel.org
>> Subject: AW: radosrgw performance problems
>>
>> Hello,
>>
>> I've added my answers below.
>>
>> Thanks
>>
>> Regards
>>
>> Philipp
>>
>> -----Original Message-----
>> From: Mark Nelson [mailto:mark.nelson@inktank.com]
>> Sent: Tuesday, 11 June 2013 16:38
>> To: Jäger, Philipp
>> Cc: ceph-devel@vger.kernel.org
>> Subject: Re: radosrgw performance problems
>>
>> On 06/11/2013 08:27 AM, Jäger, Philipp wrote:
>>> Hello,
>>>
>>> we have a performance problem with radosrgw.
>>> Only 8-9 MB/s per upload, also tested with s3cmd on the rgw itself.
>>> (2 uploads at the same time: combined 15 MB/s; 3 uploads at the same
>>> time: combined 21 MB/s.) But when putting a file via rados rbd, we get 40 MB/s upload, so no network or other problem in general.
>>
>> One thing to check is to make sure that the rgw pool you are writing to has enough placement groups for your cluster. The default may be extremely low.
>>
>> [Philipp] We don't use the default pool; we created a new pool with 1500 PGs
>> (30 OSDs) and see the same problem.
>>
>>>
>>> Same speed with the inktank apache/fastcgi and the original one.
>>> Hardware also fast enough. We use Ubuntu 12.04 lts, ceph 0.61.2
>>>
>>> So do you have any idea why the rgw is so slow? How can we identify where the problem is?
>>
>> RBD is pretty streamlined so you can get good performance with it. On my test setup I'm seeing 80-90% of the performance of raw rados object writes/reads (and in some cases much faster with RBD cache enabled!).
>> RGW, Apache, fastcgi, and simply the requirements of supporting the S3 protocol itself add a lot of overhead. MD5 calculations by themselves start chewing up a ton of CPU once you try to support high throughput scenarios and there is a non-trivial amount of extra latency added as well. You may be able to improve things with some tweaks, but I wouldn't be surprised if RBD is always going to be faster to an extent.
>>
>> [Philipp] We are talking about 9 MB/s per rgw, which is less than 1/4 of rbd (rados put: 40 MB/s); with rados bench we actually get: Bandwidth (MB/sec): 171.744.
>> So I think we are not talking about tweaking, but rather about a general problem?
>>
>>
>> For folks who want really fast object storage I think directly utilizing rados is probably the way to go, but that requires modifying the app and it's not for everyone.
>>
>>>
>>> (I've heard something about the rgw admin socket to check
>>> perfcounters, but it seems that this is deprecated? Because when I
>>> type ceph --admin-daemon ... it says unknown command and I cannot
>>> find it in the ceph documentation. Then I wanted to bench via rest-bench, but
>>> it says "ERROR: failed to create bucket: XmlParseFailure -failed
>>> initializing benchmark", so I could not bench the speed.)
>>
>> connecting with the admin daemon should still be supported.
>> Documentation is here:
>>
>> http://ceph.com/docs/next/radosgw/troubleshooting/
>>
>> If this doesn't work please let me know!
>>
>> [Philipp] How do you activate an rgw admin socket? I think we have to add an entry to ceph.conf? I assume the admin socket is not the same as the "rgw socket path"?
>>
>>
>> Also, I've created a bug for the rest-bench issue:
>>
>> http://tracker.ceph.com/issues/5302
>>
>> Personally I've been using swift-bench for most of my recent rgw testing.
>>
>> Mark
>>
>>>
>>> Ceph.conf- rgw part:
>>>
>>> [client.radosgw.connect2]
>>> host = hcrgwko2
>>> rgw socket path = /tmp/connect2.sock
>>> log file = /var/log/ceph/connect2.log
>>> rgw dns name = FQDN
>>>
>>> Thank you very much.
>>>
>>>
>>> Regards
>>>
>>> Philipp
>>>
>>
>
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2013-06-13 11:04 UTC | newest]
Thread overview: 8+ messages
2013-06-11 13:27 radosrgw performance problems Jäger, Philipp
2013-06-11 14:38 ` Mark Nelson
2013-06-12 8:22 ` AW: " Jäger, Philipp
2013-06-12 10:52 ` Jäger, Philipp
2013-06-12 14:53 ` Mark Nelson
2013-06-12 15:14 ` AW: " Jäger, Philipp
2013-06-12 16:42 ` Mark Nelson
2013-06-13 11:04 ` AW: " Jäger, Philipp