* radosrgw performance problems
From: Jäger, Philipp @ 2013-06-11 13:27 UTC
To: ceph-devel@vger.kernel.org

Hello,

we have a performance problem with radosgw. We get only 8-9 MB/s per upload, also when testing with s3cmd on the rgw host itself.
(2 uploads at the same time: 15 MB/s combined; 3 uploads at the same time: 21 MB/s combined.)
But when putting a file via rados/rbd, we get 40 MB/s upload, so there is no network problem or other general problem.

The speed is the same with the Inktank apache/fastcgi and the original one. The hardware is also fast enough. We use Ubuntu 12.04 LTS and Ceph 0.61.2.

So do you have any idea why the rgw is so slow? How can we identify where the problem is?

(I've heard something about the rgw admin socket for checking perfcounters, but it seems that this is deprecated? When I type ceph --admin-daemon ... it says unknown command, and I cannot find it in the Ceph documentation. Then I wanted to benchmark with rest-bench, but it says "ERROR: failed to create bucket: XmlParseFailure -failed initializing benchmark", so I could not measure the speed.)

ceph.conf, rgw part:

[client.radosgw.connect2]
host = hcrgwko2
rgw socket path = /tmp/connect2.sock
log file = /var/log/ceph/connect2.log
rgw dns name = FQDN

Thank you very much.

Regards
Philipp
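Note on the figures in this message: the combined throughput grows almost linearly with the number of parallel uploads, which points toward a per-connection limit rather than a saturated cluster. The following is only a sketch of that arithmetic. The 8.5 MB/s value is an assumed midpoint of the reported 8-9 MB/s, and the function names are made up for illustration.

```python
# Combined upload throughput in MB/s as reported in the message above,
# keyed by the number of parallel uploads. 8.5 is an ASSUMED midpoint of
# the reported "8-9 MB/s" for a single upload.
combined_mb_s = {1: 8.5, 2: 15.0, 3: 21.0}


def per_stream(combined):
    """Return the average throughput per upload for each concurrency level."""
    return {n: total / n for n, total in combined.items()}


def scaling_efficiency(combined):
    """Return combined throughput relative to perfect linear scaling of one upload."""
    single = combined[1]
    return {n: total / (n * single) for n, total in combined.items()}


rates = per_stream(combined_mb_s)
efficiency = scaling_efficiency(combined_mb_s)
for n in sorted(combined_mb_s):
    print(f"{n} upload(s): {rates[n]:.2f} MB/s each, {efficiency[n]:.0%} of linear scaling")
```

If each additional upload adds nearly a full single-stream share, the backend is probably not the limit and the cost sits somewhere in the per-request path.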
* Re: radosrgw performance problems
From: Mark Nelson @ 2013-06-11 14:38 UTC
To: "Jäger, Philipp"
Cc: ceph-devel@vger.kernel.org

On 06/11/2013 08:27 AM, Jäger, Philipp wrote:
> Hello,
>
> we have a performance problem with radosrgw.
> Only 8mb/s-9 per upload, also tested with s3cmd on the rgw itself.
> (2 uploads at the same time: combined 15mb/s, 3 uploads at the same time: comb. 21mb/s)
> But when putting a file via rados rbd , we get 40mb/s upload, so no network or other problem in general.

One thing to check is to make sure that the rgw pool you are writing to has enough placement groups for your cluster. The default may be extremely low.

> Same speed with the inktank apache/fastcgi and the original one. Hardware also fast enough. We use Ubuntu 12.04 lts, ceph 0.61.2
>
> So have you any idea why the rgw is so slow? How can we identify where the problem is?

RBD is pretty streamlined so you can get good performance with it. On my test setup I'm seeing 80-90% of the performance of raw rados object writes/reads (and in some cases much faster with RBD cache enabled!). RGW, Apache, fastcgi, and simply the requirements of supporting the S3 protocol itself add a lot of overhead. MD5 calculations by themselves start chewing up a ton of CPU once you try to support high throughput scenarios and there is a non-trivial amount of extra latency added as well. You may be able to improve things with some tweaks, but I wouldn't be surprised if RBD is always going to be faster to an extent.

For folks who want really fast object storage I think directly utilizing rados is probably the way to go, but that requires modifying the app and it's not for everyone.

> (I've heard something about the rgw admin socket to check perfcounters, but it seems that this is deprecated? Because when i type ceph --admin-daemon ... it says unknown command and I cannot find it in the ceph docu. Then i wanted to bench via rest-bench, but it says "ERROR: failed to create bucket: XmlParseFailure -failed initializing benchmark", so I could not bench the speed.)

Connecting with the admin daemon should still be supported. Documentation is here:

http://ceph.com/docs/next/radosgw/troubleshooting/

If this doesn't work please let me know!

Also, I've created a bug for the rest-bench issue:

http://tracker.ceph.com/issues/5302

Personally I've been using swift-bench for most of my recent rgw testing.

Mark

> Ceph.conf- rgw part:
>
> [client.radosgw.connect2]
> host = hcrgwko2
> rgw socket path = /tmp/connect2.sock
> log file = /var/log/ceph/connect2.log
> rgw dns name = FQDN
>
> Thank you very much.
>
> Regards
>
> Philipp

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
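Note on the placement-group check suggested in this message: a commonly cited Ceph rule of thumb is roughly 100 PGs per OSD, divided by the replica count and rounded up to a power of two. The sketch below only illustrates that rule; it is not an official sizing formula. The 30 OSDs figure appears later in this thread, while the replica counts and the function name are assumptions.

```python
def suggested_pg_count(num_osds, replicas, pgs_per_osd=100):
    """Rule-of-thumb PG count: (OSDs * pgs_per_osd) / replicas,
    rounded up to the next power of two."""
    if num_osds <= 0 or replicas <= 0:
        raise ValueError("num_osds and replicas must be positive")
    target = (num_osds * pgs_per_osd) / replicas
    power = 1
    while power < target:
        power *= 2
    return power


# 30 OSDs is the figure from the thread; the replica counts below are
# ASSUMPTIONS, because the pool's replication level is not stated.
for replicas in (2, 3):
    print(f"replicas={replicas}: about {suggested_pg_count(30, replicas)} PGs")
```

A pool created with a handful of PGs on a 30-OSD cluster would be far below either result, which is the situation this check is meant to rule out.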
* AW: radosrgw performance problems
From: Jäger, Philipp @ 2013-06-12 8:22 UTC
To: Mark Nelson
Cc: ceph-devel@vger.kernel.org

Hello,

I've added my answers below.

Thanks

Regards
Philipp

-----Original Message-----
From: Mark Nelson [mailto:mark.nelson@inktank.com]
Sent: Tuesday, 11 June 2013 16:38
To: Jäger, Philipp
Cc: ceph-devel@vger.kernel.org
Subject: Re: radosrgw performance problems

> One thing to check is to make sure that the rgw pool you are writing to has enough placement groups for your cluster. The default may be extremely low.

[Philipp] We don't use the default pool. We created a new pool with 1500 PGs and see the same problem. (30 OSDs)

> RBD is pretty streamlined so you can get good performance with it. [...] You may be able to improve things with some tweaks, but I wouldn't be surprised if RBD is always going to be faster to an extent.

[Philipp] We are talking about 9 MB/s per rgw, which is less than 1/4 of rbd (rados put: 40 MB/s). With rados bench we actually get: Bandwidth (MB/sec): 171.744.
So I think we are not talking about tweaking, but rather about a general problem?

> connecting with the admin daemon should still be supported. Documentation is here:
>
> http://ceph.com/docs/next/radosgw/troubleshooting/
>
> If this doesn't work please let me know!

[Philipp] How can you activate an rgw admin socket? I think we have to add an entry in ceph.conf? The admin socket is not the "rgw socket path", I think?
* AW: radosrgw performance problems
From: Jäger, Philipp @ 2013-06-12 10:52 UTC
To: Mark Nelson
Cc: ceph-devel@vger.kernel.org

Hello,

I've identified the problem.

When I deactivate SSL in the Apache config and connect via http, I get the 40 MB/s (with SSL: 8 MB/s).
Do you have experience with SSL? Is this normal?

Thanks

Regards
* Re: AW: radosrgw performance problems
From: Mark Nelson @ 2013-06-12 14:53 UTC
To: "Jäger, Philipp"
Cc: ceph-devel@vger.kernel.org

Interesting. Was Apache using excessive CPU? Do your processors and libraries support AES-NI? Seems strange that at this level that would be the limiting factor, but I've seen stranger things... Glad you figured it out!

Mark

On 06/12/2013 05:52 AM, Jäger, Philipp wrote:
> Hello,
>
> identified the problem.
>
> When I deactivate SSL in Apache Config, and connect via http, I get the 40MB/s. (with ssl 8mb/s)
> Have you experience with SSL? Is this normal?
>
> Thanks
>
> Regards
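Note on the AES-NI question in this message: on x86 Linux, a CPU that advertises AES-NI lists the `aes` flag in /proc/cpuinfo. The sketch below checks for that flag. It says nothing about whether the installed OpenSSL build actually uses the instructions, and inside a virtual machine the hypervisor may hide the flag. The function names are made up for illustration.

```python
def cpu_flags(cpuinfo_text):
    """Collect the CPU feature flags listed in /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def has_aes_ni(cpuinfo_text):
    """Return True if the 'aes' flag (AES-NI on x86 Linux) is advertised."""
    return "aes" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            print("AES-NI advertised:", has_aes_ni(handle.read()))
    except OSError as err:
        # Not a Linux host, or /proc is not available.
        print("could not read /proc/cpuinfo:", err)
```

Running this both on the virtualized gateway and on a physical host would show whether the flag is being masked by the virtualization layer.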
* AW: AW: radosrgw performance problems
From: Jäger, Philipp @ 2013-06-12 15:14 UTC
To: Mark Nelson
Cc: ceph-devel@vger.kernel.org

No, not really:

30239 www-data 20 0 751m 7556 2036 S 10 0.4 0:03.67 apache2
1955 root 20 0 2048m 10m 4352 S 6 0.5 2:14.54 radosgw

Apache used 10% CPU and the load was 0.4. The vCenter performance graph also shows less than 15% usage.
We are setting up a physical server right now, because we also suspected that CPU instruction sets are missing.

Another performance question:

As I said, we can write about 170 MB/s with rados bench:
rados bench -p test 100 write:
Bandwidth (MB/sec): 171.744.

With rbd or rgw (without https) we get less than 40 MB/s:
(time rados -p connect put 600mb.iso 600mb.iso
real 0m15.846s
user 0m0.640s
sys 0m0.836s)

I think you can also close the bug! Because of https I have to type the protocol, and then I get the message "bad protocol" (like here: http://tracker.ceph.com/issues/3968 ), so it is not possible to benchmark with rest-bench at the moment.

I've configured the admin socket, but I don't know how to "read" the output of the perfcounters in a sensible way...

Thank you very much so far.

Philipp

-----Original Message-----
From: Mark Nelson [mailto:mark.nelson@inktank.com]
Sent: Wednesday, 12 June 2013 16:53
To: Jäger, Philipp
Cc: ceph-devel@vger.kernel.org
Subject: Re: AW: radosrgw performance problems

> Interesting. Was Apache using excessive CPU? Do your processors and libraries support AES-NI? Seems strange that at this level that would be the limiting factor, but I've seen stranger things... Glad you figured it out!
>
> Mark
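Note on reading perfcounters: the following is a rough sketch of one way to make a perf-counter dump easier to read. It assumes the admin socket returns JSON grouped by section, with averaged counters given as `avgcount`/`sum` pairs, as Ceph perf counters commonly are. The sample data, the section name, and the counter names are all hypothetical and are not real radosgw output.

```python
import json


def summarize_perf_dump(dump_text):
    """Flatten a perf-counter JSON dump into {"section.counter": value}.

    Counters given as {"avgcount": n, "sum": s} are reduced to their
    average s / n (None when n is 0); all other values pass through.
    """
    summary = {}
    for section, counters in json.loads(dump_text).items():
        if not isinstance(counters, dict):
            continue  # skip anything that is not a section of counters
        for name, value in counters.items():
            key = f"{section}.{name}"
            if isinstance(value, dict) and "avgcount" in value and "sum" in value:
                count = value["avgcount"]
                summary[key] = value["sum"] / count if count else None
            else:
                summary[key] = value
    return summary


# HYPOTHETICAL sample: section and counter names are invented for illustration.
sample = json.dumps({
    "example_section": {
        "req": 120,
        "put_latency": {"avgcount": 40, "sum": 18.0},
        "idle_latency": {"avgcount": 0, "sum": 0.0},
    }
})

for key, value in sorted(summarize_perf_dump(sample).items()):
    print(key, value)
```

Taking two dumps a known interval apart and comparing the averaged latencies is usually more informative than reading a single dump.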
* Re: AW: AW: radosrgw performance problems
From: Mark Nelson @ 2013-06-12 16:42 UTC
To: "Jäger, Philipp"
Cc: ceph-devel@vger.kernel.org

I spoke to Yehuda (who develops RGW), and he mentioned that it may be latency due to SSL handshake. How big are the objects you are writing?

With RBD, I can do much better than 40% of the rados throughput, but it takes a lot of concurrency. I use fio with libaio, direct=1, 4MB writes, and a high iodepth on multiple volumes to get there. Btw, rados bench by default is going to keep 16 objects in flight too.

Mark

On 06/12/2013 10:14 AM, Jäger, Philipp wrote:
> No, not really:
>
> 30239 www-data 20 0 751m 7556 2036 S 10 0.4 0:03.67 apache2
> 1955 root 20 0 2048m 10m 4352 S 6 0.5 2:14.54 radosgw
>
> 10% cpu usage apache, load was 0.4. Also less than 15% usage via vcenter performance graph.
> We are setting up a physical server right now at this moment, because we thought also about missing instruction sets of the cpu.
>
> Another perf question:
>
> As I said we can write about 170mb/s with the rados bench:
> rados bench -p test 100 write:
> Bandwidth (MB/sec): 171.744.
>
> With rbd or rgw (w/o https) we get less than 40mb/s:
> (time rados -p connect put 600mb.iso 600mb.iso
> real 0m15.846s
> user 0m0.640s
> sys 0m0.836s)
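Note on comparing these numbers: rados bench keeps 16 objects in flight, so its bandwidth is not directly comparable to a single `rados put`. The sketch below applies Little's law (average time per object = data in flight / throughput) to the figures quoted in this thread. The 4 MB object size is an assumption about the benchmark configuration, and the 600 MB file size is assumed from the file name.

```python
def throughput_mb_s(size_mb, seconds):
    """Average throughput of one transfer in MB/s."""
    return size_mb / seconds


def implied_latency_s(in_flight, object_mb, bandwidth_mb_s):
    """Little's law: average time per object = data in flight / throughput."""
    return in_flight * object_mb / bandwidth_mb_s


# `time rados -p connect put 600mb.iso 600mb.iso` took 15.846 s of wall time.
# The 600 MB size is ASSUMED from the file name.
single_put = throughput_mb_s(600, 15.846)

# 16 objects in flight is from the thread; 4 MB per object is an ASSUMPTION.
bench_latency = implied_latency_s(16, 4, 171.744)

# What a strictly serial writer would reach at that per-object latency.
serial_estimate = 4 / bench_latency

print(f"single rados put:          {single_put:.1f} MB/s")
print(f"implied per-object time:   {bench_latency:.3f} s")
print(f"one object at a time:      {serial_estimate:.1f} MB/s")
```

Under these assumptions, a large share of the gap between the benchmark and a single upload comes from concurrency, not from a slower path.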
* AW: AW: AW: radosrgw performance problems
From: Jäger, Philipp @ 2013-06-13 11:04 UTC
To: Mark Nelson
Cc: ceph-devel@vger.kernel.org

Hello,

I upload one big ISO at about 8-9 MB/s, so not thousands of small files. The cluster is still in a test environment.

Apache config:

FastCgiExternalServer /var/www/connect2.fcgi -socket /tmp/connect2.sock

<VirtualHost *:443>
    ServerName foo.cgm.ag
    ServerAlias *.foo.cgm.ag
    # hcrgwko2
    ServerAdmin foo
    DocumentRoot /var/www
    KeepAlive off
    SSLEngine on
    SSLCertificateFile /etc/apache2/ssl/foo.cert
    SSLCertificateKeyFile /etc/apache2/ssl/foo.key
    RewriteEngine On
    RewriteRule ^/([a-zA-Z0-9-_.]*)([/]?.*) /connect2.fcgi?page=$1&params=$2&%{QUERY_STRING} [E=HTTP_AUTHORIZATION:%{HTTP:Authorization},L]
    <IfModule mod_fastcgi.c>
        <Directory /var/www>
            Options +ExecCGI
            AllowOverride All
            SetHandler fastcgi-script
            Order allow,deny
            Allow from all
            AuthBasicAuthoritative Off
        </Directory>
    </IfModule>
    AllowEncodedSlashes On
    ErrorLog /var/log/apache2/error.log
    CustomLog /var/log/apache2/access.log combined
    ServerSignature Off
</VirtualHost>

Thanks

Philipp

-----Original Message-----
From: Mark Nelson [mailto:mark.nelson@inktank.com]
Sent: Wednesday, 12 June 2013 18:43
To: Jäger, Philipp
Cc: ceph-devel@vger.kernel.org
Subject: Re: AW: AW: radosrgw performance problems

> I spoke to Yehuda (who develops RGW), and he mentioned that it may be latency due to SSL handshake. How big are the objects you are writing?
>
> with RBD, I can do much better than 40% of the rados throughput, but it takes a lot of concurrency. I use fio with libaio, direct=1, 4MB writes, and a high iodepth on multiple volumes to get there. Btw, rados bench by default is going to keep 16 objects in flight too.
Mark

On 06/12/2013 10:14 AM, Jäger, Philipp wrote:
> No, not really:
>
> 30239 www-data 20 0 751m 7556 2036 S 10 0.4 0:03.67 apache2
> 1955 root 20 0 2048m 10m 4352 S 6 0.5 2:14.54 radosgw
>
> 10% cpu usage apache, load was 0.4. Also less than 15% usage via vcenter performance graph.
> We are setting up a physical server right now at this moment, because we thought also about missing instruction sets of the cpu.
>
> Another perf question:
>
> As I said we can write about 170mb/s with the rados bench:
> rados bench -p test 100 write:
> Bandwidth (MB/sec): 171.744.
>
> With rbd or rgw (w/o https) we get less than 40mb/s:
> (time rados -p connect put 600mb.iso 600mb.iso
> real 0m15.846s
> user 0m0.640s
> sys 0m0.836s)
>
> I think you can also close the bug! Because of https I have to type the protocol, and then I get the message "bad protocol" (like here: http://tracker.ceph.com/issues/3968 ), so not possible to bench at the moment with rest-bench.
>
> I've configured the admin socket, but I don't know how to "read" the output of perfcounters in a sensible way...
>
> Thank you very much so far.
>
> Philipp
>
> -----Original Message-----
> From: Mark Nelson [mailto:mark.nelson@inktank.com]
> Sent: Wednesday, 12 June 2013 16:53
> To: Jäger, Philipp
> Cc: ceph-devel@vger.kernel.org
> Subject: Re: AW: radosrgw performance problems
>
> Interesting. Was Apache using excessive CPU? Do your processors and libraries support AES-NI? Seems strange that at this level that would be the limiting factor, but I've seen stranger things... Glad you figured it out!
>
> Mark
>
> On 06/12/2013 05:52 AM, Jäger, Philipp wrote:
>> Hello,
>>
>> identified the problem.
>>
>> When I deactivate SSL in Apache Config, and connect via http, I get
>> the 40MB/s. (with ssl 8mb/s) Have you experience with SSL? Is this normal?
>>
>> Thanks
>>
>> Regards
>>
>> -----Original Message-----
>> From: Jäger, Philipp
>> Sent: Wednesday, 12 June 2013 10:22
>> To: 'Mark Nelson'
>> Cc: ceph-devel@vger.kernel.org
>> Subject: AW: radosrgw performance problems
>>
>> Hello,
>>
>> i've added my answers below.
>>
>> Thanks
>>
>> Regards
>>
>> Philipp
>>
>> -----Original Message-----
>> From: Mark Nelson [mailto:mark.nelson@inktank.com]
>> Sent: Tuesday, 11 June 2013 16:38
>> To: Jäger, Philipp
>> Cc: ceph-devel@vger.kernel.org
>> Subject: Re: radosrgw performance problems
>>
>> On 06/11/2013 08:27 AM, Jäger, Philipp wrote:
>>> Hello,
>>>
>>> we have a performance problem with radosrgw.
>>> Only 8mb/s-9 per upload, also tested with s3cmd on the rgw itself.
>>> (2 uploads at the same time: combined 15mb/s, 3 uploads at the same
>>> time: comb. 21mb/s) But when putting a file via rados rbd , we get 40mb/s upload, so no network or other problem in general.
>>
>> One thing to check is to make sure that the rgw pool you are writing to has enough placement groups for your cluster. The default may be extremely low.
>>
>> [Philipp] We don't use standard pool, new pool with 1500pg, same
>> problem. (30 osds)
>>
>>>
>>> Same speed with the inktank apache/fastcgi and the original one.
>>> Hardware also fast enough. We use Ubuntu 12.04 lts, ceph 0.61.2
>>>
>>> So have you any idea why the rgw is so slow? How can we identify where the problem is?
>>
>> RBD is pretty streamlined so you can get good performance with it. On my test setup I'm seeing 80-90% of the performance of raw rados object writes/reads (and in some cases much faster with RBD cache enabled!).
>> RGW, Apache, fastcgi, and simply the requirements of supporting the S3 protocol itself add a lot of overhead. MD5 calculations by themselves start chewing up a ton of CPU once you try to support high throughput scenarios and there is a non-trivial amount of extra latency added as well. You may be able to improve things with some tweaks, but I wouldn't be surprised if RBD is always going to be faster to an extent.
>>
>> [Philipp] We are talking about 9mb/s per rgw, which is less than 1/4 of rbd (rados put: 40mb/s), with the rados bench we get actually: Bandwidth (MB/sec): 171.744.
>> So I think we are not talking about tweaking, rather a general problem?
>>
>> For folks who want really fast object storage I think directly utilizing rados is probably the way to go, but that requires modifying the app and it's not for everyone.
>>
>>>
>>> (I've heard something about the rgw admin socket to check
>>> perfcounters, but it seems that this is deprecated? Because when i
>>> type ceph --admin-daemon ... it says unknown command and I cannot
>>> find it in the ceph docu. Then i wanted to bench via rest-bench, but
>>> it says "ERROR: failed to create bucket: XmlParseFailure -failed
>>> initializing benchmark", so I could not bench the speed.)
>>
>> connecting with the admin daemon should still be supported.
>> Documentation is here:
>>
>> http://ceph.com/docs/next/radosgw/troubleshooting/
>>
>> If this doesn't work please let me know!
>>
>> [Philipp] How can you activate a rgw admin socket? I think we have to add an entry in the ceph.conf? The admin socket is not the "rgw socket path" I think?
>>
>> Also, I've created a bug for the rest-bench issue:
>>
>> http://tracker.ceph.com/issues/5302
>>
>> Personally I've been using swift-bench for most of my recent rgw testing.
>>
>> Mark
>>
>>>
>>> Ceph.conf- rgw part:
>>>
>>> [client.radosgw.connect2]
>>> host = hcrgwko2
>>> rgw socket path = /tmp/connect2.sock
>>> log file = /var/log/ceph/connect2.log
>>> rgw dns name = FQDN
>>>
>>> Thank you very much.
>>>
>>> Regards
>>>
>>> Philipp
>>>
>>
>

^ permalink raw reply [flat|nested] 8+ messages in thread
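On the open question of how to read the perfcounter output in a useful way: the admin socket dump is JSON, and latency-style counters are typically reported as a pair of `avgcount` and `sum`, so the mean is `sum / avgcount`. The Python sketch below walks such a dump and returns those means. The sample document, its section name and its counter names are made up for illustration and are not taken from a real radosgw. The command used to obtain the dump (for example `perf dump` on the admin socket) is an assumption here and should be checked against the installed Ceph release.

```python
import json
from typing import Dict

# Hypothetical sample in the general shape of a perf counter dump.
# Section and counter names are invented for this example.
SAMPLE_DUMP = """
{
  "example_section": {
    "put_requests": 120,
    "put_latency": {"avgcount": 120, "sum": 30.0},
    "get_latency": {"avgcount": 0, "sum": 0.0}
  }
}
"""


def mean_latencies(dump: dict) -> Dict[str, float]:
    """Return sum/avgcount for every counter that carries both fields.

    Counters with avgcount == 0 are skipped, since no sample was recorded.
    """
    means: Dict[str, float] = {}
    for section, counters in dump.items():
        if not isinstance(counters, dict):
            continue
        for name, value in counters.items():
            if isinstance(value, dict) and "avgcount" in value and "sum" in value:
                count = value["avgcount"]
                if count > 0:
                    means[f"{section}.{name}"] = value["sum"] / count
    return means


if __name__ == "__main__":
    for counter, mean in sorted(mean_latencies(json.loads(SAMPLE_DUMP)).items()):
        print(f"{counter}: {mean:.4f}")
```

To use it on real data, the sample string could be swapped for `json.load(sys.stdin)` with the dump piped in. Comparing the means from a run with SSL against a run without it would show whether the extra time is spent inside radosgw or in front of it.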
Thread overview: 8+ messages

2013-06-11 13:27 radosrgw performance problems Jäger, Philipp
2013-06-11 14:38 ` Mark Nelson
2013-06-12  8:22   ` AW: " Jäger, Philipp
2013-06-12 10:52     ` Jäger, Philipp
2013-06-12 14:53       ` Mark Nelson
2013-06-12 15:14         ` AW: " Jäger, Philipp
2013-06-12 16:42           ` Mark Nelson
2013-06-13 11:04             ` AW: " Jäger, Philipp