From mboxrd@z Thu Jan 1 00:00:00 1970
From: Mark Nelson
Subject: Re: poor performance
Date: Sun, 04 Nov 2012 07:52:12 -0600
Message-ID: <5096730C.5050507@inktank.com>
Sender: ceph-devel-owner@vger.kernel.org
To: Aleksey Samarin
Cc: Gregory Farnum, "ceph-devel@vger.kernel.org", Mike Ryan

On 11/04/2012 07:18 AM, Aleksey Samarin wrote:
> Well, I created a ceph cluster with 2 osd (1 osd per node), 2 mon, 2 mds.
> Here is what I did:
> ceph osd pool create bench
> ceph osd tell \* bench
> rados -p bench bench 30 write --no-cleanup
> output:
>
> Maintaining 16 concurrent writes of 4194304 bytes for at least 30 seconds.
> Object prefix: benchmark_data_host01_11635
>   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
>     0       0         0         0         0         0         -         0
>     1      16        16         0         0         0         -         0
>     2      16        37        21   41.9911        42  0.139005   1.08941
>     3      16        53        37   49.3243        64  0.754114   1.09392
>     4      16        75        59   58.9893        88  0.284647  0.914221
>     5      16        89        73   58.3896        56  0.072228  0.881008
>     6      16        95        79   52.6575        24   1.56959  0.961477
>     7      16       111        95   54.2764        64  0.046105   1.08791
>     8      16       128       112   55.9906        68  0.035714   1.04594
>     9      16       150       134   59.5457        88  0.046298   1.04415
>    10      16       166       150   59.9901        64  0.048635  0.986384
>    11      16       176       160   58.1723        40  0.727784  0.988408
>    12      16       206       190   63.3231       120   0.28869  0.946624
>    13      16       225       209   64.2976        76   1.34472  0.919464
>    14      16       263       247   70.5605       152  0.070926   0.90046
>    15      16       295       279   74.3887       128  0.041517  0.830466
>    16      16       315       299   74.7388        80  0.296037  0.841527
>    17      16       333       317   74.5772        72  0.286097  0.849558
>    18      16       340       324   71.9891        28  0.295084   0.83922
>    19      16       343       327   68.8317        12   1.46948  0.845797
> 2012-11-04 17:14:52.090941min lat: 0.035714 max lat: 2.64841 avg lat: 0.861539
>   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
>    20      16       378       362    72.389       140  0.566232  0.861539
>    21      16       400       384   73.1313        88  0.038835  0.857785
>    22      16       404       388   70.5344        16  0.801216  0.857002
>    23      16       413       397   69.0327        36  0.062256   0.86376
>    24      16       428       412   68.6543        60  0.042583   0.89389
>    25      16       450       434   69.4277        88  0.383877  0.905833
>    26      16       472       456   70.1415        88  0.269878  0.898023
>    27      16       472       456   67.5437         0         -  0.898023
>    28      16       512       496   70.8448        80  0.056798  0.891163
>    29      16       530       514   70.8843        72   1.20653  0.898112
>    30      16       542       526   70.1212        48  0.744383  0.890733
> Total time run:         30.174151
> Total writes made:      543
> Write size:             4194304
> Bandwidth (MB/sec):     71.982
>
> Stddev Bandwidth:       38.318
> Max bandwidth (MB/sec): 152
> Min bandwidth (MB/sec): 0
> Average Latency:        0.889026
> Stddev Latency:         0.677425
> Max latency:            2.94467
> Min latency:            0.035714
>

Much better for 1 disk per node! I suspect that lack of syncfs is
hurting you, or perhaps some other issue with writes to lots of disks at
the same time.

>
> 2012/11/4 Aleksey Samarin:
>> Ok!
>> Well, I'll run these tests and report the results.
>>
>> btw,
>> the disks are all the same; can some be faster than others?
>>
>> 2012/11/4 Gregory Farnum:
>>> That's only nine -- where are the other three? If you have three slow
>>> disks that could definitely cause the troubles you're seeing.
>>>
>>> Also, what Mark said about sync versus syncfs.
>>>
>>> On Sun, Nov 4, 2012 at 1:26 PM, Aleksey Samarin wrote:
>>>> It's ok!
>>>>
>>>> Output:
>>>>
>>>> 2012-11-04 16:19:23.195891 osd.0 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 11.441035 sec at 91650 KB/sec
>>>> 2012-11-04 16:19:24.981631 osd.1 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 13.225048 sec at 79287 KB/sec
>>>> 2012-11-04 16:19:25.672896 osd.2 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 13.917157 sec at 75344 KB/sec
>>>> 2012-11-04 16:19:28.058517 osd.21 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 16.453375 sec at 63730 KB/sec
>>>> 2012-11-04 16:19:28.715552 osd.22 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 17.108887 sec at 61288 KB/sec
>>>> 2012-11-04 16:19:23.440054 osd.23 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 11.834639 sec at 88602 KB/sec
>>>> 2012-11-04 16:19:24.023650 osd.24 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 12.418276 sec at 84438 KB/sec
>>>> 2012-11-04 16:19:24.617514 osd.25 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 13.011955 sec at 80585 KB/sec
>>>> 2012-11-04 16:19:25.148613 osd.26 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 13.541710 sec at 77433 KB/sec
>>>>
>>>> All the best.
>>>>
>>>> 2012/11/4 Gregory Farnum:
>>>>> [Sorry for the blank email; I missed!]
>>>>> On Sun, Nov 4, 2012 at 1:04 PM, Aleksey Samarin wrote:
>>>>>> Hi!
>>>>>> This command? ceph tell osd \* bench
>>>>>> Output: tell target 'osd' not a valid entity name
>>>>>
>>>>> I guess it's "ceph osd tell \* bench". Try that one. :)
>>>>>
>>>>>> Well, I created the pool with the command ceph osd pool create bench2 120
>>>>>> This is the output of rados -p bench2 bench 30 write --no-cleanup
>>>>>>
>>>>>> rados -p bench2 bench 30 write --no-cleanup
>>>>>>
>>>>>> Maintaining 16 concurrent writes of 4194304 bytes for at least 30 seconds.
>>>>>> Object prefix: benchmark_data_host01_5827
>>>>>>   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
>>>>>>     0       0         0         0         0         0         -         0
>>>>>>     1      16        29        13   51.9885        52  0.489268  0.186749
>>>>>>     2      16        52        36   71.9866        92   1.87226  0.711888
>>>>>>     3      16        57        41    54.657        20  0.089697  0.697821
>>>>>>     4      16        60        44   43.9923        12   1.61868  0.765361
>>>>>>     5      16        60        44   35.1941         0         -  0.765361
>>>>>>     6      16        60        44   29.3285         0         -  0.765361
>>>>>>     7      16        60        44   25.1388         0         -  0.765361
>>>>>>     8      16        61        45   22.4964         1   5.89643  0.879384
>>>>>>     9      16        62        46   20.4412         4    6.0234  0.991211
>>>>>>    10      16        62        46   18.3971         0         -  0.991211
>>>>>>    11      16        63        47   17.0883         2   8.79749    1.1573
>>>>>>    12      16        63        47   15.6643         0         -    1.1573
>>>>>>    13      16        63        47   14.4593         0         -    1.1573
>>>>>>    14      16        63        47   13.4266         0         -    1.1573
>>>>>>    15      16        63        47   12.5315         0         -    1.1573
>>>>>>    16      16        63        47   11.7483         0         -    1.1573
>>>>>>    17      16        63        47   11.0572         0         -    1.1573
>>>>>>    18      16        63        47   10.4429         0         -    1.1573
>>>>>>    19      16        63        47   9.89331         0         -    1.1573
>>>>>> 2012-11-04 15:58:15.473733min lat: 0.036475 max lat: 8.79749 avg lat: 1.1573
>>>>>>   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
>>>>>>    20      16        63        47   9.39865         0         -    1.1573
>>>>>>    21      16        63        47   8.95105         0         -    1.1573
>>>>>>    22      16        63        47   8.54419         0         -    1.1573
>>>>>>    23      16        63        47   8.17271         0         -    1.1573
>>>>>>    24      16        63        47   7.83218         0         -    1.1573
>>>>>>    25      16        63        47    7.5189         0         -    1.1573
>>>>>>    26      16        63        47   7.22972         0         -    1.1573
>>>>>>    27      16        81        65   9.62824       4.5  0.076456    4.9428
>>>>>>    28      16       118       102   14.5693       148  0.427273   4.34095
>>>>>>    29      16       119       103   14.2049         4   1.57897   4.31414
>>>>>>    30      16       132       116   15.4645        52   2.25424   4.01492
>>>>>>    31      16       133       117   15.0946         4  0.974652   3.98893
>>>>>>    32      16       133       117   14.6229         0         -   3.98893
>>>>>> Total time run:         32.575351
>>>>>> Total writes made:      133
>>>>>> Write size:             4194304
>>>>>> Bandwidth (MB/sec):     16.331
>>>>>>
>>>>>> Stddev Bandwidth:       31.8794
>>>>>> Max bandwidth (MB/sec): 148
>>>>>> Min bandwidth (MB/sec): 0
>>>>>> Average Latency:        3.91583
>>>>>> Stddev Latency:         7.42821
>>>>>> Max latency:            25.24
>>>>>> Min latency:            0.036475
>>>>>>
>>>>>> I think the problem is not in the PGs. This is the output of ceph pg dump:
>>>>>> http://pastebin.com/BqLsyMBC
>>>>>
>>>>> Well, that did improve it a bit; but yes, I think there's something
>>>>> else going on. Just wanted to verify. :)
>>>>>
>>>>>>
>>>>>> I still have no idea.
>>>>>>
>>>>>> All the best. Alex
>>>>>>
>>>>>>
>>>>>>
>>>>>> 2012/11/4 Gregory Farnum:
>>>>>>> On Sun, Nov 4, 2012 at 10:58 AM, Aleksey Samarin wrote:
>>>>>>>> Hi all
>>>>>>>>
>>>>>>>> I'm planning to use ceph for cloud storage.
>>>>>>>> My test setup is 2 servers connected via 40Gb InfiniBand, 6x2TB disks per node.
>>>>>>>> CentOS 6.2
>>>>>>>> Ceph 0.52 from http://ceph.com/rpms/el6/x86_64
>>>>>>>> This is my config http://pastebin.com/Pzxafnsm
>>>>>>>> journal on tmpfs
>>>>>>>> Well, I created a bench pool and tested it:
>>>>>>>> ceph osd pool create bench
>>>>>>>> rados -p bench bench 30 write
>>>>>>>>
>>>>>>>> Total time run:         43.258228
>>>>>>>> Total writes made:      151
>>>>>>>> Write size:             4194304
>>>>>>>> Bandwidth (MB/sec):     13.963
>>>>>>>> Stddev Bandwidth:       26.307
>>>>>>>> Max bandwidth (MB/sec): 128
>>>>>>>> Min bandwidth (MB/sec): 0
>>>>>>>> Average Latency:        4.48605
>>>>>>>> Stddev Latency:         8.17709
>>>>>>>> Max latency:            29.7957
>>>>>>>> Min latency:            0.039435
>>>>>>>>
>>>>>>>> When I run rados -p bench bench 30 seq
>>>>>>>> Total time run:         20.626935
>>>>>>>> Total reads made:       275
>>>>>>>> Read size:              4194304
>>>>>>>> Bandwidth (MB/sec):     53.328
>>>>>>>> Average Latency:        1.19754
>>>>>>>> Max latency:            7.0215
>>>>>>>> Min latency:            0.011647
>>>>>>>>
>>>>>>>> I tested a single drive via dd if=/dev/zero of=/mnt/hdd2/testfile
>>>>>>>> bs=1024k count=20000
>>>>>>>> result: 158 MB/sec
>>>>>>>>
>>>>>>>> Can anyone tell me why the performance is so weak? Maybe I missed something?
>>>>>>>
>>>>>>> Can you run "ceph tell osd \* bench" and report the results? (It'll go
>>>>>>> to the "central log" which you can keep an eye on if you run "ceph -w"
>>>>>>> in another terminal.)
>>>>>>> I think you also didn't create your bench pool correctly; it probably
>>>>>>> only has 8 PGs which is not going to perform very well with your disk
>>>>>>> count. Try "ceph pool create bench2 120" and run the benchmark against
>>>>>>> that pool. The extra number at the end tells it to create 120
>>>>>>> placement groups.
>>>>>>> -Greg
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
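
The bandwidth figures in the three rados bench write summaries quoted in this thread can be cross-checked from the totals each run reports, and the per-OSD bench rates can be set beside them. The Python sketch below is only an illustration and was not part of the exchange. It assumes rados derives bandwidth as writes x write size / elapsed time with 1 MB = 1024 * 1024 bytes; the run labels, dictionary names and helper functions are made up for this example. All numbers are copied from the messages above.

```python
# Cross-check of figures quoted in the thread (illustrative sketch only).

WRITE_SIZE_BYTES = 4194304  # "Write size" line in every rados bench summary

# Hypothetical labels -> (total writes made, total time run in seconds,
#                         bandwidth in MB/sec as printed by rados bench)
RADOS_WRITE_RUNS = {
    "12 disks, pool created without a PG count": (151, 43.258228, 13.963),
    "12 disks, pool bench2 with 120 PGs": (133, 32.575351, 16.331),
    "2 OSDs, one per node": (543, 30.174151, 71.982),
}

# KB/sec reported by the nine OSDs that answered "ceph osd tell \* bench"
OSD_BENCH_KB_PER_SEC = {
    "osd.0": 91650,
    "osd.1": 79287,
    "osd.2": 75344,
    "osd.21": 63730,
    "osd.22": 61288,
    "osd.23": 88602,
    "osd.24": 84438,
    "osd.25": 80585,
    "osd.26": 77433,
}


def bandwidth_mb_per_sec(writes, seconds, write_size=WRITE_SIZE_BYTES):
    """Assumed formula: bytes written / elapsed time, 1 MB = 1024 * 1024 bytes."""
    return writes * write_size / (1024 * 1024) / seconds


def slowest_osd(rates):
    """Return (name, KB/sec) for the OSD with the lowest bench rate."""
    name = min(rates, key=rates.get)
    return name, rates[name]


if __name__ == "__main__":
    for label, (writes, seconds, reported) in RADOS_WRITE_RUNS.items():
        computed = bandwidth_mb_per_sec(writes, seconds)
        print(f"{label}: computed {computed:.3f} MB/sec, rados printed {reported}")

    name, rate = slowest_osd(OSD_BENCH_KB_PER_SEC)
    total = sum(OSD_BENCH_KB_PER_SEC.values())
    print(f"slowest reporting OSD: {name} at {rate} KB/sec")
    print(f"sum over {len(OSD_BENCH_KB_PER_SEC)} reporting OSDs: {total} KB/sec")
```

Under that assumption each recomputed value agrees with the figure rados printed to three decimal places. The slowest of the nine reporting OSDs is osd.22 at 61288 KB/sec, which on its own is still more than three times the roughly 16 MB/sec the 12-disk cluster delivered as a whole, so the individual disks measured here do not explain the gap.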