* poor performance
@ 2012-11-04  9:58 Aleksey Samarin
  2012-11-04 11:00 ` Gregory Farnum
  2012-11-04 12:29 ` Mark Nelson
  0 siblings, 2 replies; 14+ messages in thread
From: Aleksey Samarin @ 2012-11-04 9:58 UTC
To: ceph-devel

Hi all,

I'm planning to use Ceph for cloud storage.
My test setup is 2 servers connected via 40Gb InfiniBand, 6x2TB disks per node.
CentOS 6.2
Ceph 0.52 from http://ceph.com/rpms/el6/x86_64
This is my config: http://pastebin.com/Pzxafnsm
Journal on tmpfs.
I created a bench pool and tested it:

ceph osd pool create bench
rados -p bench bench 30 write

Total time run: 43.258228
Total writes made: 151
Write size: 4194304
Bandwidth (MB/sec): 13.963
Stddev Bandwidth: 26.307
Max bandwidth (MB/sec): 128
Min bandwidth (MB/sec): 0
Average Latency: 4.48605
Stddev Latency: 8.17709
Max latency: 29.7957
Min latency: 0.039435

When I run rados -p bench bench 30 seq:

Total time run: 20.626935
Total reads made: 275
Read size: 4194304
Bandwidth (MB/sec): 53.328
Average Latency: 1.19754
Max latency: 7.0215
Min latency: 0.011647

I tested a single drive via
dd if=/dev/zero of=/mnt/hdd2/testfile bs=1024k count=20000
Result: 158 MB/sec

Can anyone tell me why the performance is so weak? Maybe I missed something?

All the best,
Alex
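The bandwidth figures in the post above can be cross-checked from the totals that rados bench prints. The sketch below is only an arithmetic check, assuming "MB" in this output means 2**20 bytes (the reported numbers are consistent with that); the function name is made up for illustration.

```python
MIB = 1024 * 1024  # assumption: rados bench "MB" here is 2**20 bytes


def bench_bandwidth_mb_s(ops, op_size_bytes, seconds):
    """Average bandwidth in MB/s for `ops` operations of `op_size_bytes` each."""
    return ops * op_size_bytes / MIB / seconds


# Totals copied from the write and seq runs in the original post.
write_bw = bench_bandwidth_mb_s(151, 4194304, 43.258228)
read_bw = bench_bandwidth_mb_s(275, 4194304, 20.626935)
dd_single_disk = 158.0  # MB/s, the dd result for one drive

print(f"write:    {write_bw:.3f} MB/s")
print(f"seq read: {read_bw:.3f} MB/s")
print(f"cluster write vs one disk under dd: {write_bw / dd_single_disk:.1%}")
```

The point of the comparison is that twelve disks together deliver less than a tenth of what one disk does under dd, which is why the poster suspects a configuration problem and not a hardware limit.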
* Re: poor performance
From: Gregory Farnum @ 2012-11-04 11:00 UTC
To: Aleksey Samarin; +Cc: ceph-devel@vger.kernel.org, Mike Ryan

On Sun, Nov 4, 2012 at 10:58 AM, Aleksey Samarin <nrg3tik@gmail.com> wrote:
> I created a bench pool and tested it:
> ceph osd pool create bench
> rados -p bench bench 30 write
>
> Can anyone tell me why the performance is so weak? Maybe I missed something?

Can you run "ceph tell osd \* bench" and report the results? (It'll go
to the "central log", which you can keep an eye on if you run "ceph -w"
in another terminal.)

I think you also didn't create your bench pool correctly; it probably
only has 8 PGs, which is not going to perform very well with your disk
count. Try "ceph osd pool create bench2 120" and run the benchmark
against that pool. The extra number at the end tells it to create 120
placement groups.
-Greg
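To see why 8 placement groups is thin for this disk count, a rough average helps. The sketch below divides PG copies evenly over the OSDs; the replica count of 2 is an assumption (the thread never states it), and real placement is not perfectly even, so treat the numbers as a best case.

```python
def pg_copies_per_osd(pg_num, replicas, num_osds):
    """Average number of PG copies each OSD would hold if spread evenly."""
    return pg_num * replicas / num_osds


NUM_OSDS = 12  # 2 nodes x 6 disks, from the original post
REPLICAS = 2   # assumption: the pool's replica count is not given in the thread

for pg_num in (8, 120):
    avg = pg_copies_per_osd(pg_num, REPLICAS, NUM_OSDS)
    print(f"pg_num={pg_num:>3}: about {avg:.1f} PG copies per OSD on average")
```

With roughly one PG copy per OSD, a few disks can end up doing most of the work while others sit idle; at 120 PGs the load has a much better chance of spreading out.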
* Re: poor performance
From: Aleksey Samarin @ 2012-11-04 12:04 UTC
To: Gregory Farnum; +Cc: ceph-devel@vger.kernel.org, Mike Ryan

Hi!
This command? ceph tell osd \* bench
Output: tell target 'osd' not a valid entity name

Well, I created the pool with the command ceph osd pool create bench2 120
This is the output of rados -p bench2 bench 30 write --no-cleanup:

rados -p bench2 bench 30 write --no-cleanup

Maintaining 16 concurrent writes of 4194304 bytes for at least 30 seconds.
Object prefix: benchmark_data_host01_5827
  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
    0       0         0         0         0         0         -         0
    1      16        29        13   51.9885        52  0.489268  0.186749
    2      16        52        36   71.9866        92   1.87226  0.711888
    3      16        57        41    54.657        20  0.089697  0.697821
    4      16        60        44   43.9923        12   1.61868  0.765361
    5      16        60        44   35.1941         0         -  0.765361
    6      16        60        44   29.3285         0         -  0.765361
    7      16        60        44   25.1388         0         -  0.765361
    8      16        61        45   22.4964         1   5.89643  0.879384
    9      16        62        46   20.4412         4    6.0234  0.991211
   10      16        62        46   18.3971         0         -  0.991211
   11      16        63        47   17.0883         2   8.79749    1.1573
   12      16        63        47   15.6643         0         -    1.1573
   13      16        63        47   14.4593         0         -    1.1573
   14      16        63        47   13.4266         0         -    1.1573
   15      16        63        47   12.5315         0         -    1.1573
   16      16        63        47   11.7483         0         -    1.1573
   17      16        63        47   11.0572         0         -    1.1573
   18      16        63        47   10.4429         0         -    1.1573
   19      16        63        47   9.89331         0         -    1.1573
2012-11-04 15:58:15.473733min lat: 0.036475 max lat: 8.79749 avg lat: 1.1573
  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
   20      16        63        47   9.39865         0         -    1.1573
   21      16        63        47   8.95105         0         -    1.1573
   22      16        63        47   8.54419         0         -    1.1573
   23      16        63        47   8.17271         0         -    1.1573
   24      16        63        47   7.83218         0         -    1.1573
   25      16        63        47    7.5189         0         -    1.1573
   26      16        63        47   7.22972         0         -    1.1573
   27      16        81        65   9.62824       4.5  0.076456    4.9428
   28      16       118       102   14.5693       148  0.427273   4.34095
   29      16       119       103   14.2049         4   1.57897   4.31414
   30      16       132       116   15.4645        52   2.25424   4.01492
   31      16       133       117   15.0946         4  0.974652   3.98893
   32      16       133       117   14.6229         0         -   3.98893
Total time run: 32.575351
Total writes made: 133
Write size: 4194304
Bandwidth (MB/sec): 16.331

Stddev Bandwidth: 31.8794
Max bandwidth (MB/sec): 148
Min bandwidth (MB/sec): 0
Average Latency: 3.91583
Stddev Latency: 7.42821
Max latency: 25.24
Min latency: 0.036475

I think the problem is not in the PGs. This is the output of ceph pg dump:
http://pastebin.com/BqLsyMBC

I still have no idea.

All the best. Alex
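The per-second rows of the bench2 run are easier to read as stall intervals. The sketch below uses only the "finished" column from that run (seconds 0 to 32, copied from the output in this thread) and reports stretches in which no write completed; the helper name is hypothetical.

```python
# "finished" column of the bench2 run, one entry per second (0..32).
FINISHED = [0, 13, 36, 41, 44, 44, 44, 44, 45, 46, 46, 47] + [47] * 15 + [
    65, 102, 103, 116, 117, 117]


def stall_runs(finished):
    """Return (first_sec, last_sec) pairs for runs of seconds with no completions."""
    runs = []
    start = None
    for sec in range(1, len(finished)):
        idle = finished[sec] == finished[sec - 1]
        if idle and start is None:
            start = sec                      # a new idle run begins
        elif not idle and start is not None:
            runs.append((start, sec - 1))    # the idle run just ended
            start = None
    if start is not None:                    # run still open at end of data
        runs.append((start, len(finished) - 1))
    return runs


runs = stall_runs(FINISHED)
longest = max(runs, key=lambda r: r[1] - r[0])
idle_seconds = sum(last - first + 1 for first, last in runs)

print("stall intervals:", runs)
print("longest stall: seconds", longest[0], "to", longest[1])
print("idle seconds:", idle_seconds, "of", len(FINISHED) - 1)
```

A 15-second window with zero completions followed by a burst looks more like writes blocking behind something (a flush or a slow OSD, for example) than like uniformly slow disks, though the thread has not established the cause at this point.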
* Re: poor performance
From: Gregory Farnum @ 2012-11-04 12:15 UTC
To: Aleksey Samarin; +Cc: ceph-devel@vger.kernel.org, Mike Ryan
* Re: poor performance
From: Gregory Farnum @ 2012-11-04 12:18 UTC
To: Aleksey Samarin; +Cc: ceph-devel@vger.kernel.org, Mike Ryan

[Sorry for the blank email; I missed!]

On Sun, Nov 4, 2012 at 1:04 PM, Aleksey Samarin <nrg3tik@gmail.com> wrote:
> Hi!
> This command? ceph tell osd \* bench
> Output: tell target 'osd' not a valid entity name

I guess it's "ceph osd tell \* bench". Try that one. :)

> Well, I created the pool with the command ceph osd pool create bench2 120
> This is the output of rados -p bench2 bench 30 write --no-cleanup:
>
> Total time run: 32.575351
> Total writes made: 133
> Write size: 4194304
> Bandwidth (MB/sec): 16.331
>
> I think the problem is not in the PGs. This is the output of ceph pg dump:
> http://pastebin.com/BqLsyMBC

Well, that did improve it a bit; but yes, I think there's something
else going on. Just wanted to verify. :)

> I still have no idea.
>
> All the best. Alex
* Re: poor performance
From: Aleksey Samarin @ 2012-11-04 12:26 UTC
To: Gregory Farnum; +Cc: ceph-devel@vger.kernel.org, Mike Ryan

It's OK!

Output:

2012-11-04 16:19:23.195891 osd.0 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 11.441035 sec at 91650 KB/sec
2012-11-04 16:19:24.981631 osd.1 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 13.225048 sec at 79287 KB/sec
2012-11-04 16:19:25.672896 osd.2 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 13.917157 sec at 75344 KB/sec
2012-11-04 16:19:28.058517 osd.21 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 16.453375 sec at 63730 KB/sec
2012-11-04 16:19:28.715552 osd.22 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 17.108887 sec at 61288 KB/sec
2012-11-04 16:19:23.440054 osd.23 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 11.834639 sec at 88602 KB/sec
2012-11-04 16:19:24.023650 osd.24 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 12.418276 sec at 84438 KB/sec
2012-11-04 16:19:24.617514 osd.25 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 13.011955 sec at 80585 KB/sec
2012-11-04 16:19:25.148613 osd.26 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 13.541710 sec at 77433 KB/sec

All the best.
* Re: poor performance
From: Gregory Farnum @ 2012-11-04 12:39 UTC
To: Aleksey Samarin; +Cc: ceph-devel@vger.kernel.org, Mike Ryan

That's only nine; where are the other three? If you have three slow
disks that could definitely cause the troubles you're seeing.

Also, what Mark said about sync versus syncfs.

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
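The check Greg does by eye (nine reports where twelve were expected) can be scripted against the central log. This is a minimal sketch that parses the "bench:" lines posted earlier in the thread; the regular expression and the expected count of 12 (2 nodes x 6 disks) are assumptions drawn from this thread, not from any Ceph tooling.

```python
import re

# OSD bench lines as reported in the thread (timestamps kept verbatim).
LOG = """\
2012-11-04 16:19:23.195891 osd.0 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 11.441035 sec at 91650 KB/sec
2012-11-04 16:19:24.981631 osd.1 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 13.225048 sec at 79287 KB/sec
2012-11-04 16:19:25.672896 osd.2 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 13.917157 sec at 75344 KB/sec
2012-11-04 16:19:28.058517 osd.21 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 16.453375 sec at 63730 KB/sec
2012-11-04 16:19:28.715552 osd.22 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 17.108887 sec at 61288 KB/sec
2012-11-04 16:19:23.440054 osd.23 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 11.834639 sec at 88602 KB/sec
2012-11-04 16:19:24.023650 osd.24 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 12.418276 sec at 84438 KB/sec
2012-11-04 16:19:24.617514 osd.25 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 13.011955 sec at 80585 KB/sec
2012-11-04 16:19:25.148613 osd.26 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 13.541710 sec at 77433 KB/sec
"""

EXPECTED_OSDS = 12  # assumption: 2 nodes x 6 disks, one OSD per disk

# Capture the OSD name and its reported rate in KB/sec from each line.
PATTERN = re.compile(r"(osd\.\d+) \[INF\] bench: .* at (\d+) KB/sec")
rates_kb = {m.group(1): int(m.group(2)) for m in PATTERN.finditer(LOG)}

slowest = min(rates_kb, key=rates_kb.get)
fastest = max(rates_kb, key=rates_kb.get)
missing = EXPECTED_OSDS - len(rates_kb)

for name, kb in sorted(rates_kb.items(), key=lambda item: item[1]):
    print(f"{name:>7}: {kb / 1024:6.1f} MB/s")
print(f"reported: {len(rates_kb)}, missing: {missing}")
print(f"slowest: {slowest}, fastest: {fastest}")
```

Every OSD that did report lands between about 60 and 90 MB/s, so the ones worth chasing are the three that never answered.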
* Re: poor performance
From: Aleksey Samarin @ 2012-11-04 12:52 UTC
To: Gregory Farnum; +Cc: ceph-devel@vger.kernel.org, Mike Ryan

Ok! Well, I'll run these tests and write up the results.

By the way, the disks are all the same; how can some be faster than others?
* Re: poor performance 2012-11-04 12:52 ` Aleksey Samarin @ 2012-11-04 13:18 ` Aleksey Samarin 2012-11-04 13:52 ` Mark Nelson 0 siblings, 1 reply; 14+ messages in thread From: Aleksey Samarin @ 2012-11-04 13:18 UTC (permalink / raw) To: Gregory Farnum; +Cc: ceph-devel@vger.kernel.org, Mike Ryan

Well, I created a Ceph cluster with 2 OSDs (1 OSD per node), 2 mons, and 2 MDSes.
Here is what I did:

ceph osd pool create bench
ceph osd tell \* bench
rados -p bench bench 30 write --no-cleanup

Output:

Maintaining 16 concurrent writes of 4194304 bytes for at least 30 seconds.
Object prefix: benchmark_data_host01_11635
   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
     0       0         0         0         0         0         -         0
     1      16        16         0         0         0         -         0
     2      16        37        21   41.9911        42  0.139005   1.08941
     3      16        53        37   49.3243        64  0.754114   1.09392
     4      16        75        59   58.9893        88  0.284647  0.914221
     5      16        89        73   58.3896        56  0.072228  0.881008
     6      16        95        79   52.6575        24   1.56959  0.961477
     7      16       111        95   54.2764        64  0.046105   1.08791
     8      16       128       112   55.9906        68  0.035714   1.04594
     9      16       150       134   59.5457        88  0.046298   1.04415
    10      16       166       150   59.9901        64  0.048635  0.986384
    11      16       176       160   58.1723        40  0.727784  0.988408
    12      16       206       190   63.3231       120   0.28869  0.946624
    13      16       225       209   64.2976        76   1.34472  0.919464
    14      16       263       247   70.5605       152  0.070926   0.90046
    15      16       295       279   74.3887       128  0.041517  0.830466
    16      16       315       299   74.7388        80  0.296037  0.841527
    17      16       333       317   74.5772        72  0.286097  0.849558
    18      16       340       324   71.9891        28  0.295084   0.83922
    19      16       343       327   68.8317        12   1.46948  0.845797
2012-11-04 17:14:52.090941min lat: 0.035714 max lat: 2.64841 avg lat: 0.861539
   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
    20      16       378       362    72.389       140  0.566232  0.861539
    21      16       400       384   73.1313        88  0.038835  0.857785
    22      16       404       388   70.5344        16  0.801216  0.857002
    23      16       413       397   69.0327        36  0.062256   0.86376
    24      16       428       412   68.6543        60  0.042583   0.89389
    25      16       450       434   69.4277        88  0.383877  0.905833
    26      16       472       456   70.1415        88  0.269878  0.898023
    27      16       472       456   67.5437         0         -  0.898023
    28      16       512       496   70.8448        80  0.056798  0.891163
    29      16       530       514   70.8843        72   1.20653  0.898112
    30      16       542       526   70.1212        48  0.744383  0.890733
Total time run: 30.174151
Total writes made: 543
Write size: 4194304
Bandwidth (MB/sec): 71.982

Stddev Bandwidth: 38.318
Max bandwidth (MB/sec): 152
Min bandwidth (MB/sec): 0
Average Latency: 0.889026
Stddev Latency: 0.677425
Max latency: 2.94467
Min latency: 0.035714

2012/11/4 Aleksey Samarin <nrg3tik@gmail.com>:
> Ok!
> Well, I'll take these tests and write about the results.
>
> btw,
> disks are the same, as some may be faster than others?
>
> 2012/11/4 Gregory Farnum <greg@inktank.com>:
>> That's only nine - where are the other three? If you have three slow
>> disks that could definitely cause the troubles you're seeing.
>>
>> Also, what Mark said about sync versus syncfs.
>>
>> On Sun, Nov 4, 2012 at 1:26 PM, Aleksey Samarin <nrg3tik@gmail.com> wrote:
>>> It`s ok!
>>>
>>> Output:
>>>
>>> 2012-11-04 16:19:23.195891 osd.0 [INF] bench: wrote 1024 MB in blocks
>>> of 4096 KB in 11.441035 sec at 91650 KB/sec
>>> 2012-11-04 16:19:24.981631 osd.1 [INF] bench: wrote 1024 MB in blocks
>>> of 4096 KB in 13.225048 sec at 79287 KB/sec
>>> 2012-11-04 16:19:25.672896 osd.2 [INF] bench: wrote 1024 MB in blocks
>>> of 4096 KB in 13.917157 sec at 75344 KB/sec
>>> 2012-11-04 16:19:28.058517 osd.21 [INF] bench: wrote 1024 MB in blocks
>>> of 4096 KB in 16.453375 sec at 63730 KB/sec
>>> 2012-11-04 16:19:28.715552 osd.22 [INF] bench: wrote 1024 MB in blocks
>>> of 4096 KB in 17.108887 sec at 61288 KB/sec
>>> 2012-11-04 16:19:23.440054 osd.23 [INF] bench: wrote 1024 MB in blocks
>>> of 4096 KB in 11.834639 sec at 88602 KB/sec
>>> 2012-11-04 16:19:24.023650 osd.24 [INF] bench: wrote 1024 MB in blocks
>>> of 4096 KB in 12.418276 sec at 84438 KB/sec
>>> 2012-11-04 16:19:24.617514 osd.25 [INF] bench: wrote 1024 MB in blocks
>>> of 4096 KB in 13.011955 sec at 80585 KB/sec
>>> 2012-11-04 16:19:25.148613 osd.26 [INF] bench: wrote 1024 MB in blocks
>>> of 4096 KB in 13.541710 sec at 77433 KB/sec
>>>
>>> All the best.
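A side note on reading the per-OSD bench lines quoted above: a small script can rank the OSDs by reported throughput and count how many actually answered. This is only a sketch under stated assumptions. The sample lines are the nine results from this thread with the mail wrapping undone, the regular expression and the helper names (parse_osd_bench, report) are hypothetical, and the expected count of 12 comes from the "6x2Tb disks per node" on 2 nodes in the first message.

```python
import re

# The nine result lines reported in this thread, with the mail line
# wrapping undone (assumption: one result per line in the central log).
LOG = """\
2012-11-04 16:19:23.195891 osd.0 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 11.441035 sec at 91650 KB/sec
2012-11-04 16:19:24.981631 osd.1 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 13.225048 sec at 79287 KB/sec
2012-11-04 16:19:25.672896 osd.2 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 13.917157 sec at 75344 KB/sec
2012-11-04 16:19:28.058517 osd.21 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 16.453375 sec at 63730 KB/sec
2012-11-04 16:19:28.715552 osd.22 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 17.108887 sec at 61288 KB/sec
2012-11-04 16:19:23.440054 osd.23 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 11.834639 sec at 88602 KB/sec
2012-11-04 16:19:24.023650 osd.24 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 12.418276 sec at 84438 KB/sec
2012-11-04 16:19:24.617514 osd.25 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 13.011955 sec at 80585 KB/sec
2012-11-04 16:19:25.148613 osd.26 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 13.541710 sec at 77433 KB/sec
"""

# Hypothetical pattern: captures the OSD id and the reported KB/sec figure.
BENCH_RE = re.compile(r"osd\.(\d+) \[INF\] bench: wrote .* at (\d+) KB/sec")


def parse_osd_bench(text):
    """Map OSD id -> reported throughput in KB/sec."""
    return {int(osd): int(rate) for osd, rate in BENCH_RE.findall(text)}


def report(text, expected_osds):
    """Print OSDs from slowest to fastest and how many of them reported."""
    rates = parse_osd_bench(text)
    for osd, rate in sorted(rates.items(), key=lambda item: item[1]):
        print(f"osd.{osd}: {rate} KB/sec")
    print(f"{len(rates)} of {expected_osds} OSDs reported")
    return rates


if __name__ == "__main__":
    # 12 = 2 nodes x 6 disks, taken from the first message in the thread.
    report(LOG, expected_osds=12)
```

Sorting by rate puts the slowest disks first, and the final count makes a gap like "only nine" visible at a glance.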
* Re: poor performance 2012-11-04 13:18 ` Aleksey Samarin @ 2012-11-04 13:52 ` Mark Nelson 2012-11-04 15:13 ` Aleksey Samarin 0 siblings, 1 reply; 14+ messages in thread From: Mark Nelson @ 2012-11-04 13:52 UTC (permalink / raw) To: Aleksey Samarin; +Cc: Gregory Farnum, ceph-devel@vger.kernel.org, Mike Ryan

On 11/04/2012 07:18 AM, Aleksey Samarin wrote:
> Well, i create ceph cluster with 2 osd ( 1 osd per node), 2 mon, 2 mds.
> here is what I did:
> ceph osd pool create bench
> ceph osd tell \* bench
> rados -p bench bench 30 write --no-cleanup
> output:
>
> Maintaining 16 concurrent writes of 4194304 bytes for at least 30 seconds.
> Object prefix: benchmark_data_host01_11635
>    sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
>      0       0         0         0         0         0         -         0
>      1      16        16         0         0         0         -         0
>      2      16        37        21   41.9911        42  0.139005   1.08941
>      3      16        53        37   49.3243        64  0.754114   1.09392
>      4      16        75        59   58.9893        88  0.284647  0.914221
>      5      16        89        73   58.3896        56  0.072228  0.881008
>      6      16        95        79   52.6575        24   1.56959  0.961477
>      7      16       111        95   54.2764        64  0.046105   1.08791
>      8      16       128       112   55.9906        68  0.035714   1.04594
>      9      16       150       134   59.5457        88  0.046298   1.04415
>     10      16       166       150   59.9901        64  0.048635  0.986384
>     11      16       176       160   58.1723        40  0.727784  0.988408
>     12      16       206       190   63.3231       120   0.28869  0.946624
>     13      16       225       209   64.2976        76   1.34472  0.919464
>     14      16       263       247   70.5605       152  0.070926   0.90046
>     15      16       295       279   74.3887       128  0.041517  0.830466
>     16      16       315       299   74.7388        80  0.296037  0.841527
>     17      16       333       317   74.5772        72  0.286097  0.849558
>     18      16       340       324   71.9891        28  0.295084   0.83922
>     19      16       343       327   68.8317        12   1.46948  0.845797
> 2012-11-04 17:14:52.090941min lat: 0.035714 max lat: 2.64841 avg lat: 0.861539
>    sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
>     20      16       378       362    72.389       140  0.566232  0.861539
>     21      16       400       384   73.1313        88  0.038835  0.857785
>     22      16       404       388   70.5344        16  0.801216  0.857002
>     23      16       413       397   69.0327        36  0.062256   0.86376
>     24      16       428       412   68.6543        60  0.042583   0.89389
>     25      16       450       434   69.4277        88  0.383877  0.905833
>     26      16       472       456   70.1415        88  0.269878  0.898023
>     27      16       472       456   67.5437         0         -  0.898023
>     28      16       512       496   70.8448        80  0.056798  0.891163
>     29      16       530       514   70.8843        72   1.20653  0.898112
>     30      16       542       526   70.1212        48  0.744383  0.890733
> Total time run: 30.174151
> Total writes made: 543
> Write size: 4194304
> Bandwidth (MB/sec): 71.982
>
> Stddev Bandwidth: 38.318
> Max bandwidth (MB/sec): 152
> Min bandwidth (MB/sec): 0
> Average Latency: 0.889026
> Stddev Latency: 0.677425
> Max latency: 2.94467
> Min latency: 0.035714
>

Much better for 1 disk per node! I suspect that lack of syncfs is hurting
you, or perhaps some other issue with writes to lots of disks at the same
time.

>
> 2012/11/4 Aleksey Samarin <nrg3tik@gmail.com>:
>> Ok!
>> Well, I'll take these tests and write about the results.
>>
>> btw,
>> disks are the same, as some may be faster than others?
>>
>> 2012/11/4 Gregory Farnum <greg@inktank.com>:
>>> That's only nine - where are the other three? If you have three slow
>>> disks that could definitely cause the troubles you're seeing.
>>>
>>> Also, what Mark said about sync versus syncfs.
>>>
>>> On Sun, Nov 4, 2012 at 1:26 PM, Aleksey Samarin <nrg3tik@gmail.com> wrote:
>>>> It`s ok!
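On the syncfs suspicion raised in the reply above, one quick check is whether the C library on a node exports a syncfs symbol at all. The following is a minimal sketch, assuming a Linux host with Python available; the function name libc_has_syncfs is hypothetical. It only inspects the C library loaded into the Python process. It says nothing about whether the installed Ceph binaries were built to call syncfs, or whether the running kernel implements the syscall.

```python
import ctypes


def libc_has_syncfs():
    """Return True if the C library loaded in this process exports syncfs()."""
    # CDLL(None) opens the running process itself, which exposes the symbols
    # of the C library it is linked against (Linux and other POSIX systems).
    libc = ctypes.CDLL(None)
    # ctypes raises AttributeError for a missing symbol, so hasattr() works.
    return hasattr(libc, "syncfs")


if __name__ == "__main__":
    if libc_has_syncfs():
        print("C library exports syncfs()")
    else:
        print("C library does not export syncfs()")
```

Running it on each OSD node gives a yes/no answer for that node's C library only, which is a starting point before looking at how Ceph itself was built.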
* Re: poor performance 2012-11-04 13:52 ` Mark Nelson @ 2012-11-04 15:13 ` Aleksey Samarin 2012-11-15 21:16 ` Gregory Farnum 0 siblings, 1 reply; 14+ messages in thread From: Aleksey Samarin @ 2012-11-04 15:13 UTC (permalink / raw) To: Mark Nelson; +Cc: Gregory Farnum, ceph-devel@vger.kernel.org, Mike Ryan

What are the possible solutions?
Update CentOS to 6.3?

As for the issue with writes to lots of disks, I think running dd in
parallel would be a good test! :)

2012/11/4 Mark Nelson <mark.nelson@inktank.com>:
> On 11/04/2012 07:18 AM, Aleksey Samarin wrote:
>>
>> Well, i create ceph cluster with 2 osd ( 1 osd per node), 2 mon, 2 mds.
>> here is what I did:
>> ceph osd pool create bench
>> ceph osd tell \* bench
>> rados -p bench bench 30 write --no-cleanup
>> output:
>>
>> Maintaining 16 concurrent writes of 4194304 bytes for at least 30 seconds.
>> Object prefix: benchmark_data_host01_11635
>>    sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
>>      0       0         0         0         0         0         -         0
>>      1      16        16         0         0         0         -         0
>>      2      16        37        21   41.9911        42  0.139005   1.08941
>>      3      16        53        37   49.3243        64  0.754114   1.09392
>>      4      16        75        59   58.9893        88  0.284647  0.914221
>>      5      16        89        73   58.3896        56  0.072228  0.881008
>>      6      16        95        79   52.6575        24   1.56959  0.961477
>>      7      16       111        95   54.2764        64  0.046105   1.08791
>>      8      16       128       112   55.9906        68  0.035714   1.04594
>>      9      16       150       134   59.5457        88  0.046298   1.04415
>>     10      16       166       150   59.9901        64  0.048635  0.986384
>>     11      16       176       160   58.1723        40  0.727784  0.988408
>>     12      16       206       190   63.3231       120   0.28869  0.946624
>>     13      16       225       209   64.2976        76   1.34472  0.919464
>>     14      16       263       247   70.5605       152  0.070926   0.90046
>>     15      16       295       279   74.3887       128  0.041517  0.830466
>>     16      16       315       299   74.7388        80  0.296037  0.841527
>>     17      16       333       317   74.5772        72  0.286097  0.849558
>>     18      16       340       324   71.9891        28  0.295084   0.83922
>>     19      16       343       327   68.8317        12   1.46948  0.845797
>> 2012-11-04 17:14:52.090941min lat: 0.035714 max lat: 2.64841 avg lat: 0.861539
>>    sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
>>     20      16       378       362    72.389       140  0.566232  0.861539
>>     21      16       400       384   73.1313        88  0.038835  0.857785
>>     22      16       404       388   70.5344        16  0.801216  0.857002
>>     23      16       413       397   69.0327        36  0.062256   0.86376
>>     24      16       428       412   68.6543        60  0.042583   0.89389
>>     25      16       450       434   69.4277        88  0.383877  0.905833
>>     26      16       472       456   70.1415        88  0.269878  0.898023
>>     27      16       472       456   67.5437         0         -  0.898023
>>     28      16       512       496   70.8448        80  0.056798  0.891163
>>     29      16       530       514   70.8843        72   1.20653  0.898112
>>     30      16       542       526   70.1212        48  0.744383  0.890733
>> Total time run: 30.174151
>> Total writes made: 543
>> Write size: 4194304
>> Bandwidth (MB/sec): 71.982
>>
>> Stddev Bandwidth: 38.318
>> Max bandwidth (MB/sec): 152
>> Min bandwidth (MB/sec): 0
>> Average Latency: 0.889026
>> Stddev Latency: 0.677425
>> Max latency: 2.94467
>> Min latency: 0.035714
>>
>
> Much better for 1 disk per node! I suspect that lack of syncfs is hurting
> you, or perhaps some other issue with writes to lots of disks at the same
> time.
>
>
>>
>> 2012/11/4 Aleksey Samarin <nrg3tik@gmail.com>:
>>>
>>> Ok!
>>> Well, I'll take these tests and write about the results.
>>>
>>> btw,
>>> disks are the same, as some may be faster than others?
>>>
>>> 2012/11/4 Gregory Farnum <greg@inktank.com>:
>>>>
>>>> That's only nine - where are the other three? If you have three slow
>>>> disks that could definitely cause the troubles you're seeing.
>>>>
>>>> Also, what Mark said about sync versus syncfs.
>>>>
>>>> On Sun, Nov 4, 2012 at 1:26 PM, Aleksey Samarin <nrg3tik@gmail.com>
>>>> wrote:
>>>>>
>>>>> It`s ok!
>>>>> >>>>> Output: >>>>> >>>>> 2012-11-04 16:19:23.195891 osd.0 [INF] bench: wrote 1024 MB in blocks >>>>> of 4096 KB in 11.441035 sec at 91650 KB/sec >>>>> 2012-11-04 16:19:24.981631 osd.1 [INF] bench: wrote 1024 MB in blocks >>>>> of 4096 KB in 13.225048 sec at 79287 KB/sec >>>>> 2012-11-04 16:19:25.672896 osd.2 [INF] bench: wrote 1024 MB in blocks >>>>> of 4096 KB in 13.917157 sec at 75344 KB/sec >>>>> 2012-11-04 16:19:28.058517 osd.21 [INF] bench: wrote 1024 MB in blocks >>>>> of 4096 KB in 16.453375 sec at 63730 KB/sec >>>>> 2012-11-04 16:19:28.715552 osd.22 [INF] bench: wrote 1024 MB in blocks >>>>> of 4096 KB in 17.108887 sec at 61288 KB/sec >>>>> 2012-11-04 16:19:23.440054 osd.23 [INF] bench: wrote 1024 MB in blocks >>>>> of 4096 KB in 11.834639 sec at 88602 KB/sec >>>>> 2012-11-04 16:19:24.023650 osd.24 [INF] bench: wrote 1024 MB in blocks >>>>> of 4096 KB in 12.418276 sec at 84438 KB/sec >>>>> 2012-11-04 16:19:24.617514 osd.25 [INF] bench: wrote 1024 MB in blocks >>>>> of 4096 KB in 13.011955 sec at 80585 KB/sec >>>>> 2012-11-04 16:19:25.148613 osd.26 [INF] bench: wrote 1024 MB in blocks >>>>> of 4096 KB in 13.541710 sec at 77433 KB/sec >>>>> >>>>> All the best. >>>>> >>>>> 2012/11/4 Gregory Farnum <greg@inktank.com>: >>>>>> >>>>>> [Sorry for the blank email; I missed!] >>>>>> On Sun, Nov 4, 2012 at 1:04 PM, Aleksey Samarin <nrg3tik@gmail.com> >>>>>> wrote: >>>>>>> >>>>>>> Hi! >>>>>>> This command? ceph tell osd \* bench >>>>>>> Output: tell target 'osd' not a valid entity name >>>>>> >>>>>> >>>>>> I guess it's "ceph osd tell \* bench". Try that one. :) >>>>>> >>>>>>> Well, i did pool by command ceph osd pool create bench2 120 >>>>>>> This output of rados -p bench2 bench 30 write --no-cleanup >>>>>>> >>>>>>> rados -p bench2 bench 30 write --no-cleanup >>>>>>> >>>>>>> Maintaining 16 concurrent writes of 4194304 bytes for at least 30 >>>>>>> seconds. 
>>>>>>> Object prefix: benchmark_data_host01_5827 >>>>>>> sec Cur ops started finished avg MB/s cur MB/s last lat >>>>>>> avg lat >>>>>>> 0 0 0 0 0 0 - >>>>>>> 0 >>>>>>> 1 16 29 13 51.9885 52 0.489268 >>>>>>> 0.186749 >>>>>>> 2 16 52 36 71.9866 92 1.87226 >>>>>>> 0.711888 >>>>>>> 3 16 57 41 54.657 20 0.089697 >>>>>>> 0.697821 >>>>>>> 4 16 60 44 43.9923 12 1.61868 >>>>>>> 0.765361 >>>>>>> 5 16 60 44 35.1941 0 - >>>>>>> 0.765361 >>>>>>> 6 16 60 44 29.3285 0 - >>>>>>> 0.765361 >>>>>>> 7 16 60 44 25.1388 0 - >>>>>>> 0.765361 >>>>>>> 8 16 61 45 22.4964 1 5.89643 >>>>>>> 0.879384 >>>>>>> 9 16 62 46 20.4412 4 6.0234 >>>>>>> 0.991211 >>>>>>> 10 16 62 46 18.3971 0 - >>>>>>> 0.991211 >>>>>>> 11 16 63 47 17.0883 2 8.79749 >>>>>>> 1.1573 >>>>>>> 12 16 63 47 15.6643 0 - >>>>>>> 1.1573 >>>>>>> 13 16 63 47 14.4593 0 - >>>>>>> 1.1573 >>>>>>> 14 16 63 47 13.4266 0 - >>>>>>> 1.1573 >>>>>>> 15 16 63 47 12.5315 0 - >>>>>>> 1.1573 >>>>>>> 16 16 63 47 11.7483 0 - >>>>>>> 1.1573 >>>>>>> 17 16 63 47 11.0572 0 - >>>>>>> 1.1573 >>>>>>> 18 16 63 47 10.4429 0 - >>>>>>> 1.1573 >>>>>>> 19 16 63 47 9.89331 0 - >>>>>>> 1.1573 >>>>>>> 2012-11-04 15:58:15.473733min lat: 0.036475 max lat: 8.79749 avg lat: >>>>>>> 1.1573 >>>>>>> sec Cur ops started finished avg MB/s cur MB/s last lat >>>>>>> avg lat >>>>>>> 20 16 63 47 9.39865 0 - >>>>>>> 1.1573 >>>>>>> 21 16 63 47 8.95105 0 - >>>>>>> 1.1573 >>>>>>> 22 16 63 47 8.54419 0 - >>>>>>> 1.1573 >>>>>>> 23 16 63 47 8.17271 0 - >>>>>>> 1.1573 >>>>>>> 24 16 63 47 7.83218 0 - >>>>>>> 1.1573 >>>>>>> 25 16 63 47 7.5189 0 - >>>>>>> 1.1573 >>>>>>> 26 16 63 47 7.22972 0 - >>>>>>> 1.1573 >>>>>>> 27 16 81 65 9.62824 4.5 0.076456 >>>>>>> 4.9428 >>>>>>> 28 16 118 102 14.5693 148 0.427273 >>>>>>> 4.34095 >>>>>>> 29 16 119 103 14.2049 4 1.57897 >>>>>>> 4.31414 >>>>>>> 30 16 132 116 15.4645 52 2.25424 >>>>>>> 4.01492 >>>>>>> 31 16 133 117 15.0946 4 0.974652 >>>>>>> 3.98893 >>>>>>> 32 16 133 117 14.6229 0 - >>>>>>> 3.98893 >>>>>>> Total time run: 32.575351 >>>>>>> Total 
writes made: 133
>>>>>>> Write size: 4194304
>>>>>>> Bandwidth (MB/sec): 16.331
>>>>>>>
>>>>>>> Stddev Bandwidth: 31.8794
>>>>>>> Max bandwidth (MB/sec): 148
>>>>>>> Min bandwidth (MB/sec): 0
>>>>>>> Average Latency: 3.91583
>>>>>>> Stddev Latency: 7.42821
>>>>>>> Max latency: 25.24
>>>>>>> Min latency: 0.036475
>>>>>>>
>>>>>>> Im think problem not in pg. This output of ceph pg dump >
>>>>>>> http://pastebin.com/BqLsyMBC
>>>>>>
>>>>>>
>>>>>> Well, that did improve it a bit; but yes, I think there's something
>>>>>> else going on. Just wanted to verify. :)
>>>>>>
>>>>>>>
>>>>>>> I have still no idea.
>>>>>>>
>>>>>>> All the best. Alex
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> 2012/11/4 Gregory Farnum <greg@inktank.com>:
>>>>>>>>
>>>>>>>> On Sun, Nov 4, 2012 at 10:58 AM, Aleksey Samarin <nrg3tik@gmail.com>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> Hi all
>>>>>>>>>
>>>>>>>>> Im planning use ceph for cloud storage.
>>>>>>>>> My test setup is 2 servers connected via infiniband 40Gb, 6x2Tb
>>>>>>>>> disks per node.
>>>>>>>>> Centos 6.2
>>>>>>>>> Ceph 0.52 from http://ceph.com/rpms/el6/x86_64
>>>>>>>>> This is my config http://pastebin.com/Pzxafnsm
>>>>>>>>> journal on tmpfs
>>>>>>>>> well, im create bench pool and test it:
>>>>>>>>> ceph osd pool create bench
>>>>>>>>> rados -p bench bench 30 write
>>>>>>>>>
>>>>>>>>> Total time run: 43.258228
>>>>>>>>> Total writes made: 151
>>>>>>>>> Write size: 4194304
>>>>>>>>> Bandwidth (MB/sec): 13.963
>>>>>>>>> Stddev Bandwidth: 26.307
>>>>>>>>> Max bandwidth (MB/sec): 128
>>>>>>>>> Min bandwidth (MB/sec): 0
>>>>>>>>> Average Latency: 4.48605
>>>>>>>>> Stddev Latency: 8.17709
>>>>>>>>> Max latency: 29.7957
>>>>>>>>> Min latency: 0.039435
>>>>>>>>>
>>>>>>>>> when i do rados -p bench bench 30 seq
>>>>>>>>> Total time run: 20.626935
>>>>>>>>> Total reads made: 275
>>>>>>>>> Read size: 4194304
>>>>>>>>> Bandwidth (MB/sec): 53.328
>>>>>>>>> Average Latency: 1.19754
>>>>>>>>> Max latency: 7.0215
>>>>>>>>> Min latency: 0.011647
>>>>>>>>>
>>>>>>>>> I tested the single drive via dd if=/dev/zero of=/mnt/hdd2/testfile
>>>>>>>>> bs=1024k count=20000
>>>>>>>>> result: 158 MB/sec
>>>>>>>>>
>>>>>>>>> Anyone can tell me why such a weak performance? Maybe I missed
>>>>>>>>> something?
>>>>>>>>
>>>>>>>>
>>>>>>>> Can you run "ceph tell osd \* bench" and report the results? (It'll go
>>>>>>>> to the "central log" which you can keep an eye on if you run "ceph -w"
>>>>>>>> in another terminal.)
>>>>>>>> I think you also didn't create your bench pool correctly; it probably
>>>>>>>> only has 8 PGs which is not going to perform very well with your disk
>>>>>>>> count. Try "ceph pool create bench2 120" and run the benchmark against
>>>>>>>> that pool. The extra number at the end tells it to create 120
>>>>>>>> placement groups.
>>>>>>>> -Greg

^ permalink raw reply [flat|nested] 14+ messages in thread
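Editor's note: the "Bandwidth (MB/sec)" figures quoted above appear to follow directly from the other summary fields (operations completed, object size, total run time). The sketch below checks that reading against the numbers posted in this thread. The function name is hypothetical, and the assumption that 1 MB means 1024*1024 bytes is inferred from these figures rather than taken from the rados source.

```python
def rados_bench_bandwidth(ops, op_size_bytes, seconds):
    """Bandwidth in MB/s, assuming 1 MB = 1024 * 1024 bytes.

    ops           -- "Total writes made" or "Total reads made"
    op_size_bytes -- "Write size" or "Read size"
    seconds       -- "Total time run"
    """
    megabytes_moved = ops * op_size_bytes / (1024 * 1024)
    return megabytes_moved / seconds


# Figures taken from the summaries posted in this thread.
first_write = rados_bench_bandwidth(151, 4194304, 43.258228)
first_read = rados_bench_bandwidth(275, 4194304, 20.626935)

print(f"write: {first_write:.3f} MB/s, read: {first_read:.3f} MB/s")
```

Read this way, 13.963 MB/s simply means that only 151 four-megabyte objects completed in about 43 seconds, which matches the long stalls visible in the per-second output.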
* Re: poor performance
  2012-11-04 15:13 ` Aleksey Samarin
@ 2012-11-15 21:16 ` Gregory Farnum
  2012-11-16 7:41 ` Aleksey Samarin
  0 siblings, 1 reply; 14+ messages in thread
From: Gregory Farnum @ 2012-11-15 21:16 UTC (permalink / raw)
  To: Aleksey Samarin; +Cc: Mark Nelson, ceph-devel@vger.kernel.org, Mike Ryan

On Sun, Nov 4, 2012 at 7:13 AM, Aleksey Samarin <nrg3tik@gmail.com> wrote:
> What may be possible solutions?
> Update centos to 6.3?

From what I've heard the RHEL libc doesn't support the syncfs syscall
(even though the kernel does have it). :( So you'd need to make sure
the kernel supports it and then build a custom glibc, and then make
sure your Ceph software is built to use it.

> About issue with writes to lots of disk, i think parallel dd command
> will be good as test! :)

Yes - it really looks like maybe some of your disks are much slower
than the others. Try benchmarking each individually one-at-a-time, and
then in groups. I suspect you'll see a problem below the Ceph layers.

>
> 2012/11/4 Mark Nelson <mark.nelson@inktank.com>:
>> On 11/04/2012 07:18 AM, Aleksey Samarin wrote:
>>>
>>> Well, i create ceph cluster with 2 osd ( 1 osd per node), 2 mon, 2 mds.
>>> here is what I did:
>>> ceph osd pool create bench
>>> ceph osd tell \* bench
>>> rados -p bench bench 30 write --no-cleanup
>>> output:
>>>
>>> Maintaining 16 concurrent writes of 4194304 bytes for at least 30
>>> seconds.
>>> Object prefix: benchmark_data_host01_11635
>>>   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
>>>     0       0         0         0         0         0         -         0
>>>     1      16        16         0         0         0         -         0
>>>     2      16        37        21   41.9911        42  0.139005   1.08941
>>>     3      16        53        37   49.3243        64  0.754114   1.09392
>>>     4      16        75        59   58.9893        88  0.284647  0.914221
>>>     5      16        89        73   58.3896        56  0.072228  0.881008
>>>     6      16        95        79   52.6575        24   1.56959  0.961477
>>>     7      16       111        95   54.2764        64  0.046105   1.08791
>>>     8      16       128       112   55.9906        68  0.035714   1.04594
>>>     9      16       150       134   59.5457        88  0.046298   1.04415
>>>    10      16       166       150   59.9901        64  0.048635  0.986384
>>>    11      16       176       160   58.1723        40  0.727784  0.988408
>>>    12      16       206       190   63.3231       120   0.28869  0.946624
>>>    13      16       225       209   64.2976        76   1.34472  0.919464
>>>    14      16       263       247   70.5605       152  0.070926   0.90046
>>>    15      16       295       279   74.3887       128  0.041517  0.830466
>>>    16      16       315       299   74.7388        80  0.296037  0.841527
>>>    17      16       333       317   74.5772        72  0.286097  0.849558
>>>    18      16       340       324   71.9891        28  0.295084   0.83922
>>>    19      16       343       327   68.8317        12   1.46948  0.845797
>>> 2012-11-04 17:14:52.090941min lat: 0.035714 max lat: 2.64841 avg lat: 0.861539
>>>   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
>>>    20      16       378       362    72.389       140  0.566232  0.861539
>>>    21      16       400       384   73.1313        88  0.038835  0.857785
>>>    22      16       404       388   70.5344        16  0.801216  0.857002
>>>    23      16       413       397   69.0327        36  0.062256   0.86376
>>>    24      16       428       412   68.6543        60  0.042583   0.89389
>>>    25      16       450       434   69.4277        88  0.383877  0.905833
>>>    26      16       472       456   70.1415        88  0.269878  0.898023
>>>    27      16       472       456   67.5437         0         -  0.898023
>>>    28      16       512       496   70.8448        80  0.056798  0.891163
>>>    29      16       530       514   70.8843        72   1.20653  0.898112
>>>    30      16       542       526   70.1212        48  0.744383  0.890733
>>> Total time run: 30.174151
>>> Total writes made: 543
>>> Write size: 4194304
>>> Bandwidth (MB/sec): 71.982
>>>
>>> Stddev Bandwidth: 38.318
>>> Max bandwidth (MB/sec): 152
>>> Min bandwidth (MB/sec): 0
>>> Average Latency: 0.889026
>>> Stddev Latency: 0.677425
>>> Max latency: 2.94467
>>> Min latency: 0.035714
>>>
>>
>> Much better for 1 disk per node! I suspect that lack of syncfs is hurting
>> you, or perhaps some other issue with writes to lots of disks at the same
>> time.
>>
>>
>>>
>>> 2012/11/4 Aleksey Samarin <nrg3tik@gmail.com>:
>>>>
>>>> Ok!
>>>> Well, I'll take these tests and write about the results.
>>>>
>>>> btw,
>>>> disks are the same, as some may be faster than others?
>>>>
>>>> 2012/11/4 Gregory Farnum <greg@inktank.com>:
>>>>>
>>>>> That's only nine - where are the other three? If you have three slow
>>>>> disks that could definitely cause the troubles you're seeing.
>>>>>
>>>>> Also, what Mark said about sync versus syncfs.
>>>>>
>>>>> On Sun, Nov 4, 2012 at 1:26 PM, Aleksey Samarin <nrg3tik@gmail.com>
>>>>> wrote:
>>>>>>
>>>>>> It`s ok!
>>>>>>
>>>>>> Output:
>>>>>>
>>>>>> 2012-11-04 16:19:23.195891 osd.0 [INF] bench: wrote 1024 MB in blocks
>>>>>> of 4096 KB in 11.441035 sec at 91650 KB/sec
>>>>>> 2012-11-04 16:19:24.981631 osd.1 [INF] bench: wrote 1024 MB in blocks
>>>>>> of 4096 KB in 13.225048 sec at 79287 KB/sec
>>>>>> 2012-11-04 16:19:25.672896 osd.2 [INF] bench: wrote 1024 MB in blocks
>>>>>> of 4096 KB in 13.917157 sec at 75344 KB/sec
>>>>>> 2012-11-04 16:19:28.058517 osd.21 [INF] bench: wrote 1024 MB in blocks
>>>>>> of 4096 KB in 16.453375 sec at 63730 KB/sec
>>>>>> 2012-11-04 16:19:28.715552 osd.22 [INF] bench: wrote 1024 MB in blocks
>>>>>> of 4096 KB in 17.108887 sec at 61288 KB/sec
>>>>>> 2012-11-04 16:19:23.440054 osd.23 [INF] bench: wrote 1024 MB in blocks
>>>>>> of 4096 KB in 11.834639 sec at 88602 KB/sec
>>>>>> 2012-11-04 16:19:24.023650 osd.24 [INF] bench: wrote 1024 MB in blocks
>>>>>> of 4096 KB in 12.418276 sec at 84438 KB/sec
>>>>>> 2012-11-04 16:19:24.617514 osd.25 [INF] bench: wrote 1024 MB in blocks
>>>>>> of 4096 KB in 13.011955 sec at 80585 KB/sec
>>>>>> 2012-11-04 16:19:25.148613 osd.26 [INF] bench: wrote 1024 MB in blocks
>>>>>> of 4096 KB in 13.541710 sec at 77433 KB/sec
>>>>>>
>>>>>> All the best.
>>>>>>
>>>>>> 2012/11/4 Gregory Farnum <greg@inktank.com>:
>>>>>>>
>>>>>>> [Sorry for the blank email; I missed!]
>>>>>>> On Sun, Nov 4, 2012 at 1:04 PM, Aleksey Samarin <nrg3tik@gmail.com>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Hi!
>>>>>>>> This command? ceph tell osd \* bench
>>>>>>>> Output: tell target 'osd' not a valid entity name
>>>>>>>
>>>>>>>
>>>>>>> I guess it's "ceph osd tell \* bench". Try that one. :)
>>>>>>>
>>>>>>>> Well, i did pool by command ceph osd pool create bench2 120
>>>>>>>> This output of rados -p bench2 bench 30 write --no-cleanup
>>>>>>>>
>>>>>>>> rados -p bench2 bench 30 write --no-cleanup
>>>>>>>>
>>>>>>>> Maintaining 16 concurrent writes of 4194304 bytes for at least 30
>>>>>>>> seconds.
>>>>>>>> Object prefix: benchmark_data_host01_5827
>>>>>>>>   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
>>>>>>>>     0       0         0         0         0         0         -         0
>>>>>>>>     1      16        29        13   51.9885        52  0.489268  0.186749
>>>>>>>>     2      16        52        36   71.9866        92   1.87226  0.711888
>>>>>>>>     3      16        57        41    54.657        20  0.089697  0.697821
>>>>>>>>     4      16        60        44   43.9923        12   1.61868  0.765361
>>>>>>>>     5      16        60        44   35.1941         0         -  0.765361
>>>>>>>>     6      16        60        44   29.3285         0         -  0.765361
>>>>>>>>     7      16        60        44   25.1388         0         -  0.765361
>>>>>>>>     8      16        61        45   22.4964         1   5.89643  0.879384
>>>>>>>>     9      16        62        46   20.4412         4    6.0234  0.991211
>>>>>>>>    10      16        62        46   18.3971         0         -  0.991211
>>>>>>>>    11      16        63        47   17.0883         2   8.79749    1.1573
>>>>>>>>    12      16        63        47   15.6643         0         -    1.1573
>>>>>>>>    13      16        63        47   14.4593         0         -    1.1573
>>>>>>>>    14      16        63        47   13.4266         0         -    1.1573
>>>>>>>>    15      16        63        47   12.5315         0         -    1.1573
>>>>>>>>    16      16        63        47   11.7483         0         -    1.1573
>>>>>>>>    17      16        63        47   11.0572         0         -    1.1573
>>>>>>>>    18      16        63        47   10.4429         0         -    1.1573
>>>>>>>>    19      16        63        47   9.89331         0         -    1.1573
>>>>>>>> 2012-11-04 15:58:15.473733min lat: 0.036475 max lat: 8.79749 avg lat: 1.1573
>>>>>>>>   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
>>>>>>>>    20      16        63        47   9.39865         0         -    1.1573
>>>>>>>>    21      16        63        47   8.95105         0         -    1.1573
>>>>>>>>    22      16        63        47   8.54419         0         -    1.1573
>>>>>>>>    23      16        63        47   8.17271         0         -    1.1573
>>>>>>>>    24      16        63        47   7.83218         0         -    1.1573
>>>>>>>>    25      16        63        47    7.5189         0         -    1.1573
>>>>>>>>    26      16        63        47   7.22972         0         -    1.1573
>>>>>>>>    27      16        81        65   9.62824       4.5  0.076456    4.9428
>>>>>>>>    28      16       118       102   14.5693       148  0.427273   4.34095
>>>>>>>>    29      16       119       103   14.2049         4   1.57897   4.31414
>>>>>>>>    30      16       132       116   15.4645        52   2.25424   4.01492
>>>>>>>>    31      16       133       117   15.0946         4  0.974652   3.98893
>>>>>>>>    32      16       133       117   14.6229         0         -   3.98893
>>>>>>>> Total time run: 32.575351
>>>>>>>> Total writes made: 133
>>>>>>>> Write size: 4194304
>>>>>>>> Bandwidth (MB/sec): 16.331
>>>>>>>>
>>>>>>>> Stddev Bandwidth: 31.8794
>>>>>>>> Max bandwidth (MB/sec): 148
>>>>>>>> Min bandwidth (MB/sec): 0
>>>>>>>> Average Latency: 3.91583
>>>>>>>> Stddev Latency: 7.42821
>>>>>>>> Max latency: 25.24
>>>>>>>> Min latency: 0.036475
>>>>>>>>
>>>>>>>> Im think problem not in pg. This output of ceph pg dump >
>>>>>>>> http://pastebin.com/BqLsyMBC
>>>>>>>
>>>>>>>
>>>>>>> Well, that did improve it a bit; but yes, I think there's something
>>>>>>> else going on. Just wanted to verify. :)
>>>>>>>
>>>>>>>>
>>>>>>>> I have still no idea.
>>>>>>>>
>>>>>>>> All the best. Alex
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> 2012/11/4 Gregory Farnum <greg@inktank.com>:
>>>>>>>>>
>>>>>>>>> On Sun, Nov 4, 2012 at 10:58 AM, Aleksey Samarin <nrg3tik@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi all
>>>>>>>>>>
>>>>>>>>>> Im planning use ceph for cloud storage.
>>>>>>>>>> My test setup is 2 servers connected via infiniband 40Gb, 6x2Tb
>>>>>>>>>> disks per node.
>>>>>>>>>> Centos 6.2
>>>>>>>>>> Ceph 0.52 from http://ceph.com/rpms/el6/x86_64
>>>>>>>>>> This is my config http://pastebin.com/Pzxafnsm
>>>>>>>>>> journal on tmpfs
>>>>>>>>>> well, im create bench pool and test it:
>>>>>>>>>> ceph osd pool create bench
>>>>>>>>>> rados -p bench bench 30 write
>>>>>>>>>>
>>>>>>>>>> Total time run: 43.258228
>>>>>>>>>> Total writes made: 151
>>>>>>>>>> Write size: 4194304
>>>>>>>>>> Bandwidth (MB/sec): 13.963
>>>>>>>>>> Stddev Bandwidth: 26.307
>>>>>>>>>> Max bandwidth (MB/sec): 128
>>>>>>>>>> Min bandwidth (MB/sec): 0
>>>>>>>>>> Average Latency: 4.48605
>>>>>>>>>> Stddev Latency: 8.17709
>>>>>>>>>> Max latency: 29.7957
>>>>>>>>>> Min latency: 0.039435
>>>>>>>>>>
>>>>>>>>>> when i do rados -p bench bench 30 seq
>>>>>>>>>> Total time run: 20.626935
>>>>>>>>>> Total reads made: 275
>>>>>>>>>> Read size: 4194304
>>>>>>>>>> Bandwidth (MB/sec): 53.328
>>>>>>>>>> Average Latency: 1.19754
>>>>>>>>>> Max latency: 7.0215
>>>>>>>>>> Min latency: 0.011647
>>>>>>>>>>
>>>>>>>>>> I tested the single drive via dd if=/dev/zero of=/mnt/hdd2/testfile
>>>>>>>>>> bs=1024k count=20000
>>>>>>>>>> result: 158 MB/sec
>>>>>>>>>>
>>>>>>>>>> Anyone can tell me why such a weak performance? Maybe I missed
>>>>>>>>>> something?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Can you run "ceph tell osd \* bench" and report the results? (It'll go
>>>>>>>>> to the "central log" which you can keep an eye on if you run "ceph -w"
>>>>>>>>> in another terminal.)
>>>>>>>>> I think you also didn't create your bench pool correctly; it probably
>>>>>>>>> only has 8 PGs which is not going to perform very well with your disk
>>>>>>>>> count. Try "ceph pool create bench2 120" and run the benchmark against
>>>>>>>>> that pool. The extra number at the end tells it to create 120
>>>>>>>>> placement groups.
>>>>>>>>> -Greg

^ permalink raw reply [flat|nested] 14+ messages in thread
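Editor's note: Greg's observation in the message above ("That's only nine - where are the other three?") can be checked mechanically from the "ceph osd tell \* bench" log lines. The sketch below is a minimal, hypothetical helper (the names parse_osd_bench and summarize are not part of any Ceph tool). It assumes the log line layout shown in this thread and an expected count of 12 OSDs, taken from the stated setup of 2 nodes with 6 disks each.

```python
import re
from statistics import mean

# Log lines copied from the bench output quoted in this thread.
BENCH_LOG = """\
2012-11-04 16:19:23.195891 osd.0 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 11.441035 sec at 91650 KB/sec
2012-11-04 16:19:24.981631 osd.1 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 13.225048 sec at 79287 KB/sec
2012-11-04 16:19:25.672896 osd.2 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 13.917157 sec at 75344 KB/sec
2012-11-04 16:19:28.058517 osd.21 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 16.453375 sec at 63730 KB/sec
2012-11-04 16:19:28.715552 osd.22 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 17.108887 sec at 61288 KB/sec
2012-11-04 16:19:23.440054 osd.23 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 11.834639 sec at 88602 KB/sec
2012-11-04 16:19:24.023650 osd.24 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 12.418276 sec at 84438 KB/sec
2012-11-04 16:19:24.617514 osd.25 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 13.011955 sec at 80585 KB/sec
2012-11-04 16:19:25.148613 osd.26 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 13.541710 sec at 77433 KB/sec
"""

# Assumed line layout: "osd.<id> [INF] bench: wrote ... at <rate> KB/sec".
LINE_RE = re.compile(r"osd\.(\d+) \[INF\] bench: wrote .*? at (\d+) KB/sec")


def parse_osd_bench(log_text):
    """Return {osd_id: KB/sec} for every bench line found in log_text."""
    return {int(m.group(1)): int(m.group(2)) for m in LINE_RE.finditer(log_text)}


def summarize(rates, expected_osds):
    """Summarize per-OSD rates and count OSDs that never reported."""
    slowest = min(rates, key=rates.get)
    fastest = max(rates, key=rates.get)
    return {
        "reported": len(rates),
        "missing": expected_osds - len(rates),
        "slowest": slowest,
        "fastest": fastest,
        "mean_kb_s": mean(rates.values()),
    }


rates = parse_osd_bench(BENCH_LOG)
summary = summarize(rates, expected_osds=12)  # 2 nodes x 6 disks, per the thread
print(summary)
```

On the quoted log this reports nine OSDs with three missing, osd.22 as the slowest (61288 KB/sec) and osd.0 as the fastest (91650 KB/sec). The OSDs that did not report at all are the ones worth looking at first.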
* Re: poor performance
  2012-11-15 21:16 ` Gregory Farnum
@ 2012-11-16 7:41 ` Aleksey Samarin
  0 siblings, 0 replies; 14+ messages in thread
From: Aleksey Samarin @ 2012-11-16 7:41 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: Mark Nelson, ceph-devel@vger.kernel.org, Mike Ryan

Thanks for your reply!
It was easier for me to switch from RHEL to Ubuntu.
Now everything is fast and stable! :)
If anyone is interested, I can attach logs.

All the best, Alex!

2012/11/16 Gregory Farnum <greg@inktank.com>:
> On Sun, Nov 4, 2012 at 7:13 AM, Aleksey Samarin <nrg3tik@gmail.com> wrote:
>> What may be possible solutions?
>> Update centos to 6.3?
>
> From what I've heard the RHEL libc doesn't support the syncfs syscall
> (even though the kernel does have it). :( So you'd need to make sure
> the kernel supports it and then build a custom glibc, and then make
> sure your Ceph software is built to use it.
>
>
>> About issue with writes to lots of disk, i think parallel dd command
>> will be good as test! :)
>
> Yes - it really looks like maybe some of your disks are much slower
> than the others. Try benchmarking each individually one-at-a-time, and
> then in groups. I suspect you'll see a problem below the Ceph layers.

^ permalink raw reply [flat|nested] 14+ messages in thread
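Editor's note: Greg's suggestion quoted above, benchmarking each disk on its own and then in groups, could be sketched roughly as follows. This is not a tool from the thread; the function names are hypothetical, and it writes zeros much like the dd test mentioned earlier, but it calls fsync before stopping the clock so that the figure is not just page-cache speed. The directories passed in are assumed to be mount points of the individual data disks.

```python
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor


def write_test(directory, total_mb=64, block_mb=1):
    """Write zeros to a scratch file in directory and return MB/s."""
    block = bytes(block_mb * 1024 * 1024)
    blocks = total_mb // block_mb
    fd, path = tempfile.mkstemp(dir=directory, prefix="disktest_")
    try:
        start = time.monotonic()
        with os.fdopen(fd, "wb") as f:
            for _ in range(blocks):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # make sure the data really reached the disk
        elapsed = time.monotonic() - start
    finally:
        os.unlink(path)  # always remove the scratch file
    return blocks * block_mb / max(elapsed, 1e-9)


def run_one_at_a_time(directories, **kwargs):
    """Benchmark each directory alone, one after another."""
    return {d: write_test(d, **kwargs) for d in directories}


def run_together(directories, **kwargs):
    """Benchmark all directories at the same time, one thread per directory."""
    with ThreadPoolExecutor(max_workers=len(directories)) as pool:
        results = list(pool.map(lambda d: write_test(d, **kwargs), directories))
    return dict(zip(directories, results))
```

A run might pass the six data mount points of one node, for example run_one_at_a_time(["/mnt/hdd1", "/mnt/hdd2"], total_mb=4096) followed by run_together with the same list (only /mnt/hdd2 appears in the thread; the other path is illustrative). A disk that is fine alone but collapses in the group run would point at a shared controller or bus rather than at Ceph.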
* Re: poor performance
  2012-11-04 9:58 poor performance Aleksey Samarin
  2012-11-04 11:00 ` Gregory Farnum
@ 2012-11-04 12:29 ` Mark Nelson
  1 sibling, 0 replies; 14+ messages in thread
From: Mark Nelson @ 2012-11-04 12:29 UTC (permalink / raw)
  To: Aleksey Samarin; +Cc: ceph-devel

On 11/04/2012 03:58 AM, Aleksey Samarin wrote:
> Hi all
>
> Im planning use ceph for cloud storage.
> My test setup is 2 servers connected via infiniband 40Gb, 6x2Tb disks per node.
> Centos 6.2
> Ceph 0.52 from http://ceph.com/rpms/el6/x86_64
> This is my config http://pastebin.com/Pzxafnsm

One thing that may be problematic is that I don't think centos 6.2 has a
new enough version of glibc to support syncfs (Assuming it's even
backported in their kernel). You may want to try moving the mons off to
another node, and reducing each node down to a single OSD/disk and test
it out to see what happens. Also, we are starting to ship some tools Sam
wrote to test underlying filestore performance. I think Gary has been
working on getting some of those tools packaged up.

> journal on tmpfs
> well, im create bench pool and test it:
> ceph osd pool create bench
> rados -p bench bench 30 write
>
> Total time run: 43.258228
> Total writes made: 151
> Write size: 4194304
> Bandwidth (MB/sec): 13.963
> Stddev Bandwidth: 26.307
> Max bandwidth (MB/sec): 128
> Min bandwidth (MB/sec): 0
> Average Latency: 4.48605
> Stddev Latency: 8.17709
> Max latency: 29.7957
> Min latency: 0.039435
>
> when i do rados -p bench bench 30 seq
> Total time run: 20.626935
> Total reads made: 275
> Read size: 4194304
> Bandwidth (MB/sec): 53.328
> Average Latency: 1.19754
> Max latency: 7.0215
> Min latency: 0.011647
>
> I tested the single drive via dd if=/dev/zero of=/mnt/hdd2/testfile
> bs=1024k count=20000
> result: 158 MB/sec
>
> Anyone can tell me why such a weak performance? Maybe I missed something?
>
> All the best, Alex!

^ permalink raw reply [flat|nested] 14+ messages in thread
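Editor's note: Mark's point about glibc and syncfs can be checked on a given host. According to the syncfs(2) manual page, the system call first appeared in Linux 2.6.39 and the glibc wrapper was added in glibc 2.14. The sketch below compares version strings against those thresholds and also probes the loaded C library for the symbol. The helper names are hypothetical and the version strings in the example are purely illustrative. As Mark notes, a distribution kernel may carry a backport, so the kernel version comparison is only a heuristic.

```python
import ctypes
import re

MIN_GLIBC_FOR_SYNCFS = (2, 14)      # glibc wrapper, per the syncfs(2) man page
MIN_KERNEL_FOR_SYNCFS = (2, 6, 39)  # upstream kernel, per the syncfs(2) man page


def parse_version(text):
    """Turn '2.6.32-220.el6.x86_64' or '2.12' into a tuple of leading integers."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    return tuple(int(part) for part in match.group(0).split(".")) if match else ()


def glibc_has_syncfs_wrapper(glibc_version):
    """True if this glibc version is new enough to ship a syncfs() wrapper."""
    return parse_version(glibc_version) >= MIN_GLIBC_FOR_SYNCFS


def kernel_has_syncfs(kernel_release):
    """Heuristic only: vendor kernels may backport syncfs to older versions."""
    return parse_version(kernel_release) >= MIN_KERNEL_FOR_SYNCFS


def libc_exports_syncfs():
    """Direct probe: does the C library loaded into this process export syncfs?"""
    try:
        return hasattr(ctypes.CDLL(None), "syncfs")
    except (OSError, TypeError):
        return False


# Illustrative version strings, not measurements from the machines in this thread.
print(glibc_has_syncfs_wrapper("2.12"), kernel_has_syncfs("2.6.32-220.el6.x86_64"))
print("libc exports syncfs on this host:", libc_exports_syncfs())
```

The direct probe is the more reliable of the two checks for the library side, since it does not depend on how the distribution numbers its packages.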
end of thread, other threads:[~2012-11-16 7:41 UTC | newest]

Thread overview: 14+ messages
2012-11-04 9:58 poor performance Aleksey Samarin
2012-11-04 11:00 ` Gregory Farnum
2012-11-04 12:04 ` Aleksey Samarin
2012-11-04 12:15 ` Gregory Farnum
2012-11-04 12:18 ` Gregory Farnum
2012-11-04 12:26 ` Aleksey Samarin
2012-11-04 12:39 ` Gregory Farnum
2012-11-04 12:52 ` Aleksey Samarin
2012-11-04 13:18 ` Aleksey Samarin
2012-11-04 13:52 ` Mark Nelson
2012-11-04 15:13 ` Aleksey Samarin
2012-11-15 21:16 ` Gregory Farnum
2012-11-16 7:41 ` Aleksey Samarin
2012-11-04 12:29 ` Mark Nelson