From mboxrd@z Thu Jan 1 00:00:00 1970
From: Loic Dachary
Subject: Re: CEPH Erasure Encoding + OSD Scalability
Date: Tue, 10 Dec 2013 09:43:06 +0100
Message-ID: <52A6D41A.3060205@dachary.org>
References: <-7369304096744919226@unknownmsgid>
 <3472A07E6605974CBC9BC573F1BC02E4A527147E@PLOXCHG03.cern.ch>
 <523C40B7.5060902@dachary.org>
 <523C7CAF.1020101@dachary.org>,<523DB725.2070104@dachary.org>,<3472A07E6605974CBC9BC573F1BC02E4A52727FF@PLOXCHG03.cern.ch>
 <3472A07E6605974CBC9BC573F1BC02E4AE69CCB4@PLOXCHG03.cern.ch>
 <52826E2D.2040503@dachary.org>
 <52A5F3A1.503@dachary.org>
In-Reply-To: <52A5F3A1.503@dachary.org>
Sender: ceph-devel-owner@vger.kernel.org
To: Andreas Joachim Peters
Cc: "ceph-devel@vger.kernel.org"

Maybe using
http://google-perftools.googlecode.com/svn/trunk/doc/cpuprofile.html
is enough. fsbench looks like overkill indeed. /me exploring options ;-)

On 09/12/2013 17:45, Loic Dachary wrote:
> Hi,
>
> Mark Nelson suggested we use perf (linux-tools) for benchmarking. It
> looks like something that would help indeed: the benchmark program
> would only concern itself with doing some work according to the
> options and let performance data be collected from the outside, using
> tools that are familiar to people doing benchmarking.
>
> What do you think?
>
> Cheers
>
> $ perf stat -e
>   Error: switch `e' requires a value
>
>  usage: perf stat [] []
>
>     -e, --event            event selector. use 'perf list' to list available events
>         --filter           event filter
>     -i, --no-inherit       child tasks do not inherit counters
>     -p, --pid              stat events on existing process id
>     -t, --tid              stat events on existing thread id
>     -a, --all-cpus         system-wide collection from all CPUs
>     -g, --group            put the counters into a counter group
>     -c, --scale            scale/normalize counters
>     -v, --verbose          be more verbose (show counter open errors, etc)
>     -r, --repeat           repeat command and print average + stddev (max: 100, forever: 0)
>     -n, --null             null run - dont start any counters
>     -d, --detailed         detailed run - start a lot of events
>     -S, --sync             call sync() before starting a run
>     -B, --big-num          print large numbers with thousands' separators
>     -C, --cpu              list of cpus to monitor in system-wide
>     -A, --no-aggr          disable CPU count aggregation
>     -x, --field-separator  print counts with custom separator
>     -G, --cgroup           monitor event in cgroup name only
>     -o, --output           output file name
>         --append           append to the output file
>         --log-fd           log output to fd, instead of stderr
>         --pre              command to run prior to the measured command
>         --post             command to run after to the measured command
>     -I, --interval-print   print counts at regular interval in ms (>= 100)
>         --per-socket       aggregate counts per processor socket
>         --per-core         aggregate counts per physical processor core
>
>
> On 12/11/2013 19:06, Loic Dachary wrote:
>> Hi Andreas,
>>
>> On 12/11/2013 02:11, Andreas Joachim Peters wrote:
>>> Hi Loic,
>>>
>>> I am finally doing the benchmark tool and I found a bunch of wrong
>>> parameter checks which can make the whole thing SEGV.
>>>
>>> All the RAID-6 codes have restrictions on the parameters but they
>>> are not correctly enforced for Liberation & Blaum-Roth codes in the
>>> CEPH wrapper class ... see text from PDF
>>>
>>> "Minimal Density RAID-6 codes are MDS codes based on binary matrices
>>> which satisfy a lower-bound on the number of non-zero entries.
>>> Unlike Cauchy coding, the bit-matrix elements do not correspond to
>>> elements in GF(2^w). Instead, the bit-matrix itself has the proper
>>> MDS property. Minimal Density RAID-6 codes perform faster than
>>> Reed-Solomon and Cauchy Reed-Solomon codes for the same parameters.
>>> Liberation coding, Liber8tion coding, and Blaum-Roth coding are
>>> three examples of this kind of coding that are supported in
>>> jerasure.
>>>
>>> With each of these codes, m must be equal to two and k must be less
>>> than or equal to w. The value of w has restrictions based on the
>>> code:
>>>
>>> • With Liberation coding, w must be a prime number [Pla08b].
>>> • With Blaum-Roth coding, w + 1 must be a prime number [BR99].
>>> • With Liber8tion coding, w must equal 8 [Pla08a].
>>>
>>> ...
>>>
>>> Can you add these fixes?
>>
>> Nice catch. I created and assigned to myself:
>> http://tracker.ceph.com/issues/6754
>>>
>>> The benchmark suite currently runs 308 different configurations for
>>> the 2 algorithms which make sense from the performance point of view
>>> and provides this output:
>>>
>>>
>>> # -----------------------------------------------------------------
>>> # Erasure Coding Benchmark - (C) CERN 2013 - Andreas.Joachim.Peters@cern.ch
>>> # Ram-Size=12614856704 Allocation-Size=100000000
>>> # -----------------------------------------------------------------
>>> # [ -BENCH- ] [       ] technique=memcpy speed=5.408 [GB/s] latency=9.245 ms
>>> # [ -BENCH- ] [       ] technique=d=a^b^c-xor speed=4.377 [GB/s] latency=17.136 ms
>>> # [ -BENCH- ] [001/304] technique=cauchy_good:k=05:m=2:w=8:lp=0:packet=00064:size=50000000 speed=1.308 [GB/s] latency=038 [ms] size-overhead=40 [%]
>>> ..
>>> ..
>>> # [ -BENCH- ] [304/304] technique=liberation:k=24:m=2:w=29:lp=2:packet=65536:size=50000000 speed=0.083 [GB/s] latency=604 [ms] size-overhead=16 [%]
>>> # -----------------------------------------------------------------
>>> # Erasure Code Performance Summary::
>>> # -----------------------------------------------------------------
>>> # RAM:                    12.61 GB
>>> # Allocation-Size         0.10 GB
>>> # -----------------------------------------------------------------
>>> # Byte Initialization:    29.35 MB/s
>>> # Memcpy:                 5.41 GB/s
>>> # Triple-XOR:             4.38 GB/s
>>> # -----------------------------------------------------------------
>>> # Fastest RAID6           2.72 GB/s liber8tion:k=06:m=2:w=8:lp=0:packet=04096:size=50000000
>>> # Fastest Triple Failure  0.96 GB/s cauchy_good:k=06:m=3:w=8:lp=0:packet=04096:size=50000000
>>> # Fastest Quadr. Failure  0.66 GB/s cauchy_good:k=06:m=4:w=8:lp=0:packet=04096:size=50000000
>>> # -----------------------------------------------------------------
>>> # .................................................................
>>> # Top 1 RAID6             2.72 GB/s liber8tion:k=06:m=2:w=8:lp=0:packet=04096:size=50000000
>>> # Top 2 RAID6             2.72 GB/s liber8tion:k=06:m=2:w=8:lp=0:packet=16384:size=50000000
>>> # Top 3 RAID6             2.64 GB/s liber8tion:k=06:m=2:w=8:lp=0:packet=65536:size=50000000
>>> # Top 4 RAID6             2.60 GB/s liberation:k=07:m=2:w=7:lp=0:packet=16384:size=50000000
>>> # Top 5 RAID6             2.59 GB/s liberation:k=05:m=2:w=7:lp=0:packet=04096:size=50000000
>>> # .................................................................
>>> # Top 1 Triple            0.96 GB/s cauchy_good:k=06:m=3:w=8:lp=0:packet=04096:size=50000000
>>> # Top 2 Triple            0.94 GB/s cauchy_good:k=06:m=3:w=8:lp=0:packet=16384:size=50000000
>>> # Top 3 Triple            0.93 GB/s cauchy_good:k=06:m=3:w=8:lp=0:packet=65536:size=50000000
>>> # Top 4 Triple            0.89 GB/s cauchy_good:k=07:m=3:w=8:lp=0:packet=04096:size=50000000
>>> # Top 5 Triple            0.87 GB/s cauchy_good:k=05:m=3:w=8:lp=0:packet=04096:size=50000000
>>> # .................................................................
>>> # Top 1 Quadr.            0.66 GB/s cauchy_good:k=06:m=4:w=8:lp=0:packet=04096:size=50000000
>>> # Top 2 Quadr.            0.65 GB/s cauchy_good:k=07:m=4:w=8:lp=0:packet=04096:size=50000000
>>> # Top 3 Quadr.            0.64 GB/s cauchy_good:k=06:m=4:w=8:lp=0:packet=16384:size=50000000
>>> # Top 4 Quadr.            0.64 GB/s cauchy_good:k=05:m=4:w=8:lp=0:packet=04096:size=50000000
>>> # Top 5 Quadr.            0.64 GB/s cauchy_good:k=06:m=4:w=8:lp=0:packet=65536:size=50000000
>>> # .................................................................
>>>
>>> It takes around 30 seconds on my box.
>>
>>
>> That looks great :-) If I understand correctly, it means
>> https://github.com/ceph/ceph/pull/740 will no longer have benchmarks
>> as they are moved to a separate program. Correct?
>>
>>> I will add a measurement of how the XOR and the 3 top algorithms
>>> scale with the number of cores and make the object-size configurable
>>> from the command line. Anything else?
>>
>> It would be convenient to run this from a "workunit" (i.e. a script
>> in ceph/qa/workunits/) so that it can later be run by teuthology
>> integration tests. That could be used to show regression.
>>
>>> Shall I add the possibility to test a single user-specified
>>> configuration via command line arguments?
>>>
>> I would need to play with it to comment usefully.
>>
>> Cheers
>>
>

-- 
Loïc Dachary, Artisan Logiciel Libre
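
For reference, the parameter restrictions quoted from the jerasure PDF
earlier in this thread (m must be 2, k must be less than or equal to w,
plus a per-code rule on w) could be checked up front along the lines
below. This is only a sketch in Python, not the Ceph wrapper code: the
function names are hypothetical, and the technique string "blaum_roth"
is an assumption (only "liberation" and "liber8tion" appear in the
benchmark output above).

```python
from typing import List


def is_prime(n: int) -> bool:
    """Trial-division primality test; fine for the small w values used here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def check_raid6_params(technique: str, k: int, m: int, w: int) -> List[str]:
    """Return a list of violated restrictions (empty if the parameters are valid).

    Hypothetical helper: it only encodes the rules quoted from the PDF,
    it does not call jerasure or Ceph.
    """
    if technique not in ("liberation", "blaum_roth", "liber8tion"):
        raise ValueError("unknown minimal density technique: " + technique)

    errors = []
    # Rules shared by all three minimal density RAID-6 codes.
    if m != 2:
        errors.append("m=%d must be equal to 2" % m)
    if k > w:
        errors.append("k=%d must be less than or equal to w=%d" % (k, w))

    # Per-code restriction on w.
    if technique == "liberation" and not is_prime(w):
        errors.append("w=%d must be a prime number" % w)
    elif technique == "blaum_roth" and not is_prime(w + 1):
        errors.append("w+1=%d must be a prime number" % (w + 1))
    elif technique == "liber8tion" and w != 8:
        errors.append("w=%d must equal 8" % w)
    return errors


if __name__ == "__main__":
    # Configurations taken from the benchmark output: both pass.
    print(check_raid6_params("liberation", k=24, m=2, w=29))
    print(check_raid6_params("liber8tion", k=6, m=2, w=8))
    # An invalid combination: w + 1 = 8 is not prime.
    print(check_raid6_params("blaum_roth", k=4, m=2, w=7))
```

Returning the list of violations instead of failing on the first one
makes it easier to report every wrong parameter at once, which seems
preferable to a SEGV further down in the encoder.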