From mboxrd@z Thu Jan 1 00:00:00 1970
From: jszhang@marvell.com (Jisheng Zhang)
Date: Mon, 18 Jul 2016 20:15:49 +0800
Subject: lib/GCD.c regression on arm
In-Reply-To: <20160715135109.GA2657@linux-Precision-WorkStation-T5500>
References: <20160715135109.GA2657@linux-Precision-WorkStation-T5500>
Message-ID: <20160718201549.61f135c8@xhacker>
To: linux-arm-kernel@lists.infradead.org
List-Id: linux-arm-kernel.lists.infradead.org

Dear Cheah,

On Fri, 15 Jul 2016 21:51:10 +0800 Cheah Kok Cheong wrote:

> Commit fff7fb0b2d90 ("lib/GCD.c: use binary GCD algorithm instead of Euclidean")
> replaced the Euclidean algorithm totally with the Binary algorithm.
> Two variants were provided and selected via Kconfig depending on whether
> a fast __ffs (find least significant set bit) instruction is available.
>
> For arm v5 and above the fast __ffs version is used as evident in
> arch/arm/mm/Kconfig.
>
> I benchmarked the gcd performance using the code provided in the commit
> with a Cortex-A9 based Mediatek MT6577. Three runs at different settings
> were used.
>
> The performance with fast __ffs Binary algo is slower than the Euclidean
> algo. Using the non ffs version [even/odd variant] gives a comparable
> performance as the Euclidean algo.

Interesting. Using the code in the commit, I get the following results on
a Cortex-A53 (CA53) platform.

Built with the aarch64 toolchain, -O2 -mcpu=cortex-a53:

~ # /a53 -r 500000 -n 10
gcd0: elapsed 10170
gcd1: elapsed 11340
gcd2: elapsed 13590
gcd3: elapsed 11700
gcd4: elapsed 14230
PASS

Built with the armhf toolchain, -O2 -mcpu=cortex-a53:

~ # /a53_32 -r 500000 -n 10
gcd0: elapsed 9490
gcd1: elapsed 10220
gcd2: elapsed 10790
gcd3: elapsed 10270
gcd4: elapsed 10850
PASS

>
> Will be interesting to see whether this is also true for other platforms
> with arm v5 and above? Hopefully others will do some testing.
> If this is the case then we should "select CPU_NO_EFFICIENT_FFS" in our
> Kconfig.
>
> Thanks.
> Best Regards,
> Cheah
>
> cross compiled with '-O2'
>
> Euclidean                 Binary with ffs           Binary no ffs
>
>
> gcd -r 50000 -n 10
>
> gcd0: elapsed 25766       gcd0: elapsed 25766       gcd0: elapsed 25765
> gcd1: elapsed 19994       gcd1: elapsed 20224       gcd1: elapsed 19843
> gcd2: elapsed 20071       gcd2: elapsed 20533       gcd2: elapsed 20151
> gcd3: elapsed 20070       gcd3: elapsed 20380       gcd3: elapsed 19919
> gcd4: elapsed 20148       gcd4: elapsed 20610       gcd4: elapsed 20151
> PASS                      PASS                      PASS
>
> gcd0: elapsed 26690       gcd0: elapsed 26612       gcd0: elapsed 24381
> gcd1: elapsed 20224       gcd1: elapsed 20379       gcd1: elapsed 19765
> gcd2: elapsed 20224       gcd2: elapsed 20304       gcd2: elapsed 19842
> gcd3: elapsed 20148       gcd3: elapsed 20302       gcd3: elapsed 19919
> gcd4: elapsed 20301       gcd4: elapsed 20302       gcd4: elapsed 19919
> PASS                      PASS                      PASS
>
> gcd0: elapsed 25842       gcd0: elapsed 26459       gcd0: elapsed 25457
> gcd1: elapsed 20454       gcd1: elapsed 20532       gcd1: elapsed 20225
> gcd2: elapsed 20378       gcd2: elapsed 20762       gcd2: elapsed 20226
> gcd3: elapsed 20378       gcd3: elapsed 20378       gcd3: elapsed 20148
> gcd4: elapsed 20532       gcd4: elapsed 20918       gcd4: elapsed 20301
> PASS                      PASS                      PASS
>
>
> gcd -r 1000 -n 100
>
> gcd0: elapsed 245873      gcd0: elapsed 252957      gcd0: elapsed 245571
> gcd1: elapsed 191290      gcd1: elapsed 198345      gcd1: elapsed 192513
> gcd2: elapsed 192672      gcd2: elapsed 199579      gcd2: elapsed 192978
> gcd3: elapsed 191366      gcd3: elapsed 198728      gcd3: elapsed 192283
> gcd4: elapsed 193134      gcd4: elapsed 200884      gcd4: elapsed 193669
> PASS                      PASS                      PASS
>
> gcd0: elapsed 245180      gcd0: elapsed 251113      gcd0: elapsed 250573
> gcd1: elapsed 191755      gcd1: elapsed 196800      gcd1: elapsed 194729
> gcd2: elapsed 192286      gcd2: elapsed 198654      gcd2: elapsed 195574
> gcd3: elapsed 191601      gcd3: elapsed 197344      gcd3: elapsed 194965
> gcd4: elapsed 193135      gcd4: elapsed 200268      gcd4: elapsed 197037
> PASS                      PASS                      PASS
>
> gcd0: elapsed 243412      gcd0: elapsed 252189      gcd0: elapsed 247876
> gcd1: elapsed 190447      gcd1: elapsed 197192      gcd1: elapsed 193355
> gcd2: elapsed 192288      gcd2: elapsed 199042      gcd2: elapsed 193437
> gcd3: elapsed 190755      gcd3: elapsed 198957      gcd3: elapsed 193660
> gcd4: elapsed 192672      gcd4: elapsed 200346      gcd4: elapsed 194586
> PASS                      PASS                      PASS
>
>
> gcd -n 1000
>
> gcd0: elapsed 2636655     gcd0: elapsed 2701340     gcd0: elapsed 2622109
> gcd1: elapsed 2055411     gcd1: elapsed 2153446     gcd1: elapsed 2053342
> gcd2: elapsed 2064420     gcd2: elapsed 2162496     gcd2: elapsed 2066503
> gcd3: elapsed 2055151     gcd3: elapsed 2163201     gcd3: elapsed 2055161
> gcd4: elapsed 2071591     gcd4: elapsed 2171636     gcd4: elapsed 2074488
> PASS                      PASS                      PASS
>
> gcd0: elapsed 2636512     gcd0: elapsed 2719436     gcd0: elapsed 2613575
> gcd1: elapsed 2060157     gcd1: elapsed 2159284     gcd1: elapsed 2046187
> gcd2: elapsed 2069242     gcd2: elapsed 2163944     gcd2: elapsed 2056430
> gcd3: elapsed 2060436     gcd3: elapsed 2166796     gcd3: elapsed 2046933
> gcd4: elapsed 2074188     gcd4: elapsed 2176243     gcd4: elapsed 2065170
> PASS                      PASS                      PASS
>
> gcd0: elapsed 2614949     gcd0: elapsed 2708342     gcd0: elapsed 2632962
> gcd1: elapsed 2044957     gcd1: elapsed 2157985     gcd1: elapsed 2055475
> gcd2: elapsed 2054496     gcd2: elapsed 2170720     gcd2: elapsed 2068926
> gcd3: elapsed 2044838     gcd3: elapsed 2167954     gcd3: elapsed 2055305
> gcd4: elapsed 2059033     gcd4: elapsed 2176002     gcd4: elapsed 2079856
> PASS                      PASS                      PASS
>
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
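
For readers following the thread, the two approaches being compared can be sketched roughly as follows. This is only a minimal userspace illustration, not the code from lib/gcd.c or the benchmark in the commit. The function names gcd_euclid and gcd_binary_ctz are made up for this sketch, and the GCC/Clang builtin __builtin_ctzl is assumed here as a stand-in for the kernel's __ffs.

```c
/* Euclidean GCD: repeatedly replace (a, b) with (b, a mod b). */
unsigned long gcd_euclid(unsigned long a, unsigned long b)
{
	while (b) {
		unsigned long t = a % b;

		a = b;
		b = t;
	}
	return a;
}

/*
 * Binary GCD using count-trailing-zeros.
 * __builtin_ctzl stands in for the kernel's __ffs (assumption of this
 * sketch); it is undefined for 0, so zero inputs are handled first.
 */
unsigned long gcd_binary_ctz(unsigned long a, unsigned long b)
{
	unsigned long shift;

	if (!a || !b)
		return a | b;

	/* Common power of two shared by both inputs. */
	shift = __builtin_ctzl(a | b);

	/* Make a odd; it stays odd for the rest of the loop. */
	a >>= __builtin_ctzl(a);

	while (b) {
		/* Strip factors of two from b so both values are odd. */
		b >>= __builtin_ctzl(b);

		/* Keep a <= b, then subtract (odd - odd is even). */
		if (a > b) {
			unsigned long t = a;

			a = b;
			b = t;
		}
		b -= a;
	}
	return a << shift;
}
```

Both functions should agree for any pair of inputs, e.g. 12 and 18 give 6. Which one is faster depends on how cheap the trailing-zero count is relative to integer division on the CPU in question, which is what the numbers in this thread are trying to measure.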