Message-ID: <4F1D9BB8.5070306@suse.de>
Date: Mon, 23 Jan 2012 18:41:12 +0100
From: Andreas Färber
References: <1326674823-13069-1-git-send-email-afaerber@suse.de>
Subject: Re: [Qemu-devel] [PATCH 00/14] softfloat: Use POSIX integer types - benchmarked
To: Peter Maydell
Cc: Blue Swirl, malc, qemu-devel@nongnu.org, Aurélien Jarno, Alexander Graf

On 20.01.2012 14:05, Peter Maydell wrote:
> On 16 January 2012 00:46, Andreas Färber wrote:
>> For a loop count of 100,000 and 5 runs I got the following results:
>>
>> current:        138.9-204.1 Whetstone-MIPS
>> [u]int*_t:      185.2-188.7 Whetstone-MIPS
>> [u]int_fast*_t: 285.7-294.1 Whetstone-MIPS
>>
>> Toshiba AC100:  833.3-909.1 Whetstone-MIPS
>>
>> These results seem to indicate that the "fast" POSIX types are indeed
>> somewhat faster, both compared to exact-size POSIX types and to the
>> current state.
>
> OTOH I did a run of scimark2 and got:
> current tree:
> **                                                              **
> ** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
> ** for details. (Results can be submitted to pozo@nist.gov)     **
> **                                                              **
> Using 2.00 seconds min time per kenel.
> Composite Score:       12.98
> FFT Mflops:             7.66 (N=1024)
> SOR Mflops:            19.49 (100 x 100)
> MonteCarlo: Mflops:     6.12
> Sparse matmult Mflops: 15.34 (N=1000, nz=5000)
> LU Mflops:             16.28 (M=100, N=100)
>
> with patches (yours and mine):
> **                                                              **
> ** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
> ** for details. (Results can be submitted to pozo@nist.gov)     **
> **                                                              **
> Using 2.00 seconds min time per kenel.
> Composite Score:       11.87
> FFT Mflops:             7.12 (N=1024)
> SOR Mflops:            17.66 (100 x 100)
> MonteCarlo: Mflops:     5.75
> Sparse matmult Mflops: 14.03 (N=1000, nz=5000)
> LU Mflops:             14.81 (M=100, N=100)

One difference between our test environments comes to mind: I had tested
only typedefs for the int types, whereas my series also converts flag.
The fixes for lm32, sparc and qemu-tool shouldn't matter here. Your
patches "degrade" some variables from fast types to exact types, of
course.

Anyway, here are my Whetstone and CoreMark results for
520c0d8d2772ccc9f9275bd934e13ec9b15774e4 (target-sparc: Fix mixup of
uint64 and uint64_t) plus our patches. The margin has shrunk for
Whetstone, and for CoreMark I see a slight degradation of the max.
(I also note a slight Whetstone degradation between Natty and Oneiric.)

Have you tried benchmarking after our preceding patches but before the
actual fast conversion for comparison?
Andreas

master:
C Converted Double Precision Whetstones: 200.0 MIPS
C Converted Double Precision Whetstones: 204.1 MIPS
CoreMark 1.0 : 1287.747086 / GCC4.6.2 -O2 -static -DPERFORMANCE_RUN=1 -lrt / Heap
CoreMark 1.0 : 1336.094595 / GCC4.6.2 -O2 -static -DPERFORMANCE_RUN=1 -lrt / Heap
CoreMark 1.0 : 1339.943722 / GCC4.6.2 -O2 -static -DPERFORMANCE_RUN=1 -lrt / Heap

master + PMM + fast:
C Converted Double Precision Whetstones: 204.1 MIPS
C Converted Double Precision Whetstones: 208.3 MIPS
CoreMark 1.0 : 1297.690112 / GCC4.6.2 -O2 -static -DPERFORMANCE_RUN=1 -lrt / Heap
CoreMark 1.0 : 1299.629606 / GCC4.6.2 -O2 -static -DPERFORMANCE_RUN=1 -lrt / Heap
CoreMark 1.0 : 1309.071868 / GCC4.6.2 -O2 -static -DPERFORMANCE_RUN=1 -lrt / Heap
CoreMark 1.0 : 1315.270288 / GCC4.6.2 -O2 -static -DPERFORMANCE_RUN=1 -lrt / Heap
CoreMark 1.0 : 1318.913216 / GCC4.6.2 -O2 -static -DPERFORMANCE_RUN=1 -lrt / Heap
CoreMark 1.0 : 1319.870653 / GCC4.6.2 -O2 -static -DPERFORMANCE_RUN=1 -lrt / Heap
CoreMark 1.0 : 1321.527686 / GCC4.6.2 -O2 -static -DPERFORMANCE_RUN=1 -lrt / Heap

AC100 (Ubuntu Oneiric):
C Converted Double Precision Whetstones: 769.2 MIPS
C Converted Double Precision Whetstones: 833.3 MIPS
CoreMark 1.0 : 2257.506208 / GCC4.6.1 -O2 -static -DPERFORMANCE_RUN=1 -lrt / Heap
CoreMark 1.0 : 2303.086135 / GCC4.6.1 -O2 -static -DPERFORMANCE_RUN=1 -lrt / Heap
CoreMark 1.0 : 2326.122354 / GCC4.6.1 -O2 -static -DPERFORMANCE_RUN=1 -lrt / Heap
CoreMark 1.0 : 2349.624060 / GCC4.6.1 -O2 -static -DPERFORMANCE_RUN=1 -lrt / Heap
CoreMark 1.0 : 2350.360389 / GCC4.6.1 -O2 -static -DPERFORMANCE_RUN=1 -lrt / Heap

Pandaboard (openSUSE Factory):
C Converted Double Precision Whetstones: 833.3 MIPS
CoreMark 1.0 : 2304.855562 / GCC4.6.2 -O2 -static -DPERFORMANCE_RUN=1 -lrt / Heap
CoreMark 1.0 : 2305.209774 / GCC4.6.2 -O2 -static -DPERFORMANCE_RUN=1 -lrt / Heap
CoreMark 1.0 : 2306.273063 / GCC4.6.2 -O2 -static -DPERFORMANCE_RUN=1 -lrt / Heap
CoreMark 1.0 : 2306.627710 / GCC4.6.2 -O2 -static -DPERFORMANCE_RUN=1 -lrt / Heap

Results are sorted, duplicates removed.

whetstone.c was compiled with -mfloat-abi=hard this time, using GCC
(SUSE Linux) 4.6.2, and so was CoreMark except on the AC100 with GCC
(Ubuntu/Linaro 4.6.1-9ubuntu3) 4.6.1.
coremark.exe was run with parameters 0x0 0x0 0x66 0.

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg