From mboxrd@z Thu Jan 1 00:00:00 1970
From: Alex Bennée
To: Thomas Huth
Cc: "Emilio G. Cota", Peter Maydell, Cornelia Huck, QEMU Developers, qemu-s390x, Philippe Mathieu-Daudé, Aurelien Jarno, Richard Henderson
Subject: Re: [Qemu-devel] [PATCH] include/fpu/softfloat: Fix compilation with Clang on s390x
Date: Wed, 16 Jan 2019 17:08:51 +0000
Message-ID: <87a7k0zg0s.fsf@linaro.org>
In-reply-to: <479044cb-345a-0faa-795a-f67da0077198@redhat.com>
References: <30917d5b-f8cb-e799-6c3e-3202195122b4@redhat.com>
 <871s5fp54s.fsf@linaro.org> <87zhs3nk1m.fsf@linaro.org>
 <87y37monyr.fsf@linaro.org> <87won6nfl1.fsf@linaro.org>
 <6cb80b50-0352-430e-0c46-85ed69f95c88@redhat.com>
 <87va2poqoz.fsf@linaro.org> <20190115200527.GB7844@flamenco>
 <479044cb-345a-0faa-795a-f67da0077198@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8

Thomas Huth writes:

> On 2019-01-15 21:05, Emilio G.
> Cota wrote:
>> On Tue, Jan 15, 2019 at 16:01:32 +0000, Alex Bennée wrote:
>>> Ahh I should have mentioned we already have the technology for this ;-)
>>>
>>> If you build the fpu/next tree on an s390x you can then run:
>>>
>>>   ./tests/fp/fp-bench f64_div
>>>
>>> with and without the CONFIG_INT128 path. To get an idea of the real world
>>> impact you can compile a foreign binary and run it on an s390x system
>>> with:
>>>
>>>   $QEMU ./tests/fp/fp-bench f64_div -t host
>>>
>>> And that will give you the peak performance assuming your program is
>>> doing nothing but f64_div operations. If the two QEMUs are basically in
>>> the same ballpark then it doesn't make enough difference. That said:
>>
>> I think you mean here `tests/fp/fp-bench -o div -p double', otherwise
>> you'll get the default op (-o add).
>
> I tried that now, too, and -o div -p double does not really seem to
> exercise this function at all.

How do you mean? It should, because by default it calls the softfloat
implementations.

> Here are my results (disclaimer: that system is likely not really usable
> for benchmarks since its CPUs are shared with other LPARs, but I ran
> all the tests at least twice and got similar results):
>
>
> With the DLGR inline assembly:
>
>  time ./fp-bench -o div -p double
>  204.98 MFlops
>
> With the "#else" default 64-bit code:
>
>  time ./fp-bench -o div -p double
>  205.41 MFlops
>
> With the new CONFIG_INT128 code:
>
>  time ./fp-bench -o div -p double
>  205.17 MFlops
>
>
> ==> The new CONFIG_INT128 code is really worse than the 64-bit code, so
> I don't think we should include this yet (unless we know a system where
> the compiler can create optimized assembly code without libgcc here).

To me that looks easily within the noise range, and the dlgr
instruction didn't actually beat the unrolled 64-bit code, which is
just weird.

--
Alex Bennée
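
To put a rough number on the "noise range" remark, here is a minimal Python sketch that computes the relative spread of the three MFlops figures quoted above. The helper name relative_spread and the dictionary labels are made up for illustration. With a single figure per variant on a machine whose CPUs are shared, this can only hint at noise; it is not a significance test.

```python
from statistics import mean

# MFlops figures quoted in the message for `fp-bench -o div -p double`.
# The labels are informal shorthand, not names used by QEMU.
results = {
    "inline asm": 204.98,
    "64-bit fallback": 205.41,
    "CONFIG_INT128": 205.17,
}


def relative_spread(values):
    """Return (max - min) / mean, a crude measure of how far results differ."""
    values = list(values)
    return (max(values) - min(values)) / mean(values)


spread = relative_spread(results.values())
print(f"spread across variants: {spread:.2%}")  # prints "spread across variants: 0.21%"
```

A spread of about 0.2% between variants is small enough that repeated runs of one variant would need to be compared before drawing any conclusion about which path is faster.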