From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([209.51.188.92]:46550)
    by lists.gnu.org with esmtp (Exim 4.71)
    (envelope-from )
    id 1gk2Pq-0006dk-Fo
    for qemu-devel@nongnu.org; Thu, 17 Jan 2019 02:48:44 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
    (envelope-from )
    id 1gk2Pp-0000gD-7m
    for qemu-devel@nongnu.org; Thu, 17 Jan 2019 02:48:42 -0500
Received: from mail-wr1-x443.google.com ([2a00:1450:4864:20::443]:40473)
    by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16)
    (Exim 4.71)
    (envelope-from )
    id 1gk2Pp-0000a8-0w
    for qemu-devel@nongnu.org; Thu, 17 Jan 2019 02:48:41 -0500
Received: by mail-wr1-x443.google.com with SMTP id p4so9816467wrt.7
    for ; Wed, 16 Jan 2019 23:48:33 -0800 (PST)
References: <20190116202349.29272-1-alex.bennee@linaro.org>
    <20190116202349.29272-5-alex.bennee@linaro.org>
    <1bef2aae-06dc-5062-4ce6-8e2e9adefb46@linaro.org>
From: Alex Bennée
In-reply-to: <1bef2aae-06dc-5062-4ce6-8e2e9adefb46@linaro.org>
Date: Thu, 17 Jan 2019 07:48:30 +0000
Message-ID: <875zunzpv5.fsf@linaro.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
Subject: Re: [Qemu-devel] [PATCH v2 4/7] softfloat: fallback to __int128 maths for s390x and others
To: Richard Henderson
Cc: qemu-devel@nongnu.org, Peter Maydell, Thomas Huth, cohuck@redhat.com,
    "open list:S390", Aurelien Jarno

Richard Henderson writes:

> On 1/17/19 7:23 AM, Alex Bennée wrote:
>> Apparently some versions of clang can't handle inline assembly with
>> __int128 parameters, especially on s390. Instead of hand-coding the
>> s390 divide provide a generic fallback for anything that provides
>> __int128 capable maths.
>>
>> Signed-off-by: Alex Bennée
>> Cc: Thomas Huth
>> ---
>>  include/fpu/softfloat-macros.h | 10 ++++------
>>  1 file changed, 4 insertions(+), 6 deletions(-)
>>
>> diff --git a/include/fpu/softfloat-macros.h b/include/fpu/softfloat-macros.h
>> index b1d772e6d4..1a43609eef 100644
>> --- a/include/fpu/softfloat-macros.h
>> +++ b/include/fpu/softfloat-macros.h
>> @@ -641,12 +641,6 @@ static inline uint64_t udiv_qrnnd(uint64_t *r, uint64_t n1,
>>      uint64_t q;
>>      asm("divq %4" : "=a"(q), "=d"(*r) : "0"(n0), "1"(n1), "rm"(d));
>>      return q;
>> -#elif defined(__s390x__)
>> -    /* Need to use a TImode type to get an even register pair for DLGR.  */
>> -    unsigned __int128 n = (unsigned __int128)n1 << 64 | n0;
>> -    asm("dlgr %0, %1" : "+r"(n) : "r"(d));
>> -    *r = n >> 64;
>> -    return n;
>>  #elif defined(_ARCH_PPC64) && defined(_ARCH_PWR7)
>>      /* From Power ISA 2.06, programming note for divdeu.  */
>>      uint64_t q1, q2, Q, r1, r2, R;
>> @@ -663,6 +657,10 @@ static inline uint64_t udiv_qrnnd(uint64_t *r, uint64_t n1,
>>      }
>>      *r = R;
>>      return Q;
>> +#elif defined(CONFIG_INT128)
>> +    unsigned __int128 n = (unsigned __int128)n1 << 64 | n0;
>> +    *r = n % d;
>> +    return n / d;
>>  #else
>
> I thought that we'd shown that, at least at present, no compiler is taking
> advantage of hardware insns for this, and is promoting this to a full 128-bit
> divide.  And further that the version using 64-bit arithmetic was competitive
> with the hardware insn.

Yeah it seems so. While Thomas' numbers weren't convincing, the
CONFIG_INT128 fallback did trigger on my SynQuacer and knocked off about
2 MFlops of its admittedly slow performance.
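For anyone following along, here is a minimal standalone sketch of the two
approaches being compared. The first function mirrors the CONFIG_INT128
branch quoted above; the second is a plain bit-at-a-time restoring divide
using only 64-bit arithmetic. The function names are made up for this
illustration, and the shift-and-subtract loop is a simple reference, not the
generic #else branch in softfloat-macros.h. Both assume n1 < d so that the
quotient fits in 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the CONFIG_INT128 branch: divide the 128-bit
 * value n1:n0 by d using the compiler's __int128 support (GCC/Clang on
 * 64-bit targets). Returns the quotient and stores the remainder in *r.
 */
static inline uint64_t udiv_qrnnd_int128(uint64_t *r, uint64_t n1,
                                         uint64_t n0, uint64_t d)
{
    assert(n1 < d); /* otherwise the quotient does not fit in 64 bits */
    unsigned __int128 n = (unsigned __int128)n1 << 64 | n0;
    *r = (uint64_t)(n % d);
    return (uint64_t)(n / d);
}

/*
 * Illustrative 64-bit-only version: restoring division, one quotient bit
 * per iteration. This is an assumption-level reference implementation for
 * cross-checking, not the code QEMU uses.
 */
static inline uint64_t udiv_qrnnd_shift(uint64_t *r, uint64_t n1,
                                        uint64_t n0, uint64_t d)
{
    assert(n1 < d);
    uint64_t rem = n1, q = 0;

    for (int i = 63; i >= 0; i--) {
        /* Bit shifted out of rem; if set, the true value exceeds 2^64 > d. */
        uint64_t carry = rem >> 63;

        rem = (rem << 1) | ((n0 >> i) & 1);
        q <<= 1;
        if (carry || rem >= d) {
            rem -= d; /* wraps correctly modulo 2^64 when carry is set */
            q |= 1;
        }
    }
    *r = rem;
    return q;
}
```

Calling both with the same (n1, n0, d) and comparing quotient and remainder
is a cheap sanity check before benchmarking either variant.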

Amusingly of course it's faster under translation because of the
hardware fall back:

  07:44:44 [alex@idun:~/l/q/t/fp] (8973c1e5…) + ./fp-bench -o div -p double
  13.28 MFlops
  07:44:49 [alex@idun:~/l/q/t/fp] (8973c1e5…) + ./fp-bench -o div -p double -t host
  498.20 MFlops
  07:44:53 [alex@idun:~/l/q/t/fp] (8973c1e5…) + ../../aarch64-linux-user/qemu-aarch64 ./fp-bench -o div -p double -t host
  52.71 MFlops

I'll drop this and use Thomas'

  #elif defined(__s390x__) && !defined(__clang__)

version in the pull-request.

--
Alex Bennée