* [Qemu-devel] [PATCH v3 0/4] softfloat: Fix division
@ 2018-10-04 17:56 Richard Henderson
2018-10-04 17:56 ` [Qemu-devel] [PATCH v3 1/4] " Richard Henderson
` (4 more replies)
0 siblings, 5 replies; 6+ messages in thread
From: Richard Henderson @ 2018-10-04 17:56 UTC (permalink / raw)
To: qemu-devel; +Cc: alex.bennee, cota
Changes from v2:
* Add shift128Left. I had been using shortShift128Left,
with a shift of 64, which lead to undefined behaviour.
Which I suspect is exactly the Heisenbug Alex saw.
I did keep the R-b tags I had already applied.
r~
Richard Henderson (4):
softfloat: Fix division
softfloat: Specialize udiv_qrnnd for x86_64
softfloat: Specialize udiv_qrnnd for s390x
softfloat: Specialize udiv_qrnnd for ppc64
include/fpu/softfloat-macros.h | 62 +++++++++++++++++++++++++++++-----
fpu/softfloat.c | 35 ++++++++++++++-----
2 files changed, 80 insertions(+), 17 deletions(-)
--
2.17.1
^ permalink raw reply [flat|nested] 6+ messages in thread
* [Qemu-devel] [PATCH v3 1/4] softfloat: Fix division
2018-10-04 17:56 [Qemu-devel] [PATCH v3 0/4] softfloat: Fix division Richard Henderson
@ 2018-10-04 17:56 ` Richard Henderson
2018-10-04 17:56 ` [Qemu-devel] [PATCH v3 2/4] softfloat: Specialize udiv_qrnnd for x86_64 Richard Henderson
` (3 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: Richard Henderson @ 2018-10-04 17:56 UTC (permalink / raw)
To: qemu-devel; +Cc: alex.bennee, cota
The __udiv_qrnnd primitive that we nicked from gmp requires its
inputs to be normalized. We were not doing that. Because the
inputs are nearly normalized already, finishing that is trivial.
Replace div128to64 with a "proper" udiv_qrnnd, so that this
remains a reusable primitive.
Fixes: cf07323d494
Fixes: https://bugs.launchpad.net/qemu/+bug/1793119
Tested-by: Emilio G. Cota <cota@braap.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
include/fpu/softfloat-macros.h | 34 ++++++++++++++++++++++++---------
fpu/softfloat.c | 35 ++++++++++++++++++++++++++--------
2 files changed, 52 insertions(+), 17 deletions(-)
diff --git a/include/fpu/softfloat-macros.h b/include/fpu/softfloat-macros.h
index edc682139e..a1d99c730d 100644
--- a/include/fpu/softfloat-macros.h
+++ b/include/fpu/softfloat-macros.h
@@ -329,15 +329,30 @@ static inline void
| pieces which are stored at the locations pointed to by `z0Ptr' and `z1Ptr'.
*----------------------------------------------------------------------------*/
-static inline void
- shortShift128Left(
- uint64_t a0, uint64_t a1, int count, uint64_t *z0Ptr, uint64_t *z1Ptr)
+static inline void shortShift128Left(uint64_t a0, uint64_t a1, int count,
+ uint64_t *z0Ptr, uint64_t *z1Ptr)
{
+ *z1Ptr = a1 << count;
+ *z0Ptr = count == 0 ? a0 : (a0 << count) | (a1 >> (-count & 63));
+}
- *z1Ptr = a1<<count;
- *z0Ptr =
- ( count == 0 ) ? a0 : ( a0<<count ) | ( a1>>( ( - count ) & 63 ) );
+/*----------------------------------------------------------------------------
+| Shifts the 128-bit value formed by concatenating `a0' and `a1' left by the
+| number of bits given in `count'. Any bits shifted off are lost. The value
+| of `count' may be greater than 64. The result is broken into two 64-bit
+| pieces which are stored at the locations pointed to by `z0Ptr' and `z1Ptr'.
+*----------------------------------------------------------------------------*/
+static inline void shift128Left(uint64_t a0, uint64_t a1, int count,
+ uint64_t *z0Ptr, uint64_t *z1Ptr)
+{
+ if (count < 64) {
+ *z1Ptr = a1 << count;
+ *z0Ptr = count == 0 ? a0 : (a0 << count) | (a1 >> (-count & 63));
+ } else {
+ *z1Ptr = 0;
+ *z0Ptr = a1 << (count - 64);
+ }
}
/*----------------------------------------------------------------------------
@@ -619,7 +634,8 @@ static inline uint64_t estimateDiv128To64(uint64_t a0, uint64_t a1, uint64_t b)
*
* Licensed under the GPLv2/LGPLv3
*/
-static inline uint64_t div128To64(uint64_t n0, uint64_t n1, uint64_t d)
+static inline uint64_t udiv_qrnnd(uint64_t *r, uint64_t n1,
+ uint64_t n0, uint64_t d)
{
uint64_t d0, d1, q0, q1, r1, r0, m;
@@ -658,8 +674,8 @@ static inline uint64_t div128To64(uint64_t n0, uint64_t n1, uint64_t d)
}
r0 -= m;
- /* Return remainder in LSB */
- return (q1 << 32) | q0 | (r0 != 0);
+ *r = r0;
+ return (q1 << 32) | q0;
}
/*----------------------------------------------------------------------------
diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 71da0f68bb..46ae206172 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -1112,19 +1112,38 @@ static FloatParts div_floats(FloatParts a, FloatParts b, float_status *s)
bool sign = a.sign ^ b.sign;
if (a.cls == float_class_normal && b.cls == float_class_normal) {
- uint64_t temp_lo, temp_hi;
+ uint64_t n0, n1, q, r;
int exp = a.exp - b.exp;
+
+ /*
+ * We want a 2*N / N-bit division to produce exactly an N-bit
+ * result, so that we do not lose any precision and so that we
+ * do not have to renormalize afterward. If A.frac < B.frac,
+ * then division would produce an (N-1)-bit result; shift A left
+ * by one to produce the an N-bit result, and decrement the
+ * exponent to match.
+ *
+ * The udiv_qrnnd algorithm that we're using requires normalization,
+ * i.e. the msb of the denominator must be set. Since we know that
+ * DECOMPOSED_BINARY_POINT is msb-1, the inputs must be shifted left
+ * by one (more), and the remainder must be shifted right by one.
+ */
if (a.frac < b.frac) {
exp -= 1;
- shortShift128Left(0, a.frac, DECOMPOSED_BINARY_POINT + 1,
- &temp_hi, &temp_lo);
+ shift128Left(0, a.frac, DECOMPOSED_BINARY_POINT + 2, &n1, &n0);
} else {
- shortShift128Left(0, a.frac, DECOMPOSED_BINARY_POINT,
- &temp_hi, &temp_lo);
+ shift128Left(0, a.frac, DECOMPOSED_BINARY_POINT + 1, &n1, &n0);
}
- /* LSB of quot is set if inexact which roundandpack will use
- * to set flags. Yet again we re-use a for the result */
- a.frac = div128To64(temp_lo, temp_hi, b.frac);
+ q = udiv_qrnnd(&r, n1, n0, b.frac << 1);
+
+ /*
+ * Set lsb if there is a remainder, to set inexact.
+ * As mentioned above, to find the actual value of the remainder we
+ * would need to shift right, but (1) we are only concerned about
+ * non-zero-ness, and (2) the remainder will always be even because
+ * both inputs to the division primitive are even.
+ */
+ a.frac = q | (r != 0);
a.sign = sign;
a.exp = exp;
return a;
--
2.17.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [Qemu-devel] [PATCH v3 2/4] softfloat: Specialize udiv_qrnnd for x86_64
2018-10-04 17:56 [Qemu-devel] [PATCH v3 0/4] softfloat: Fix division Richard Henderson
2018-10-04 17:56 ` [Qemu-devel] [PATCH v3 1/4] " Richard Henderson
@ 2018-10-04 17:56 ` Richard Henderson
2018-10-04 17:56 ` [Qemu-devel] [PATCH v3 3/4] softfloat: Specialize udiv_qrnnd for s390x Richard Henderson
` (2 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: Richard Henderson @ 2018-10-04 17:56 UTC (permalink / raw)
To: qemu-devel; +Cc: alex.bennee, cota
The ISA has a 128/64-bit division instruction.
Tested-by: Emilio G. Cota <cota@braap.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
include/fpu/softfloat-macros.h | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/include/fpu/softfloat-macros.h b/include/fpu/softfloat-macros.h
index a1d99c730d..39eb08b4f1 100644
--- a/include/fpu/softfloat-macros.h
+++ b/include/fpu/softfloat-macros.h
@@ -637,6 +637,11 @@ static inline uint64_t estimateDiv128To64(uint64_t a0, uint64_t a1, uint64_t b)
static inline uint64_t udiv_qrnnd(uint64_t *r, uint64_t n1,
uint64_t n0, uint64_t d)
{
+#if defined(__x86_64__)
+ uint64_t q;
+ asm("divq %4" : "=a"(q), "=d"(*r) : "0"(n0), "1"(n1), "rm"(d));
+ return q;
+#else
uint64_t d0, d1, q0, q1, r1, r0, m;
d0 = (uint32_t)d;
@@ -676,6 +681,7 @@ static inline uint64_t udiv_qrnnd(uint64_t *r, uint64_t n1,
*r = r0;
return (q1 << 32) | q0;
+#endif
}
/*----------------------------------------------------------------------------
--
2.17.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [Qemu-devel] [PATCH v3 3/4] softfloat: Specialize udiv_qrnnd for s390x
2018-10-04 17:56 [Qemu-devel] [PATCH v3 0/4] softfloat: Fix division Richard Henderson
2018-10-04 17:56 ` [Qemu-devel] [PATCH v3 1/4] " Richard Henderson
2018-10-04 17:56 ` [Qemu-devel] [PATCH v3 2/4] softfloat: Specialize udiv_qrnnd for x86_64 Richard Henderson
@ 2018-10-04 17:56 ` Richard Henderson
2018-10-04 17:57 ` [Qemu-devel] [PATCH v3 4/4] softfloat: Specialize udiv_qrnnd for ppc64 Richard Henderson
2018-10-05 15:59 ` [Qemu-devel] [PATCH v3 0/4] softfloat: Fix division Alex Bennée
4 siblings, 0 replies; 6+ messages in thread
From: Richard Henderson @ 2018-10-04 17:56 UTC (permalink / raw)
To: qemu-devel; +Cc: alex.bennee, cota
The ISA has a 128/64-bit division instruction.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
include/fpu/softfloat-macros.h | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/include/fpu/softfloat-macros.h b/include/fpu/softfloat-macros.h
index 39eb08b4f1..eafc68932b 100644
--- a/include/fpu/softfloat-macros.h
+++ b/include/fpu/softfloat-macros.h
@@ -641,6 +641,12 @@ static inline uint64_t udiv_qrnnd(uint64_t *r, uint64_t n1,
uint64_t q;
asm("divq %4" : "=a"(q), "=d"(*r) : "0"(n0), "1"(n1), "rm"(d));
return q;
+#elif defined(__s390x__)
+ /* Need to use a TImode type to get an even register pair for DLGR. */
+ unsigned __int128 n = (unsigned __int128)n1 << 64 | n0;
+ asm("dlgr %0, %1" : "+r"(n) : "r"(d));
+ *r = n >> 64;
+ return n;
#else
uint64_t d0, d1, q0, q1, r1, r0, m;
--
2.17.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [Qemu-devel] [PATCH v3 4/4] softfloat: Specialize udiv_qrnnd for ppc64
2018-10-04 17:56 [Qemu-devel] [PATCH v3 0/4] softfloat: Fix division Richard Henderson
` (2 preceding siblings ...)
2018-10-04 17:56 ` [Qemu-devel] [PATCH v3 3/4] softfloat: Specialize udiv_qrnnd for s390x Richard Henderson
@ 2018-10-04 17:57 ` Richard Henderson
2018-10-05 15:59 ` [Qemu-devel] [PATCH v3 0/4] softfloat: Fix division Alex Bennée
4 siblings, 0 replies; 6+ messages in thread
From: Richard Henderson @ 2018-10-04 17:57 UTC (permalink / raw)
To: qemu-devel; +Cc: alex.bennee, cota
The ISA has a 128/64-bit division instruction, though it assumes the
low 64-bits of the numerator are 0, and so requires a bit more fixup
than a full 128-bit division insn.
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
include/fpu/softfloat-macros.h | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/include/fpu/softfloat-macros.h b/include/fpu/softfloat-macros.h
index eafc68932b..c86687fa5e 100644
--- a/include/fpu/softfloat-macros.h
+++ b/include/fpu/softfloat-macros.h
@@ -647,6 +647,22 @@ static inline uint64_t udiv_qrnnd(uint64_t *r, uint64_t n1,
asm("dlgr %0, %1" : "+r"(n) : "r"(d));
*r = n >> 64;
return n;
+#elif defined(_ARCH_PPC64)
+ /* From Power ISA 3.0B, programming note for divdeu. */
+ uint64_t q1, q2, Q, r1, r2, R;
+ asm("divdeu %0,%2,%4; divdu %1,%3,%4"
+ : "=&r"(q1), "=r"(q2)
+ : "r"(n1), "r"(n0), "r"(d));
+ r1 = -(q1 * d); /* low part of (n1<<64) - (q1 * d) */
+ r2 = n0 - (q2 * d);
+ Q = q1 + q2;
+ R = r1 + r2;
+ if (R >= d || R < r2) { /* overflow implies R > d */
+ Q += 1;
+ R -= d;
+ }
+ *r = R;
+ return Q;
#else
uint64_t d0, d1, q0, q1, r1, r0, m;
--
2.17.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [Qemu-devel] [PATCH v3 0/4] softfloat: Fix division
2018-10-04 17:56 [Qemu-devel] [PATCH v3 0/4] softfloat: Fix division Richard Henderson
` (3 preceding siblings ...)
2018-10-04 17:57 ` [Qemu-devel] [PATCH v3 4/4] softfloat: Specialize udiv_qrnnd for ppc64 Richard Henderson
@ 2018-10-05 15:59 ` Alex Bennée
4 siblings, 0 replies; 6+ messages in thread
From: Alex Bennée @ 2018-10-05 15:59 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel, cota
Richard Henderson <richard.henderson@linaro.org> writes:
> Changes from v2:
> * Add shift128Left. I had been using shortShift128Left,
> with a shift of 64, which lead to undefined behaviour.
> Which I suspect is exactly the Heisenbug Alex saw.
No weirdness this time around
>
> I did keep the R-b tags I had already applied.
You already have mine but this version also:
Tested-by: Alex Bennée <alex.bennee@linaro.org>
>
>
> r~
>
>
> Richard Henderson (4):
> softfloat: Fix division
> softfloat: Specialize udiv_qrnnd for x86_64
> softfloat: Specialize udiv_qrnnd for s390x
> softfloat: Specialize udiv_qrnnd for ppc64
>
> include/fpu/softfloat-macros.h | 62 +++++++++++++++++++++++++++++-----
> fpu/softfloat.c | 35 ++++++++++++++-----
> 2 files changed, 80 insertions(+), 17 deletions(-)
--
Alex Bennée
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2018-10-05 15:59 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-10-04 17:56 [Qemu-devel] [PATCH v3 0/4] softfloat: Fix division Richard Henderson
2018-10-04 17:56 ` [Qemu-devel] [PATCH v3 1/4] " Richard Henderson
2018-10-04 17:56 ` [Qemu-devel] [PATCH v3 2/4] softfloat: Specialize udiv_qrnnd for x86_64 Richard Henderson
2018-10-04 17:56 ` [Qemu-devel] [PATCH v3 3/4] softfloat: Specialize udiv_qrnnd for s390x Richard Henderson
2018-10-04 17:57 ` [Qemu-devel] [PATCH v3 4/4] softfloat: Specialize udiv_qrnnd for ppc64 Richard Henderson
2018-10-05 15:59 ` [Qemu-devel] [PATCH v3 0/4] softfloat: Fix division Alex Bennée
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).