* [RFC PATCH 01/15] qemu/int128: Add int128_or
2020-10-21 4:51 [RFC PATCH 00/15] softfloat: alternate conversion of float128_addsub Richard Henderson
@ 2020-10-21 4:51 ` Richard Henderson
2020-10-21 17:13 ` Alex Bennée
` (3 more replies)
2020-10-21 4:51 ` [RFC PATCH 02/15] qemu/int128: Add int128_clz, int128_ctz Richard Henderson
` (15 subsequent siblings)
16 siblings, 4 replies; 30+ messages in thread
From: Richard Henderson @ 2020-10-21 4:51 UTC (permalink / raw)
To: qemu-devel; +Cc: alex.bennee
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
include/qemu/int128.h | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/include/qemu/int128.h b/include/qemu/int128.h
index 76ea405922..52fc238421 100644
--- a/include/qemu/int128.h
+++ b/include/qemu/int128.h
@@ -58,6 +58,11 @@ static inline Int128 int128_and(Int128 a, Int128 b)
return a & b;
}
+static inline Int128 int128_or(Int128 a, Int128 b)
+{
+ return a | b;
+}
+
static inline Int128 int128_rshift(Int128 a, int n)
{
return a >> n;
@@ -208,6 +213,11 @@ static inline Int128 int128_and(Int128 a, Int128 b)
return (Int128) { a.lo & b.lo, a.hi & b.hi };
}
+static inline Int128 int128_or(Int128 a, Int128 b)
+{
+ return (Int128) { a.lo | b.lo, a.hi | b.hi };
+}
+
static inline Int128 int128_rshift(Int128 a, int n)
{
int64_t h;
--
2.25.1
* Re: [RFC PATCH 01/15] qemu/int128: Add int128_or
2020-10-21 4:51 ` [RFC PATCH 01/15] qemu/int128: Add int128_or Richard Henderson
@ 2020-10-21 17:13 ` Alex Bennée
2020-10-29 15:01 ` Taylor Simpson
` (2 subsequent siblings)
3 siblings, 0 replies; 30+ messages in thread
From: Alex Bennée @ 2020-10-21 17:13 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel
Richard Henderson <richard.henderson@linaro.org> writes:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
--
Alex Bennée
* RE: [RFC PATCH 01/15] qemu/int128: Add int128_or
2020-10-21 4:51 ` [RFC PATCH 01/15] qemu/int128: Add int128_or Richard Henderson
2020-10-21 17:13 ` Alex Bennée
@ 2020-10-29 15:01 ` Taylor Simpson
2020-10-29 18:09 ` Philippe Mathieu-Daudé
2021-02-14 18:17 ` Philippe Mathieu-Daudé
3 siblings, 0 replies; 30+ messages in thread
From: Taylor Simpson @ 2020-10-29 15:01 UTC (permalink / raw)
To: Richard Henderson, qemu-devel@nongnu.org; +Cc: alex.bennee@linaro.org
Reviewed-by: Taylor Simpson <tsimpson@quicinc.com>
> -----Original Message-----
> From: Qemu-devel <qemu-devel-
> bounces+tsimpson=quicinc.com@nongnu.org> On Behalf Of Richard
> Henderson
> Sent: Tuesday, October 20, 2020 11:52 PM
> To: qemu-devel@nongnu.org
> Cc: alex.bennee@linaro.org
> Subject: [RFC PATCH 01/15] qemu/int128: Add int128_or
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> include/qemu/int128.h | 10 ++++++++++
> 1 file changed, 10 insertions(+)
>
> diff --git a/include/qemu/int128.h b/include/qemu/int128.h
> index 76ea405922..52fc238421 100644
> --- a/include/qemu/int128.h
> +++ b/include/qemu/int128.h
> @@ -58,6 +58,11 @@ static inline Int128 int128_and(Int128 a, Int128 b)
> return a & b;
> }
>
> +static inline Int128 int128_or(Int128 a, Int128 b)
> +{
> + return a | b;
> +}
> +
> static inline Int128 int128_rshift(Int128 a, int n)
> {
> return a >> n;
> @@ -208,6 +213,11 @@ static inline Int128 int128_and(Int128 a, Int128 b)
> return (Int128) { a.lo & b.lo, a.hi & b.hi };
> }
>
> +static inline Int128 int128_or(Int128 a, Int128 b)
> +{
> + return (Int128) { a.lo | b.lo, a.hi | b.hi };
> +}
> +
> static inline Int128 int128_rshift(Int128 a, int n)
> {
> int64_t h;
> --
> 2.25.1
>
>
* Re: [RFC PATCH 01/15] qemu/int128: Add int128_or
2020-10-21 4:51 ` [RFC PATCH 01/15] qemu/int128: Add int128_or Richard Henderson
2020-10-21 17:13 ` Alex Bennée
2020-10-29 15:01 ` Taylor Simpson
@ 2020-10-29 18:09 ` Philippe Mathieu-Daudé
2021-02-14 18:17 ` Philippe Mathieu-Daudé
3 siblings, 0 replies; 30+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-10-29 18:09 UTC (permalink / raw)
To: Richard Henderson, qemu-devel; +Cc: alex.bennee
On 10/21/20 6:51 AM, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> include/qemu/int128.h | 10 ++++++++++
> 1 file changed, 10 insertions(+)
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
* Re: [RFC PATCH 01/15] qemu/int128: Add int128_or
2020-10-21 4:51 ` [RFC PATCH 01/15] qemu/int128: Add int128_or Richard Henderson
` (2 preceding siblings ...)
2020-10-29 18:09 ` Philippe Mathieu-Daudé
@ 2021-02-14 18:17 ` Philippe Mathieu-Daudé
3 siblings, 0 replies; 30+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-02-14 18:17 UTC (permalink / raw)
To: Richard Henderson, qemu-devel; +Cc: alex.bennee
On 10/21/20 6:51 AM, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> include/qemu/int128.h | 10 ++++++++++
> 1 file changed, 10 insertions(+)
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
* [RFC PATCH 02/15] qemu/int128: Add int128_clz, int128_ctz
2020-10-21 4:51 [RFC PATCH 00/15] softfloat: alternate conversion of float128_addsub Richard Henderson
2020-10-21 4:51 ` [RFC PATCH 01/15] qemu/int128: Add int128_or Richard Henderson
@ 2020-10-21 4:51 ` Richard Henderson
2020-10-21 17:13 ` Alex Bennée
2021-02-14 18:17 ` Philippe Mathieu-Daudé
2020-10-21 4:51 ` [RFC PATCH 03/15] qemu/int128: Rename int128_rshift, int128_lshift Richard Henderson
` (14 subsequent siblings)
16 siblings, 2 replies; 30+ messages in thread
From: Richard Henderson @ 2020-10-21 4:51 UTC (permalink / raw)
To: qemu-devel; +Cc: alex.bennee
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
include/qemu/int128.h | 17 +++++++++++++++--
1 file changed, 15 insertions(+), 2 deletions(-)
diff --git a/include/qemu/int128.h b/include/qemu/int128.h
index 52fc238421..055f202d08 100644
--- a/include/qemu/int128.h
+++ b/include/qemu/int128.h
@@ -1,9 +1,9 @@
#ifndef INT128_H
#define INT128_H
-#ifdef CONFIG_INT128
-#include "qemu/bswap.h"
+#include "qemu/host-utils.h"
+#ifdef CONFIG_INT128
typedef __int128_t Int128;
static inline Int128 int128_make64(uint64_t a)
@@ -328,4 +328,17 @@ static inline void int128_subfrom(Int128 *a, Int128 b)
}
#endif /* CONFIG_INT128 */
+
+static inline int int128_clz(Int128 a)
+{
+ uint64_t h = int128_gethi(a);
+ return h ? clz64(h) : 64 + clz64(int128_getlo(a));
+}
+
+static inline int int128_ctz(Int128 a)
+{
+ uint64_t l = int128_getlo(a);
+ return l ? ctz64(l) : 64 + ctz64(int128_gethi(a));
+}
+
#endif /* INT128_H */
--
2.25.1
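A minimal sketch of how the fallback composes a 128-bit count from the
64-bit halves, assuming a QEMU build environment with the helpers above
(the test value is illustrative):

    #include "qemu/osdep.h"
    #include "qemu/int128.h"

    static void check_clz_ctz(void)
    {
        /* 2^80: high half is 1 << 16, low half is 0. */
        Int128 x = int128_make128(0, 1ull << 16);

        /* High half non-zero: count leading zeros within it only. */
        g_assert(int128_clz(x) == 47);

        /* Low half zero: 64 plus the trailing zeros of the high half. */
        g_assert(int128_ctz(x) == 80);
    }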
* Re: [RFC PATCH 02/15] qemu/int128: Add int128_clz, int128_ctz
2020-10-21 4:51 ` [RFC PATCH 02/15] qemu/int128: Add int128_clz, int128_ctz Richard Henderson
@ 2020-10-21 17:13 ` Alex Bennée
2021-02-14 18:17 ` Philippe Mathieu-Daudé
1 sibling, 0 replies; 30+ messages in thread
From: Alex Bennée @ 2020-10-21 17:13 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel
Richard Henderson <richard.henderson@linaro.org> writes:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
--
Alex Bennée
* Re: [RFC PATCH 02/15] qemu/int128: Add int128_clz, int128_ctz
2020-10-21 4:51 ` [RFC PATCH 02/15] qemu/int128: Add int128_clz, int128_ctz Richard Henderson
2020-10-21 17:13 ` Alex Bennée
@ 2021-02-14 18:17 ` Philippe Mathieu-Daudé
1 sibling, 0 replies; 30+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-02-14 18:17 UTC (permalink / raw)
To: Richard Henderson, qemu-devel; +Cc: alex.bennee
On 10/21/20 6:51 AM, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> include/qemu/int128.h | 17 +++++++++++++++--
> 1 file changed, 15 insertions(+), 2 deletions(-)
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
* [RFC PATCH 03/15] qemu/int128: Rename int128_rshift, int128_lshift
2020-10-21 4:51 [RFC PATCH 00/15] softfloat: alternate conversion of float128_addsub Richard Henderson
2020-10-21 4:51 ` [RFC PATCH 01/15] qemu/int128: Add int128_or Richard Henderson
2020-10-21 4:51 ` [RFC PATCH 02/15] qemu/int128: Add int128_clz, int128_ctz Richard Henderson
@ 2020-10-21 4:51 ` Richard Henderson
2020-10-21 17:14 ` Alex Bennée
2021-02-14 18:18 ` Philippe Mathieu-Daudé
2020-10-21 4:51 ` [RFC PATCH 04/15] qemu/int128: Add int128_shr Richard Henderson
` (13 subsequent siblings)
16 siblings, 2 replies; 30+ messages in thread
From: Richard Henderson @ 2020-10-21 4:51 UTC (permalink / raw)
To: qemu-devel; +Cc: alex.bennee
Change these to sar/shl to emphasize the signed shift.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
include/qemu/int128.h | 8 ++++----
softmmu/physmem.c | 4 ++--
target/ppc/int_helper.c | 4 ++--
tests/test-int128.c | 44 ++++++++++++++++++++---------------------
4 files changed, 30 insertions(+), 30 deletions(-)
diff --git a/include/qemu/int128.h b/include/qemu/int128.h
index 055f202d08..167f13ae10 100644
--- a/include/qemu/int128.h
+++ b/include/qemu/int128.h
@@ -63,12 +63,12 @@ static inline Int128 int128_or(Int128 a, Int128 b)
return a | b;
}
-static inline Int128 int128_rshift(Int128 a, int n)
+static inline Int128 int128_sar(Int128 a, int n)
{
return a >> n;
}
-static inline Int128 int128_lshift(Int128 a, int n)
+static inline Int128 int128_shl(Int128 a, int n)
{
return a << n;
}
@@ -218,7 +218,7 @@ static inline Int128 int128_or(Int128 a, Int128 b)
return (Int128) { a.lo | b.lo, a.hi | b.hi };
}
-static inline Int128 int128_rshift(Int128 a, int n)
+static inline Int128 int128_sar(Int128 a, int n)
{
int64_t h;
if (!n) {
@@ -232,7 +232,7 @@ static inline Int128 int128_rshift(Int128 a, int n)
}
}
-static inline Int128 int128_lshift(Int128 a, int n)
+static inline Int128 int128_shl(Int128 a, int n)
{
uint64_t l = a.lo << (n & 63);
if (n >= 64) {
diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index e319fb2a1e..7f6e98e7b0 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -1156,8 +1156,8 @@ static void register_multipage(FlatView *fv,
AddressSpaceDispatch *d = flatview_to_dispatch(fv);
hwaddr start_addr = section->offset_within_address_space;
uint16_t section_index = phys_section_add(&d->map, section);
- uint64_t num_pages = int128_get64(int128_rshift(section->size,
- TARGET_PAGE_BITS));
+ uint64_t num_pages = int128_get64(int128_sar(section->size,
+ TARGET_PAGE_BITS));
assert(num_pages);
phys_page_set(d, start_addr >> TARGET_PAGE_BITS, num_pages, section_index);
diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index b45626f44c..fe569590b4 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -1444,7 +1444,7 @@ void helper_vlogefp(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *b)
} else { \
index = ((15 - (a & 0xf) + 1) * 8) - size; \
} \
- return int128_getlo(int128_rshift(b->s128, index)) & \
+ return int128_getlo(int128_sar(b->s128, index)) & \
MAKE_64BIT_MASK(0, size); \
}
#else
@@ -1457,7 +1457,7 @@ void helper_vlogefp(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *b)
} else { \
index = (a & 0xf) * 8; \
} \
- return int128_getlo(int128_rshift(b->s128, index)) & \
+ return int128_getlo(int128_sar(b->s128, index)) & \
MAKE_64BIT_MASK(0, size); \
}
#endif
diff --git a/tests/test-int128.c b/tests/test-int128.c
index b86a3c76e6..9bd6cb59ec 100644
--- a/tests/test-int128.c
+++ b/tests/test-int128.c
@@ -176,34 +176,34 @@ static void test_gt(void)
/* Make sure to test undefined behavior at runtime! */
static void __attribute__((__noinline__)) ATTRIBUTE_NOCLONE
-test_rshift_one(uint32_t x, int n, uint64_t h, uint64_t l)
+test_sar_one(uint32_t x, int n, uint64_t h, uint64_t l)
{
Int128 a = expand(x);
- Int128 r = int128_rshift(a, n);
+ Int128 r = int128_sar(a, n);
g_assert_cmpuint(int128_getlo(r), ==, l);
g_assert_cmpuint(int128_gethi(r), ==, h);
}
-static void test_rshift(void)
+static void test_sar(void)
{
- test_rshift_one(0x00010000U, 64, 0x0000000000000000ULL, 0x0000000000000001ULL);
- test_rshift_one(0x80010000U, 64, 0xFFFFFFFFFFFFFFFFULL, 0x8000000000000001ULL);
- test_rshift_one(0x7FFE0000U, 64, 0x0000000000000000ULL, 0x7FFFFFFFFFFFFFFEULL);
- test_rshift_one(0xFFFE0000U, 64, 0xFFFFFFFFFFFFFFFFULL, 0xFFFFFFFFFFFFFFFEULL);
- test_rshift_one(0x00010000U, 60, 0x0000000000000000ULL, 0x0000000000000010ULL);
- test_rshift_one(0x80010000U, 60, 0xFFFFFFFFFFFFFFF8ULL, 0x0000000000000010ULL);
- test_rshift_one(0x00018000U, 60, 0x0000000000000000ULL, 0x0000000000000018ULL);
- test_rshift_one(0x80018000U, 60, 0xFFFFFFFFFFFFFFF8ULL, 0x0000000000000018ULL);
- test_rshift_one(0x7FFE0000U, 60, 0x0000000000000007ULL, 0xFFFFFFFFFFFFFFE0ULL);
- test_rshift_one(0xFFFE0000U, 60, 0xFFFFFFFFFFFFFFFFULL, 0xFFFFFFFFFFFFFFE0ULL);
- test_rshift_one(0x7FFE8000U, 60, 0x0000000000000007ULL, 0xFFFFFFFFFFFFFFE8ULL);
- test_rshift_one(0xFFFE8000U, 60, 0xFFFFFFFFFFFFFFFFULL, 0xFFFFFFFFFFFFFFE8ULL);
- test_rshift_one(0x00018000U, 0, 0x0000000000000001ULL, 0x8000000000000000ULL);
- test_rshift_one(0x80018000U, 0, 0x8000000000000001ULL, 0x8000000000000000ULL);
- test_rshift_one(0x7FFE0000U, 0, 0x7FFFFFFFFFFFFFFEULL, 0x0000000000000000ULL);
- test_rshift_one(0xFFFE0000U, 0, 0xFFFFFFFFFFFFFFFEULL, 0x0000000000000000ULL);
- test_rshift_one(0x7FFE8000U, 0, 0x7FFFFFFFFFFFFFFEULL, 0x8000000000000000ULL);
- test_rshift_one(0xFFFE8000U, 0, 0xFFFFFFFFFFFFFFFEULL, 0x8000000000000000ULL);
+ test_sar_one(0x00010000U, 64, 0x0000000000000000ULL, 0x0000000000000001ULL);
+ test_sar_one(0x80010000U, 64, 0xFFFFFFFFFFFFFFFFULL, 0x8000000000000001ULL);
+ test_sar_one(0x7FFE0000U, 64, 0x0000000000000000ULL, 0x7FFFFFFFFFFFFFFEULL);
+ test_sar_one(0xFFFE0000U, 64, 0xFFFFFFFFFFFFFFFFULL, 0xFFFFFFFFFFFFFFFEULL);
+ test_sar_one(0x00010000U, 60, 0x0000000000000000ULL, 0x0000000000000010ULL);
+ test_sar_one(0x80010000U, 60, 0xFFFFFFFFFFFFFFF8ULL, 0x0000000000000010ULL);
+ test_sar_one(0x00018000U, 60, 0x0000000000000000ULL, 0x0000000000000018ULL);
+ test_sar_one(0x80018000U, 60, 0xFFFFFFFFFFFFFFF8ULL, 0x0000000000000018ULL);
+ test_sar_one(0x7FFE0000U, 60, 0x0000000000000007ULL, 0xFFFFFFFFFFFFFFE0ULL);
+ test_sar_one(0xFFFE0000U, 60, 0xFFFFFFFFFFFFFFFFULL, 0xFFFFFFFFFFFFFFE0ULL);
+ test_sar_one(0x7FFE8000U, 60, 0x0000000000000007ULL, 0xFFFFFFFFFFFFFFE8ULL);
+ test_sar_one(0xFFFE8000U, 60, 0xFFFFFFFFFFFFFFFFULL, 0xFFFFFFFFFFFFFFE8ULL);
+ test_sar_one(0x00018000U, 0, 0x0000000000000001ULL, 0x8000000000000000ULL);
+ test_sar_one(0x80018000U, 0, 0x8000000000000001ULL, 0x8000000000000000ULL);
+ test_sar_one(0x7FFE0000U, 0, 0x7FFFFFFFFFFFFFFEULL, 0x0000000000000000ULL);
+ test_sar_one(0xFFFE0000U, 0, 0xFFFFFFFFFFFFFFFEULL, 0x0000000000000000ULL);
+ test_sar_one(0x7FFE8000U, 0, 0x7FFFFFFFFFFFFFFEULL, 0x8000000000000000ULL);
+ test_sar_one(0xFFFE8000U, 0, 0xFFFFFFFFFFFFFFFEULL, 0x8000000000000000ULL);
}
int main(int argc, char **argv)
@@ -218,6 +218,6 @@ int main(int argc, char **argv)
g_test_add_func("/int128/int128_lt", test_lt);
g_test_add_func("/int128/int128_ge", test_ge);
g_test_add_func("/int128/int128_gt", test_gt);
- g_test_add_func("/int128/int128_rshift", test_rshift);
+ g_test_add_func("/int128/int128_sar", test_sar);
return g_test_run();
}
--
2.25.1
* Re: [RFC PATCH 03/15] qemu/int128: Rename int128_rshift, int128_lshift
2020-10-21 4:51 ` [RFC PATCH 03/15] qemu/int128: Rename int128_rshift, int128_lshift Richard Henderson
@ 2020-10-21 17:14 ` Alex Bennée
2021-02-14 18:18 ` Philippe Mathieu-Daudé
1 sibling, 0 replies; 30+ messages in thread
From: Alex Bennée @ 2020-10-21 17:14 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel
Richard Henderson <richard.henderson@linaro.org> writes:
> Change these to sar/shl to emphasize the signed shift.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
--
Alex Bennée
* Re: [RFC PATCH 03/15] qemu/int128: Rename int128_rshift, int128_lshift
2020-10-21 4:51 ` [RFC PATCH 03/15] qemu/int128: Rename int128_rshift, int128_lshift Richard Henderson
2020-10-21 17:14 ` Alex Bennée
@ 2021-02-14 18:18 ` Philippe Mathieu-Daudé
1 sibling, 0 replies; 30+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-02-14 18:18 UTC (permalink / raw)
To: Richard Henderson, qemu-devel; +Cc: alex.bennee
On 10/21/20 6:51 AM, Richard Henderson wrote:
> Change these to sar/shl to emphasize the signed shift.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> include/qemu/int128.h | 8 ++++----
> softmmu/physmem.c | 4 ++--
> target/ppc/int_helper.c | 4 ++--
> tests/test-int128.c | 44 ++++++++++++++++++++---------------------
> 4 files changed, 30 insertions(+), 30 deletions(-)
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
* [RFC PATCH 04/15] qemu/int128: Add int128_shr
2020-10-21 4:51 [RFC PATCH 00/15] softfloat: alternate conversion of float128_addsub Richard Henderson
` (2 preceding siblings ...)
2020-10-21 4:51 ` [RFC PATCH 03/15] qemu/int128: Rename int128_rshift, int128_lshift Richard Henderson
@ 2020-10-21 4:51 ` Richard Henderson
2020-10-21 4:51 ` [RFC PATCH 05/15] qemu/int128: Add int128_geu Richard Henderson
` (12 subsequent siblings)
16 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2020-10-21 4:51 UTC (permalink / raw)
To: qemu-devel; +Cc: alex.bennee
Add unsigned right shift as an operation.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
include/qemu/int128.h | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/include/qemu/int128.h b/include/qemu/int128.h
index 167f13ae10..c53002039a 100644
--- a/include/qemu/int128.h
+++ b/include/qemu/int128.h
@@ -68,6 +68,11 @@ static inline Int128 int128_sar(Int128 a, int n)
return a >> n;
}
+static inline Int128 int128_shr(Int128 a, int n)
+{
+ return (__uint128_t)a >> n;
+}
+
static inline Int128 int128_shl(Int128 a, int n)
{
return a << n;
@@ -232,6 +237,17 @@ static inline Int128 int128_sar(Int128 a, int n)
}
}
+static inline Int128 int128_shr(Int128 a, int n)
+{
+ uint64_t h = (uint64_t)a.hi >> (n & 63);
+ if (n >= 64) {
+ return int128_make64(h);
+ } else if (n > 0) {
+ return int128_make128((a.lo >> n) | ((uint64_t)a.hi << (64 - n)), h);
+ }
+ return a;
+}
+
static inline Int128 int128_shl(Int128 a, int n)
{
uint64_t l = a.lo << (n & 63);
--
2.25.1
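A hedged sketch contrasting the renamed arithmetic shift with the new
logical shift, assuming the helpers from patches 03 and 04 (the values
are illustrative):

    #include "qemu/osdep.h"
    #include "qemu/int128.h"

    static void check_sar_vs_shr(void)
    {
        /* All 128 bits set, i.e. -1 as a signed Int128. */
        Int128 m1 = int128_make128(-1ull, -1ull);

        /* int128_sar replicates the sign bit: -1 >> 4 is still -1. */
        g_assert(int128_eq(int128_sar(m1, 4), m1));

        /* int128_shr shifts in zeroes from the top instead. */
        Int128 expect = int128_make128(-1ull, (1ull << 60) - 1);
        g_assert(int128_eq(int128_shr(m1, 4), expect));
    }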
* [RFC PATCH 05/15] qemu/int128: Add int128_geu
2020-10-21 4:51 [RFC PATCH 00/15] softfloat: alternate conversion of float128_addsub Richard Henderson
` (3 preceding siblings ...)
2020-10-21 4:51 ` [RFC PATCH 04/15] qemu/int128: Add int128_shr Richard Henderson
@ 2020-10-21 4:51 ` Richard Henderson
2021-02-14 18:19 ` Philippe Mathieu-Daudé
2020-10-21 4:51 ` [RFC PATCH 06/15] softfloat: Use mulu64 for mul64To128 Richard Henderson
` (11 subsequent siblings)
16 siblings, 1 reply; 30+ messages in thread
From: Richard Henderson @ 2020-10-21 4:51 UTC (permalink / raw)
To: qemu-devel; +Cc: alex.bennee
Add an unsigned inequality operation. Do not fill in all of
the variations until we have a call for them.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
include/qemu/int128.h | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/include/qemu/int128.h b/include/qemu/int128.h
index c53002039a..1f95792a29 100644
--- a/include/qemu/int128.h
+++ b/include/qemu/int128.h
@@ -113,6 +113,11 @@ static inline bool int128_ge(Int128 a, Int128 b)
return a >= b;
}
+static inline bool int128_geu(Int128 a, Int128 b)
+{
+ return (__uint128_t)a >= (__uint128_t)b;
+}
+
static inline bool int128_lt(Int128 a, Int128 b)
{
return a < b;
@@ -303,6 +308,11 @@ static inline bool int128_ge(Int128 a, Int128 b)
return a.hi > b.hi || (a.hi == b.hi && a.lo >= b.lo);
}
+static inline bool int128_geu(Int128 a, Int128 b)
+{
+ return (uint64_t)a.hi > (uint64_t)b.hi || (a.hi == b.hi && a.lo >= b.lo);
+}
+
static inline bool int128_lt(Int128 a, Int128 b)
{
return !int128_ge(a, b);
--
2.25.1
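A small sketch of why the unsigned comparison matters once the top bit
is set, assuming the helpers above (values are illustrative):

    #include "qemu/osdep.h"
    #include "qemu/int128.h"

    static void check_geu(void)
    {
        Int128 big = int128_make128(0, 1ull << 63); /* 2^127 */
        Int128 one = int128_one();

        /* Signed: 2^127 is a large negative number, so it is below 1. */
        g_assert(!int128_ge(big, one));

        /* Unsigned: the new helper orders it above 1. */
        g_assert(int128_geu(big, one));
    }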
* [RFC PATCH 06/15] softfloat: Use mulu64 for mul64To128
2020-10-21 4:51 [RFC PATCH 00/15] softfloat: alternate conversion of float128_addsub Richard Henderson
` (4 preceding siblings ...)
2020-10-21 4:51 ` [RFC PATCH 05/15] qemu/int128: Add int128_geu Richard Henderson
@ 2020-10-21 4:51 ` Richard Henderson
2020-10-21 4:51 ` [RFC PATCH 07/15] softfloat: Use int128.h for some operations Richard Henderson
` (10 subsequent siblings)
16 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2020-10-21 4:51 UTC (permalink / raw)
To: qemu-devel; +Cc: alex.bennee, David Hildenbrand
Via host-utils.h, we use a host widening multiply for
64-bit hosts, and a common subroutine for 32-bit hosts.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
include/fpu/softfloat-macros.h | 24 ++++--------------------
1 file changed, 4 insertions(+), 20 deletions(-)
diff --git a/include/fpu/softfloat-macros.h b/include/fpu/softfloat-macros.h
index a35ec2893a..57845f8af0 100644
--- a/include/fpu/softfloat-macros.h
+++ b/include/fpu/softfloat-macros.h
@@ -83,6 +83,7 @@ this code that are retained.
#define FPU_SOFTFLOAT_MACROS_H
#include "fpu/softfloat-types.h"
+#include "qemu/host-utils.h"
/*----------------------------------------------------------------------------
| Shifts `a' right by the number of bits given in `count'. If any nonzero
@@ -515,27 +516,10 @@ static inline void
| `z0Ptr' and `z1Ptr'.
*----------------------------------------------------------------------------*/
-static inline void mul64To128( uint64_t a, uint64_t b, uint64_t *z0Ptr, uint64_t *z1Ptr )
+static inline void
+mul64To128(uint64_t a, uint64_t b, uint64_t *z0Ptr, uint64_t *z1Ptr)
{
- uint32_t aHigh, aLow, bHigh, bLow;
- uint64_t z0, zMiddleA, zMiddleB, z1;
-
- aLow = a;
- aHigh = a>>32;
- bLow = b;
- bHigh = b>>32;
- z1 = ( (uint64_t) aLow ) * bLow;
- zMiddleA = ( (uint64_t) aLow ) * bHigh;
- zMiddleB = ( (uint64_t) aHigh ) * bLow;
- z0 = ( (uint64_t) aHigh ) * bHigh;
- zMiddleA += zMiddleB;
- z0 += ( ( (uint64_t) ( zMiddleA < zMiddleB ) )<<32 ) + ( zMiddleA>>32 );
- zMiddleA <<= 32;
- z1 += zMiddleA;
- z0 += ( z1 < zMiddleA );
- *z1Ptr = z1;
- *z0Ptr = z0;
-
+ mulu64(z1Ptr, z0Ptr, a, b);
}
/*----------------------------------------------------------------------------
--
2.25.1
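For reference, a minimal sketch of the host-utils primitive the patch
switches to, assuming qemu/host-utils.h (the operands are arbitrary):

    #include "qemu/osdep.h"
    #include "qemu/host-utils.h"

    static void check_mulu64(void)
    {
        uint64_t hi, lo;

        /* mulu64 takes the low-half pointer first, then the high half. */
        mulu64(&lo, &hi, 0xFFFFFFFFFFFFFFFFull, 2);

        /* (2^64 - 1) * 2 = 2^65 - 2: high word 1, low word 2^64 - 2. */
        g_assert(hi == 1);
        g_assert(lo == 0xFFFFFFFFFFFFFFFEull);
    }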
* [RFC PATCH 07/15] softfloat: Use int128.h for some operations
2020-10-21 4:51 [RFC PATCH 00/15] softfloat: alternate conversion of float128_addsub Richard Henderson
` (5 preceding siblings ...)
2020-10-21 4:51 ` [RFC PATCH 06/15] softfloat: Use mulu64 for mul64To128 Richard Henderson
@ 2020-10-21 4:51 ` Richard Henderson
2020-10-21 4:51 ` [RFC PATCH 08/15] softfloat: Tidy a * b + inf return Richard Henderson
` (9 subsequent siblings)
16 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2020-10-21 4:51 UTC (permalink / raw)
To: qemu-devel; +Cc: alex.bennee, David Hildenbrand
Use our Int128, which wraps the compiler's __int128_t, instead
of open-coding shifts and arithmetic. We'd need to extend Int128
to replace more than these four.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
include/fpu/softfloat-macros.h | 65 ++++++++++++++--------------------
1 file changed, 26 insertions(+), 39 deletions(-)
diff --git a/include/fpu/softfloat-macros.h b/include/fpu/softfloat-macros.h
index 57845f8af0..e6f05c048e 100644
--- a/include/fpu/softfloat-macros.h
+++ b/include/fpu/softfloat-macros.h
@@ -84,6 +84,7 @@ this code that are retained.
#include "fpu/softfloat-types.h"
#include "qemu/host-utils.h"
+#include "qemu/int128.h"
/*----------------------------------------------------------------------------
| Shifts `a' right by the number of bits given in `count'. If any nonzero
@@ -191,28 +192,14 @@ static inline void
| which are stored at the locations pointed to by `z0Ptr' and `z1Ptr'.
*----------------------------------------------------------------------------*/
-static inline void
- shift128Right(
- uint64_t a0, uint64_t a1, int count, uint64_t *z0Ptr, uint64_t *z1Ptr)
+static inline void shift128Right(uint64_t a0, uint64_t a1, int count,
+ uint64_t *z0Ptr, uint64_t *z1Ptr)
{
- uint64_t z0, z1;
- int8_t negCount = ( - count ) & 63;
-
- if ( count == 0 ) {
- z1 = a1;
- z0 = a0;
- }
- else if ( count < 64 ) {
- z1 = ( a0<<negCount ) | ( a1>>count );
- z0 = a0>>count;
- }
- else {
- z1 = (count < 128) ? (a0 >> (count & 63)) : 0;
- z0 = 0;
- }
- *z1Ptr = z1;
- *z0Ptr = z0;
+ Int128 a = int128_make128(a1, a0);
+ Int128 z = int128_shr(a, count);
+ *z0Ptr = int128_gethi(z);
+ *z1Ptr = int128_getlo(z);
}
/*----------------------------------------------------------------------------
@@ -352,13 +339,11 @@ static inline void shortShift128Left(uint64_t a0, uint64_t a1, int count,
static inline void shift128Left(uint64_t a0, uint64_t a1, int count,
uint64_t *z0Ptr, uint64_t *z1Ptr)
{
- if (count < 64) {
- *z1Ptr = a1 << count;
- *z0Ptr = count == 0 ? a0 : (a0 << count) | (a1 >> (-count & 63));
- } else {
- *z1Ptr = 0;
- *z0Ptr = a1 << (count - 64);
- }
+ Int128 a = int128_make128(a1, a0);
+ Int128 z = int128_shl(a, count);
+
+ *z0Ptr = int128_gethi(z);
+ *z1Ptr = int128_getlo(z);
}
/*----------------------------------------------------------------------------
@@ -405,15 +390,15 @@ static inline void
*----------------------------------------------------------------------------*/
static inline void
- add128(
- uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1, uint64_t *z0Ptr, uint64_t *z1Ptr )
+add128(uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1,
+ uint64_t *z0Ptr, uint64_t *z1Ptr)
{
- uint64_t z1;
-
- z1 = a1 + b1;
- *z1Ptr = z1;
- *z0Ptr = a0 + b0 + ( z1 < a1 );
+ Int128 a = int128_make128(a1, a0);
+ Int128 b = int128_make128(b1, b0);
+ Int128 z = int128_add(a, b);
+ *z0Ptr = int128_gethi(z);
+ *z1Ptr = int128_getlo(z);
}
/*----------------------------------------------------------------------------
@@ -463,13 +448,15 @@ static inline void
*----------------------------------------------------------------------------*/
static inline void
- sub128(
- uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1, uint64_t *z0Ptr, uint64_t *z1Ptr )
+sub128(uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1,
+ uint64_t *z0Ptr, uint64_t *z1Ptr)
{
+ Int128 a = int128_make128(a1, a0);
+ Int128 b = int128_make128(b1, b0);
+ Int128 z = int128_sub(a, b);
- *z1Ptr = a1 - b1;
- *z0Ptr = a0 - b0 - ( a1 < b1 );
-
+ *z0Ptr = int128_gethi(z);
+ *z1Ptr = int128_getlo(z);
}
/*----------------------------------------------------------------------------
--
2.25.1
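A hedged sketch checking that the Int128-based add128 still propagates
the carry between the two 64-bit words, assuming a QEMU build
environment (values are illustrative):

    #include "qemu/osdep.h"
    #include "fpu/softfloat-macros.h"

    static void check_add128_carry(void)
    {
        uint64_t z0, z1;

        /* a0:a1 = 2^65 - 1; adding 1 must carry from the low word
         * into the high word. */
        add128(1, 0xFFFFFFFFFFFFFFFFull, 0, 1, &z0, &z1);

        g_assert(z0 == 2);
        g_assert(z1 == 0);
    }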
* [RFC PATCH 08/15] softfloat: Tidy a * b + inf return
2020-10-21 4:51 [RFC PATCH 00/15] softfloat: alternate conversion of float128_addsub Richard Henderson
` (6 preceding siblings ...)
2020-10-21 4:51 ` [RFC PATCH 07/15] softfloat: Use int128.h for some operations Richard Henderson
@ 2020-10-21 4:51 ` Richard Henderson
2020-10-21 4:51 ` [RFC PATCH 09/15] softfloat: Add float_cmask and constants Richard Henderson
` (8 subsequent siblings)
16 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2020-10-21 4:51 UTC (permalink / raw)
To: qemu-devel; +Cc: alex.bennee, Philippe Mathieu-Daudé, David Hildenbrand
No reason to set values in 'a' when we already
have float_class_inf in 'c' and can flip that sign.
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
fpu/softfloat.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 67cfa0fd82..9db55d2b11 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -1380,9 +1380,8 @@ static FloatParts muladd_floats(FloatParts a, FloatParts b, FloatParts c,
s->float_exception_flags |= float_flag_invalid;
return parts_default_nan(s);
} else {
- a.cls = float_class_inf;
- a.sign = c.sign ^ sign_flip;
- return a;
+ c.sign ^= sign_flip;
+ return c;
}
}
--
2.25.1
* [RFC PATCH 09/15] softfloat: Add float_cmask and constants
2020-10-21 4:51 [RFC PATCH 00/15] softfloat: alternate conversion of float128_addsub Richard Henderson
` (7 preceding siblings ...)
2020-10-21 4:51 ` [RFC PATCH 08/15] softfloat: Tidy a * b + inf return Richard Henderson
@ 2020-10-21 4:51 ` Richard Henderson
2020-10-21 4:51 ` [RFC PATCH 10/15] softfloat: Inline float_raise Richard Henderson
` (7 subsequent siblings)
16 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2020-10-21 4:51 UTC (permalink / raw)
To: qemu-devel; +Cc: alex.bennee, David Hildenbrand
Testing more than one class at a time is better done with masks.
This reduces the static branch count.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
fpu/softfloat.c | 31 ++++++++++++++++++++++++-------
1 file changed, 24 insertions(+), 7 deletions(-)
diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 9db55d2b11..3e625c47cd 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -469,6 +469,20 @@ typedef enum __attribute__ ((__packed__)) {
float_class_snan,
} FloatClass;
+#define float_cmask(bit) (1u << (bit))
+
+enum {
+ float_cmask_zero = float_cmask(float_class_zero),
+ float_cmask_normal = float_cmask(float_class_normal),
+ float_cmask_inf = float_cmask(float_class_inf),
+ float_cmask_qnan = float_cmask(float_class_qnan),
+ float_cmask_snan = float_cmask(float_class_snan),
+
+ float_cmask_infzero = float_cmask_zero | float_cmask_inf,
+ float_cmask_anynan = float_cmask_qnan | float_cmask_snan,
+};
+
+
/* Simple helpers for checking if, or what kind of, NaN we have */
static inline __attribute__((unused)) bool is_nan(FloatClass c)
{
@@ -1335,24 +1349,27 @@ bfloat16 QEMU_FLATTEN bfloat16_mul(bfloat16 a, bfloat16 b, float_status *status)
static FloatParts muladd_floats(FloatParts a, FloatParts b, FloatParts c,
int flags, float_status *s)
{
- bool inf_zero = ((1 << a.cls) | (1 << b.cls)) ==
- ((1 << float_class_inf) | (1 << float_class_zero));
- bool p_sign;
+ bool inf_zero, p_sign;
bool sign_flip = flags & float_muladd_negate_result;
FloatClass p_class;
uint64_t hi, lo;
int p_exp;
+ int ab_mask, abc_mask;
+
+ ab_mask = float_cmask(a.cls) | float_cmask(b.cls);
+ abc_mask = float_cmask(c.cls) | ab_mask;
+ inf_zero = ab_mask == float_cmask_infzero;
/* It is implementation-defined whether the cases of (0,inf,qnan)
* and (inf,0,qnan) raise InvalidOperation or not (and what QNaN
* they return if they do), so we have to hand this information
* off to the target-specific pick-a-NaN routine.
*/
- if (is_nan(a.cls) || is_nan(b.cls) || is_nan(c.cls)) {
+ if (unlikely(abc_mask & float_cmask_anynan)) {
return pick_nan_muladd(a, b, c, inf_zero, s);
}
- if (inf_zero) {
+ if (unlikely(inf_zero)) {
s->float_exception_flags |= float_flag_invalid;
return parts_default_nan(s);
}
@@ -1367,9 +1384,9 @@ static FloatParts muladd_floats(FloatParts a, FloatParts b, FloatParts c,
p_sign ^= 1;
}
- if (a.cls == float_class_inf || b.cls == float_class_inf) {
+ if (ab_mask & float_cmask_inf) {
p_class = float_class_inf;
- } else if (a.cls == float_class_zero || b.cls == float_class_zero) {
+ } else if (ab_mask & float_cmask_zero) {
p_class = float_class_zero;
} else {
p_class = float_class_normal;
--
2.25.1
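To make the mask trick concrete, a self-contained stand-in that mirrors
the technique (the enum and names below are illustrative, not the
actual softfloat definitions):

    #include <assert.h>

    typedef enum { class_zero, class_normal, class_inf,
                   class_qnan, class_snan } Cls;

    #define cmask(bit) (1u << (bit))

    enum {
        cmask_zero    = cmask(class_zero),
        cmask_inf     = cmask(class_inf),
        cmask_infzero = cmask_zero | cmask_inf,
    };

    int main(void)
    {
        Cls a = class_inf, b = class_zero;

        /* One OR plus one compare replaces two per-operand class
         * checks: true exactly when one operand is Inf and the
         * other is zero. */
        unsigned ab_mask = cmask(a) | cmask(b);
        assert(ab_mask == cmask_infzero);
        return 0;
    }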
* [RFC PATCH 10/15] softfloat: Inline float_raise
2020-10-21 4:51 [RFC PATCH 00/15] softfloat: alternate conversion of float128_addsub Richard Henderson
` (8 preceding siblings ...)
2020-10-21 4:51 ` [RFC PATCH 09/15] softfloat: Add float_cmask and constants Richard Henderson
@ 2020-10-21 4:51 ` Richard Henderson
2020-10-21 17:16 ` Alex Bennée
2021-02-14 18:20 ` Philippe Mathieu-Daudé
2020-10-21 4:51 ` [RFC PATCH 11/15] Test split to softfloat-parts.c.inc Richard Henderson
` (6 subsequent siblings)
16 siblings, 2 replies; 30+ messages in thread
From: Richard Henderson @ 2020-10-21 4:51 UTC (permalink / raw)
To: qemu-devel; +Cc: alex.bennee
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
include/fpu/softfloat.h | 5 ++++-
fpu/softfloat-specialize.c.inc | 12 ------------
2 files changed, 4 insertions(+), 13 deletions(-)
diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
index 78ad5ca738..019c2ec66d 100644
--- a/include/fpu/softfloat.h
+++ b/include/fpu/softfloat.h
@@ -100,7 +100,10 @@ typedef enum {
| Routine to raise any or all of the software IEC/IEEE floating-point
| exception flags.
*----------------------------------------------------------------------------*/
-void float_raise(uint8_t flags, float_status *status);
+static inline void float_raise(uint8_t flags, float_status *status)
+{
+ status->float_exception_flags |= flags;
+}
/*----------------------------------------------------------------------------
| If `a' is denormal and we are in flush-to-zero mode then set the
diff --git a/fpu/softfloat-specialize.c.inc b/fpu/softfloat-specialize.c.inc
index c2f87addb2..0fe8ce408d 100644
--- a/fpu/softfloat-specialize.c.inc
+++ b/fpu/softfloat-specialize.c.inc
@@ -225,18 +225,6 @@ floatx80 floatx80_default_nan(float_status *status)
const floatx80 floatx80_infinity
= make_floatx80_init(floatx80_infinity_high, floatx80_infinity_low);
-/*----------------------------------------------------------------------------
-| Raises the exceptions specified by `flags'. Floating-point traps can be
-| defined here if desired. It is currently not possible for such a trap
-| to substitute a result value. If traps are not implemented, this routine
-| should be simply `float_exception_flags |= flags;'.
-*----------------------------------------------------------------------------*/
-
-void float_raise(uint8_t flags, float_status *status)
-{
- status->float_exception_flags |= flags;
-}
-
/*----------------------------------------------------------------------------
| Internal canonical NaN format.
*----------------------------------------------------------------------------*/
--
2.25.1
* [RFC PATCH 11/15] Test split to softfloat-parts.c.inc
2020-10-21 4:51 [RFC PATCH 00/15] softfloat: alternate conversion of float128_addsub Richard Henderson
` (9 preceding siblings ...)
2020-10-21 4:51 ` [RFC PATCH 10/15] softfloat: Inline float_raise Richard Henderson
@ 2020-10-21 4:51 ` Richard Henderson
2020-10-21 4:51 ` [RFC PATCH 12/15] softfloat: Streamline FloatFmt Richard Henderson
` (5 subsequent siblings)
16 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2020-10-21 4:51 UTC (permalink / raw)
To: qemu-devel; +Cc: alex.bennee
---
fpu/softfloat.c | 438 ++++++++------------------------------
fpu/softfloat-parts.c.inc | 327 ++++++++++++++++++++++++++++
2 files changed, 421 insertions(+), 344 deletions(-)
create mode 100644 fpu/softfloat-parts.c.inc
diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 3e625c47cd..3651f4525d 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -651,191 +651,109 @@ static inline float64 float64_pack_raw(FloatParts p)
*----------------------------------------------------------------------------*/
#include "softfloat-specialize.c.inc"
-/* Canonicalize EXP and FRAC, setting CLS. */
-static FloatParts sf_canonicalize(FloatParts part, const FloatFmt *parm,
- float_status *status)
+static FloatParts return_nan(FloatParts a, float_status *s)
{
- if (part.exp == parm->exp_max && !parm->arm_althp) {
- if (part.frac == 0) {
- part.cls = float_class_inf;
- } else {
- part.frac <<= parm->frac_shift;
- part.cls = (parts_is_snan_frac(part.frac, status)
- ? float_class_snan : float_class_qnan);
- }
- } else if (part.exp == 0) {
- if (likely(part.frac == 0)) {
- part.cls = float_class_zero;
- } else if (status->flush_inputs_to_zero) {
- float_raise(float_flag_input_denormal, status);
- part.cls = float_class_zero;
- part.frac = 0;
- } else {
- int shift = clz64(part.frac) - 1;
- part.cls = float_class_normal;
- part.exp = parm->frac_shift - parm->exp_bias - shift + 1;
- part.frac <<= shift;
- }
- } else {
- part.cls = float_class_normal;
- part.exp -= parm->exp_bias;
- part.frac = DECOMPOSED_IMPLICIT_BIT + (part.frac << parm->frac_shift);
- }
- return part;
-}
-
-/* Round and uncanonicalize a floating-point number by parts. There
- * are FRAC_SHIFT bits that may require rounding at the bottom of the
- * fraction; these bits will be removed. The exponent will be biased
- * by EXP_BIAS and must be bounded by [EXP_MAX-1, 0].
- */
-
-static FloatParts round_canonical(FloatParts p, float_status *s,
- const FloatFmt *parm)
-{
- const uint64_t frac_lsb = parm->frac_lsb;
- const uint64_t frac_lsbm1 = parm->frac_lsbm1;
- const uint64_t round_mask = parm->round_mask;
- const uint64_t roundeven_mask = parm->roundeven_mask;
- const int exp_max = parm->exp_max;
- const int frac_shift = parm->frac_shift;
- uint64_t frac, inc;
- int exp, flags = 0;
- bool overflow_norm;
-
- frac = p.frac;
- exp = p.exp;
-
- switch (p.cls) {
- case float_class_normal:
- switch (s->float_rounding_mode) {
- case float_round_nearest_even:
- overflow_norm = false;
- inc = ((frac & roundeven_mask) != frac_lsbm1 ? frac_lsbm1 : 0);
- break;
- case float_round_ties_away:
- overflow_norm = false;
- inc = frac_lsbm1;
- break;
- case float_round_to_zero:
- overflow_norm = true;
- inc = 0;
- break;
- case float_round_up:
- inc = p.sign ? 0 : round_mask;
- overflow_norm = p.sign;
- break;
- case float_round_down:
- inc = p.sign ? round_mask : 0;
- overflow_norm = !p.sign;
- break;
- case float_round_to_odd:
- overflow_norm = true;
- inc = frac & frac_lsb ? 0 : round_mask;
- break;
- default:
- g_assert_not_reached();
- }
-
- exp += parm->exp_bias;
- if (likely(exp > 0)) {
- if (frac & round_mask) {
- flags |= float_flag_inexact;
- frac += inc;
- if (frac & DECOMPOSED_OVERFLOW_BIT) {
- frac >>= 1;
- exp++;
- }
- }
- frac >>= frac_shift;
-
- if (parm->arm_althp) {
- /* ARM Alt HP eschews Inf and NaN for a wider exponent. */
- if (unlikely(exp > exp_max)) {
- /* Overflow. Return the maximum normal. */
- flags = float_flag_invalid;
- exp = exp_max;
- frac = -1;
- }
- } else if (unlikely(exp >= exp_max)) {
- flags |= float_flag_overflow | float_flag_inexact;
- if (overflow_norm) {
- exp = exp_max - 1;
- frac = -1;
- } else {
- p.cls = float_class_inf;
- goto do_inf;
- }
- }
- } else if (s->flush_to_zero) {
- flags |= float_flag_output_denormal;
- p.cls = float_class_zero;
- goto do_zero;
- } else {
- bool is_tiny = s->tininess_before_rounding
- || (exp < 0)
- || !((frac + inc) & DECOMPOSED_OVERFLOW_BIT);
-
- shift64RightJamming(frac, 1 - exp, &frac);
- if (frac & round_mask) {
- /* Need to recompute round-to-even. */
- switch (s->float_rounding_mode) {
- case float_round_nearest_even:
- inc = ((frac & roundeven_mask) != frac_lsbm1
- ? frac_lsbm1 : 0);
- break;
- case float_round_to_odd:
- inc = frac & frac_lsb ? 0 : round_mask;
- break;
- default:
- break;
- }
- flags |= float_flag_inexact;
- frac += inc;
- }
-
- exp = (frac & DECOMPOSED_IMPLICIT_BIT ? 1 : 0);
- frac >>= frac_shift;
-
- if (is_tiny && (flags & float_flag_inexact)) {
- flags |= float_flag_underflow;
- }
- if (exp == 0 && frac == 0) {
- p.cls = float_class_zero;
- }
- }
- break;
-
- case float_class_zero:
- do_zero:
- exp = 0;
- frac = 0;
- break;
-
- case float_class_inf:
- do_inf:
- assert(!parm->arm_althp);
- exp = exp_max;
- frac = 0;
- break;
-
- case float_class_qnan:
+ switch (a.cls) {
case float_class_snan:
- assert(!parm->arm_althp);
- exp = exp_max;
- frac >>= parm->frac_shift;
+ s->float_exception_flags |= float_flag_invalid;
+ a = parts_silence_nan(a, s);
+ /* fall through */
+ case float_class_qnan:
+ if (s->default_nan_mode) {
+ return parts_default_nan(s);
+ }
break;
default:
g_assert_not_reached();
}
-
- float_raise(flags, s);
- p.exp = exp;
- p.frac = frac;
- return p;
+ return a;
}
+static FloatParts pick_nan_muladd(FloatParts a, FloatParts b, FloatParts c,
+ bool inf_zero, float_status *s)
+{
+ int which;
+
+ if (is_snan(a.cls) || is_snan(b.cls) || is_snan(c.cls)) {
+ s->float_exception_flags |= float_flag_invalid;
+ }
+
+ which = pickNaNMulAdd(a.cls, b.cls, c.cls, inf_zero, s);
+
+ if (s->default_nan_mode) {
+ /* Note that this check is after pickNaNMulAdd so that function
+ * has an opportunity to set the Invalid flag.
+ */
+ which = 3;
+ }
+
+ switch (which) {
+ case 0:
+ break;
+ case 1:
+ a = b;
+ break;
+ case 2:
+ a = c;
+ break;
+ case 3:
+ return parts_default_nan(s);
+ default:
+ g_assert_not_reached();
+ }
+
+ if (is_snan(a.cls)) {
+ return parts_silence_nan(a, s);
+ }
+ return a;
+}
+
+#define FUNC(X) X
+#define FRAC_TYPE uint64_t
+#define PARTS_TYPE FloatParts
+
+#define HI(P) (P)
+#define LO(P) (P)
+#define ZERO 0
+#define ONE 1
+#define MONE -1
+
+#define ADD(P1, P2) ((P1) + (P2))
+#define ADDI(P, I) ((P) + (I))
+#define CLZ(P) clz64(P)
+#define EQ0(P) ((P) == 0)
+#define EQ(P1, P2) ((P1) == (P2))
+#define GEU(P1, P2) ((P1) >= (P2))
+#define OR(P1, P2) ((P1) | (P2))
+#define SHL(P, C) ((P) << (C))
+#define SHR(P, C) ((P) >> (C))
+#define SHR_JAM(P, C) \
+ ({ uint64_t _r; shift64RightJamming((P), (C), &_r); _r; })
+#define SUB(P1, P2) ((P1) - (P2))
+
+#include "softfloat-parts.c.inc"
+
+#undef FUNC
+#undef FRAC_TYPE
+#undef PARTS_TYPE
+#undef HI
+#undef LO
+#undef ZERO
+#undef MONE
+#undef ONE
+#undef ADD
+#undef ADDI
+#undef CLZ
+#undef EQ0
+#undef EQ
+#undef GEU
+#undef OR
+#undef SHL
+#undef SHR
+#undef SHR_JAM
+#undef SUB
+
/* Explicit FloatFmt version */
static FloatParts float16a_unpack_canonical(float16 f, float_status *s,
const FloatFmt *params)
@@ -889,174 +807,6 @@ static float64 float64_round_pack_canonical(FloatParts p, float_status *s)
return float64_pack_raw(round_canonical(p, s, &float64_params));
}
-static FloatParts return_nan(FloatParts a, float_status *s)
-{
- switch (a.cls) {
- case float_class_snan:
- s->float_exception_flags |= float_flag_invalid;
- a = parts_silence_nan(a, s);
- /* fall through */
- case float_class_qnan:
- if (s->default_nan_mode) {
- return parts_default_nan(s);
- }
- break;
-
- default:
- g_assert_not_reached();
- }
- return a;
-}
-
-static FloatParts pick_nan(FloatParts a, FloatParts b, float_status *s)
-{
- if (is_snan(a.cls) || is_snan(b.cls)) {
- s->float_exception_flags |= float_flag_invalid;
- }
-
- if (s->default_nan_mode) {
- return parts_default_nan(s);
- } else {
- if (pickNaN(a.cls, b.cls,
- a.frac > b.frac ||
- (a.frac == b.frac && a.sign < b.sign), s)) {
- a = b;
- }
- if (is_snan(a.cls)) {
- return parts_silence_nan(a, s);
- }
- }
- return a;
-}
-
-static FloatParts pick_nan_muladd(FloatParts a, FloatParts b, FloatParts c,
- bool inf_zero, float_status *s)
-{
- int which;
-
- if (is_snan(a.cls) || is_snan(b.cls) || is_snan(c.cls)) {
- s->float_exception_flags |= float_flag_invalid;
- }
-
- which = pickNaNMulAdd(a.cls, b.cls, c.cls, inf_zero, s);
-
- if (s->default_nan_mode) {
- /* Note that this check is after pickNaNMulAdd so that function
- * has an opportunity to set the Invalid flag.
- */
- which = 3;
- }
-
- switch (which) {
- case 0:
- break;
- case 1:
- a = b;
- break;
- case 2:
- a = c;
- break;
- case 3:
- return parts_default_nan(s);
- default:
- g_assert_not_reached();
- }
-
- if (is_snan(a.cls)) {
- return parts_silence_nan(a, s);
- }
- return a;
-}
-
-/*
- * Returns the result of adding or subtracting the values of the
- * floating-point values `a' and `b'. The operation is performed
- * according to the IEC/IEEE Standard for Binary Floating-Point
- * Arithmetic.
- */
-
-static FloatParts addsub_floats(FloatParts a, FloatParts b, bool subtract,
- float_status *s)
-{
- bool a_sign = a.sign;
- bool b_sign = b.sign ^ subtract;
-
- if (a_sign != b_sign) {
- /* Subtraction */
-
- if (a.cls == float_class_normal && b.cls == float_class_normal) {
- if (a.exp > b.exp || (a.exp == b.exp && a.frac >= b.frac)) {
- shift64RightJamming(b.frac, a.exp - b.exp, &b.frac);
- a.frac = a.frac - b.frac;
- } else {
- shift64RightJamming(a.frac, b.exp - a.exp, &a.frac);
- a.frac = b.frac - a.frac;
- a.exp = b.exp;
- a_sign ^= 1;
- }
-
- if (a.frac == 0) {
- a.cls = float_class_zero;
- a.sign = s->float_rounding_mode == float_round_down;
- } else {
- int shift = clz64(a.frac) - 1;
- a.frac = a.frac << shift;
- a.exp = a.exp - shift;
- a.sign = a_sign;
- }
- return a;
- }
- if (is_nan(a.cls) || is_nan(b.cls)) {
- return pick_nan(a, b, s);
- }
- if (a.cls == float_class_inf) {
- if (b.cls == float_class_inf) {
- float_raise(float_flag_invalid, s);
- return parts_default_nan(s);
- }
- return a;
- }
- if (a.cls == float_class_zero && b.cls == float_class_zero) {
- a.sign = s->float_rounding_mode == float_round_down;
- return a;
- }
- if (a.cls == float_class_zero || b.cls == float_class_inf) {
- b.sign = a_sign ^ 1;
- return b;
- }
- if (b.cls == float_class_zero) {
- return a;
- }
- } else {
- /* Addition */
- if (a.cls == float_class_normal && b.cls == float_class_normal) {
- if (a.exp > b.exp) {
- shift64RightJamming(b.frac, a.exp - b.exp, &b.frac);
- } else if (a.exp < b.exp) {
- shift64RightJamming(a.frac, b.exp - a.exp, &a.frac);
- a.exp = b.exp;
- }
- a.frac += b.frac;
- if (a.frac & DECOMPOSED_OVERFLOW_BIT) {
- shift64RightJamming(a.frac, 1, &a.frac);
- a.exp += 1;
- }
- return a;
- }
- if (is_nan(a.cls) || is_nan(b.cls)) {
- return pick_nan(a, b, s);
- }
- if (a.cls == float_class_inf || b.cls == float_class_zero) {
- return a;
- }
- if (b.cls == float_class_inf || a.cls == float_class_zero) {
- b.sign = b_sign;
- return b;
- }
- }
- g_assert_not_reached();
-}
-
/*
* Returns the result of adding or subtracting the floating-point
* values `a' and `b'. The operation is performed according to the
diff --git a/fpu/softfloat-parts.c.inc b/fpu/softfloat-parts.c.inc
new file mode 100644
index 0000000000..49bde45521
--- /dev/null
+++ b/fpu/softfloat-parts.c.inc
@@ -0,0 +1,327 @@
+/*
+ * QEMU float support
+ *
+ * The code in this source file is derived from release 2a of the SoftFloat
+ * IEC/IEEE Floating-point Arithmetic Package. Those parts of the code (and
+ * some later contributions) are provided under that license, as detailed below.
+ * It has subsequently been modified by contributors to the QEMU Project,
+ * so some portions are provided under:
+ * the SoftFloat-2a license
+ * the BSD license
+ * GPL-v2-or-later
+ *
+ * Any future contributions to this file after December 1st 2014 will be
+ * taken to be licensed under the Softfloat-2a license unless specifically
+ * indicated otherwise.
+ */
+
+static PARTS_TYPE
+FUNC(pick_nan)(PARTS_TYPE a, PARTS_TYPE b, float_status *status)
+{
+ bool a_larger_sig;
+
+ if (is_snan(a.cls) || is_snan(b.cls)) {
+ float_raise(float_flag_invalid, status);
+ }
+
+ if (status->default_nan_mode) {
+ return FUNC(parts_default_nan)(status);
+ }
+
+ if (EQ(a.frac, b.frac)) {
+ a_larger_sig = a.sign < b.sign;
+ } else {
+ a_larger_sig = GEU(a.frac, b.frac);
+ }
+
+ if (pickNaN(a.cls, b.cls, a_larger_sig, status)) {
+ a = b;
+ }
+ if (is_snan(a.cls)) {
+ return FUNC(parts_silence_nan)(a, status);
+ }
+ return a;
+}
+
+/* Canonicalize EXP and FRAC, setting CLS. */
+static PARTS_TYPE
+FUNC(sf_canonicalize)(PARTS_TYPE p, const FloatFmt *parm, float_status *status)
+{
+ if (p.exp == 0) {
+ if (likely(EQ0(p.frac))) {
+ p.cls = float_class_zero;
+ } else if (status->flush_inputs_to_zero) {
+ float_raise(float_flag_input_denormal, status);
+ p.cls = float_class_zero;
+ p.frac = ZERO;
+ } else {
+ int shift = CLZ(p.frac) - 1;
+ p.cls = float_class_normal;
+ p.exp = parm->frac_shift - parm->exp_bias - shift + 1;
+ p.frac = SHL(p.frac, shift);
+ }
+ } else if (likely(p.exp < parm->exp_max) || parm->arm_althp) {
+ p.cls = float_class_normal;
+ p.exp -= parm->exp_bias;
+ /* Set implicit bit. */
+ p.frac = OR(p.frac, SHL(ONE, parm->frac_size));
+ p.frac = SHL(p.frac, parm->frac_shift);
+ } else if (likely(EQ0(p.frac))) {
+ p.cls = float_class_inf;
+ } else {
+ p.frac = SHL(p.frac, parm->frac_shift);
+ p.cls = (parts_is_snan_frac(HI(p.frac), status)
+ ? float_class_snan : float_class_qnan);
+ }
+ return p;
+}
+
+/* Round and uncanonicalize a floating-point number by parts. There
+ * are FRAC_SHIFT bits that may require rounding at the bottom of the
+ * fraction; these bits will be removed. The exponent will be biased
+ * by EXP_BIAS and must be bounded by [EXP_MAX-1, 0].
+ */
+
+static PARTS_TYPE
+FUNC(round_canonical)(PARTS_TYPE p, float_status *s, const FloatFmt *parm)
+{
+ const int exp_max = parm->exp_max;
+ const int frac_shift = parm->frac_shift;
+ const uint64_t frac_lsb = 1ull << frac_shift;
+ const uint64_t frac_lsbm1 = 1ull << (frac_shift - 1);
+ const uint64_t round_mask = frac_lsb - 1;
+ const uint64_t roundeven_mask = round_mask | frac_lsb;
+ int flags = 0;
+
+ switch (p.cls) {
+ case float_class_normal:
+ {
+ bool overflow_norm;
+ uint64_t inc, frac_lo;
+ int exp;
+
+ frac_lo = LO(p.frac);
+ switch (s->float_rounding_mode) {
+ case float_round_nearest_even:
+ overflow_norm = false;
+ inc = ((frac_lo & roundeven_mask) != frac_lsbm1
+ ? frac_lsbm1 : 0);
+ break;
+ case float_round_ties_away:
+ overflow_norm = false;
+ inc = frac_lsbm1;
+ break;
+ case float_round_to_zero:
+ overflow_norm = true;
+ inc = 0;
+ break;
+ case float_round_up:
+ inc = p.sign ? 0 : round_mask;
+ overflow_norm = p.sign;
+ break;
+ case float_round_down:
+ inc = p.sign ? round_mask : 0;
+ overflow_norm = !p.sign;
+ break;
+ case float_round_to_odd:
+ overflow_norm = true;
+ inc = frac_lo & frac_lsb ? 0 : round_mask;
+ break;
+ default:
+ g_assert_not_reached();
+ }
+
+ exp = p.exp + parm->exp_bias;
+ if (likely(exp > 0)) {
+ if (frac_lo & round_mask) {
+ flags |= float_flag_inexact;
+ p.frac = ADDI(p.frac, inc);
+ if (HI(p.frac) & DECOMPOSED_OVERFLOW_BIT) {
+ p.frac = SHR(p.frac, 1);
+ exp++;
+ }
+ }
+ p.frac = SHR(p.frac, frac_shift);
+
+ if (parm->arm_althp) {
+ /* ARM Alt HP eschews Inf and NaN for a wider exponent. */
+ if (unlikely(exp > exp_max)) {
+ /* Overflow. Return the maximum normal. */
+ flags = float_flag_invalid;
+ exp = exp_max;
+ p.frac = MONE;
+ }
+ } else if (unlikely(exp >= exp_max)) {
+ flags |= float_flag_overflow | float_flag_inexact;
+ if (overflow_norm) {
+ exp = exp_max - 1;
+ p.frac = MONE;
+ } else {
+ p.cls = float_class_inf;
+ goto do_inf;
+ }
+ }
+ } else if (s->flush_to_zero) {
+ flags |= float_flag_output_denormal;
+ p.cls = float_class_zero;
+ goto do_zero;
+ } else {
+ bool is_tiny = s->tininess_before_rounding || exp < 0;
+ if (!is_tiny) {
+ FRAC_TYPE frac_inc = ADDI(p.frac, inc);
+ if (HI(frac_inc) & DECOMPOSED_OVERFLOW_BIT) {
+ is_tiny = true;
+ }
+ }
+
+ p.frac = SHR_JAM(p.frac, 1 - exp);
+ frac_lo = LO(p.frac);
+
+ if (frac_lo & round_mask) {
+ /* Need to recompute round-to-even / round-to-odd. */
+ switch (s->float_rounding_mode) {
+ case float_round_nearest_even:
+ inc = ((frac_lo & roundeven_mask) != frac_lsbm1
+ ? frac_lsbm1 : 0);
+ break;
+ case float_round_to_odd:
+ inc = frac_lo & frac_lsb ? 0 : round_mask;
+ break;
+ default:
+ break;
+ }
+ flags |= float_flag_inexact;
+ p.frac = ADDI(p.frac, inc);
+ }
+
+ exp = (HI(p.frac) & DECOMPOSED_IMPLICIT_BIT ? 1 : 0);
+ p.frac = SHR(p.frac, frac_shift);
+
+ if (is_tiny && (flags & float_flag_inexact)) {
+ flags |= float_flag_underflow;
+ }
+ if (exp == 0 && EQ0(p.frac)) {
+ p.cls = float_class_zero;
+ }
+ }
+ p.exp = exp;
+ }
+ break;
+
+ case float_class_zero:
+ do_zero:
+ p.exp = 0;
+ p.frac = ZERO;
+ break;
+
+ case float_class_inf:
+ do_inf:
+ g_assert(!parm->arm_althp);
+ p.exp = exp_max;
+ p.frac = ZERO;
+ break;
+
+ case float_class_qnan:
+ case float_class_snan:
+ g_assert(!parm->arm_althp);
+ p.exp = exp_max;
+ p.frac = SHR(p.frac, parm->frac_shift);
+ break;
+
+ default:
+ g_assert_not_reached();
+ }
+
+ float_raise(flags, s);
+ return p;
+}
+
+/*
+ * Returns the result of adding or subtracting the values of the
+ * floating-point values `a' and `b'. The operation is performed
+ * according to the IEC/IEEE Standard for Binary Floating-Point
+ * Arithmetic.
+ */
+
+static PARTS_TYPE
+FUNC(addsub_floats)(PARTS_TYPE a, PARTS_TYPE b,
+ bool subtract, float_status *s)
+{
+ bool a_sign = a.sign;
+ bool b_sign = b.sign ^ subtract;
+
+ if (a_sign != b_sign) {
+ /* Subtraction */
+
+ if (a.cls == float_class_normal && b.cls == float_class_normal) {
+ if (a.exp > b.exp || (a.exp == b.exp && GEU(a.frac, b.frac))) {
+ b.frac = SHR_JAM(b.frac, a.exp - b.exp);
+ a.frac = SUB(a.frac, b.frac);
+ } else {
+ a.frac = SHR_JAM(a.frac, b.exp - a.exp);
+ a.frac = SUB(b.frac, a.frac);
+ a.exp = b.exp;
+ a_sign ^= 1;
+ }
+
+ if (EQ0(a.frac)) {
+ a.cls = float_class_zero;
+ a.sign = s->float_rounding_mode == float_round_down;
+ } else {
+ int shift = CLZ(a.frac) - 1;
+ a.frac = SHL(a.frac, shift);
+ a.exp = a.exp - shift;
+ a.sign = a_sign;
+ }
+ return a;
+ }
+ if (is_nan(a.cls) || is_nan(b.cls)) {
+ return FUNC(pick_nan)(a, b, s);
+ }
+ if (a.cls == float_class_inf) {
+ if (b.cls == float_class_inf) {
+ float_raise(float_flag_invalid, s);
+ return FUNC(parts_default_nan)(s);
+ }
+ return a;
+ }
+ if (a.cls == float_class_zero && b.cls == float_class_zero) {
+ a.sign = s->float_rounding_mode == float_round_down;
+ return a;
+ }
+ if (a.cls == float_class_zero || b.cls == float_class_inf) {
+ b.sign = a_sign ^ 1;
+ return b;
+ }
+ if (b.cls == float_class_zero) {
+ return a;
+ }
+ } else {
+ /* Addition */
+ if (a.cls == float_class_normal && b.cls == float_class_normal) {
+ if (a.exp > b.exp) {
+ b.frac = SHR_JAM(b.frac, a.exp - b.exp);
+ } else if (a.exp < b.exp) {
+ a.frac = SHR_JAM(a.frac, b.exp - a.exp);
+ a.exp = b.exp;
+ }
+ a.frac = ADD(a.frac, b.frac);
+ if (HI(a.frac) & DECOMPOSED_OVERFLOW_BIT) {
+ a.frac = SHR_JAM(a.frac, 1);
+ a.exp += 1;
+ }
+ return a;
+ }
+ if (is_nan(a.cls) || is_nan(b.cls)) {
+ return FUNC(pick_nan)(a, b, s);
+ }
+ if (a.cls == float_class_inf || b.cls == float_class_zero) {
+ return a;
+ }
+ if (b.cls == float_class_inf || a.cls == float_class_zero) {
+ b.sign = b_sign;
+ return b;
+ }
+ }
+ g_assert_not_reached();
+}
--
2.25.1
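The include-twice idiom used above can be shown in miniature; this is a
hedged, self-contained sketch of the same technique, with hypothetical
file and function names:

    /* sum-parts.c.inc -- expanded once per fraction type */
    static FRAC_TYPE FUNC(add3)(FRAC_TYPE a, FRAC_TYPE b, FRAC_TYPE c)
    {
        return ADD(ADD(a, b), c);
    }

    /* In the including file: 64-bit instantiation. */
    #define FUNC(X)    X
    #define FRAC_TYPE  uint64_t
    #define ADD(A, B)  ((A) + (B))
    #include "sum-parts.c.inc"
    #undef FUNC
    #undef FRAC_TYPE
    #undef ADD

    /* Same body again, instantiated for Int128. */
    #define FUNC(X)    X##128
    #define FRAC_TYPE  Int128
    #define ADD(A, B)  int128_add(A, B)
    #include "sum-parts.c.inc"
    #undef FUNC
    #undef FRAC_TYPE
    #undef ADD

This expands the single body into add3() over uint64_t and add3128()
over Int128, which is how softfloat-parts.c.inc ends up being consumed
twice by softfloat.c once patch 13 adds the Int128 instantiation.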
* [RFC PATCH 12/15] softfloat: Streamline FloatFmt
2020-10-21 4:51 [RFC PATCH 00/15] softfloat: alternate conversion of float128_addsub Richard Henderson
` (10 preceding siblings ...)
2020-10-21 4:51 ` [RFC PATCH 11/15] Test split to softfloat-parts.c.inc Richard Henderson
@ 2020-10-21 4:51 ` Richard Henderson
2020-10-21 4:51 ` [RFC PATCH 13/15] Test float128_addsub Richard Henderson
` (4 subsequent siblings)
16 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2020-10-21 4:51 UTC (permalink / raw)
To: qemu-devel; +Cc: alex.bennee
The fields being removed are now computed in round_canonical.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
fpu/softfloat.c | 14 +-------------
1 file changed, 1 insertion(+), 13 deletions(-)
diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 3651f4525d..1bd21435e7 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -527,10 +527,6 @@ typedef struct {
* exp_max: the maximum normalised exponent
* frac_size: the size of the fraction field
* frac_shift: shift to normalise the fraction with DECOMPOSED_BINARY_POINT
- * The following are computed based the size of fraction
- * frac_lsb: least significant bit of fraction
- * frac_lsbm1: the bit below the least significant bit (for rounding)
- * round_mask/roundeven_mask: masks used for rounding
* The following optional modifiers are available:
* arm_althp: handle ARM Alternative Half Precision
*/
@@ -540,10 +536,6 @@ typedef struct {
int exp_max;
int frac_size;
int frac_shift;
- uint64_t frac_lsb;
- uint64_t frac_lsbm1;
- uint64_t round_mask;
- uint64_t roundeven_mask;
bool arm_althp;
} FloatFmt;
@@ -553,11 +545,7 @@ typedef struct {
.exp_bias = ((1 << E) - 1) >> 1, \
.exp_max = (1 << E) - 1, \
.frac_size = F, \
- .frac_shift = DECOMPOSED_BINARY_POINT - F, \
- .frac_lsb = 1ull << (DECOMPOSED_BINARY_POINT - F), \
- .frac_lsbm1 = 1ull << ((DECOMPOSED_BINARY_POINT - F) - 1), \
- .round_mask = (1ull << (DECOMPOSED_BINARY_POINT - F)) - 1, \
- .roundeven_mask = (2ull << (DECOMPOSED_BINARY_POINT - F)) - 1
+ .frac_shift = DECOMPOSED_BINARY_POINT - F
static const FloatFmt float16_params = {
FLOAT_PARAMS(5, 10)
--
2.25.1
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [RFC PATCH 13/15] Test float128_addsub
2020-10-21 4:51 [RFC PATCH 00/15] softfloat: alternate conversion of float128_addsub Richard Henderson
` (11 preceding siblings ...)
2020-10-21 4:51 ` [RFC PATCH 12/15] softfloat: Streamline FloatFmt Richard Henderson
@ 2020-10-21 4:51 ` Richard Henderson
2020-10-21 4:51 ` [RFC PATCH 14/15] softfloat: Use float_cmask for addsub_floats Richard Henderson
` (3 subsequent siblings)
16 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2020-10-21 4:51 UTC (permalink / raw)
To: qemu-devel; +Cc: alex.bennee
---
fpu/softfloat.c | 310 +++++++++++----------------------
fpu/softfloat-specialize.c.inc | 33 ++++
2 files changed, 137 insertions(+), 206 deletions(-)
diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 1bd21435e7..294c573fb9 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -517,6 +517,14 @@ typedef struct {
bool sign;
} FloatParts;
+/* Similar for float128. */
+typedef struct {
+ Int128 frac;
+ int32_t exp;
+ FloatClass cls;
+ bool sign;
+} FloatParts128;
+
#define DECOMPOSED_BINARY_POINT (64 - 2)
#define DECOMPOSED_IMPLICIT_BIT (1ull << DECOMPOSED_BINARY_POINT)
#define DECOMPOSED_OVERFLOW_BIT (DECOMPOSED_IMPLICIT_BIT << 1)
@@ -540,13 +548,20 @@ typedef struct {
} FloatFmt;
/* Expand fields based on the size of exponent and fraction */
-#define FLOAT_PARAMS(E, F) \
+#define FLOAT_PARAMS1(E, F) \
.exp_size = E, \
.exp_bias = ((1 << E) - 1) >> 1, \
.exp_max = (1 << E) - 1, \
- .frac_size = F, \
+ .frac_size = F
+
+#define FLOAT_PARAMS(E, F) \
+ FLOAT_PARAMS1(E, F), \
.frac_shift = DECOMPOSED_BINARY_POINT - F
+#define FLOAT128_PARAMS(E, F) \
+ FLOAT_PARAMS1(E, F), \
+ .frac_shift = DECOMPOSED_BINARY_POINT + 64 - F
+
static const FloatFmt float16_params = {
FLOAT_PARAMS(5, 10)
};
@@ -568,6 +583,10 @@ static const FloatFmt float64_params = {
FLOAT_PARAMS(11, 52)
};
+static const FloatFmt float128_params = {
+ FLOAT128_PARAMS(15, 112)
+};
+
/* Unpack a float to parts, but do not canonicalize. */
static inline FloatParts unpack_raw(FloatFmt fmt, uint64_t raw)
{
@@ -742,6 +761,51 @@ static FloatParts pick_nan_muladd(FloatParts a, FloatParts b, FloatParts c,
#undef SHR_JAM
#undef SUB
+#define FUNC(X) X##128
+#define FRAC_TYPE Int128
+#define PARTS_TYPE FloatParts128
+
+#define HI(P) int128_gethi(P)
+#define LO(P) int128_getlo(P)
+#define ZERO int128_zero()
+#define MONE int128_make128(-1, -1)
+#define ONE int128_one()
+
+#define ADD(P1, P2) int128_add(P1, P2)
+#define ADDI(P, I) int128_add(P, int128_make64(I))
+#define CLZ(P) int128_clz(P)
+#define EQ0(P) (!int128_nz(P))
+#define EQ(P1, P2) int128_eq(P1, P2)
+#define GEU(P1, P2) int128_geu(P1, P2)
+#define OR(P1, P2) int128_or(P1, P2)
+#define SHL(P, C) int128_shl(P, C)
+#define SHR(P, C) int128_shr(P, C)
+#define SHR_JAM(P, C) \
+ ({ uint64_t _h, _l; shift128RightJamming(HI(P), LO(P), C, &_h, &_l); \
+ int128_make128(_l, _h); })
+#define SUB(P1, P2) int128_sub(P1, P2)
+
+#include "softfloat-parts.c.inc"
+
+#undef FUNC
+#undef FRAC_TYPE
+#undef PARTS_TYPE
+#undef HI
+#undef LO
+#undef ZERO
+#undef MONE
+#undef ONE
+#undef ADD
+#undef ADDI
+#undef CLZ
+#undef EQ0
+#undef EQ
+#undef GEU
+#undef SHL
+#undef SHR
+#undef SHR_JAM
+#undef SUB
+
/* Explicit FloatFmt version */
static FloatParts float16a_unpack_canonical(float16 f, float_status *s,
const FloatFmt *params)
@@ -6664,225 +6728,59 @@ float128 float128_round_to_int(float128 a, float_status *status)
}
-/*----------------------------------------------------------------------------
-| Returns the result of adding the absolute values of the quadruple-precision
-| floating-point values `a' and `b'. If `zSign' is 1, the sum is negated
-| before being returned. `zSign' is ignored if the result is a NaN.
-| The addition is performed according to the IEC/IEEE Standard for Binary
-| Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-static float128 addFloat128Sigs(float128 a, float128 b, bool zSign,
- float_status *status)
+static FloatParts128 float128_unpack_raw(float128 f)
{
- int32_t aExp, bExp, zExp;
- uint64_t aSig0, aSig1, bSig0, bSig1, zSig0, zSig1, zSig2;
- int32_t expDiff;
-
- aSig1 = extractFloat128Frac1( a );
- aSig0 = extractFloat128Frac0( a );
- aExp = extractFloat128Exp( a );
- bSig1 = extractFloat128Frac1( b );
- bSig0 = extractFloat128Frac0( b );
- bExp = extractFloat128Exp( b );
- expDiff = aExp - bExp;
- if ( 0 < expDiff ) {
- if ( aExp == 0x7FFF ) {
- if (aSig0 | aSig1) {
- return propagateFloat128NaN(a, b, status);
- }
- return a;
- }
- if ( bExp == 0 ) {
- --expDiff;
- }
- else {
- bSig0 |= UINT64_C(0x0001000000000000);
- }
- shift128ExtraRightJamming(
- bSig0, bSig1, 0, expDiff, &bSig0, &bSig1, &zSig2 );
- zExp = aExp;
- }
- else if ( expDiff < 0 ) {
- if ( bExp == 0x7FFF ) {
- if (bSig0 | bSig1) {
- return propagateFloat128NaN(a, b, status);
- }
- return packFloat128( zSign, 0x7FFF, 0, 0 );
- }
- if ( aExp == 0 ) {
- ++expDiff;
- }
- else {
- aSig0 |= UINT64_C(0x0001000000000000);
- }
- shift128ExtraRightJamming(
- aSig0, aSig1, 0, - expDiff, &aSig0, &aSig1, &zSig2 );
- zExp = bExp;
- }
- else {
- if ( aExp == 0x7FFF ) {
- if ( aSig0 | aSig1 | bSig0 | bSig1 ) {
- return propagateFloat128NaN(a, b, status);
- }
- return a;
- }
- add128( aSig0, aSig1, bSig0, bSig1, &zSig0, &zSig1 );
- if ( aExp == 0 ) {
- if (status->flush_to_zero) {
- if (zSig0 | zSig1) {
- float_raise(float_flag_output_denormal, status);
- }
- return packFloat128(zSign, 0, 0, 0);
- }
- return packFloat128( zSign, 0, zSig0, zSig1 );
- }
- zSig2 = 0;
- zSig0 |= UINT64_C(0x0002000000000000);
- zExp = aExp;
- goto shiftRight1;
- }
- aSig0 |= UINT64_C(0x0001000000000000);
- add128( aSig0, aSig1, bSig0, bSig1, &zSig0, &zSig1 );
- --zExp;
- if ( zSig0 < UINT64_C(0x0002000000000000) ) goto roundAndPack;
- ++zExp;
- shiftRight1:
- shift128ExtraRightJamming(
- zSig0, zSig1, zSig2, 1, &zSig0, &zSig1, &zSig2 );
- roundAndPack:
- return roundAndPackFloat128(zSign, zExp, zSig0, zSig1, zSig2, status);
+ const int f_size = float128_params.frac_size;
+ const int e_size = float128_params.exp_size;
+ return (FloatParts128) {
+ .cls = float_class_unclassified,
+ .sign = extract64(f.high, f_size + e_size - 64, 1),
+ .exp = extract64(f.high, f_size - 64, e_size),
+ .frac = int128_make128(f.low, extract64(f.high, 0, f_size - 64))
+ };
}
-/*----------------------------------------------------------------------------
-| Returns the result of subtracting the absolute values of the quadruple-
-| precision floating-point values `a' and `b'. If `zSign' is 1, the
-| difference is negated before being returned. `zSign' is ignored if the
-| result is a NaN. The subtraction is performed according to the IEC/IEEE
-| Standard for Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-static float128 subFloat128Sigs(float128 a, float128 b, bool zSign,
- float_status *status)
+static float128 float128_pack_raw(FloatParts128 p)
{
- int32_t aExp, bExp, zExp;
- uint64_t aSig0, aSig1, bSig0, bSig1, zSig0, zSig1;
- int32_t expDiff;
-
- aSig1 = extractFloat128Frac1( a );
- aSig0 = extractFloat128Frac0( a );
- aExp = extractFloat128Exp( a );
- bSig1 = extractFloat128Frac1( b );
- bSig0 = extractFloat128Frac0( b );
- bExp = extractFloat128Exp( b );
- expDiff = aExp - bExp;
- shortShift128Left( aSig0, aSig1, 14, &aSig0, &aSig1 );
- shortShift128Left( bSig0, bSig1, 14, &bSig0, &bSig1 );
- if ( 0 < expDiff ) goto aExpBigger;
- if ( expDiff < 0 ) goto bExpBigger;
- if ( aExp == 0x7FFF ) {
- if ( aSig0 | aSig1 | bSig0 | bSig1 ) {
- return propagateFloat128NaN(a, b, status);
- }
- float_raise(float_flag_invalid, status);
- return float128_default_nan(status);
- }
- if ( aExp == 0 ) {
- aExp = 1;
- bExp = 1;
- }
- if ( bSig0 < aSig0 ) goto aBigger;
- if ( aSig0 < bSig0 ) goto bBigger;
- if ( bSig1 < aSig1 ) goto aBigger;
- if ( aSig1 < bSig1 ) goto bBigger;
- return packFloat128(status->float_rounding_mode == float_round_down,
- 0, 0, 0);
- bExpBigger:
- if ( bExp == 0x7FFF ) {
- if (bSig0 | bSig1) {
- return propagateFloat128NaN(a, b, status);
- }
- return packFloat128( zSign ^ 1, 0x7FFF, 0, 0 );
- }
- if ( aExp == 0 ) {
- ++expDiff;
- }
- else {
- aSig0 |= UINT64_C(0x4000000000000000);
- }
- shift128RightJamming( aSig0, aSig1, - expDiff, &aSig0, &aSig1 );
- bSig0 |= UINT64_C(0x4000000000000000);
- bBigger:
- sub128( bSig0, bSig1, aSig0, aSig1, &zSig0, &zSig1 );
- zExp = bExp;
- zSign ^= 1;
- goto normalizeRoundAndPack;
- aExpBigger:
- if ( aExp == 0x7FFF ) {
- if (aSig0 | aSig1) {
- return propagateFloat128NaN(a, b, status);
- }
- return a;
- }
- if ( bExp == 0 ) {
- --expDiff;
- }
- else {
- bSig0 |= UINT64_C(0x4000000000000000);
- }
- shift128RightJamming( bSig0, bSig1, expDiff, &bSig0, &bSig1 );
- aSig0 |= UINT64_C(0x4000000000000000);
- aBigger:
- sub128( aSig0, aSig1, bSig0, bSig1, &zSig0, &zSig1 );
- zExp = aExp;
- normalizeRoundAndPack:
- --zExp;
- return normalizeRoundAndPackFloat128(zSign, zExp - 14, zSig0, zSig1,
- status);
+ const int f_size = float128_params.frac_size;
+ const int e_size = float128_params.exp_size;
+ uint64_t h = int128_gethi(p.frac);
+ uint64_t l = int128_getlo(p.frac);
+ h = deposit64(h, f_size - 64, e_size, p.exp);
+ h = deposit64(h, f_size + e_size - 64, 1, p.sign);
+ return make_float128(h, l);
}
-/*----------------------------------------------------------------------------
-| Returns the result of adding the quadruple-precision floating-point values
-| `a' and `b'. The operation is performed according to the IEC/IEEE Standard
-| for Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
+static FloatParts128 float128_unpack_canonical(float128 f, float_status *s)
+{
+ return sf_canonicalize128(float128_unpack_raw(f), &float128_params, s);
+}
+
+static float128 float128_round_pack_canonical(FloatParts128 p, float_status *s)
+{
+ return float128_pack_raw(round_canonical128(p, s, &float128_params));
+}
+
+static float128 QEMU_FLATTEN
+float128_addsub(float128 a, float128 b, float_status *status, bool subtract)
+{
+ FloatParts128 pa = float128_unpack_canonical(a, status);
+ FloatParts128 pb = float128_unpack_canonical(b, status);
+ FloatParts128 pr = addsub_floats128(pa, pb, subtract, status);
+
+ return float128_round_pack_canonical(pr, status);
+}
float128 float128_add(float128 a, float128 b, float_status *status)
{
- bool aSign, bSign;
-
- aSign = extractFloat128Sign( a );
- bSign = extractFloat128Sign( b );
- if ( aSign == bSign ) {
- return addFloat128Sigs(a, b, aSign, status);
- }
- else {
- return subFloat128Sigs(a, b, aSign, status);
- }
-
+ return float128_addsub(a, b, status, false);
}
-/*----------------------------------------------------------------------------
-| Returns the result of subtracting the quadruple-precision floating-point
-| values `a' and `b'. The operation is performed according to the IEC/IEEE
-| Standard for Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
float128 float128_sub(float128 a, float128 b, float_status *status)
{
- bool aSign, bSign;
-
- aSign = extractFloat128Sign( a );
- bSign = extractFloat128Sign( b );
- if ( aSign == bSign ) {
- return subFloat128Sigs(a, b, aSign, status);
- }
- else {
- return addFloat128Sigs(a, b, aSign, status);
- }
-
+ return float128_addsub(a, b, status, true);
}
/*----------------------------------------------------------------------------
diff --git a/fpu/softfloat-specialize.c.inc b/fpu/softfloat-specialize.c.inc
index 0fe8ce408d..404d38071a 100644
--- a/fpu/softfloat-specialize.c.inc
+++ b/fpu/softfloat-specialize.c.inc
@@ -169,6 +169,23 @@ static FloatParts parts_default_nan(float_status *status)
};
}
+static FloatParts128 parts_default_nan128(float_status *status)
+{
+ FloatParts p = parts_default_nan(status);
+
+ /*
+ * Extrapolate from the choices made by parts_default_nan to fill
+ * in the quad-floating format. Copy the high bits across unchanged,
+ * and replicate the lsb to all lower bits.
+ */
+ return (FloatParts128) {
+ .cls = float_class_qnan,
+ .sign = p.sign,
+ .exp = INT_MAX,
+ .frac = int128_make128(-(p.frac & 1), p.frac)
+ };
+}
+
/*----------------------------------------------------------------------------
| Returns a quiet NaN from a signalling NaN for the deconstructed
| floating-point parts.
@@ -191,6 +208,22 @@ static FloatParts parts_silence_nan(FloatParts a, float_status *status)
return a;
}
+static FloatParts128 parts_silence_nan128(FloatParts128 a, float_status *s)
+{
+ g_assert(!no_signaling_nans(s));
+#if defined(TARGET_HPPA)
+ g_assert_not_reached();
+#endif
+ if (snan_bit_is_one(s)) {
+ return parts_default_nan128(s);
+ } else {
+ Int128 t = int128_make128(0, 1ULL << (DECOMPOSED_BINARY_POINT - 1));
+ a.frac = int128_or(a.frac, t);
+ }
+ a.cls = float_class_qnan;
+ return a;
+}
+
/*----------------------------------------------------------------------------
| The pattern for a default generated extended double-precision NaN.
*----------------------------------------------------------------------------*/
--
2.25.1
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [RFC PATCH 14/15] softfloat: Use float_cmask for addsub_floats
2020-10-21 4:51 [RFC PATCH 00/15] softfloat: alternate conversion of float128_addsub Richard Henderson
` (12 preceding siblings ...)
2020-10-21 4:51 ` [RFC PATCH 13/15] Test float128_addsub Richard Henderson
@ 2020-10-21 4:51 ` Richard Henderson
2020-10-21 4:51 ` [RFC PATCH 15/15] softfloat: Improve subtraction of equal exponent Richard Henderson
` (2 subsequent siblings)
16 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2020-10-21 4:51 UTC (permalink / raw)
To: qemu-devel; +Cc: alex.bennee
Testing more than one class at a time is better done with masks.
Sort a few case combinations before the NaN check, since NaN operands
should be assumed to be the least probable case. Share the pick_nan call between
the add and subtract cases.
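(A minimal, self-contained sketch of the mask idea; float_cmask and
float_cmask_normal come from the earlier patch in this series, and the enum
values below are illustrative rather than the exact QEMU definitions:)

#include <stdbool.h>

typedef enum { float_class_zero, float_class_normal, float_class_inf,
               float_class_qnan, float_class_snan } FloatClass;

#define float_cmask(cls)    (1u << (cls))
#define float_cmask_normal  float_cmask(float_class_normal)

/* A single compare now answers "are both operands normal?". */
static inline bool both_normal(FloatClass a, FloatClass b)
{
    return (float_cmask(a) | float_cmask(b)) == float_cmask_normal;
}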
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
fpu/softfloat-parts.c.inc | 70 +++++++++++++++++++++------------------
1 file changed, 37 insertions(+), 33 deletions(-)
diff --git a/fpu/softfloat-parts.c.inc b/fpu/softfloat-parts.c.inc
index 49bde45521..d2b6454903 100644
--- a/fpu/softfloat-parts.c.inc
+++ b/fpu/softfloat-parts.c.inc
@@ -247,13 +247,13 @@ static PARTS_TYPE
FUNC(addsub_floats)(PARTS_TYPE a, PARTS_TYPE b,
bool subtract, float_status *s)
{
- bool a_sign = a.sign;
bool b_sign = b.sign ^ subtract;
+ int ab_mask = float_cmask(a.cls) | float_cmask(b.cls);
- if (a_sign != b_sign) {
+ if (a.sign != b_sign) {
/* Subtraction */
- if (a.cls == float_class_normal && b.cls == float_class_normal) {
+ if (likely(ab_mask == float_cmask_normal)) {
if (a.exp > b.exp || (a.exp == b.exp && GEU(a.frac, b.frac))) {
b.frac = SHR_JAM(b.frac, a.exp - b.exp);
a.frac = SUB(a.frac, b.frac);
@@ -261,7 +261,7 @@ FUNC(addsub_floats)(PARTS_TYPE a, PARTS_TYPE b,
a.frac = SHR_JAM(a.frac, b.exp - a.exp);
a.frac = SUB(b.frac, a.frac);
a.exp = b.exp;
- a_sign ^= 1;
+ a.sign ^= 1;
}
if (EQ0(a.frac)) {
@@ -270,35 +270,37 @@ FUNC(addsub_floats)(PARTS_TYPE a, PARTS_TYPE b,
} else {
int shift = CLZ(a.frac) - 1;
a.frac = SHL(a.frac, shift);
- a.exp = a.exp - shift;
- a.sign = a_sign;
+ a.exp -= shift;
}
return a;
}
- if (is_nan(a.cls) || is_nan(b.cls)) {
- return FUNC(pick_nan)(a, b, s);
- }
- if (a.cls == float_class_inf) {
- if (b.cls == float_class_inf) {
- float_raise(float_flag_invalid, s);
- return FUNC(parts_default_nan)(s);
- }
- return a;
- }
- if (a.cls == float_class_zero && b.cls == float_class_zero) {
+
+ /* 0 - 0 */
+ if (ab_mask == float_cmask_zero) {
a.sign = s->float_rounding_mode == float_round_down;
return a;
}
- if (a.cls == float_class_zero || b.cls == float_class_inf) {
- b.sign = a_sign ^ 1;
- return b;
+
+ /* Inf - Inf */
+ if (unlikely(ab_mask == float_cmask_inf)) {
+ float_raise(float_flag_invalid, s);
+ return FUNC(parts_default_nan)(s);
}
- if (b.cls == float_class_zero) {
- return a;
+
+ if (!(ab_mask & float_cmask_anynan)) {
+ if (a.cls == float_class_inf || b.cls == float_class_zero) {
+ return a;
+ }
+ if (b.cls == float_class_inf || a.cls == float_class_zero) {
+ b.sign = a.sign ^ 1;
+ return b;
+ }
+ g_assert_not_reached();
}
} else {
/* Addition */
- if (a.cls == float_class_normal && b.cls == float_class_normal) {
+
+ if (likely(ab_mask == float_cmask_normal)) {
if (a.exp > b.exp) {
b.frac = SHR_JAM(b.frac, a.exp - b.exp);
} else if (a.exp < b.exp) {
@@ -312,16 +314,18 @@ FUNC(addsub_floats)(PARTS_TYPE a, PARTS_TYPE b,
}
return a;
}
- if (is_nan(a.cls) || is_nan(b.cls)) {
- return FUNC(pick_nan)(a, b, s);
- }
- if (a.cls == float_class_inf || b.cls == float_class_zero) {
- return a;
- }
- if (b.cls == float_class_inf || a.cls == float_class_zero) {
- b.sign = b_sign;
- return b;
+
+ if (!(ab_mask & float_cmask_anynan)) {
+ if (a.cls == float_class_inf || b.cls == float_class_zero) {
+ return a;
+ }
+ if (b.cls == float_class_inf || a.cls == float_class_zero) {
+ b.sign = b_sign;
+ return b;
+ }
+ g_assert_not_reached();
}
}
- g_assert_not_reached();
+
+ return FUNC(pick_nan)(a, b, s);
}
--
2.25.1
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [RFC PATCH 15/15] softfloat: Improve subtraction of equal exponent
2020-10-21 4:51 [RFC PATCH 00/15] softfloat: alternate conversion of float128_addsub Richard Henderson
` (13 preceding siblings ...)
2020-10-21 4:51 ` [RFC PATCH 14/15] softfloat: Use float_cmask for addsub_floats Richard Henderson
@ 2020-10-21 4:51 ` Richard Henderson
2020-10-21 5:12 ` [RFC PATCH 00/15] softfloat: alternate conversion of float128_addsub no-reply
2020-10-21 17:46 ` Alex Bennée
16 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2020-10-21 4:51 UTC (permalink / raw)
To: qemu-devel; +Cc: alex.bennee
Rather than compare the fractions before subtracting, do the
subtract and examine the result, possibly negating it.
Looking toward re-using addsub_floats(N**2) for the addition
stage of muladd_floats(N), this will be important because of the
longer fraction sizes.
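(A sketch of the equal-exponent subtraction for 64-bit fractions, assuming
the decomposed layout used in this file, where any borrow propagates into
DECOMPOSED_OVERFLOW_BIT (bit 63); illustrative only, not the exact hunk below:)

#include <stdbool.h>
#include <stdint.h>

/* Subtract first; if the result wrapped negative, negate it and ask the
 * caller to flip the sign, instead of comparing the fractions up front. */
static uint64_t frac_sub_same_exp(uint64_t a_frac, uint64_t b_frac,
                                  bool *flip_sign)
{
    uint64_t r = b_frac - a_frac;

    *flip_sign = false;
    if (r & (1ULL << 63)) {         /* DECOMPOSED_OVERFLOW_BIT for 64 bits */
        r = -r;
        *flip_sign = true;
    }
    return r;
}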
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
fpu/softfloat.c | 4 ++++
fpu/softfloat-parts.c.inc | 32 ++++++++++++++++++++------------
2 files changed, 24 insertions(+), 12 deletions(-)
diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 294c573fb9..bf808a1b74 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -732,6 +732,7 @@ static FloatParts pick_nan_muladd(FloatParts a, FloatParts b, FloatParts c,
#define EQ0(P) ((P) == 0)
#define EQ(P1, P2) ((P1) == (P2))
#define GEU(P1, P2) ((P1) >= (P2))
+#define NEG(P) (-(P))
#define OR(P1, P2) ((P1) | (P2))
#define SHL(P, C) ((P) << (C))
#define SHR(P, C) ((P) >> (C))
@@ -755,6 +756,7 @@ static FloatParts pick_nan_muladd(FloatParts a, FloatParts b, FloatParts c,
#undef EQ0
#undef EQ
#undef GEU
+#undef NEG
#undef OR
#undef SHL
#undef SHR
@@ -777,6 +779,7 @@ static FloatParts pick_nan_muladd(FloatParts a, FloatParts b, FloatParts c,
#define EQ0(P) (!int128_nz(P))
#define EQ(P1, P2) int128_eq(P1, P2)
#define GEU(P1, P2) int128_geu(P1, P2)
+#define NEG(P) int128_neg(P)
#define OR(P1, P2) int128_or(P1, P2)
#define SHL(P, C) int128_shl(P, C)
#define SHR(P, C) int128_shr(P, C)
@@ -801,6 +804,7 @@ static FloatParts pick_nan_muladd(FloatParts a, FloatParts b, FloatParts c,
#undef EQ0
#undef EQ
#undef GEU
+#undef NEG
#undef SHL
#undef SHR
#undef SHR_JAM
diff --git a/fpu/softfloat-parts.c.inc b/fpu/softfloat-parts.c.inc
index d2b6454903..9762cf8b66 100644
--- a/fpu/softfloat-parts.c.inc
+++ b/fpu/softfloat-parts.c.inc
@@ -254,29 +254,37 @@ FUNC(addsub_floats)(PARTS_TYPE a, PARTS_TYPE b,
/* Subtraction */
if (likely(ab_mask == float_cmask_normal)) {
- if (a.exp > b.exp || (a.exp == b.exp && GEU(a.frac, b.frac))) {
- b.frac = SHR_JAM(b.frac, a.exp - b.exp);
+ int shift, diff_exp = a.exp - b.exp;
+
+ if (diff_exp > 0) {
+ b.frac = SHR_JAM(b.frac, diff_exp);
a.frac = SUB(a.frac, b.frac);
- } else {
- a.frac = SHR_JAM(a.frac, b.exp - a.exp);
+ } else if (diff_exp < 0) {
+ a.frac = SHR_JAM(a.frac, -diff_exp);
a.frac = SUB(b.frac, a.frac);
a.exp = b.exp;
a.sign ^= 1;
+ } else {
+ a.frac = SUB(b.frac, a.frac);
+ /* a.frac < b.frac results in carry into the overflow bit. */
+ if (HI(a.frac) & DECOMPOSED_OVERFLOW_BIT) {
+ a.frac = NEG(a.frac);
+ a.sign ^= 1;
+ } else if (EQ0(a.frac)) {
+ a.cls = float_class_zero;
+ goto sub_zero;
+ }
}
- if (EQ0(a.frac)) {
- a.cls = float_class_zero;
- a.sign = s->float_rounding_mode == float_round_down;
- } else {
- int shift = CLZ(a.frac) - 1;
- a.frac = SHL(a.frac, shift);
- a.exp -= shift;
- }
+ shift = CLZ(a.frac) - 1;
+ a.frac = SHL(a.frac, shift);
+ a.exp -= shift;
return a;
}
/* 0 - 0 */
if (ab_mask == float_cmask_zero) {
+ sub_zero:
a.sign = s->float_rounding_mode == float_round_down;
return a;
}
--
2.25.1
^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 00/15] softfloat: alternate conversion of float128_addsub
2020-10-21 4:51 [RFC PATCH 00/15] softfloat: alternate conversion of float128_addsub Richard Henderson
` (14 preceding siblings ...)
2020-10-21 4:51 ` [RFC PATCH 15/15] softfloat: Improve subtraction of equal exponent Richard Henderson
@ 2020-10-21 5:12 ` no-reply
2020-10-21 17:46 ` Alex Bennée
16 siblings, 0 replies; 30+ messages in thread
From: no-reply @ 2020-10-21 5:12 UTC (permalink / raw)
To: richard.henderson; +Cc: alex.bennee, qemu-devel
Patchew URL: https://patchew.org/QEMU/20201021045149.1582203-1-richard.henderson@linaro.org/
Hi,
This series seems to have some coding style problems. See output below for
more information:
Type: series
Message-id: 20201021045149.1582203-1-richard.henderson@linaro.org
Subject: [RFC PATCH 00/15] softfloat: alternate conversion of float128_addsub
=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===
Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
* [new tag] patchew/20201021045149.1582203-1-richard.henderson@linaro.org -> patchew/20201021045149.1582203-1-richard.henderson@linaro.org
Switched to a new branch 'test'
ad120c1 softfloat: Improve subtraction of equal exponent
12eb5a4 softfloat: Use float_cmask for addsub_floats
a04ff7d Test float128_addsub
fc9537e softfloat: Streamline FloatFmt
979beb6 Test split to softfloat-parts.c.inc
4317991 softfloat: Inline float_raise
4689bd2 softfloat: Add float_cmask and constants
b1141ee softfloat: Tidy a * b + inf return
197273c softfloat: Use int128.h for some operations
aa4afa2 softfloat: Use mulu64 for mul64To128
017d276 qemu/int128: Add int128_geu
71dd5f1 qemu/int128: Add int128_shr
9144df9 qemu/int128: Rename int128_rshift, int128_lshift
b6c9afb qemu/int128: Add int128_clz, int128_ctz
0ceff9a qemu/int128: Add int128_or
=== OUTPUT BEGIN ===
1/15 Checking commit 0ceff9a14aa6 (qemu/int128: Add int128_or)
2/15 Checking commit b6c9afb58357 (qemu/int128: Add int128_clz, int128_ctz)
3/15 Checking commit 9144df990b17 (qemu/int128: Rename int128_rshift, int128_lshift)
4/15 Checking commit 71dd5f157a39 (qemu/int128: Add int128_shr)
5/15 Checking commit 017d27627112 (qemu/int128: Add int128_geu)
6/15 Checking commit aa4afa22ee78 (softfloat: Use mulu64 for mul64To128)
7/15 Checking commit 197273c0aeda (softfloat: Use int128.h for some operations)
8/15 Checking commit b1141eecc368 (softfloat: Tidy a * b + inf return)
9/15 Checking commit 4689bd26fd66 (softfloat: Add float_cmask and constants)
10/15 Checking commit 4317991dcbd8 (softfloat: Inline float_raise)
11/15 Checking commit 979beb676e89 (Test split to softfloat-parts.c.inc)
WARNING: Block comments use a leading /* on a separate line
#557: FILE: fpu/softfloat.c:685:
+ /* Note that this check is after pickNaNMulAdd so that function
ERROR: Missing Signed-off-by: line(s)
total: 1 errors, 1 warnings, 786 lines checked
Patch 11/15 has style problems, please review. If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
12/15 Checking commit fc9537e66b73 (softfloat: Streamline FloatFmt)
13/15 Checking commit a04ff7dcb003 (Test float128_addsub)
ERROR: Missing Signed-off-by: line(s)
total: 1 errors, 0 warnings, 405 lines checked
Patch 13/15 has style problems, please review. If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
14/15 Checking commit 12eb5a486720 (softfloat: Use float_cmask for addsub_floats)
15/15 Checking commit ad120c1d1ae8 (softfloat: Improve subtraction of equal exponent)
=== OUTPUT END ===
Test command exited with code: 1
The full log is available at
http://patchew.org/logs/20201021045149.1582203-1-richard.henderson@linaro.org/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 00/15] softfloat: alternate conversion of float128_addsub
2020-10-21 4:51 [RFC PATCH 00/15] softfloat: alternate conversion of float128_addsub Richard Henderson
` (15 preceding siblings ...)
2020-10-21 5:12 ` [RFC PATCH 00/15] softfloat: alternate conversion of float128_addsub no-reply
@ 2020-10-21 17:46 ` Alex Bennée
2020-10-21 17:53 ` Richard Henderson
16 siblings, 1 reply; 30+ messages in thread
From: Alex Bennée @ 2020-10-21 17:46 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel
Richard Henderson <richard.henderson@linaro.org> writes:
> Hi Alex,
>
> Here's my first adjustment to your conversion for 128-bit floats.
>
> The Idea is to use a set of macros and an include file so that we
> can re-use the same large chunk of code that performs the basic
> operations on various fraction lengths. It's ugly, but without
> proper language support it seems to be less ugly than most.
>
> While I've just gone and added lots of stuff to int128... I have
> had another idea, half-baked because I'm tired and it's late:
>
> typedef struct {
> FloatClass cls;
> int exp;
> bool sign;
> uint64_t frac[];
> } FloatPartsBase;
>
> typedef struct {
> FloatPartsBase base;
> uint64_t frac;
> } FloatParts64;
>
> typedef struct {
> FloatPartsBase base;
> uint64_t frac_hi, frac_lo;
> } FloatParts128;
>
> typedef struct {
> FloatPartsBase base;
> uint64_t frac[4]; /* big endian word ordering */
> } FloatParts256;
>
> This layout, with the big-endian ordering, means that storage
> can be shared between them, just by ignoring the least significant
> words of the fraction as needed. Which may make muladd more
> understandable.
Would the big-endian formatting hamper the compiler on x86 where it can
do extra wide operations?
I am still seeing a multi MFlop drop in performance when converting the
float128_addsub to the new code. If this allows the compiler to do
better on the code I can live with it.
>
> E.g.
>
> static void muladd_floats64(FloatParts128 *r, FloatParts64 *a,
> FloatParts64 *b, FloatParts128 *c, ...)
> {
> // handle nans
> // produce 128-bit product into r
> // handle p vs c special cases.
> // zero-extend c to 128-bits
> c->frac[1] = 0;
> // perform 128-bit fractional addition
> addsub_floats128(r, c, ...);
> // fold 128-bit fraction to 64-bit sticky bit.
> r->frac[0] |= r->frac[1] != 0;
> }
>
> float64 float64_muladd(float64 a, float64 b, float64 c, ...)
> {
> FloatParts64 pa, pb;
> FloatParts128 pc, pr;
>
> float64_unpack_canonical(&pa.base, a, status);
> float64_unpack_canonical(&pb.base, b, status);
> float64_unpack_canonical(&pc.base, c, status);
> muladd_floats64(&pr, &pa, &pb, &pc, flags, status);
>
> return float64_round_pack_canonical(&pr.base, status);
> }
>
> Similarly, muladd_floats128 would use addsub_floats256.
>
> However, the big-endian word ordering means that Int128
> cannot be used directly; so a set of wrappers are needed.
> If added the Int128 routine just for use here, then it's
> probably easier to bypass Int128 and just code it here.
Are you talking about all our operations? Will we still need to #ifdef
CONFIG_INT128 in the softfloat code?
>
> Thoughts?
>
>
> r~
>
>
> Richard Henderson (15):
> qemu/int128: Add int128_or
> qemu/int128: Add int128_clz, int128_ctz
> qemu/int128: Rename int128_rshift, int128_lshift
> qemu/int128: Add int128_shr
> qemu/int128: Add int128_geu
> softfloat: Use mulu64 for mul64To128
> softfloat: Use int128.h for some operations
> softfloat: Tidy a * b + inf return
> softfloat: Add float_cmask and constants
> softfloat: Inline float_raise
> Test split to softfloat-parts.c.inc
> softfloat: Streamline FloatFmt
> Test float128_addsub
> softfloat: Use float_cmask for addsub_floats
> softfloat: Improve subtraction of equal exponent
>
> include/fpu/softfloat-macros.h | 89 ++--
> include/fpu/softfloat.h | 5 +-
> include/qemu/int128.h | 61 ++-
> fpu/softfloat.c | 802 ++++++++++-----------------------
> softmmu/physmem.c | 4 +-
> target/ppc/int_helper.c | 4 +-
> tests/test-int128.c | 44 +-
> fpu/softfloat-parts.c.inc | 339 ++++++++++++++
> fpu/softfloat-specialize.c.inc | 45 +-
> 9 files changed, 716 insertions(+), 677 deletions(-)
> create mode 100644 fpu/softfloat-parts.c.inc
--
Alex Bennée
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 00/15] softfloat: alternate conversion of float128_addsub
2020-10-21 17:46 ` Alex Bennée
@ 2020-10-21 17:53 ` Richard Henderson
0 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2020-10-21 17:53 UTC (permalink / raw)
To: Alex Bennée; +Cc: qemu-devel
On 10/21/20 10:46 AM, Alex Bennée wrote:
>> This layout, with the big-endian ordering, means that storage
>> can be shared between them, just by ignoring the least significant
>> words of the fraction as needed. Which may make muladd more
>> understandable.
>
> Would the big-endian formatting hamper the compiler on x86 where it can
> do extra wide operations?
Well, you couldn't just use Int128 in the structure. But you could write the
helpers via int128_make128/getlo/gethi, which would still get the compiler
expansion.
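(A minimal sketch of such a wrapper, assuming the big-endian frac[] layout
proposed above and the existing qemu/int128.h helpers; the function name is
made up for illustration:)

#include "qemu/int128.h"

/* Add two 128-bit fractions stored big-endian as frac[0] (high word) and
 * frac[1] (low word), going through Int128 so the compiler can still use
 * its native 128-bit expansion where available. */
static inline void frac128_add(uint64_t r[2],
                               const uint64_t a[2], const uint64_t b[2])
{
    Int128 t = int128_add(int128_make128(a[1], a[0]),   /* (lo, hi) */
                          int128_make128(b[1], b[0]));
    r[0] = int128_gethi(t);
    r[1] = int128_getlo(t);
}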
>> However, the big-endian word ordering means that Int128
>> cannot be used directly; so a set of wrappers are needed.
>> If added the Int128 routine just for use here, then it's
>> probably easier to bypass Int128 and just code it here.
>
> Are you talking about all our operations? Will we still need to#ifdef
> CONFIG_INT128 in the softfloat code?
If we decline to put the operations into qemu/int128.h, because they're not
generally useful, then yes, we may put those ifdefs into our softfloat code.
r~
^ permalink raw reply [flat|nested] 30+ messages in thread