* [RFC PATCH 01/15] qemu/int128: Add int128_or
2020-10-21 4:51 [RFC PATCH 00/15] softfloat: alternate conversion of float128_addsub Richard Henderson
@ 2020-10-21 4:51 ` Richard Henderson
2020-10-21 17:13 ` Alex Bennée
` (3 more replies)
2020-10-21 4:51 ` [RFC PATCH 02/15] qemu/int128: Add int128_clz, int128_ctz Richard Henderson
` (15 subsequent siblings)
16 siblings, 4 replies; 30+ messages in thread
From: Richard Henderson @ 2020-10-21 4:51 UTC (permalink / raw)
To: qemu-devel; +Cc: alex.bennee
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
include/qemu/int128.h | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/include/qemu/int128.h b/include/qemu/int128.h
index 76ea405922..52fc238421 100644
--- a/include/qemu/int128.h
+++ b/include/qemu/int128.h
@@ -58,6 +58,11 @@ static inline Int128 int128_and(Int128 a, Int128 b)
return a & b;
}
+static inline Int128 int128_or(Int128 a, Int128 b)
+{
+ return a | b;
+}
+
static inline Int128 int128_rshift(Int128 a, int n)
{
return a >> n;
@@ -208,6 +213,11 @@ static inline Int128 int128_and(Int128 a, Int128 b)
return (Int128) { a.lo & b.lo, a.hi & b.hi };
}
+static inline Int128 int128_or(Int128 a, Int128 b)
+{
+ return (Int128) { a.lo | b.lo, a.hi | b.hi };
+}
+
static inline Int128 int128_rshift(Int128 a, int n)
{
int64_t h;
--
2.25.1
* Re: [RFC PATCH 01/15] qemu/int128: Add int128_or
2020-10-21 4:51 ` [RFC PATCH 01/15] qemu/int128: Add int128_or Richard Henderson
@ 2020-10-21 17:13 ` Alex Bennée
2020-10-29 15:01 ` Taylor Simpson
` (2 subsequent siblings)
3 siblings, 0 replies; 30+ messages in thread
From: Alex Bennée @ 2020-10-21 17:13 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel
Richard Henderson <richard.henderson@linaro.org> writes:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
--
Alex Bennée
* RE: [RFC PATCH 01/15] qemu/int128: Add int128_or
2020-10-21 4:51 ` [RFC PATCH 01/15] qemu/int128: Add int128_or Richard Henderson
2020-10-21 17:13 ` Alex Bennée
@ 2020-10-29 15:01 ` Taylor Simpson
2020-10-29 18:09 ` Philippe Mathieu-Daudé
2021-02-14 18:17 ` Philippe Mathieu-Daudé
3 siblings, 0 replies; 30+ messages in thread
From: Taylor Simpson @ 2020-10-29 15:01 UTC (permalink / raw)
To: Richard Henderson, qemu-devel@nongnu.org; +Cc: alex.bennee@linaro.org
Reviewed-by: Taylor Simpson <tsimpson@quicinc.com>
> -----Original Message-----
> From: Qemu-devel <qemu-devel-
> bounces+tsimpson=quicinc.com@nongnu.org> On Behalf Of Richard
> Henderson
> Sent: Tuesday, October 20, 2020 11:52 PM
> To: qemu-devel@nongnu.org
> Cc: alex.bennee@linaro.org
> Subject: [RFC PATCH 01/15] qemu/int128: Add int128_or
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> include/qemu/int128.h | 10 ++++++++++
> 1 file changed, 10 insertions(+)
>
> diff --git a/include/qemu/int128.h b/include/qemu/int128.h
> index 76ea405922..52fc238421 100644
> --- a/include/qemu/int128.h
> +++ b/include/qemu/int128.h
> @@ -58,6 +58,11 @@ static inline Int128 int128_and(Int128 a, Int128 b)
> return a & b;
> }
>
> +static inline Int128 int128_or(Int128 a, Int128 b)
> +{
> + return a | b;
> +}
> +
> static inline Int128 int128_rshift(Int128 a, int n)
> {
> return a >> n;
> @@ -208,6 +213,11 @@ static inline Int128 int128_and(Int128 a, Int128 b)
> return (Int128) { a.lo & b.lo, a.hi & b.hi };
> }
>
> +static inline Int128 int128_or(Int128 a, Int128 b)
> +{
> + return (Int128) { a.lo | b.lo, a.hi | b.hi };
> +}
> +
> static inline Int128 int128_rshift(Int128 a, int n)
> {
> int64_t h;
> --
> 2.25.1
>
>
* Re: [RFC PATCH 01/15] qemu/int128: Add int128_or
2020-10-21 4:51 ` [RFC PATCH 01/15] qemu/int128: Add int128_or Richard Henderson
2020-10-21 17:13 ` Alex Bennée
2020-10-29 15:01 ` Taylor Simpson
@ 2020-10-29 18:09 ` Philippe Mathieu-Daudé
2021-02-14 18:17 ` Philippe Mathieu-Daudé
3 siblings, 0 replies; 30+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-10-29 18:09 UTC (permalink / raw)
To: Richard Henderson, qemu-devel; +Cc: alex.bennee
On 10/21/20 6:51 AM, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> include/qemu/int128.h | 10 ++++++++++
> 1 file changed, 10 insertions(+)
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
* Re: [RFC PATCH 01/15] qemu/int128: Add int128_or
2020-10-21 4:51 ` [RFC PATCH 01/15] qemu/int128: Add int128_or Richard Henderson
` (2 preceding siblings ...)
2020-10-29 18:09 ` Philippe Mathieu-Daudé
@ 2021-02-14 18:17 ` Philippe Mathieu-Daudé
3 siblings, 0 replies; 30+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-02-14 18:17 UTC (permalink / raw)
To: Richard Henderson, qemu-devel; +Cc: alex.bennee
On 10/21/20 6:51 AM, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> include/qemu/int128.h | 10 ++++++++++
> 1 file changed, 10 insertions(+)
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
* [RFC PATCH 02/15] qemu/int128: Add int128_clz, int128_ctz
2020-10-21 4:51 [RFC PATCH 00/15] softfloat: alternate conversion of float128_addsub Richard Henderson
2020-10-21 4:51 ` [RFC PATCH 01/15] qemu/int128: Add int128_or Richard Henderson
@ 2020-10-21 4:51 ` Richard Henderson
2020-10-21 17:13 ` Alex Bennée
2021-02-14 18:17 ` Philippe Mathieu-Daudé
2020-10-21 4:51 ` [RFC PATCH 03/15] qemu/int128: Rename int128_rshift, int128_lshift Richard Henderson
` (14 subsequent siblings)
16 siblings, 2 replies; 30+ messages in thread
From: Richard Henderson @ 2020-10-21 4:51 UTC (permalink / raw)
To: qemu-devel; +Cc: alex.bennee
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
include/qemu/int128.h | 17 +++++++++++++++--
1 file changed, 15 insertions(+), 2 deletions(-)
diff --git a/include/qemu/int128.h b/include/qemu/int128.h
index 52fc238421..055f202d08 100644
--- a/include/qemu/int128.h
+++ b/include/qemu/int128.h
@@ -1,9 +1,9 @@
#ifndef INT128_H
#define INT128_H
-#ifdef CONFIG_INT128
-#include "qemu/bswap.h"
+#include "qemu/host-utils.h"
+#ifdef CONFIG_INT128
typedef __int128_t Int128;
static inline Int128 int128_make64(uint64_t a)
@@ -328,4 +328,17 @@ static inline void int128_subfrom(Int128 *a, Int128 b)
}
#endif /* CONFIG_INT128 */
+
+static inline int int128_clz(Int128 a)
+{
+ uint64_t h = int128_gethi(a);
+ return h ? clz64(h) : 64 + clz64(int128_getlo(a));
+}
+
+static inline int int128_ctz(Int128 a)
+{
+ uint64_t l = int128_getlo(a);
+ return l ? ctz64(l) : 64 + ctz64(int128_gethi(a));
+}
+
#endif /* INT128_H */
--
2.25.1
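A minimal sketch of how the fallback composes a 128-bit count from the
64-bit halves, assuming a QEMU build environment with the helpers above
(the test value is illustrative):

    #include "qemu/osdep.h"
    #include "qemu/int128.h"

    static void check_clz_ctz(void)
    {
        /* 2^80: high half is 1 << 16, low half is 0. */
        Int128 x = int128_make128(0, 1ull << 16);

        /* High half non-zero: count leading zeros within it only. */
        g_assert(int128_clz(x) == 47);

        /* Low half zero: 64 plus the trailing zeros of the high half. */
        g_assert(int128_ctz(x) == 80);
    }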
* Re: [RFC PATCH 02/15] qemu/int128: Add int128_clz, int128_ctz
2020-10-21 4:51 ` [RFC PATCH 02/15] qemu/int128: Add int128_clz, int128_ctz Richard Henderson
@ 2020-10-21 17:13 ` Alex Bennée
2021-02-14 18:17 ` Philippe Mathieu-Daudé
1 sibling, 0 replies; 30+ messages in thread
From: Alex Bennée @ 2020-10-21 17:13 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel
Richard Henderson <richard.henderson@linaro.org> writes:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
--
Alex Bennée
* Re: [RFC PATCH 02/15] qemu/int128: Add int128_clz, int128_ctz
2020-10-21 4:51 ` [RFC PATCH 02/15] qemu/int128: Add int128_clz, int128_ctz Richard Henderson
2020-10-21 17:13 ` Alex Bennée
@ 2021-02-14 18:17 ` Philippe Mathieu-Daudé
1 sibling, 0 replies; 30+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-02-14 18:17 UTC (permalink / raw)
To: Richard Henderson, qemu-devel; +Cc: alex.bennee
On 10/21/20 6:51 AM, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> include/qemu/int128.h | 17 +++++++++++++++--
> 1 file changed, 15 insertions(+), 2 deletions(-)
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
* [RFC PATCH 03/15] qemu/int128: Rename int128_rshift, int128_lshift
2020-10-21 4:51 [RFC PATCH 00/15] softfloat: alternate conversion of float128_addsub Richard Henderson
2020-10-21 4:51 ` [RFC PATCH 01/15] qemu/int128: Add int128_or Richard Henderson
2020-10-21 4:51 ` [RFC PATCH 02/15] qemu/int128: Add int128_clz, int128_ctz Richard Henderson
@ 2020-10-21 4:51 ` Richard Henderson
2020-10-21 17:14 ` Alex Bennée
2021-02-14 18:18 ` Philippe Mathieu-Daudé
2020-10-21 4:51 ` [RFC PATCH 04/15] qemu/int128: Add int128_shr Richard Henderson
` (13 subsequent siblings)
16 siblings, 2 replies; 30+ messages in thread
From: Richard Henderson @ 2020-10-21 4:51 UTC (permalink / raw)
To: qemu-devel; +Cc: alex.bennee
Change these to sar/shl to emphasize the signed shift.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
include/qemu/int128.h | 8 ++++----
softmmu/physmem.c | 4 ++--
target/ppc/int_helper.c | 4 ++--
tests/test-int128.c | 44 ++++++++++++++++++++---------------------
4 files changed, 30 insertions(+), 30 deletions(-)
diff --git a/include/qemu/int128.h b/include/qemu/int128.h
index 055f202d08..167f13ae10 100644
--- a/include/qemu/int128.h
+++ b/include/qemu/int128.h
@@ -63,12 +63,12 @@ static inline Int128 int128_or(Int128 a, Int128 b)
return a | b;
}
-static inline Int128 int128_rshift(Int128 a, int n)
+static inline Int128 int128_sar(Int128 a, int n)
{
return a >> n;
}
-static inline Int128 int128_lshift(Int128 a, int n)
+static inline Int128 int128_shl(Int128 a, int n)
{
return a << n;
}
@@ -218,7 +218,7 @@ static inline Int128 int128_or(Int128 a, Int128 b)
return (Int128) { a.lo | b.lo, a.hi | b.hi };
}
-static inline Int128 int128_rshift(Int128 a, int n)
+static inline Int128 int128_sar(Int128 a, int n)
{
int64_t h;
if (!n) {
@@ -232,7 +232,7 @@ static inline Int128 int128_rshift(Int128 a, int n)
}
}
-static inline Int128 int128_lshift(Int128 a, int n)
+static inline Int128 int128_shl(Int128 a, int n)
{
uint64_t l = a.lo << (n & 63);
if (n >= 64) {
diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index e319fb2a1e..7f6e98e7b0 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -1156,8 +1156,8 @@ static void register_multipage(FlatView *fv,
AddressSpaceDispatch *d = flatview_to_dispatch(fv);
hwaddr start_addr = section->offset_within_address_space;
uint16_t section_index = phys_section_add(&d->map, section);
- uint64_t num_pages = int128_get64(int128_rshift(section->size,
- TARGET_PAGE_BITS));
+ uint64_t num_pages = int128_get64(int128_sar(section->size,
+ TARGET_PAGE_BITS));
assert(num_pages);
phys_page_set(d, start_addr >> TARGET_PAGE_BITS, num_pages, section_index);
diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index b45626f44c..fe569590b4 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -1444,7 +1444,7 @@ void helper_vlogefp(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *b)
} else { \
index = ((15 - (a & 0xf) + 1) * 8) - size; \
} \
- return int128_getlo(int128_rshift(b->s128, index)) & \
+ return int128_getlo(int128_sar(b->s128, index)) & \
MAKE_64BIT_MASK(0, size); \
}
#else
@@ -1457,7 +1457,7 @@ void helper_vlogefp(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *b)
} else { \
index = (a & 0xf) * 8; \
} \
- return int128_getlo(int128_rshift(b->s128, index)) & \
+ return int128_getlo(int128_sar(b->s128, index)) & \
MAKE_64BIT_MASK(0, size); \
}
#endif
diff --git a/tests/test-int128.c b/tests/test-int128.c
index b86a3c76e6..9bd6cb59ec 100644
--- a/tests/test-int128.c
+++ b/tests/test-int128.c
@@ -176,34 +176,34 @@ static void test_gt(void)
/* Make sure to test undefined behavior at runtime! */
static void __attribute__((__noinline__)) ATTRIBUTE_NOCLONE
-test_rshift_one(uint32_t x, int n, uint64_t h, uint64_t l)
+test_sar_one(uint32_t x, int n, uint64_t h, uint64_t l)
{
Int128 a = expand(x);
- Int128 r = int128_rshift(a, n);
+ Int128 r = int128_sar(a, n);
g_assert_cmpuint(int128_getlo(r), ==, l);
g_assert_cmpuint(int128_gethi(r), ==, h);
}
-static void test_rshift(void)
+static void test_sar(void)
{
- test_rshift_one(0x00010000U, 64, 0x0000000000000000ULL, 0x0000000000000001ULL);
- test_rshift_one(0x80010000U, 64, 0xFFFFFFFFFFFFFFFFULL, 0x8000000000000001ULL);
- test_rshift_one(0x7FFE0000U, 64, 0x0000000000000000ULL, 0x7FFFFFFFFFFFFFFEULL);
- test_rshift_one(0xFFFE0000U, 64, 0xFFFFFFFFFFFFFFFFULL, 0xFFFFFFFFFFFFFFFEULL);
- test_rshift_one(0x00010000U, 60, 0x0000000000000000ULL, 0x0000000000000010ULL);
- test_rshift_one(0x80010000U, 60, 0xFFFFFFFFFFFFFFF8ULL, 0x0000000000000010ULL);
- test_rshift_one(0x00018000U, 60, 0x0000000000000000ULL, 0x0000000000000018ULL);
- test_rshift_one(0x80018000U, 60, 0xFFFFFFFFFFFFFFF8ULL, 0x0000000000000018ULL);
- test_rshift_one(0x7FFE0000U, 60, 0x0000000000000007ULL, 0xFFFFFFFFFFFFFFE0ULL);
- test_rshift_one(0xFFFE0000U, 60, 0xFFFFFFFFFFFFFFFFULL, 0xFFFFFFFFFFFFFFE0ULL);
- test_rshift_one(0x7FFE8000U, 60, 0x0000000000000007ULL, 0xFFFFFFFFFFFFFFE8ULL);
- test_rshift_one(0xFFFE8000U, 60, 0xFFFFFFFFFFFFFFFFULL, 0xFFFFFFFFFFFFFFE8ULL);
- test_rshift_one(0x00018000U, 0, 0x0000000000000001ULL, 0x8000000000000000ULL);
- test_rshift_one(0x80018000U, 0, 0x8000000000000001ULL, 0x8000000000000000ULL);
- test_rshift_one(0x7FFE0000U, 0, 0x7FFFFFFFFFFFFFFEULL, 0x0000000000000000ULL);
- test_rshift_one(0xFFFE0000U, 0, 0xFFFFFFFFFFFFFFFEULL, 0x0000000000000000ULL);
- test_rshift_one(0x7FFE8000U, 0, 0x7FFFFFFFFFFFFFFEULL, 0x8000000000000000ULL);
- test_rshift_one(0xFFFE8000U, 0, 0xFFFFFFFFFFFFFFFEULL, 0x8000000000000000ULL);
+ test_sar_one(0x00010000U, 64, 0x0000000000000000ULL, 0x0000000000000001ULL);
+ test_sar_one(0x80010000U, 64, 0xFFFFFFFFFFFFFFFFULL, 0x8000000000000001ULL);
+ test_sar_one(0x7FFE0000U, 64, 0x0000000000000000ULL, 0x7FFFFFFFFFFFFFFEULL);
+ test_sar_one(0xFFFE0000U, 64, 0xFFFFFFFFFFFFFFFFULL, 0xFFFFFFFFFFFFFFFEULL);
+ test_sar_one(0x00010000U, 60, 0x0000000000000000ULL, 0x0000000000000010ULL);
+ test_sar_one(0x80010000U, 60, 0xFFFFFFFFFFFFFFF8ULL, 0x0000000000000010ULL);
+ test_sar_one(0x00018000U, 60, 0x0000000000000000ULL, 0x0000000000000018ULL);
+ test_sar_one(0x80018000U, 60, 0xFFFFFFFFFFFFFFF8ULL, 0x0000000000000018ULL);
+ test_sar_one(0x7FFE0000U, 60, 0x0000000000000007ULL, 0xFFFFFFFFFFFFFFE0ULL);
+ test_sar_one(0xFFFE0000U, 60, 0xFFFFFFFFFFFFFFFFULL, 0xFFFFFFFFFFFFFFE0ULL);
+ test_sar_one(0x7FFE8000U, 60, 0x0000000000000007ULL, 0xFFFFFFFFFFFFFFE8ULL);
+ test_sar_one(0xFFFE8000U, 60, 0xFFFFFFFFFFFFFFFFULL, 0xFFFFFFFFFFFFFFE8ULL);
+ test_sar_one(0x00018000U, 0, 0x0000000000000001ULL, 0x8000000000000000ULL);
+ test_sar_one(0x80018000U, 0, 0x8000000000000001ULL, 0x8000000000000000ULL);
+ test_sar_one(0x7FFE0000U, 0, 0x7FFFFFFFFFFFFFFEULL, 0x0000000000000000ULL);
+ test_sar_one(0xFFFE0000U, 0, 0xFFFFFFFFFFFFFFFEULL, 0x0000000000000000ULL);
+ test_sar_one(0x7FFE8000U, 0, 0x7FFFFFFFFFFFFFFEULL, 0x8000000000000000ULL);
+ test_sar_one(0xFFFE8000U, 0, 0xFFFFFFFFFFFFFFFEULL, 0x8000000000000000ULL);
}
int main(int argc, char **argv)
@@ -218,6 +218,6 @@ int main(int argc, char **argv)
g_test_add_func("/int128/int128_lt", test_lt);
g_test_add_func("/int128/int128_ge", test_ge);
g_test_add_func("/int128/int128_gt", test_gt);
- g_test_add_func("/int128/int128_rshift", test_rshift);
+ g_test_add_func("/int128/int128_sar", test_sar);
return g_test_run();
}
--
2.25.1
* Re: [RFC PATCH 03/15] qemu/int128: Rename int128_rshift, int128_lshift
2020-10-21 4:51 ` [RFC PATCH 03/15] qemu/int128: Rename int128_rshift, int128_lshift Richard Henderson
@ 2020-10-21 17:14 ` Alex Bennée
2021-02-14 18:18 ` Philippe Mathieu-Daudé
1 sibling, 0 replies; 30+ messages in thread
From: Alex Bennée @ 2020-10-21 17:14 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel
Richard Henderson <richard.henderson@linaro.org> writes:
> Change these to sar/shl to emphasize the signed shift.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
--
Alex Bennée
* Re: [RFC PATCH 03/15] qemu/int128: Rename int128_rshift, int128_lshift
2020-10-21 4:51 ` [RFC PATCH 03/15] qemu/int128: Rename int128_rshift, int128_lshift Richard Henderson
2020-10-21 17:14 ` Alex Bennée
@ 2021-02-14 18:18 ` Philippe Mathieu-Daudé
1 sibling, 0 replies; 30+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-02-14 18:18 UTC (permalink / raw)
To: Richard Henderson, qemu-devel; +Cc: alex.bennee
On 10/21/20 6:51 AM, Richard Henderson wrote:
> Change these to sar/shl to emphasize the signed shift.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> include/qemu/int128.h | 8 ++++----
> softmmu/physmem.c | 4 ++--
> target/ppc/int_helper.c | 4 ++--
> tests/test-int128.c | 44 ++++++++++++++++++++---------------------
> 4 files changed, 30 insertions(+), 30 deletions(-)
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
* [RFC PATCH 04/15] qemu/int128: Add int128_shr
2020-10-21 4:51 [RFC PATCH 00/15] softfloat: alternate conversion of float128_addsub Richard Henderson
` (2 preceding siblings ...)
2020-10-21 4:51 ` [RFC PATCH 03/15] qemu/int128: Rename int128_rshift, int128_lshift Richard Henderson
@ 2020-10-21 4:51 ` Richard Henderson
2020-10-21 4:51 ` [RFC PATCH 05/15] qemu/int128: Add int128_geu Richard Henderson
` (12 subsequent siblings)
16 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2020-10-21 4:51 UTC (permalink / raw)
To: qemu-devel; +Cc: alex.bennee
Add unsigned right shift as an operation.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
include/qemu/int128.h | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/include/qemu/int128.h b/include/qemu/int128.h
index 167f13ae10..c53002039a 100644
--- a/include/qemu/int128.h
+++ b/include/qemu/int128.h
@@ -68,6 +68,11 @@ static inline Int128 int128_sar(Int128 a, int n)
return a >> n;
}
+static inline Int128 int128_shr(Int128 a, int n)
+{
+ return (__uint128_t)a >> n;
+}
+
static inline Int128 int128_shl(Int128 a, int n)
{
return a << n;
@@ -232,6 +237,17 @@ static inline Int128 int128_sar(Int128 a, int n)
}
}
+static inline Int128 int128_shr(Int128 a, int n)
+{
+ uint64_t h = (uint64_t)a.hi >> (n & 63);
+ if (n >= 64) {
+ return int128_make64(h);
+ } else if (n > 0) {
+ return int128_make128((a.lo >> n) | ((uint64_t)a.hi << (64 - n)), h);
+ }
+ return a;
+}
+
static inline Int128 int128_shl(Int128 a, int n)
{
uint64_t l = a.lo << (n & 63);
--
2.25.1
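A hedged sketch contrasting the renamed arithmetic shift with the new
logical shift, assuming the helpers from patches 03 and 04 (the values
are illustrative):

    #include "qemu/osdep.h"
    #include "qemu/int128.h"

    static void check_sar_vs_shr(void)
    {
        /* All 128 bits set, i.e. -1 as a signed Int128. */
        Int128 m1 = int128_make128(-1ull, -1ull);

        /* int128_sar replicates the sign bit: -1 >> 4 is still -1. */
        g_assert(int128_eq(int128_sar(m1, 4), m1));

        /* int128_shr shifts in zeroes from the top instead. */
        Int128 expect = int128_make128(-1ull, (1ull << 60) - 1);
        g_assert(int128_eq(int128_shr(m1, 4), expect));
    }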
* [RFC PATCH 05/15] qemu/int128: Add int128_geu
2020-10-21 4:51 [RFC PATCH 00/15] softfloat: alternate conversion of float128_addsub Richard Henderson
` (3 preceding siblings ...)
2020-10-21 4:51 ` [RFC PATCH 04/15] qemu/int128: Add int128_shr Richard Henderson
@ 2020-10-21 4:51 ` Richard Henderson
2021-02-14 18:19 ` Philippe Mathieu-Daudé
2020-10-21 4:51 ` [RFC PATCH 06/15] softfloat: Use mulu64 for mul64To128 Richard Henderson
` (11 subsequent siblings)
16 siblings, 1 reply; 30+ messages in thread
From: Richard Henderson @ 2020-10-21 4:51 UTC (permalink / raw)
To: qemu-devel; +Cc: alex.bennee
Add an unsigned inequality operation. Do not fill in all of
the variations until we have a call for them.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
include/qemu/int128.h | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/include/qemu/int128.h b/include/qemu/int128.h
index c53002039a..1f95792a29 100644
--- a/include/qemu/int128.h
+++ b/include/qemu/int128.h
@@ -113,6 +113,11 @@ static inline bool int128_ge(Int128 a, Int128 b)
return a >= b;
}
+static inline bool int128_geu(Int128 a, Int128 b)
+{
+ return (__uint128_t)a >= (__uint128_t)b;
+}
+
static inline bool int128_lt(Int128 a, Int128 b)
{
return a < b;
@@ -303,6 +308,11 @@ static inline bool int128_ge(Int128 a, Int128 b)
return a.hi > b.hi || (a.hi == b.hi && a.lo >= b.lo);
}
+static inline bool int128_geu(Int128 a, Int128 b)
+{
+ return (uint64_t)a.hi > (uint64_t)b.hi || (a.hi == b.hi && a.lo >= b.lo);
+}
+
static inline bool int128_lt(Int128 a, Int128 b)
{
return !int128_ge(a, b);
--
2.25.1
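A small sketch of why the unsigned comparison matters once the top bit
is set, assuming the helpers above (values are illustrative):

    #include "qemu/osdep.h"
    #include "qemu/int128.h"

    static void check_geu(void)
    {
        Int128 big = int128_make128(0, 1ull << 63); /* 2^127 */
        Int128 one = int128_one();

        /* Signed: 2^127 is a large negative number, so it is below 1. */
        g_assert(!int128_ge(big, one));

        /* Unsigned: the new helper orders it above 1. */
        g_assert(int128_geu(big, one));
    }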
* [RFC PATCH 06/15] softfloat: Use mulu64 for mul64To128
2020-10-21 4:51 [RFC PATCH 00/15] softfloat: alternate conversion of float128_addsub Richard Henderson
` (4 preceding siblings ...)
2020-10-21 4:51 ` [RFC PATCH 05/15] qemu/int128: Add int128_geu Richard Henderson
@ 2020-10-21 4:51 ` Richard Henderson
2020-10-21 4:51 ` [RFC PATCH 07/15] softfloat: Use int128.h for some operations Richard Henderson
` (10 subsequent siblings)
16 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2020-10-21 4:51 UTC (permalink / raw)
To: qemu-devel; +Cc: alex.bennee, David Hildenbrand
Via host-utils.h, we use a host widening multiply for
64-bit hosts, and a common subroutine for 32-bit hosts.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
include/fpu/softfloat-macros.h | 24 ++++--------------------
1 file changed, 4 insertions(+), 20 deletions(-)
diff --git a/include/fpu/softfloat-macros.h b/include/fpu/softfloat-macros.h
index a35ec2893a..57845f8af0 100644
--- a/include/fpu/softfloat-macros.h
+++ b/include/fpu/softfloat-macros.h
@@ -83,6 +83,7 @@ this code that are retained.
#define FPU_SOFTFLOAT_MACROS_H
#include "fpu/softfloat-types.h"
+#include "qemu/host-utils.h"
/*----------------------------------------------------------------------------
| Shifts `a' right by the number of bits given in `count'. If any nonzero
@@ -515,27 +516,10 @@ static inline void
| `z0Ptr' and `z1Ptr'.
*----------------------------------------------------------------------------*/
-static inline void mul64To128( uint64_t a, uint64_t b, uint64_t *z0Ptr, uint64_t *z1Ptr )
+static inline void
+mul64To128(uint64_t a, uint64_t b, uint64_t *z0Ptr, uint64_t *z1Ptr)
{
- uint32_t aHigh, aLow, bHigh, bLow;
- uint64_t z0, zMiddleA, zMiddleB, z1;
-
- aLow = a;
- aHigh = a>>32;
- bLow = b;
- bHigh = b>>32;
- z1 = ( (uint64_t) aLow ) * bLow;
- zMiddleA = ( (uint64_t) aLow ) * bHigh;
- zMiddleB = ( (uint64_t) aHigh ) * bLow;
- z0 = ( (uint64_t) aHigh ) * bHigh;
- zMiddleA += zMiddleB;
- z0 += ( ( (uint64_t) ( zMiddleA < zMiddleB ) )<<32 ) + ( zMiddleA>>32 );
- zMiddleA <<= 32;
- z1 += zMiddleA;
- z0 += ( z1 < zMiddleA );
- *z1Ptr = z1;
- *z0Ptr = z0;
-
+ mulu64(z1Ptr, z0Ptr, a, b);
}
/*----------------------------------------------------------------------------
--
2.25.1
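For reference, a minimal sketch of the host-utils primitive the patch
switches to, assuming qemu/host-utils.h (the operands are arbitrary):

    #include "qemu/osdep.h"
    #include "qemu/host-utils.h"

    static void check_mulu64(void)
    {
        uint64_t hi, lo;

        /* mulu64 takes the low-half pointer first, then the high half. */
        mulu64(&lo, &hi, 0xFFFFFFFFFFFFFFFFull, 2);

        /* (2^64 - 1) * 2 = 2^65 - 2: high word 1, low word 2^64 - 2. */
        g_assert(hi == 1);
        g_assert(lo == 0xFFFFFFFFFFFFFFFEull);
    }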
* [RFC PATCH 07/15] softfloat: Use int128.h for some operations
2020-10-21 4:51 [RFC PATCH 00/15] softfloat: alternate conversion of float128_addsub Richard Henderson
` (5 preceding siblings ...)
2020-10-21 4:51 ` [RFC PATCH 06/15] softfloat: Use mulu64 for mul64To128 Richard Henderson
@ 2020-10-21 4:51 ` Richard Henderson
2020-10-21 4:51 ` [RFC PATCH 08/15] softfloat: Tidy a * b + inf return Richard Henderson
` (9 subsequent siblings)
16 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2020-10-21 4:51 UTC (permalink / raw)
To: qemu-devel; +Cc: alex.bennee, David Hildenbrand
Use our Int128, which wraps the compiler's __int128_t, instead
of open-coding shifts and arithmetic. We'd need to extend Int128
to replace more than these four.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
include/fpu/softfloat-macros.h | 65 ++++++++++++++--------------------
1 file changed, 26 insertions(+), 39 deletions(-)
diff --git a/include/fpu/softfloat-macros.h b/include/fpu/softfloat-macros.h
index 57845f8af0..e6f05c048e 100644
--- a/include/fpu/softfloat-macros.h
+++ b/include/fpu/softfloat-macros.h
@@ -84,6 +84,7 @@ this code that are retained.
#include "fpu/softfloat-types.h"
#include "qemu/host-utils.h"
+#include "qemu/int128.h"
/*----------------------------------------------------------------------------
| Shifts `a' right by the number of bits given in `count'. If any nonzero
@@ -191,28 +192,14 @@ static inline void
| which are stored at the locations pointed to by `z0Ptr' and `z1Ptr'.
*----------------------------------------------------------------------------*/
-static inline void
- shift128Right(
- uint64_t a0, uint64_t a1, int count, uint64_t *z0Ptr, uint64_t *z1Ptr)
+static inline void shift128Right(uint64_t a0, uint64_t a1, int count,
+ uint64_t *z0Ptr, uint64_t *z1Ptr)
{
- uint64_t z0, z1;
- int8_t negCount = ( - count ) & 63;
-
- if ( count == 0 ) {
- z1 = a1;
- z0 = a0;
- }
- else if ( count < 64 ) {
- z1 = ( a0<<negCount ) | ( a1>>count );
- z0 = a0>>count;
- }
- else {
- z1 = (count < 128) ? (a0 >> (count & 63)) : 0;
- z0 = 0;
- }
- *z1Ptr = z1;
- *z0Ptr = z0;
+ Int128 a = int128_make128(a1, a0);
+ Int128 z = int128_shr(a, count);
+ *z0Ptr = int128_gethi(z);
+ *z1Ptr = int128_getlo(z);
}
/*----------------------------------------------------------------------------
@@ -352,13 +339,11 @@ static inline void shortShift128Left(uint64_t a0, uint64_t a1, int count,
static inline void shift128Left(uint64_t a0, uint64_t a1, int count,
uint64_t *z0Ptr, uint64_t *z1Ptr)
{
- if (count < 64) {
- *z1Ptr = a1 << count;
- *z0Ptr = count == 0 ? a0 : (a0 << count) | (a1 >> (-count & 63));
- } else {
- *z1Ptr = 0;
- *z0Ptr = a1 << (count - 64);
- }
+ Int128 a = int128_make128(a1, a0);
+ Int128 z = int128_shl(a, count);
+
+ *z0Ptr = int128_gethi(z);
+ *z1Ptr = int128_getlo(z);
}
/*----------------------------------------------------------------------------
@@ -405,15 +390,15 @@ static inline void
*----------------------------------------------------------------------------*/
static inline void
- add128(
- uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1, uint64_t *z0Ptr, uint64_t *z1Ptr )
+add128(uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1,
+ uint64_t *z0Ptr, uint64_t *z1Ptr)
{
- uint64_t z1;
-
- z1 = a1 + b1;
- *z1Ptr = z1;
- *z0Ptr = a0 + b0 + ( z1 < a1 );
+ Int128 a = int128_make128(a1, a0);
+ Int128 b = int128_make128(b1, b0);
+ Int128 z = int128_add(a, b);
+ *z0Ptr = int128_gethi(z);
+ *z1Ptr = int128_getlo(z);
}
/*----------------------------------------------------------------------------
@@ -463,13 +448,15 @@ static inline void
*----------------------------------------------------------------------------*/
static inline void
- sub128(
- uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1, uint64_t *z0Ptr, uint64_t *z1Ptr )
+sub128(uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1,
+ uint64_t *z0Ptr, uint64_t *z1Ptr)
{
+ Int128 a = int128_make128(a1, a0);
+ Int128 b = int128_make128(b1, b0);
+ Int128 z = int128_sub(a, b);
- *z1Ptr = a1 - b1;
- *z0Ptr = a0 - b0 - ( a1 < b1 );
-
+ *z0Ptr = int128_gethi(z);
+ *z1Ptr = int128_getlo(z);
}
/*----------------------------------------------------------------------------
--
2.25.1
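A hedged sketch checking that the Int128-based add128 still propagates
the carry between the two 64-bit words, assuming a QEMU build
environment (values are illustrative):

    #include "qemu/osdep.h"
    #include "fpu/softfloat-macros.h"

    static void check_add128_carry(void)
    {
        uint64_t z0, z1;

        /* a0:a1 = 2^65 - 1; adding 1 must carry from the low word
         * into the high word. */
        add128(1, 0xFFFFFFFFFFFFFFFFull, 0, 1, &z0, &z1);

        g_assert(z0 == 2);
        g_assert(z1 == 0);
    }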
* [RFC PATCH 08/15] softfloat: Tidy a * b + inf return
2020-10-21 4:51 [RFC PATCH 00/15] softfloat: alternate conversion of float128_addsub Richard Henderson
` (6 preceding siblings ...)
2020-10-21 4:51 ` [RFC PATCH 07/15] softfloat: Use int128.h for some operations Richard Henderson
@ 2020-10-21 4:51 ` Richard Henderson
2020-10-21 4:51 ` [RFC PATCH 09/15] softfloat: Add float_cmask and constants Richard Henderson
` (8 subsequent siblings)
16 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2020-10-21 4:51 UTC (permalink / raw)
To: qemu-devel; +Cc: alex.bennee, Philippe Mathieu-Daudé, David Hildenbrand
No reason to set values in 'a' when we already
have float_class_inf in 'c' and can flip that sign.
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
fpu/softfloat.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 67cfa0fd82..9db55d2b11 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -1380,9 +1380,8 @@ static FloatParts muladd_floats(FloatParts a, FloatParts b, FloatParts c,
s->float_exception_flags |= float_flag_invalid;
return parts_default_nan(s);
} else {
- a.cls = float_class_inf;
- a.sign = c.sign ^ sign_flip;
- return a;
+ c.sign ^= sign_flip;
+ return c;
}
}
--
2.25.1
* [RFC PATCH 09/15] softfloat: Add float_cmask and constants
2020-10-21 4:51 [RFC PATCH 00/15] softfloat: alternate conversion of float128_addsub Richard Henderson
` (7 preceding siblings ...)
2020-10-21 4:51 ` [RFC PATCH 08/15] softfloat: Tidy a * b + inf return Richard Henderson
@ 2020-10-21 4:51 ` Richard Henderson
2020-10-21 4:51 ` [RFC PATCH 10/15] softfloat: Inline float_raise Richard Henderson
` (7 subsequent siblings)
16 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2020-10-21 4:51 UTC (permalink / raw)
To: qemu-devel; +Cc: alex.bennee, David Hildenbrand
Testing more than one class at a time is better done with masks.
This reduces the static branch count.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
fpu/softfloat.c | 31 ++++++++++++++++++++++++-------
1 file changed, 24 insertions(+), 7 deletions(-)
diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 9db55d2b11..3e625c47cd 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -469,6 +469,20 @@ typedef enum __attribute__ ((__packed__)) {
float_class_snan,
} FloatClass;
+#define float_cmask(bit) (1u << (bit))
+
+enum {
+ float_cmask_zero = float_cmask(float_class_zero),
+ float_cmask_normal = float_cmask(float_class_normal),
+ float_cmask_inf = float_cmask(float_class_inf),
+ float_cmask_qnan = float_cmask(float_class_qnan),
+ float_cmask_snan = float_cmask(float_class_snan),
+
+ float_cmask_infzero = float_cmask_zero | float_cmask_inf,
+ float_cmask_anynan = float_cmask_qnan | float_cmask_snan,
+};
+
+
/* Simple helpers for checking if, or what kind of, NaN we have */
static inline __attribute__((unused)) bool is_nan(FloatClass c)
{
@@ -1335,24 +1349,27 @@ bfloat16 QEMU_FLATTEN bfloat16_mul(bfloat16 a, bfloat16 b, float_status *status)
static FloatParts muladd_floats(FloatParts a, FloatParts b, FloatParts c,
int flags, float_status *s)
{
- bool inf_zero = ((1 << a.cls) | (1 << b.cls)) ==
- ((1 << float_class_inf) | (1 << float_class_zero));
- bool p_sign;
+ bool inf_zero, p_sign;
bool sign_flip = flags & float_muladd_negate_result;
FloatClass p_class;
uint64_t hi, lo;
int p_exp;
+ int ab_mask, abc_mask;
+
+ ab_mask = float_cmask(a.cls) | float_cmask(b.cls);
+ abc_mask = float_cmask(c.cls) | ab_mask;
+ inf_zero = ab_mask == float_cmask_infzero;
/* It is implementation-defined whether the cases of (0,inf,qnan)
* and (inf,0,qnan) raise InvalidOperation or not (and what QNaN
* they return if they do), so we have to hand this information
* off to the target-specific pick-a-NaN routine.
*/
- if (is_nan(a.cls) || is_nan(b.cls) || is_nan(c.cls)) {
+ if (unlikely(abc_mask & float_cmask_anynan)) {
return pick_nan_muladd(a, b, c, inf_zero, s);
}
- if (inf_zero) {
+ if (unlikely(inf_zero)) {
s->float_exception_flags |= float_flag_invalid;
return parts_default_nan(s);
}
@@ -1367,9 +1384,9 @@ static FloatParts muladd_floats(FloatParts a, FloatParts b, FloatParts c,
p_sign ^= 1;
}
- if (a.cls == float_class_inf || b.cls == float_class_inf) {
+ if (ab_mask & float_cmask_inf) {
p_class = float_class_inf;
- } else if (a.cls == float_class_zero || b.cls == float_class_zero) {
+ } else if (ab_mask & float_cmask_zero) {
p_class = float_class_zero;
} else {
p_class = float_class_normal;
--
2.25.1
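To make the mask trick concrete, a self-contained stand-in that mirrors
the technique (the enum and names below are illustrative, not the
actual softfloat definitions):

    #include <assert.h>

    typedef enum { class_zero, class_normal, class_inf,
                   class_qnan, class_snan } Cls;

    #define cmask(bit) (1u << (bit))

    enum {
        cmask_zero    = cmask(class_zero),
        cmask_inf     = cmask(class_inf),
        cmask_infzero = cmask_zero | cmask_inf,
    };

    int main(void)
    {
        Cls a = class_inf, b = class_zero;

        /* One OR plus one compare replaces two per-operand class
         * checks: true exactly when one operand is Inf and the
         * other is zero. */
        unsigned ab_mask = cmask(a) | cmask(b);
        assert(ab_mask == cmask_infzero);
        return 0;
    }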
* [RFC PATCH 10/15] softfloat: Inline float_raise
2020-10-21 4:51 [RFC PATCH 00/15] softfloat: alternate conversion of float128_addsub Richard Henderson
` (8 preceding siblings ...)
2020-10-21 4:51 ` [RFC PATCH 09/15] softfloat: Add float_cmask and constants Richard Henderson
@ 2020-10-21 4:51 ` Richard Henderson
2020-10-21 17:16 ` Alex Bennée
2021-02-14 18:20 ` Philippe Mathieu-Daudé
2020-10-21 4:51 ` [RFC PATCH 11/15] Test split to softfloat-parts.c.inc Richard Henderson
` (6 subsequent siblings)
16 siblings, 2 replies; 30+ messages in thread
From: Richard Henderson @ 2020-10-21 4:51 UTC (permalink / raw)
To: qemu-devel; +Cc: alex.bennee
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
include/fpu/softfloat.h | 5 ++++-
fpu/softfloat-specialize.c.inc | 12 ------------
2 files changed, 4 insertions(+), 13 deletions(-)
diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
index 78ad5ca738..019c2ec66d 100644
--- a/include/fpu/softfloat.h
+++ b/include/fpu/softfloat.h
@@ -100,7 +100,10 @@ typedef enum {
| Routine to raise any or all of the software IEC/IEEE floating-point
| exception flags.
*----------------------------------------------------------------------------*/
-void float_raise(uint8_t flags, float_status *status);
+static inline void float_raise(uint8_t flags, float_status *status)
+{
+ status->float_exception_flags |= flags;
+}
/*----------------------------------------------------------------------------
| If `a' is denormal and we are in flush-to-zero mode then set the
diff --git a/fpu/softfloat-specialize.c.inc b/fpu/softfloat-specialize.c.inc
index c2f87addb2..0fe8ce408d 100644
--- a/fpu/softfloat-specialize.c.inc
+++ b/fpu/softfloat-specialize.c.inc
@@ -225,18 +225,6 @@ floatx80 floatx80_default_nan(float_status *status)
const floatx80 floatx80_infinity
= make_floatx80_init(floatx80_infinity_high, floatx80_infinity_low);
-/*----------------------------------------------------------------------------
-| Raises the exceptions specified by `flags'. Floating-point traps can be
-| defined here if desired. It is currently not possible for such a trap
-| to substitute a result value. If traps are not implemented, this routine
-| should be simply `float_exception_flags |= flags;'.
-*----------------------------------------------------------------------------*/
-
-void float_raise(uint8_t flags, float_status *status)
-{
- status->float_exception_flags |= flags;
-}
-
/*----------------------------------------------------------------------------
| Internal canonical NaN format.
*----------------------------------------------------------------------------*/
--
2.25.1
* [RFC PATCH 11/15] Test split to softfloat-parts.c.inc
2020-10-21 4:51 [RFC PATCH 00/15] softfloat: alternate conversion of float128_addsub Richard Henderson
` (9 preceding siblings ...)
2020-10-21 4:51 ` [RFC PATCH 10/15] softfloat: Inline float_raise Richard Henderson
@ 2020-10-21 4:51 ` Richard Henderson
2020-10-21 4:51 ` [RFC PATCH 12/15] softfloat: Streamline FloatFmt Richard Henderson
` (5 subsequent siblings)
16 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2020-10-21 4:51 UTC (permalink / raw)
To: qemu-devel; +Cc: alex.bennee
---
fpu/softfloat.c | 438 ++++++++------------------------------
fpu/softfloat-parts.c.inc | 327 ++++++++++++++++++++++++++++
2 files changed, 421 insertions(+), 344 deletions(-)
create mode 100644 fpu/softfloat-parts.c.inc
diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 3e625c47cd..3651f4525d 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -651,191 +651,109 @@ static inline float64 float64_pack_raw(FloatParts p)
*----------------------------------------------------------------------------*/
#include "softfloat-specialize.c.inc"
-/* Canonicalize EXP and FRAC, setting CLS. */
-static FloatParts sf_canonicalize(FloatParts part, const FloatFmt *parm,
- float_status *status)
+static FloatParts return_nan(FloatParts a, float_status *s)
{
- if (part.exp == parm->exp_max && !parm->arm_althp) {
- if (part.frac == 0) {
- part.cls = float_class_inf;
- } else {
- part.frac <<= parm->frac_shift;
- part.cls = (parts_is_snan_frac(part.frac, status)
- ? float_class_snan : float_class_qnan);
- }
- } else if (part.exp == 0) {
- if (likely(part.frac == 0)) {
- part.cls = float_class_zero;
- } else if (status->flush_inputs_to_zero) {
- float_raise(float_flag_input_denormal, status);
- part.cls = float_class_zero;
- part.frac = 0;
- } else {
- int shift = clz64(part.frac) - 1;
- part.cls = float_class_normal;
- part.exp = parm->frac_shift - parm->exp_bias - shift + 1;
- part.frac <<= shift;
- }
- } else {
- part.cls = float_class_normal;
- part.exp -= parm->exp_bias;
- part.frac = DECOMPOSED_IMPLICIT_BIT + (part.frac << parm->frac_shift);
- }
- return part;
-}
-
-/* Round and uncanonicalize a floating-point number by parts. There
- * are FRAC_SHIFT bits that may require rounding at the bottom of the
- * fraction; these bits will be removed. The exponent will be biased
- * by EXP_BIAS and must be bounded by [EXP_MAX-1, 0].
- */
-
-static FloatParts round_canonical(FloatParts p, float_status *s,
- const FloatFmt *parm)
-{
- const uint64_t frac_lsb = parm->frac_lsb;
- const uint64_t frac_lsbm1 = parm->frac_lsbm1;
- const uint64_t round_mask = parm->round_mask;
- const uint64_t roundeven_mask = parm->roundeven_mask;
- const int exp_max = parm->exp_max;
- const int frac_shift = parm->frac_shift;
- uint64_t frac, inc;
- int exp, flags = 0;
- bool overflow_norm;
-
- frac = p.frac;
- exp = p.exp;
-
- switch (p.cls) {
- case float_class_normal:
- switch (s->float_rounding_mode) {
- case float_round_nearest_even:
- overflow_norm = false;
- inc = ((frac & roundeven_mask) != frac_lsbm1 ? frac_lsbm1 : 0);
- break;
- case float_round_ties_away:
- overflow_norm = false;
- inc = frac_lsbm1;
- break;
- case float_round_to_zero:
- overflow_norm = true;
- inc = 0;
- break;
- case float_round_up:
- inc = p.sign ? 0 : round_mask;
- overflow_norm = p.sign;
- break;
- case float_round_down:
- inc = p.sign ? round_mask : 0;
- overflow_norm = !p.sign;
- break;
- case float_round_to_odd:
- overflow_norm = true;
- inc = frac & frac_lsb ? 0 : round_mask;
- break;
- default:
- g_assert_not_reached();
- }
-
- exp += parm->exp_bias;
- if (likely(exp > 0)) {
- if (frac & round_mask) {
- flags |= float_flag_inexact;
- frac += inc;
- if (frac & DECOMPOSED_OVERFLOW_BIT) {
- frac >>= 1;
- exp++;
- }
- }
- frac >>= frac_shift;
-
- if (parm->arm_althp) {
- /* ARM Alt HP eschews Inf and NaN for a wider exponent. */
- if (unlikely(exp > exp_max)) {
- /* Overflow. Return the maximum normal. */
- flags = float_flag_invalid;
- exp = exp_max;
- frac = -1;
- }
- } else if (unlikely(exp >= exp_max)) {
- flags |= float_flag_overflow | float_flag_inexact;
- if (overflow_norm) {
- exp = exp_max - 1;
- frac = -1;
- } else {
- p.cls = float_class_inf;
- goto do_inf;
- }
- }
- } else if (s->flush_to_zero) {
- flags |= float_flag_output_denormal;
- p.cls = float_class_zero;
- goto do_zero;
- } else {
- bool is_tiny = s->tininess_before_rounding
- || (exp < 0)
- || !((frac + inc) & DECOMPOSED_OVERFLOW_BIT);
-
- shift64RightJamming(frac, 1 - exp, &frac);
- if (frac & round_mask) {
- /* Need to recompute round-to-even. */
- switch (s->float_rounding_mode) {
- case float_round_nearest_even:
- inc = ((frac & roundeven_mask) != frac_lsbm1
- ? frac_lsbm1 : 0);
- break;
- case float_round_to_odd:
- inc = frac & frac_lsb ? 0 : round_mask;
- break;
- default:
- break;
- }
- flags |= float_flag_inexact;
- frac += inc;
- }
-
- exp = (frac & DECOMPOSED_IMPLICIT_BIT ? 1 : 0);
- frac >>= frac_shift;
-
- if (is_tiny && (flags & float_flag_inexact)) {
- flags |= float_flag_underflow;
- }
- if (exp == 0 && frac == 0) {
- p.cls = float_class_zero;
- }
- }
- break;
-
- case float_class_zero:
- do_zero:
- exp = 0;
- frac = 0;
- break;
-
- case float_class_inf:
- do_inf:
- assert(!parm->arm_althp);
- exp = exp_max;
- frac = 0;
- break;
-
- case float_class_qnan:
+ switch (a.cls) {
case float_class_snan:
- assert(!parm->arm_althp);
- exp = exp_max;
- frac >>= parm->frac_shift;
+ s->float_exception_flags |= float_flag_invalid;
+ a = parts_silence_nan(a, s);
+ /* fall through */
+ case float_class_qnan:
+ if (s->default_nan_mode) {
+ return parts_default_nan(s);
+ }
break;
default:
g_assert_not_reached();
}
-
- float_raise(flags, s);
- p.exp = exp;
- p.frac = frac;
- return p;
+ return a;
}
+static FloatParts pick_nan_muladd(FloatParts a, FloatParts b, FloatParts c,
+ bool inf_zero, float_status *s)
+{
+ int which;
+
+ if (is_snan(a.cls) || is_snan(b.cls) || is_snan(c.cls)) {
+ s->float_exception_flags |= float_flag_invalid;
+ }
+
+ which = pickNaNMulAdd(a.cls, b.cls, c.cls, inf_zero, s);
+
+ if (s->default_nan_mode) {
+ /* Note that this check is after pickNaNMulAdd so that function
+ * has an opportunity to set the Invalid flag.
+ */
+ which = 3;
+ }
+
+ switch (which) {
+ case 0:
+ break;
+ case 1:
+ a = b;
+ break;
+ case 2:
+ a = c;
+ break;
+ case 3:
+ return parts_default_nan(s);
+ default:
+ g_assert_not_reached();
+ }
+
+ if (is_snan(a.cls)) {
+ return parts_silence_nan(a, s);
+ }
+ return a;
+}
+
+#define FUNC(X) X
+#define FRAC_TYPE uint64_t
+#define PARTS_TYPE FloatParts
+
+#define HI(P) (P)
+#define LO(P) (P)
+#define ZERO 0
+#define ONE 1
+#define MONE -1
+
+#define ADD(P1, P2) ((P1) + (P2))
+#define ADDI(P, I) ((P) + (I))
+#define CLZ(P) clz64(P)
+#define EQ0(P) ((P) == 0)
+#define EQ(P1, P2) ((P1) == (P2))
+#define GEU(P1, P2) ((P1) >= (P2))
+#define OR(P1, P2) ((P1) | (P2))
+#define SHL(P, C) ((P) << (C))
+#define SHR(P, C) ((P) >> (C))
+#define SHR_JAM(P, C) \
+ ({ uint64_t _r; shift64RightJamming((P), (C), &_r); _r; })
+#define SUB(P1, P2) ((P1) - (P2))
+
+#include "softfloat-parts.c.inc"
+
+#undef FUNC
+#undef FRAC_TYPE
+#undef PARTS_TYPE
+#undef HI
+#undef LO
+#undef ZERO
+#undef MONE
+#undef ONE
+#undef ADD
+#undef ADDI
+#undef CLZ
+#undef EQ0
+#undef EQ
+#undef GEU
+#undef OR
+#undef SHL
+#undef SHR
+#undef SHR_JAM
+#undef SUB
+
/* Explicit FloatFmt version */
static FloatParts float16a_unpack_canonical(float16 f, float_status *s,
const FloatFmt *params)
@@ -889,174 +807,6 @@ static float64 float64_round_pack_canonical(FloatParts p, float_status *s)
return float64_pack_raw(round_canonical(p, s, &float64_params));
}
-static FloatParts return_nan(FloatParts a, float_status *s)
-{
- switch (a.cls) {
- case float_class_snan:
- s->float_exception_flags |= float_flag_invalid;
- a = parts_silence_nan(a, s);
- /* fall through */
- case float_class_qnan:
- if (s->default_nan_mode) {
- return parts_default_nan(s);
- }
- break;
-
- default:
- g_assert_not_reached();
- }
- return a;
-}
-
-static FloatParts pick_nan(FloatParts a, FloatParts b, float_status *s)
-{
- if (is_snan(a.cls) || is_snan(b.cls)) {
- s->float_exception_flags |= float_flag_invalid;
- }
-
- if (s->default_nan_mode) {
- return parts_default_nan(s);
- } else {
- if (pickNaN(a.cls, b.cls,
- a.frac > b.frac ||
- (a.frac == b.frac && a.sign < b.sign), s)) {
- a = b;
- }
- if (is_snan(a.cls)) {
- return parts_silence_nan(a, s);
- }
- }
- return a;
-}
-
-static FloatParts pick_nan_muladd(FloatParts a, FloatParts b, FloatParts c,
- bool inf_zero, float_status *s)
-{
- int which;
-
- if (is_snan(a.cls) || is_snan(b.cls) || is_snan(c.cls)) {
- s->float_exception_flags |= float_flag_invalid;
- }
-
- which = pickNaNMulAdd(a.cls, b.cls, c.cls, inf_zero, s);
-
- if (s->default_nan_mode) {
- /* Note that this check is after pickNaNMulAdd so that function
- * has an opportunity to set the Invalid flag.
- */
- which = 3;
- }
-
- switch (which) {
- case 0:
- break;
- case 1:
- a = b;
- break;
- case 2:
- a = c;
- break;
- case 3:
- return parts_default_nan(s);
- default:
- g_assert_not_reached();
- }
-
- if (is_snan(a.cls)) {
- return parts_silence_nan(a, s);
- }
- return a;
-}
-
-/*
- * Returns the result of adding or subtracting the values of the
- * floating-point values `a' and `b'. The operation is performed
- * according to the IEC/IEEE Standard for Binary Floating-Point
- * Arithmetic.
- */
-
-static FloatParts addsub_floats(FloatParts a, FloatParts b, bool subtract,
- float_status *s)
-{
- bool a_sign = a.sign;
- bool b_sign = b.sign ^ subtract;
-
- if (a_sign != b_sign) {
- /* Subtraction */
-
- if (a.cls == float_class_normal && b.cls == float_class_normal) {
- if (a.exp > b.exp || (a.exp == b.exp && a.frac >= b.frac)) {
- shift64RightJamming(b.frac, a.exp - b.exp, &b.frac);
- a.frac = a.frac - b.frac;
- } else {
- shift64RightJamming(a.frac, b.exp - a.exp, &a.frac);
- a.frac = b.frac - a.frac;
- a.exp = b.exp;
- a_sign ^= 1;
- }
-
- if (a.frac == 0) {
- a.cls = float_class_zero;
- a.sign = s->float_rounding_mode == float_round_down;
- } else {
- int shift = clz64(a.frac) - 1;
- a.frac = a.frac << shift;
- a.exp = a.exp - shift;
- a.sign = a_sign;
- }
- return a;
- }
- if (is_nan(a.cls) || is_nan(b.cls)) {
- return pick_nan(a, b, s);
- }
- if (a.cls == float_class_inf) {
- if (b.cls == float_class_inf) {
- float_raise(float_flag_invalid, s);
- return parts_default_nan(s);
- }
- return a;
- }
- if (a.cls == float_class_zero && b.cls == float_class_zero) {
- a.sign = s->float_rounding_mode == float_round_down;
- return a;
- }
- if (a.cls == float_class_zero || b.cls == float_class_inf) {
- b.sign = a_sign ^ 1;
- return b;
- }
- if (b.cls == float_class_zero) {
- return a;
- }
- } else {
- /* Addition */
- if (a.cls == float_class_normal && b.cls == float_class_normal) {
- if (a.exp > b.exp) {
- shift64RightJamming(b.frac, a.exp - b.exp, &b.frac);
- } else if (a.exp < b.exp) {
- shift64RightJamming(a.frac, b.exp - a.exp, &a.frac);
- a.exp = b.exp;
- }
- a.frac += b.frac;
- if (a.frac & DECOMPOSED_OVERFLOW_BIT) {
- shift64RightJamming(a.frac, 1, &a.frac);
- a.exp += 1;
- }
- return a;
- }
- if (is_nan(a.cls) || is_nan(b.cls)) {
- return pick_nan(a, b, s);
- }
- if (a.cls == float_class_inf || b.cls == float_class_zero) {
- return a;
- }
- if (b.cls == float_class_inf || a.cls == float_class_zero) {
- b.sign = b_sign;
- return b;
- }
- }
- g_assert_not_reached();
-}
-
/*
* Returns the result of adding or subtracting the floating-point
* values `a' and `b'. The operation is performed according to the
diff --git a/fpu/softfloat-parts.c.inc b/fpu/softfloat-parts.c.inc
new file mode 100644
index 0000000000..49bde45521
--- /dev/null
+++ b/fpu/softfloat-parts.c.inc
@@ -0,0 +1,327 @@
+/*
+ * QEMU float support
+ *
+ * The code in this source file is derived from release 2a of the SoftFloat
+ * IEC/IEEE Floating-point Arithmetic Package. Those parts of the code (and
+ * some later contributions) are provided under that license, as detailed below.
+ * It has subsequently been modified by contributors to the QEMU Project,
+ * so some portions are provided under:
+ * the SoftFloat-2a license
+ * the BSD license
+ * GPL-v2-or-later
+ *
+ * Any future contributions to this file after December 1st 2014 will be
+ * taken to be licensed under the Softfloat-2a license unless specifically
+ * indicated otherwise.
+ */
+
+static PARTS_TYPE
+FUNC(pick_nan)(PARTS_TYPE a, PARTS_TYPE b, float_status *status)
+{
+ bool a_larger_sig;
+
+ if (is_snan(a.cls) || is_snan(b.cls)) {
+ float_raise(float_flag_invalid, status);
+ }
+
+ if (status->default_nan_mode) {
+ return FUNC(parts_default_nan)(status);
+ }
+
+ if (EQ(a.frac, b.frac)) {
+ a_larger_sig = a.sign < b.sign;
+ } else {
+ a_larger_sig = GEU(a.frac, b.frac);
+ }
+
+ if (pickNaN(a.cls, b.cls, a_larger_sig, status)) {
+ a = b;
+ }
+ if (is_snan(a.cls)) {
+ return FUNC(parts_silence_nan)(a, status);
+ }
+ return a;
+}
+
+/* Canonicalize EXP and FRAC, setting CLS. */
+static PARTS_TYPE
+FUNC(sf_canonicalize)(PARTS_TYPE p, const FloatFmt *parm, float_status *status)
+{
+ if (p.exp == 0) {
+ if (likely(EQ0(p.frac))) {
+ p.cls = float_class_zero;
+ } else if (status->flush_inputs_to_zero) {
+ float_raise(float_flag_input_denormal, status);
+ p.cls = float_class_zero;
+ p.frac = ZERO;
+ } else {
+ int shift = CLZ(p.frac) - 1;
+ p.cls = float_class_normal;
+ p.exp = parm->frac_shift - parm->exp_bias - shift + 1;
+ p.frac = SHL(p.frac, shift);
+ }
+ } else if (likely(p.exp < parm->exp_max) || parm->arm_althp) {
+ p.cls = float_class_normal;
+ p.exp -= parm->exp_bias;
+ /* Set implicit bit. */
+ p.frac = OR(p.frac, SHL(ONE, parm->frac_size));
+ p.frac = SHL(p.frac, parm->frac_shift);
+ } else if (likely(EQ0(p.frac))) {
+ p.cls = float_class_inf;
+ } else {
+ p.frac = SHL(p.frac, parm->frac_shift);
+ p.cls = (parts_is_snan_frac(HI(p.frac), status)
+ ? float_class_snan : float_class_qnan);
+ }
+ return p;
+}
+
+/* Round and uncanonicalize a floating-point number by parts. There
+ * are FRAC_SHIFT bits that may require rounding at the bottom of the
+ * fraction; these bits will be removed. The exponent will be biased
+ * by EXP_BIAS and must be bounded by [EXP_MAX-1, 0].
+ */
+
+static PARTS_TYPE
+FUNC(round_canonical)(PARTS_TYPE p, float_status *s, const FloatFmt *parm)
+{
+ const int exp_max = parm->exp_max;
+ const int frac_shift = parm->frac_shift;
+ const uint64_t frac_lsb = 1ull << frac_shift;
+ const uint64_t frac_lsbm1 = 1ull << (frac_shift - 1);
+ const uint64_t round_mask = frac_lsb - 1;
+ const uint64_t roundeven_mask = round_mask | frac_lsb;
+ int flags = 0;
+
+ switch (p.cls) {
+ case float_class_normal:
+ {
+ bool overflow_norm;
+ uint64_t inc, frac_lo;
+ int exp;
+
+ frac_lo = LO(p.frac);
+ switch (s->float_rounding_mode) {
+ case float_round_nearest_even:
+ overflow_norm = false;
+ inc = ((frac_lo & roundeven_mask) != frac_lsbm1
+ ? frac_lsbm1 : 0);
+ break;
+ case float_round_ties_away:
+ overflow_norm = false;
+ inc = frac_lsbm1;
+ break;
+ case float_round_to_zero:
+ overflow_norm = true;
+ inc = 0;
+ break;
+ case float_round_up:
+ inc = p.sign ? 0 : round_mask;
+ overflow_norm = p.sign;
+ break;
+ case float_round_down:
+ inc = p.sign ? round_mask : 0;
+ overflow_norm = !p.sign;
+ break;
+ case float_round_to_odd:
+ overflow_norm = true;
+ inc = frac_lo & frac_lsb ? 0 : round_mask;
+ break;
+ default:
+ g_assert_not_reached();
+ }
+
+ exp = p.exp + parm->exp_bias;
+ if (likely(exp > 0)) {
+ if (frac_lo & round_mask) {
+ flags |= float_flag_inexact;
+ p.frac = ADDI(p.frac, inc);
+ if (HI(p.frac) & DECOMPOSED_OVERFLOW_BIT) {
+ p.frac = SHR(p.frac, 1);
+ exp++;
+ }
+ }
+ p.frac = SHR(p.frac, frac_shift);
+
+ if (parm->arm_althp) {
+ /* ARM Alt HP eschews Inf and NaN for a wider exponent. */
+ if (unlikely(exp > exp_max)) {
+ /* Overflow. Return the maximum normal. */
+ flags = float_flag_invalid;
+ exp = exp_max;
+ p.frac = MONE;
+ }
+ } else if (unlikely(exp >= exp_max)) {
+ flags |= float_flag_overflow | float_flag_inexact;
+ if (overflow_norm) {
+ exp = exp_max - 1;
+ p.frac = MONE;
+ } else {
+ p.cls = float_class_inf;
+ goto do_inf;
+ }
+ }
+ } else if (s->flush_to_zero) {
+ flags |= float_flag_output_denormal;
+ p.cls = float_class_zero;
+ goto do_zero;
+ } else {
+ bool is_tiny = s->tininess_before_rounding || exp < 0;
+ if (!is_tiny) {
+ FRAC_TYPE frac_inc = ADDI(p.frac, inc);
+ if (HI(frac_inc) & DECOMPOSED_OVERFLOW_BIT) {
+ is_tiny = true;
+ }
+ }
+
+ p.frac = SHR_JAM(p.frac, 1 - exp);
+ frac_lo = LO(p.frac);
+
+ if (frac_lo & round_mask) {
+ /* Need to recompute round-to-even / round-to-odd. */
+ switch (s->float_rounding_mode) {
+ case float_round_nearest_even:
+ inc = ((frac_lo & roundeven_mask) != frac_lsbm1
+ ? frac_lsbm1 : 0);
+ break;
+ case float_round_to_odd:
+ inc = frac_lo & frac_lsb ? 0 : round_mask;
+ break;
+ default:
+ break;
+ }
+ flags |= float_flag_inexact;
+ p.frac = ADDI(p.frac, inc);
+ }
+
+ exp = (HI(p.frac) & DECOMPOSED_IMPLICIT_BIT ? 1 : 0);
+ p.frac = SHR(p.frac, frac_shift);
+
+ if (is_tiny && (flags & float_flag_inexact)) {
+ flags |= float_flag_underflow;
+ }
+ if (exp == 0 && EQ0(p.frac)) {
+ p.cls = float_class_zero;
+ }
+ }
+ p.exp = exp;
+ }
+ break;
+
+ case float_class_zero:
+ do_zero:
+ p.exp = 0;
+ p.frac = ZERO;
+ break;
+
+ case float_class_inf:
+ do_inf:
+ g_assert(!parm->arm_althp);
+ p.exp = exp_max;
+ p.frac = ZERO;
+ break;
+
+ case float_class_qnan:
+ case float_class_snan:
+ g_assert(!parm->arm_althp);
+ p.exp = exp_max;
+ p.frac = SHR(p.frac, parm->frac_shift);
+ break;
+
+ default:
+ g_assert_not_reached();
+ }
+
+ float_raise(flags, s);
+ return p;
+}
+
+/*
+ * Returns the result of adding or subtracting the values of the
+ * floating-point values `a' and `b'. The operation is performed
+ * according to the IEC/IEEE Standard for Binary Floating-Point
+ * Arithmetic.
+ */
+
+static PARTS_TYPE
+FUNC(addsub_floats)(PARTS_TYPE a, PARTS_TYPE b,
+ bool subtract, float_status *s)
+{
+ bool a_sign = a.sign;
+ bool b_sign = b.sign ^ subtract;
+
+ if (a_sign != b_sign) {
+ /* Subtraction */
+
+ if (a.cls == float_class_normal && b.cls == float_class_normal) {
+ if (a.exp > b.exp || (a.exp == b.exp && GEU(a.frac, b.frac))) {
+ b.frac = SHR_JAM(b.frac, a.exp - b.exp);
+ a.frac = SUB(a.frac, b.frac);
+ } else {
+ a.frac = SHR_JAM(a.frac, b.exp - a.exp);
+ a.frac = SUB(b.frac, a.frac);
+ a.exp = b.exp;
+ a_sign ^= 1;
+ }
+
+ if (EQ0(a.frac)) {
+ a.cls = float_class_zero;
+ a.sign = s->float_rounding_mode == float_round_down;
+ } else {
+ int shift = CLZ(a.frac) - 1;
+ a.frac = SHL(a.frac, shift);
+ a.exp = a.exp - shift;
+ a.sign = a_sign;
+ }
+ return a;
+ }
+ if (is_nan(a.cls) || is_nan(b.cls)) {
+ return FUNC(pick_nan)(a, b, s);
+ }
+ if (a.cls == float_class_inf) {
+ if (b.cls == float_class_inf) {
+ float_raise(float_flag_invalid, s);
+ return FUNC(parts_default_nan)(s);
+ }
+ return a;
+ }
+ if (a.cls == float_class_zero && b.cls == float_class_zero) {
+ a.sign = s->float_rounding_mode == float_round_down;
+ return a;
+ }
+ if (a.cls == float_class_zero || b.cls == float_class_inf) {
+ b.sign = a_sign ^ 1;
+ return b;
+ }
+ if (b.cls == float_class_zero) {
+ return a;
+ }
+ } else {
+ /* Addition */
+ if (a.cls == float_class_normal && b.cls == float_class_normal) {
+ if (a.exp > b.exp) {
+ b.frac = SHR_JAM(b.frac, a.exp - b.exp);
+ } else if (a.exp < b.exp) {
+ a.frac = SHR_JAM(a.frac, b.exp - a.exp);
+ a.exp = b.exp;
+ }
+ a.frac = ADD(a.frac, b.frac);
+ if (HI(a.frac) & DECOMPOSED_OVERFLOW_BIT) {
+ a.frac = SHR_JAM(a.frac, 1);
+ a.exp += 1;
+ }
+ return a;
+ }
+ if (is_nan(a.cls) || is_nan(b.cls)) {
+ return FUNC(pick_nan)(a, b, s);
+ }
+ if (a.cls == float_class_inf || b.cls == float_class_zero) {
+ return a;
+ }
+ if (b.cls == float_class_inf || a.cls == float_class_zero) {
+ b.sign = b_sign;
+ return b;
+ }
+ }
+ g_assert_not_reached();
+}
--
2.25.1
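The include-twice idiom used above can be shown in miniature; this is a
hedged, self-contained sketch of the same technique, with hypothetical
file and function names:

    /* sum-parts.c.inc -- expanded once per fraction type */
    static FRAC_TYPE FUNC(add3)(FRAC_TYPE a, FRAC_TYPE b, FRAC_TYPE c)
    {
        return ADD(ADD(a, b), c);
    }

    /* In the including file: 64-bit instantiation. */
    #define FUNC(X)    X
    #define FRAC_TYPE  uint64_t
    #define ADD(A, B)  ((A) + (B))
    #include "sum-parts.c.inc"
    #undef FUNC
    #undef FRAC_TYPE
    #undef ADD

    /* Same body again, instantiated for Int128. */
    #define FUNC(X)    X##128
    #define FRAC_TYPE  Int128
    #define ADD(A, B)  int128_add(A, B)
    #include "sum-parts.c.inc"
    #undef FUNC
    #undef FRAC_TYPE
    #undef ADD

This expands the single body into add3() over uint64_t and add3128()
over Int128, which is how softfloat-parts.c.inc ends up being consumed
twice by softfloat.c once patch 13 adds the Int128 instantiation.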
* [RFC PATCH 12/15] softfloat: Streamline FloatFmt
2020-10-21 4:51 [RFC PATCH 00/15] softfloat: alternate conversion of float128_addsub Richard Henderson
` (10 preceding siblings ...)
2020-10-21 4:51 ` [RFC PATCH 11/15] Test split to softfloat-parts.c.inc Richard Henderson
@ 2020-10-21 4:51 ` Richard Henderson
2020-10-21 4:51 ` [RFC PATCH 13/15] Test float128_addsub Richard Henderson
` (4 subsequent siblings)
16 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2020-10-21 4:51 UTC (permalink / raw)
To: qemu-devel; +Cc: alex.bennee
The fields being removed are now computed in round_canonical.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
fpu/softfloat.c | 14 +-------------
1 file changed, 1 insertion(+), 13 deletions(-)
diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 3651f4525d..1bd21435e7 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -527,10 +527,6 @@ typedef struct {
* exp_max: the maximum normalised exponent
* frac_size: the size of the fraction field
* frac_shift: shift to normalise the fraction with DECOMPOSED_BINARY_POINT
- * The following are computed based the size of fraction
- * frac_lsb: least significant bit of fraction
- * frac_lsbm1: the bit below the least significant bit (for rounding)
- * round_mask/roundeven_mask: masks used for rounding
* The following optional modifiers are available:
* arm_althp: handle ARM Alternative Half Precision
*/
@@ -540,10 +536,6 @@ typedef struct {
int exp_max;
int frac_size;
int frac_shift;
- uint64_t frac_lsb;
- uint64_t frac_lsbm1;
- uint64_t round_mask;
- uint64_t roundeven_mask;
bool arm_althp;
} FloatFmt;
@@ -553,11 +545,7 @@ typedef struct {
.exp_bias = ((1 << E) - 1) >> 1, \
.exp_max = (1 << E) - 1, \
.frac_size = F, \
- .frac_shift = DECOMPOSED_BINARY_POINT - F, \
- .frac_lsb = 1ull << (DECOMPOSED_BINARY_POINT - F), \
- .frac_lsbm1 = 1ull << ((DECOMPOSED_BINARY_POINT - F) - 1), \
- .round_mask = (1ull << (DECOMPOSED_BINARY_POINT - F)) - 1, \
- .roundeven_mask = (2ull << (DECOMPOSED_BINARY_POINT - F)) - 1
+ .frac_shift = DECOMPOSED_BINARY_POINT - F
static const FloatFmt float16_params = {
FLOAT_PARAMS(5, 10)
--
2.25.1
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [RFC PATCH 13/15] Test float128_addsub
2020-10-21 4:51 [RFC PATCH 00/15] softfloat: alternate conversion of float128_addsub Richard Henderson
` (11 preceding siblings ...)
2020-10-21 4:51 ` [RFC PATCH 12/15] softfloat: Streamline FloatFmt Richard Henderson
@ 2020-10-21 4:51 ` Richard Henderson
2020-10-21 4:51 ` [RFC PATCH 14/15] softfloat: Use float_cmask for addsub_floats Richard Henderson
` (3 subsequent siblings)
16 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2020-10-21 4:51 UTC (permalink / raw)
To: qemu-devel; +Cc: alex.bennee
---
fpu/softfloat.c | 310 +++++++++++----------------------
fpu/softfloat-specialize.c.inc | 33 ++++
2 files changed, 137 insertions(+), 206 deletions(-)
diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 1bd21435e7..294c573fb9 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -517,6 +517,14 @@ typedef struct {
bool sign;
} FloatParts;
+/* Similar for float128. */
+typedef struct {
+ Int128 frac;
+ int32_t exp;
+ FloatClass cls;
+ bool sign;
+} FloatParts128;
+
#define DECOMPOSED_BINARY_POINT (64 - 2)
#define DECOMPOSED_IMPLICIT_BIT (1ull << DECOMPOSED_BINARY_POINT)
#define DECOMPOSED_OVERFLOW_BIT (DECOMPOSED_IMPLICIT_BIT << 1)
@@ -540,13 +548,20 @@ typedef struct {
} FloatFmt;
/* Expand fields based on the size of exponent and fraction */
-#define FLOAT_PARAMS(E, F) \
+#define FLOAT_PARAMS1(E, F) \
.exp_size = E, \
.exp_bias = ((1 << E) - 1) >> 1, \
.exp_max = (1 << E) - 1, \
- .frac_size = F, \
+ .frac_size = F
+
+#define FLOAT_PARAMS(E, F) \
+ FLOAT_PARAMS1(E, F), \
.frac_shift = DECOMPOSED_BINARY_POINT - F
+#define FLOAT128_PARAMS(E, F) \
+ FLOAT_PARAMS1(E, F), \
+ .frac_shift = DECOMPOSED_BINARY_POINT + 64 - F
+
static const FloatFmt float16_params = {
FLOAT_PARAMS(5, 10)
};
@@ -568,6 +583,10 @@ static const FloatFmt float64_params = {
FLOAT_PARAMS(11, 52)
};
+static const FloatFmt float128_params = {
+ FLOAT128_PARAMS(15, 112)
+};
+
/* Unpack a float to parts, but do not canonicalize. */
static inline FloatParts unpack_raw(FloatFmt fmt, uint64_t raw)
{
@@ -742,6 +761,51 @@ static FloatParts pick_nan_muladd(FloatParts a, FloatParts b, FloatParts c,
#undef SHR_JAM
#undef SUB
+#define FUNC(X) X##128
+#define FRAC_TYPE Int128
+#define PARTS_TYPE FloatParts128
+
+#define HI(P) int128_gethi(P)
+#define LO(P) int128_getlo(P)
+#define ZERO int128_zero()
+#define MONE int128_make128(-1, -1)
+#define ONE int128_one()
+
+#define ADD(P1, P2) int128_add(P1, P2)
+#define ADDI(P, I) int128_add(P, int128_make64(I))
+#define CLZ(P) int128_clz(P)
+#define EQ0(P) (!int128_nz(P))
+#define EQ(P1, P2) int128_eq(P1, P2)
+#define GEU(P1, P2) int128_geu(P1, P2)
+#define OR(P1, P2) int128_or(P1, P2)
+#define SHL(P, C) int128_shl(P, C)
+#define SHR(P, C) int128_shr(P, C)
+#define SHR_JAM(P, C) \
+ ({ uint64_t _h, _l; shift128RightJamming(HI(P), LO(P), C, &_h, &_l); \
+ int128_make128(_l, _h); })
+#define SUB(P1, P2) int128_sub(P1, P2)
+
+#include "softfloat-parts.c.inc"
+
+#undef FUNC
+#undef FRAC_TYPE
+#undef PARTS_TYPE
+#undef HI
+#undef LO
+#undef ZERO
+#undef MONE
+#undef ONE
+#undef ADD
+#undef ADDI
+#undef CLZ
+#undef EQ0
+#undef EQ
+#undef GEU
+#undef SHL
+#undef SHR
+#undef SHR_JAM
+#undef SUB
+
/* Explicit FloatFmt version */
static FloatParts float16a_unpack_canonical(float16 f, float_status *s,
const FloatFmt *params)
@@ -6664,225 +6728,59 @@ float128 float128_round_to_int(float128 a, float_status *status)
}
-/*----------------------------------------------------------------------------
-| Returns the result of adding the absolute values of the quadruple-precision
-| floating-point values `a' and `b'. If `zSign' is 1, the sum is negated
-| before being returned. `zSign' is ignored if the result is a NaN.
-| The addition is performed according to the IEC/IEEE Standard for Binary
-| Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-static float128 addFloat128Sigs(float128 a, float128 b, bool zSign,
- float_status *status)
+static FloatParts128 float128_unpack_raw(float128 f)
{
- int32_t aExp, bExp, zExp;
- uint64_t aSig0, aSig1, bSig0, bSig1, zSig0, zSig1, zSig2;
- int32_t expDiff;
-
- aSig1 = extractFloat128Frac1( a );
- aSig0 = extractFloat128Frac0( a );
- aExp = extractFloat128Exp( a );
- bSig1 = extractFloat128Frac1( b );
- bSig0 = extractFloat128Frac0( b );
- bExp = extractFloat128Exp( b );
- expDiff = aExp - bExp;
- if ( 0 < expDiff ) {
- if ( aExp == 0x7FFF ) {
- if (aSig0 | aSig1) {
- return propagateFloat128NaN(a, b, status);
- }
- return a;
- }
- if ( bExp == 0 ) {
- --expDiff;
- }
- else {
- bSig0 |= UINT64_C(0x0001000000000000);
- }
- shift128ExtraRightJamming(
- bSig0, bSig1, 0, expDiff, &bSig0, &bSig1, &zSig2 );
- zExp = aExp;
- }
- else if ( expDiff < 0 ) {
- if ( bExp == 0x7FFF ) {
- if (bSig0 | bSig1) {
- return propagateFloat128NaN(a, b, status);
- }
- return packFloat128( zSign, 0x7FFF, 0, 0 );
- }
- if ( aExp == 0 ) {
- ++expDiff;
- }
- else {
- aSig0 |= UINT64_C(0x0001000000000000);
- }
- shift128ExtraRightJamming(
- aSig0, aSig1, 0, - expDiff, &aSig0, &aSig1, &zSig2 );
- zExp = bExp;
- }
- else {
- if ( aExp == 0x7FFF ) {
- if ( aSig0 | aSig1 | bSig0 | bSig1 ) {
- return propagateFloat128NaN(a, b, status);
- }
- return a;
- }
- add128( aSig0, aSig1, bSig0, bSig1, &zSig0, &zSig1 );
- if ( aExp == 0 ) {
- if (status->flush_to_zero) {
- if (zSig0 | zSig1) {
- float_raise(float_flag_output_denormal, status);
- }
- return packFloat128(zSign, 0, 0, 0);
- }
- return packFloat128( zSign, 0, zSig0, zSig1 );
- }
- zSig2 = 0;
- zSig0 |= UINT64_C(0x0002000000000000);
- zExp = aExp;
- goto shiftRight1;
- }
- aSig0 |= UINT64_C(0x0001000000000000);
- add128( aSig0, aSig1, bSig0, bSig1, &zSig0, &zSig1 );
- --zExp;
- if ( zSig0 < UINT64_C(0x0002000000000000) ) goto roundAndPack;
- ++zExp;
- shiftRight1:
- shift128ExtraRightJamming(
- zSig0, zSig1, zSig2, 1, &zSig0, &zSig1, &zSig2 );
- roundAndPack:
- return roundAndPackFloat128(zSign, zExp, zSig0, zSig1, zSig2, status);
+ const int f_size = float128_params.frac_size;
+ const int e_size = float128_params.exp_size;
+ return (FloatParts128) {
+ .cls = float_class_unclassified,
+ .sign = extract64(f.high, f_size + e_size - 64, 1),
+ .exp = extract64(f.high, f_size - 64, e_size),
+ .frac = int128_make128(f.low, extract64(f.high, 0, f_size - 64))
+ };
}
-/*----------------------------------------------------------------------------
-| Returns the result of subtracting the absolute values of the quadruple-
-| precision floating-point values `a' and `b'. If `zSign' is 1, the
-| difference is negated before being returned. `zSign' is ignored if the
-| result is a NaN. The subtraction is performed according to the IEC/IEEE
-| Standard for Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-static float128 subFloat128Sigs(float128 a, float128 b, bool zSign,
- float_status *status)
+static float128 float128_pack_raw(FloatParts128 p)
{
- int32_t aExp, bExp, zExp;
- uint64_t aSig0, aSig1, bSig0, bSig1, zSig0, zSig1;
- int32_t expDiff;
-
- aSig1 = extractFloat128Frac1( a );
- aSig0 = extractFloat128Frac0( a );
- aExp = extractFloat128Exp( a );
- bSig1 = extractFloat128Frac1( b );
- bSig0 = extractFloat128Frac0( b );
- bExp = extractFloat128Exp( b );
- expDiff = aExp - bExp;
- shortShift128Left( aSig0, aSig1, 14, &aSig0, &aSig1 );
- shortShift128Left( bSig0, bSig1, 14, &bSig0, &bSig1 );
- if ( 0 < expDiff ) goto aExpBigger;
- if ( expDiff < 0 ) goto bExpBigger;
- if ( aExp == 0x7FFF ) {
- if ( aSig0 | aSig1 | bSig0 | bSig1 ) {
- return propagateFloat128NaN(a, b, status);
- }
- float_raise(float_flag_invalid, status);
- return float128_default_nan(status);
- }
- if ( aExp == 0 ) {
- aExp = 1;
- bExp = 1;
- }
- if ( bSig0 < aSig0 ) goto aBigger;
- if ( aSig0 < bSig0 ) goto bBigger;
- if ( bSig1 < aSig1 ) goto aBigger;
- if ( aSig1 < bSig1 ) goto bBigger;
- return packFloat128(status->float_rounding_mode == float_round_down,
- 0, 0, 0);
- bExpBigger:
- if ( bExp == 0x7FFF ) {
- if (bSig0 | bSig1) {
- return propagateFloat128NaN(a, b, status);
- }
- return packFloat128( zSign ^ 1, 0x7FFF, 0, 0 );
- }
- if ( aExp == 0 ) {
- ++expDiff;
- }
- else {
- aSig0 |= UINT64_C(0x4000000000000000);
- }
- shift128RightJamming( aSig0, aSig1, - expDiff, &aSig0, &aSig1 );
- bSig0 |= UINT64_C(0x4000000000000000);
- bBigger:
- sub128( bSig0, bSig1, aSig0, aSig1, &zSig0, &zSig1 );
- zExp = bExp;
- zSign ^= 1;
- goto normalizeRoundAndPack;
- aExpBigger:
- if ( aExp == 0x7FFF ) {
- if (aSig0 | aSig1) {
- return propagateFloat128NaN(a, b, status);
- }
- return a;
- }
- if ( bExp == 0 ) {
- --expDiff;
- }
- else {
- bSig0 |= UINT64_C(0x4000000000000000);
- }
- shift128RightJamming( bSig0, bSig1, expDiff, &bSig0, &bSig1 );
- aSig0 |= UINT64_C(0x4000000000000000);
- aBigger:
- sub128( aSig0, aSig1, bSig0, bSig1, &zSig0, &zSig1 );
- zExp = aExp;
- normalizeRoundAndPack:
- --zExp;
- return normalizeRoundAndPackFloat128(zSign, zExp - 14, zSig0, zSig1,
- status);
+ const int f_size = float128_params.frac_size;
+ const int e_size = float128_params.exp_size;
+ uint64_t h = int128_gethi(p.frac);
+ uint64_t l = int128_getlo(p.frac);
+ h = deposit64(h, f_size - 64, e_size, p.exp);
+ h = deposit64(h, f_size + e_size - 64, 1, p.sign);
+ return make_float128(h, l);
}
-/*----------------------------------------------------------------------------
-| Returns the result of adding the quadruple-precision floating-point values
-| `a' and `b'. The operation is performed according to the IEC/IEEE Standard
-| for Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
+static FloatParts128 float128_unpack_canonical(float128 f, float_status *s)
+{
+ return sf_canonicalize128(float128_unpack_raw(f), &float128_params, s);
+}
+
+static float128 float128_round_pack_canonical(FloatParts128 p, float_status *s)
+{
+ return float128_pack_raw(round_canonical128(p, s, &float128_params));
+}
+
+static float128 QEMU_FLATTEN
+float128_addsub(float128 a, float128 b, float_status *status, bool subtract)
+{
+ FloatParts128 pa = float128_unpack_canonical(a, status);
+ FloatParts128 pb = float128_unpack_canonical(b, status);
+ FloatParts128 pr = addsub_floats128(pa, pb, subtract, status);
+
+ return float128_round_pack_canonical(pr, status);
+}
float128 float128_add(float128 a, float128 b, float_status *status)
{
- bool aSign, bSign;
-
- aSign = extractFloat128Sign( a );
- bSign = extractFloat128Sign( b );
- if ( aSign == bSign ) {
- return addFloat128Sigs(a, b, aSign, status);
- }
- else {
- return subFloat128Sigs(a, b, aSign, status);
- }
-
+ return float128_addsub(a, b, status, false);
}
-/*----------------------------------------------------------------------------
-| Returns the result of subtracting the quadruple-precision floating-point
-| values `a' and `b'. The operation is performed according to the IEC/IEEE
-| Standard for Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
float128 float128_sub(float128 a, float128 b, float_status *status)
{
- bool aSign, bSign;
-
- aSign = extractFloat128Sign( a );
- bSign = extractFloat128Sign( b );
- if ( aSign == bSign ) {
- return subFloat128Sigs(a, b, aSign, status);
- }
- else {
- return addFloat128Sigs(a, b, aSign, status);
- }
-
+ return float128_addsub(a, b, status, true);
}
/*----------------------------------------------------------------------------
diff --git a/fpu/softfloat-specialize.c.inc b/fpu/softfloat-specialize.c.inc
index 0fe8ce408d..404d38071a 100644
--- a/fpu/softfloat-specialize.c.inc
+++ b/fpu/softfloat-specialize.c.inc
@@ -169,6 +169,23 @@ static FloatParts parts_default_nan(float_status *status)
};
}
+static FloatParts128 parts_default_nan128(float_status *status)
+{
+ FloatParts p = parts_default_nan(status);
+
+ /*
+ * Extrapolate from the choices made by parts_default_nan to fill
+ * in the quad-floating format. Copy the high bits across unchanged,
+ * and replicate the lsb to all lower bits.
+ */
+ return (FloatParts128) {
+ .cls = float_class_qnan,
+ .sign = p.sign,
+ .exp = INT_MAX,
+ .frac = int128_make128(-(p.frac & 1), p.frac)
+ };
+}
+
/*----------------------------------------------------------------------------
| Returns a quiet NaN from a signalling NaN for the deconstructed
| floating-point parts.
@@ -191,6 +208,22 @@ static FloatParts parts_silence_nan(FloatParts a, float_status *status)
return a;
}
+static FloatParts128 parts_silence_nan128(FloatParts128 a, float_status *s)
+{
+ g_assert(!no_signaling_nans(s));
+#if defined(TARGET_HPPA)
+ g_assert_not_reached();
+#endif
+ if (snan_bit_is_one(s)) {
+ return parts_default_nan128(s);
+ } else {
+ Int128 t = int128_make128(0, 1ULL << (DECOMPOSED_BINARY_POINT - 1));
+ a.frac = int128_or(a.frac, t);
+ }
+ a.cls = float_class_qnan;
+ return a;
+}
+
/*----------------------------------------------------------------------------
| The pattern for a default generated extended double-precision NaN.
*----------------------------------------------------------------------------*/
--
2.25.1
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [RFC PATCH 14/15] softfloat: Use float_cmask for addsub_floats
2020-10-21 4:51 [RFC PATCH 00/15] softfloat: alternate conversion of float128_addsub Richard Henderson
` (12 preceding siblings ...)
2020-10-21 4:51 ` [RFC PATCH 13/15] Test float128_addsub Richard Henderson
@ 2020-10-21 4:51 ` Richard Henderson
2020-10-21 4:51 ` [RFC PATCH 15/15] softfloat: Improve subtraction of equal exponent Richard Henderson
` (2 subsequent siblings)
16 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2020-10-21 4:51 UTC (permalink / raw)
To: qemu-devel; +Cc: alex.bennee
Testing more than one class at a time is better done with masks.
Sort a few case combinations before the NaN check, since NaN operands
should be assumed to be the least probable case. Share the pick_nan call between
the add and subtract cases.
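(A minimal, self-contained sketch of the mask idea; float_cmask and
float_cmask_normal come from the earlier patch in this series, and the enum
values below are illustrative rather than the exact QEMU definitions:)

#include <stdbool.h>

typedef enum { float_class_zero, float_class_normal, float_class_inf,
               float_class_qnan, float_class_snan } FloatClass;

#define float_cmask(cls)    (1u << (cls))
#define float_cmask_normal  float_cmask(float_class_normal)

/* A single compare now answers "are both operands normal?". */
static inline bool both_normal(FloatClass a, FloatClass b)
{
    return (float_cmask(a) | float_cmask(b)) == float_cmask_normal;
}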
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
fpu/softfloat-parts.c.inc | 70 +++++++++++++++++++++------------------
1 file changed, 37 insertions(+), 33 deletions(-)
diff --git a/fpu/softfloat-parts.c.inc b/fpu/softfloat-parts.c.inc
index 49bde45521..d2b6454903 100644
--- a/fpu/softfloat-parts.c.inc
+++ b/fpu/softfloat-parts.c.inc
@@ -247,13 +247,13 @@ static PARTS_TYPE
FUNC(addsub_floats)(PARTS_TYPE a, PARTS_TYPE b,
bool subtract, float_status *s)
{
- bool a_sign = a.sign;
bool b_sign = b.sign ^ subtract;
+ int ab_mask = float_cmask(a.cls) | float_cmask(b.cls);
- if (a_sign != b_sign) {
+ if (a.sign != b_sign) {
/* Subtraction */
- if (a.cls == float_class_normal && b.cls == float_class_normal) {
+ if (likely(ab_mask == float_cmask_normal)) {
if (a.exp > b.exp || (a.exp == b.exp && GEU(a.frac, b.frac))) {
b.frac = SHR_JAM(b.frac, a.exp - b.exp);
a.frac = SUB(a.frac, b.frac);
@@ -261,7 +261,7 @@ FUNC(addsub_floats)(PARTS_TYPE a, PARTS_TYPE b,
a.frac = SHR_JAM(a.frac, b.exp - a.exp);
a.frac = SUB(b.frac, a.frac);
a.exp = b.exp;
- a_sign ^= 1;
+ a.sign ^= 1;
}
if (EQ0(a.frac)) {
@@ -270,35 +270,37 @@ FUNC(addsub_floats)(PARTS_TYPE a, PARTS_TYPE b,
} else {
int shift = CLZ(a.frac) - 1;
a.frac = SHL(a.frac, shift);
- a.exp = a.exp - shift;
- a.sign = a_sign;
+ a.exp -= shift;
}
return a;
}
- if (is_nan(a.cls) || is_nan(b.cls)) {
- return FUNC(pick_nan)(a, b, s);
- }
- if (a.cls == float_class_inf) {
- if (b.cls == float_class_inf) {
- float_raise(float_flag_invalid, s);
- return FUNC(parts_default_nan)(s);
- }
- return a;
- }
- if (a.cls == float_class_zero && b.cls == float_class_zero) {
+
+ /* 0 - 0 */
+ if (ab_mask == float_cmask_zero) {
a.sign = s->float_rounding_mode == float_round_down;
return a;
}
- if (a.cls == float_class_zero || b.cls == float_class_inf) {
- b.sign = a_sign ^ 1;
- return b;
+
+ /* Inf - Inf */
+ if (unlikely(ab_mask == float_cmask_inf)) {
+ float_raise(float_flag_invalid, s);
+ return FUNC(parts_default_nan)(s);
}
- if (b.cls == float_class_zero) {
- return a;
+
+ if (!(ab_mask & float_cmask_anynan)) {
+ if (a.cls == float_class_inf || b.cls == float_class_zero) {
+ return a;
+ }
+ if (b.cls == float_class_inf || a.cls == float_class_zero) {
+ b.sign = a.sign ^ 1;
+ return b;
+ }
+ g_assert_not_reached();
}
} else {
/* Addition */
- if (a.cls == float_class_normal && b.cls == float_class_normal) {
+
+ if (likely(ab_mask == float_cmask_normal)) {
if (a.exp > b.exp) {
b.frac = SHR_JAM(b.frac, a.exp - b.exp);
} else if (a.exp < b.exp) {
@@ -312,16 +314,18 @@ FUNC(addsub_floats)(PARTS_TYPE a, PARTS_TYPE b,
}
return a;
}
- if (is_nan(a.cls) || is_nan(b.cls)) {
- return FUNC(pick_nan)(a, b, s);
- }
- if (a.cls == float_class_inf || b.cls == float_class_zero) {
- return a;
- }
- if (b.cls == float_class_inf || a.cls == float_class_zero) {
- b.sign = b_sign;
- return b;
+
+ if (!(ab_mask & float_cmask_anynan)) {
+ if (a.cls == float_class_inf || b.cls == float_class_zero) {
+ return a;
+ }
+ if (b.cls == float_class_inf || a.cls == float_class_zero) {
+ b.sign = b_sign;
+ return b;
+ }
+ g_assert_not_reached();
}
}
- g_assert_not_reached();
+
+ return FUNC(pick_nan)(a, b, s);
}
--
2.25.1
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [RFC PATCH 15/15] softfloat: Improve subtraction of equal exponent
2020-10-21 4:51 [RFC PATCH 00/15] softfloat: alternate conversion of float128_addsub Richard Henderson
` (13 preceding siblings ...)
2020-10-21 4:51 ` [RFC PATCH 14/15] softfloat: Use float_cmask for addsub_floats Richard Henderson
@ 2020-10-21 4:51 ` Richard Henderson
2020-10-21 5:12 ` [RFC PATCH 00/15] softfloat: alternate conversion of float128_addsub no-reply
2020-10-21 17:46 ` Alex Bennée
16 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2020-10-21 4:51 UTC (permalink / raw)
To: qemu-devel; +Cc: alex.bennee
Rather than compare the fractions before subtracting, do the
subtract and examine the result, possibly negating it.
Looking toward re-using addsub_floats(N**2) for the addition
stage of muladd_floats(N), this will be important because of the
longer fraction sizes.
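(A sketch of the equal-exponent subtraction for 64-bit fractions, assuming
the decomposed layout used in this file, where any borrow propagates into
DECOMPOSED_OVERFLOW_BIT (bit 63); illustrative only, not the exact hunk below:)

#include <stdbool.h>
#include <stdint.h>

/* Subtract first; if the result wrapped negative, negate it and ask the
 * caller to flip the sign, instead of comparing the fractions up front. */
static uint64_t frac_sub_same_exp(uint64_t a_frac, uint64_t b_frac,
                                  bool *flip_sign)
{
    uint64_t r = b_frac - a_frac;

    *flip_sign = false;
    if (r & (1ULL << 63)) {         /* DECOMPOSED_OVERFLOW_BIT for 64 bits */
        r = -r;
        *flip_sign = true;
    }
    return r;
}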
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
fpu/softfloat.c | 4 ++++
fpu/softfloat-parts.c.inc | 32 ++++++++++++++++++++------------
2 files changed, 24 insertions(+), 12 deletions(-)
diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 294c573fb9..bf808a1b74 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -732,6 +732,7 @@ static FloatParts pick_nan_muladd(FloatParts a, FloatParts b, FloatParts c,
#define EQ0(P) ((P) == 0)
#define EQ(P1, P2) ((P1) == (P2))
#define GEU(P1, P2) ((P1) >= (P2))
+#define NEG(P) (-(P))
#define OR(P1, P2) ((P1) | (P2))
#define SHL(P, C) ((P) << (C))
#define SHR(P, C) ((P) >> (C))
@@ -755,6 +756,7 @@ static FloatParts pick_nan_muladd(FloatParts a, FloatParts b, FloatParts c,
#undef EQ0
#undef EQ
#undef GEU
+#undef NEG
#undef OR
#undef SHL
#undef SHR
@@ -777,6 +779,7 @@ static FloatParts pick_nan_muladd(FloatParts a, FloatParts b, FloatParts c,
#define EQ0(P) (!int128_nz(P))
#define EQ(P1, P2) int128_eq(P1, P2)
#define GEU(P1, P2) int128_geu(P1, P2)
+#define NEG(P) int128_neg(P)
#define OR(P1, P2) int128_or(P1, P2)
#define SHL(P, C) int128_shl(P, C)
#define SHR(P, C) int128_shr(P, C)
@@ -801,6 +804,7 @@ static FloatParts pick_nan_muladd(FloatParts a, FloatParts b, FloatParts c,
#undef EQ0
#undef EQ
#undef GEU
+#undef NEG
#undef SHL
#undef SHR
#undef SHR_JAM
diff --git a/fpu/softfloat-parts.c.inc b/fpu/softfloat-parts.c.inc
index d2b6454903..9762cf8b66 100644
--- a/fpu/softfloat-parts.c.inc
+++ b/fpu/softfloat-parts.c.inc
@@ -254,29 +254,37 @@ FUNC(addsub_floats)(PARTS_TYPE a, PARTS_TYPE b,
/* Subtraction */
if (likely(ab_mask == float_cmask_normal)) {
- if (a.exp > b.exp || (a.exp == b.exp && GEU(a.frac, b.frac))) {
- b.frac = SHR_JAM(b.frac, a.exp - b.exp);
+ int shift, diff_exp = a.exp - b.exp;
+
+ if (diff_exp > 0) {
+ b.frac = SHR_JAM(b.frac, diff_exp);
a.frac = SUB(a.frac, b.frac);
- } else {
- a.frac = SHR_JAM(a.frac, b.exp - a.exp);
+ } else if (diff_exp < 0) {
+ a.frac = SHR_JAM(a.frac, -diff_exp);
a.frac = SUB(b.frac, a.frac);
a.exp = b.exp;
a.sign ^= 1;
+ } else {
+ a.frac = SUB(b.frac, a.frac);
+ /* a.frac < b.frac results in carry into the overflow bit. */
+ if (HI(a.frac) & DECOMPOSED_OVERFLOW_BIT) {
+ a.frac = NEG(a.frac);
+ a.sign ^= 1;
+ } else if (EQ0(a.frac)) {
+ a.cls = float_class_zero;
+ goto sub_zero;
+ }
}
- if (EQ0(a.frac)) {
- a.cls = float_class_zero;
- a.sign = s->float_rounding_mode == float_round_down;
- } else {
- int shift = CLZ(a.frac) - 1;
- a.frac = SHL(a.frac, shift);
- a.exp -= shift;
- }
+ shift = CLZ(a.frac) - 1;
+ a.frac = SHL(a.frac, shift);
+ a.exp -= shift;
return a;
}
/* 0 - 0 */
if (ab_mask == float_cmask_zero) {
+ sub_zero:
a.sign = s->float_rounding_mode == float_round_down;
return a;
}
--
2.25.1
^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 00/15] softfloat: alternate conversion of float128_addsub
2020-10-21 4:51 [RFC PATCH 00/15] softfloat: alternate conversion of float128_addsub Richard Henderson
` (14 preceding siblings ...)
2020-10-21 4:51 ` [RFC PATCH 15/15] softfloat: Improve subtraction of equal exponent Richard Henderson
@ 2020-10-21 5:12 ` no-reply
2020-10-21 17:46 ` Alex Bennée
16 siblings, 0 replies; 30+ messages in thread
From: no-reply @ 2020-10-21 5:12 UTC (permalink / raw)
To: richard.henderson; +Cc: alex.bennee, qemu-devel
Patchew URL: https://patchew.org/QEMU/20201021045149.1582203-1-richard.henderson@linaro.org/
Hi,
This series seems to have some coding style problems. See output below for
more information:
Type: series
Message-id: 20201021045149.1582203-1-richard.henderson@linaro.org
Subject: [RFC PATCH 00/15] softfloat: alternate conversion of float128_addsub
=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===
Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
* [new tag] patchew/20201021045149.1582203-1-richard.henderson@linaro.org -> patchew/20201021045149.1582203-1-richard.henderson@linaro.org
Switched to a new branch 'test'
ad120c1 softfloat: Improve subtraction of equal exponent
12eb5a4 softfloat: Use float_cmask for addsub_floats
a04ff7d Test float128_addsub
fc9537e softfloat: Streamline FloatFmt
979beb6 Test split to softfloat-parts.c.inc
4317991 softfloat: Inline float_raise
4689bd2 softfloat: Add float_cmask and constants
b1141ee softfloat: Tidy a * b + inf return
197273c softfloat: Use int128.h for some operations
aa4afa2 softfloat: Use mulu64 for mul64To128
017d276 qemu/int128: Add int128_geu
71dd5f1 qemu/int128: Add int128_shr
9144df9 qemu/int128: Rename int128_rshift, int128_lshift
b6c9afb qemu/int128: Add int128_clz, int128_ctz
0ceff9a qemu/int128: Add int128_or
=== OUTPUT BEGIN ===
1/15 Checking commit 0ceff9a14aa6 (qemu/int128: Add int128_or)
2/15 Checking commit b6c9afb58357 (qemu/int128: Add int128_clz, int128_ctz)
3/15 Checking commit 9144df990b17 (qemu/int128: Rename int128_rshift, int128_lshift)
4/15 Checking commit 71dd5f157a39 (qemu/int128: Add int128_shr)
5/15 Checking commit 017d27627112 (qemu/int128: Add int128_geu)
6/15 Checking commit aa4afa22ee78 (softfloat: Use mulu64 for mul64To128)
7/15 Checking commit 197273c0aeda (softfloat: Use int128.h for some operations)
8/15 Checking commit b1141eecc368 (softfloat: Tidy a * b + inf return)
9/15 Checking commit 4689bd26fd66 (softfloat: Add float_cmask and constants)
10/15 Checking commit 4317991dcbd8 (softfloat: Inline float_raise)
11/15 Checking commit 979beb676e89 (Test split to softfloat-parts.c.inc)
WARNING: Block comments use a leading /* on a separate line
#557: FILE: fpu/softfloat.c:685:
+ /* Note that this check is after pickNaNMulAdd so that function
ERROR: Missing Signed-off-by: line(s)
total: 1 errors, 1 warnings, 786 lines checked
Patch 11/15 has style problems, please review. If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
12/15 Checking commit fc9537e66b73 (softfloat: Streamline FloatFmt)
13/15 Checking commit a04ff7dcb003 (Test float128_addsub)
ERROR: Missing Signed-off-by: line(s)
total: 1 errors, 0 warnings, 405 lines checked
Patch 13/15 has style problems, please review. If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
14/15 Checking commit 12eb5a486720 (softfloat: Use float_cmask for addsub_floats)
15/15 Checking commit ad120c1d1ae8 (softfloat: Improve subtraction of equal exponent)
=== OUTPUT END ===
Test command exited with code: 1
The full log is available at
http://patchew.org/logs/20201021045149.1582203-1-richard.henderson@linaro.org/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 00/15] softfloat: alternate conversion of float128_addsub
2020-10-21 4:51 [RFC PATCH 00/15] softfloat: alternate conversion of float128_addsub Richard Henderson
` (15 preceding siblings ...)
2020-10-21 5:12 ` [RFC PATCH 00/15] softfloat: alternate conversion of float128_addsub no-reply
@ 2020-10-21 17:46 ` Alex Bennée
2020-10-21 17:53 ` Richard Henderson
16 siblings, 1 reply; 30+ messages in thread
From: Alex Bennée @ 2020-10-21 17:46 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel
Richard Henderson <richard.henderson@linaro.org> writes:
> Hi Alex,
>
> Here's my first adjustment to your conversion for 128-bit floats.
>
> The Idea is to use a set of macros and an include file so that we
> can re-use the same large chunk of code that performs the basic
> operations on various fraction lengths. It's ugly, but without
> proper language support it seems to be less ugly than most.
>
> While I've just gone and added lots of stuff to int128... I have
> had another idea, half-baked because I'm tired and it's late:
>
> typedef struct {
> FloatClass cls;
> int exp;
> bool sign;
> uint64_t frac[];
> } FloatPartsBase;
>
> typedef struct {
> FloatPartsBase base;
> uint64_t frac;
> } FloatParts64;
>
> typedef struct {
> FloatPartsBase base;
> uint64_t frac_hi, frac_lo;
> } FloatParts128;
>
> typedef struct {
> FloatPartsBase base;
> uint64_t frac[4]; /* big endian word ordering */
> } FloatParts256;
>
> This layout, with the big-endian ordering, means that storage
> can be shared between them, just by ignoring the least significant
> words of the fraction as needed. Which may make muladd more
> understandable.
Would the big-endian formatting hamper the compiler on x86 where it can
do extra wide operations?
I am still seeing a multi MFlop drop in performance when converting the
float128_addsub to the new code. If this allows the compiler to do
better on the code I can live with it.
>
> E.g.
>
> static void muladd_floats64(FloatParts128 *r, FloatParts64 *a,
> FloatParts64 *b, FloatParts128 *c, ...)
> {
> // handle nans
> // produce 128-bit product into r
> // handle p vs c special cases.
> // zero-extend c to 128-bits
> c->frac[1] = 0;
> // perform 128-bit fractional addition
> addsub_floats128(r, c, ...);
> // fold 128-bit fraction to 64-bit sticky bit.
> r->frac[0] |= r->frac[1] != 0;
> }
>
> float64 float64_muladd(float64 a, float64 b, float64 c, ...)
> {
> FloatParts64 pa, pb;
> FloatParts128 pc, pr;
>
> float64_unpack_canonical(&pa.base, a, status);
> float64_unpack_canonical(&pb.base, b, status);
> float64_unpack_canonical(&pc.base, c, status);
> muladd_floats64(&pr, &pa, &pb, &pc, flags, status);
>
> return float64_round_pack_canonical(&pr.base, status);
> }
>
> Similarly, muladd_floats128 would use addsub_floats256.
>
> However, the big-endian word ordering means that Int128
> cannot be used directly; so a set of wrappers are needed.
> If added the Int128 routine just for use here, then it's
> probably easier to bypass Int128 and just code it here.
Are you talking about all our operations? Will we still need to #ifdef
CONFIG_INT128 in the softfloat code?
>
> Thoughts?
>
>
> r~
>
>
> Richard Henderson (15):
> qemu/int128: Add int128_or
> qemu/int128: Add int128_clz, int128_ctz
> qemu/int128: Rename int128_rshift, int128_lshift
> qemu/int128: Add int128_shr
> qemu/int128: Add int128_geu
> softfloat: Use mulu64 for mul64To128
> softfloat: Use int128.h for some operations
> softfloat: Tidy a * b + inf return
> softfloat: Add float_cmask and constants
> softfloat: Inline float_raise
> Test split to softfloat-parts.c.inc
> softfloat: Streamline FloatFmt
> Test float128_addsub
> softfloat: Use float_cmask for addsub_floats
> softfloat: Improve subtraction of equal exponent
>
> include/fpu/softfloat-macros.h | 89 ++--
> include/fpu/softfloat.h | 5 +-
> include/qemu/int128.h | 61 ++-
> fpu/softfloat.c | 802 ++++++++++-----------------------
> softmmu/physmem.c | 4 +-
> target/ppc/int_helper.c | 4 +-
> tests/test-int128.c | 44 +-
> fpu/softfloat-parts.c.inc | 339 ++++++++++++++
> fpu/softfloat-specialize.c.inc | 45 +-
> 9 files changed, 716 insertions(+), 677 deletions(-)
> create mode 100644 fpu/softfloat-parts.c.inc
--
Alex Bennée
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH 00/15] softfloat: alternate conversion of float128_addsub
2020-10-21 17:46 ` Alex Bennée
@ 2020-10-21 17:53 ` Richard Henderson
0 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2020-10-21 17:53 UTC (permalink / raw)
To: Alex Bennée; +Cc: qemu-devel
On 10/21/20 10:46 AM, Alex Bennée wrote:
>> This layout, with the big-endian ordering, means that storage
>> can be shared between them, just by ignoring the least significant
>> words of the fraction as needed. Which may make muladd more
>> understandable.
>
> Would the big-endian formatting hamper the compiler on x86 where it can
> do extra wide operations?
Well, you couldn't just use Int128 in the structure. But you could write the
helpers via int128_make128/getlo/gethi, which would still get the compiler
expansion.
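(A minimal sketch of such a wrapper, assuming the big-endian frac[] layout
proposed above and the existing qemu/int128.h helpers; the function name is
made up for illustration:)

#include "qemu/int128.h"

/* Add two 128-bit fractions stored big-endian as frac[0] (high word) and
 * frac[1] (low word), going through Int128 so the compiler can still use
 * its native 128-bit expansion where available. */
static inline void frac128_add(uint64_t r[2],
                               const uint64_t a[2], const uint64_t b[2])
{
    Int128 t = int128_add(int128_make128(a[1], a[0]),   /* (lo, hi) */
                          int128_make128(b[1], b[0]));
    r[0] = int128_gethi(t);
    r[1] = int128_getlo(t);
}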
>> However, the big-endian word ordering means that Int128
>> cannot be used directly; so a set of wrappers are needed.
>> If added the Int128 routine just for use here, then it's
>> probably easier to bypass Int128 and just code it here.
>
> Are you talking about all our operations? Will we still need to#ifdef
> CONFIG_INT128 in the softfloat code?
If we decline to put the operations into qemu/int128.h, because they're not
generally useful, then yes, we may put those ifdefs into our softfloat code.
r~
^ permalink raw reply [flat|nested] 30+ messages in thread