From: "Alex Bennée" <alex.bennee@linaro.org>
To: richard.henderson@linaro.org
Cc: peter.maydell@linaro.org, qemu-devel@nongnu.org,
qemu-arm@nongnu.org, "Alex Bennée" <alex.bennee@linaro.org>,
"Aurelien Jarno" <aurelien@aurel32.net>
Subject: [Qemu-devel] [RFC PATCH 15/30] softfloat: half-precision add/sub/mul/div support
Date: Fri, 13 Oct 2017 17:24:23 +0100
Message-ID: <20171013162438.32458-16-alex.bennee@linaro.org>
In-Reply-To: <20171013162438.32458-1-alex.bennee@linaro.org>
Rather than following the SoftFloat3 implementation, I've used the same
basic template as the rest of our softfloat code. One minor difference
is that the 32-bit intermediates end up with the binary point in the
same place as in the 32-bit (float32) version, so the change isn't
totally mechanical.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
fpu/softfloat.c | 352 ++++++++++++++++++++++++++++++++++++++++++++++++
include/fpu/softfloat.h | 6 +
2 files changed, 358 insertions(+)
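As a rough illustration of the binary-point remark in the commit message, here is a minimal standalone sketch of the significand scaling that float16_mul performs on two normal inputs. The helper names (f16_frac, f16_exp, shr32_jam, f16_mul_sig) are hypothetical and are not part of the patch; only the shift amounts (4, 5 and 7) and the implicit bit 0x0400 are taken from the diff below. It ignores signs, exponents, subnormals and rounding.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; not QEMU functions. */

/* Fraction field: the low 10 bits of an IEEE half-precision pattern. */
static uint32_t f16_frac(uint16_t a)
{
    return a & 0x3FF;
}

/* Biased exponent field: 5 bits above the fraction. */
static int f16_exp(uint16_t a)
{
    return (a >> 10) & 0x1F;
}

/* Right shift that ORs any shifted-out bits into the LSB ("jamming"). */
static uint32_t shr32_jam(uint32_t v, int count)
{
    assert(count > 0 && count < 32);
    return (v >> count) | ((v & ((1u << count) - 1)) != 0);
}

/*
 * Scaled significand product of two normal half-precision values.
 * With the operands shifted by 4 and 5 the leading bit of the product
 * is bit 29 or 30; shifting right by 7 moves it to bit 22 or 23.
 */
static uint32_t f16_mul_sig(uint16_t a, uint16_t b)
{
    uint32_t aSig, bSig;

    /* Assumption of this sketch: both inputs are normal numbers. */
    assert(f16_exp(a) != 0 && f16_exp(a) != 0x1F);
    assert(f16_exp(b) != 0 && f16_exp(b) != 0x1F);

    aSig = (f16_frac(a) | 0x0400) << 4;   /* add implicit bit, scale */
    bSig = (f16_frac(b) | 0x0400) << 5;
    return shr32_jam(aSig * bSig, 7);
}
```

For example, f16_mul_sig(0x3C00, 0x3C00), i.e. 1.0 * 1.0, gives 0x00400000 (leading bit 22), while 1.5 * 1.5 (0x3E00) gives 0x00900000 (leading bit 23).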
diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index cf7bf6d4f4..ff967f5525 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -3532,6 +3532,358 @@ static void normalizeFloat16Subnormal(uint32_t aSig, int *zExpPtr,
*zExpPtr = 1 - shiftCount;
}
+/*----------------------------------------------------------------------------
+| Returns the result of adding the absolute values of the half-precision
+| floating-point values `a' and `b'. If `zSign' is 1, the sum is negated
+| before being returned. `zSign' is ignored if the result is a NaN.
+| The addition is performed according to the IEC/IEEE Standard for Binary
+| Floating-Point Arithmetic.
+*----------------------------------------------------------------------------*/
+
+static float16 addFloat16Sigs(float16 a, float16 b, flag zSign,
+ float_status *status)
+{
+ int aExp, bExp, zExp;
+ uint16_t aSig, bSig, zSig;
+ int expDiff;
+
+ aSig = extractFloat16Frac( a );
+ aExp = extractFloat16Exp( a );
+ bSig = extractFloat16Frac( b );
+ bExp = extractFloat16Exp( b );
+ expDiff = aExp - bExp;
+ aSig <<= 3;
+ bSig <<= 3;
+ if ( 0 < expDiff ) {
+ if ( aExp == 0x1F ) {
+ if (aSig) {
+ return propagateFloat16NaN(a, b, status);
+ }
+ return a;
+ }
+ if ( bExp == 0 ) {
+ --expDiff;
+ }
+ else {
+ bSig |= 0x2000;
+ }
+ shift16RightJamming( bSig, expDiff, &bSig );
+ zExp = aExp;
+ }
+ else if ( expDiff < 0 ) {
+ if ( bExp == 0x1F ) {
+ if (bSig) {
+ return propagateFloat16NaN(a, b, status);
+ }
+ return packFloat16( zSign, 0x1F, 0 );
+ }
+ if ( aExp == 0 ) {
+ ++expDiff;
+ }
+ else {
+ aSig |= 0x2000;
+ }
+ shift16RightJamming( aSig, - expDiff, &aSig );
+ zExp = bExp;
+ }
+ else {
+ if ( aExp == 0x1F ) {
+ if (aSig | bSig) {
+ return propagateFloat16NaN(a, b, status);
+ }
+ return a;
+ }
+ if ( aExp == 0 ) {
+ if (status->flush_to_zero) {
+ if (aSig | bSig) {
+ float_raise(float_flag_output_denormal, status);
+ }
+ return packFloat16(zSign, 0, 0);
+ }
+ return packFloat16( zSign, 0, ( aSig + bSig )>>3 );
+ }
+ zSig = 0x4000 + aSig + bSig;
+ zExp = aExp;
+ goto roundAndPack;
+ }
+ aSig |= 0x2000;
+ zSig = ( aSig + bSig )<<1;
+ --zExp;
+ if ( (int16_t) zSig < 0 ) {
+ zSig = aSig + bSig;
+ ++zExp;
+ }
+ roundAndPack:
+ return roundAndPackFloat16(zSign, zExp, zSig, true, status);
+
+}
+
+/*----------------------------------------------------------------------------
+| Returns the result of subtracting the absolute values of the half-
+| precision floating-point values `a' and `b'. If `zSign' is 1, the
+| difference is negated before being returned. `zSign' is ignored if the
+| result is a NaN. The subtraction is performed according to the IEC/IEEE
+| Standard for Binary Floating-Point Arithmetic.
+*----------------------------------------------------------------------------*/
+
+static float16 subFloat16Sigs(float16 a, float16 b, flag zSign,
+ float_status *status)
+{
+ int aExp, bExp, zExp;
+ uint16_t aSig, bSig, zSig;
+ int expDiff;
+
+ aSig = extractFloat16Frac( a );
+ aExp = extractFloat16Exp( a );
+ bSig = extractFloat16Frac( b );
+ bExp = extractFloat16Exp( b );
+ expDiff = aExp - bExp;
+ aSig <<= 4;
+ bSig <<= 4;
+ if ( 0 < expDiff ) goto aExpBigger;
+ if ( expDiff < 0 ) goto bExpBigger;
+ if ( aExp == 0x1F ) {
+ if (aSig | bSig) {
+ return propagateFloat16NaN(a, b, status);
+ }
+ float_raise(float_flag_invalid, status);
+ return float16_default_nan(status);
+ }
+ if ( aExp == 0 ) {
+ aExp = 1;
+ bExp = 1;
+ }
+ if ( bSig < aSig ) goto aBigger;
+ if ( aSig < bSig ) goto bBigger;
+ return packFloat16(status->float_rounding_mode == float_round_down, 0, 0);
+ bExpBigger:
+ if ( bExp == 0x1F ) {
+ if (bSig) {
+ return propagateFloat16NaN(a, b, status);
+ }
+ return packFloat16( zSign ^ 1, 0x1F, 0 );
+ }
+ if ( aExp == 0 ) {
+ ++expDiff;
+ }
+ else {
+ aSig |= 0x4000;
+ }
+ shift16RightJamming( aSig, - expDiff, &aSig );
+ bSig |= 0x4000;
+ bBigger:
+ zSig = bSig - aSig;
+ zExp = bExp;
+ zSign ^= 1;
+ goto normalizeRoundAndPack;
+ aExpBigger:
+ if ( aExp == 0x1F ) {
+ if (aSig) {
+ return propagateFloat16NaN(a, b, status);
+ }
+ return a;
+ }
+ if ( bExp == 0 ) {
+ --expDiff;
+ }
+ else {
+ bSig |= 0x4000;
+ }
+ shift16RightJamming( bSig, expDiff, &bSig );
+ aSig |= 0x4000;
+ aBigger:
+ zSig = aSig - bSig;
+ zExp = aExp;
+ normalizeRoundAndPack:
+ --zExp;
+ return normalizeRoundAndPackFloat16(zSign, zExp, zSig, status);
+
+}
+
+/*----------------------------------------------------------------------------
+| Returns the result of adding the half-precision floating-point values `a'
+| and `b'. The operation is performed according to the IEC/IEEE Standard for
+| Binary Floating-Point Arithmetic.
+*----------------------------------------------------------------------------*/
+
+float16 float16_add(float16 a, float16 b, float_status *status)
+{
+ flag aSign, bSign;
+ a = float16_squash_input_denormal(a, status);
+ b = float16_squash_input_denormal(b, status);
+
+ aSign = extractFloat16Sign( a );
+ bSign = extractFloat16Sign( b );
+ if ( aSign == bSign ) {
+ return addFloat16Sigs(a, b, aSign, status);
+ }
+ else {
+ return subFloat16Sigs(a, b, aSign, status);
+ }
+
+}
+
+/*----------------------------------------------------------------------------
+| Returns the result of subtracting the half-precision floating-point values
+| `a' and `b'. The operation is performed according to the IEC/IEEE Standard
+| for Binary Floating-Point Arithmetic.
+*----------------------------------------------------------------------------*/
+
+float16 float16_sub(float16 a, float16 b, float_status *status)
+{
+ flag aSign, bSign;
+ a = float16_squash_input_denormal(a, status);
+ b = float16_squash_input_denormal(b, status);
+
+ aSign = extractFloat16Sign( a );
+ bSign = extractFloat16Sign( b );
+ if ( aSign == bSign ) {
+ return subFloat16Sigs(a, b, aSign, status);
+ }
+ else {
+ return addFloat16Sigs(a, b, aSign, status);
+ }
+
+}
+
+/*----------------------------------------------------------------------------
+| Returns the result of multiplying the half-precision floating-point values
+| `a' and `b'. The operation is performed according to the IEC/IEEE Standard
+| for Binary Floating-Point Arithmetic.
+*----------------------------------------------------------------------------*/
+
+float16 float16_mul(float16 a, float16 b, float_status *status)
+{
+ flag aSign, bSign, zSign;
+ int aExp, bExp, zExp;
+ uint32_t aSig, bSig;
+ uint32_t zSig32; /* no zSig: zSig32 is passed to roundAndPackFloat16 */
+
+ a = float16_squash_input_denormal(a, status);
+ b = float16_squash_input_denormal(b, status);
+
+ aSig = extractFloat16Frac( a );
+ aExp = extractFloat16Exp( a );
+ aSign = extractFloat16Sign( a );
+ bSig = extractFloat16Frac( b );
+ bExp = extractFloat16Exp( b );
+ bSign = extractFloat16Sign( b );
+ zSign = aSign ^ bSign;
+ if ( aExp == 0x1F ) {
+ if ( aSig || ( ( bExp == 0x1F ) && bSig ) ) {
+ return propagateFloat16NaN(a, b, status);
+ }
+ if ( ( bExp | bSig ) == 0 ) {
+ float_raise(float_flag_invalid, status);
+ return float16_default_nan(status);
+ }
+ return packFloat16( zSign, 0x1F, 0 );
+ }
+ if ( bExp == 0x1F ) {
+ if (bSig) {
+ return propagateFloat16NaN(a, b, status);
+ }
+ if ( ( aExp | aSig ) == 0 ) {
+ float_raise(float_flag_invalid, status);
+ return float16_default_nan(status);
+ }
+ return packFloat16( zSign, 0x1F, 0 );
+ }
+ if ( aExp == 0 ) {
+ if ( aSig == 0 ) return packFloat16( zSign, 0, 0 );
+ normalizeFloat16Subnormal( aSig, &aExp, &aSig );
+ }
+ if ( bExp == 0 ) {
+ if ( bSig == 0 ) return packFloat16( zSign, 0, 0 );
+ normalizeFloat16Subnormal( bSig, &bExp, &bSig );
+ }
+ zExp = aExp + bExp - 0xF;
+ /* Add implicit bit */
+ aSig = ( aSig | 0x0400 )<<4;
+ bSig = ( bSig | 0x0400 )<<5;
+ /* The implicit bits alone give (0x400 << 4) * (0x400 << 5) = 0x20000000,
+ * so the product's leading bit is bit 30 or 29; shift it to 23 or 22.
+ */
+ shift32RightJamming( ( (uint32_t) aSig ) * bSig, 7, &zSig32 );
+ /* At this point the significand is at the same point as
+ * float32_mul, so we can do the same test */
+ if ( 0 <= (int32_t) ( zSig32<<1 ) ) {
+ zSig32 <<= 1;
+ --zExp;
+ }
+ return roundAndPackFloat16(zSign, zExp, zSig32, true, status);
+}
+
+/*----------------------------------------------------------------------------
+| Returns the result of dividing the half-precision floating-point value `a'
+| by the corresponding value `b'. The operation is performed according to the
+| IEC/IEEE Standard for Binary Floating-Point Arithmetic.
+*----------------------------------------------------------------------------*/
+
+float16 float16_div(float16 a, float16 b, float_status *status)
+{
+ flag aSign, bSign, zSign;
+ int aExp, bExp, zExp;
+ uint32_t aSig, bSig, zSig;
+ a = float16_squash_input_denormal(a, status);
+ b = float16_squash_input_denormal(b, status);
+
+ aSig = extractFloat16Frac( a );
+ aExp = extractFloat16Exp( a );
+ aSign = extractFloat16Sign( a );
+ bSig = extractFloat16Frac( b );
+ bExp = extractFloat16Exp( b );
+ bSign = extractFloat16Sign( b );
+ zSign = aSign ^ bSign;
+ if ( aExp == 0x1F ) {
+ if (aSig) {
+ return propagateFloat16NaN(a, b, status);
+ }
+ if ( bExp == 0x1F ) {
+ if (bSig) {
+ return propagateFloat16NaN(a, b, status);
+ }
+ float_raise(float_flag_invalid, status);
+ return float16_default_nan(status);
+ }
+ return packFloat16( zSign, 0x1F, 0 );
+ }
+ if ( bExp == 0x1F ) {
+ if (bSig) {
+ return propagateFloat16NaN(a, b, status);
+ }
+ return packFloat16( zSign, 0, 0 );
+ }
+ if ( bExp == 0 ) {
+ if ( bSig == 0 ) {
+ if ( ( aExp | aSig ) == 0 ) {
+ float_raise(float_flag_invalid, status);
+ return float16_default_nan(status);
+ }
+ float_raise(float_flag_divbyzero, status);
+ return packFloat16( zSign, 0x1F, 0 );
+ }
+ normalizeFloat16Subnormal( bSig, &bExp, &bSig );
+ }
+ if ( aExp == 0 ) {
+ if ( aSig == 0 ) return packFloat16( zSign, 0, 0 );
+ normalizeFloat16Subnormal( aSig, &aExp, &aSig );
+ }
+ zExp = aExp - bExp + 0xD;
+ aSig = ( aSig | 0x0400 )<<7;
+ bSig = ( bSig | 0x0400 )<<8;
+ if ( bSig <= ( aSig + aSig ) ) {
+ aSig >>= 1;
+ ++zExp;
+ }
+ zSig = ( ( (uint64_t) aSig )<<16 ) / bSig;
+ if ( ( zSig & 0x3F ) == 0 ) {
+ zSig |= ( (uint64_t) bSig * zSig != ( (uint64_t) aSig )<<16 );
+ }
+ return roundAndPackFloat16(zSign, zExp, zSig, true, status);
+
+}
+
/* Half precision floats come in two formats: standard IEEE and "ARM" format.
The latter gains extra exponent range by omitting the NaN/Inf encodings. */
diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
index d89fdf7675..f1d79b6d03 100644
--- a/include/fpu/softfloat.h
+++ b/include/fpu/softfloat.h
@@ -345,6 +345,12 @@ float64 float16_to_float64(float16 a, flag ieee, float_status *status);
/*----------------------------------------------------------------------------
| Software half-precision operations.
*----------------------------------------------------------------------------*/
+
+float16 float16_add(float16, float16, float_status *status);
+float16 float16_sub(float16, float16, float_status *status);
+float16 float16_mul(float16, float16, float_status *status);
+float16 float16_div(float16, float16, float_status *status);
+
int float16_is_quiet_nan(float16, float_status *status);
int float16_is_signaling_nan(float16, float_status *status);
float16 float16_maybe_silence_nan(float16, float_status *status);
--
2.14.1