qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: "Alex Bennée" <alex.bennee@linaro.org>
To: richard.henderson@linaro.org
Cc: peter.maydell@linaro.org, qemu-devel@nongnu.org,
	qemu-arm@nongnu.org, "Alex Bennée" <alex.bennee@linaro.org>,
	"Aurelien Jarno" <aurelien@aurel32.net>
Subject: [Qemu-devel] [RFC PATCH 28/30] softfloat: float16_to_int16 conversion
Date: Fri, 13 Oct 2017 17:24:36 +0100	[thread overview]
Message-ID: <20171013162438.32458-29-alex.bennee@linaro.org> (raw)
In-Reply-To: <20171013162438.32458-1-alex.bennee@linaro.org>

I didn't have another reference for this so I wrote it from first
principles. The roundAndPackInt16 works with the same shifted input as
roundAndPacknt32 but with different constants for invalid testing for
overflow.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 fpu/softfloat.c         | 98 +++++++++++++++++++++++++++++++++++++++++++++++++
 include/fpu/softfloat.h |  1 +
 2 files changed, 99 insertions(+)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index dc7f5f6d88..63f7cd1226 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -132,6 +132,62 @@ static inline flag extractFloat16Sign(float16 a)
     return float16_val(a)>>15;
 }
 
+/*----------------------------------------------------------------------------
+| Takes a 32-bit fixed-point value `absZ' with binary point between bits 6
+| and 7, and returns the properly rounded 16-bit integer corresponding to the
+| input.  If `zSign' is 1, the input is negated before being converted to an
+| integer.  Bit 31 of `absZ' must be zero.  Ordinarily, the fixed-point input
+| is simply rounded to an integer, with the inexact exception raised if the
+| input cannot be represented exactly as an integer.  However, if the fixed-
+| point input is too large, the invalid exception is raised and the largest
+| positive or negative integer is returned.
+*----------------------------------------------------------------------------*/
+
+static int16_t roundAndPackInt16(flag zSign, uint32_t absZ, float_status *status)
+{
+    int8_t roundingMode;
+    flag roundNearestEven;
+    int8_t roundIncrement, roundBits;
+    int16_t z;
+
+    roundingMode = status->float_rounding_mode;
+    roundNearestEven = ( roundingMode == float_round_nearest_even );
+
+    switch (roundingMode) {
+    case float_round_nearest_even:
+    case float_round_ties_away:
+        roundIncrement = 0x40;
+        break;
+    case float_round_to_zero:
+        roundIncrement = 0;
+        break;
+    case float_round_up:
+        roundIncrement = zSign ? 0 : 0x7f;
+        break;
+    case float_round_down:
+        roundIncrement = zSign ? 0x7f : 0;
+        break;
+    default:
+        abort();
+    }
+    roundBits = absZ & 0x7F;
+
+    absZ = ( absZ + roundIncrement )>>7;
+    absZ &= ~ ( ( ( roundBits ^ 0x40 ) == 0 ) & roundNearestEven );
+    z = absZ;
+    if ( zSign ) z = - z;
+
+    if ( ( absZ>>16 ) || ( z && ( ( z < 0 ) ^ zSign ) ) ) {
+        float_raise(float_flag_invalid, status);
+        return zSign ? (int16_t) 0x8000 : 0x7FFF;
+    }
+    if (roundBits) {
+        status->float_exception_flags |= float_flag_inexact;
+    }
+    return z;
+
+}
+
 /*----------------------------------------------------------------------------
 | Takes a 64-bit fixed-point value `absZ' with binary point between bits 6
 | and 7, and returns the properly rounded 32-bit integer corresponding to the
@@ -4509,6 +4565,48 @@ int float16_unordered_quiet(float16 a, float16 b, float_status *status)
     return 0;
 }
 
+/*----------------------------------------------------------------------------
+| Returns the result of converting the half-precision floating-point value
+| `a' to the 16-bit two's complement integer format.  The conversion is
+| performed according to the IEC/IEEE Standard for Binary Floating-Point
+| Arithmetic---which means in particular that the conversion is rounded
+| according to the current rounding mode.  If `a' is a NaN, the largest
+| positive integer is returned.  Otherwise, if the conversion overflows, the
+| largest integer with the same sign as `a' is returned.
+*----------------------------------------------------------------------------*/
+
+int16_t float16_to_int16(float32 a, float_status *status)
+{
+    flag aSign;
+    int aExp;
+    uint32_t aSig;
+
+    a = float16_squash_input_denormal(a, status);
+    aSig = extractFloat16Frac( a );
+    aExp = extractFloat16Exp( a );
+    aSign = extractFloat16Sign( a );
+    if ( ( aExp == 0x1F ) && aSig ) aSign = 0;
+    if ( aExp ) aSig |= 0x0400; /* implicit bit */
+
+    /* At this point the binary point is between 10:9, we need to
+     * shift the significand it up by the +ve exponent to get the
+     * integer and then move the binary point down to the  7:6 for
+     * the final roundAnPackInt16.
+     *
+     * Even with the maximum +ve shift everything happily fits in the
+     * 32 bit aSig.
+     */
+    aExp -= 15; /* exp bias */
+    if (aExp >= 3) {
+        aSig <<= aExp - 3;
+    } else {
+        /* ensure small numbers still get rounded */
+        shift32RightJamming( aSig, 3 - aExp, &aSig );
+    }
+
+    return roundAndPackInt16(aSign, aSig, status);
+}
+
 /* Half precision floats come in two formats: standard IEEE and "ARM" format.
    The latter gains extra exponent range by omitting the NaN/Inf encodings.  */
 
diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
index 856f67cf12..49517b19ea 100644
--- a/include/fpu/softfloat.h
+++ b/include/fpu/softfloat.h
@@ -338,6 +338,7 @@ static inline float64 uint16_to_float64(uint16_t v, float_status *status)
 | Software half-precision conversion routines.
 *----------------------------------------------------------------------------*/
 float16 float32_to_float16(float32, flag, float_status *status);
+int16_t float16_to_int16(float32 a, float_status *status);
 float32 float16_to_float32(float16, flag, float_status *status);
 float16 float64_to_float16(float64 a, flag ieee, float_status *status);
 float64 float16_to_float64(float16 a, flag ieee, float_status *status);
-- 
2.14.1

  parent reply	other threads:[~2017-10-13 16:32 UTC|newest]

Thread overview: 59+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-10-13 16:24 [Qemu-devel] [RFC PATCH 00/30] v8.2 half-precision support (work-in-progress) Alex Bennée
2017-10-13 16:24 ` [Qemu-devel] [RFC PATCH 01/30] linux-user/main: support dfilter Alex Bennée
2017-10-13 20:36   ` Richard Henderson
2017-10-14  9:58   ` Laurent Vivier
2017-10-13 16:24 ` [Qemu-devel] [RFC PATCH 02/30] arm: introduce ARM_V8_FP16 feature bit Alex Bennée
2017-10-13 20:44   ` Richard Henderson
2017-10-13 16:24 ` [Qemu-devel] [RFC PATCH 03/30] include/exec/helper-head.h: support f16 in helper calls Alex Bennée
2017-10-13 20:44   ` Richard Henderson
2017-10-13 16:24 ` [Qemu-devel] [RFC PATCH 04/30] target/arm/cpu.h: update comment for half-precision values Alex Bennée
2017-10-13 20:44   ` Richard Henderson
2017-10-13 16:24 ` [Qemu-devel] [RFC PATCH 05/30] softfloat: implement propagateFloat16NaN Alex Bennée
2017-10-13 20:49   ` Richard Henderson
2017-10-13 16:24 ` [Qemu-devel] [RFC PATCH 06/30] fpu/softfloat: implement float16_squash_input_denormal Alex Bennée
2017-10-13 20:51   ` Richard Henderson
2017-10-13 16:24 ` [Qemu-devel] [RFC PATCH 07/30] fpu/softfloat: implement float16_abs helper Alex Bennée
2017-10-13 20:51   ` Richard Henderson
2017-10-13 16:24 ` [Qemu-devel] [RFC PATCH 08/30] softfloat: add half-precision expansions for MINMAX fns Alex Bennée
2017-10-13 20:52   ` Richard Henderson
2017-10-13 16:24 ` [Qemu-devel] [RFC PATCH 09/30] softfloat: propagate signalling NaNs in MINMAX Alex Bennée
2017-10-15 16:13   ` Richard Henderson
2017-10-13 16:24 ` [Qemu-devel] [RFC PATCH 10/30] softfloat: improve comments on ARM NaN propagation Alex Bennée
2017-10-15 16:14   ` Richard Henderson
2017-10-15 16:54   ` Peter Maydell
2017-10-13 16:24 ` [Qemu-devel] [RFC PATCH 11/30] target/arm: implement half-precision F(MIN|MAX)(V|NMV) Alex Bennée
2017-10-16 20:10   ` Richard Henderson
2017-10-13 16:24 ` [Qemu-devel] [RFC PATCH 12/30] target/arm/translate-a64.c: handle_3same_64 comment fix Alex Bennée
2017-10-15 16:28   ` Richard Henderson
2017-10-13 16:24 ` [Qemu-devel] [RFC PATCH 13/30] target/arm/translate-a64.c: AdvSIMD scalar 3 Same FP16 initial decode Alex Bennée
2017-10-16 20:16   ` Richard Henderson
2017-10-13 16:24 ` [Qemu-devel] [RFC PATCH 14/30] softfloat: 16 bit helpers for shr, clz and rounding and packing Alex Bennée
2017-10-15 18:02   ` Richard Henderson
2017-10-16  8:20     ` Alex Bennée
2017-10-13 16:24 ` [Qemu-devel] [RFC PATCH 15/30] softfloat: half-precision add/sub/mul/div support Alex Bennée
2017-10-16 22:01   ` Richard Henderson
2017-10-13 16:24 ` [Qemu-devel] [RFC PATCH 16/30] target/arm/translate-a64.c: add FP16 FADD/FMUL/FDIV to AdvSIMD 3 Same (!sub) Alex Bennée
2017-10-16 22:08   ` Richard Henderson
2017-10-13 16:24 ` [Qemu-devel] [RFC PATCH 17/30] target/arm/translate-a64.c: add FP16 FMULX Alex Bennée
2017-10-16 22:24   ` Richard Henderson
2017-10-13 16:24 ` [Qemu-devel] [RFC PATCH 18/30] target/arm/translate-a64.c: add AdvSIMD scalar two-reg misc skeleton Alex Bennée
2017-10-13 16:24 ` [Qemu-devel] [RFC PATCH 19/30] Fix mask for AdvancedSIMD 2 reg misc Alex Bennée
2017-10-16 23:47   ` Richard Henderson
2017-10-13 16:24 ` [Qemu-devel] [RFC PATCH 20/30] softfloat: half-precision compare functions Alex Bennée
2017-10-17  0:06   ` Richard Henderson
2017-10-13 16:24 ` [Qemu-devel] [RFC PATCH 21/30] target/arm/translate-a64: add FP16 2-reg misc compare (zero) Alex Bennée
2017-10-17  0:36   ` Richard Henderson
2017-10-13 16:24 ` [Qemu-devel] [RFC PATCH 22/30] target/arm/translate-a64.c: add FP16 FAGCT to AdvSIMD 3 Same Alex Bennée
2017-10-17  0:39   ` Richard Henderson
2017-10-13 16:24 ` [Qemu-devel] [RFC PATCH 23/30] softfloat: add float16_rem and float16_muladd (!CHECK) Alex Bennée
2017-10-17  2:17   ` Richard Henderson
2017-10-13 16:24 ` [Qemu-devel] [RFC PATCH 24/30] disas_simd_indexed: support half-precision operations Alex Bennée
2017-10-13 16:24 ` [Qemu-devel] [RFC PATCH 25/30] softfloat: float16_round_to_int Alex Bennée
2017-10-13 16:24 ` [Qemu-devel] [RFC PATCH 26/30] tests/test-softfloat: add a simple test framework Alex Bennée
2017-10-13 16:24 ` [Qemu-devel] [RFC PATCH 27/30] target/arm/translate-a64.c: add FP16 FRINTP to 2 reg misc Alex Bennée
2017-10-13 16:24 ` Alex Bennée [this message]
2017-10-13 16:24 ` [Qemu-devel] [RFC PATCH 29/30] tests/test-softfloat: add f16_to_int16 conversion test Alex Bennée
2017-10-13 16:24 ` [Qemu-devel] [RFC PATCH 30/30] target/arm/translate-a64.c: add FP16 FCVTPS to 2 reg misc Alex Bennée
2017-10-13 16:58 ` [Qemu-devel] [RFC PATCH 00/30] v8.2 half-precision support (work-in-progress) no-reply
2017-10-13 16:59 ` no-reply
2017-10-17  2:34 ` Richard Henderson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171013162438.32458-29-alex.bennee@linaro.org \
    --to=alex.bennee@linaro.org \
    --cc=aurelien@aurel32.net \
    --cc=peter.maydell@linaro.org \
    --cc=qemu-arm@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=richard.henderson@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).