From: Peter Maydell <peter.maydell@linaro.org>
To: Anthony Liguori <aliguori@amazon.com>
Cc: Blue Swirl <blauwirbel@gmail.com>,
qemu-devel@nongnu.org, Aurelien Jarno <aurelien@aurel32.net>
Subject: [Qemu-devel] [PULL 65/76] softfloat: Factor out RoundAndPackFloat16 and NormalizeFloat16Subnormal
Date: Tue, 7 Jan 2014 20:04:01 +0000 [thread overview]
Message-ID: <1389125052-22931-66-git-send-email-peter.maydell@linaro.org> (raw)
In-Reply-To: <1389125052-22931-1-git-send-email-peter.maydell@linaro.org>
In preparation for adding conversions between float16 and float64,
factor out code currently done inline in the float16<=>float32
conversion functions into functions RoundAndPackFloat16 and
NormalizeFloat16Subnormal along the lines of the existing versions
for the other float types.
Note that we change the handling of zExp from the inline code
to match the API of the other RoundAndPackFloat functions; however
we leave the positioning of the binary point between bits 22 and 23
rather than shifting it up to the high end of the word.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
---
fpu/softfloat.c | 209 +++++++++++++++++++++++++++++++++-----------------------
1 file changed, 125 insertions(+), 84 deletions(-)
diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 4abcd36..c63e011 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -3086,6 +3086,127 @@ static float16 packFloat16(flag zSign, int_fast16_t zExp, uint16_t zSig)
(((uint32_t)zSign) << 15) + (((uint32_t)zExp) << 10) + zSig);
}
+/*----------------------------------------------------------------------------
+| Takes an abstract floating-point value having sign `zSign', exponent `zExp',
+| and significand `zSig', and returns the proper half-precision floating-
+| point value corresponding to the abstract input. Ordinarily, the abstract
+| value is simply rounded and packed into the half-precision format, with
+| the inexact exception raised if the abstract input cannot be represented
+| exactly. However, if the abstract value is too large, the overflow and
+| inexact exceptions are raised and an infinity or maximal finite value is
+| returned. If the abstract value is too small, the input value is rounded to
+| a subnormal number, and the underflow and inexact exceptions are raised if
+| the abstract input cannot be represented exactly as a subnormal half-
+| precision floating-point number.
+| The `ieee' flag indicates whether to use IEEE standard half precision, or
+| ARM-style "alternative representation", which omits the NaN and Inf
+| encodings in order to raise the maximum representable exponent by one.
+| The input significand `zSig' has its binary point between bits 22
+| and 23, which is 13 bits to the left of the usual location. This shifted
+| significand must be normalized or smaller. If `zSig' is not normalized,
+| `zExp' must be 0; in that case, the result returned is a subnormal number,
+| and it must not require rounding. In the usual case that `zSig' is
+| normalized, `zExp' must be 1 less than the ``true'' floating-point exponent.
+| Note the slightly odd position of the binary point in zSig compared with the
+| other roundAndPackFloat functions. This should probably be fixed if we
+| need to implement more float16 routines than just conversion.
+| The handling of underflow and overflow follows the IEC/IEEE Standard for
+| Binary Floating-Point Arithmetic.
+*----------------------------------------------------------------------------*/
+
+static float32 roundAndPackFloat16(flag zSign, int_fast16_t zExp,
+ uint32_t zSig, flag ieee STATUS_PARAM)
+{
+ int maxexp = ieee ? 29 : 30;
+ uint32_t mask;
+ uint32_t increment;
+ int8 roundingMode;
+ bool rounding_bumps_exp;
+ bool is_tiny = false;
+
+ /* Calculate the mask of bits of the mantissa which are not
+ * representable in half-precision and will be lost.
+ */
+ if (zExp < 1) {
+ /* Will be denormal in halfprec */
+ mask = 0x00ffffff;
+ if (zExp >= -11) {
+ mask >>= 11 + zExp;
+ }
+ } else {
+ /* Normal number in halfprec */
+ mask = 0x00001fff;
+ }
+
+ roundingMode = STATUS(float_rounding_mode);
+ switch (roundingMode) {
+ case float_round_nearest_even:
+ increment = (mask + 1) >> 1;
+ if ((zSig & mask) == increment) {
+ increment = zSig & (increment << 1);
+ }
+ break;
+ case float_round_up:
+ increment = zSign ? 0 : mask;
+ break;
+ case float_round_down:
+ increment = zSign ? mask : 0;
+ break;
+ default: /* round_to_zero */
+ increment = 0;
+ break;
+ }
+
+ rounding_bumps_exp = (zSig + increment >= 0x01000000);
+
+ if (zExp > maxexp || (zExp == maxexp && rounding_bumps_exp)) {
+ if (ieee) {
+ float_raise(float_flag_overflow | float_flag_inexact STATUS_VAR);
+ return packFloat16(zSign, 0x1f, 0);
+ } else {
+ float_raise(float_flag_invalid STATUS_VAR);
+ return packFloat16(zSign, 0x1f, 0x3ff);
+ }
+ }
+
+ if (zExp < 0) {
+ /* Note that flush-to-zero does not affect half-precision results */
+ is_tiny =
+ (STATUS(float_detect_tininess) == float_tininess_before_rounding)
+ || (zExp < -1)
+ || (!rounding_bumps_exp);
+ }
+ if (zSig & mask) {
+ float_raise(float_flag_inexact STATUS_VAR);
+ if (is_tiny) {
+ float_raise(float_flag_underflow STATUS_VAR);
+ }
+ }
+
+ zSig += increment;
+ if (rounding_bumps_exp) {
+ zSig >>= 1;
+ zExp++;
+ }
+
+ if (zExp < -10) {
+ return packFloat16(zSign, 0, 0);
+ }
+ if (zExp < 0) {
+ zSig >>= -zExp;
+ zExp = 0;
+ }
+ return packFloat16(zSign, zExp, zSig >> 13);
+}
+
+static void normalizeFloat16Subnormal(uint32_t aSig, int_fast16_t *zExpPtr,
+ uint32_t *zSigPtr)
+{
+ int8_t shiftCount = countLeadingZeros32(aSig) - 21;
+ *zSigPtr = aSig << shiftCount;
+ *zExpPtr = 1 - shiftCount;
+}
+
/* Half precision floats come in two formats: standard IEEE and "ARM" format.
The latter gains extra exponent range by omitting the NaN/Inf encodings. */
@@ -3106,15 +3227,12 @@ float32 float16_to_float32(float16 a, flag ieee STATUS_PARAM)
return packFloat32(aSign, 0xff, 0);
}
if (aExp == 0) {
- int8 shiftCount;
-
if (aSig == 0) {
return packFloat32(aSign, 0, 0);
}
- shiftCount = countLeadingZeros32( aSig ) - 21;
- aSig = aSig << shiftCount;
- aExp = -shiftCount;
+ normalizeFloat16Subnormal(aSig, &aExp, &aSig);
+ aExp--;
}
return packFloat32( aSign, aExp + 0x70, aSig << 13);
}
@@ -3124,12 +3242,6 @@ float16 float32_to_float16(float32 a, flag ieee STATUS_PARAM)
flag aSign;
int_fast16_t aExp;
uint32_t aSig;
- uint32_t mask;
- uint32_t increment;
- int8 roundingMode;
- int maxexp = ieee ? 15 : 16;
- bool rounding_bumps_exp;
- bool is_tiny = false;
a = float32_squash_input_denormal(a STATUS_VAR);
@@ -3164,80 +3276,9 @@ float16 float32_to_float16(float32 a, flag ieee STATUS_PARAM)
* codepath.
*/
aSig |= 0x00800000;
- aExp -= 0x7f;
- /* Calculate the mask of bits of the mantissa which are not
- * representable in half-precision and will be lost.
- */
- if (aExp < -14) {
- /* Will be denormal in halfprec */
- mask = 0x00ffffff;
- if (aExp >= -24) {
- mask >>= 25 + aExp;
- }
- } else {
- /* Normal number in halfprec */
- mask = 0x00001fff;
- }
+ aExp -= 0x71;
- roundingMode = STATUS(float_rounding_mode);
- switch (roundingMode) {
- case float_round_nearest_even:
- increment = (mask + 1) >> 1;
- if ((aSig & mask) == increment) {
- increment = aSig & (increment << 1);
- }
- break;
- case float_round_up:
- increment = aSign ? 0 : mask;
- break;
- case float_round_down:
- increment = aSign ? mask : 0;
- break;
- default: /* round_to_zero */
- increment = 0;
- break;
- }
-
- rounding_bumps_exp = (aSig + increment >= 0x01000000);
-
- if (aExp > maxexp || (aExp == maxexp && rounding_bumps_exp)) {
- if (ieee) {
- float_raise(float_flag_overflow | float_flag_inexact STATUS_VAR);
- return packFloat16(aSign, 0x1f, 0);
- } else {
- float_raise(float_flag_invalid STATUS_VAR);
- return packFloat16(aSign, 0x1f, 0x3ff);
- }
- }
-
- if (aExp < -14) {
- /* Note that flush-to-zero does not affect half-precision results */
- is_tiny =
- (STATUS(float_detect_tininess) == float_tininess_before_rounding)
- || (aExp < -15)
- || (!rounding_bumps_exp);
- }
- if (aSig & mask) {
- float_raise(float_flag_inexact STATUS_VAR);
- if (is_tiny) {
- float_raise(float_flag_underflow STATUS_VAR);
- }
- }
-
- aSig += increment;
- if (rounding_bumps_exp) {
- aSig >>= 1;
- aExp++;
- }
-
- if (aExp < -24) {
- return packFloat16(aSign, 0, 0);
- }
- if (aExp < -14) {
- aSig >>= -14 - aExp;
- aExp = -14;
- }
- return packFloat16(aSign, aExp + 14, aSig >> 13);
+ return roundAndPackFloat16(aSign, aExp, aSig, ieee STATUS_VAR);
}
/*----------------------------------------------------------------------------
--
1.8.5
next prev parent reply other threads:[~2014-01-07 20:04 UTC|newest]
Thread overview: 78+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-01-07 20:02 [Qemu-devel] [PULL 00/76] target-arm queue Peter Maydell
2014-01-07 20:02 ` [Qemu-devel] [PULL 01/76] target-arm: A64: add support for ld/st pair Peter Maydell
2014-01-07 20:02 ` [Qemu-devel] [PULL 02/76] target-arm: A64: add support for ld/st unsigned imm Peter Maydell
2014-01-07 20:02 ` [Qemu-devel] [PULL 03/76] target-arm: A64: add support for ld/st with reg offset Peter Maydell
2014-01-07 20:03 ` [Qemu-devel] [PULL 04/76] target-arm: A64: add support for ld/st with index Peter Maydell
2014-01-07 20:03 ` [Qemu-devel] [PULL 05/76] target-arm: A64: add support for add, addi, sub, subi Peter Maydell
2014-01-07 20:03 ` [Qemu-devel] [PULL 06/76] target-arm: A64: add support for move wide instructions Peter Maydell
2014-01-07 20:03 ` [Qemu-devel] [PULL 07/76] target-arm: A64: add support for 3 src data proc insns Peter Maydell
2014-01-07 20:03 ` [Qemu-devel] [PULL 08/76] target-arm: A64: implement SVC, BRK Peter Maydell
2014-01-07 20:03 ` [Qemu-devel] [PULL 09/76] target-arm: A64: Add decoder skeleton for FP instructions Peter Maydell
2014-01-07 20:03 ` [Qemu-devel] [PULL 10/76] target-arm: A64: implement FMOV Peter Maydell
2014-01-07 20:03 ` [Qemu-devel] [PULL 11/76] target-arm: Pull "add one cpreg to hashtable" into its own function Peter Maydell
2014-01-07 20:03 ` [Qemu-devel] [PULL 12/76] target-arm: Update generic cpreg code for AArch64 Peter Maydell
2014-01-07 20:03 ` [Qemu-devel] [PULL 13/76] target-arm: Remove ARMCPU/CPUARMState from cpregs APIs used by decoder Peter Maydell
2014-01-07 20:03 ` [Qemu-devel] [PULL 14/76] target-arm: A64: Implement MRS/MSR/SYS/SYSL Peter Maydell
2014-01-07 20:03 ` [Qemu-devel] [PULL 15/76] target-arm: A64: Implement minimal set of EL0-visible sysregs Peter Maydell
2014-01-07 20:03 ` [Qemu-devel] [PULL 16/76] target-arm: Widen thread-local register state fields to 64 bits Peter Maydell
2014-01-08 18:32 ` Peter Maydell
2014-01-07 20:03 ` [Qemu-devel] [PULL 17/76] target-arm: A64: add support for add/sub with carry Peter Maydell
2014-01-07 20:03 ` [Qemu-devel] [PULL 18/76] target-arm: A64: add support for conditional compare insns Peter Maydell
2014-01-07 20:03 ` [Qemu-devel] [PULL 19/76] target-arm: aarch64: add support for ld lit Peter Maydell
2014-01-07 20:03 ` [Qemu-devel] [PULL 20/76] target-arm: Widen exclusive-access support struct fields to 64 bits Peter Maydell
2014-01-07 20:03 ` [Qemu-devel] [PULL 21/76] target-arm: A64: support for ld/st/cl exclusive Peter Maydell
2014-01-07 20:03 ` [Qemu-devel] [PULL 22/76] linux-user: AArch64: define TARGET_CLONE_BACKWARDS Peter Maydell
2014-01-07 20:03 ` [Qemu-devel] [PULL 23/76] linux-user: AArch64: Use correct values for FPSR/FPCR in sigcontext Peter Maydell
2014-01-07 20:03 ` [Qemu-devel] [PULL 24/76] .travis.yml: Add aarch64-* targets Peter Maydell
2014-01-07 20:03 ` [Qemu-devel] [PULL 25/76] default-configs: Add config for aarch64-linux-user Peter Maydell
2014-01-07 20:03 ` [Qemu-devel] [PULL 26/76] target-arm: A64: Add support for dumping AArch64 VFP register state Peter Maydell
2014-01-07 20:03 ` [Qemu-devel] [PULL 27/76] target-arm: A64: Fix vector register access on bigendian hosts Peter Maydell
2014-01-07 20:03 ` [Qemu-devel] [PULL 28/76] target-arm: Use VFP_BINOP macro for min, max, minnum, maxnum Peter Maydell
2014-01-07 20:03 ` [Qemu-devel] [PULL 29/76] target-arm: A64: Add "Floating-point data-processing (2 source)" insns Peter Maydell
2014-01-07 20:03 ` [Qemu-devel] [PULL 30/76] target-arm: A64: Add "Floating-point data-processing (3 " Peter Maydell
2014-01-07 20:03 ` [Qemu-devel] [PULL 31/76] target-arm: A64: Add fmov (scalar, immediate) instruction Peter Maydell
2014-01-07 20:03 ` [Qemu-devel] [PULL 32/76] target-arm: A64: Add support for floating point compare Peter Maydell
2014-01-07 20:03 ` [Qemu-devel] [PULL 33/76] target-arm: A64: Add support for floating point conditional compare Peter Maydell
2014-01-07 20:03 ` [Qemu-devel] [PULL 34/76] target-arm: A64: Add support for floating point cond select Peter Maydell
2014-01-07 20:03 ` [Qemu-devel] [PULL 35/76] target-arm: Give the FPSCR rounding modes names Peter Maydell
2014-01-07 20:03 ` [Qemu-devel] [PULL 36/76] char/cadence_uart: Mark struct fields as public/private Peter Maydell
2014-01-07 20:03 ` [Qemu-devel] [PULL 37/76] char/cadence_uart: Add missing uart_update_state Peter Maydell
2014-01-07 20:03 ` [Qemu-devel] [PULL 38/76] char/cadence_uart: Fix reset Peter Maydell
2014-01-07 20:03 ` [Qemu-devel] [PULL 39/76] char/cadence_uart: s/r_fifo/rx_fifo Peter Maydell
2014-01-07 20:03 ` [Qemu-devel] [PULL 40/76] char/cadence_uart: Simplify status generation Peter Maydell
2014-01-07 20:03 ` [Qemu-devel] [PULL 41/76] char/cadence_uart: Define Missing SR/ISR fields Peter Maydell
2014-01-07 20:03 ` [Qemu-devel] [PULL 42/76] char/cadence_uart: Remove TX timer & add TX FIFO state Peter Maydell
2014-01-07 20:03 ` [Qemu-devel] [PULL 43/76] char/cadence_uart: Fix can_receive logic Peter Maydell
2014-01-07 20:03 ` [Qemu-devel] [PULL 44/76] char/cadence_uart: Use the TX fifo for transmission Peter Maydell
2014-01-07 20:03 ` [Qemu-devel] [PULL 45/76] char/cadence_uart: Delete redundant rx rst logic Peter Maydell
2014-01-07 20:03 ` [Qemu-devel] [PULL 46/76] char/cadence_uart: Implement Tx flow control Peter Maydell
2014-01-07 20:03 ` [Qemu-devel] [PULL 47/76] target-arm: use c13_context field for CONTEXTIDR Peter Maydell
2014-01-07 20:03 ` [Qemu-devel] [PULL 48/76] target-arm: remove raw_read|write duplication Peter Maydell
2014-01-07 20:03 ` [Qemu-devel] [PULL 49/76] arm/xilinx_zynq: Always instantiate the GEMs Peter Maydell
2014-01-07 20:03 ` [Qemu-devel] [PULL 50/76] target-arm: fix build with gcc 4.8.2 Peter Maydell
2014-01-07 20:03 ` [Qemu-devel] [PULL 51/76] arm_gic: Rename GIC_X_TRIGGER to GIC_X_EDGE_TRIGGER Peter Maydell
2014-01-07 20:03 ` [Qemu-devel] [PULL 52/76] hw: arm_gic: Introduce gic_set_priority function Peter Maydell
2014-01-07 20:03 ` [Qemu-devel] [PULL 53/76] softfloat: Fix exception flag handling for float32_to_float16() Peter Maydell
2014-01-07 20:03 ` [Qemu-devel] [PULL 54/76] softfloat: Add float to 16bit integer conversions Peter Maydell
2014-01-07 20:03 ` [Qemu-devel] [PULL 55/76] softfloat: Add 16 bit integer to float conversions Peter Maydell
2014-01-07 20:03 ` [Qemu-devel] [PULL 56/76] softfloat: Make the int-to-float functions take exact-width types Peter Maydell
2014-01-07 20:03 ` [Qemu-devel] [PULL 57/76] softfloat: Fix float64_to_uint64 Peter Maydell
2014-01-07 20:03 ` [Qemu-devel] [PULL 58/76] softfloat: Only raise Invalid when conversions to int are out of range Peter Maydell
2014-01-07 20:03 ` [Qemu-devel] [PULL 59/76] softfloat: Fix factor 2 error for scalbn on denormal inputs Peter Maydell
2014-01-07 20:03 ` [Qemu-devel] [PULL 60/76] softfloat: Add float32_to_uint64() Peter Maydell
2014-01-07 20:03 ` [Qemu-devel] [PULL 61/76] softfloat: Fix float64_to_uint64_round_to_zero Peter Maydell
2014-01-07 20:03 ` [Qemu-devel] [PULL 62/76] softfloat: Fix float64_to_uint32 Peter Maydell
2014-01-07 20:03 ` [Qemu-devel] [PULL 63/76] softfloat: Fix float64_to_uint32_round_to_zero Peter Maydell
2014-01-07 20:04 ` [Qemu-devel] [PULL 64/76] softfloat: Provide complete set of accessors for fp state Peter Maydell
2014-01-07 20:04 ` Peter Maydell [this message]
2014-01-07 20:04 ` [Qemu-devel] [PULL 66/76] softfloat: Add float16 <=> float64 conversion functions Peter Maydell
2014-01-07 20:04 ` [Qemu-devel] [PULL 67/76] softfloat: Refactor code handling various rounding modes Peter Maydell
2014-01-07 20:04 ` [Qemu-devel] [PULL 68/76] softfloat: Add support for ties-away rounding Peter Maydell
2014-01-07 20:04 ` [Qemu-devel] [PULL 69/76] target-arm: Prepare VFP_CONV_FIX helpers for A64 uses Peter Maydell
2014-01-07 20:04 ` [Qemu-devel] [PULL 70/76] target-arm: Rename A32 VFP conversion helpers Peter Maydell
2014-01-07 20:04 ` [Qemu-devel] [PULL 71/76] target-arm: Ignore most exceptions from scalbn when doing fixpoint conversion Peter Maydell
2014-01-07 20:04 ` [Qemu-devel] [PULL 72/76] target-arm: A64: Add extra VFP fixed point conversion helpers Peter Maydell
2014-01-07 20:04 ` [Qemu-devel] [PULL 73/76] target-arm: A64: Add floating-point<->fixed-point instructions Peter Maydell
2014-01-07 20:04 ` [Qemu-devel] [PULL 74/76] target-arm: A64: Add floating-point<->integer conversion instructions Peter Maydell
2014-01-07 20:04 ` [Qemu-devel] [PULL 75/76] target-arm: A64: Add 1-source 32-to-32 and 64-to-64 FP instructions Peter Maydell
2014-01-07 20:04 ` [Qemu-devel] [PULL 76/76] target-arm: A64: Add support for FCVT between half, single and double Peter Maydell
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1389125052-22931-66-git-send-email-peter.maydell@linaro.org \
--to=peter.maydell@linaro.org \
--cc=aliguori@amazon.com \
--cc=aurelien@aurel32.net \
--cc=blauwirbel@gmail.com \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).