[Qemu-devel] [PATCH v2 01/24] softfloat: Fix exception flag handling for float32_to_float16()

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: Peter Maydell <peter.maydell@linaro.org>
To: qemu-devel@nongnu.org
Cc: "Tom Musta" <tommusta@gmail.com>,
	"Peter Crosthwaite" <peter.crosthwaite@xilinx.com>,
	patches@linaro.org, "Michael Matz" <matz@suse.de>,
	"Alexander Graf" <agraf@suse.de>,
	"Claudio Fontana" <claudio.fontana@linaro.org>,
	"Dirk Mueller" <dmueller@suse.de>,
	"Will Newton" <will.newton@linaro.org>,
	"Laurent Desnogues" <laurent.desnogues@gmail.com>,
	"Alex Bennée" <alex.bennee@linaro.org>,
	kvmarm@lists.cs.columbia.edu,
	"Christoffer Dall" <christoffer.dall@linaro.org>,
	"Richard Henderson" <rth@twiddle.net>
Subject: [Qemu-devel] [PATCH v2 01/24] softfloat: Fix exception flag handling for float32_to_float16()
Date: Mon,  6 Jan 2014 13:10:58 +0000	[thread overview]
Message-ID: <1389013881-15726-2-git-send-email-peter.maydell@linaro.org> (raw)
In-Reply-To: <1389013881-15726-1-git-send-email-peter.maydell@linaro.org>

Our float32 to float16 conversion routine was generating the correct
numerical answers, but not always setting the right set of exception
flags. Fix this, mostly by rearranging the code to more closely
resemble RoundAndPackFloat*, and in particular:
 * non-IEEE halfprec always raises Invalid for input NaNs
 * we need to check for the overflow case before underflow
 * we weren't getting the tininess-detected-after-rounding
   case correct (somewhat academic since only ARM uses halfprec
   and it is always tininess-detected-before-rounding)
 * non-IEEE halfprec overflow raises only Invalid, not
   Invalid + Inexact
 * we weren't setting Inexact when we should

Also add some clarifying comments about what the code is doing.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 fpu/softfloat.c | 105 +++++++++++++++++++++++++++++++++++---------------------
 1 file changed, 66 insertions(+), 39 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index dbda61b..6a6b656 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -3046,6 +3046,10 @@ float16 float32_to_float16(float32 a, flag ieee STATUS_PARAM)
     uint32_t mask;
     uint32_t increment;
     int8 roundingMode;
+    int maxexp = ieee ? 15 : 16;
+    bool rounding_bumps_exp;
+    bool is_tiny = false;
+
     a = float32_squash_input_denormal(a STATUS_VAR);
 
     aSig = extractFloat32Frac( a );
@@ -3054,11 +3058,12 @@ float16 float32_to_float16(float32 a, flag ieee STATUS_PARAM)
     if ( aExp == 0xFF ) {
         if (aSig) {
             /* Input is a NaN */
-            float16 r = commonNaNToFloat16( float32ToCommonNaN( a STATUS_VAR ) STATUS_VAR );
             if (!ieee) {
+                float_raise(float_flag_invalid STATUS_VAR);
                 return packFloat16(aSign, 0, 0);
             }
-            return r;
+            return commonNaNToFloat16(
+                float32ToCommonNaN(a STATUS_VAR) STATUS_VAR);
         }
         /* Infinity */
         if (!ieee) {
@@ -3070,58 +3075,80 @@ float16 float32_to_float16(float32 a, flag ieee STATUS_PARAM)
     if (aExp == 0 && aSig == 0) {
         return packFloat16(aSign, 0, 0);
     }
-    /* Decimal point between bits 22 and 23.  */
+    /* Decimal point between bits 22 and 23. Note that we add the 1 bit
+     * even if the input is denormal; however this is harmless because
+     * the largest possible single-precision denormal is still smaller
+     * than the smallest representable half-precision denormal, and so we
+     * will end up ignoring aSig and returning via the "always return zero"
+     * codepath.
+     */
     aSig |= 0x00800000;
     aExp -= 0x7f;
+    /* Calculate the mask of bits of the mantissa which are not
+     * representable in half-precision and will be lost.
+     */
     if (aExp < -14) {
+        /* Will be denormal in halfprec */
         mask = 0x00ffffff;
         if (aExp >= -24) {
             mask >>= 25 + aExp;
         }
     } else {
+        /* Normal number in halfprec */
         mask = 0x00001fff;
     }
-    if (aSig & mask) {
-        float_raise( float_flag_underflow STATUS_VAR );
-        roundingMode = STATUS(float_rounding_mode);
-        switch (roundingMode) {
-        case float_round_nearest_even:
-            increment = (mask + 1) >> 1;
-            if ((aSig & mask) == increment) {
-                increment = aSig & (increment << 1);
-            }
-            break;
-        case float_round_up:
-            increment = aSign ? 0 : mask;
-            break;
-        case float_round_down:
-            increment = aSign ? mask : 0;
-            break;
-        default: /* round_to_zero */
-            increment = 0;
-            break;
-        }
-        aSig += increment;
-        if (aSig >= 0x01000000) {
-            aSig >>= 1;
-            aExp++;
-        }
-    } else if (aExp < -14
-          && STATUS(float_detect_tininess) == float_tininess_before_rounding) {
-        float_raise( float_flag_underflow STATUS_VAR);
-    }
 
-    if (ieee) {
-        if (aExp > 15) {
-            float_raise( float_flag_overflow | float_flag_inexact STATUS_VAR);
+    roundingMode = STATUS(float_rounding_mode);
+    switch (roundingMode) {
+    case float_round_nearest_even:
+        increment = (mask + 1) >> 1;
+        if ((aSig & mask) == increment) {
+            increment = aSig & (increment << 1);
+        }
+        break;
+    case float_round_up:
+        increment = aSign ? 0 : mask;
+        break;
+    case float_round_down:
+        increment = aSign ? mask : 0;
+        break;
+    default: /* round_to_zero */
+        increment = 0;
+        break;
+    }
+
+    rounding_bumps_exp = (aSig + increment >= 0x01000000);
+
+    if (aExp > maxexp || (aExp == maxexp && rounding_bumps_exp)) {
+        if (ieee) {
+            float_raise(float_flag_overflow | float_flag_inexact STATUS_VAR);
             return packFloat16(aSign, 0x1f, 0);
-        }
-    } else {
-        if (aExp > 16) {
-            float_raise(float_flag_invalid | float_flag_inexact STATUS_VAR);
+        } else {
+            float_raise(float_flag_invalid STATUS_VAR);
             return packFloat16(aSign, 0x1f, 0x3ff);
         }
     }
+
+    if (aExp < -14) {
+        /* Note that flush-to-zero does not affect half-precision results */
+        is_tiny =
+            (STATUS(float_detect_tininess) == float_tininess_before_rounding)
+            || (aExp < -15)
+            || (!rounding_bumps_exp);
+    }
+    if (aSig & mask) {
+        float_raise(float_flag_inexact STATUS_VAR);
+        if (is_tiny) {
+            float_raise(float_flag_underflow STATUS_VAR);
+        }
+    }
+
+    aSig += increment;
+    if (rounding_bumps_exp) {
+        aSig >>= 1;
+        aExp++;
+    }
+
     if (aExp < -24) {
         return packFloat16(aSign, 0, 0);
     }
-- 
1.8.5

next prev parent reply	other threads:[~2014-01-06 13:27 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-01-06 13:10 [Qemu-devel] [PATCH v2 00/24] A64 decoder patchset 6: rest of floating point Peter Maydell
2014-01-06 13:10 ` Peter Maydell [this message]
2014-01-06 18:18   ` [Qemu-devel] [PATCH v2 01/24] softfloat: Fix exception flag handling for float32_to_float16() Richard Henderson
2014-01-06 13:10 ` [Qemu-devel] [PATCH v2 02/24] softfloat: Add float to 16bit integer conversions Peter Maydell
2014-01-06 13:11 ` [Qemu-devel] [PATCH v2 03/24] softfloat: Add 16 bit integer to float conversions Peter Maydell
2014-01-06 18:19   ` Richard Henderson
2014-01-06 13:11 ` [Qemu-devel] [PATCH v2 04/24] softfloat: Make the int-to-float functions take exact-width types Peter Maydell
2014-01-06 18:20   ` Richard Henderson
2014-01-06 13:11 ` [Qemu-devel] [PATCH v2 05/24] softfloat: Fix float64_to_uint64 Peter Maydell
2014-01-06 18:23   ` Richard Henderson
2014-01-06 13:11 ` [Qemu-devel] [PATCH v2 06/24] softfloat: Only raise Invalid when conversions to int are out of range Peter Maydell
2014-01-06 18:24   ` Richard Henderson
2014-01-06 13:11 ` [Qemu-devel] [PATCH v2 07/24] softfloat: Fix factor 2 error for scalbn on denormal inputs Peter Maydell
2014-01-06 13:11 ` [Qemu-devel] [PATCH v2 08/24] softfloat: Add float32_to_uint64() Peter Maydell
2014-01-06 18:25   ` Richard Henderson
2014-01-06 13:11 ` [Qemu-devel] [PATCH v2 09/24] softfloat: Fix float64_to_uint64_round_to_zero Peter Maydell
2014-01-06 13:11 ` [Qemu-devel] [PATCH v2 10/24] softfloat: Fix float64_to_uint32 Peter Maydell
2014-01-06 13:11 ` [Qemu-devel] [PATCH v2 11/24] softfloat: Fix float64_to_uint32_round_to_zero Peter Maydell
2014-01-06 13:11 ` [Qemu-devel] [PATCH v2 12/24] softfloat: Provide complete set of accessors for fp state Peter Maydell
2014-01-06 13:11 ` [Qemu-devel] [PATCH v2 13/24] softfloat: Factor out RoundAndPackFloat16 and NormalizeFloat16Subnormal Peter Maydell
2014-01-06 18:28   ` Richard Henderson
2014-01-06 13:11 ` [Qemu-devel] [PATCH v2 14/24] softfloat: Add float16 <=> float64 conversion functions Peter Maydell
2014-01-06 18:30   ` Richard Henderson
2014-01-06 13:11 ` [Qemu-devel] [PATCH v2 15/24] softfloat: Refactor code handling various rounding modes Peter Maydell
2014-01-06 18:34   ` Richard Henderson
2014-01-06 13:11 ` [Qemu-devel] [PATCH v2 16/24] softfloat: Add support for ties-away rounding Peter Maydell
2014-01-06 18:36   ` Richard Henderson
2014-01-06 13:11 ` [Qemu-devel] [PATCH v2 17/24] target-arm: Prepare VFP_CONV_FIX helpers for A64 uses Peter Maydell
2014-01-06 18:37   ` Richard Henderson
2014-01-06 13:11 ` [Qemu-devel] [PATCH v2 18/24] target-arm: Rename A32 VFP conversion helpers Peter Maydell
2014-01-06 13:11 ` [Qemu-devel] [PATCH v2 19/24] target-arm: Ignore most exceptions from scalbn when doing fixpoint conversion Peter Maydell
2014-01-06 13:11 ` [Qemu-devel] [PATCH v2 20/24] target-arm: A64: Add extra VFP fixed point conversion helpers Peter Maydell
2014-01-06 13:11 ` [Qemu-devel] [PATCH v2 21/24] target-arm: A64: Add "Floating-point<->fixed-point" instructions Peter Maydell
2014-01-06 18:44   ` Richard Henderson
2014-01-06 13:11 ` [Qemu-devel] [PATCH v2 22/24] target-arm: A64: Add floating-point<->integer conversion instructions Peter Maydell
2014-01-06 18:47   ` Richard Henderson
2014-01-06 13:11 ` [Qemu-devel] [PATCH v2 23/24] target-arm: A64: Add 1-source 32-to-32 and 64-to-64 FP instructions Peter Maydell
2014-01-06 18:52   ` Richard Henderson
2014-01-06 13:11 ` [Qemu-devel] [PATCH v2 24/24] target-arm: A64: Add support for FCVT between half, single and double Peter Maydell
2014-01-06 18:55   ` Richard Henderson

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:dbda61b dfblob:6a6b656 )
 OR (
bs:"[Qemu-devel] [PATCH v2 01/24] softfloat: Fix exception flag handling for float32_to_float16()" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1389013881-15726-2-git-send-email-peter.maydell@linaro.org \
    --to=peter.maydell@linaro.org \
    --cc=agraf@suse.de \
    --cc=alex.bennee@linaro.org \
    --cc=christoffer.dall@linaro.org \
    --cc=claudio.fontana@linaro.org \
    --cc=dmueller@suse.de \
    --cc=kvmarm@lists.cs.columbia.edu \
    --cc=laurent.desnogues@gmail.com \
    --cc=matz@suse.de \
    --cc=patches@linaro.org \
    --cc=peter.crosthwaite@xilinx.com \
    --cc=qemu-devel@nongnu.org \
    --cc=rth@twiddle.net \
    --cc=tommusta@gmail.com \
    --cc=will.newton@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).