[Qemu-devel] [PATCH v3 0/3] ARM: fix Neon vrecpe and vrsqrte instructions.

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

* [Qemu-devel] [PATCH v3 0/3] ARM: fix Neon vrecpe and vrsqrte instructions.
@ 2011-02-16 17:59 christophe.lyon
  2011-02-16 17:59 ` [Qemu-devel] [PATCH 1/3] softfloat: export float32_default_nan, and float32_infinity. Add float32_set_sign() christophe.lyon
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: christophe.lyon @ 2011-02-16 17:59 UTC (permalink / raw)
  To: qemu-devel

From: Christophe Lyon <christophe.lyon@st.com>

These 3 patches fix the ARM Neon vrecpe and vrsqrte instructions by
matching the algorithms descibed in the ARM ARM.

With these patches, qemu passes my ARM/Neon tests.

Patch #1 modifies softfloat by exporting float32_default_nan and
float32_infinity. For consistency, I have also moved all the
target-dependent definitions of floatXX_default_nan to softfloat.h (ie
the 16, 64, x80 and 128 bits versions in addition to the 32 bits
ones).

It also adds float32_set_sign() to help return the right special
values (-0, -infinity).

Patch #2 uses these newly exported values and uses the vrecpe
algorithm described in the ARM ARM.

Patch #3 uses these newly exported values and uses the vrsqrte
algorithm described in the ARM ARM.

Christophe Lyon (3):
  softfloat: export float32_default_nan, and float32_infinity. Add
    float32_set_sign().
  target-arm: fix support for vrecpe.
  target-arm: fix support for vrsqrte.

 fpu/softfloat-specialize.h |   68 ---------------
 fpu/softfloat.h            |   75 ++++++++++++++++
 target-arm/helper.c        |  206 +++++++++++++++++++++++++++++++++++++++-----
 3 files changed, 259 insertions(+), 90 deletions(-)

-- 
1.7.2.3

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Qemu-devel] [PATCH 1/3] softfloat: export float32_default_nan, and float32_infinity. Add float32_set_sign().
  2011-02-16 17:59 [Qemu-devel] [PATCH v3 0/3] ARM: fix Neon vrecpe and vrsqrte instructions christophe.lyon
@ 2011-02-16 17:59 ` christophe.lyon
  2011-02-17 12:56   ` Peter Maydell
  2011-02-16 17:59 ` [Qemu-devel] [PATCH 2/3] target-arm: fix support for vrecpe christophe.lyon
  2011-02-16 17:59 ` [Qemu-devel] [PATCH 3/3] target-arm: fix support for vrsqrte christophe.lyon
  2 siblings, 1 reply; 9+ messages in thread
From: christophe.lyon @ 2011-02-16 17:59 UTC (permalink / raw)
  To: qemu-devel

From: Christophe Lyon <christophe.lyon@st.com>

These special values are needed to implement some helper functions,
which return these values in some cases.

This patch also moves the definitions of default_nan for 16, 64, x80
and 128 bits floats for consistency with float32.

Signed-off-by: Christophe Lyon <christophe.lyon@st.com>
---
 fpu/softfloat-specialize.h |   68 ---------------------------------------
 fpu/softfloat.h            |   75 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 75 insertions(+), 68 deletions(-)

diff --git a/fpu/softfloat-specialize.h b/fpu/softfloat-specialize.h
index 2d025bf..adc5ada 100644
--- a/fpu/softfloat-specialize.h
+++ b/fpu/softfloat-specialize.h
@@ -30,12 +30,6 @@ these four paragraphs for those parts of this code that are retained.
 
 =============================================================================*/
 
-#if defined(TARGET_MIPS) || defined(TARGET_SH4)
-#define SNAN_BIT_IS_ONE		1
-#else
-#define SNAN_BIT_IS_ONE		0
-#endif
-
 /*----------------------------------------------------------------------------
 | Raises the exceptions specified by `flags'.  Floating-point traps can be
 | defined here if desired.  It is currently not possible for such a trap
@@ -57,17 +51,6 @@ typedef struct {
 } commonNaNT;
 
 /*----------------------------------------------------------------------------
-| The pattern for a default generated half-precision NaN.
-*----------------------------------------------------------------------------*/
-#if defined(TARGET_ARM)
-#define float16_default_nan make_float16(0x7E00)
-#elif SNAN_BIT_IS_ONE
-#define float16_default_nan make_float16(0x7DFF)
-#else
-#define float16_default_nan make_float16(0xFE00)
-#endif
-
-/*----------------------------------------------------------------------------
 | Returns 1 if the half-precision floating-point value `a' is a quiet
 | NaN; otherwise returns 0.
 *----------------------------------------------------------------------------*/
@@ -158,19 +141,6 @@ static float16 commonNaNToFloat16(commonNaNT a STATUS_PARAM)
 }
 
 /*----------------------------------------------------------------------------
-| The pattern for a default generated single-precision NaN.
-*----------------------------------------------------------------------------*/
-#if defined(TARGET_SPARC)
-#define float32_default_nan make_float32(0x7FFFFFFF)
-#elif defined(TARGET_PPC) || defined(TARGET_ARM) || defined(TARGET_ALPHA)
-#define float32_default_nan make_float32(0x7FC00000)
-#elif SNAN_BIT_IS_ONE
-#define float32_default_nan make_float32(0x7FBFFFFF)
-#else
-#define float32_default_nan make_float32(0xFFC00000)
-#endif
-
-/*----------------------------------------------------------------------------
 | Returns 1 if the single-precision floating-point value `a' is a quiet
 | NaN; otherwise returns 0.
 *----------------------------------------------------------------------------*/
@@ -413,19 +383,6 @@ static float32 propagateFloat32NaN( float32 a, float32 b STATUS_PARAM)
 }
 
 /*----------------------------------------------------------------------------
-| The pattern for a default generated double-precision NaN.
-*----------------------------------------------------------------------------*/
-#if defined(TARGET_SPARC)
-#define float64_default_nan make_float64(LIT64( 0x7FFFFFFFFFFFFFFF ))
-#elif defined(TARGET_PPC) || defined(TARGET_ARM) || defined(TARGET_ALPHA)
-#define float64_default_nan make_float64(LIT64( 0x7FF8000000000000 ))
-#elif SNAN_BIT_IS_ONE
-#define float64_default_nan make_float64(LIT64( 0x7FF7FFFFFFFFFFFF ))
-#else
-#define float64_default_nan make_float64(LIT64( 0xFFF8000000000000 ))
-#endif
-
-/*----------------------------------------------------------------------------
 | Returns 1 if the double-precision floating-point value `a' is a quiet
 | NaN; otherwise returns 0.
 *----------------------------------------------------------------------------*/
@@ -564,19 +521,6 @@ static float64 propagateFloat64NaN( float64 a, float64 b STATUS_PARAM)
 #ifdef FLOATX80
 
 /*----------------------------------------------------------------------------
-| The pattern for a default generated extended double-precision NaN.  The
-| `high' and `low' values hold the most- and least-significant bits,
-| respectively.
-*----------------------------------------------------------------------------*/
-#if SNAN_BIT_IS_ONE
-#define floatx80_default_nan_high 0x7FFF
-#define floatx80_default_nan_low  LIT64( 0xBFFFFFFFFFFFFFFF )
-#else
-#define floatx80_default_nan_high 0xFFFF
-#define floatx80_default_nan_low  LIT64( 0xC000000000000000 )
-#endif
-
-/*----------------------------------------------------------------------------
 | Returns 1 if the extended double-precision floating-point value `a' is a
 | quiet NaN; otherwise returns 0. This slightly differs from the same
 | function for other types as floatx80 has an explicit bit.
@@ -728,18 +672,6 @@ static floatx80 propagateFloatx80NaN( floatx80 a, floatx80 b STATUS_PARAM)
 #ifdef FLOAT128
 
 /*----------------------------------------------------------------------------
-| The pattern for a default generated quadruple-precision NaN.  The `high' and
-| `low' values hold the most- and least-significant bits, respectively.
-*----------------------------------------------------------------------------*/
-#if SNAN_BIT_IS_ONE
-#define float128_default_nan_high LIT64( 0x7FFF7FFFFFFFFFFF )
-#define float128_default_nan_low  LIT64( 0xFFFFFFFFFFFFFFFF )
-#else
-#define float128_default_nan_high LIT64( 0xFFFF800000000000 )
-#define float128_default_nan_low  LIT64( 0x0000000000000000 )
-#endif
-
-/*----------------------------------------------------------------------------
 | Returns 1 if the quadruple-precision floating-point value `a' is a quiet
 | NaN; otherwise returns 0.
 *----------------------------------------------------------------------------*/
diff --git a/fpu/softfloat.h b/fpu/softfloat.h
index e57ee1e..aa9e246 100644
--- a/fpu/softfloat.h
+++ b/fpu/softfloat.h
@@ -77,6 +77,12 @@ typedef int64_t sbits64;
 #define LIT64( a ) a##LL
 #define INLINE static inline
 
+#if defined(TARGET_MIPS) || defined(TARGET_SH4)
+#define SNAN_BIT_IS_ONE		1
+#else
+#define SNAN_BIT_IS_ONE		0
+#endif
+
 /*----------------------------------------------------------------------------
 | The macro `FLOATX80' must be defined to enable the extended double-precision
 | floating-point format `floatx80'.  If this macro is not defined, the
@@ -278,6 +284,17 @@ int float16_is_signaling_nan( float16 );
 float16 float16_maybe_silence_nan( float16 );
 
 /*----------------------------------------------------------------------------
+| The pattern for a default generated half-precision NaN.
+*----------------------------------------------------------------------------*/
+#if defined(TARGET_ARM)
+#define float16_default_nan make_float16(0x7E00)
+#elif SNAN_BIT_IS_ONE
+#define float16_default_nan make_float16(0x7DFF)
+#else
+#define float16_default_nan make_float16(0xFE00)
+#endif
+
+/*----------------------------------------------------------------------------
 | Software IEC/IEEE single-precision conversion routines.
 *----------------------------------------------------------------------------*/
 int float32_to_int16_round_to_zero( float32 STATUS_PARAM );
@@ -362,9 +379,29 @@ INLINE int float32_is_zero_or_denormal(float32 a)
     return (float32_val(a) & 0x7f800000) == 0;
 }
 
+INLINE float32 float32_set_sign(float32 a, int sign)
+{
+    return make_float32((float32_val(a) & 0x7fffffff) | (sign << 31));
+}
+
 #define float32_zero make_float32(0)
 #define float32_one make_float32(0x3f800000)
 #define float32_ln2 make_float32(0x3f317218)
+#define float32_infinity make_float32(0x7f800000)
+
+
+/*----------------------------------------------------------------------------
+| The pattern for a default generated single-precision NaN.
+*----------------------------------------------------------------------------*/
+#if defined(TARGET_SPARC)
+#define float32_default_nan make_float32(0x7FFFFFFF)
+#elif defined(TARGET_PPC) || defined(TARGET_ARM) || defined(TARGET_ALPHA)
+#define float32_default_nan make_float32(0x7FC00000)
+#elif SNAN_BIT_IS_ONE
+#define float32_default_nan make_float32(0x7FBFFFFF)
+#else
+#define float32_default_nan make_float32(0xFFC00000)
+#endif
 
 /*----------------------------------------------------------------------------
 | Software IEC/IEEE double-precision conversion routines.
@@ -452,6 +489,19 @@ INLINE int float64_is_any_nan(float64 a)
 #define float64_one make_float64(0x3ff0000000000000LL)
 #define float64_ln2 make_float64(0x3fe62e42fefa39efLL)
 
+/*----------------------------------------------------------------------------
+| The pattern for a default generated double-precision NaN.
+*----------------------------------------------------------------------------*/
+#if defined(TARGET_SPARC)
+#define float64_default_nan make_float64(LIT64( 0x7FFFFFFFFFFFFFFF ))
+#elif defined(TARGET_PPC) || defined(TARGET_ARM) || defined(TARGET_ALPHA)
+#define float64_default_nan make_float64(LIT64( 0x7FF8000000000000 ))
+#elif SNAN_BIT_IS_ONE
+#define float64_default_nan make_float64(LIT64( 0x7FF7FFFFFFFFFFFF ))
+#else
+#define float64_default_nan make_float64(LIT64( 0xFFF8000000000000 ))
+#endif
+
 #ifdef FLOATX80
 
 /*----------------------------------------------------------------------------
@@ -520,6 +570,19 @@ INLINE int floatx80_is_any_nan(floatx80 a)
     return ((a.high & 0x7fff) == 0x7fff) && (a.low<<1);
 }
 
+/*----------------------------------------------------------------------------
+| The pattern for a default generated extended double-precision NaN.  The
+| `high' and `low' values hold the most- and least-significant bits,
+| respectively.
+*----------------------------------------------------------------------------*/
+#if SNAN_BIT_IS_ONE
+#define floatx80_default_nan_high 0x7FFF
+#define floatx80_default_nan_low  LIT64( 0xBFFFFFFFFFFFFFFF )
+#else
+#define floatx80_default_nan_high 0xFFFF
+#define floatx80_default_nan_low  LIT64( 0xC000000000000000 )
+#endif
+
 #endif
 
 #ifdef FLOAT128
@@ -593,6 +656,18 @@ INLINE int float128_is_any_nan(float128 a)
         ((a.low != 0) || ((a.high & 0xffffffffffffLL) != 0));
 }
 
+/*----------------------------------------------------------------------------
+| The pattern for a default generated quadruple-precision NaN.  The `high' and
+| `low' values hold the most- and least-significant bits, respectively.
+*----------------------------------------------------------------------------*/
+#if SNAN_BIT_IS_ONE
+#define float128_default_nan_high LIT64( 0x7FFF7FFFFFFFFFFF )
+#define float128_default_nan_low  LIT64( 0xFFFFFFFFFFFFFFFF )
+#else
+#define float128_default_nan_high LIT64( 0xFFFF800000000000 )
+#define float128_default_nan_low  LIT64( 0x0000000000000000 )
+#endif
+
 #endif
 
 #else /* CONFIG_SOFTFLOAT */
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [Qemu-devel] [PATCH 2/3] target-arm: fix support for vrecpe.
  2011-02-16 17:59 [Qemu-devel] [PATCH v3 0/3] ARM: fix Neon vrecpe and vrsqrte instructions christophe.lyon
  2011-02-16 17:59 ` [Qemu-devel] [PATCH 1/3] softfloat: export float32_default_nan, and float32_infinity. Add float32_set_sign() christophe.lyon
@ 2011-02-16 17:59 ` christophe.lyon
  2011-02-17 12:23   ` Peter Maydell
  2011-02-16 17:59 ` [Qemu-devel] [PATCH 3/3] target-arm: fix support for vrsqrte christophe.lyon
  2 siblings, 1 reply; 9+ messages in thread
From: christophe.lyon @ 2011-02-16 17:59 UTC (permalink / raw)
  To: qemu-devel

From: Christophe Lyon <christophe.lyon@st.com>

Now use the same algorithm as described in the ARM ARM.

Signed-off-by: Christophe Lyon <christophe.lyon@st.com>
---
 target-arm/helper.c |   84 +++++++++++++++++++++++++++++++++++++++++++-------
 1 files changed, 72 insertions(+), 12 deletions(-)

diff --git a/target-arm/helper.c b/target-arm/helper.c
index 7f63a28..7751d21 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -2687,13 +2687,68 @@ float32 HELPER(rsqrts_f32)(float32 a, float32 b, CPUState *env)
 
 /* NEON helpers.  */
 
-/* TODO: The architecture specifies the value that the estimate functions
-   should return.  We return the exact reciprocal/root instead.  */
+/* The algorithm that must be used to calculate the estimate
+ * is specified by the ARM ARM.
+ */
+static float64 recip_estimate(float64 a, CPUState *env)
+{
+    float_status *s = &env->vfp.standard_fp_status;
+    float64 one = int64_to_float64(1, s);
+    /* q = (int)(a * 512.0) */
+    float64 x512 = int64_to_float64(512, s);
+    float64 q = float64_mul(x512, a, s);
+    int64_t q_int = float64_to_int64_round_to_zero(q, s);
+
+    /* r = 1.0 / (((double)q + 0.5) / 512.0) */
+    q = int64_to_float64(q_int, s);
+    float64 half = float64_div(one, int64_to_float64(2, s), s);
+    q = float64_add(q, half, s);
+    q = float64_div(q, x512, s);
+    q = float64_div(one, q, s);
+
+    /* s = (int)(256.0 * r + 0.5) */
+    float64 x256 = int64_to_float64(256, s);
+    q = float64_mul(q, x256, s);
+    q = float64_add(q, half, s);
+    q_int = float64_to_int64_round_to_zero(q, s);
+
+    /* return (double)s / 256.0 */
+    return float64_div(int64_to_float64(q_int, s), x256, s);
+}
+
 float32 HELPER(recpe_f32)(float32 a, CPUState *env)
 {
-    float_status *s = &env->vfp.fp_status;
-    float32 one = int32_to_float32(1, s);
-    return float32_div(one, a, s);
+    float_status *s = &env->vfp.standard_fp_status;
+    float64 f64;
+    uint32_t val32 = float32_val(a);
+
+    int result_exp;
+    int a_exp = (val32  & 0x7f800000) >> 23;
+    int sign = val32 & 0x80000000;
+
+    if (float32_is_any_nan(a)) {
+        return float32_default_nan;
+    } else if (float32_is_infinity(a)) {
+        return float32_set_sign(float32_zero, float32_is_neg(a));
+    } else if (float32_is_zero_or_denormal(a)) {
+        float_raise(float_flag_divbyzero, s);
+        return float32_set_sign(float32_infinity, float32_is_neg(a));
+    } else if (a_exp >= 253) {
+        float_raise(float_flag_underflow, s);
+        return float32_set_sign(float32_zero, float32_is_neg(a));
+    }
+
+    f64 = make_float64((0x3feULL << 52)
+                       | ((int64_t)(val32 & 0x7fffff) << 29));
+
+    result_exp = 253 - a_exp;
+
+    f64 = recip_estimate(f64, env);
+
+    val32 = sign
+        | ((result_exp & 0xff) << 23)
+        | ((float64_val(f64) >> 29) & 0x7fffff);
+    return make_float32(val32);
 }
 
 float32 HELPER(rsqrte_f32)(float32 a, CPUState *env)
@@ -2705,13 +2760,18 @@ float32 HELPER(rsqrte_f32)(float32 a, CPUState *env)
 
 uint32_t HELPER(recpe_u32)(uint32_t a, CPUState *env)
 {
-    float_status *s = &env->vfp.fp_status;
-    float32 tmp;
-    tmp = int32_to_float32(a, s);
-    tmp = float32_scalbn(tmp, -32, s);
-    tmp = helper_recpe_f32(tmp, env);
-    tmp = float32_scalbn(tmp, 31, s);
-    return float32_to_int32(tmp, s);
+    float64 f64;
+
+    if ((a & 0x80000000) == 0) {
+        return 0xffffffff;
+    }
+
+    f64 = make_float64((0x3feULL << 52)
+                       | ((int64_t)(a & 0x7fffffff) << 21));
+
+    f64 = recip_estimate (f64, env);
+
+    return 0x80000000 | ((float64_val(f64) >> 21) & 0x7fffffff);
 }
 
 uint32_t HELPER(rsqrte_u32)(uint32_t a, CPUState *env)
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [Qemu-devel] [PATCH 3/3] target-arm: fix support for vrsqrte.
  2011-02-16 17:59 [Qemu-devel] [PATCH v3 0/3] ARM: fix Neon vrecpe and vrsqrte instructions christophe.lyon
  2011-02-16 17:59 ` [Qemu-devel] [PATCH 1/3] softfloat: export float32_default_nan, and float32_infinity. Add float32_set_sign() christophe.lyon
  2011-02-16 17:59 ` [Qemu-devel] [PATCH 2/3] target-arm: fix support for vrecpe christophe.lyon
@ 2011-02-16 17:59 ` christophe.lyon
  2011-02-17 12:49   ` Peter Maydell
  2 siblings, 1 reply; 9+ messages in thread
From: christophe.lyon @ 2011-02-16 17:59 UTC (permalink / raw)
  To: qemu-devel

From: Christophe Lyon <christophe.lyon@st.com>

Now use the same algorithm as described in the ARM ARM.

Signed-off-by: Christophe Lyon <christophe.lyon@st.com>
---
 target-arm/helper.c |  122 ++++++++++++++++++++++++++++++++++++++++++++++----
 1 files changed, 112 insertions(+), 10 deletions(-)

diff --git a/target-arm/helper.c b/target-arm/helper.c
index 7751d21..f0f2231 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -2751,11 +2751,105 @@ float32 HELPER(recpe_f32)(float32 a, CPUState *env)
     return make_float32(val32);
 }
 
+/* The algorithm that must be used to calculate the estimate
+ * is specified by the ARM ARM.
+ */
+static float64 recip_sqrt_estimate(float64 a, CPUState *env)
+{
+    float_status *s = &env->vfp.standard_fp_status;
+    float64 one = int64_to_float64(1, s);
+    float64 half = float64_div(one, int64_to_float64(2, s), s);
+    float64 x256 = int64_to_float64(256, s);
+    float64 q;
+    int64_t q_int;
+
+    if (float64_lt(a, half, s)) {
+        /* range 0.25 <= a < 0.5 */
+
+        /* a in units of 1/512 rounded down */
+        /* q0 = (int)(a * 512.0);  */
+        float64 x512 = int64_to_float64(512, s);
+        q = float64_mul(x512, a, s);
+        q_int = float64_to_int64_round_to_zero(q, s);
+
+        /* reciprocal root r */
+        /* r = 1.0 / sqrt(((double)q0 + 0.5) / 512.0);  */
+        q = int64_to_float64(q_int, s);
+        q = float64_add(q, half, s);
+        q = float64_div(q, x512, s);
+        q = float64_sqrt(q, s);
+        q = float64_div(one, q, s);
+    } else {
+        /* range 0.5 <= a < 1.0 */
+
+        /* a in units of 1/256 rounded down */
+        /* q1 = (int)(a * 256.0); */
+        q = float64_mul(x256, a, s);
+        int64_t q_int = float64_to_int64_round_to_zero(q, s);
+
+        /* reciprocal root r */
+        /* r = 1.0 /sqrt(((double)q1 + 0.5) / 256); */
+        q = int64_to_float64(q_int, s);
+        q = float64_add(q, half, s);
+        q = float64_div(q, x256, s);
+        q = float64_sqrt(q, s);
+        q = float64_div(one, q, s);
+    }
+    /* r in units of 1/256 rounded to nearest */
+    /* s = (int)(256.0 * r + 0.5); */
+
+    q = float64_mul(q, x256,s );
+    q = float64_add(q, half, s);
+    q_int = float64_to_int64_round_to_zero(q, s);
+
+    /* return (double)s / 256.0;*/
+    return float64_div(int64_to_float64(q_int, s), x256, s);
+}
+
 float32 HELPER(rsqrte_f32)(float32 a, CPUState *env)
 {
-    float_status *s = &env->vfp.fp_status;
-    float32 one = int32_to_float32(1, s);
-    return float32_div(one, float32_sqrt(a, s), s);
+    float_status *s = &env->vfp.standard_fp_status;
+    int result_exp;
+    float64 f64;
+    int32_t val;
+    int64_t val64;
+
+    val = float32_val(a);
+
+    if (float32_is_any_nan(a)) {
+        return float32_default_nan;
+    } else if (float32_is_zero(a)) {
+        float_raise(float_flag_divbyzero, s);
+        return float32_set_sign(float32_infinity, float32_is_neg(a));
+    } else if (val < 0) {
+        float_raise(float_flag_invalid, s);
+        return float32_default_nan;
+    } else if (float32_is_infinity(a)) {
+        return float32_zero;
+    }
+
+    /* Normalize to a double-precision value between 0.25 and 1.0,
+     * preserving the parity of the exponent.  */
+    if ((val & 0x800000) == 0) {
+        f64 = make_float64(((uint64_t)(val & 0x80000000) << 32)
+                           | (0x3feULL << 52)
+                           | ((uint64_t)(val & 0x7ffffff) << 29));
+    } else {
+        f64 = make_float64(((uint64_t)(val & 0x80000000) << 32)
+                           | (0x3fdULL << 52)
+                           | ((uint64_t)(val & 0x7ffffff) << 29));
+    }
+
+    result_exp = (380 - ((val & 0x7f800000) >> 23)) / 2;
+
+    f64 = recip_sqrt_estimate(f64, env);
+
+    val64 = float64_val(f64);
+
+    val = ((val64 >> 63)  & 0x80000000)
+        | ((result_exp & 0xff) << 23)
+        | ((val64 >> 29)  & 0x7fffff);
+    return make_float32(val);
 }
 
 uint32_t HELPER(recpe_u32)(uint32_t a, CPUState *env)
@@ -2776,13 +2870,21 @@ uint32_t HELPER(recpe_u32)(uint32_t a, CPUState *env)
 
 uint32_t HELPER(rsqrte_u32)(uint32_t a, CPUState *env)
 {
-    float_status *s = &env->vfp.fp_status;
-    float32 tmp;
-    tmp = int32_to_float32(a, s);
-    tmp = float32_scalbn(tmp, -32, s);
-    tmp = helper_rsqrte_f32(tmp, env);
-    tmp = float32_scalbn(tmp, 31, s);
-    return float32_to_int32(tmp, s);
+    float64 f64;
+
+    if ((a & 0xc0000000) == 0) return 0xffffffff;
+
+    if (a & 0x80000000) {
+        f64 = make_float64((0x3feULL << 52)
+                           | ((uint64_t)(a & 0x7fffffff) << 21));
+    } else { /* bits 31-30 == '01' */
+        f64 = make_float64((0x3fdULL << 52)
+                           | ((uint64_t)(a & 0x3fffffff) << 22));
+    }
+
+    f64 = recip_sqrt_estimate(f64, env);
+
+    return 0x80000000 | ((float64_val(f64) >> 21) & 0x7fffffff);
 }
 
 void HELPER(set_teecr)(CPUState *env, uint32_t val)
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] target-arm: fix support for vrecpe.
  2011-02-16 17:59 ` [Qemu-devel] [PATCH 2/3] target-arm: fix support for vrecpe christophe.lyon
@ 2011-02-17 12:23   ` Peter Maydell
  2011-02-17 17:51     ` Christophe Lyon
  0 siblings, 1 reply; 9+ messages in thread
From: Peter Maydell @ 2011-02-17 12:23 UTC (permalink / raw)
  To: christophe.lyon; +Cc: qemu-devel

On 16 February 2011 17:59,  <christophe.lyon@st.com> wrote:
> From: Christophe Lyon <christophe.lyon@st.com>
>
> Now use the same algorithm as described in the ARM ARM.
>
> Signed-off-by: Christophe Lyon <christophe.lyon@st.com>

Mostly looks good, and seems to pass random testing.

> +    float_status *s = &env->vfp.standard_fp_status;
> +    float64 one = int64_to_float64(1, s);

You don't need to create a variable for this, float64_one
will do what you want.

> +    /* q = (int)(a * 512.0) */
> +    float64 x512 = int64_to_float64(512, s);
> +    float64 q = float64_mul(x512, a, s);
> +    int64_t q_int = float64_to_int64_round_to_zero(q, s);
> +
> +    /* r = 1.0 / (((double)q + 0.5) / 512.0) */
> +    q = int64_to_float64(q_int, s);
> +    float64 half = float64_div(one, int64_to_float64(2, s), s);

...and a runtime division just to get a constant 0.5?
Better to just make_float64() on the appropriate bit
pattern, I think.

>  float32 HELPER(recpe_f32)(float32 a, CPUState *env)
>  {
> -    float_status *s = &env->vfp.fp_status;
> -    float32 one = int32_to_float32(1, s);
> -    return float32_div(one, a, s);
> +    float_status *s = &env->vfp.standard_fp_status;
> +    float64 f64;
> +    uint32_t val32 = float32_val(a);
> +
> +    int result_exp;
> +    int a_exp = (val32  & 0x7f800000) >> 23;
> +    int sign = val32 & 0x80000000;
> +
> +    if (float32_is_any_nan(a)) {
> +        return float32_default_nan;

This won't set InvalidOp if the input is a signalling NaN.

-- PMM

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] [PATCH 3/3] target-arm: fix support for vrsqrte.
  2011-02-16 17:59 ` [Qemu-devel] [PATCH 3/3] target-arm: fix support for vrsqrte christophe.lyon
@ 2011-02-17 12:49   ` Peter Maydell
  0 siblings, 0 replies; 9+ messages in thread
From: Peter Maydell @ 2011-02-17 12:49 UTC (permalink / raw)
  To: christophe.lyon; +Cc: qemu-devel

On 16 February 2011 17:59,  <christophe.lyon@st.com> wrote:
> From: Christophe Lyon <christophe.lyon@st.com>
>
> Now use the same algorithm as described in the ARM ARM.

This doesn't pass random testing.

> +static float64 recip_sqrt_estimate(float64 a, CPUState *env)
> +{
> +    float_status *s = &env->vfp.standard_fp_status;
> +    float64 one = int64_to_float64(1, s);
> +    float64 half = float64_div(one, int64_to_float64(2, s), s);

Same remarks about one and half apply as for 2/3.
Maybe we should put a float64_half in softfloat.h.

>  float32 HELPER(rsqrte_f32)(float32 a, CPUState *env)
>  {
> -    float_status *s = &env->vfp.fp_status;
> -    float32 one = int32_to_float32(1, s);

Use float32_one.

> +    if (float32_is_any_nan(a)) {
> +        return float32_default_nan;

Should raise InvalidOp for an SNaN.

> +    } else if (float32_is_zero(a)) {
> +        float_raise(float_flag_divbyzero, s);
> +        return float32_set_sign(float32_infinity, float32_is_neg(a));

Should use float32_is_zero_or_denormal()
so we handle denormals properly.

> +    } else if (val < 0) {

Use if (float32_is_neg(a)). Then you can make
val and val64 bit uint32_t,uint64_t rather than
the signed types, which makes more sense given
these are really just bit patterns.

> +    if ((val & 0x800000) == 0) {
> +        f64 = make_float64(((uint64_t)(val & 0x80000000) << 32)
> +                           | (0x3feULL << 52)
> +                           | ((uint64_t)(val & 0x7ffffff) << 29));
> +    } else {
> +        f64 = make_float64(((uint64_t)(val & 0x80000000) << 32)
> +                           | (0x3fdULL << 52)
> +                           | ((uint64_t)(val & 0x7ffffff) << 29));
> +    }

These are both wrong -- the ARM ARM says f64
gets operand<31> : ‘01111111110’ : operand<22:0> : Zeros(29);
(or similar for the other case), which means you
need to mask with 0x7fffff. (Count the 'f's !)

If you fix this and the denormal case then it passes
random testing.

-- PMM

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] [PATCH 1/3] softfloat: export float32_default_nan, and float32_infinity. Add float32_set_sign().
  2011-02-16 17:59 ` [Qemu-devel] [PATCH 1/3] softfloat: export float32_default_nan, and float32_infinity. Add float32_set_sign() christophe.lyon
@ 2011-02-17 12:56   ` Peter Maydell
  0 siblings, 0 replies; 9+ messages in thread
From: Peter Maydell @ 2011-02-17 12:56 UTC (permalink / raw)
  To: christophe.lyon; +Cc: qemu-devel, Aurelien Jarno

On 16 February 2011 17:59,  <christophe.lyon@st.com> wrote:
> From: Christophe Lyon <christophe.lyon@st.com>
>
> These special values are needed to implement some helper functions,
> which return these values in some cases.
>
> This patch also moves the definitions of default_nan for 16, 64, x80
> and 128 bits floats for consistency with float32.
>
> Signed-off-by: Christophe Lyon <christophe.lyon@st.com>

I'm happy enough with this patch; I think it's reasonable
for target code to want to be able to get at the default
NaN value without having to redefine it locally, so the
move of stuff out of softfloat-specialize.h is OK.

Arguable points:
 * should we have versions of float32_infinity and
float32_set_sign() for the other float* types?
 * should those be in a separate patch from "move all
the default NaN code"?

but I don't have a very strong opinion either way on
those, so unless somebody else (Aurelien?) wants to weigh
in this is fine by me.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] target-arm: fix support for vrecpe.
  2011-02-17 12:23   ` Peter Maydell
@ 2011-02-17 17:51     ` Christophe Lyon
  2011-02-17 18:35       ` Peter Maydell
  0 siblings, 1 reply; 9+ messages in thread
From: Christophe Lyon @ 2011-02-17 17:51 UTC (permalink / raw)
  To: Peter Maydell; +Cc: qemu-devel@nongnu.org


>> +    float_status *s = &env->vfp.standard_fp_status;
>> +    float64 one = int64_to_float64(1, s);
> You don't need to create a variable for this, float64_one
> will do what you want.
OK

 
>> +    /* q = (int)(a * 512.0) */
>> +    float64 x512 = int64_to_float64(512, s);
>> +    float64 q = float64_mul(x512, a, s);
>> +    int64_t q_int = float64_to_int64_round_to_zero(q, s);
>> +
>> +    /* r = 1.0 / (((double)q + 0.5) / 512.0) */
>> +    q = int64_to_float64(q_int, s);
>> +    float64 half = float64_div(one, int64_to_float64(2, s), s);
> ...and a runtime division just to get a constant 0.5?
> Better to just make_float64() on the appropriate bit
> pattern, I think.

It makes sense.
Then, what about using the right bit patterns for 512 and 256?

Actually, for these last two, I mimicked recps and rsqrts which build constants 2 and 3. I could add another patch to address this point.
 

>> +    if (float32_is_any_nan(a)) {
>> +        return float32_default_nan;
> This won't set InvalidOp if the input is a signalling NaN.
Thanks for the notice, I have no means of testing those cases.

Christophe.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] target-arm: fix support for vrecpe.
  2011-02-17 17:51     ` Christophe Lyon
@ 2011-02-17 18:35       ` Peter Maydell
  0 siblings, 0 replies; 9+ messages in thread
From: Peter Maydell @ 2011-02-17 18:35 UTC (permalink / raw)
  To: Christophe Lyon; +Cc: qemu-devel@nongnu.org

On 17 February 2011 17:51, Christophe Lyon <christophe.lyon@st.com> wrote:

>> ...and a runtime division just to get a constant 0.5?
>> Better to just make_float64() on the appropriate bit
>> pattern, I think.
>
> It makes sense.
> Then, what about using the right bit patterns for 512 and 256?

Not so worried about those -- it's really the division I
wanted to avoid, int-to-float isn't as expensive.

-- PMM

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2011-02-17 18:35 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-02-16 17:59 [Qemu-devel] [PATCH v3 0/3] ARM: fix Neon vrecpe and vrsqrte instructions christophe.lyon
2011-02-16 17:59 ` [Qemu-devel] [PATCH 1/3] softfloat: export float32_default_nan, and float32_infinity. Add float32_set_sign() christophe.lyon
2011-02-17 12:56   ` Peter Maydell
2011-02-16 17:59 ` [Qemu-devel] [PATCH 2/3] target-arm: fix support for vrecpe christophe.lyon
2011-02-17 12:23   ` Peter Maydell
2011-02-17 17:51     ` Christophe Lyon
2011-02-17 18:35       ` Peter Maydell
2011-02-16 17:59 ` [Qemu-devel] [PATCH 3/3] target-arm: fix support for vrsqrte christophe.lyon
2011-02-17 12:49   ` Peter Maydell

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).