[Qemu-devel] [PATCH v2 0/2] ARM: fix Neon vrecpe instruction.

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

* [Qemu-devel] [PATCH v2 0/2] ARM: fix Neon vrecpe instruction.
@ 2011-02-16 14:51 christophe.lyon
  2011-02-16 14:51 ` [Qemu-devel] [PATCH 1/2] softfloat: export float32_nan and float32_infinity christophe.lyon
  2011-02-16 14:51 ` [Qemu-devel] [PATCH 2/2] target-arm: fix support for vrecpe christophe.lyon
  0 siblings, 2 replies; 6+ messages in thread
From: christophe.lyon @ 2011-02-16 14:51 UTC (permalink / raw)
  To: qemu-devel

From: Christophe Lyon <christophe.lyon@st.com>

These 2 patches fix the ARM Neon vrecpe instruction by matching the
algorithm descibed in the ARM ARM.

With these patches, qemu passes my ARM/Neon tests.

Patch #1 modifies softfloat by exporting float32_nan and
float32_infinity. For consistency, I have also moved all the
target-dependent definitions of floatXX_default_nan to softfloat.h (ie
the 16, 64, x80 and 128 bits versions in addition to the 32 bits
ones).

Patch #2 uses these newly exported values and uses the vrecpe
algorithm described in the ARM ARM.

Christophe Lyon (2):
  softfloat: export float32_nan and float32_infinity.
  target-arm: fix support for vrecpe.

 fpu/softfloat-specialize.h |   68 -----------------------------------
 fpu/softfloat.h            |   71 +++++++++++++++++++++++++++++++++++++
 target-arm/helper.c        |   84 +++++++++++++++++++++++++++++++++++++------
 3 files changed, 143 insertions(+), 80 deletions(-)

-- 
1.7.2.3

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Qemu-devel] [PATCH 1/2] softfloat: export float32_nan and float32_infinity.
  2011-02-16 14:51 [Qemu-devel] [PATCH v2 0/2] ARM: fix Neon vrecpe instruction christophe.lyon
@ 2011-02-16 14:51 ` christophe.lyon
  2011-02-16 16:54   ` Peter Maydell
  2011-02-16 14:51 ` [Qemu-devel] [PATCH 2/2] target-arm: fix support for vrecpe christophe.lyon
  1 sibling, 1 reply; 6+ messages in thread
From: christophe.lyon @ 2011-02-16 14:51 UTC (permalink / raw)
  To: qemu-devel

From: Christophe Lyon <christophe.lyon@st.com>

These two special values are needed to implement some helper
functions, which return these values in some cases.

This patch also moves the definitions of default_nan for 16, 64, x80
and 128 bits floats for consistency with float32.

Signed-off-by: Christophe Lyon <christophe.lyon@st.com>
---
 fpu/softfloat-specialize.h |   68 ------------------------------------------
 fpu/softfloat.h            |   71 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 71 insertions(+), 68 deletions(-)

diff --git a/fpu/softfloat-specialize.h b/fpu/softfloat-specialize.h
index 2d025bf..adc5ada 100644
--- a/fpu/softfloat-specialize.h
+++ b/fpu/softfloat-specialize.h
@@ -30,12 +30,6 @@ these four paragraphs for those parts of this code that are retained.
 
 =============================================================================*/
 
-#if defined(TARGET_MIPS) || defined(TARGET_SH4)
-#define SNAN_BIT_IS_ONE		1
-#else
-#define SNAN_BIT_IS_ONE		0
-#endif
-
 /*----------------------------------------------------------------------------
 | Raises the exceptions specified by `flags'.  Floating-point traps can be
 | defined here if desired.  It is currently not possible for such a trap
@@ -57,17 +51,6 @@ typedef struct {
 } commonNaNT;
 
 /*----------------------------------------------------------------------------
-| The pattern for a default generated half-precision NaN.
-*----------------------------------------------------------------------------*/
-#if defined(TARGET_ARM)
-#define float16_default_nan make_float16(0x7E00)
-#elif SNAN_BIT_IS_ONE
-#define float16_default_nan make_float16(0x7DFF)
-#else
-#define float16_default_nan make_float16(0xFE00)
-#endif
-
-/*----------------------------------------------------------------------------
 | Returns 1 if the half-precision floating-point value `a' is a quiet
 | NaN; otherwise returns 0.
 *----------------------------------------------------------------------------*/
@@ -158,19 +141,6 @@ static float16 commonNaNToFloat16(commonNaNT a STATUS_PARAM)
 }
 
 /*----------------------------------------------------------------------------
-| The pattern for a default generated single-precision NaN.
-*----------------------------------------------------------------------------*/
-#if defined(TARGET_SPARC)
-#define float32_default_nan make_float32(0x7FFFFFFF)
-#elif defined(TARGET_PPC) || defined(TARGET_ARM) || defined(TARGET_ALPHA)
-#define float32_default_nan make_float32(0x7FC00000)
-#elif SNAN_BIT_IS_ONE
-#define float32_default_nan make_float32(0x7FBFFFFF)
-#else
-#define float32_default_nan make_float32(0xFFC00000)
-#endif
-
-/*----------------------------------------------------------------------------
 | Returns 1 if the single-precision floating-point value `a' is a quiet
 | NaN; otherwise returns 0.
 *----------------------------------------------------------------------------*/
@@ -413,19 +383,6 @@ static float32 propagateFloat32NaN( float32 a, float32 b STATUS_PARAM)
 }
 
 /*----------------------------------------------------------------------------
-| The pattern for a default generated double-precision NaN.
-*----------------------------------------------------------------------------*/
-#if defined(TARGET_SPARC)
-#define float64_default_nan make_float64(LIT64( 0x7FFFFFFFFFFFFFFF ))
-#elif defined(TARGET_PPC) || defined(TARGET_ARM) || defined(TARGET_ALPHA)
-#define float64_default_nan make_float64(LIT64( 0x7FF8000000000000 ))
-#elif SNAN_BIT_IS_ONE
-#define float64_default_nan make_float64(LIT64( 0x7FF7FFFFFFFFFFFF ))
-#else
-#define float64_default_nan make_float64(LIT64( 0xFFF8000000000000 ))
-#endif
-
-/*----------------------------------------------------------------------------
 | Returns 1 if the double-precision floating-point value `a' is a quiet
 | NaN; otherwise returns 0.
 *----------------------------------------------------------------------------*/
@@ -564,19 +521,6 @@ static float64 propagateFloat64NaN( float64 a, float64 b STATUS_PARAM)
 #ifdef FLOATX80
 
 /*----------------------------------------------------------------------------
-| The pattern for a default generated extended double-precision NaN.  The
-| `high' and `low' values hold the most- and least-significant bits,
-| respectively.
-*----------------------------------------------------------------------------*/
-#if SNAN_BIT_IS_ONE
-#define floatx80_default_nan_high 0x7FFF
-#define floatx80_default_nan_low  LIT64( 0xBFFFFFFFFFFFFFFF )
-#else
-#define floatx80_default_nan_high 0xFFFF
-#define floatx80_default_nan_low  LIT64( 0xC000000000000000 )
-#endif
-
-/*----------------------------------------------------------------------------
 | Returns 1 if the extended double-precision floating-point value `a' is a
 | quiet NaN; otherwise returns 0. This slightly differs from the same
 | function for other types as floatx80 has an explicit bit.
@@ -728,18 +672,6 @@ static floatx80 propagateFloatx80NaN( floatx80 a, floatx80 b STATUS_PARAM)
 #ifdef FLOAT128
 
 /*----------------------------------------------------------------------------
-| The pattern for a default generated quadruple-precision NaN.  The `high' and
-| `low' values hold the most- and least-significant bits, respectively.
-*----------------------------------------------------------------------------*/
-#if SNAN_BIT_IS_ONE
-#define float128_default_nan_high LIT64( 0x7FFF7FFFFFFFFFFF )
-#define float128_default_nan_low  LIT64( 0xFFFFFFFFFFFFFFFF )
-#else
-#define float128_default_nan_high LIT64( 0xFFFF800000000000 )
-#define float128_default_nan_low  LIT64( 0x0000000000000000 )
-#endif
-
-/*----------------------------------------------------------------------------
 | Returns 1 if the quadruple-precision floating-point value `a' is a quiet
 | NaN; otherwise returns 0.
 *----------------------------------------------------------------------------*/
diff --git a/fpu/softfloat.h b/fpu/softfloat.h
index e57ee1e..7bb77d5 100644
--- a/fpu/softfloat.h
+++ b/fpu/softfloat.h
@@ -77,6 +77,12 @@ typedef int64_t sbits64;
 #define LIT64( a ) a##LL
 #define INLINE static inline
 
+#if defined(TARGET_MIPS) || defined(TARGET_SH4)
+#define SNAN_BIT_IS_ONE		1
+#else
+#define SNAN_BIT_IS_ONE		0
+#endif
+
 /*----------------------------------------------------------------------------
 | The macro `FLOATX80' must be defined to enable the extended double-precision
 | floating-point format `floatx80'.  If this macro is not defined, the
@@ -278,6 +284,17 @@ int float16_is_signaling_nan( float16 );
 float16 float16_maybe_silence_nan( float16 );
 
 /*----------------------------------------------------------------------------
+| The pattern for a default generated half-precision NaN.
+*----------------------------------------------------------------------------*/
+#if defined(TARGET_ARM)
+#define float16_default_nan make_float16(0x7E00)
+#elif SNAN_BIT_IS_ONE
+#define float16_default_nan make_float16(0x7DFF)
+#else
+#define float16_default_nan make_float16(0xFE00)
+#endif
+
+/*----------------------------------------------------------------------------
 | Software IEC/IEEE single-precision conversion routines.
 *----------------------------------------------------------------------------*/
 int float32_to_int16_round_to_zero( float32 STATUS_PARAM );
@@ -365,6 +382,22 @@ INLINE int float32_is_zero_or_denormal(float32 a)
 #define float32_zero make_float32(0)
 #define float32_one make_float32(0x3f800000)
 #define float32_ln2 make_float32(0x3f317218)
+#define float32_infinity make_float32(0x7f800000)
+#define float32_nan float32_default_nan
+
+
+/*----------------------------------------------------------------------------
+| The pattern for a default generated single-precision NaN.
+*----------------------------------------------------------------------------*/
+#if defined(TARGET_SPARC)
+#define float32_default_nan make_float32(0x7FFFFFFF)
+#elif defined(TARGET_PPC) || defined(TARGET_ARM) || defined(TARGET_ALPHA)
+#define float32_default_nan make_float32(0x7FC00000)
+#elif SNAN_BIT_IS_ONE
+#define float32_default_nan make_float32(0x7FBFFFFF)
+#else
+#define float32_default_nan make_float32(0xFFC00000)
+#endif
 
 /*----------------------------------------------------------------------------
 | Software IEC/IEEE double-precision conversion routines.
@@ -452,6 +485,19 @@ INLINE int float64_is_any_nan(float64 a)
 #define float64_one make_float64(0x3ff0000000000000LL)
 #define float64_ln2 make_float64(0x3fe62e42fefa39efLL)
 
+/*----------------------------------------------------------------------------
+| The pattern for a default generated double-precision NaN.
+*----------------------------------------------------------------------------*/
+#if defined(TARGET_SPARC)
+#define float64_default_nan make_float64(LIT64( 0x7FFFFFFFFFFFFFFF ))
+#elif defined(TARGET_PPC) || defined(TARGET_ARM) || defined(TARGET_ALPHA)
+#define float64_default_nan make_float64(LIT64( 0x7FF8000000000000 ))
+#elif SNAN_BIT_IS_ONE
+#define float64_default_nan make_float64(LIT64( 0x7FF7FFFFFFFFFFFF ))
+#else
+#define float64_default_nan make_float64(LIT64( 0xFFF8000000000000 ))
+#endif
+
 #ifdef FLOATX80
 
 /*----------------------------------------------------------------------------
@@ -520,6 +566,19 @@ INLINE int floatx80_is_any_nan(floatx80 a)
     return ((a.high & 0x7fff) == 0x7fff) && (a.low<<1);
 }
 
+/*----------------------------------------------------------------------------
+| The pattern for a default generated extended double-precision NaN.  The
+| `high' and `low' values hold the most- and least-significant bits,
+| respectively.
+*----------------------------------------------------------------------------*/
+#if SNAN_BIT_IS_ONE
+#define floatx80_default_nan_high 0x7FFF
+#define floatx80_default_nan_low  LIT64( 0xBFFFFFFFFFFFFFFF )
+#else
+#define floatx80_default_nan_high 0xFFFF
+#define floatx80_default_nan_low  LIT64( 0xC000000000000000 )
+#endif
+
 #endif
 
 #ifdef FLOAT128
@@ -593,6 +652,18 @@ INLINE int float128_is_any_nan(float128 a)
         ((a.low != 0) || ((a.high & 0xffffffffffffLL) != 0));
 }
 
+/*----------------------------------------------------------------------------
+| The pattern for a default generated quadruple-precision NaN.  The `high' and
+| `low' values hold the most- and least-significant bits, respectively.
+*----------------------------------------------------------------------------*/
+#if SNAN_BIT_IS_ONE
+#define float128_default_nan_high LIT64( 0x7FFF7FFFFFFFFFFF )
+#define float128_default_nan_low  LIT64( 0xFFFFFFFFFFFFFFFF )
+#else
+#define float128_default_nan_high LIT64( 0xFFFF800000000000 )
+#define float128_default_nan_low  LIT64( 0x0000000000000000 )
+#endif
+
 #endif
 
 #else /* CONFIG_SOFTFLOAT */
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [Qemu-devel] [PATCH 2/2] target-arm: fix support for vrecpe.
  2011-02-16 14:51 [Qemu-devel] [PATCH v2 0/2] ARM: fix Neon vrecpe instruction christophe.lyon
  2011-02-16 14:51 ` [Qemu-devel] [PATCH 1/2] softfloat: export float32_nan and float32_infinity christophe.lyon
@ 2011-02-16 14:51 ` christophe.lyon
  2011-02-16 16:47   ` Peter Maydell
  1 sibling, 1 reply; 6+ messages in thread
From: christophe.lyon @ 2011-02-16 14:51 UTC (permalink / raw)
  To: qemu-devel

From: Christophe Lyon <christophe.lyon@st.com>

Now use the same algorithm as described in the ARM ARM.

Signed-off-by: Christophe Lyon <christophe.lyon@st.com>
---
 target-arm/helper.c |   84 +++++++++++++++++++++++++++++++++++++++++++-------
 1 files changed, 72 insertions(+), 12 deletions(-)

diff --git a/target-arm/helper.c b/target-arm/helper.c
index 7f63a28..a17df42 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -2687,13 +2687,68 @@ float32 HELPER(rsqrts_f32)(float32 a, float32 b, CPUState *env)
 
 /* NEON helpers.  */
 
-/* TODO: The architecture specifies the value that the estimate functions
-   should return.  We return the exact reciprocal/root instead.  */
+/* The algorithm that must be used to calculate the estimate
+ * is specified by the ARM ARM.
+ */
+static float64 recip_estimate(float64 a, CPUState *env)
+{
+    float_status *s = &env->vfp.standard_fp_status;
+    float64 one = int64_to_float64(1, s);
+    /* q = (int)(a * 512.0) */
+    float64 x512 = int64_to_float64(512, s);
+    float64 q = float64_mul(x512, a, s);
+    int64_t q_int = float64_to_int64_round_to_zero(q, s);
+
+    /* r = 1.0 / (((double)q + 0.5) / 512.0) */
+    q = int64_to_float64(q_int, s);
+    float64 half = float64_div(one, int64_to_float64(2, s), s);
+    q = float64_add(q, half, s);
+    q = float64_div(q, x512, s);
+    q = float64_div(one, q, s);
+
+    /* s = (int)(256.0 * r + 0.5) */
+    float64 x256 = int64_to_float64(256, s);
+    q = float64_mul(q, x256, s);
+    q = float64_add(q, half, s);
+    q_int = float64_to_int64_round_to_zero(q, s);
+
+    /* return (double)s / 256.0 */
+    return float64_div(int64_to_float64(q_int, s), x256, s);
+}
+
 float32 HELPER(recpe_f32)(float32 a, CPUState *env)
 {
-    float_status *s = &env->vfp.fp_status;
-    float32 one = int32_to_float32(1, s);
-    return float32_div(one, a, s);
+    float_status *s = &env->vfp.standard_fp_status;
+    float64 f64;
+    uint32_t val32 = float32_val(a);
+
+    int result_exp;
+    int a_exp = (val32  & 0x7F800000) >> 23;
+    int sign = val32 & 0x80000000;
+
+    if (float32_is_any_nan(a)) {
+        return float32_maybe_silence_nan(a);
+    } else if (float32_is_infinity(a)) {
+        return float32_zero;
+    } else if (float32_is_zero(a)) {
+        float_raise(float_flag_divbyzero, s);
+        return float32_infinity;
+    } else if (a_exp >= 253) {
+        float_raise(float_flag_underflow, s);
+        return float32_zero;
+    }
+
+    f64 = make_float64((0x3FEULL << 52)
+                       | ((int64_t)(val32 & 0x7FFFFF) << 29));
+
+    result_exp = 253 - a_exp;
+
+    f64 = recip_estimate(f64, env);
+
+    val32 = sign
+        | ((result_exp & 0xFF) << 23)
+        | ((float64_val(f64) >> 29) & 0x7FFFFF);
+    return make_float32(val32);
 }
 
 float32 HELPER(rsqrte_f32)(float32 a, CPUState *env)
@@ -2705,13 +2760,18 @@ float32 HELPER(rsqrte_f32)(float32 a, CPUState *env)
 
 uint32_t HELPER(recpe_u32)(uint32_t a, CPUState *env)
 {
-    float_status *s = &env->vfp.fp_status;
-    float32 tmp;
-    tmp = int32_to_float32(a, s);
-    tmp = float32_scalbn(tmp, -32, s);
-    tmp = helper_recpe_f32(tmp, env);
-    tmp = float32_scalbn(tmp, 31, s);
-    return float32_to_int32(tmp, s);
+    float64 f64;
+
+    if ((a & 0x80000000) == 0) {
+        return 0xFFFFFFFF;
+    }
+
+    f64 = make_float64((0x3FEULL << 52)
+                       | ((int64_t)(a & 0x7FFFFFFF) << 21));
+
+    f64 = recip_estimate (f64, env);
+
+    return 0x80000000 | ((float64_val(f64) >> 21) & 0x7FFFFFFF);
 }
 
 uint32_t HELPER(rsqrte_u32)(uint32_t a, CPUState *env)
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [Qemu-devel] [PATCH 2/2] target-arm: fix support for vrecpe.
  2011-02-16 14:51 ` [Qemu-devel] [PATCH 2/2] target-arm: fix support for vrecpe christophe.lyon
@ 2011-02-16 16:47   ` Peter Maydell
  0 siblings, 0 replies; 6+ messages in thread
From: Peter Maydell @ 2011-02-16 16:47 UTC (permalink / raw)
  To: christophe.lyon; +Cc: qemu-devel

On 16 February 2011 14:51,  <christophe.lyon@st.com> wrote:

Your corner case handling isn't quite right, I'm afraid.

>  float32 HELPER(recpe_f32)(float32 a, CPUState *env)
>  {
> -    float_status *s = &env->vfp.fp_status;
> -    float32 one = int32_to_float32(1, s);
> -    return float32_div(one, a, s);
> +    float_status *s = &env->vfp.standard_fp_status;
> +    float64 f64;
> +    uint32_t val32 = float32_val(a);
> +
> +    int result_exp;
> +    int a_exp = (val32  & 0x7F800000) >> 23;
> +    int sign = val32 & 0x80000000;
> +
> +    if (float32_is_any_nan(a)) {
> +        return float32_maybe_silence_nan(a);

This won't give the right answer for NaNs: we should be
returning the default NaN (since this is a Neon op
and uses the standard FPSCR value), but
maybe_silence_nan() will return the input NaN
with the QNaN bit set.

> +    } else if (float32_is_infinity(a)) {
> +        return float32_zero;

This will return +0 for an input of -inf, when
it should be -0.

> +    } else if (float32_is_zero(a)) {
> +        float_raise(float_flag_divbyzero, s);
> +        return float32_infinity;

Return value for -0 should be -inf, not +inf.
Also you want the float32_is_zero_or_denormal()
function, since we know that denormals must be
flushed to zero (standard FPSCR again).

> +    } else if (a_exp >= 253) {
> +        float_raise(float_flag_underflow, s);
> +        return float32_zero;

The ARM ARM says "underflows to zero of correct sign",
not always +0.

-- PMM

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] softfloat: export float32_nan and float32_infinity.
  2011-02-16 14:51 ` [Qemu-devel] [PATCH 1/2] softfloat: export float32_nan and float32_infinity christophe.lyon
@ 2011-02-16 16:54   ` Peter Maydell
  2011-02-16 16:58     ` Christophe Lyon
  0 siblings, 1 reply; 6+ messages in thread
From: Peter Maydell @ 2011-02-16 16:54 UTC (permalink / raw)
  To: christophe.lyon; +Cc: qemu-devel

On 16 February 2011 14:51,  <christophe.lyon@st.com> wrote:
> From: Christophe Lyon <christophe.lyon@st.com>
>
> These two special values are needed to implement some helper
> functions, which return these values in some cases.
>
> This patch also moves the definitions of default_nan for 16, 64, x80
> and 128 bits floats for consistency with float32.

Your other patch only uses float32_infinity, not float32_nan
or float32_default_nan, which renders a lot of this patch moot
at the moment.

> +#define float32_nan float32_default_nan

If we do need to expose NaN, we should just have callers
use float32_default_nan, there's no need for this extra
#define.

-- PMM

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] softfloat: export float32_nan and float32_infinity.
  2011-02-16 16:54   ` Peter Maydell
@ 2011-02-16 16:58     ` Christophe Lyon
  0 siblings, 0 replies; 6+ messages in thread
From: Christophe Lyon @ 2011-02-16 16:58 UTC (permalink / raw)
  To: Peter Maydell; +Cc: qemu-devel@nongnu.org

On 16.02.2011 17:54, Peter Maydell wrote:
> On 16 February 2011 14:51,  <christophe.lyon@st.com> wrote:
>> From: Christophe Lyon <christophe.lyon@st.com>
>>
>> These two special values are needed to implement some helper
>> functions, which return these values in some cases.
>>
>> This patch also moves the definitions of default_nan for 16, 64, x80
>> and 128 bits floats for consistency with float32.
> 
> Your other patch only uses float32_infinity, not float32_nan
> or float32_default_nan, which renders a lot of this patch moot
> at the moment.

Indeed, I have another patch for vrsqrte which does need float32_nan.

>> +#define float32_nan float32_default_nan
> 
> If we do need to expose NaN, we should just have callers
> use float32_default_nan, there's no need for this extra
> #define.
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2011-02-16 16:58 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-02-16 14:51 [Qemu-devel] [PATCH v2 0/2] ARM: fix Neon vrecpe instruction christophe.lyon
2011-02-16 14:51 ` [Qemu-devel] [PATCH 1/2] softfloat: export float32_nan and float32_infinity christophe.lyon
2011-02-16 16:54   ` Peter Maydell
2011-02-16 16:58     ` Christophe Lyon
2011-02-16 14:51 ` [Qemu-devel] [PATCH 2/2] target-arm: fix support for vrecpe christophe.lyon
2011-02-16 16:47   ` Peter Maydell

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).