[Qemu-devel] [PATCH 00/10] [PULL] ARM Neon fixes

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

* [Qemu-devel] [PATCH 00/10] [PULL] ARM Neon fixes
@ 2011-04-01 14:30 Peter Maydell
  2011-04-01 14:30 ` [Qemu-devel] [PATCH 01/10] target-arm: Make Neon helper routines use correct FP status Peter Maydell
                   ` (10 more replies)
  0 siblings, 11 replies; 23+ messages in thread
From: Peter Maydell @ 2011-04-01 14:30 UTC (permalink / raw)
  To: Anthony Liguori, qemu-devel; +Cc: Aurelien Jarno

This is a pull request for a set of patches fixing various minor Neon
problems for ARM targets, which I sent to the list a couple of weeks ago.
A few of them got reviewed-by Nathan, one had some minor discussion which
didn't amount to a request for any change, the rest had no comments.

The softfloat patch includes a trivial change since the original
posted version: the MINMAX macro now has a "uint ## s ## _t" rather than
a "bits ## s" since the bits32/bits64 types were removed in commit bb98fe4.

The following changes since commit a5086f95421e43c7b9e1b28a111aae0be4848117:

  lm32: use lookup table for opcodes (2011-03-31 08:54:05 +0200)

are available in the git repository at:
  git://git.linaro.org/people/pmaydell/qemu-arm.git for-upstream

Peter Maydell (10):
      target-arm: Make Neon helper routines use correct FP status
      target-arm/neon_helper.c: Use make_float32/float32_val macros
      target-arm: Return right result for Neon comparison with NaNs
      target-arm: Fix VCLE.F32 #0, VCLT.F32 #0 NaN handling
      target-arm: Correct ABD's handling of negative zeroes
      softfloat: Add float*_min() and float*_max() functions
      target-arm: Use new softfloat min/max functions for VMAX, VMIN
      target-arm: Fix VLD of single element to all lanes
      target-arm: Don't leak TCG temp for UNDEFs in Neon load/store space
      target-arm/helper.c: For float-int conversion helpers pass ints as ints

 fpu/softfloat.c          |   49 +++++++++++++++
 fpu/softfloat.h          |    4 +
 target-arm/helper.c      |  155 ++++++++++++++++-----------------------------
 target-arm/helpers.h     |   82 ++++++++++++------------
 target-arm/neon_helper.c |  103 ++++++++++++------------------
 target-arm/translate.c   |  152 +++++++++++++++++++++++++++++----------------
 6 files changed, 289 insertions(+), 256 deletions(-)

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Qemu-devel] [PATCH 01/10] target-arm: Make Neon helper routines use correct FP status
  2011-04-01 14:30 [Qemu-devel] [PATCH 00/10] [PULL] ARM Neon fixes Peter Maydell
@ 2011-04-01 14:30 ` Peter Maydell
  2011-04-01 18:29   ` Blue Swirl
  2011-04-01 14:30 ` [Qemu-devel] [PATCH 02/10] target-arm/neon_helper.c: Use make_float32/float32_val macros Peter Maydell
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 23+ messages in thread
From: Peter Maydell @ 2011-04-01 14:30 UTC (permalink / raw)
  To: Anthony Liguori, qemu-devel; +Cc: Aurelien Jarno

Make the Neon helper routines use the correct FP status from
the CPUEnv rather than using a dummy static one. This means
they will correctly handle denormals and NaNs and will set
FPSCR exception bits properly.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target-arm/helpers.h     |   22 +++++++++++-----------
 target-arm/neon_helper.c |   21 ++++++++++-----------
 target-arm/translate.c   |   42 ++++++++++++++++++++++--------------------
 3 files changed, 43 insertions(+), 42 deletions(-)

diff --git a/target-arm/helpers.h b/target-arm/helpers.h
index bd6977c..e2260b6 100644
--- a/target-arm/helpers.h
+++ b/target-arm/helpers.h
@@ -350,17 +350,17 @@ DEF_HELPER_2(neon_qneg_s8, i32, env, i32)
 DEF_HELPER_2(neon_qneg_s16, i32, env, i32)
 DEF_HELPER_2(neon_qneg_s32, i32, env, i32)
 
-DEF_HELPER_2(neon_min_f32, i32, i32, i32)
-DEF_HELPER_2(neon_max_f32, i32, i32, i32)
-DEF_HELPER_2(neon_abd_f32, i32, i32, i32)
-DEF_HELPER_2(neon_add_f32, i32, i32, i32)
-DEF_HELPER_2(neon_sub_f32, i32, i32, i32)
-DEF_HELPER_2(neon_mul_f32, i32, i32, i32)
-DEF_HELPER_2(neon_ceq_f32, i32, i32, i32)
-DEF_HELPER_2(neon_cge_f32, i32, i32, i32)
-DEF_HELPER_2(neon_cgt_f32, i32, i32, i32)
-DEF_HELPER_2(neon_acge_f32, i32, i32, i32)
-DEF_HELPER_2(neon_acgt_f32, i32, i32, i32)
+DEF_HELPER_3(neon_min_f32, i32, env, i32, i32)
+DEF_HELPER_3(neon_max_f32, i32, env, i32, i32)
+DEF_HELPER_3(neon_abd_f32, i32, env, i32, i32)
+DEF_HELPER_3(neon_add_f32, i32, env, i32, i32)
+DEF_HELPER_3(neon_sub_f32, i32, env, i32, i32)
+DEF_HELPER_3(neon_mul_f32, i32, env, i32, i32)
+DEF_HELPER_3(neon_ceq_f32, i32, env, i32, i32)
+DEF_HELPER_3(neon_cge_f32, i32, env, i32, i32)
+DEF_HELPER_3(neon_cgt_f32, i32, env, i32, i32)
+DEF_HELPER_3(neon_acge_f32, i32, env, i32, i32)
+DEF_HELPER_3(neon_acgt_f32, i32, env, i32, i32)
 
 /* iwmmxt_helper.c */
 DEF_HELPER_2(iwmmxt_maddsq, i64, i64, i64)
diff --git a/target-arm/neon_helper.c b/target-arm/neon_helper.c
index 002a9c1..97bc1e6 100644
--- a/target-arm/neon_helper.c
+++ b/target-arm/neon_helper.c
@@ -18,8 +18,7 @@
 
 #define SET_QC() env->vfp.xregs[ARM_VFP_FPSCR] = CPSR_Q
 
-static float_status neon_float_status;
-#define NFS &neon_float_status
+#define NFS (&env->vfp.standard_fp_status)
 
 /* Helper routines to perform bitwise copies between float and int.  */
 static inline float32 vfp_itos(uint32_t i)
@@ -1794,21 +1793,21 @@ uint32_t HELPER(neon_qneg_s32)(CPUState *env, uint32_t x)
 }
 
 /* NEON Float helpers.  */
-uint32_t HELPER(neon_min_f32)(uint32_t a, uint32_t b)
+uint32_t HELPER(neon_min_f32)(CPUState *env, uint32_t a, uint32_t b)
 {
     float32 f0 = vfp_itos(a);
     float32 f1 = vfp_itos(b);
     return (float32_compare_quiet(f0, f1, NFS) == -1) ? a : b;
 }
 
-uint32_t HELPER(neon_max_f32)(uint32_t a, uint32_t b)
+uint32_t HELPER(neon_max_f32)(CPUState *env, uint32_t a, uint32_t b)
 {
     float32 f0 = vfp_itos(a);
     float32 f1 = vfp_itos(b);
     return (float32_compare_quiet(f0, f1, NFS) == 1) ? a : b;
 }
 
-uint32_t HELPER(neon_abd_f32)(uint32_t a, uint32_t b)
+uint32_t HELPER(neon_abd_f32)(CPUState *env, uint32_t a, uint32_t b)
 {
     float32 f0 = vfp_itos(a);
     float32 f1 = vfp_itos(b);
@@ -1817,24 +1816,24 @@ uint32_t HELPER(neon_abd_f32)(uint32_t a, uint32_t b)
                     : float32_sub(f1, f0, NFS));
 }
 
-uint32_t HELPER(neon_add_f32)(uint32_t a, uint32_t b)
+uint32_t HELPER(neon_add_f32)(CPUState *env, uint32_t a, uint32_t b)
 {
     return vfp_stoi(float32_add(vfp_itos(a), vfp_itos(b), NFS));
 }
 
-uint32_t HELPER(neon_sub_f32)(uint32_t a, uint32_t b)
+uint32_t HELPER(neon_sub_f32)(CPUState *env, uint32_t a, uint32_t b)
 {
     return vfp_stoi(float32_sub(vfp_itos(a), vfp_itos(b), NFS));
 }
 
-uint32_t HELPER(neon_mul_f32)(uint32_t a, uint32_t b)
+uint32_t HELPER(neon_mul_f32)(CPUState *env, uint32_t a, uint32_t b)
 {
     return vfp_stoi(float32_mul(vfp_itos(a), vfp_itos(b), NFS));
 }
 
 /* Floating point comparisons produce an integer result.  */
 #define NEON_VOP_FCMP(name, cmp) \
-uint32_t HELPER(neon_##name)(uint32_t a, uint32_t b) \
+uint32_t HELPER(neon_##name)(CPUState *env, uint32_t a, uint32_t b) \
 { \
     if (float32_compare_quiet(vfp_itos(a), vfp_itos(b), NFS) cmp 0) \
         return ~0; \
@@ -1846,14 +1845,14 @@ NEON_VOP_FCMP(ceq_f32, ==)
 NEON_VOP_FCMP(cge_f32, >=)
 NEON_VOP_FCMP(cgt_f32, >)
 
-uint32_t HELPER(neon_acge_f32)(uint32_t a, uint32_t b)
+uint32_t HELPER(neon_acge_f32)(CPUState *env, uint32_t a, uint32_t b)
 {
     float32 f0 = float32_abs(vfp_itos(a));
     float32 f1 = float32_abs(vfp_itos(b));
     return (float32_compare_quiet(f0, f1,NFS) >= 0) ? ~0 : 0;
 }
 
-uint32_t HELPER(neon_acgt_f32)(uint32_t a, uint32_t b)
+uint32_t HELPER(neon_acgt_f32)(CPUState *env, uint32_t a, uint32_t b)
 {
     float32 f0 = float32_abs(vfp_itos(a));
     float32 f1 = float32_abs(vfp_itos(b));
diff --git a/target-arm/translate.c b/target-arm/translate.c
index f69912f..cf2440e 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -4519,56 +4519,56 @@ static int disas_neon_data_insn(CPUState * env, DisasContext *s, uint32_t insn)
         case 26: /* Floating point arithnetic.  */
             switch ((u << 2) | size) {
             case 0: /* VADD */
-                gen_helper_neon_add_f32(tmp, tmp, tmp2);
+                gen_helper_neon_add_f32(tmp, cpu_env, tmp, tmp2);
                 break;
             case 2: /* VSUB */
-                gen_helper_neon_sub_f32(tmp, tmp, tmp2);
+                gen_helper_neon_sub_f32(tmp, cpu_env, tmp, tmp2);
                 break;
             case 4: /* VPADD */
-                gen_helper_neon_add_f32(tmp, tmp, tmp2);
+                gen_helper_neon_add_f32(tmp, cpu_env, tmp, tmp2);
                 break;
             case 6: /* VABD */
-                gen_helper_neon_abd_f32(tmp, tmp, tmp2);
+                gen_helper_neon_abd_f32(tmp, cpu_env, tmp, tmp2);
                 break;
             default:
                 return 1;
             }
             break;
         case 27: /* Float multiply.  */
-            gen_helper_neon_mul_f32(tmp, tmp, tmp2);
+            gen_helper_neon_mul_f32(tmp, cpu_env, tmp, tmp2);
             if (!u) {
                 tcg_temp_free_i32(tmp2);
                 tmp2 = neon_load_reg(rd, pass);
                 if (size == 0) {
-                    gen_helper_neon_add_f32(tmp, tmp, tmp2);
+                    gen_helper_neon_add_f32(tmp, cpu_env, tmp, tmp2);
                 } else {
-                    gen_helper_neon_sub_f32(tmp, tmp2, tmp);
+                    gen_helper_neon_sub_f32(tmp, cpu_env, tmp2, tmp);
                 }
             }
             break;
         case 28: /* Float compare.  */
             if (!u) {
-                gen_helper_neon_ceq_f32(tmp, tmp, tmp2);
+                gen_helper_neon_ceq_f32(tmp, cpu_env, tmp, tmp2);
             } else {
                 if (size == 0)
-                    gen_helper_neon_cge_f32(tmp, tmp, tmp2);
+                    gen_helper_neon_cge_f32(tmp, cpu_env, tmp, tmp2);
                 else
-                    gen_helper_neon_cgt_f32(tmp, tmp, tmp2);
+                    gen_helper_neon_cgt_f32(tmp, cpu_env, tmp, tmp2);
             }
             break;
         case 29: /* Float compare absolute.  */
             if (!u)
                 return 1;
             if (size == 0)
-                gen_helper_neon_acge_f32(tmp, tmp, tmp2);
+                gen_helper_neon_acge_f32(tmp, cpu_env, tmp, tmp2);
             else
-                gen_helper_neon_acgt_f32(tmp, tmp, tmp2);
+                gen_helper_neon_acgt_f32(tmp, cpu_env, tmp, tmp2);
             break;
         case 30: /* Float min/max.  */
             if (size == 0)
-                gen_helper_neon_max_f32(tmp, tmp, tmp2);
+                gen_helper_neon_max_f32(tmp, cpu_env, tmp, tmp2);
             else
-                gen_helper_neon_min_f32(tmp, tmp, tmp2);
+                gen_helper_neon_min_f32(tmp, cpu_env, tmp, tmp2);
             break;
         case 31:
             if (size == 0)
@@ -5232,7 +5232,7 @@ static int disas_neon_data_insn(CPUState * env, DisasContext *s, uint32_t insn)
                                 gen_helper_neon_qrdmulh_s32(tmp, cpu_env, tmp, tmp2);
                             }
                         } else if (op & 1) {
-                            gen_helper_neon_mul_f32(tmp, tmp, tmp2);
+                            gen_helper_neon_mul_f32(tmp,  cpu_env, tmp, tmp2);
                         } else {
                             switch (size) {
                             case 0: gen_helper_neon_mul_u8(tmp, tmp, tmp2); break;
@@ -5250,13 +5250,15 @@ static int disas_neon_data_insn(CPUState * env, DisasContext *s, uint32_t insn)
                                 gen_neon_add(size, tmp, tmp2);
                                 break;
                             case 1:
-                                gen_helper_neon_add_f32(tmp, tmp, tmp2);
+                                gen_helper_neon_add_f32(tmp, cpu_env,
+                                                        tmp, tmp2);
                                 break;
                             case 4:
                                 gen_neon_rsb(size, tmp, tmp2);
                                 break;
                             case 5:
-                                gen_helper_neon_sub_f32(tmp, tmp2, tmp);
+                                gen_helper_neon_sub_f32(tmp,  cpu_env,
+                                                        tmp2, tmp);
                                 break;
                             default:
                                 abort();
@@ -5641,21 +5643,21 @@ static int disas_neon_data_insn(CPUState * env, DisasContext *s, uint32_t insn)
                             break;
                         case 24: case 27: /* Float VCGT #0, Float VCLE #0 */
                             tmp2 = tcg_const_i32(0);
-                            gen_helper_neon_cgt_f32(tmp, tmp, tmp2);
+                            gen_helper_neon_cgt_f32(tmp,  cpu_env, tmp, tmp2);
                             tcg_temp_free(tmp2);
                             if (op == 27)
                                 tcg_gen_not_i32(tmp, tmp);
                             break;
                         case 25: case 28: /* Float VCGE #0, Float VCLT #0 */
                             tmp2 = tcg_const_i32(0);
-                            gen_helper_neon_cge_f32(tmp, tmp, tmp2);
+                            gen_helper_neon_cge_f32(tmp,  cpu_env, tmp, tmp2);
                             tcg_temp_free(tmp2);
                             if (op == 28)
                                 tcg_gen_not_i32(tmp, tmp);
                             break;
                         case 26: /* Float VCEQ #0 */
                             tmp2 = tcg_const_i32(0);
-                            gen_helper_neon_ceq_f32(tmp, tmp, tmp2);
+                            gen_helper_neon_ceq_f32(tmp,  cpu_env, tmp, tmp2);
                             tcg_temp_free(tmp2);
                             break;
                         case 30: /* Float VABS */
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [Qemu-devel] [PATCH 02/10] target-arm/neon_helper.c: Use make_float32/float32_val macros
  2011-04-01 14:30 [Qemu-devel] [PATCH 00/10] [PULL] ARM Neon fixes Peter Maydell
  2011-04-01 14:30 ` [Qemu-devel] [PATCH 01/10] target-arm: Make Neon helper routines use correct FP status Peter Maydell
@ 2011-04-01 14:30 ` Peter Maydell
  2011-04-01 14:30 ` [Qemu-devel] [PATCH 03/10] target-arm: Return right result for Neon comparison with NaNs Peter Maydell
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 23+ messages in thread
From: Peter Maydell @ 2011-04-01 14:30 UTC (permalink / raw)
  To: Anthony Liguori, qemu-devel; +Cc: Aurelien Jarno

Use the softfloat make_float32 and float32_val macros to convert between
softfloat's float32 type and raw uint32_t types, rather than private
conversion functions.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Nathan Froyd <froydnj@codesourcery.com>
---
 target-arm/neon_helper.c |   56 ++++++++++++++--------------------------------
 1 files changed, 17 insertions(+), 39 deletions(-)

diff --git a/target-arm/neon_helper.c b/target-arm/neon_helper.c
index 97bc1e6..4039036 100644
--- a/target-arm/neon_helper.c
+++ b/target-arm/neon_helper.c
@@ -20,29 +20,6 @@
 
 #define NFS (&env->vfp.standard_fp_status)
 
-/* Helper routines to perform bitwise copies between float and int.  */
-static inline float32 vfp_itos(uint32_t i)
-{
-    union {
-        uint32_t i;
-        float32 s;
-    } v;
-
-    v.i = i;
-    return v.s;
-}
-
-static inline uint32_t vfp_stoi(float32 s)
-{
-    union {
-        uint32_t i;
-        float32 s;
-    } v;
-
-    v.s = s;
-    return v.i;
-}
-
 #define NEON_TYPE1(name, type) \
 typedef struct \
 { \
@@ -1795,50 +1772,51 @@ uint32_t HELPER(neon_qneg_s32)(CPUState *env, uint32_t x)
 /* NEON Float helpers.  */
 uint32_t HELPER(neon_min_f32)(CPUState *env, uint32_t a, uint32_t b)
 {
-    float32 f0 = vfp_itos(a);
-    float32 f1 = vfp_itos(b);
+    float32 f0 = make_float32(a);
+    float32 f1 = make_float32(b);
     return (float32_compare_quiet(f0, f1, NFS) == -1) ? a : b;
 }
 
 uint32_t HELPER(neon_max_f32)(CPUState *env, uint32_t a, uint32_t b)
 {
-    float32 f0 = vfp_itos(a);
-    float32 f1 = vfp_itos(b);
+    float32 f0 = make_float32(a);
+    float32 f1 = make_float32(b);
     return (float32_compare_quiet(f0, f1, NFS) == 1) ? a : b;
 }
 
 uint32_t HELPER(neon_abd_f32)(CPUState *env, uint32_t a, uint32_t b)
 {
-    float32 f0 = vfp_itos(a);
-    float32 f1 = vfp_itos(b);
-    return vfp_stoi((float32_compare_quiet(f0, f1, NFS) == 1)
+    float32 f0 = make_float32(a);
+    float32 f1 = make_float32(b);
+    return float32_val((float32_compare_quiet(f0, f1, NFS) == 1)
                     ? float32_sub(f0, f1, NFS)
                     : float32_sub(f1, f0, NFS));
 }
 
 uint32_t HELPER(neon_add_f32)(CPUState *env, uint32_t a, uint32_t b)
 {
-    return vfp_stoi(float32_add(vfp_itos(a), vfp_itos(b), NFS));
+    return float32_val(float32_add(make_float32(a), make_float32(b), NFS));
 }
 
 uint32_t HELPER(neon_sub_f32)(CPUState *env, uint32_t a, uint32_t b)
 {
-    return vfp_stoi(float32_sub(vfp_itos(a), vfp_itos(b), NFS));
+    return float32_val(float32_sub(make_float32(a), make_float32(b), NFS));
 }
 
 uint32_t HELPER(neon_mul_f32)(CPUState *env, uint32_t a, uint32_t b)
 {
-    return vfp_stoi(float32_mul(vfp_itos(a), vfp_itos(b), NFS));
+    return float32_val(float32_mul(make_float32(a), make_float32(b), NFS));
 }
 
 /* Floating point comparisons produce an integer result.  */
 #define NEON_VOP_FCMP(name, cmp) \
 uint32_t HELPER(neon_##name)(CPUState *env, uint32_t a, uint32_t b) \
 { \
-    if (float32_compare_quiet(vfp_itos(a), vfp_itos(b), NFS) cmp 0) \
+    if (float32_compare_quiet(make_float32(a), make_float32(b), NFS) cmp 0) { \
         return ~0; \
-    else \
+    } else { \
         return 0; \
+    } \
 }
 
 NEON_VOP_FCMP(ceq_f32, ==)
@@ -1847,15 +1825,15 @@ NEON_VOP_FCMP(cgt_f32, >)
 
 uint32_t HELPER(neon_acge_f32)(CPUState *env, uint32_t a, uint32_t b)
 {
-    float32 f0 = float32_abs(vfp_itos(a));
-    float32 f1 = float32_abs(vfp_itos(b));
+    float32 f0 = float32_abs(make_float32(a));
+    float32 f1 = float32_abs(make_float32(b));
     return (float32_compare_quiet(f0, f1,NFS) >= 0) ? ~0 : 0;
 }
 
 uint32_t HELPER(neon_acgt_f32)(CPUState *env, uint32_t a, uint32_t b)
 {
-    float32 f0 = float32_abs(vfp_itos(a));
-    float32 f1 = float32_abs(vfp_itos(b));
+    float32 f0 = float32_abs(make_float32(a));
+    float32 f1 = float32_abs(make_float32(b));
     return (float32_compare_quiet(f0, f1, NFS) > 0) ? ~0 : 0;
 }
 
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [Qemu-devel] [PATCH 03/10] target-arm: Return right result for Neon comparison with NaNs
  2011-04-01 14:30 [Qemu-devel] [PATCH 00/10] [PULL] ARM Neon fixes Peter Maydell
  2011-04-01 14:30 ` [Qemu-devel] [PATCH 01/10] target-arm: Make Neon helper routines use correct FP status Peter Maydell
  2011-04-01 14:30 ` [Qemu-devel] [PATCH 02/10] target-arm/neon_helper.c: Use make_float32/float32_val macros Peter Maydell
@ 2011-04-01 14:30 ` Peter Maydell
  2011-04-01 14:30 ` [Qemu-devel] [PATCH 04/10] target-arm: Fix VCLE.F32 #0, VCLT.F32 #0 NaN handling Peter Maydell
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 23+ messages in thread
From: Peter Maydell @ 2011-04-01 14:30 UTC (permalink / raw)
  To: Anthony Liguori, qemu-devel; +Cc: Aurelien Jarno

Fix the helper functions implementing the Neon floating point comparison
ops (VCGE, VCGT, VCEQ, VACGT, VACGE) to return the right answer when
one of the values being compared is a NaN.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target-arm/neon_helper.c |   28 ++++++++++++++++++----------
 1 files changed, 18 insertions(+), 10 deletions(-)

diff --git a/target-arm/neon_helper.c b/target-arm/neon_helper.c
index 4039036..8eb4cef 100644
--- a/target-arm/neon_helper.c
+++ b/target-arm/neon_helper.c
@@ -1809,32 +1809,40 @@ uint32_t HELPER(neon_mul_f32)(CPUState *env, uint32_t a, uint32_t b)
 }
 
 /* Floating point comparisons produce an integer result.  */
-#define NEON_VOP_FCMP(name, cmp) \
+#define NEON_VOP_FCMP(name, ok) \
 uint32_t HELPER(neon_##name)(CPUState *env, uint32_t a, uint32_t b) \
 { \
-    if (float32_compare_quiet(make_float32(a), make_float32(b), NFS) cmp 0) { \
-        return ~0; \
-    } else { \
-        return 0; \
+    switch (float32_compare_quiet(make_float32(a), make_float32(b), NFS)) { \
+    ok return ~0; \
+    default: return 0; \
     } \
 }
 
-NEON_VOP_FCMP(ceq_f32, ==)
-NEON_VOP_FCMP(cge_f32, >=)
-NEON_VOP_FCMP(cgt_f32, >)
+NEON_VOP_FCMP(ceq_f32, case float_relation_equal:)
+NEON_VOP_FCMP(cge_f32, case float_relation_equal: case float_relation_greater:)
+NEON_VOP_FCMP(cgt_f32, case float_relation_greater:)
 
 uint32_t HELPER(neon_acge_f32)(CPUState *env, uint32_t a, uint32_t b)
 {
     float32 f0 = float32_abs(make_float32(a));
     float32 f1 = float32_abs(make_float32(b));
-    return (float32_compare_quiet(f0, f1,NFS) >= 0) ? ~0 : 0;
+    switch (float32_compare_quiet(f0, f1, NFS)) {
+    case float_relation_equal:
+    case float_relation_greater:
+        return ~0;
+    default:
+        return 0;
+    }
 }
 
 uint32_t HELPER(neon_acgt_f32)(CPUState *env, uint32_t a, uint32_t b)
 {
     float32 f0 = float32_abs(make_float32(a));
     float32 f1 = float32_abs(make_float32(b));
-    return (float32_compare_quiet(f0, f1, NFS) > 0) ? ~0 : 0;
+    if (float32_compare_quiet(f0, f1, NFS) == float_relation_greater) {
+        return ~0;
+    }
+    return 0;
 }
 
 #define ELEM(V, N, SIZE) (((V) >> ((N) * (SIZE))) & ((1ull << (SIZE)) - 1))
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [Qemu-devel] [PATCH 04/10] target-arm: Fix VCLE.F32 #0, VCLT.F32 #0 NaN handling
  2011-04-01 14:30 [Qemu-devel] [PATCH 00/10] [PULL] ARM Neon fixes Peter Maydell
                   ` (2 preceding siblings ...)
  2011-04-01 14:30 ` [Qemu-devel] [PATCH 03/10] target-arm: Return right result for Neon comparison with NaNs Peter Maydell
@ 2011-04-01 14:30 ` Peter Maydell
  2011-04-01 14:30 ` [Qemu-devel] [PATCH 05/10] target-arm: Correct ABD's handling of negative zeroes Peter Maydell
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 23+ messages in thread
From: Peter Maydell @ 2011-04-01 14:30 UTC (permalink / raw)
  To: Anthony Liguori, qemu-devel; +Cc: Aurelien Jarno

Implementing the floating-point versions of VCLE #0 and VCLT #0 by
doing a GT comparison and inverting the result gives the wrong
result if the input is a NaN. Implement as a GT comparison with the
operands swapped instead.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target-arm/translate.c |   18 ++++++++++++------
 1 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/target-arm/translate.c b/target-arm/translate.c
index cf2440e..6ce8b7a 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -5641,25 +5641,31 @@ static int disas_neon_data_insn(CPUState * env, DisasContext *s, uint32_t insn)
                             gen_neon_rsb(size, tmp, tmp2);
                             tcg_temp_free(tmp2);
                             break;
-                        case 24: case 27: /* Float VCGT #0, Float VCLE #0 */
+                        case 24: /* Float VCGT #0 */
                             tmp2 = tcg_const_i32(0);
                             gen_helper_neon_cgt_f32(tmp,  cpu_env, tmp, tmp2);
                             tcg_temp_free(tmp2);
-                            if (op == 27)
-                                tcg_gen_not_i32(tmp, tmp);
                             break;
-                        case 25: case 28: /* Float VCGE #0, Float VCLT #0 */
+                        case 25: /* Float VCGE #0 */
                             tmp2 = tcg_const_i32(0);
                             gen_helper_neon_cge_f32(tmp,  cpu_env, tmp, tmp2);
                             tcg_temp_free(tmp2);
-                            if (op == 28)
-                                tcg_gen_not_i32(tmp, tmp);
                             break;
                         case 26: /* Float VCEQ #0 */
                             tmp2 = tcg_const_i32(0);
                             gen_helper_neon_ceq_f32(tmp,  cpu_env, tmp, tmp2);
                             tcg_temp_free(tmp2);
                             break;
+                        case 27: /* Float VCLE #0 */
+                            tmp2 = tcg_const_i32(0);
+                            gen_helper_neon_cge_f32(tmp,  cpu_env, tmp2, tmp);
+                            tcg_temp_free(tmp2);
+                            break;
+                        case 28: /* Float VCLT #0 */
+                            tmp2 = tcg_const_i32(0);
+                            gen_helper_neon_cgt_f32(tmp,  cpu_env, tmp2, tmp);
+                            tcg_temp_free(tmp2);
+                            break;
                         case 30: /* Float VABS */
                             gen_vfp_abs(0);
                             break;
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [Qemu-devel] [PATCH 05/10] target-arm: Correct ABD's handling of negative zeroes
  2011-04-01 14:30 [Qemu-devel] [PATCH 00/10] [PULL] ARM Neon fixes Peter Maydell
                   ` (3 preceding siblings ...)
  2011-04-01 14:30 ` [Qemu-devel] [PATCH 04/10] target-arm: Fix VCLE.F32 #0, VCLT.F32 #0 NaN handling Peter Maydell
@ 2011-04-01 14:30 ` Peter Maydell
  2011-04-01 14:30 ` [Qemu-devel] [PATCH 06/10] softfloat: Add float*_min() and float*_max() functions Peter Maydell
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 23+ messages in thread
From: Peter Maydell @ 2011-04-01 14:30 UTC (permalink / raw)
  To: Anthony Liguori, qemu-devel; +Cc: Aurelien Jarno

Implement ABD by taking the absolute value of the difference
of the operands (as the ARM ARM specifies) rather than by
flipping the order of the operands to the subtract based
on the results of a comparison. The latter approch gives
the wrong answers for some edge cases like negative zero.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target-arm/neon_helper.c |    4 +---
 1 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/target-arm/neon_helper.c b/target-arm/neon_helper.c
index 8eb4cef..1905545 100644
--- a/target-arm/neon_helper.c
+++ b/target-arm/neon_helper.c
@@ -1788,9 +1788,7 @@ uint32_t HELPER(neon_abd_f32)(CPUState *env, uint32_t a, uint32_t b)
 {
     float32 f0 = make_float32(a);
     float32 f1 = make_float32(b);
-    return float32_val((float32_compare_quiet(f0, f1, NFS) == 1)
-                    ? float32_sub(f0, f1, NFS)
-                    : float32_sub(f1, f0, NFS));
+    return float32_val(float32_abs(float32_sub(f0, f1, NFS)));
 }
 
 uint32_t HELPER(neon_add_f32)(CPUState *env, uint32_t a, uint32_t b)
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [Qemu-devel] [PATCH 06/10] softfloat: Add float*_min() and float*_max() functions
  2011-04-01 14:30 [Qemu-devel] [PATCH 00/10] [PULL] ARM Neon fixes Peter Maydell
                   ` (4 preceding siblings ...)
  2011-04-01 14:30 ` [Qemu-devel] [PATCH 05/10] target-arm: Correct ABD's handling of negative zeroes Peter Maydell
@ 2011-04-01 14:30 ` Peter Maydell
  2011-04-01 14:30 ` [Qemu-devel] [PATCH 07/10] target-arm: Use new softfloat min/max functions for VMAX, VMIN Peter Maydell
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 23+ messages in thread
From: Peter Maydell @ 2011-04-01 14:30 UTC (permalink / raw)
  To: Anthony Liguori, qemu-devel; +Cc: Aurelien Jarno

Add min and max operations to softfloat. This allows us to implement
propagation of NaNs and handling of negative zero correctly (unlike
the approach of having target helper routines return one of the operands
based on the result of a comparison op).

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 fpu/softfloat.c |   49 +++++++++++++++++++++++++++++++++++++++++++++++++
 fpu/softfloat.h |    4 ++++
 2 files changed, 53 insertions(+), 0 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 08e4ae0..03fb948 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -6057,6 +6057,55 @@ int float128_compare_quiet( float128 a, float128 b STATUS_PARAM )
     return float128_compare_internal(a, b, 1 STATUS_VAR);
 }
 
+/* min() and max() functions. These can't be implemented as
+ * 'compare and pick one input' because that would mishandle
+ * NaNs and +0 vs -0.
+ */
+#define MINMAX(s, nan_exp)                                              \
+INLINE float ## s float ## s ## _minmax(float ## s a, float ## s b,     \
+                                        int ismin STATUS_PARAM )        \
+{                                                                       \
+    flag aSign, bSign;                                                  \
+    uint ## s ## _t av, bv;                                             \
+    a = float ## s ## _squash_input_denormal(a STATUS_VAR);             \
+    b = float ## s ## _squash_input_denormal(b STATUS_VAR);             \
+    if (float ## s ## _is_any_nan(a) ||                                 \
+        float ## s ## _is_any_nan(b)) {                                 \
+        return propagateFloat ## s ## NaN(a, b STATUS_VAR);             \
+    }                                                                   \
+    aSign = extractFloat ## s ## Sign(a);                               \
+    bSign = extractFloat ## s ## Sign(b);                               \
+    av = float ## s ## _val(a);                                         \
+    bv = float ## s ## _val(b);                                         \
+    if (aSign != bSign) {                                               \
+        if (ismin) {                                                    \
+            return aSign ? a : b;                                       \
+        } else {                                                        \
+            return aSign ? b : a;                                       \
+        }                                                               \
+    } else {                                                            \
+        if (ismin) {                                                    \
+            return (aSign ^ (av < bv)) ? a : b;                         \
+        } else {                                                        \
+            return (aSign ^ (av < bv)) ? b : a;                         \
+        }                                                               \
+    }                                                                   \
+}                                                                       \
+                                                                        \
+float ## s float ## s ## _min(float ## s a, float ## s b STATUS_PARAM)  \
+{                                                                       \
+    return float ## s ## _minmax(a, b, 1 STATUS_VAR);                   \
+}                                                                       \
+                                                                        \
+float ## s float ## s ## _max(float ## s a, float ## s b STATUS_PARAM)  \
+{                                                                       \
+    return float ## s ## _minmax(a, b, 0 STATUS_VAR);                   \
+}
+
+MINMAX(32, 0xff)
+MINMAX(64, 0x7ff)
+
+
 /* Multiply A by 2 raised to the power N.  */
 float32 float32_scalbn( float32 a, int n STATUS_PARAM )
 {
diff --git a/fpu/softfloat.h b/fpu/softfloat.h
index 5d05fa5..90f4250 100644
--- a/fpu/softfloat.h
+++ b/fpu/softfloat.h
@@ -324,6 +324,8 @@ int float32_le_quiet( float32, float32 STATUS_PARAM );
 int float32_lt_quiet( float32, float32 STATUS_PARAM );
 int float32_compare( float32, float32 STATUS_PARAM );
 int float32_compare_quiet( float32, float32 STATUS_PARAM );
+float32 float32_min(float32, float32 STATUS_PARAM);
+float32 float32_max(float32, float32 STATUS_PARAM);
 int float32_is_quiet_nan( float32 );
 int float32_is_signaling_nan( float32 );
 float32 float32_maybe_silence_nan( float32 );
@@ -436,6 +438,8 @@ int float64_le_quiet( float64, float64 STATUS_PARAM );
 int float64_lt_quiet( float64, float64 STATUS_PARAM );
 int float64_compare( float64, float64 STATUS_PARAM );
 int float64_compare_quiet( float64, float64 STATUS_PARAM );
+float64 float64_min(float64, float64 STATUS_PARAM);
+float64 float64_max(float64, float64 STATUS_PARAM);
 int float64_is_quiet_nan( float64 a );
 int float64_is_signaling_nan( float64 );
 float64 float64_maybe_silence_nan( float64 );
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [Qemu-devel] [PATCH 07/10] target-arm: Use new softfloat min/max functions for VMAX, VMIN
  2011-04-01 14:30 [Qemu-devel] [PATCH 00/10] [PULL] ARM Neon fixes Peter Maydell
                   ` (5 preceding siblings ...)
  2011-04-01 14:30 ` [Qemu-devel] [PATCH 06/10] softfloat: Add float*_min() and float*_max() functions Peter Maydell
@ 2011-04-01 14:30 ` Peter Maydell
  2011-04-01 14:30 ` [Qemu-devel] [PATCH 08/10] target-arm: Fix VLD of single element to all lanes Peter Maydell
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 23+ messages in thread
From: Peter Maydell @ 2011-04-01 14:30 UTC (permalink / raw)
  To: Anthony Liguori, qemu-devel; +Cc: Aurelien Jarno

Use the new softfloat min/max functions to implement the Neon VMAX
and VMIN instructions. This allows us to get the right behaviour
for NaN and negative zero.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target-arm/neon_helper.c |    8 ++------
 1 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/target-arm/neon_helper.c b/target-arm/neon_helper.c
index 1905545..19784ab 100644
--- a/target-arm/neon_helper.c
+++ b/target-arm/neon_helper.c
@@ -1772,16 +1772,12 @@ uint32_t HELPER(neon_qneg_s32)(CPUState *env, uint32_t x)
 /* NEON Float helpers.  */
 uint32_t HELPER(neon_min_f32)(CPUState *env, uint32_t a, uint32_t b)
 {
-    float32 f0 = make_float32(a);
-    float32 f1 = make_float32(b);
-    return (float32_compare_quiet(f0, f1, NFS) == -1) ? a : b;
+    return float32_val(float32_min(make_float32(a), make_float32(b), NFS));
 }
 
 uint32_t HELPER(neon_max_f32)(CPUState *env, uint32_t a, uint32_t b)
 {
-    float32 f0 = make_float32(a);
-    float32 f1 = make_float32(b);
-    return (float32_compare_quiet(f0, f1, NFS) == 1) ? a : b;
+    return float32_val(float32_max(make_float32(a), make_float32(b), NFS));
 }
 
 uint32_t HELPER(neon_abd_f32)(CPUState *env, uint32_t a, uint32_t b)
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [Qemu-devel] [PATCH 08/10] target-arm: Fix VLD of single element to all lanes
  2011-04-01 14:30 [Qemu-devel] [PATCH 00/10] [PULL] ARM Neon fixes Peter Maydell
                   ` (6 preceding siblings ...)
  2011-04-01 14:30 ` [Qemu-devel] [PATCH 07/10] target-arm: Use new softfloat min/max functions for VMAX, VMIN Peter Maydell
@ 2011-04-01 14:30 ` Peter Maydell
  2011-04-01 14:30 ` [Qemu-devel] [PATCH 09/10] target-arm: Don't leak TCG temp for UNDEFs in Neon load/store space Peter Maydell
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 23+ messages in thread
From: Peter Maydell @ 2011-04-01 14:30 UTC (permalink / raw)
  To: Anthony Liguori, qemu-devel; +Cc: Aurelien Jarno

Fix several bugs in VLD of single element to all lanes:

The "single element to all lanes" form of VLD1 differs from those for
VLD2, VLD3 and VLD4 in that bit 5 indicates whether the loaded element
should be written to one or two Dregs (rather than being a register
stride). Handle this by special-casing VLD1 rather than trying to
have one loop which deals with both VLD1 and 2/3/4.

Handle VLD4.32 with 16 byte alignment specified, rather than UNDEFfing.

UNDEF for the invalid size and alignment combinations.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target-arm/translate.c |   84 +++++++++++++++++++++++++++++++++--------------
 1 files changed, 59 insertions(+), 25 deletions(-)

diff --git a/target-arm/translate.c b/target-arm/translate.c
index 6ce8b7a..e79ea03 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -2648,6 +2648,28 @@ static void gen_neon_dup_high16(TCGv var)
     tcg_temp_free_i32(tmp);
 }
 
+static TCGv gen_load_and_replicate(DisasContext *s, TCGv addr, int size)
+{
+    /* Load a single Neon element and replicate into a 32 bit TCG reg */
+    TCGv tmp;
+    switch (size) {
+    case 0:
+        tmp = gen_ld8u(addr, IS_USER(s));
+        gen_neon_dup_u8(tmp, 0);
+        break;
+    case 1:
+        tmp = gen_ld16u(addr, IS_USER(s));
+        gen_neon_dup_low16(tmp);
+        break;
+    case 2:
+        tmp = gen_ld32(addr, IS_USER(s));
+        break;
+    default: /* Avoid compiler warnings.  */
+        abort();
+    }
+    return tmp;
+}
+
 /* Disassemble a VFP instruction.  Returns nonzero if an error occured
    (ie. an undefined instruction).  */
 static int disas_vfp_insn(CPUState * env, DisasContext *s, uint32_t insn)
@@ -3890,36 +3912,48 @@ static int disas_neon_ls_insn(CPUState * env, DisasContext *s, uint32_t insn)
         size = (insn >> 10) & 3;
         if (size == 3) {
             /* Load single element to all lanes.  */
-            if (!load)
+            int a = (insn >> 4) & 1;
+            if (!load) {
                 return 1;
+            }
             size = (insn >> 6) & 3;
             nregs = ((insn >> 8) & 3) + 1;
-            stride = (insn & (1 << 5)) ? 2 : 1;
-            load_reg_var(s, addr, rn);
-            for (reg = 0; reg < nregs; reg++) {
-                switch (size) {
-                case 0:
-                    tmp = gen_ld8u(addr, IS_USER(s));
-                    gen_neon_dup_u8(tmp, 0);
-                    break;
-                case 1:
-                    tmp = gen_ld16u(addr, IS_USER(s));
-                    gen_neon_dup_low16(tmp);
-                    break;
-                case 2:
-                    tmp = gen_ld32(addr, IS_USER(s));
-                    break;
-                case 3:
+
+            if (size == 3) {
+                if (nregs != 4 || a == 0) {
                     return 1;
-                default: /* Avoid compiler warnings.  */
-                    abort();
                 }
-                tcg_gen_addi_i32(addr, addr, 1 << size);
-                tmp2 = tcg_temp_new_i32();
-                tcg_gen_mov_i32(tmp2, tmp);
-                neon_store_reg(rd, 0, tmp2);
-                neon_store_reg(rd, 1, tmp);
-                rd += stride;
+                /* For VLD4 size==3 a == 1 means 32 bits at 16 byte alignment */
+                size = 2;
+            }
+            if (nregs == 1 && a == 1 && size == 0) {
+                return 1;
+            }
+            if (nregs == 3 && a == 1) {
+                return 1;
+            }
+            load_reg_var(s, addr, rn);
+            if (nregs == 1) {
+                /* VLD1 to all lanes: bit 5 indicates how many Dregs to write */
+                tmp = gen_load_and_replicate(s, addr, size);
+                tcg_gen_st_i32(tmp, cpu_env, neon_reg_offset(rd, 0));
+                tcg_gen_st_i32(tmp, cpu_env, neon_reg_offset(rd, 1));
+                if (insn & (1 << 5)) {
+                    tcg_gen_st_i32(tmp, cpu_env, neon_reg_offset(rd + 1, 0));
+                    tcg_gen_st_i32(tmp, cpu_env, neon_reg_offset(rd + 1, 1));
+                }
+                tcg_temp_free_i32(tmp);
+            } else {
+                /* VLD2/3/4 to all lanes: bit 5 indicates register stride */
+                stride = (insn & (1 << 5)) ? 2 : 1;
+                for (reg = 0; reg < nregs; reg++) {
+                    tmp = gen_load_and_replicate(s, addr, size);
+                    tcg_gen_st_i32(tmp, cpu_env, neon_reg_offset(rd, 0));
+                    tcg_gen_st_i32(tmp, cpu_env, neon_reg_offset(rd, 1));
+                    tcg_temp_free_i32(tmp);
+                    tcg_gen_addi_i32(addr, addr, 1 << size);
+                    rd += stride;
+                }
             }
             stride = (1 << size) * nregs;
         } else {
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [Qemu-devel] [PATCH 09/10] target-arm: Don't leak TCG temp for UNDEFs in Neon load/store space
  2011-04-01 14:30 [Qemu-devel] [PATCH 00/10] [PULL] ARM Neon fixes Peter Maydell
                   ` (7 preceding siblings ...)
  2011-04-01 14:30 ` [Qemu-devel] [PATCH 08/10] target-arm: Fix VLD of single element to all lanes Peter Maydell
@ 2011-04-01 14:30 ` Peter Maydell
  2011-04-01 14:30 ` [Qemu-devel] [PATCH 10/10] target-arm/helper.c: For float-int conversion helpers pass ints as ints Peter Maydell
  2011-04-03 16:03 ` [Qemu-devel] [PATCH 00/10] [PULL] ARM Neon fixes Aurelien Jarno
  10 siblings, 0 replies; 23+ messages in thread
From: Peter Maydell @ 2011-04-01 14:30 UTC (permalink / raw)
  To: Anthony Liguori, qemu-devel; +Cc: Aurelien Jarno

Move the allocation and freeing of the TCG temp used for the address for
Neon load/store instructions so that we don't allocate the temporary
until we've done enough decoding to know that the instruction is not
an UNDEF pattern; this avoids leaking the TCG temp in these cases.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target-arm/translate.c |    8 ++++++--
 1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/target-arm/translate.c b/target-arm/translate.c
index e79ea03..527e260 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -3810,7 +3810,6 @@ static int disas_neon_ls_insn(CPUState * env, DisasContext *s, uint32_t insn)
     rn = (insn >> 16) & 0xf;
     rm = insn & 0xf;
     load = (insn & (1 << 21)) != 0;
-    addr = tcg_temp_new_i32();
     if ((insn & (1 << 23)) == 0) {
         /* Load store all elements.  */
         op = (insn >> 8) & 0xf;
@@ -3822,6 +3821,7 @@ static int disas_neon_ls_insn(CPUState * env, DisasContext *s, uint32_t insn)
         spacing = neon_ls_element_type[op].spacing;
         if (size == 3 && (interleave | spacing) != 1)
             return 1;
+        addr = tcg_temp_new_i32();
         load_reg_var(s, addr, rn);
         stride = (1 << size) * interleave;
         for (reg = 0; reg < nregs; reg++) {
@@ -3907,6 +3907,7 @@ static int disas_neon_ls_insn(CPUState * env, DisasContext *s, uint32_t insn)
             }
             rd += spacing;
         }
+        tcg_temp_free_i32(addr);
         stride = nregs * 8;
     } else {
         size = (insn >> 10) & 3;
@@ -3932,6 +3933,7 @@ static int disas_neon_ls_insn(CPUState * env, DisasContext *s, uint32_t insn)
             if (nregs == 3 && a == 1) {
                 return 1;
             }
+            addr = tcg_temp_new_i32();
             load_reg_var(s, addr, rn);
             if (nregs == 1) {
                 /* VLD1 to all lanes: bit 5 indicates how many Dregs to write */
@@ -3955,6 +3957,7 @@ static int disas_neon_ls_insn(CPUState * env, DisasContext *s, uint32_t insn)
                     rd += stride;
                 }
             }
+            tcg_temp_free_i32(addr);
             stride = (1 << size) * nregs;
         } else {
             /* Single element.  */
@@ -3976,6 +3979,7 @@ static int disas_neon_ls_insn(CPUState * env, DisasContext *s, uint32_t insn)
                 abort();
             }
             nregs = ((insn >> 8) & 3) + 1;
+            addr = tcg_temp_new_i32();
             load_reg_var(s, addr, rn);
             for (reg = 0; reg < nregs; reg++) {
                 if (load) {
@@ -4017,10 +4021,10 @@ static int disas_neon_ls_insn(CPUState * env, DisasContext *s, uint32_t insn)
                 rd += stride;
                 tcg_gen_addi_i32(addr, addr, 1 << size);
             }
+            tcg_temp_free_i32(addr);
             stride = nregs * (1 << size);
         }
     }
-    tcg_temp_free_i32(addr);
     if (rm != 15) {
         TCGv base;
 
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [Qemu-devel] [PATCH 10/10] target-arm/helper.c: For float-int conversion helpers pass ints as ints
  2011-04-01 14:30 [Qemu-devel] [PATCH 00/10] [PULL] ARM Neon fixes Peter Maydell
                   ` (8 preceding siblings ...)
  2011-04-01 14:30 ` [Qemu-devel] [PATCH 09/10] target-arm: Don't leak TCG temp for UNDEFs in Neon load/store space Peter Maydell
@ 2011-04-01 14:30 ` Peter Maydell
  2011-04-01 18:25   ` Blue Swirl
  2011-04-03 16:03 ` [Qemu-devel] [PATCH 00/10] [PULL] ARM Neon fixes Aurelien Jarno
  10 siblings, 1 reply; 23+ messages in thread
From: Peter Maydell @ 2011-04-01 14:30 UTC (permalink / raw)
  To: Anthony Liguori, qemu-devel; +Cc: Aurelien Jarno

Correct the argument and return types for the float<->int conversion helper
functions so that integer arguments and return values are declared as
uint32_t/uint64_t, not float32/float64. This allows us to remove the
hand-rolled functions which were doing bitwise copies between the types
via unions.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Nathan Froyd <froydnj@codesourcery.com>
---
 target-arm/helper.c  |  155 ++++++++++++++++++--------------------------------
 target-arm/helpers.h |   60 ++++++++++----------
 2 files changed, 85 insertions(+), 130 deletions(-)

diff --git a/target-arm/helper.c b/target-arm/helper.c
index 78f3d39..6788a4c 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -2486,135 +2486,90 @@ DO_VFP_cmp(s, float32)
 DO_VFP_cmp(d, float64)
 #undef DO_VFP_cmp
 
-/* Helper routines to perform bitwise copies between float and int.  */
-static inline float32 vfp_itos(uint32_t i)
-{
-    union {
-        uint32_t i;
-        float32 s;
-    } v;
-
-    v.i = i;
-    return v.s;
-}
-
-static inline uint32_t vfp_stoi(float32 s)
-{
-    union {
-        uint32_t i;
-        float32 s;
-    } v;
-
-    v.s = s;
-    return v.i;
-}
-
-static inline float64 vfp_itod(uint64_t i)
-{
-    union {
-        uint64_t i;
-        float64 d;
-    } v;
-
-    v.i = i;
-    return v.d;
-}
-
-static inline uint64_t vfp_dtoi(float64 d)
-{
-    union {
-        uint64_t i;
-        float64 d;
-    } v;
-
-    v.d = d;
-    return v.i;
-}
-
 /* Integer to float conversion.  */
-float32 VFP_HELPER(uito, s)(float32 x, CPUState *env)
+float32 VFP_HELPER(uito, s)(uint32_t x, CPUState *env)
 {
-    return uint32_to_float32(vfp_stoi(x), &env->vfp.fp_status);
+    return uint32_to_float32(x, &env->vfp.fp_status);
 }
 
-float64 VFP_HELPER(uito, d)(float32 x, CPUState *env)
+float64 VFP_HELPER(uito, d)(uint32_t x, CPUState *env)
 {
-    return uint32_to_float64(vfp_stoi(x), &env->vfp.fp_status);
+    return uint32_to_float64(x, &env->vfp.fp_status);
 }
 
-float32 VFP_HELPER(sito, s)(float32 x, CPUState *env)
+float32 VFP_HELPER(sito, s)(uint32_t x, CPUState *env)
 {
-    return int32_to_float32(vfp_stoi(x), &env->vfp.fp_status);
+    return int32_to_float32(x, &env->vfp.fp_status);
 }
 
-float64 VFP_HELPER(sito, d)(float32 x, CPUState *env)
+float64 VFP_HELPER(sito, d)(uint32_t x, CPUState *env)
 {
-    return int32_to_float64(vfp_stoi(x), &env->vfp.fp_status);
+    return int32_to_float64(x, &env->vfp.fp_status);
 }
 
 /* Float to integer conversion.  */
-float32 VFP_HELPER(toui, s)(float32 x, CPUState *env)
+uint32_t VFP_HELPER(toui, s)(float32 x, CPUState *env)
 {
     if (float32_is_any_nan(x)) {
-        return float32_zero;
+        return 0;
     }
-    return vfp_itos(float32_to_uint32(x, &env->vfp.fp_status));
+    return float32_to_uint32(x, &env->vfp.fp_status);
 }
 
-float32 VFP_HELPER(toui, d)(float64 x, CPUState *env)
+uint32_t VFP_HELPER(toui, d)(float64 x, CPUState *env)
 {
     if (float64_is_any_nan(x)) {
-        return float32_zero;
+        return 0;
     }
-    return vfp_itos(float64_to_uint32(x, &env->vfp.fp_status));
+    return float64_to_uint32(x, &env->vfp.fp_status);
 }
 
-float32 VFP_HELPER(tosi, s)(float32 x, CPUState *env)
+uint32_t VFP_HELPER(tosi, s)(float32 x, CPUState *env)
 {
     if (float32_is_any_nan(x)) {
-        return float32_zero;
+        return 0;
     }
-    return vfp_itos(float32_to_int32(x, &env->vfp.fp_status));
+    return float32_to_int32(x, &env->vfp.fp_status);
 }
 
-float32 VFP_HELPER(tosi, d)(float64 x, CPUState *env)
+uint32_t VFP_HELPER(tosi, d)(float64 x, CPUState *env)
 {
     if (float64_is_any_nan(x)) {
-        return float32_zero;
+        return 0;
     }
-    return vfp_itos(float64_to_int32(x, &env->vfp.fp_status));
+    return float64_to_int32(x, &env->vfp.fp_status);
 }
 
-float32 VFP_HELPER(touiz, s)(float32 x, CPUState *env)
+uint32_t VFP_HELPER(touiz, s)(float32 x, CPUState *env)
 {
     if (float32_is_any_nan(x)) {
-        return float32_zero;
+        return 0;
     }
-    return vfp_itos(float32_to_uint32_round_to_zero(x, &env->vfp.fp_status));
+    return float32_to_uint32_round_to_zero(x, &env->vfp.fp_status);
 }
 
-float32 VFP_HELPER(touiz, d)(float64 x, CPUState *env)
+uint32_t VFP_HELPER(touiz, d)(float64 x, CPUState *env)
 {
     if (float64_is_any_nan(x)) {
-        return float32_zero;
+        return 0;
     }
-    return vfp_itos(float64_to_uint32_round_to_zero(x, &env->vfp.fp_status));
+    return float64_to_uint32_round_to_zero(x, &env->vfp.fp_status);
 }
 
-float32 VFP_HELPER(tosiz, s)(float32 x, CPUState *env)
+uint32_t VFP_HELPER(tosiz, s)(float32 x, CPUState *env)
 {
     if (float32_is_any_nan(x)) {
-        return float32_zero;
+        return 0;
     }
-    return vfp_itos(float32_to_int32_round_to_zero(x, &env->vfp.fp_status));
+    return float32_to_int32_round_to_zero(x, &env->vfp.fp_status);
 }
 
-float32 VFP_HELPER(tosiz, d)(float64 x, CPUState *env)
+uint32_t VFP_HELPER(tosiz, d)(float64 x, CPUState *env)
 {
     if (float64_is_any_nan(x)) {
-        return float32_zero;
+        return 0;
     }
-    return vfp_itos(float64_to_int32_round_to_zero(x, &env->vfp.fp_status));
+    return float64_to_int32_round_to_zero(x, &env->vfp.fp_status);
 }
 
 /* floating point conversion */
@@ -2637,33 +2592,33 @@ float32 VFP_HELPER(fcvts, d)(float64 x, CPUState *env)
 }
 
 /* VFP3 fixed point conversion.  */
-#define VFP_CONV_FIX(name, p, ftype, itype, sign) \
-ftype VFP_HELPER(name##to, p)(ftype x, uint32_t shift, CPUState *env) \
+#define VFP_CONV_FIX(name, p, fsz, itype, sign) \
+float##fsz VFP_HELPER(name##to, p)(uint##fsz##_t  x, uint32_t shift, \
+                                   CPUState *env) \
 { \
-    ftype tmp; \
-    tmp = sign##int32_to_##ftype ((itype##_t)vfp_##p##toi(x), \
-                                  &env->vfp.fp_status); \
-    return ftype##_scalbn(tmp, -(int)shift, &env->vfp.fp_status); \
+    float##fsz tmp; \
+    tmp = sign##int32_to_##float##fsz ((itype##_t)x, &env->vfp.fp_status); \
+    return float##fsz##_scalbn(tmp, -(int)shift, &env->vfp.fp_status); \
 } \
-ftype VFP_HELPER(to##name, p)(ftype x, uint32_t shift, CPUState *env) \
+uint##fsz##_t VFP_HELPER(to##name, p)(float##fsz x, uint32_t shift, \
+                                      CPUState *env) \
 { \
-    ftype tmp; \
-    if (ftype##_is_any_nan(x)) { \
-        return ftype##_zero; \
+    float##fsz tmp; \
+    if (float##fsz##_is_any_nan(x)) { \
+        return 0; \
     } \
-    tmp = ftype##_scalbn(x, shift, &env->vfp.fp_status); \
-    return vfp_ito##p(ftype##_to_##itype##_round_to_zero(tmp, \
-        &env->vfp.fp_status)); \
-}
-
-VFP_CONV_FIX(sh, d, float64, int16, )
-VFP_CONV_FIX(sl, d, float64, int32, )
-VFP_CONV_FIX(uh, d, float64, uint16, u)
-VFP_CONV_FIX(ul, d, float64, uint32, u)
-VFP_CONV_FIX(sh, s, float32, int16, )
-VFP_CONV_FIX(sl, s, float32, int32, )
-VFP_CONV_FIX(uh, s, float32, uint16, u)
-VFP_CONV_FIX(ul, s, float32, uint32, u)
+    tmp = float##fsz##_scalbn(x, shift, &env->vfp.fp_status); \
+    return float##fsz##_to_##itype##_round_to_zero(tmp, &env->vfp.fp_status); \
+}
+
+VFP_CONV_FIX(sh, d, 64, int16, )
+VFP_CONV_FIX(sl, d, 64, int32, )
+VFP_CONV_FIX(uh, d, 64, uint16, u)
+VFP_CONV_FIX(ul, d, 64, uint32, u)
+VFP_CONV_FIX(sh, s, 32, int16, )
+VFP_CONV_FIX(sl, s, 32, int32, )
+VFP_CONV_FIX(uh, s, 32, uint16, u)
+VFP_CONV_FIX(ul, s, 32, uint32, u)
 #undef VFP_CONV_FIX
 
 /* Half precision conversions.  */
diff --git a/target-arm/helpers.h b/target-arm/helpers.h
index e2260b6..35ca388 100644
--- a/target-arm/helpers.h
+++ b/target-arm/helpers.h
@@ -96,36 +96,36 @@ DEF_HELPER_3(vfp_cmped, void, f64, f64, env)
 DEF_HELPER_2(vfp_fcvtds, f64, f32, env)
 DEF_HELPER_2(vfp_fcvtsd, f32, f64, env)
 
-DEF_HELPER_2(vfp_uitos, f32, f32, env)
-DEF_HELPER_2(vfp_uitod, f64, f32, env)
-DEF_HELPER_2(vfp_sitos, f32, f32, env)
-DEF_HELPER_2(vfp_sitod, f64, f32, env)
-
-DEF_HELPER_2(vfp_touis, f32, f32, env)
-DEF_HELPER_2(vfp_touid, f32, f64, env)
-DEF_HELPER_2(vfp_touizs, f32, f32, env)
-DEF_HELPER_2(vfp_touizd, f32, f64, env)
-DEF_HELPER_2(vfp_tosis, f32, f32, env)
-DEF_HELPER_2(vfp_tosid, f32, f64, env)
-DEF_HELPER_2(vfp_tosizs, f32, f32, env)
-DEF_HELPER_2(vfp_tosizd, f32, f64, env)
-
-DEF_HELPER_3(vfp_toshs, f32, f32, i32, env)
-DEF_HELPER_3(vfp_tosls, f32, f32, i32, env)
-DEF_HELPER_3(vfp_touhs, f32, f32, i32, env)
-DEF_HELPER_3(vfp_touls, f32, f32, i32, env)
-DEF_HELPER_3(vfp_toshd, f64, f64, i32, env)
-DEF_HELPER_3(vfp_tosld, f64, f64, i32, env)
-DEF_HELPER_3(vfp_touhd, f64, f64, i32, env)
-DEF_HELPER_3(vfp_tould, f64, f64, i32, env)
-DEF_HELPER_3(vfp_shtos, f32, f32, i32, env)
-DEF_HELPER_3(vfp_sltos, f32, f32, i32, env)
-DEF_HELPER_3(vfp_uhtos, f32, f32, i32, env)
-DEF_HELPER_3(vfp_ultos, f32, f32, i32, env)
-DEF_HELPER_3(vfp_shtod, f64, f64, i32, env)
-DEF_HELPER_3(vfp_sltod, f64, f64, i32, env)
-DEF_HELPER_3(vfp_uhtod, f64, f64, i32, env)
-DEF_HELPER_3(vfp_ultod, f64, f64, i32, env)
+DEF_HELPER_2(vfp_uitos, f32, i32, env)
+DEF_HELPER_2(vfp_uitod, f64, i32, env)
+DEF_HELPER_2(vfp_sitos, f32, i32, env)
+DEF_HELPER_2(vfp_sitod, f64, i32, env)
+
+DEF_HELPER_2(vfp_touis, i32, f32, env)
+DEF_HELPER_2(vfp_touid, i32, f64, env)
+DEF_HELPER_2(vfp_touizs, i32, f32, env)
+DEF_HELPER_2(vfp_touizd, i32, f64, env)
+DEF_HELPER_2(vfp_tosis, i32, f32, env)
+DEF_HELPER_2(vfp_tosid, i32, f64, env)
+DEF_HELPER_2(vfp_tosizs, i32, f32, env)
+DEF_HELPER_2(vfp_tosizd, i32, f64, env)
+
+DEF_HELPER_3(vfp_toshs, i32, f32, i32, env)
+DEF_HELPER_3(vfp_tosls, i32, f32, i32, env)
+DEF_HELPER_3(vfp_touhs, i32, f32, i32, env)
+DEF_HELPER_3(vfp_touls, i32, f32, i32, env)
+DEF_HELPER_3(vfp_toshd, i64, f64, i32, env)
+DEF_HELPER_3(vfp_tosld, i64, f64, i32, env)
+DEF_HELPER_3(vfp_touhd, i64, f64, i32, env)
+DEF_HELPER_3(vfp_tould, i64, f64, i32, env)
+DEF_HELPER_3(vfp_shtos, f32, i32, i32, env)
+DEF_HELPER_3(vfp_sltos, f32, i32, i32, env)
+DEF_HELPER_3(vfp_uhtos, f32, i32, i32, env)
+DEF_HELPER_3(vfp_ultos, f32, i32, i32, env)
+DEF_HELPER_3(vfp_shtod, f64, i64, i32, env)
+DEF_HELPER_3(vfp_sltod, f64, i64, i32, env)
+DEF_HELPER_3(vfp_uhtod, f64, i64, i32, env)
+DEF_HELPER_3(vfp_ultod, f64, i64, i32, env)
 
 DEF_HELPER_2(vfp_fcvt_f16_to_f32, f32, i32, env)
 DEF_HELPER_2(vfp_fcvt_f32_to_f16, i32, f32, env)
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH 10/10] target-arm/helper.c: For float-int conversion helpers pass ints as ints
  2011-04-01 14:30 ` [Qemu-devel] [PATCH 10/10] target-arm/helper.c: For float-int conversion helpers pass ints as ints Peter Maydell
@ 2011-04-01 18:25   ` Blue Swirl
  0 siblings, 0 replies; 23+ messages in thread
From: Blue Swirl @ 2011-04-01 18:25 UTC (permalink / raw)
  To: Peter Maydell; +Cc: Aurelien Jarno, qemu-devel

On Fri, Apr 1, 2011 at 5:30 PM, Peter Maydell <peter.maydell@linaro.org> wrote:
> Correct the argument and return types for the float<->int conversion helper
> functions so that integer arguments and return values are declared as
> uint32_t/uint64_t, not float32/float64. This allows us to remove the
> hand-rolled functions which were doing bitwise copies between the types
> via unions.
>
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> Reviewed-by: Nathan Froyd <froydnj@codesourcery.com>
> ---
>  target-arm/helper.c  |  155 ++++++++++++++++++--------------------------------
>  target-arm/helpers.h |   60 ++++++++++----------
>  2 files changed, 85 insertions(+), 130 deletions(-)
>
> diff --git a/target-arm/helper.c b/target-arm/helper.c
> index 78f3d39..6788a4c 100644
> --- a/target-arm/helper.c
> +++ b/target-arm/helper.c
> @@ -2486,135 +2486,90 @@ DO_VFP_cmp(s, float32)
>  DO_VFP_cmp(d, float64)
>  #undef DO_VFP_cmp
>
> -/* Helper routines to perform bitwise copies between float and int.  */
> -static inline float32 vfp_itos(uint32_t i)
> -{
> -    union {
> -        uint32_t i;
> -        float32 s;
> -    } v;
> -
> -    v.i = i;
> -    return v.s;
> -}
> -
> -static inline uint32_t vfp_stoi(float32 s)
> -{
> -    union {
> -        uint32_t i;
> -        float32 s;
> -    } v;
> -
> -    v.s = s;
> -    return v.i;
> -}
> -
> -static inline float64 vfp_itod(uint64_t i)
> -{
> -    union {
> -        uint64_t i;
> -        float64 d;
> -    } v;
> -
> -    v.i = i;
> -    return v.d;
> -}
> -
> -static inline uint64_t vfp_dtoi(float64 d)
> -{
> -    union {
> -        uint64_t i;
> -        float64 d;
> -    } v;
> -
> -    v.d = d;
> -    return v.i;
> -}
> -
>  /* Integer to float conversion.  */
> -float32 VFP_HELPER(uito, s)(float32 x, CPUState *env)
> +float32 VFP_HELPER(uito, s)(uint32_t x, CPUState *env)

If you moved these functions to op_helper.c, passing env would not be
needed anymore.

Another possible optimization is that maybe the softfloat functions
could be used directly as helpers if type of fp_status could be
changed to something that can be passed in a register, like uint32_t.
This would be useful for most targets.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH 01/10] target-arm: Make Neon helper routines use correct FP status
  2011-04-01 14:30 ` [Qemu-devel] [PATCH 01/10] target-arm: Make Neon helper routines use correct FP status Peter Maydell
@ 2011-04-01 18:29   ` Blue Swirl
  2011-04-01 22:33     ` Peter Maydell
  0 siblings, 1 reply; 23+ messages in thread
From: Blue Swirl @ 2011-04-01 18:29 UTC (permalink / raw)
  To: Peter Maydell; +Cc: Aurelien Jarno, qemu-devel

On Fri, Apr 1, 2011 at 5:30 PM, Peter Maydell <peter.maydell@linaro.org> wrote:
> Make the Neon helper routines use the correct FP status from
> the CPUEnv rather than using a dummy static one. This means
> they will correctly handle denormals and NaNs and will set
> FPSCR exception bits properly.
>
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  target-arm/helpers.h     |   22 +++++++++++-----------
>  target-arm/neon_helper.c |   21 ++++++++++-----------
>  target-arm/translate.c   |   42 ++++++++++++++++++++++--------------------
>  3 files changed, 43 insertions(+), 42 deletions(-)
>
> diff --git a/target-arm/helpers.h b/target-arm/helpers.h
> index bd6977c..e2260b6 100644
> --- a/target-arm/helpers.h
> +++ b/target-arm/helpers.h
> @@ -350,17 +350,17 @@ DEF_HELPER_2(neon_qneg_s8, i32, env, i32)
>  DEF_HELPER_2(neon_qneg_s16, i32, env, i32)
>  DEF_HELPER_2(neon_qneg_s32, i32, env, i32)
>
> -DEF_HELPER_2(neon_min_f32, i32, i32, i32)
> -DEF_HELPER_2(neon_max_f32, i32, i32, i32)
> -DEF_HELPER_2(neon_abd_f32, i32, i32, i32)
> -DEF_HELPER_2(neon_add_f32, i32, i32, i32)
> -DEF_HELPER_2(neon_sub_f32, i32, i32, i32)
> -DEF_HELPER_2(neon_mul_f32, i32, i32, i32)
> -DEF_HELPER_2(neon_ceq_f32, i32, i32, i32)
> -DEF_HELPER_2(neon_cge_f32, i32, i32, i32)
> -DEF_HELPER_2(neon_cgt_f32, i32, i32, i32)
> -DEF_HELPER_2(neon_acge_f32, i32, i32, i32)
> -DEF_HELPER_2(neon_acgt_f32, i32, i32, i32)
> +DEF_HELPER_3(neon_min_f32, i32, env, i32, i32)
> +DEF_HELPER_3(neon_max_f32, i32, env, i32, i32)
> +DEF_HELPER_3(neon_abd_f32, i32, env, i32, i32)
> +DEF_HELPER_3(neon_add_f32, i32, env, i32, i32)
> +DEF_HELPER_3(neon_sub_f32, i32, env, i32, i32)
> +DEF_HELPER_3(neon_mul_f32, i32, env, i32, i32)
> +DEF_HELPER_3(neon_ceq_f32, i32, env, i32, i32)
> +DEF_HELPER_3(neon_cge_f32, i32, env, i32, i32)
> +DEF_HELPER_3(neon_cgt_f32, i32, env, i32, i32)
> +DEF_HELPER_3(neon_acge_f32, i32, env, i32, i32)
> +DEF_HELPER_3(neon_acgt_f32, i32, env, i32, i32)
>
>  /* iwmmxt_helper.c */
>  DEF_HELPER_2(iwmmxt_maddsq, i64, i64, i64)
> diff --git a/target-arm/neon_helper.c b/target-arm/neon_helper.c
> index 002a9c1..97bc1e6 100644
> --- a/target-arm/neon_helper.c
> +++ b/target-arm/neon_helper.c
> @@ -18,8 +18,7 @@
>
>  #define SET_QC() env->vfp.xregs[ARM_VFP_FPSCR] = CPSR_Q
>
> -static float_status neon_float_status;
> -#define NFS &neon_float_status
> +#define NFS (&env->vfp.standard_fp_status)
>
>  /* Helper routines to perform bitwise copies between float and int.  */
>  static inline float32 vfp_itos(uint32_t i)
> @@ -1794,21 +1793,21 @@ uint32_t HELPER(neon_qneg_s32)(CPUState *env, uint32_t x)
>  }
>
>  /* NEON Float helpers.  */
> -uint32_t HELPER(neon_min_f32)(uint32_t a, uint32_t b)
> +uint32_t HELPER(neon_min_f32)(CPUState *env, uint32_t a, uint32_t b)

I didn't check, but if neon_helper.c is compiled like op_helper.c,
passing env should not be needed. If that is not the case, the
functions could be moved to op_helper.c.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH 01/10] target-arm: Make Neon helper routines use correct FP status
  2011-04-01 18:29   ` Blue Swirl
@ 2011-04-01 22:33     ` Peter Maydell
  2011-04-03  9:41       ` Blue Swirl
  0 siblings, 1 reply; 23+ messages in thread
From: Peter Maydell @ 2011-04-01 22:33 UTC (permalink / raw)
  To: Blue Swirl; +Cc: Aurelien Jarno, qemu-devel

On 1 April 2011 19:29, Blue Swirl <blauwirbel@gmail.com> wrote:
> On Fri, Apr 1, 2011 at 5:30 PM, Peter Maydell <peter.maydell@linaro.org> wrote:
>> Make the Neon helper routines use the correct FP status from
>> the CPUEnv rather than using a dummy static one. This means
>> they will correctly handle denormals and NaNs and will set
>> FPSCR exception bits properly.

> I didn't check, but if neon_helper.c is compiled like op_helper.c,
> passing env should not be needed.

It isn't; see the comments when this patch was first posted.

> If that is not the case, the
> functions could be moved to op_helper.c.

Nobody seemed to have a coherent rule about when functions
should be in op_helper.c and use the global 'env' variable
and when they should be in some other file and have 'env'
passed as a parameter (or indeed why only op_helper.c
should be special in this way). Currently the ARM target has
both kinds of helper function. So I took the straightforward
approach of not moving code around wholesale, and just
passed in the env pointer, consistent with the way we already
handle most functions that talk to softfloat.

-- PMM

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH 01/10] target-arm: Make Neon helper routines use correct FP status
  2011-04-01 22:33     ` Peter Maydell
@ 2011-04-03  9:41       ` Blue Swirl
  2011-04-03 10:51         ` Peter Maydell
  0 siblings, 1 reply; 23+ messages in thread
From: Blue Swirl @ 2011-04-03  9:41 UTC (permalink / raw)
  To: Peter Maydell; +Cc: Aurelien Jarno, qemu-devel

On Sat, Apr 2, 2011 at 1:33 AM, Peter Maydell <peter.maydell@linaro.org> wrote:
> On 1 April 2011 19:29, Blue Swirl <blauwirbel@gmail.com> wrote:
>> On Fri, Apr 1, 2011 at 5:30 PM, Peter Maydell <peter.maydell@linaro.org> wrote:
>>> Make the Neon helper routines use the correct FP status from
>>> the CPUEnv rather than using a dummy static one. This means
>>> they will correctly handle denormals and NaNs and will set
>>> FPSCR exception bits properly.
>
>> I didn't check, but if neon_helper.c is compiled like op_helper.c,
>> passing env should not be needed.
>
> It isn't; see the comments when this patch was first posted.
>
>> If that is not the case, the
>> functions could be moved to op_helper.c.
>
> Nobody seemed to have a coherent rule about when functions
> should be in op_helper.c and use the global 'env' variable
> and when they should be in some other file and have 'env'
> passed as a parameter (or indeed why only op_helper.c
> should be special in this way). Currently the ARM target has
> both kinds of helper function. So I took the straightforward
> approach of not moving code around wholesale, and just
> passed in the env pointer, consistent with the way we already
> handle most functions that talk to softfloat.

In general, helpers for the translated code belong to op_helper.c.
They can and should access global env directly for speed. If a helper
is used for both translated code and outside of it, a wrapper should
be added to do global env shuffling (or for example a copy without
shuffling added).

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH 01/10] target-arm: Make Neon helper routines use correct FP status
  2011-04-03  9:41       ` Blue Swirl
@ 2011-04-03 10:51         ` Peter Maydell
  2011-04-03 11:10           ` Blue Swirl
  0 siblings, 1 reply; 23+ messages in thread
From: Peter Maydell @ 2011-04-03 10:51 UTC (permalink / raw)
  To: Blue Swirl; +Cc: Aurelien Jarno, qemu-devel

On 3 April 2011 10:41, Blue Swirl <blauwirbel@gmail.com> wrote:
> On Sat, Apr 2, 2011 at 1:33 AM, Peter Maydell <peter.maydell@linaro.org> wrote:
>> Nobody seemed to have a coherent rule about when functions
>> should be in op_helper.c and use the global 'env' variable
>> and when they should be in some other file and have 'env'
>> passed as a parameter

> In general, helpers for the translated code belong to op_helper.c.
> They can and should access global env directly for speed. If a helper
> is used for both translated code and outside of it, a wrapper should
> be added to do global env shuffling (or for example a copy without
> shuffling added).

OK, we can do that, but at the moment "helper function not in
op_helper.c" is hugely in the majority so there's a lot of
code we'd be moving around:

$ grep -c HELPER target-arm/*.c
target-arm/helper.c:68
target-arm/iwmmxt_helper.c:59
target-arm/machine.c:0
target-arm/neon_helper.c:103
target-arm/op_helper.c:28
target-arm/translate.c:2

(ie just 10% or so of ARM helper functions are in op_helper.c)

...and this cleanup would basically amount to folding
neon_helper.c, iwmmxt_helper.c and bits of helper.c into
op_helper.c (and then removing the 'env' parameters, so
a big patch to translate.c as well, which I don't suppose
anybody maintaining an out-of-tree target-arm patchset will
thank us for :-)).

But I can submit a patch to do that if it's the right thing.

-- PMM

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH 01/10] target-arm: Make Neon helper routines use correct FP status
  2011-04-03 10:51         ` Peter Maydell
@ 2011-04-03 11:10           ` Blue Swirl
  2011-04-03 11:21             ` Peter Maydell
  0 siblings, 1 reply; 23+ messages in thread
From: Blue Swirl @ 2011-04-03 11:10 UTC (permalink / raw)
  To: Peter Maydell; +Cc: Aurelien Jarno, qemu-devel

On Sun, Apr 3, 2011 at 1:51 PM, Peter Maydell <peter.maydell@linaro.org> wrote:
> On 3 April 2011 10:41, Blue Swirl <blauwirbel@gmail.com> wrote:
>> On Sat, Apr 2, 2011 at 1:33 AM, Peter Maydell <peter.maydell@linaro.org> wrote:
>>> Nobody seemed to have a coherent rule about when functions
>>> should be in op_helper.c and use the global 'env' variable
>>> and when they should be in some other file and have 'env'
>>> passed as a parameter
>
>> In general, helpers for the translated code belong to op_helper.c.
>> They can and should access global env directly for speed. If a helper
>> is used for both translated code and outside of it, a wrapper should
>> be added to do global env shuffling (or for example a copy without
>> shuffling added).
>
> OK, we can do that, but at the moment "helper function not in
> op_helper.c" is hugely in the majority so there's a lot of
> code we'd be moving around:
>
> $ grep -c HELPER target-arm/*.c
> target-arm/helper.c:68
> target-arm/iwmmxt_helper.c:59
> target-arm/machine.c:0
> target-arm/neon_helper.c:103
> target-arm/op_helper.c:28
> target-arm/translate.c:2
>
> (ie just 10% or so of ARM helper functions are in op_helper.c)
>
> ...and this cleanup would basically amount to folding
> neon_helper.c, iwmmxt_helper.c and bits of helper.c into
> op_helper.c (and then removing the 'env' parameters, so
> a big patch to translate.c as well, which I don't suppose
> anybody maintaining an out-of-tree target-arm patchset will
> thank us for :-)).

Alternatively those files could be compiled with HELPER_CFLAGS. In
either case, the code would have to be checked for 'env' usage and
adjusted.

> But I can submit a patch to do that if it's the right thing.

It's not so much about correctness, but performance. All generated
code already has access to global env, so passing it via helper
arguments requires extra instructions which can be avoided.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH 01/10] target-arm: Make Neon helper routines use correct FP status
  2011-04-03 11:10           ` Blue Swirl
@ 2011-04-03 11:21             ` Peter Maydell
  2011-04-03 11:52               ` Blue Swirl
  2011-04-03 15:12               ` Aurelien Jarno
  0 siblings, 2 replies; 23+ messages in thread
From: Peter Maydell @ 2011-04-03 11:21 UTC (permalink / raw)
  To: Blue Swirl; +Cc: Aurelien Jarno, qemu-devel

On 3 April 2011 12:10, Blue Swirl <blauwirbel@gmail.com> wrote:
> On Sun, Apr 3, 2011 at 1:51 PM, Peter Maydell <peter.maydell@linaro.org> wrote:
>> But I can submit a patch to do that if it's the right thing.
>
> It's not so much about correctness, but performance. All generated
> code already has access to global env, so passing it via helper
> arguments requires extra instructions which can be avoided.

Yes; I meant "right thing" in the more general senses of "is best
practice for qemu code" and "causes my patches to be accepted" :-)

Anyway, how about I do a version of this patch which moves
the affected neon helpers to op_helper.c rather than adding
an env parameter (which would avoid changes that a cleanup
would just have to revert); but leave patch 10 as-is (that's
the one which is touching vfp helpers, but it is just cleanup
which isn't changing how they handle env) ?

Then I can do a separate patchset to move other helpers,
rather than tangling a code-cleanup patchset with Neon
correctness fixes.

-- PMM

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH 01/10] target-arm: Make Neon helper routines use correct FP status
  2011-04-03 11:21             ` Peter Maydell
@ 2011-04-03 11:52               ` Blue Swirl
  2011-04-03 15:12               ` Aurelien Jarno
  1 sibling, 0 replies; 23+ messages in thread
From: Blue Swirl @ 2011-04-03 11:52 UTC (permalink / raw)
  To: Peter Maydell; +Cc: Aurelien Jarno, qemu-devel

On Sun, Apr 3, 2011 at 2:21 PM, Peter Maydell <peter.maydell@linaro.org> wrote:
> On 3 April 2011 12:10, Blue Swirl <blauwirbel@gmail.com> wrote:
>> On Sun, Apr 3, 2011 at 1:51 PM, Peter Maydell <peter.maydell@linaro.org> wrote:
>>> But I can submit a patch to do that if it's the right thing.
>>
>> It's not so much about correctness, but performance. All generated
>> code already has access to global env, so passing it via helper
>> arguments requires extra instructions which can be avoided.
>
> Yes; I meant "right thing" in the more general senses of "is best
> practice for qemu code" and "causes my patches to be accepted" :-)
>
> Anyway, how about I do a version of this patch which moves
> the affected neon helpers to op_helper.c rather than adding
> an env parameter (which would avoid changes that a cleanup
> would just have to revert); but leave patch 10 as-is (that's
> the one which is touching vfp helpers, but it is just cleanup
> which isn't changing how they handle env) ?
>
> Then I can do a separate patchset to move other helpers,
> rather than tangling a code-cleanup patchset with Neon
> correctness fixes.

Sounds OK to me, but this is entirely up to you and ARM maintainers.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH 01/10] target-arm: Make Neon helper routines use correct FP status
  2011-04-03 11:21             ` Peter Maydell
  2011-04-03 11:52               ` Blue Swirl
@ 2011-04-03 15:12               ` Aurelien Jarno
  2011-04-03 15:32                 ` Peter Maydell
  1 sibling, 1 reply; 23+ messages in thread
From: Aurelien Jarno @ 2011-04-03 15:12 UTC (permalink / raw)
  To: Peter Maydell; +Cc: Blue Swirl, qemu-devel

On Sun, Apr 03, 2011 at 12:21:49PM +0100, Peter Maydell wrote:
> On 3 April 2011 12:10, Blue Swirl <blauwirbel@gmail.com> wrote:
> > On Sun, Apr 3, 2011 at 1:51 PM, Peter Maydell <peter.maydell@linaro.org> wrote:
> >> But I can submit a patch to do that if it's the right thing.
> >
> > It's not so much about correctness, but performance. All generated
> > code already has access to global env, so passing it via helper
> > arguments requires extra instructions which can be avoided.
> 
> Yes; I meant "right thing" in the more general senses of "is best
> practice for qemu code" and "causes my patches to be accepted" :-)
> 
> Anyway, how about I do a version of this patch which moves
> the affected neon helpers to op_helper.c rather than adding
> an env parameter (which would avoid changes that a cleanup
> would just have to revert); but leave patch 10 as-is (that's
> the one which is touching vfp helpers, but it is just cleanup
> which isn't changing how they handle env) ?
> 
> Then I can do a separate patchset to move other helpers,
> rather than tangling a code-cleanup patchset with Neon
> correctness fixes.
> 
 
This solution looks fine for me. That said, I am not sure moving
everything to op_helper.c is the best solution. I would rather go for
compiling *_helper.c with HELPER_CFLAGS, which avoids having one big
file which is messy to edit, and long to compile.

I leave you choose what you prefer.

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH 01/10] target-arm: Make Neon helper routines use correct FP status
  2011-04-03 15:12               ` Aurelien Jarno
@ 2011-04-03 15:32                 ` Peter Maydell
  2011-04-03 16:01                   ` Aurelien Jarno
  0 siblings, 1 reply; 23+ messages in thread
From: Peter Maydell @ 2011-04-03 15:32 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: Blue Swirl, qemu-devel

On 3 April 2011 16:12, Aurelien Jarno <aurelien@aurel32.net> wrote:
> On Sun, Apr 03, 2011 at 12:21:49PM +0100, Peter Maydell wrote:
>> Anyway, how about I do a version of this patch which moves
>> the affected neon helpers to op_helper.c rather than adding
>> an env parameter

>> Then I can do a separate patchset to move other helpers,
>> rather than tangling a code-cleanup patchset with Neon
>> correctness fixes.

> This solution looks fine for me. That said, I am not sure moving
> everything to op_helper.c is the best solution. I would rather go for
> compiling *_helper.c with HELPER_CFLAGS, which avoids having one big
> file which is messy to edit, and long to compile.

That sounds better, actually, and avoids moving too much
code around. Still leaves the choice of:
 * move VFP helpers from target-arm/helper.c to another file
 * compile all target-*/helper.c with HELPER_CFLAGS
 * arm-specific exception in Makefile.target

of which I'll go for option 1.

-- PMM

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH 01/10] target-arm: Make Neon helper routines use correct FP status
  2011-04-03 15:32                 ` Peter Maydell
@ 2011-04-03 16:01                   ` Aurelien Jarno
  0 siblings, 0 replies; 23+ messages in thread
From: Aurelien Jarno @ 2011-04-03 16:01 UTC (permalink / raw)
  To: Peter Maydell; +Cc: Blue Swirl, qemu-devel

On Sun, Apr 03, 2011 at 04:32:51PM +0100, Peter Maydell wrote:
> On 3 April 2011 16:12, Aurelien Jarno <aurelien@aurel32.net> wrote:
> > On Sun, Apr 03, 2011 at 12:21:49PM +0100, Peter Maydell wrote:
> >> Anyway, how about I do a version of this patch which moves
> >> the affected neon helpers to op_helper.c rather than adding
> >> an env parameter
> 
> >> Then I can do a separate patchset to move other helpers,
> >> rather than tangling a code-cleanup patchset with Neon
> >> correctness fixes.
> 
> > This solution looks fine for me. That said, I am not sure moving
> > everything to op_helper.c is the best solution. I would rather go for
> > compiling *_helper.c with HELPER_CFLAGS, which avoids having one big
> > file which is messy to edit, and long to compile.
> 
> That sounds better, actually, and avoids moving too much
> code around. Still leaves the choice of:
>  * move VFP helpers from target-arm/helper.c to another file
>  * compile all target-*/helper.c with HELPER_CFLAGS

I am not sure you can do this one, as some functions there are not
called from the TCG code nor from code where env is in a fixed register.

>  * arm-specific exception in Makefile.target
> 
> of which I'll go for option 1.
> 
That's also my preferred option.

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH 00/10] [PULL] ARM Neon fixes
  2011-04-01 14:30 [Qemu-devel] [PATCH 00/10] [PULL] ARM Neon fixes Peter Maydell
                   ` (9 preceding siblings ...)
  2011-04-01 14:30 ` [Qemu-devel] [PATCH 10/10] target-arm/helper.c: For float-int conversion helpers pass ints as ints Peter Maydell
@ 2011-04-03 16:03 ` Aurelien Jarno
  10 siblings, 0 replies; 23+ messages in thread
From: Aurelien Jarno @ 2011-04-03 16:03 UTC (permalink / raw)
  To: Peter Maydell; +Cc: qemu-devel

On Fri, Apr 01, 2011 at 03:30:33PM +0100, Peter Maydell wrote:
> This is a pull request for a set of patches fixing various minor Neon
> problems for ARM targets, which I sent to the list a couple of weeks ago.
> A few of them got reviewed-by Nathan, one had some minor discussion which
> didn't amount to a request for any change, the rest had no comments.
> 
> The softfloat patch includes a trivial change since the original
> posted version: the MINMAX macro now has a "uint ## s ## _t" rather than
> a "bits ## s" since the bits32/bits64 types were removed in commit bb98fe4.
> 
> The following changes since commit a5086f95421e43c7b9e1b28a111aae0be4848117:
> 
>   lm32: use lookup table for opcodes (2011-03-31 08:54:05 +0200)
> 
> are available in the git repository at:
>   git://git.linaro.org/people/pmaydell/qemu-arm.git for-upstream
> 
> Peter Maydell (10):
>       target-arm: Make Neon helper routines use correct FP status
>       target-arm/neon_helper.c: Use make_float32/float32_val macros
>       target-arm: Return right result for Neon comparison with NaNs
>       target-arm: Fix VCLE.F32 #0, VCLT.F32 #0 NaN handling
>       target-arm: Correct ABD's handling of negative zeroes
>       softfloat: Add float*_min() and float*_max() functions
>       target-arm: Use new softfloat min/max functions for VMAX, VMIN
>       target-arm: Fix VLD of single element to all lanes
>       target-arm: Don't leak TCG temp for UNDEFs in Neon load/store space
>       target-arm/helper.c: For float-int conversion helpers pass ints as ints
> 
>  fpu/softfloat.c          |   49 +++++++++++++++
>  fpu/softfloat.h          |    4 +
>  target-arm/helper.c      |  155 ++++++++++++++++-----------------------------
>  target-arm/helpers.h     |   82 ++++++++++++------------
>  target-arm/neon_helper.c |  103 ++++++++++++------------------
>  target-arm/translate.c   |  152 +++++++++++++++++++++++++++++----------------
>  6 files changed, 289 insertions(+), 256 deletions(-)
> 

I have pulled all patches, except the first one, given the current
discussion. Thanks. 

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2011-04-03 16:03 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-04-01 14:30 [Qemu-devel] [PATCH 00/10] [PULL] ARM Neon fixes Peter Maydell
2011-04-01 14:30 ` [Qemu-devel] [PATCH 01/10] target-arm: Make Neon helper routines use correct FP status Peter Maydell
2011-04-01 18:29   ` Blue Swirl
2011-04-01 22:33     ` Peter Maydell
2011-04-03  9:41       ` Blue Swirl
2011-04-03 10:51         ` Peter Maydell
2011-04-03 11:10           ` Blue Swirl
2011-04-03 11:21             ` Peter Maydell
2011-04-03 11:52               ` Blue Swirl
2011-04-03 15:12               ` Aurelien Jarno
2011-04-03 15:32                 ` Peter Maydell
2011-04-03 16:01                   ` Aurelien Jarno
2011-04-01 14:30 ` [Qemu-devel] [PATCH 02/10] target-arm/neon_helper.c: Use make_float32/float32_val macros Peter Maydell
2011-04-01 14:30 ` [Qemu-devel] [PATCH 03/10] target-arm: Return right result for Neon comparison with NaNs Peter Maydell
2011-04-01 14:30 ` [Qemu-devel] [PATCH 04/10] target-arm: Fix VCLE.F32 #0, VCLT.F32 #0 NaN handling Peter Maydell
2011-04-01 14:30 ` [Qemu-devel] [PATCH 05/10] target-arm: Correct ABD's handling of negative zeroes Peter Maydell
2011-04-01 14:30 ` [Qemu-devel] [PATCH 06/10] softfloat: Add float*_min() and float*_max() functions Peter Maydell
2011-04-01 14:30 ` [Qemu-devel] [PATCH 07/10] target-arm: Use new softfloat min/max functions for VMAX, VMIN Peter Maydell
2011-04-01 14:30 ` [Qemu-devel] [PATCH 08/10] target-arm: Fix VLD of single element to all lanes Peter Maydell
2011-04-01 14:30 ` [Qemu-devel] [PATCH 09/10] target-arm: Don't leak TCG temp for UNDEFs in Neon load/store space Peter Maydell
2011-04-01 14:30 ` [Qemu-devel] [PATCH 10/10] target-arm/helper.c: For float-int conversion helpers pass ints as ints Peter Maydell
2011-04-01 18:25   ` Blue Swirl
2011-04-03 16:03 ` [Qemu-devel] [PATCH 00/10] [PULL] ARM Neon fixes Aurelien Jarno

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).