* [Qemu-devel] [PATCH 0/7] ARM: minor Neon fixes (mostly NaN related) @ 2011-03-11 18:12 Peter Maydell 2011-03-11 18:12 ` [Qemu-devel] [PATCH 1/7] target-arm: Make Neon helper routines use correct FP status Peter Maydell ` (6 more replies) 0 siblings, 7 replies; 16+ messages in thread From: Peter Maydell @ 2011-03-11 18:12 UTC (permalink / raw) To: qemu-devel; +Cc: patches This patch series fixes a number of minor Neon bugs, mostly related to edge cases involving NaN handling. The patch series includes adding support for float*_min() and float*_max() operations to softfloat, because this makes it easy to get NaN propagation and handling of zeroes correct. I've only added support for float32 and float64 versions because I think the implementation is complex enough that there's no point providing an untested and unused version for float128. (floatx80 doesn't even implement the comparison ops so min/max is definitely a stretch there.) I note in passing that patch 6 (the softfloat one) seems to hang scripts/checkpatch.pl, or at least make it take an unreasonably long time to finish... Peter Maydell (7): target-arm: Make Neon helper routines use correct FP status target-arm/neon_helper.c: Use make_float32/float32_val macros target-arm: Return right result for Neon comparison with NaNs target-arm: Fix VCLE.F32 #0, VCLT.F32 #0 NaN handling target-arm: Correct ABD's handling of negative zeroes softfloat: Add float*_min() and float*_max() functions target-arm: Use new softfloat min/max functions for VMAX,VMIN fpu/softfloat.c | 49 ++++++++++++++++++++++ fpu/softfloat.h | 4 ++ target-arm/helpers.h | 22 +++++----- target-arm/neon_helper.c | 103 ++++++++++++++++++--------------------------- target-arm/translate.c | 60 +++++++++++++++------------ 5 files changed, 139 insertions(+), 99 deletions(-) ^ permalink raw reply [flat|nested] 16+ messages in thread
* [Qemu-devel] [PATCH 1/7] target-arm: Make Neon helper routines use correct FP status 2011-03-11 18:12 [Qemu-devel] [PATCH 0/7] ARM: minor Neon fixes (mostly NaN related) Peter Maydell @ 2011-03-11 18:12 ` Peter Maydell 2011-03-11 18:30 ` Nathan Froyd 2011-03-11 18:12 ` [Qemu-devel] [PATCH 2/7] target-arm/neon_helper.c: Use make_float32/float32_val macros Peter Maydell ` (5 subsequent siblings) 6 siblings, 1 reply; 16+ messages in thread From: Peter Maydell @ 2011-03-11 18:12 UTC (permalink / raw) To: qemu-devel; +Cc: patches Make the Neon helper routines use the correct FP status from the CPUEnv rather than using a dummy static one. This means they will correctly handle denormals and NaNs and will set FPSCR exception bits properly. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target-arm/helpers.h | 22 +++++++++++----------- target-arm/neon_helper.c | 21 ++++++++++----------- target-arm/translate.c | 42 ++++++++++++++++++++++-------------------- 3 files changed, 43 insertions(+), 42 deletions(-) diff --git a/target-arm/helpers.h b/target-arm/helpers.h index bd6977c..e2260b6 100644 --- a/target-arm/helpers.h +++ b/target-arm/helpers.h @@ -350,17 +350,17 @@ DEF_HELPER_2(neon_qneg_s8, i32, env, i32) DEF_HELPER_2(neon_qneg_s16, i32, env, i32) DEF_HELPER_2(neon_qneg_s32, i32, env, i32) -DEF_HELPER_2(neon_min_f32, i32, i32, i32) -DEF_HELPER_2(neon_max_f32, i32, i32, i32) -DEF_HELPER_2(neon_abd_f32, i32, i32, i32) -DEF_HELPER_2(neon_add_f32, i32, i32, i32) -DEF_HELPER_2(neon_sub_f32, i32, i32, i32) -DEF_HELPER_2(neon_mul_f32, i32, i32, i32) -DEF_HELPER_2(neon_ceq_f32, i32, i32, i32) -DEF_HELPER_2(neon_cge_f32, i32, i32, i32) -DEF_HELPER_2(neon_cgt_f32, i32, i32, i32) -DEF_HELPER_2(neon_acge_f32, i32, i32, i32) -DEF_HELPER_2(neon_acgt_f32, i32, i32, i32) +DEF_HELPER_3(neon_min_f32, i32, env, i32, i32) +DEF_HELPER_3(neon_max_f32, i32, env, i32, i32) +DEF_HELPER_3(neon_abd_f32, i32, env, i32, i32) +DEF_HELPER_3(neon_add_f32, i32, env, i32, i32) +DEF_HELPER_3(neon_sub_f32, i32, env, i32, i32) +DEF_HELPER_3(neon_mul_f32, i32, env, i32, i32) +DEF_HELPER_3(neon_ceq_f32, i32, env, i32, i32) +DEF_HELPER_3(neon_cge_f32, i32, env, i32, i32) +DEF_HELPER_3(neon_cgt_f32, i32, env, i32, i32) +DEF_HELPER_3(neon_acge_f32, i32, env, i32, i32) +DEF_HELPER_3(neon_acgt_f32, i32, env, i32, i32) /* iwmmxt_helper.c */ DEF_HELPER_2(iwmmxt_maddsq, i64, i64, i64) diff --git a/target-arm/neon_helper.c b/target-arm/neon_helper.c index 002a9c1..97bc1e6 100644 --- a/target-arm/neon_helper.c +++ b/target-arm/neon_helper.c @@ -18,8 +18,7 @@ #define SET_QC() env->vfp.xregs[ARM_VFP_FPSCR] = CPSR_Q -static float_status neon_float_status; -#define NFS &neon_float_status +#define NFS (&env->vfp.standard_fp_status) /* Helper routines to perform bitwise copies between float and int. */ static inline float32 vfp_itos(uint32_t i) @@ -1794,21 +1793,21 @@ uint32_t HELPER(neon_qneg_s32)(CPUState *env, uint32_t x) } /* NEON Float helpers. */ -uint32_t HELPER(neon_min_f32)(uint32_t a, uint32_t b) +uint32_t HELPER(neon_min_f32)(CPUState *env, uint32_t a, uint32_t b) { float32 f0 = vfp_itos(a); float32 f1 = vfp_itos(b); return (float32_compare_quiet(f0, f1, NFS) == -1) ? a : b; } -uint32_t HELPER(neon_max_f32)(uint32_t a, uint32_t b) +uint32_t HELPER(neon_max_f32)(CPUState *env, uint32_t a, uint32_t b) { float32 f0 = vfp_itos(a); float32 f1 = vfp_itos(b); return (float32_compare_quiet(f0, f1, NFS) == 1) ? a : b; } -uint32_t HELPER(neon_abd_f32)(uint32_t a, uint32_t b) +uint32_t HELPER(neon_abd_f32)(CPUState *env, uint32_t a, uint32_t b) { float32 f0 = vfp_itos(a); float32 f1 = vfp_itos(b); @@ -1817,24 +1816,24 @@ uint32_t HELPER(neon_abd_f32)(uint32_t a, uint32_t b) : float32_sub(f1, f0, NFS)); } -uint32_t HELPER(neon_add_f32)(uint32_t a, uint32_t b) +uint32_t HELPER(neon_add_f32)(CPUState *env, uint32_t a, uint32_t b) { return vfp_stoi(float32_add(vfp_itos(a), vfp_itos(b), NFS)); } -uint32_t HELPER(neon_sub_f32)(uint32_t a, uint32_t b) +uint32_t HELPER(neon_sub_f32)(CPUState *env, uint32_t a, uint32_t b) { return vfp_stoi(float32_sub(vfp_itos(a), vfp_itos(b), NFS)); } -uint32_t HELPER(neon_mul_f32)(uint32_t a, uint32_t b) +uint32_t HELPER(neon_mul_f32)(CPUState *env, uint32_t a, uint32_t b) { return vfp_stoi(float32_mul(vfp_itos(a), vfp_itos(b), NFS)); } /* Floating point comparisons produce an integer result. */ #define NEON_VOP_FCMP(name, cmp) \ -uint32_t HELPER(neon_##name)(uint32_t a, uint32_t b) \ +uint32_t HELPER(neon_##name)(CPUState *env, uint32_t a, uint32_t b) \ { \ if (float32_compare_quiet(vfp_itos(a), vfp_itos(b), NFS) cmp 0) \ return ~0; \ @@ -1846,14 +1845,14 @@ NEON_VOP_FCMP(ceq_f32, ==) NEON_VOP_FCMP(cge_f32, >=) NEON_VOP_FCMP(cgt_f32, >) -uint32_t HELPER(neon_acge_f32)(uint32_t a, uint32_t b) +uint32_t HELPER(neon_acge_f32)(CPUState *env, uint32_t a, uint32_t b) { float32 f0 = float32_abs(vfp_itos(a)); float32 f1 = float32_abs(vfp_itos(b)); return (float32_compare_quiet(f0, f1,NFS) >= 0) ? ~0 : 0; } -uint32_t HELPER(neon_acgt_f32)(uint32_t a, uint32_t b) +uint32_t HELPER(neon_acgt_f32)(CPUState *env, uint32_t a, uint32_t b) { float32 f0 = float32_abs(vfp_itos(a)); float32 f1 = float32_abs(vfp_itos(b)); diff --git a/target-arm/translate.c b/target-arm/translate.c index 062de5e..f4be8dc 100644 --- a/target-arm/translate.c +++ b/target-arm/translate.c @@ -4517,56 +4517,56 @@ static int disas_neon_data_insn(CPUState * env, DisasContext *s, uint32_t insn) case 26: /* Floating point arithnetic. */ switch ((u << 2) | size) { case 0: /* VADD */ - gen_helper_neon_add_f32(tmp, tmp, tmp2); + gen_helper_neon_add_f32(tmp, cpu_env, tmp, tmp2); break; case 2: /* VSUB */ - gen_helper_neon_sub_f32(tmp, tmp, tmp2); + gen_helper_neon_sub_f32(tmp, cpu_env, tmp, tmp2); break; case 4: /* VPADD */ - gen_helper_neon_add_f32(tmp, tmp, tmp2); + gen_helper_neon_add_f32(tmp, cpu_env, tmp, tmp2); break; case 6: /* VABD */ - gen_helper_neon_abd_f32(tmp, tmp, tmp2); + gen_helper_neon_abd_f32(tmp, cpu_env, tmp, tmp2); break; default: return 1; } break; case 27: /* Float multiply. */ - gen_helper_neon_mul_f32(tmp, tmp, tmp2); + gen_helper_neon_mul_f32(tmp, cpu_env, tmp, tmp2); if (!u) { tcg_temp_free_i32(tmp2); tmp2 = neon_load_reg(rd, pass); if (size == 0) { - gen_helper_neon_add_f32(tmp, tmp, tmp2); + gen_helper_neon_add_f32(tmp, cpu_env, tmp, tmp2); } else { - gen_helper_neon_sub_f32(tmp, tmp2, tmp); + gen_helper_neon_sub_f32(tmp, cpu_env, tmp2, tmp); } } break; case 28: /* Float compare. */ if (!u) { - gen_helper_neon_ceq_f32(tmp, tmp, tmp2); + gen_helper_neon_ceq_f32(tmp, cpu_env, tmp, tmp2); } else { if (size == 0) - gen_helper_neon_cge_f32(tmp, tmp, tmp2); + gen_helper_neon_cge_f32(tmp, cpu_env, tmp, tmp2); else - gen_helper_neon_cgt_f32(tmp, tmp, tmp2); + gen_helper_neon_cgt_f32(tmp, cpu_env, tmp, tmp2); } break; case 29: /* Float compare absolute. */ if (!u) return 1; if (size == 0) - gen_helper_neon_acge_f32(tmp, tmp, tmp2); + gen_helper_neon_acge_f32(tmp, cpu_env, tmp, tmp2); else - gen_helper_neon_acgt_f32(tmp, tmp, tmp2); + gen_helper_neon_acgt_f32(tmp, cpu_env, tmp, tmp2); break; case 30: /* Float min/max. */ if (size == 0) - gen_helper_neon_max_f32(tmp, tmp, tmp2); + gen_helper_neon_max_f32(tmp, cpu_env, tmp, tmp2); else - gen_helper_neon_min_f32(tmp, tmp, tmp2); + gen_helper_neon_min_f32(tmp, cpu_env, tmp, tmp2); break; case 31: if (size == 0) @@ -5230,7 +5230,7 @@ static int disas_neon_data_insn(CPUState * env, DisasContext *s, uint32_t insn) gen_helper_neon_qrdmulh_s32(tmp, cpu_env, tmp, tmp2); } } else if (op & 1) { - gen_helper_neon_mul_f32(tmp, tmp, tmp2); + gen_helper_neon_mul_f32(tmp, cpu_env, tmp, tmp2); } else { switch (size) { case 0: gen_helper_neon_mul_u8(tmp, tmp, tmp2); break; @@ -5248,13 +5248,15 @@ static int disas_neon_data_insn(CPUState * env, DisasContext *s, uint32_t insn) gen_neon_add(size, tmp, tmp2); break; case 1: - gen_helper_neon_add_f32(tmp, tmp, tmp2); + gen_helper_neon_add_f32(tmp, cpu_env, + tmp, tmp2); break; case 4: gen_neon_rsb(size, tmp, tmp2); break; case 5: - gen_helper_neon_sub_f32(tmp, tmp2, tmp); + gen_helper_neon_sub_f32(tmp, cpu_env, + tmp2, tmp); break; default: abort(); @@ -5639,21 +5641,21 @@ static int disas_neon_data_insn(CPUState * env, DisasContext *s, uint32_t insn) break; case 24: case 27: /* Float VCGT #0, Float VCLE #0 */ tmp2 = tcg_const_i32(0); - gen_helper_neon_cgt_f32(tmp, tmp, tmp2); + gen_helper_neon_cgt_f32(tmp, cpu_env, tmp, tmp2); tcg_temp_free(tmp2); if (op == 27) tcg_gen_not_i32(tmp, tmp); break; case 25: case 28: /* Float VCGE #0, Float VCLT #0 */ tmp2 = tcg_const_i32(0); - gen_helper_neon_cge_f32(tmp, tmp, tmp2); + gen_helper_neon_cge_f32(tmp, cpu_env, tmp, tmp2); tcg_temp_free(tmp2); if (op == 28) tcg_gen_not_i32(tmp, tmp); break; case 26: /* Float VCEQ #0 */ tmp2 = tcg_const_i32(0); - gen_helper_neon_ceq_f32(tmp, tmp, tmp2); + gen_helper_neon_ceq_f32(tmp, cpu_env, tmp, tmp2); tcg_temp_free(tmp2); break; case 30: /* Float VABS */ -- 1.7.1 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [Qemu-devel] [PATCH 1/7] target-arm: Make Neon helper routines use correct FP status 2011-03-11 18:12 ` [Qemu-devel] [PATCH 1/7] target-arm: Make Neon helper routines use correct FP status Peter Maydell @ 2011-03-11 18:30 ` Nathan Froyd 2011-03-11 22:31 ` Peter Maydell 0 siblings, 1 reply; 16+ messages in thread From: Nathan Froyd @ 2011-03-11 18:30 UTC (permalink / raw) To: Peter Maydell; +Cc: qemu-devel, patches On Fri, Mar 11, 2011 at 06:12:20PM +0000, Peter Maydell wrote: > Make the Neon helper routines use the correct FP status from > the CPUEnv rather than using a dummy static one. This means > they will correctly handle denormals and NaNs and will set > FPSCR exception bits properly. Is there a reason that you don't simply use the global env rather than passing in an extra parameter everywhere? I wonder if it'd be worthwhile just to merge these functions into op_helper.c, since we have a proper FP status for NEON bits now. -Nathan ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Qemu-devel] [PATCH 1/7] target-arm: Make Neon helper routines use correct FP status 2011-03-11 18:30 ` Nathan Froyd @ 2011-03-11 22:31 ` Peter Maydell 2011-03-14 5:35 ` Nathan Froyd 0 siblings, 1 reply; 16+ messages in thread From: Peter Maydell @ 2011-03-11 22:31 UTC (permalink / raw) To: Nathan Froyd; +Cc: qemu-devel, patches On 11 March 2011 18:30, Nathan Froyd <froydnj@codesourcery.com> wrote: > On Fri, Mar 11, 2011 at 06:12:20PM +0000, Peter Maydell wrote: >> Make the Neon helper routines use the correct FP status from >> the CPUEnv rather than using a dummy static one. This means >> they will correctly handle denormals and NaNs and will set >> FPSCR exception bits properly. > > Is there a reason that you don't simply use the global env rather than > passing in an extra parameter everywhere? Just following the pattern that generally seems to be used by most helper functions, ie if you want the CPU env pass it in as a parameter. As far as I know, you can't use the global env unless you're in op_helper.c because that's the only source file compiled with the right flags. > I wonder if it'd be > worthwhile just to merge these functions into op_helper.c, > since we have a proper FP status for NEON bits now. Why move these and not (for instance) the VFP helpers in helper.c which use the CPU env for more or less the same reasons? -- PMM ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Qemu-devel] [PATCH 1/7] target-arm: Make Neon helper routines use correct FP status 2011-03-11 22:31 ` Peter Maydell @ 2011-03-14 5:35 ` Nathan Froyd 2011-03-28 14:15 ` Peter Maydell 0 siblings, 1 reply; 16+ messages in thread From: Nathan Froyd @ 2011-03-14 5:35 UTC (permalink / raw) To: Peter Maydell; +Cc: qemu-devel, patches On Fri, Mar 11, 2011 at 10:31:31PM +0000, Peter Maydell wrote: > On 11 March 2011 18:30, Nathan Froyd <froydnj@codesourcery.com> wrote: > > Is there a reason that you don't simply use the global env rather than > > passing in an extra parameter everywhere? > > Just following the pattern that generally seems to be used by > most helper functions, ie if you want the CPU env pass it in > as a parameter. As far as I know, you can't use the global > env unless you're in op_helper.c because that's the only > source file compiled with the right flags. Oh, right. I am ambivalent as to whether passing env to such functions is the right thing to do or not. > > I wonder if it'd be worthwhile just to merge these functions into > > op_helper.c, since we have a proper FP status for NEON bits now. > > Why move these and not (for instance) the VFP helpers > in helper.c which use the CPU env for more or less the > same reasons? No reason, other than that I wasn't thinking about the VFP helpers. :) -Nathan ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Qemu-devel] [PATCH 1/7] target-arm: Make Neon helper routines use correct FP status 2011-03-14 5:35 ` Nathan Froyd @ 2011-03-28 14:15 ` Peter Maydell 2011-03-30 18:38 ` Nathan Froyd 0 siblings, 1 reply; 16+ messages in thread From: Peter Maydell @ 2011-03-28 14:15 UTC (permalink / raw) To: Nathan Froyd; +Cc: qemu-devel, patches On 14 March 2011 05:35, Nathan Froyd <froydnj@codesourcery.com> wrote: > On Fri, Mar 11, 2011 at 10:31:31PM +0000, Peter Maydell wrote: >> On 11 March 2011 18:30, Nathan Froyd <froydnj@codesourcery.com> wrote: >> > Is there a reason that you don't simply use the global env rather than >> > passing in an extra parameter everywhere? >> >> Just following the pattern that generally seems to be used by >> most helper functions, ie if you want the CPU env pass it in >> as a parameter. As far as I know, you can't use the global >> env unless you're in op_helper.c because that's the only >> source file compiled with the right flags. > > Oh, right. I am ambivalent as to whether passing env to such functions > is the right thing to do or not. So did this amount to a request for a change to this patchset, or are you happy to let it pass? (I'm planning to stick this patchset into a pull-request with some of the other ARM patches that have had a few weeks for review comment later this week, so if you'd like a change now would be a good time to say so...) thanks -- PMM ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Qemu-devel] [PATCH 1/7] target-arm: Make Neon helper routines use correct FP status 2011-03-28 14:15 ` Peter Maydell @ 2011-03-30 18:38 ` Nathan Froyd 0 siblings, 0 replies; 16+ messages in thread From: Nathan Froyd @ 2011-03-30 18:38 UTC (permalink / raw) To: Peter Maydell; +Cc: qemu-devel, patches On Mon, Mar 28, 2011 at 03:15:08PM +0100, Peter Maydell wrote: > On 14 March 2011 05:35, Nathan Froyd <froydnj@codesourcery.com> wrote: > > Oh, right. I am ambivalent as to whether passing env to such functions > > is the right thing to do or not. > > So did this amount to a request for a change to this patchset, > or are you happy to let it pass? I am happy to let it pass. -Nathan ^ permalink raw reply [flat|nested] 16+ messages in thread
* [Qemu-devel] [PATCH 2/7] target-arm/neon_helper.c: Use make_float32/float32_val macros 2011-03-11 18:12 [Qemu-devel] [PATCH 0/7] ARM: minor Neon fixes (mostly NaN related) Peter Maydell 2011-03-11 18:12 ` [Qemu-devel] [PATCH 1/7] target-arm: Make Neon helper routines use correct FP status Peter Maydell @ 2011-03-11 18:12 ` Peter Maydell 2011-03-11 18:29 ` Nathan Froyd 2011-03-11 18:12 ` [Qemu-devel] [PATCH 3/7] target-arm: Return right result for Neon comparison with NaNs Peter Maydell ` (4 subsequent siblings) 6 siblings, 1 reply; 16+ messages in thread From: Peter Maydell @ 2011-03-11 18:12 UTC (permalink / raw) To: qemu-devel; +Cc: patches Use the softfloat make_float32 and float32_val macros to convert between softfloat's float32 type and raw uint32_t types, rather than private conversion functions. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target-arm/neon_helper.c | 56 ++++++++++++++-------------------------------- 1 files changed, 17 insertions(+), 39 deletions(-) diff --git a/target-arm/neon_helper.c b/target-arm/neon_helper.c index 97bc1e6..4039036 100644 --- a/target-arm/neon_helper.c +++ b/target-arm/neon_helper.c @@ -20,29 +20,6 @@ #define NFS (&env->vfp.standard_fp_status) -/* Helper routines to perform bitwise copies between float and int. */ -static inline float32 vfp_itos(uint32_t i) -{ - union { - uint32_t i; - float32 s; - } v; - - v.i = i; - return v.s; -} - -static inline uint32_t vfp_stoi(float32 s) -{ - union { - uint32_t i; - float32 s; - } v; - - v.s = s; - return v.i; -} - #define NEON_TYPE1(name, type) \ typedef struct \ { \ @@ -1795,50 +1772,51 @@ uint32_t HELPER(neon_qneg_s32)(CPUState *env, uint32_t x) /* NEON Float helpers. */ uint32_t HELPER(neon_min_f32)(CPUState *env, uint32_t a, uint32_t b) { - float32 f0 = vfp_itos(a); - float32 f1 = vfp_itos(b); + float32 f0 = make_float32(a); + float32 f1 = make_float32(b); return (float32_compare_quiet(f0, f1, NFS) == -1) ? a : b; } uint32_t HELPER(neon_max_f32)(CPUState *env, uint32_t a, uint32_t b) { - float32 f0 = vfp_itos(a); - float32 f1 = vfp_itos(b); + float32 f0 = make_float32(a); + float32 f1 = make_float32(b); return (float32_compare_quiet(f0, f1, NFS) == 1) ? a : b; } uint32_t HELPER(neon_abd_f32)(CPUState *env, uint32_t a, uint32_t b) { - float32 f0 = vfp_itos(a); - float32 f1 = vfp_itos(b); - return vfp_stoi((float32_compare_quiet(f0, f1, NFS) == 1) + float32 f0 = make_float32(a); + float32 f1 = make_float32(b); + return float32_val((float32_compare_quiet(f0, f1, NFS) == 1) ? float32_sub(f0, f1, NFS) : float32_sub(f1, f0, NFS)); } uint32_t HELPER(neon_add_f32)(CPUState *env, uint32_t a, uint32_t b) { - return vfp_stoi(float32_add(vfp_itos(a), vfp_itos(b), NFS)); + return float32_val(float32_add(make_float32(a), make_float32(b), NFS)); } uint32_t HELPER(neon_sub_f32)(CPUState *env, uint32_t a, uint32_t b) { - return vfp_stoi(float32_sub(vfp_itos(a), vfp_itos(b), NFS)); + return float32_val(float32_sub(make_float32(a), make_float32(b), NFS)); } uint32_t HELPER(neon_mul_f32)(CPUState *env, uint32_t a, uint32_t b) { - return vfp_stoi(float32_mul(vfp_itos(a), vfp_itos(b), NFS)); + return float32_val(float32_mul(make_float32(a), make_float32(b), NFS)); } /* Floating point comparisons produce an integer result. */ #define NEON_VOP_FCMP(name, cmp) \ uint32_t HELPER(neon_##name)(CPUState *env, uint32_t a, uint32_t b) \ { \ - if (float32_compare_quiet(vfp_itos(a), vfp_itos(b), NFS) cmp 0) \ + if (float32_compare_quiet(make_float32(a), make_float32(b), NFS) cmp 0) { \ return ~0; \ - else \ + } else { \ return 0; \ + } \ } NEON_VOP_FCMP(ceq_f32, ==) @@ -1847,15 +1825,15 @@ NEON_VOP_FCMP(cgt_f32, >) uint32_t HELPER(neon_acge_f32)(CPUState *env, uint32_t a, uint32_t b) { - float32 f0 = float32_abs(vfp_itos(a)); - float32 f1 = float32_abs(vfp_itos(b)); + float32 f0 = float32_abs(make_float32(a)); + float32 f1 = float32_abs(make_float32(b)); return (float32_compare_quiet(f0, f1,NFS) >= 0) ? ~0 : 0; } uint32_t HELPER(neon_acgt_f32)(CPUState *env, uint32_t a, uint32_t b) { - float32 f0 = float32_abs(vfp_itos(a)); - float32 f1 = float32_abs(vfp_itos(b)); + float32 f0 = float32_abs(make_float32(a)); + float32 f1 = float32_abs(make_float32(b)); return (float32_compare_quiet(f0, f1, NFS) > 0) ? ~0 : 0; } -- 1.7.1 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [Qemu-devel] [PATCH 2/7] target-arm/neon_helper.c: Use make_float32/float32_val macros 2011-03-11 18:12 ` [Qemu-devel] [PATCH 2/7] target-arm/neon_helper.c: Use make_float32/float32_val macros Peter Maydell @ 2011-03-11 18:29 ` Nathan Froyd 0 siblings, 0 replies; 16+ messages in thread From: Nathan Froyd @ 2011-03-11 18:29 UTC (permalink / raw) To: Peter Maydell; +Cc: qemu-devel, patches On Fri, Mar 11, 2011 at 06:12:21PM +0000, Peter Maydell wrote: > Use the softfloat make_float32 and float32_val macros to convert between > softfloat's float32 type and raw uint32_t types, rather than private > conversion functions. Reviewed-by: Nathan Froyd <froydnj@codesourcery.com> -Nathan ^ permalink raw reply [flat|nested] 16+ messages in thread
* [Qemu-devel] [PATCH 3/7] target-arm: Return right result for Neon comparison with NaNs 2011-03-11 18:12 [Qemu-devel] [PATCH 0/7] ARM: minor Neon fixes (mostly NaN related) Peter Maydell 2011-03-11 18:12 ` [Qemu-devel] [PATCH 1/7] target-arm: Make Neon helper routines use correct FP status Peter Maydell 2011-03-11 18:12 ` [Qemu-devel] [PATCH 2/7] target-arm/neon_helper.c: Use make_float32/float32_val macros Peter Maydell @ 2011-03-11 18:12 ` Peter Maydell 2011-03-11 18:12 ` [Qemu-devel] [PATCH 4/7] target-arm: Fix VCLE.F32 #0, VCLT.F32 #0 NaN handling Peter Maydell ` (3 subsequent siblings) 6 siblings, 0 replies; 16+ messages in thread From: Peter Maydell @ 2011-03-11 18:12 UTC (permalink / raw) To: qemu-devel; +Cc: patches Fix the helper functions implementing the Neon floating point comparison ops (VCGE, VCGT, VCEQ, VACGT, VACGE) to return the right answer when one of the values being compared is a NaN. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target-arm/neon_helper.c | 28 ++++++++++++++++++---------- 1 files changed, 18 insertions(+), 10 deletions(-) diff --git a/target-arm/neon_helper.c b/target-arm/neon_helper.c index 4039036..8eb4cef 100644 --- a/target-arm/neon_helper.c +++ b/target-arm/neon_helper.c @@ -1809,32 +1809,40 @@ uint32_t HELPER(neon_mul_f32)(CPUState *env, uint32_t a, uint32_t b) } /* Floating point comparisons produce an integer result. */ -#define NEON_VOP_FCMP(name, cmp) \ +#define NEON_VOP_FCMP(name, ok) \ uint32_t HELPER(neon_##name)(CPUState *env, uint32_t a, uint32_t b) \ { \ - if (float32_compare_quiet(make_float32(a), make_float32(b), NFS) cmp 0) { \ - return ~0; \ - } else { \ - return 0; \ + switch (float32_compare_quiet(make_float32(a), make_float32(b), NFS)) { \ + ok return ~0; \ + default: return 0; \ } \ } -NEON_VOP_FCMP(ceq_f32, ==) -NEON_VOP_FCMP(cge_f32, >=) -NEON_VOP_FCMP(cgt_f32, >) +NEON_VOP_FCMP(ceq_f32, case float_relation_equal:) +NEON_VOP_FCMP(cge_f32, case float_relation_equal: case float_relation_greater:) +NEON_VOP_FCMP(cgt_f32, case float_relation_greater:) uint32_t HELPER(neon_acge_f32)(CPUState *env, uint32_t a, uint32_t b) { float32 f0 = float32_abs(make_float32(a)); float32 f1 = float32_abs(make_float32(b)); - return (float32_compare_quiet(f0, f1,NFS) >= 0) ? ~0 : 0; + switch (float32_compare_quiet(f0, f1, NFS)) { + case float_relation_equal: + case float_relation_greater: + return ~0; + default: + return 0; + } } uint32_t HELPER(neon_acgt_f32)(CPUState *env, uint32_t a, uint32_t b) { float32 f0 = float32_abs(make_float32(a)); float32 f1 = float32_abs(make_float32(b)); - return (float32_compare_quiet(f0, f1, NFS) > 0) ? ~0 : 0; + if (float32_compare_quiet(f0, f1, NFS) == float_relation_greater) { + return ~0; + } + return 0; } #define ELEM(V, N, SIZE) (((V) >> ((N) * (SIZE))) & ((1ull << (SIZE)) - 1)) -- 1.7.1 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* [Qemu-devel] [PATCH 4/7] target-arm: Fix VCLE.F32 #0, VCLT.F32 #0 NaN handling 2011-03-11 18:12 [Qemu-devel] [PATCH 0/7] ARM: minor Neon fixes (mostly NaN related) Peter Maydell ` (2 preceding siblings ...) 2011-03-11 18:12 ` [Qemu-devel] [PATCH 3/7] target-arm: Return right result for Neon comparison with NaNs Peter Maydell @ 2011-03-11 18:12 ` Peter Maydell 2011-03-11 18:12 ` [Qemu-devel] [PATCH 5/7] target-arm: Correct ABD's handling of negative zeroes Peter Maydell ` (2 subsequent siblings) 6 siblings, 0 replies; 16+ messages in thread From: Peter Maydell @ 2011-03-11 18:12 UTC (permalink / raw) To: qemu-devel; +Cc: patches Implementing the floating-point versions of VCLE #0 and VCLT #0 by doing a GT comparison and inverting the result gives the wrong result if the input is a NaN. Implement as a GT comparison with the operands swapped instead. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target-arm/translate.c | 18 ++++++++++++------ 1 files changed, 12 insertions(+), 6 deletions(-) diff --git a/target-arm/translate.c b/target-arm/translate.c index f4be8dc..adba4bf 100644 --- a/target-arm/translate.c +++ b/target-arm/translate.c @@ -5639,25 +5639,31 @@ static int disas_neon_data_insn(CPUState * env, DisasContext *s, uint32_t insn) gen_neon_rsb(size, tmp, tmp2); tcg_temp_free(tmp2); break; - case 24: case 27: /* Float VCGT #0, Float VCLE #0 */ + case 24: /* Float VCGT #0 */ tmp2 = tcg_const_i32(0); gen_helper_neon_cgt_f32(tmp, cpu_env, tmp, tmp2); tcg_temp_free(tmp2); - if (op == 27) - tcg_gen_not_i32(tmp, tmp); break; - case 25: case 28: /* Float VCGE #0, Float VCLT #0 */ + case 25: /* Float VCGE #0 */ tmp2 = tcg_const_i32(0); gen_helper_neon_cge_f32(tmp, cpu_env, tmp, tmp2); tcg_temp_free(tmp2); - if (op == 28) - tcg_gen_not_i32(tmp, tmp); break; case 26: /* Float VCEQ #0 */ tmp2 = tcg_const_i32(0); gen_helper_neon_ceq_f32(tmp, cpu_env, tmp, tmp2); tcg_temp_free(tmp2); break; + case 27: /* Float VCLE #0 */ + tmp2 = tcg_const_i32(0); + gen_helper_neon_cge_f32(tmp, cpu_env, tmp2, tmp); + tcg_temp_free(tmp2); + break; + case 28: /* Float VCLT #0 */ + tmp2 = tcg_const_i32(0); + gen_helper_neon_cgt_f32(tmp, cpu_env, tmp2, tmp); + tcg_temp_free(tmp2); + break; case 30: /* Float VABS */ gen_vfp_abs(0); break; -- 1.7.1 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* [Qemu-devel] [PATCH 5/7] target-arm: Correct ABD's handling of negative zeroes 2011-03-11 18:12 [Qemu-devel] [PATCH 0/7] ARM: minor Neon fixes (mostly NaN related) Peter Maydell ` (3 preceding siblings ...) 2011-03-11 18:12 ` [Qemu-devel] [PATCH 4/7] target-arm: Fix VCLE.F32 #0, VCLT.F32 #0 NaN handling Peter Maydell @ 2011-03-11 18:12 ` Peter Maydell 2011-03-11 18:12 ` [Qemu-devel] [PATCH 6/7] softfloat: Add float*_min() and float*_max() functions Peter Maydell 2011-03-11 18:12 ` [Qemu-devel] [PATCH 7/7] target-arm: Use new softfloat min/max functions for VMAX, VMIN Peter Maydell 6 siblings, 0 replies; 16+ messages in thread From: Peter Maydell @ 2011-03-11 18:12 UTC (permalink / raw) To: qemu-devel; +Cc: patches Implement ABD by taking the absolute value of the difference of the operands (as the ARM ARM specifies) rather than by flipping the order of the operands to the subtract based on the results of a comparison. The latter approch gives the wrong answers for some edge cases like negative zero. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target-arm/neon_helper.c | 4 +--- 1 files changed, 1 insertions(+), 3 deletions(-) diff --git a/target-arm/neon_helper.c b/target-arm/neon_helper.c index 8eb4cef..1905545 100644 --- a/target-arm/neon_helper.c +++ b/target-arm/neon_helper.c @@ -1788,9 +1788,7 @@ uint32_t HELPER(neon_abd_f32)(CPUState *env, uint32_t a, uint32_t b) { float32 f0 = make_float32(a); float32 f1 = make_float32(b); - return float32_val((float32_compare_quiet(f0, f1, NFS) == 1) - ? float32_sub(f0, f1, NFS) - : float32_sub(f1, f0, NFS)); + return float32_val(float32_abs(float32_sub(f0, f1, NFS))); } uint32_t HELPER(neon_add_f32)(CPUState *env, uint32_t a, uint32_t b) -- 1.7.1 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* [Qemu-devel] [PATCH 6/7] softfloat: Add float*_min() and float*_max() functions 2011-03-11 18:12 [Qemu-devel] [PATCH 0/7] ARM: minor Neon fixes (mostly NaN related) Peter Maydell ` (4 preceding siblings ...) 2011-03-11 18:12 ` [Qemu-devel] [PATCH 5/7] target-arm: Correct ABD's handling of negative zeroes Peter Maydell @ 2011-03-11 18:12 ` Peter Maydell 2011-03-11 18:28 ` Nathan Froyd 2011-03-11 18:12 ` [Qemu-devel] [PATCH 7/7] target-arm: Use new softfloat min/max functions for VMAX, VMIN Peter Maydell 6 siblings, 1 reply; 16+ messages in thread From: Peter Maydell @ 2011-03-11 18:12 UTC (permalink / raw) To: qemu-devel; +Cc: patches Add min and max operations to softfloat. This allows us to implement propagation of NaNs and handling of negative zero correctly (unlike the approach of having target helper routines return one of the operands based on the result of a comparison op). Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- fpu/softfloat.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++++ fpu/softfloat.h | 4 ++++ 2 files changed, 53 insertions(+), 0 deletions(-) diff --git a/fpu/softfloat.c b/fpu/softfloat.c index 30b07e9..55f02ae 100644 --- a/fpu/softfloat.c +++ b/fpu/softfloat.c @@ -6052,6 +6052,55 @@ int float128_compare_quiet( float128 a, float128 b STATUS_PARAM ) return float128_compare_internal(a, b, 1 STATUS_VAR); } +/* min() and max() functions. These can't be implemented as + * 'compare and pick one input' because that would mishandle + * NaNs and +0 vs -0. + */ +#define MINMAX(s, nan_exp) \ +INLINE float ## s float ## s ## _minmax(float ## s a, float ## s b, \ + int ismin STATUS_PARAM ) \ +{ \ + flag aSign, bSign; \ + bits ## s av, bv; \ + a = float ## s ## _squash_input_denormal(a STATUS_VAR); \ + b = float ## s ## _squash_input_denormal(b STATUS_VAR); \ + if (float ## s ## _is_any_nan(a) || \ + float ## s ## _is_any_nan(b)) { \ + return propagateFloat ## s ## NaN(a, b STATUS_VAR); \ + } \ + aSign = extractFloat ## s ## Sign(a); \ + bSign = extractFloat ## s ## Sign(b); \ + av = float ## s ## _val(a); \ + bv = float ## s ## _val(b); \ + if (aSign != bSign) { \ + if (ismin) { \ + return aSign ? a : b; \ + } else { \ + return aSign ? b : a; \ + } \ + } else { \ + if (ismin) { \ + return (aSign ^ (av < bv)) ? a : b; \ + } else { \ + return (aSign ^ (av < bv)) ? b : a; \ + } \ + } \ +} \ + \ +float ## s float ## s ## _min(float ## s a, float ## s b STATUS_PARAM) \ +{ \ + return float ## s ## _minmax(a, b, 1 STATUS_VAR); \ +} \ + \ +float ## s float ## s ## _max(float ## s a, float ## s b STATUS_PARAM) \ +{ \ + return float ## s ## _minmax(a, b, 0 STATUS_VAR); \ +} + +MINMAX(32, 0xff) +MINMAX(64, 0x7ff) + + /* Multiply A by 2 raised to the power N. */ float32 float32_scalbn( float32 a, int n STATUS_PARAM ) { diff --git a/fpu/softfloat.h b/fpu/softfloat.h index fd61dc4..1d37d65 100644 --- a/fpu/softfloat.h +++ b/fpu/softfloat.h @@ -333,6 +333,8 @@ int float32_le_quiet( float32, float32 STATUS_PARAM ); int float32_lt_quiet( float32, float32 STATUS_PARAM ); int float32_compare( float32, float32 STATUS_PARAM ); int float32_compare_quiet( float32, float32 STATUS_PARAM ); +float32 float32_min(float32, float32 STATUS_PARAM); +float32 float32_max(float32, float32 STATUS_PARAM); int float32_is_quiet_nan( float32 ); int float32_is_signaling_nan( float32 ); float32 float32_maybe_silence_nan( float32 ); @@ -445,6 +447,8 @@ int float64_le_quiet( float64, float64 STATUS_PARAM ); int float64_lt_quiet( float64, float64 STATUS_PARAM ); int float64_compare( float64, float64 STATUS_PARAM ); int float64_compare_quiet( float64, float64 STATUS_PARAM ); +float64 float64_min(float64, float64 STATUS_PARAM); +float64 float64_max(float64, float64 STATUS_PARAM); int float64_is_quiet_nan( float64 a ); int float64_is_signaling_nan( float64 ); float64 float64_maybe_silence_nan( float64 ); -- 1.7.1 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [Qemu-devel] [PATCH 6/7] softfloat: Add float*_min() and float*_max() functions 2011-03-11 18:12 ` [Qemu-devel] [PATCH 6/7] softfloat: Add float*_min() and float*_max() functions Peter Maydell @ 2011-03-11 18:28 ` Nathan Froyd 2011-03-11 23:31 ` Peter Maydell 0 siblings, 1 reply; 16+ messages in thread From: Nathan Froyd @ 2011-03-11 18:28 UTC (permalink / raw) To: Peter Maydell; +Cc: qemu-devel, patches On Fri, Mar 11, 2011 at 06:12:25PM +0000, Peter Maydell wrote: > Add min and max operations to softfloat. This allows us to implement > propagation of NaNs and handling of negative zero correctly (unlike > the approach of having target helper routines return one of the operands > based on the result of a comparison op). Are these useful anyplace beside ARM? i.e. do they implement the correct AltiVec/VSX/SSE* (any others?) semantics? -Nathan ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Qemu-devel] [PATCH 6/7] softfloat: Add float*_min() and float*_max() functions 2011-03-11 18:28 ` Nathan Froyd @ 2011-03-11 23:31 ` Peter Maydell 0 siblings, 0 replies; 16+ messages in thread From: Peter Maydell @ 2011-03-11 23:31 UTC (permalink / raw) To: Nathan Froyd; +Cc: qemu-devel, patches On 11 March 2011 18:28, Nathan Froyd <froydnj@codesourcery.com> wrote: > On Fri, Mar 11, 2011 at 06:12:25PM +0000, Peter Maydell wrote: >> Add min and max operations to softfloat. This allows us to implement >> propagation of NaNs and handling of negative zero correctly (unlike >> the approach of having target helper routines return one of the operands >> based on the result of a comparison op). > > Are these useful anyplace beside ARM? i.e. do they implement the > correct AltiVec/VSX/SSE* (any others?) semantics? I had a look at the Altivec manual and I believe they're the right semantics for Altivec/VSX vminfp/vmaxfp. So that's two use cases (currently PPC has some hand-rolled stuff in op_helper.c which specialcases NaNs but a quick glance suggests it will get the -0 case wrong.) Intel's SSE MAXSD/MINSD etc have very weird behaviour for the special cases: if both operands are 0.0s of any sign you always get the second operand, so max(-0,+0) != max(+0,-0), and if only one operand is a NaN then the second operand is returned whether it is the NaN or not (so max(NaN, 3) != max(3, NaN). This is clearly hopelessly target-specific :-) I think the ARM/PPC semantics are the "obvious" ones: * handle input denormals by squashing to zero like other ops (you can always flip the status flags in your target wrapper if a particular instruction behaves differently) * handle NaN propagation in the same way as all other ops * maximum of (-0,+0) is +0, minimum of (-0,+0) is -0 so I think they have a pretty good chance of being useful for other architectures that have a floating point min/max. (They're the Java Math.min() and Math.max() semantics too.) -- PMM ^ permalink raw reply [flat|nested] 16+ messages in thread
* [Qemu-devel] [PATCH 7/7] target-arm: Use new softfloat min/max functions for VMAX, VMIN 2011-03-11 18:12 [Qemu-devel] [PATCH 0/7] ARM: minor Neon fixes (mostly NaN related) Peter Maydell ` (5 preceding siblings ...) 2011-03-11 18:12 ` [Qemu-devel] [PATCH 6/7] softfloat: Add float*_min() and float*_max() functions Peter Maydell @ 2011-03-11 18:12 ` Peter Maydell 6 siblings, 0 replies; 16+ messages in thread From: Peter Maydell @ 2011-03-11 18:12 UTC (permalink / raw) To: qemu-devel; +Cc: patches Use the new softfloat min/max functions to implement the Neon VMAX and VMIN instructions. This allows us to get the right behaviour for NaN and negative zero. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target-arm/neon_helper.c | 8 ++------ 1 files changed, 2 insertions(+), 6 deletions(-) diff --git a/target-arm/neon_helper.c b/target-arm/neon_helper.c index 1905545..19784ab 100644 --- a/target-arm/neon_helper.c +++ b/target-arm/neon_helper.c @@ -1772,16 +1772,12 @@ uint32_t HELPER(neon_qneg_s32)(CPUState *env, uint32_t x) /* NEON Float helpers. */ uint32_t HELPER(neon_min_f32)(CPUState *env, uint32_t a, uint32_t b) { - float32 f0 = make_float32(a); - float32 f1 = make_float32(b); - return (float32_compare_quiet(f0, f1, NFS) == -1) ? a : b; + return float32_val(float32_min(make_float32(a), make_float32(b), NFS)); } uint32_t HELPER(neon_max_f32)(CPUState *env, uint32_t a, uint32_t b) { - float32 f0 = make_float32(a); - float32 f1 = make_float32(b); - return (float32_compare_quiet(f0, f1, NFS) == 1) ? a : b; + return float32_val(float32_max(make_float32(a), make_float32(b), NFS)); } uint32_t HELPER(neon_abd_f32)(CPUState *env, uint32_t a, uint32_t b) -- 1.7.1 ^ permalink raw reply related [flat|nested] 16+ messages in thread
end of thread, other threads:[~2011-03-30 18:38 UTC | newest] Thread overview: 16+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2011-03-11 18:12 [Qemu-devel] [PATCH 0/7] ARM: minor Neon fixes (mostly NaN related) Peter Maydell 2011-03-11 18:12 ` [Qemu-devel] [PATCH 1/7] target-arm: Make Neon helper routines use correct FP status Peter Maydell 2011-03-11 18:30 ` Nathan Froyd 2011-03-11 22:31 ` Peter Maydell 2011-03-14 5:35 ` Nathan Froyd 2011-03-28 14:15 ` Peter Maydell 2011-03-30 18:38 ` Nathan Froyd 2011-03-11 18:12 ` [Qemu-devel] [PATCH 2/7] target-arm/neon_helper.c: Use make_float32/float32_val macros Peter Maydell 2011-03-11 18:29 ` Nathan Froyd 2011-03-11 18:12 ` [Qemu-devel] [PATCH 3/7] target-arm: Return right result for Neon comparison with NaNs Peter Maydell 2011-03-11 18:12 ` [Qemu-devel] [PATCH 4/7] target-arm: Fix VCLE.F32 #0, VCLT.F32 #0 NaN handling Peter Maydell 2011-03-11 18:12 ` [Qemu-devel] [PATCH 5/7] target-arm: Correct ABD's handling of negative zeroes Peter Maydell 2011-03-11 18:12 ` [Qemu-devel] [PATCH 6/7] softfloat: Add float*_min() and float*_max() functions Peter Maydell 2011-03-11 18:28 ` Nathan Froyd 2011-03-11 23:31 ` Peter Maydell 2011-03-11 18:12 ` [Qemu-devel] [PATCH 7/7] target-arm: Use new softfloat min/max functions for VMAX, VMIN Peter Maydell
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).