* [Qemu-devel] [PATCH 0/7] target-arm: Fix bugs in fp exception flag setting
From: Peter Maydell @ 2011-05-06 12:48 UTC
To: qemu-devel; +Cc: patches

This patch series fixes a number of minor bugs in the ARM target where
we were not correctly setting the cumulative exception flags in the FPSCR.
It includes adding a new flag to softfloat indicating when a denormal
result has been flushed to zero (as discussed previously on the list).

Tested with the usual random instruction sequence testing (covering all
the neon and vfp data processing instructions which can set FPSCR
exception flags).

These patches fix all the FPSCR flags bugs I found, with the exception
of those in the VCVT float-int and float32-float16 conversion routines,
which are a bit trickier to fix because they are bugs in softfloat rather
than merely in the arm helper functions.

Peter Maydell (7):
  target-arm: Don't set FP exceptions in recip, recip_sqrt estimate fns
  target-arm: Signal InputDenormal for VRECPE, VRSQRTE, VRECPS, VRSQRTS
  target-arm: Signal InvalidOp for Neon GE and GT compares of QNaN
  target-arm: Refactor int-float conversions
  target-arm: Add separate Neon float-int conversion helpers
  softfloat: Add new flag for when denormal result is flushed to zero
  target-arm: Signal Underflow when denormal flushed to zero on output

 fpu/softfloat.c          |   41 ++++++++++--
 fpu/softfloat.h          |    3 +-
 target-arm/helper.c      |  151 +++++++--------------------------------------
 target-arm/helper.h      |   70 ++++++++++++---------
 target-arm/neon_helper.c |   40 ++++++-------
 target-arm/op_helper.c   |   74 ++++++++++++++++++++++
 target-arm/translate.c   |   92 ++++++++++++++++------------
 7 files changed, 243 insertions(+), 228 deletions(-)

* [Qemu-devel] [PATCH 1/7] target-arm: Don't set FP exceptions in recip, recip_sqrt estimate fns
From: Peter Maydell @ 2011-05-06 12:48 UTC
To: qemu-devel; +Cc: patches

The functions which do the core estimation algorithms for the VRSQRTE and
VRECPE instructions should not set floating point exception flags, so use
a local fp status for doing these calculations.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target-arm/helper.c |   12 ++++++++++--
 1 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/target-arm/helper.c b/target-arm/helper.c
index 62ae72e..5ff6a9b 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -2749,7 +2749,11 @@ float32 HELPER(rsqrts_f32)(float32 a, float32 b, CPUState *env)
  */
 static float64 recip_estimate(float64 a, CPUState *env)
 {
-    float_status *s = &env->vfp.standard_fp_status;
+    /* These calculations mustn't set any fp exception flags,
+     * so we use a local copy of the fp_status.
+     */
+    float_status dummy_status = env->vfp.standard_fp_status;
+    float_status *s = &dummy_status;
     /* q = (int)(a * 512.0) */
     float64 q = float64_mul(float64_512, a, s);
     int64_t q_int = float64_to_int64_round_to_zero(q, s);
@@ -2812,7 +2816,11 @@ float32 HELPER(recpe_f32)(float32 a, CPUState *env)
  */
 static float64 recip_sqrt_estimate(float64 a, CPUState *env)
 {
-    float_status *s = &env->vfp.standard_fp_status;
+    /* These calculations mustn't set any fp exception flags,
+     * so we use a local copy of the fp_status.
+     */
+    float_status dummy_status = env->vfp.standard_fp_status;
+    float_status *s = &dummy_status;
     float64 q;
     int64_t q_int;
 
-- 
1.7.1
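
Editor's sketch (not part of the posted patch): the fix in patch 1/7 relies on copying the status structure so that intermediate arithmetic inherits the rounding and flush modes but cannot leak exception flags back to the guest-visible state. The following is a minimal illustration of that pattern only. `mock_status`, `mock_mul` and `estimate_with_scratch_status` are hypothetical stand-ins, not softfloat or QEMU names.

```c
#include <stdint.h>

/* Hypothetical stand-in for softfloat's float_status; only the
 * cumulative flag word matters for this illustration. */
typedef struct {
    uint8_t exception_flags;
} mock_status;

enum { MOCK_FLAG_INEXACT = 1 << 0 };

/* A made-up arithmetic step that always records an exception in
 * whichever status block it is handed. */
static double mock_mul(double a, double b, mock_status *s)
{
    s->exception_flags |= MOCK_FLAG_INEXACT;
    return a * b;
}

/* Same shape as the patched estimate functions: operate on a local
 * copy, so the caller's flags stay untouched. */
static double estimate_with_scratch_status(double a, const mock_status *real)
{
    mock_status scratch = *real;   /* inherits modes, absorbs flags */
    return mock_mul(512.0, a, &scratch);
}
```

Calling `estimate_with_scratch_status(0.5, &st)` with a zeroed `st` returns the product while leaving `st.exception_flags` at zero, which is the property the patch wants for the estimate routines.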

* [Qemu-devel] [PATCH 2/7] target-arm: Signal InputDenormal for VRECPE, VRSQRTE, VRECPS, VRSQRTS
From: Peter Maydell @ 2011-05-06 12:48 UTC
To: qemu-devel; +Cc: patches

The helpers for VRECPE.F32, VRSQRTE.F32, VRECPS and VRSQRTS handle denormals
as special cases, so we must set the InputDenormal exception flag ourselves.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target-arm/helper.c |   12 ++++++++++++
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/target-arm/helper.c b/target-arm/helper.c
index 5ff6a9b..f072527 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -2720,6 +2720,9 @@ float32 HELPER(recps_f32)(float32 a, float32 b, CPUState *env)
     float_status *s = &env->vfp.standard_fp_status;
     if ((float32_is_infinity(a) && float32_is_zero_or_denormal(b)) ||
         (float32_is_infinity(b) && float32_is_zero_or_denormal(a))) {
+        if (!(float32_is_zero(a) || float32_is_zero(b))) {
+            float_raise(float_flag_input_denormal, s);
+        }
         return float32_two;
     }
     return float32_sub(float32_two, float32_mul(a, b, s), s);
@@ -2731,6 +2734,9 @@ float32 HELPER(rsqrts_f32)(float32 a, float32 b, CPUState *env)
     float32 product;
     if ((float32_is_infinity(a) && float32_is_zero_or_denormal(b)) ||
         (float32_is_infinity(b) && float32_is_zero_or_denormal(a))) {
+        if (!(float32_is_zero(a) || float32_is_zero(b))) {
+            float_raise(float_flag_input_denormal, s);
+        }
         return float32_one_point_five;
     }
     product = float32_mul(a, b, s);
@@ -2791,6 +2797,9 @@ float32 HELPER(recpe_f32)(float32 a, CPUState *env)
     } else if (float32_is_infinity(a)) {
         return float32_set_sign(float32_zero, float32_is_neg(a));
     } else if (float32_is_zero_or_denormal(a)) {
+        if (!float32_is_zero(a)) {
+            float_raise(float_flag_input_denormal, s);
+        }
         float_raise(float_flag_divbyzero, s);
         return float32_set_sign(float32_infinity, float32_is_neg(a));
     } else if (a_exp >= 253) {
@@ -2882,6 +2891,9 @@ float32 HELPER(rsqrte_f32)(float32 a, CPUState *env)
         }
         return float32_default_nan;
     } else if (float32_is_zero_or_denormal(a)) {
+        if (!float32_is_zero(a)) {
+            float_raise(float_flag_input_denormal, s);
+        }
         float_raise(float_flag_divbyzero, s);
         return float32_set_sign(float32_infinity, float32_is_neg(a));
     } else if (float32_is_neg(a)) {
-- 
1.7.1
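
Editor's sketch (not part of the posted patch): the special-case branch in patch 2/7 treats zero and denormal inputs alike for the result, but only a true denormal should additionally raise InputDenormal. The standalone illustration below works on raw IEEE 754 single-precision bit patterns. The `mock_` names and flag values are hypothetical, and the bit tests are an assumption about how such predicates are usually written rather than a quotation of softfloat.

```c
#include <stdint.h>

enum {
    MOCK_FLAG_INPUT_DENORMAL = 1 << 0,
    MOCK_FLAG_DIVBYZERO      = 1 << 1,
};

/* Exponent field all zero: the value is either zero or denormal. */
static int mock_is_zero_or_denormal(uint32_t bits)
{
    return (bits & 0x7f800000u) == 0;
}

/* Everything except the sign bit is zero: the value is +0 or -0. */
static int mock_is_zero(uint32_t bits)
{
    return (bits & 0x7fffffffu) == 0;
}

/* Flags that the zero-or-denormal branch of a reciprocal estimate
 * would raise for the given input bit pattern. */
static unsigned recpe_special_case_flags(uint32_t bits)
{
    unsigned flags = 0;
    if (mock_is_zero_or_denormal(bits)) {
        if (!mock_is_zero(bits)) {
            flags |= MOCK_FLAG_INPUT_DENORMAL;  /* denormal, not zero */
        }
        flags |= MOCK_FLAG_DIVBYZERO;
    }
    return flags;
}
```

An input of `0x00000001` (the smallest positive denormal) yields both flags, `0x00000000` yields only the divide-by-zero flag, and a normal number such as `0x3f800000` (1.0) yields none.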

* [Qemu-devel] [PATCH 3/7] target-arm: Signal InvalidOp for Neon GE and GT compares of QNaN
From: Peter Maydell @ 2011-05-06 12:48 UTC
To: qemu-devel; +Cc: patches

If the input to a Neon float comparison is a quiet NaN, the ARM ARM specifies
that we should raise InvalidOp if the comparison is GE or GT but not for EQ.
(Signaling NaNs raise InvalidOp regardless). This means only EQ should use
the _quiet version of the comparison function.

We implement this by cleaning up the comparison helpers to call the
appropriate versions of the softfloat simple comparison functions
(float32_le and friends) rather than the generic float32_compare functions.
This makes them simple enough that they are clearer opencoded rather than
macroised.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target-arm/neon_helper.c |   40 ++++++++++++++++++----------------------
 1 files changed, 18 insertions(+), 22 deletions(-)

diff --git a/target-arm/neon_helper.c b/target-arm/neon_helper.c
index f5b173a..9165519 100644
--- a/target-arm/neon_helper.c
+++ b/target-arm/neon_helper.c
@@ -1802,41 +1802,37 @@ uint32_t HELPER(neon_mul_f32)(uint32_t a, uint32_t b)
     return float32_val(float32_mul(make_float32(a), make_float32(b), NFS));
 }
 
-/* Floating point comparisons produce an integer result. */
-#define NEON_VOP_FCMP(name, ok) \
-uint32_t HELPER(neon_##name)(uint32_t a, uint32_t b) \
-{ \
-    switch (float32_compare_quiet(make_float32(a), make_float32(b), NFS)) { \
-    ok return ~0; \
-    default: return 0; \
-    } \
+/* Floating point comparisons produce an integer result.
+ * Note that EQ doesn't signal InvalidOp for QNaNs but GE and GT do.
+ * Softfloat routines return 0/1, which we convert to the 0/-1 Neon requires.
+ */
+uint32_t HELPER(neon_ceq_f32)(uint32_t a, uint32_t b)
+{
+    return -float32_eq_quiet(make_float32(a), make_float32(b), NFS);
+}
+
+uint32_t HELPER(neon_cge_f32)(uint32_t a, uint32_t b)
+{
+    return -float32_le(make_float32(b), make_float32(a), NFS);
 }
 
-NEON_VOP_FCMP(ceq_f32, case float_relation_equal:)
-NEON_VOP_FCMP(cge_f32, case float_relation_equal: case float_relation_greater:)
-NEON_VOP_FCMP(cgt_f32, case float_relation_greater:)
+uint32_t HELPER(neon_cgt_f32)(uint32_t a, uint32_t b)
+{
+    return -float32_lt(make_float32(b), make_float32(a), NFS);
+}
 
 uint32_t HELPER(neon_acge_f32)(uint32_t a, uint32_t b)
 {
     float32 f0 = float32_abs(make_float32(a));
     float32 f1 = float32_abs(make_float32(b));
-    switch (float32_compare_quiet(f0, f1, NFS)) {
-    case float_relation_equal:
-    case float_relation_greater:
-        return ~0;
-    default:
-        return 0;
-    }
+    return -float32_le(f1, f0, NFS);
 }
 
 uint32_t HELPER(neon_acgt_f32)(uint32_t a, uint32_t b)
 {
     float32 f0 = float32_abs(make_float32(a));
     float32 f1 = float32_abs(make_float32(b));
-    if (float32_compare_quiet(f0, f1, NFS) == float_relation_greater) {
-        return ~0;
-    }
-    return 0;
+    return -float32_lt(f1, f0, NFS);
 }
 
 #define ELEM(V, N, SIZE) (((V) >> ((N) * (SIZE))) & ((1ull << (SIZE)) - 1))
-- 
1.7.1
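
Editor's sketch (not part of the posted patch): two small tricks in patch 3/7 are easy to miss. GE and GT are expressed through "less than or equal" and "less than" by swapping the operands, and the 0/1 result is negated to give the all-zeros or all-ones mask that Neon compares produce. The sketch below shows only that arithmetic using host `float` comparisons. `mock_le`, `mock_lt`, `cge_mask` and `cgt_mask` are hypothetical names, and no exception-flag behaviour is modelled.

```c
#include <stdint.h>

/* Hypothetical 0/1 comparison primitives, standing in for the
 * softfloat routines the patch calls. */
static int mock_le(float a, float b) { return a <= b; }
static int mock_lt(float a, float b) { return a < b; }

/* a >= b is computed as b <= a; negating 1 in uint32_t arithmetic
 * wraps to 0xFFFFFFFF, while negating 0 stays 0. */
static uint32_t cge_mask(float a, float b)
{
    return -(uint32_t)mock_le(b, a);
}

/* a > b is computed as b < a, with the same 0/1 to 0/-1 conversion. */
static uint32_t cgt_mask(float a, float b)
{
    return -(uint32_t)mock_lt(b, a);
}
```

With these definitions `cge_mask(1.0f, 1.0f)` is `0xFFFFFFFF` while `cgt_mask(1.0f, 1.0f)` is `0`, which mirrors the difference between the GE and GT helpers.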

* [Qemu-devel] [PATCH 4/7] target-arm: Refactor int-float conversions
From: Peter Maydell @ 2011-05-06 12:48 UTC
To: qemu-devel; +Cc: patches

The Neon versions of int-float conversions need their own helper routines
because they must use the "standard FPSCR" rather than the default one.
Refactor the helper functions to make it easy to add the neon versions.
While we're touching the code, move the helpers to op_helper.c so that
we can use the global env variable rather than passing it as a parameter.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target-arm/helper.c    |  125 ------------------------------------------------
 target-arm/helper.h    |   60 +++++++++++-----------
 target-arm/op_helper.c |   62 ++++++++++++++++++++++++
 target-arm/translate.c |   63 +++++++++++++-----------
 4 files changed, 127 insertions(+), 183 deletions(-)

diff --git a/target-arm/helper.c b/target-arm/helper.c
index f072527..de00468 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -2526,100 +2526,6 @@ DO_VFP_cmp(s, float32)
 DO_VFP_cmp(d, float64)
 #undef DO_VFP_cmp
 
-/* Integer to float conversion. */
-float32 VFP_HELPER(uito, s)(uint32_t x, CPUState *env)
-{
-    return uint32_to_float32(x, &env->vfp.fp_status);
-}
-
-float64 VFP_HELPER(uito, d)(uint32_t x, CPUState *env)
-{
-    return uint32_to_float64(x, &env->vfp.fp_status);
-}
-
-float32 VFP_HELPER(sito, s)(uint32_t x, CPUState *env)
-{
-    return int32_to_float32(x, &env->vfp.fp_status);
-}
-
-float64 VFP_HELPER(sito, d)(uint32_t x, CPUState *env)
-{
-    return int32_to_float64(x, &env->vfp.fp_status);
-}
-
-/* Float to integer conversion. */
-uint32_t VFP_HELPER(toui, s)(float32 x, CPUState *env)
-{
-    if (float32_is_any_nan(x)) {
-        float_raise(float_flag_invalid, &env->vfp.fp_status);
-        return 0;
-    }
-    return float32_to_uint32(x, &env->vfp.fp_status);
-}
-
-uint32_t VFP_HELPER(toui, d)(float64 x, CPUState *env)
-{
-    if (float64_is_any_nan(x)) {
-        float_raise(float_flag_invalid, &env->vfp.fp_status);
-        return 0;
-    }
-    return float64_to_uint32(x, &env->vfp.fp_status);
-}
-
-uint32_t VFP_HELPER(tosi, s)(float32 x, CPUState *env)
-{
-    if (float32_is_any_nan(x)) {
-        float_raise(float_flag_invalid, &env->vfp.fp_status);
-        return 0;
-    }
-    return float32_to_int32(x, &env->vfp.fp_status);
-}
-
-uint32_t VFP_HELPER(tosi, d)(float64 x, CPUState *env)
-{
-    if (float64_is_any_nan(x)) {
-        float_raise(float_flag_invalid, &env->vfp.fp_status);
-        return 0;
-    }
-    return float64_to_int32(x, &env->vfp.fp_status);
-}
-
-uint32_t VFP_HELPER(touiz, s)(float32 x, CPUState *env)
-{
-    if (float32_is_any_nan(x)) {
-        float_raise(float_flag_invalid, &env->vfp.fp_status);
-        return 0;
-    }
-    return float32_to_uint32_round_to_zero(x, &env->vfp.fp_status);
-}
-
-uint32_t VFP_HELPER(touiz, d)(float64 x, CPUState *env)
-{
-    if (float64_is_any_nan(x)) {
-        float_raise(float_flag_invalid, &env->vfp.fp_status);
-        return 0;
-    }
-    return float64_to_uint32_round_to_zero(x, &env->vfp.fp_status);
-}
-
-uint32_t VFP_HELPER(tosiz, s)(float32 x, CPUState *env)
-{
-    if (float32_is_any_nan(x)) {
-        float_raise(float_flag_invalid, &env->vfp.fp_status);
-        return 0;
-    }
-    return float32_to_int32_round_to_zero(x, &env->vfp.fp_status);
-}
-
-uint32_t VFP_HELPER(tosiz, d)(float64 x, CPUState *env)
-{
-    if (float64_is_any_nan(x)) {
-        float_raise(float_flag_invalid, &env->vfp.fp_status);
-        return 0;
-    }
-    return float64_to_int32_round_to_zero(x, &env->vfp.fp_status);
-}
-
 /* floating point conversion */
 float64 VFP_HELPER(fcvtd, s)(float32 x, CPUState *env)
 {
@@ -2639,37 +2545,6 @@ float32 VFP_HELPER(fcvts, d)(float64 x, CPUState *env)
     return float32_maybe_silence_nan(r);
 }
 
-/* VFP3 fixed point conversion. */
-#define VFP_CONV_FIX(name, p, fsz, itype, sign) \
-float##fsz VFP_HELPER(name##to, p)(uint##fsz##_t x, uint32_t shift, \
-                                   CPUState *env) \
-{ \
-    float##fsz tmp; \
-    tmp = sign##int32_to_##float##fsz ((itype##_t)x, &env->vfp.fp_status); \
-    return float##fsz##_scalbn(tmp, -(int)shift, &env->vfp.fp_status); \
-} \
-uint##fsz##_t VFP_HELPER(to##name, p)(float##fsz x, uint32_t shift, \
-                                      CPUState *env) \
-{ \
-    float##fsz tmp; \
-    if (float##fsz##_is_any_nan(x)) { \
-        float_raise(float_flag_invalid, &env->vfp.fp_status); \
-        return 0; \
-    } \
-    tmp = float##fsz##_scalbn(x, shift, &env->vfp.fp_status); \
-    return float##fsz##_to_##itype##_round_to_zero(tmp, &env->vfp.fp_status); \
-}
-
-VFP_CONV_FIX(sh, d, 64, int16, )
-VFP_CONV_FIX(sl, d, 64, int32, )
-VFP_CONV_FIX(uh, d, 64, uint16, u)
-VFP_CONV_FIX(ul, d, 64, uint32, u)
-VFP_CONV_FIX(sh, s, 32, int16, )
-VFP_CONV_FIX(sl, s, 32, int32, )
-VFP_CONV_FIX(uh, s, 32, uint16, u)
-VFP_CONV_FIX(ul, s, 32, uint32, u)
-#undef VFP_CONV_FIX
-
 /* Half precision conversions. */
 static float32 do_fcvt_f16_to_f32(uint32_t a, CPUState *env, float_status *s)
 {
diff --git a/target-arm/helper.h b/target-arm/helper.h
index ae701e8..2c54d5e 100644
--- a/target-arm/helper.h
+++ b/target-arm/helper.h
@@ -96,36 +96,36 @@ DEF_HELPER_3(vfp_cmped, void, f64, f64, env)
 DEF_HELPER_2(vfp_fcvtds, f64, f32, env)
 DEF_HELPER_2(vfp_fcvtsd, f32, f64, env)
 
-DEF_HELPER_2(vfp_uitos, f32, i32, env)
-DEF_HELPER_2(vfp_uitod, f64, i32, env)
-DEF_HELPER_2(vfp_sitos, f32, i32, env)
-DEF_HELPER_2(vfp_sitod, f64, i32, env)
-
-DEF_HELPER_2(vfp_touis, i32, f32, env)
-DEF_HELPER_2(vfp_touid, i32, f64, env)
-DEF_HELPER_2(vfp_touizs, i32, f32, env)
-DEF_HELPER_2(vfp_touizd, i32, f64, env)
-DEF_HELPER_2(vfp_tosis, i32, f32, env)
-DEF_HELPER_2(vfp_tosid, i32, f64, env)
-DEF_HELPER_2(vfp_tosizs, i32, f32, env)
-DEF_HELPER_2(vfp_tosizd, i32, f64, env)
-
-DEF_HELPER_3(vfp_toshs, i32, f32, i32, env)
-DEF_HELPER_3(vfp_tosls, i32, f32, i32, env)
-DEF_HELPER_3(vfp_touhs, i32, f32, i32, env)
-DEF_HELPER_3(vfp_touls, i32, f32, i32, env)
-DEF_HELPER_3(vfp_toshd, i64, f64, i32, env)
-DEF_HELPER_3(vfp_tosld, i64, f64, i32, env)
-DEF_HELPER_3(vfp_touhd, i64, f64, i32, env)
-DEF_HELPER_3(vfp_tould, i64, f64, i32, env)
-DEF_HELPER_3(vfp_shtos, f32, i32, i32, env)
-DEF_HELPER_3(vfp_sltos, f32, i32, i32, env)
-DEF_HELPER_3(vfp_uhtos, f32, i32, i32, env)
-DEF_HELPER_3(vfp_ultos, f32, i32, i32, env)
-DEF_HELPER_3(vfp_shtod, f64, i64, i32, env)
-DEF_HELPER_3(vfp_sltod, f64, i64, i32, env)
-DEF_HELPER_3(vfp_uhtod, f64, i64, i32, env)
-DEF_HELPER_3(vfp_ultod, f64, i64, i32, env)
+DEF_HELPER_1(vfp_uitos, f32, i32)
+DEF_HELPER_1(vfp_uitod, f64, i32)
+DEF_HELPER_1(vfp_sitos, f32, i32)
+DEF_HELPER_1(vfp_sitod, f64, i32)
+
+DEF_HELPER_1(vfp_touis, i32, f32)
+DEF_HELPER_1(vfp_touid, i32, f64)
+DEF_HELPER_1(vfp_touizs, i32, f32)
+DEF_HELPER_1(vfp_touizd, i32, f64)
+DEF_HELPER_1(vfp_tosis, i32, f32)
+DEF_HELPER_1(vfp_tosid, i32, f64)
+DEF_HELPER_1(vfp_tosizs, i32, f32)
+DEF_HELPER_1(vfp_tosizd, i32, f64)
+
+DEF_HELPER_2(vfp_toshs, i32, f32, i32)
+DEF_HELPER_2(vfp_tosls, i32, f32, i32)
+DEF_HELPER_2(vfp_touhs, i32, f32, i32)
+DEF_HELPER_2(vfp_touls, i32, f32, i32)
+DEF_HELPER_2(vfp_toshd, i64, f64, i32)
+DEF_HELPER_2(vfp_tosld, i64, f64, i32)
+DEF_HELPER_2(vfp_touhd, i64, f64, i32)
+DEF_HELPER_2(vfp_tould, i64, f64, i32)
+DEF_HELPER_2(vfp_shtos, f32, i32, i32)
+DEF_HELPER_2(vfp_sltos, f32, i32, i32)
+DEF_HELPER_2(vfp_uhtos, f32, i32, i32)
+DEF_HELPER_2(vfp_ultos, f32, i32, i32)
+DEF_HELPER_2(vfp_shtod, f64, i64, i32)
+DEF_HELPER_2(vfp_sltod, f64, i64, i32)
+DEF_HELPER_2(vfp_uhtod, f64, i64, i32)
+DEF_HELPER_2(vfp_ultod, f64, i64, i32)
 
 DEF_HELPER_2(vfp_fcvt_f16_to_f32, f32, i32, env)
 DEF_HELPER_2(vfp_fcvt_f32_to_f16, i32, f32, env)
diff --git a/target-arm/op_helper.c b/target-arm/op_helper.c
index 8334fbc..1afea43 100644
--- a/target-arm/op_helper.c
+++ b/target-arm/op_helper.c
@@ -424,3 +424,65 @@ uint32_t HELPER(ror_cc)(uint32_t x, uint32_t i)
         return ((uint32_t)x >> shift) | (x << (32 - shift));
     }
 }
+
+/* Integer to float and float to integer conversions */
+
+#define CONV_ITOF(name, fsz, sign, fpst) \
+float##fsz HELPER(name)(uint32_t x) \
+{ \
+    return sign##int32_to_##float##fsz(x, fpst); \
+}
+
+#define CONV_FTOI(name, fsz, sign, fpst, round) \
+uint32_t HELPER(name)(float##fsz x) \
+{ \
+    if (float##fsz##_is_any_nan(x)) { \
+        float_raise(float_flag_invalid, fpst); \
+        return 0; \
+    } \
+    return float##fsz##_to_##sign##int32##round(x, fpst); \
+}
+
+#define VFP_CONVS(name, p, fsz, sign) \
+CONV_ITOF(vfp_##name##to##p, fsz, sign, &env->vfp.fp_status) \
+CONV_FTOI(vfp_##to##name##p, fsz, sign, &env->vfp.fp_status, ) \
+CONV_FTOI(vfp_##to##name##z##p, fsz, sign, &env->vfp.fp_status, _round_to_zero)
+
+VFP_CONVS(si, s, 32, )
+VFP_CONVS(si, d, 64, )
+VFP_CONVS(ui, s, 32, u)
+VFP_CONVS(ui, d, 64, u)
+
+#undef CONV_ITOF
+#undef CONV_FTOI
+#undef VFP_CONVS
+
+/* VFP3 fixed point conversion. */
+#define VFP_CONV_FIX(pfx, name, p, fsz, itype, sign, status) \
+float##fsz HELPER(pfx##name##to##p)(uint##fsz##_t x, uint32_t shift) \
+{ \
+    float##fsz tmp; \
+    tmp = sign##int32_to_##float##fsz((itype##_t)x, status); \
+    return float##fsz##_scalbn(tmp, -(int)shift, status); \
+} \
+uint##fsz##_t HELPER(pfx##to##name##p)(float##fsz x, uint32_t shift) \
+{ \
+    float##fsz tmp; \
+    if (float##fsz##_is_any_nan(x)) { \
+        float_raise(float_flag_invalid, status); \
+        return 0; \
+    } \
+    tmp = float##fsz##_scalbn(x, shift, status); \
+    return float##fsz##_to_##itype##_round_to_zero(tmp, status); \
+}
+
+VFP_CONV_FIX(vfp_, sh, d, 64, int16, , &env->vfp.fp_status)
+VFP_CONV_FIX(vfp_, sl, d, 64, int32, , &env->vfp.fp_status)
+VFP_CONV_FIX(vfp_, uh, d, 64, uint16, u, &env->vfp.fp_status)
+VFP_CONV_FIX(vfp_, ul, d, 64, uint32, u, &env->vfp.fp_status)
+VFP_CONV_FIX(vfp_, sh, s, 32, int16, , &env->vfp.fp_status)
+VFP_CONV_FIX(vfp_, sl, s, 32, int32, , &env->vfp.fp_status)
+VFP_CONV_FIX(vfp_, uh, s, 32, uint16, u, &env->vfp.fp_status)
+VFP_CONV_FIX(vfp_, ul, s, 32, uint32, u, &env->vfp.fp_status)
+
+#undef VFP_CONV_FIX
diff --git a/target-arm/translate.c b/target-arm/translate.c
index a1af436..195cf30 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -959,60 +959,67 @@ static inline void gen_vfp_F1_ld0(int dp)
 
 static inline void gen_vfp_uito(int dp)
 {
-    if (dp)
-        gen_helper_vfp_uitod(cpu_F0d, cpu_F0s, cpu_env);
-    else
-        gen_helper_vfp_uitos(cpu_F0s, cpu_F0s, cpu_env);
+    if (dp) {
+        gen_helper_vfp_uitod(cpu_F0d, cpu_F0s);
+    } else {
+        gen_helper_vfp_uitos(cpu_F0s, cpu_F0s);
+    }
 }
 
 static inline void gen_vfp_sito(int dp)
 {
-    if (dp)
-        gen_helper_vfp_sitod(cpu_F0d, cpu_F0s, cpu_env);
-    else
-        gen_helper_vfp_sitos(cpu_F0s, cpu_F0s, cpu_env);
+    if (dp) {
+        gen_helper_vfp_sitod(cpu_F0d, cpu_F0s);
+    } else {
+        gen_helper_vfp_sitos(cpu_F0s, cpu_F0s);
+    }
 }
 
 static inline void gen_vfp_toui(int dp)
 {
-    if (dp)
-        gen_helper_vfp_touid(cpu_F0s, cpu_F0d, cpu_env);
-    else
-        gen_helper_vfp_touis(cpu_F0s, cpu_F0s, cpu_env);
+    if (dp) {
+        gen_helper_vfp_touid(cpu_F0s, cpu_F0d);
+    } else {
+        gen_helper_vfp_touis(cpu_F0s, cpu_F0s);
+    }
 }
 
 static inline void gen_vfp_touiz(int dp)
 {
-    if (dp)
-        gen_helper_vfp_touizd(cpu_F0s, cpu_F0d, cpu_env);
-    else
-        gen_helper_vfp_touizs(cpu_F0s, cpu_F0s, cpu_env);
+    if (dp) {
+        gen_helper_vfp_touizd(cpu_F0s, cpu_F0d);
+    } else {
+        gen_helper_vfp_touizs(cpu_F0s, cpu_F0s);
+    }
 }
 
 static inline void gen_vfp_tosi(int dp)
 {
-    if (dp)
-        gen_helper_vfp_tosid(cpu_F0s, cpu_F0d, cpu_env);
-    else
-        gen_helper_vfp_tosis(cpu_F0s, cpu_F0s, cpu_env);
+    if (dp) {
+        gen_helper_vfp_tosid(cpu_F0s, cpu_F0d);
+    } else {
+        gen_helper_vfp_tosis(cpu_F0s, cpu_F0s);
+    }
 }
 
 static inline void gen_vfp_tosiz(int dp)
 {
-    if (dp)
-        gen_helper_vfp_tosizd(cpu_F0s, cpu_F0d, cpu_env);
-    else
-        gen_helper_vfp_tosizs(cpu_F0s, cpu_F0s, cpu_env);
+    if (dp) {
+        gen_helper_vfp_tosizd(cpu_F0s, cpu_F0d);
+    } else {
+        gen_helper_vfp_tosizs(cpu_F0s, cpu_F0s);
+    }
 }
 
 #define VFP_GEN_FIX(name) \
 static inline void gen_vfp_##name(int dp, int shift) \
 { \
     TCGv tmp_shift = tcg_const_i32(shift); \
-    if (dp) \
-        gen_helper_vfp_##name##d(cpu_F0d, cpu_F0d, tmp_shift, cpu_env);\
-    else \
-        gen_helper_vfp_##name##s(cpu_F0s, cpu_F0s, tmp_shift, cpu_env);\
+    if (dp) { \
+        gen_helper_vfp_##name##d(cpu_F0d, cpu_F0d, tmp_shift); \
+    } else { \
+        gen_helper_vfp_##name##s(cpu_F0s, cpu_F0s, tmp_shift); \
+    } \
     tcg_temp_free_i32(tmp_shift); \
 }
 VFP_GEN_FIX(tosh)
-- 
1.7.1

* Re: [Qemu-devel] [PATCH 4/7] target-arm: Refactor int-float conversions
From: Paul Brook @ 2011-05-06 14:09 UTC
To: qemu-devel; +Cc: Peter Maydell, patches

> The Neon versions of int-float conversions need their own helper routines
> because they must use the "standard FPSCR" rather than the default one.
> Refactor the helper functions to make it easy to add the neon versions.
> While we're touching the code, move the helpers to op_helper.c so that
> we can use the global env variable rather than passing it as a parameter.

IMO this is going in the wrong direction. We should be aiming for fewer
implicit accesses to cpu state, not more.

Maybe better would be to explicitly pass a pointer to the fp status. That way
you don't even need separate VFP and NEON variants of these routines.

Paul

* Re: [Qemu-devel] [PATCH 4/7] target-arm: Refactor int-float conversions
From: Peter Maydell @ 2011-05-06 14:42 UTC
To: Paul Brook; +Cc: Blue Swirl, qemu-devel, Aurelien Jarno, patches

On 6 May 2011 15:09, Paul Brook <paul@codesourcery.com> wrote:
>> The Neon versions of int-float conversions need their own helper routines
>> because they must use the "standard FPSCR" rather than the default one.
>> Refactor the helper functions to make it easy to add the neon versions.
>> While we're touching the code, move the helpers to op_helper.c so that
>> we can use the global env variable rather than passing it as a parameter.
>
> IMO this is going in the wrong direction. We should be aiming for fewer
> implicit accesses to cpu state, not more.

I don't have a very strong feeling about this personally; I've just
been going in the direction suggested by past discussions, e.g.
http://lists.gnu.org/archive/html/qemu-devel/2011-04/msg00183.html

> Maybe better would be to explicitly pass a pointer to the fp status. That
> way you don't even need separate VFP and NEON variants of these routines.

If you were otherwise going to pass in a CPUState pointer then just
passing the pointer to the fp_status is probably better, yes.

-- PMM
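
Editor's sketch (not part of the thread): Paul Brook's suggestion is that a helper which receives the status block as an explicit argument can serve both the VFP and the Neon cases, because the caller decides which status to hand over. The following standalone illustration shows that idea under simplifying assumptions. `mock_status`, `mock_vfp` and `mock_to_int32` are hypothetical, and the conversion simply truncates with a C cast instead of calling softfloat.

```c
#include <stdint.h>

/* Hypothetical status block holding only cumulative flags. */
typedef struct {
    uint8_t exception_flags;
} mock_status;

/* Hypothetical CPU state fragment with the two status blocks the
 * thread talks about: the normal one and the "standard FPSCR" one. */
typedef struct {
    mock_status fp_status;
    mock_status standard_fp_status;
} mock_vfp;

enum { MOCK_FLAG_INVALID = 1 << 0 };

/* One helper for both callers: NaN raises Invalid in whichever
 * status block was passed and returns 0, anything else truncates
 * toward zero. Out-of-range inputs are ignored in this sketch. */
static int32_t mock_to_int32(float x, mock_status *fpst)
{
    if (x != x) {                      /* true only for NaN */
        fpst->exception_flags |= MOCK_FLAG_INVALID;
        return 0;
    }
    return (int32_t)x;
}
```

A VFP-style caller would pass `&vfp.fp_status` and a Neon-style caller `&vfp.standard_fp_status`. The helper body is identical in both cases, so no implicit access to global CPU state is needed.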

* Re: [Qemu-devel] [PATCH 4/7] target-arm: Refactor int-float conversions
From: Blue Swirl @ 2011-05-06 15:30 UTC
To: Paul Brook; +Cc: Peter Maydell, qemu-devel, patches

On Fri, May 6, 2011 at 5:09 PM, Paul Brook <paul@codesourcery.com> wrote:
>> The Neon versions of int-float conversions need their own helper routines
>> because they must use the "standard FPSCR" rather than the default one.
>> Refactor the helper functions to make it easy to add the neon versions.
>> While we're touching the code, move the helpers to op_helper.c so that
>> we can use the global env variable rather than passing it as a parameter.
>
> IMO this is going in the wrong direction. We should be aiming for fewer
> implicit accesses to cpu state, not more.

Performance-wise, the global env variable is faster and the register is
always available. Do you mean that we should aim to get rid of the special
status of global env, so that if no op uses it, it could be discarded
to free a register?

> Maybe better would be to explicitly pass a pointer to the fp status. That
> way you don't even need separate VFP and NEON variants of these routines.

It would be nice to have generic float functions callable directly as
TCG helpers.

* Re: [Qemu-devel] [PATCH 4/7] target-arm: Refactor int-float conversions
From: Paul Brook @ 2011-05-06 16:38 UTC
To: Blue Swirl; +Cc: Peter Maydell, qemu-devel, patches

> On Fri, May 6, 2011 at 5:09 PM, Paul Brook <paul@codesourcery.com> wrote:
> >> The Neon versions of int-float conversions need their own helper
> >> routines because they must use the "standard FPSCR" rather than the
> >> default one. Refactor the helper functions to make it easy to add the
> >> neon versions. While we're touching the code, move the helpers to
> >> op_helper.c so that we can use the global env variable rather than
> >> passing it as a parameter.
> >
> > IMO this is going in the wrong direction. We should be aiming for fewer
> > implicit accesses to cpu state, not more.
>
> Performance-wise, the global env variable is faster and the register is
> always available.

Not entirely true. Reserving the global env variable has significant cost,
especially on hosts with limited register sets (i.e. x86). It's also a rather
fragile hack. There's a fairly long history of nasty hacks and things that
just don't work in this context. For example you can't reliably include
stdio.h from these files, and they often break if you turn optimization off,
which makes debugging much harder than it should be.

> Do you mean that we should aim to get rid of the special
> status of global env, so that if no op uses it, it could be discarded
> to free a register?

That's already true for most of qemu. IMO more interesting is being able to
tell TCG that helpers don't mess with cpu state in opaque ways. In theory
it's already possible to do that manually. However that tends to be somewhat
fragile, relying on careful maintenance and code auditing, with mistakes
triggering subtle hard-to-debug failures. Much better to avoid the implicit
global interface and make all helper arguments explicit.

> > Maybe better would be to explicitly pass a pointer to the fp status. That
> > way you don't even need separate VFP and NEON variants of these
> > routines.
>
> It would be nice to have generic float functions callable directly as
> TCG helpers.

Possibly. I'd have to look quite a bit closer to determine whether exposing
the generic float functions is useful in practice, or if you still end up
needing wrappers in most cases for most targets. Adding "native" floating
point support to the TCG interface is also a possibility. In practice this
might end up as wrappers around helper functions, but might provide a nicer
programming interface.

Paul

* Re: [Qemu-devel] [PATCH 4/7] target-arm: Refactor int-float conversions
From: Blue Swirl @ 2011-05-08 10:32 UTC
To: Paul Brook; +Cc: Peter Maydell, qemu-devel, patches

On Fri, May 6, 2011 at 7:38 PM, Paul Brook <paul@codesourcery.com> wrote:
>> On Fri, May 6, 2011 at 5:09 PM, Paul Brook <paul@codesourcery.com> wrote:
>> >> The Neon versions of int-float conversions need their own helper
>> >> routines because they must use the "standard FPSCR" rather than the
>> >> default one. Refactor the helper functions to make it easy to add the
>> >> neon versions. While we're touching the code, move the helpers to
>> >> op_helper.c so that we can use the global env variable rather than
>> >> passing it as a parameter.
>> >
>> > IMO this is going in the wrong direction. We should be aiming for fewer
>> > implicit accesses to cpu state, not more.
>>
>> Performance-wise, the global env variable is faster and the register is
>> always available.
>
> Not entirely true. Reserving the global env variable has significant cost,
> especially on hosts with limited register sets (i.e. x86). It's also a rather
> fragile hack. There's a fairly long history of nasty hacks and things that
> just don't work in this context. For example you can't reliably include
> stdio.h from these files, and they often break if you turn optimization off,
> which makes debugging much harder than it should be.

Even if we don't reserve the register, in many cases a corresponding
pointer to CPUState will be needed. But there will still be the
advantage that this temporary pointer can be discarded while the
globally reserved register is reserved forever.

>> Do you mean that we should aim to get rid of the special
>> status of global env, so that if no op uses it, it could be discarded
>> to free a register?
>
> That's already true for most of qemu. IMO more interesting is being able to
> tell TCG that helpers don't mess with cpu state in opaque ways. In theory
> it's already possible to do that manually. However that tends to be somewhat
> fragile, relying on careful maintenance and code auditing, with mistakes
> triggering subtle hard-to-debug failures. Much better to avoid the implicit
> global interface and make all helper arguments explicit.

OK. This will be a major refactoring, but given the expected
performance increase, it should be done.

>> > Maybe better would be to explicitly pass a pointer to the fp status. That
>> > way you don't even need separate VFP and NEON variants of these
>> > routines.
>>
>> It would be nice to have generic float functions callable directly as
>> TCG helpers.
>
> Possibly. I'd have to look quite a bit closer to determine whether exposing
> the generic float functions is useful in practice, or if you still end up
> needing wrappers in most cases for most targets. Adding "native" floating
> point support to the TCG interface is also a possibility. In practice this
> might end up as wrappers around helper functions, but might provide a nicer
> programming interface.
>
> Paul
* Re: [Qemu-devel] [PATCH 4/7] target-arm: Refactor int-float conversions
  2011-05-08 10:32           ` Blue Swirl
@ 2011-05-14 22:38             ` Aurelien Jarno
  0 siblings, 0 replies; 15+ messages in thread
From: Aurelien Jarno @ 2011-05-14 22:38 UTC (permalink / raw)
  To: Blue Swirl; +Cc: Peter Maydell, patches, Paul Brook, qemu-devel

On Sun, May 08, 2011 at 01:32:34PM +0300, Blue Swirl wrote:
> On Fri, May 6, 2011 at 7:38 PM, Paul Brook <paul@codesourcery.com> wrote:
> >> On Fri, May 6, 2011 at 5:09 PM, Paul Brook <paul@codesourcery.com> wrote:
> >> >> The Neon versions of int-float conversions need their own helper
> >> >> routines because they must use the "standard FPSCR" rather than the
> >> >> default one. Refactor the helper functions to make it easy to add the
> >> >> neon versions. While we're touching the code, move the helpers to
> >> >> op_helper.c so that we can use the global env variable rather than
> >> >> passing it as a parameter.
> >> >
> >> > IMO this is going in the wrong direction. We should be aiming for less
> >> > implicit accesses to cpu state, not more.
> >>
> >> Performance-wise, the global env variable is faster and the register is
> >> always available.
> >
> > Not entirely true. Reserving the global env variable has significant cost,
> > especially on hosts with limited register sets (i.e. x86). It's also a rather
> > fragile hack. There's a fairly long history of nasty hacks and things that
> > just don't work in this context. For example you can't reliably include
> > stdio.h from these files, and they often break if you turn optimization off,
> > which makes debugging much harder than it should be.
>
> Even if we don't reserve the register, in many cases a corresponding
> pointer to CPUState will be needed. But there will still be the
> advantage that this temporary pointer can be discarded while the
> globally reserved register is reserved forever.
>
> >> Do you mean that we should aim to get rid of special
> >> status of global env, so that if no op uses it, it could be discarded
> >> to free a register?
> >
> > That's already true for most of qemu. IMO more interesting is being able to
> > tell TCG that helpers don't mess with cpu state in opaque ways. In theory
> > it's already possible to do that manually. However that tends to be somewhat
> > fragile, relying on careful maintenance and code auditing, with mistakes
> > triggering subtle hard-to-debug failures. Much better to avoid the implicit
> > global interface and make all helper arguments explicit.
>
> OK. This will be a major refactoring, but given the expected
> performance increase, it should be done.
>

We might want to do it from the cleanliness point of view, but I really
doubt we should expect a performance increase from this (actually I think
it will be the contrary).

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply	[flat|nested] 15+ messages in thread
* [Qemu-devel] [PATCH 5/7] target-arm: Add separate Neon float-int conversion helpers
  2011-05-06 12:48 [Qemu-devel] [PATCH 0/7] target-arm: Fix bugs in fp exception flag setting Peter Maydell
                   ` (3 preceding siblings ...)
  2011-05-06 12:48 ` [Qemu-devel] [PATCH 4/7] target-arm: Refactor int-float conversions Peter Maydell
@ 2011-05-06 12:48 ` Peter Maydell
  2011-05-06 12:48 ` [Qemu-devel] [PATCH 6/7] softfloat: Add new flag for when denormal result is flushed to zero Peter Maydell
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Peter Maydell @ 2011-05-06 12:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: patches

Add the Neon-specific float-int conversion helper functions which use
the standard FPSCR value rather than the VFP FPSCR.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target-arm/helper.h    |   10 ++++++++++
 target-arm/op_helper.c |   12 ++++++++++++
 target-arm/translate.c |   29 +++++++++++++++++------------
 3 files changed, 39 insertions(+), 12 deletions(-)

diff --git a/target-arm/helper.h b/target-arm/helper.h
index 2c54d5e..1b4005a 100644
--- a/target-arm/helper.h
+++ b/target-arm/helper.h
@@ -127,6 +127,16 @@ DEF_HELPER_2(vfp_sltod, f64, i64, i32)
 DEF_HELPER_2(vfp_uhtod, f64, i64, i32)
 DEF_HELPER_2(vfp_ultod, f64, i64, i32)
 
+DEF_HELPER_1(neon_sitos, f32, i32)
+DEF_HELPER_1(neon_uitos, f32, i32)
+DEF_HELPER_1(neon_tosizs, i32, f32)
+DEF_HELPER_1(neon_touizs, i32, f32)
+
+DEF_HELPER_2(neon_ultos, f32, i32, i32);
+DEF_HELPER_2(neon_sltos, f32, i32, i32);
+DEF_HELPER_2(neon_touls, i32, f32, i32);
+DEF_HELPER_2(neon_tosls, i32, f32, i32);
+
 DEF_HELPER_2(vfp_fcvt_f16_to_f32, f32, i32, env)
 DEF_HELPER_2(vfp_fcvt_f32_to_f16, i32, f32, env)
 DEF_HELPER_2(neon_fcvt_f16_to_f32, f32, i32, env)
diff --git a/target-arm/op_helper.c b/target-arm/op_helper.c
index 1afea43..3998d9c 100644
--- a/target-arm/op_helper.c
+++ b/target-arm/op_helper.c
@@ -448,14 +448,23 @@ CONV_ITOF(vfp_##name##to##p, fsz, sign, &env->vfp.fp_status) \
 CONV_FTOI(vfp_##to##name##p, fsz, sign, &env->vfp.fp_status, ) \
 CONV_FTOI(vfp_##to##name##z##p, fsz, sign, &env->vfp.fp_status, _round_to_zero)
 
+#define NEON_CONVS(name, p, fsz, sign) \
+CONV_ITOF(neon_##name##to##p, fsz, sign, &env->vfp.standard_fp_status) \
+CONV_FTOI(neon_##to##name##z##p, fsz, sign, &env->vfp.standard_fp_status, \
+          _round_to_zero)
+
 VFP_CONVS(si, s, 32, )
 VFP_CONVS(si, d, 64, )
 VFP_CONVS(ui, s, 32, u)
 VFP_CONVS(ui, d, 64, u)
 
+NEON_CONVS(si, s, 32, )
+NEON_CONVS(ui, s, 32, u)
+
 #undef CONV_ITOF
 #undef CONV_FTOI
 #undef VFP_CONVS
+#undef NEON_CONVS
 
 /* VFP3 fixed point conversion. */
 #define VFP_CONV_FIX(pfx, name, p, fsz, itype, sign, status) \
@@ -485,4 +494,7 @@ VFP_CONV_FIX(vfp_, sl, s, 32, int32, , &env->vfp.fp_status)
 VFP_CONV_FIX(vfp_, uh, s, 32, uint16, u, &env->vfp.fp_status)
 VFP_CONV_FIX(vfp_, ul, s, 32, uint32, u, &env->vfp.fp_status)
 
+VFP_CONV_FIX(neon_, sl, s, 32, int32, , &env->vfp.standard_fp_status)
+VFP_CONV_FIX(neon_, ul, s, 32, uint32, u, &env->vfp.standard_fp_status)
+
 #undef VFP_CONV_FIX
diff --git a/target-arm/translate.c b/target-arm/translate.c
index 195cf30..10592a5 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -5220,6 +5220,7 @@ static int disas_neon_data_insn(CPUState * env, DisasContext *s, uint32_t insn)
                 }
             } else if (op >= 14) {
                 /* VCVT fixed-point. */
+                TCGv tmp_shift;
                 if (!(insn & (1 << 21)) || (q && ((rd | rm) & 1))) {
                     return 1;
                 }
@@ -5227,21 +5228,25 @@ static int disas_neon_data_insn(CPUState * env, DisasContext *s, uint32_t insn)
                  * hence this 32-shift where the ARM ARM has 64-imm6.
                  */
                 shift = 32 - shift;
+                tmp_shift = tcg_const_i32(shift);
                 for (pass = 0; pass < (q ? 4 : 2); pass++) {
                     tcg_gen_ld_f32(cpu_F0s, cpu_env, neon_reg_offset(rm, pass));
                     if (!(op & 1)) {
-                        if (u)
-                            gen_vfp_ulto(0, shift);
-                        else
-                            gen_vfp_slto(0, shift);
+                        if (u) {
+                            gen_helper_neon_ultos(cpu_F0s, cpu_F0s, tmp_shift);
+                        } else {
+                            gen_helper_neon_sltos(cpu_F0s, cpu_F0s, tmp_shift);
+                        }
                     } else {
-                        if (u)
-                            gen_vfp_toul(0, shift);
-                        else
-                            gen_vfp_tosl(0, shift);
+                        if (u) {
+                            gen_helper_neon_touls(cpu_F0s, cpu_F0s, tmp_shift);
+                        } else {
+                            gen_helper_neon_tosls(cpu_F0s, cpu_F0s, tmp_shift);
+                        }
                     }
                     tcg_gen_st_f32(cpu_F0s, cpu_env, neon_reg_offset(rd, pass));
                 }
+                tcg_temp_free_i32(tmp_shift);
             } else {
                 return 1;
             }
@@ -6051,16 +6056,16 @@ static int disas_neon_data_insn(CPUState * env, DisasContext *s, uint32_t insn)
                     gen_helper_rsqrte_f32(cpu_F0s, cpu_F0s, cpu_env);
                     break;
                 case NEON_2RM_VCVT_FS: /* VCVT.F32.S32 */
-                    gen_vfp_sito(0);
+                    gen_helper_neon_sitos(cpu_F0s, cpu_F0s);
                     break;
                 case NEON_2RM_VCVT_FU: /* VCVT.F32.U32 */
-                    gen_vfp_uito(0);
+                    gen_helper_neon_uitos(cpu_F0s, cpu_F0s);
                     break;
                 case NEON_2RM_VCVT_SF: /* VCVT.S32.F32 */
-                    gen_vfp_tosiz(0);
+                    gen_helper_neon_tosizs(cpu_F0s, cpu_F0s);
                     break;
                 case NEON_2RM_VCVT_UF: /* VCVT.U32.F32 */
-                    gen_vfp_touiz(0);
+                    gen_helper_neon_touizs(cpu_F0s, cpu_F0s);
                     break;
                 default:
                     /* Reserved op values were caught by the
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread
* [Qemu-devel] [PATCH 6/7] softfloat: Add new flag for when denormal result is flushed to zero
  2011-05-06 12:48 [Qemu-devel] [PATCH 0/7] target-arm: Fix bugs in fp exception flag setting Peter Maydell
                   ` (4 preceding siblings ...)
  2011-05-06 12:48 ` [Qemu-devel] [PATCH 5/7] target-arm: Add separate Neon float-int conversion helpers Peter Maydell
@ 2011-05-06 12:48 ` Peter Maydell
  2011-05-06 12:48 ` [Qemu-devel] [PATCH 7/7] target-arm: Signal Underflow when denormal flushed to zero on output Peter Maydell
  2011-05-17 18:19 ` [Qemu-devel] [PATCH 0/7] target-arm: Fix bugs in fp exception flag setting Peter Maydell
  7 siblings, 0 replies; 15+ messages in thread
From: Peter Maydell @ 2011-05-06 12:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: patches

Add a new float_flag_output_denormal which is set when the result
of a floating point operation would be denormal but is flushed to
zero because we are in flush_to_zero mode. This is necessary because
some architectures signal this condition as an underflow and others
signal it as an inexact result.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 fpu/softfloat.c |   41 ++++++++++++++++++++++++++++++++++-------
 fpu/softfloat.h |    3 ++-
 2 files changed, 36 insertions(+), 8 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index baba1dc..e3cd8a7 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -341,7 +341,10 @@ static float32 roundAndPackFloat32( flag zSign, int16 zExp, uint32_t zSig STATUS
             return packFloat32( zSign, 0xFF, - ( roundIncrement == 0 ));
         }
         if ( zExp < 0 ) {
-            if ( STATUS(flush_to_zero) ) return packFloat32( zSign, 0, 0 );
+            if (STATUS(flush_to_zero)) {
+                float_raise(float_flag_output_denormal STATUS_VAR);
+                return packFloat32(zSign, 0, 0);
+            }
             isTiny =
                    ( STATUS(float_detect_tininess) == float_tininess_before_rounding )
                 || ( zExp < -1 )
@@ -520,7 +523,10 @@ static float64 roundAndPackFloat64( flag zSign, int16 zExp, uint64_t zSig STATUS
             return packFloat64( zSign, 0x7FF, - ( roundIncrement == 0 ));
         }
         if ( zExp < 0 ) {
-            if ( STATUS(flush_to_zero) ) return packFloat64( zSign, 0, 0 );
+            if (STATUS(flush_to_zero)) {
+                float_raise(float_flag_output_denormal STATUS_VAR);
+                return packFloat64(zSign, 0, 0);
+            }
             isTiny =
                    ( STATUS(float_detect_tininess) == float_tininess_before_rounding )
                 || ( zExp < -1 )
@@ -699,7 +705,10 @@ static floatx80
             goto overflow;
         }
         if ( zExp <= 0 ) {
-            if ( STATUS(flush_to_zero) ) return packFloatx80( zSign, 0, 0 );
+            if (STATUS(flush_to_zero)) {
+                float_raise(float_flag_output_denormal STATUS_VAR);
+                return packFloatx80(zSign, 0, 0);
+            }
             isTiny =
                    ( STATUS(float_detect_tininess) == float_tininess_before_rounding )
                 || ( zExp < 0 )
@@ -1030,7 +1039,10 @@ static float128
             return packFloat128( zSign, 0x7FFF, 0, 0 );
         }
         if ( zExp < 0 ) {
-            if ( STATUS(flush_to_zero) ) return packFloat128( zSign, 0, 0, 0 );
+            if (STATUS(flush_to_zero)) {
+                float_raise(float_flag_output_denormal STATUS_VAR);
+                return packFloat128(zSign, 0, 0, 0);
+            }
             isTiny =
                    ( STATUS(float_detect_tininess) == float_tininess_before_rounding )
                 || ( zExp < -1 )
@@ -1761,7 +1773,12 @@ static float32 addFloat32Sigs( float32 a, float32 b, flag zSign STATUS_PARAM)
             return a;
         }
         if ( aExp == 0 ) {
-            if ( STATUS(flush_to_zero) ) return packFloat32( zSign, 0, 0 );
+            if (STATUS(flush_to_zero)) {
+                if (aSig | bSig) {
+                    float_raise(float_flag_output_denormal STATUS_VAR);
+                }
+                return packFloat32(zSign, 0, 0);
+            }
             return packFloat32( zSign, 0, ( aSig + bSig )>>6 );
         }
         zSig = 0x40000000 + aSig + bSig;
@@ -3120,7 +3137,12 @@ static float64 addFloat64Sigs( float64 a, float64 b, flag zSign STATUS_PARAM )
             return a;
         }
         if ( aExp == 0 ) {
-            if ( STATUS(flush_to_zero) ) return packFloat64( zSign, 0, 0 );
+            if (STATUS(flush_to_zero)) {
+                if (aSig | bSig) {
+                    float_raise(float_flag_output_denormal STATUS_VAR);
+                }
+                return packFloat64(zSign, 0, 0);
+            }
             return packFloat64( zSign, 0, ( aSig + bSig )>>9 );
         }
         zSig = LIT64( 0x4000000000000000 ) + aSig + bSig;
@@ -5282,7 +5304,12 @@ static float128 addFloat128Sigs( float128 a, float128 b, flag zSign STATUS_PARAM
         }
         add128( aSig0, aSig1, bSig0, bSig1, &zSig0, &zSig1 );
         if ( aExp == 0 ) {
-            if ( STATUS(flush_to_zero) ) return packFloat128( zSign, 0, 0, 0 );
+            if (STATUS(flush_to_zero)) {
+                if (zSig0 | zSig1) {
+                    float_raise(float_flag_output_denormal STATUS_VAR);
+                }
+                return packFloat128(zSign, 0, 0, 0);
+            }
             return packFloat128( zSign, 0, zSig0, zSig1 );
         }
         zSig2 = 0;
diff --git a/fpu/softfloat.h b/fpu/softfloat.h
index 5eff085..58c9b7b 100644
--- a/fpu/softfloat.h
+++ b/fpu/softfloat.h
@@ -193,7 +193,8 @@ enum {
     float_flag_overflow = 8,
     float_flag_underflow = 16,
     float_flag_inexact = 32,
-    float_flag_input_denormal = 64
+    float_flag_input_denormal = 64,
+    float_flag_output_denormal = 128
 };
 
 typedef struct float_status {
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread
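The behaviour this patch adds can be pictured with a small standalone model. The sketch below is an illustration only, not softfloat code: it inspects an already packed single-precision bit pattern, whereas the patch raises the flag inside the rounding and addition routines before packing. toy_status, toy_flush_output and TOY_FLAG_OUTPUT_DENORMAL are hypothetical names; the value 128 is the one the softfloat.h hunk assigns to float_flag_output_denormal.

```c
#include <stdint.h>

/* Hypothetical flag name, using the value from the softfloat.h hunk. */
enum { TOY_FLAG_OUTPUT_DENORMAL = 128 };

/* Toy status block: a mode switch plus cumulative exception flags. */
typedef struct {
    int flush_to_zero;
    uint8_t exception_flags;
} toy_status;

/* Raw IEEE 754 single-precision bits, in the style softfloat uses. */
typedef uint32_t toy_float32;

/* If the result is denormal and flush_to_zero is on, replace it with a
 * zero of the same sign and record that a denormal output was flushed.
 * A result that is already zero is returned without raising the flag,
 * in the spirit of the aSig | bSig check in the addition hunks. */
static toy_float32 toy_flush_output(toy_float32 f, toy_status *s)
{
    uint32_t exp = (f >> 23) & 0xFF;
    uint32_t frac = f & 0x007FFFFFu;

    if (s->flush_to_zero && exp == 0 && frac != 0) {
        s->exception_flags |= TOY_FLAG_OUTPUT_DENORMAL;
        return f & 0x80000000u;   /* keep only the sign bit */
    }
    return f;
}
```

Keeping the flushed-output condition in its own flag leaves it to each target to decide how to report it, which is the motivation the commit message gives.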
* [Qemu-devel] [PATCH 7/7] target-arm: Signal Underflow when denormal flushed to zero on output
  2011-05-06 12:48 [Qemu-devel] [PATCH 0/7] target-arm: Fix bugs in fp exception flag setting Peter Maydell
                   ` (5 preceding siblings ...)
  2011-05-06 12:48 ` [Qemu-devel] [PATCH 6/7] softfloat: Add new flag for when denormal result is flushed to zero Peter Maydell
@ 2011-05-06 12:48 ` Peter Maydell
  2011-05-17 18:19 ` [Qemu-devel] [PATCH 0/7] target-arm: Fix bugs in fp exception flag setting Peter Maydell
  7 siblings, 0 replies; 15+ messages in thread
From: Peter Maydell @ 2011-05-06 12:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: patches

On ARM the architecture mandates that when an output denormal is
flushed to zero we must set the FPSCR UFC (underflow) bit, so map
softfloat's float_flag_output_denormal accordingly.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target-arm/helper.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/target-arm/helper.c b/target-arm/helper.c
index de00468..149fc82 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -2355,7 +2355,7 @@ static inline int vfp_exceptbits_from_host(int host_bits)
         target_bits |= 2;
     if (host_bits & float_flag_overflow)
         target_bits |= 4;
-    if (host_bits & float_flag_underflow)
+    if (host_bits & (float_flag_underflow | float_flag_output_denormal))
         target_bits |= 8;
     if (host_bits & float_flag_inexact)
         target_bits |= 0x10;
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread
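The effect of this one-line change can be checked in isolation. The following is a cut-down sketch, not the real vfp_exceptbits_from_host: it covers only the flags visible in the hunks of patches 6 and 7, the TOY_FLAG_* names and toy_exceptbits_from_host are hypothetical, and the remaining FPSCR bits are deliberately left out.

```c
/* Hypothetical names; the values are those shown in the softfloat.h
 * hunk of patch 6. */
enum {
    TOY_FLAG_OVERFLOW        = 8,
    TOY_FLAG_UNDERFLOW       = 16,
    TOY_FLAG_INEXACT         = 32,
    TOY_FLAG_OUTPUT_DENORMAL = 128
};

/* Translate softfloat-style flag bits into FPSCR cumulative exception
 * bits, for the subset of flags visible in the patch 7 hunk. */
static int toy_exceptbits_from_host(int host_bits)
{
    int target_bits = 0;

    if (host_bits & TOY_FLAG_OVERFLOW) {
        target_bits |= 4;
    }
    /* A flushed output denormal is reported as underflow (UFC). */
    if (host_bits & (TOY_FLAG_UNDERFLOW | TOY_FLAG_OUTPUT_DENORMAL)) {
        target_bits |= 8;
    }
    if (host_bits & TOY_FLAG_INEXACT) {
        target_bits |= 0x10;
    }
    return target_bits;
}
```

In this model a status word carrying only the output-denormal flag and one carrying only the underflow flag both translate to the same UFC bit, which is the mapping the commit message describes for ARM.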
* Re: [Qemu-devel] [PATCH 0/7] target-arm: Fix bugs in fp exception flag setting
  2011-05-06 12:48 [Qemu-devel] [PATCH 0/7] target-arm: Fix bugs in fp exception flag setting Peter Maydell
                   ` (6 preceding siblings ...)
  2011-05-06 12:48 ` [Qemu-devel] [PATCH 7/7] target-arm: Signal Underflow when denormal flushed to zero on output Peter Maydell
@ 2011-05-17 18:19 ` Peter Maydell
  7 siblings, 0 replies; 15+ messages in thread
From: Peter Maydell @ 2011-05-17 18:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: patches

On 6 May 2011 13:48, Peter Maydell <peter.maydell@linaro.org> wrote:
> This patch series fixes a number of minor bugs in the ARM target where
> we were not correctly setting the cumulative exception flags in the
> FPSCR. It includes adding a new flag to softfloat indicating when a
> denormal result has been flushed to zero (as discussed previously on
> the list.)

> Peter Maydell (7):
>  target-arm: Don't set FP exceptions in recip, recip_sqrt estimate fns
>  target-arm: Signal InputDenormal for VRECPE, VRSQRTE, VRECPS, VRSQRTS
>  target-arm: Signal InvalidOp for Neon GE and GT compares of QNaN
>  target-arm: Refactor int-float conversions
>  target-arm: Add separate Neon float-int conversion helpers
>  softfloat: Add new flag for when denormal result is flushed to zero
>  target-arm: Signal Underflow when denormal flushed to zero on output

I'm redoing patches 4 and 5 based on review comments; does anybody
have any comments on 1, 2, 3, 6, 7? (If not, then they could all be applied
now, I guess -- 6 and 7 don't depend on 4 and 5. Otherwise I'll put
them all into my resend as-is.)

thanks
-- PMM

^ permalink raw reply	[flat|nested] 15+ messages in thread
end of thread, other threads:[~2011-05-17 18:19 UTC | newest]

Thread overview: 15+ messages
2011-05-06 12:48 [Qemu-devel] [PATCH 0/7] target-arm: Fix bugs in fp exception flag setting Peter Maydell
2011-05-06 12:48 ` [Qemu-devel] [PATCH 1/7] target-arm: Don't set FP exceptions in recip, recip_sqrt estimate fns Peter Maydell
2011-05-06 12:48 ` [Qemu-devel] [PATCH 2/7] target-arm: Signal InputDenormal for VRECPE, VRSQRTE, VRECPS, VRSQRTS Peter Maydell
2011-05-06 12:48 ` [Qemu-devel] [PATCH 3/7] target-arm: Signal InvalidOp for Neon GE and GT compares of QNaN Peter Maydell
2011-05-06 12:48 ` [Qemu-devel] [PATCH 4/7] target-arm: Refactor int-float conversions Peter Maydell
2011-05-06 14:09   ` Paul Brook
2011-05-06 14:42     ` Peter Maydell
2011-05-06 15:30     ` Blue Swirl
2011-05-06 16:38       ` Paul Brook
2011-05-08 10:32         ` Blue Swirl
2011-05-14 22:38           ` Aurelien Jarno
2011-05-06 12:48 ` [Qemu-devel] [PATCH 5/7] target-arm: Add separate Neon float-int conversion helpers Peter Maydell
2011-05-06 12:48 ` [Qemu-devel] [PATCH 6/7] softfloat: Add new flag for when denormal result is flushed to zero Peter Maydell
2011-05-06 12:48 ` [Qemu-devel] [PATCH 7/7] target-arm: Signal Underflow when denormal flushed to zero on output Peter Maydell
2011-05-17 18:19 ` [Qemu-devel] [PATCH 0/7] target-arm: Fix bugs in fp exception flag setting Peter Maydell