* [Qemu-devel] [PATCH v3 00/12] target/arm: tcg vector cleanups
From: Richard Henderson @ 2019-02-09 3:38 UTC
To: qemu-devel; +Cc: peter.maydell

Changes since v2:
  * Fix some representational issues with FPSCR.
  * Use host vector saturation for SQADD/UQADD.
    This requires changing the internal representation of FPSR.QC.
  * Fix a latent vector bug, noticed during the rest.

Correct RISU results depend on Mark C-A's patch from today,
"tcg/i386: fix unsigned vector saturating arithmetic",
which will be in my next tcg pull.


r~


Richard Henderson (12):
  target/arm: Rely on optimization within tcg_gen_gvec_or
  target/arm: Use vector minmax expanders for aarch64
  target/arm: Use vector minmax expanders for aarch32
  target/arm: Use tcg integer min/max primitives for neon
  target/arm: Remove neon min/max helpers
  target/arm: Fix vfp_gdb_get/set_reg vs FPSCR
  target/arm: Fix arm_cpu_dump_state vs FPSCR
  target/arm: Split out flags setting from vfp compares
  target/arm: Fix set of bits kept in xregs[ARM_VFP_FPSCR]
  target/arm: Split out FPSCR.QC to a vector field
  target/arm: Use vector operations for saturation
  target/arm: Add missing clear_tail calls

 target/arm/cpu.h           |   5 +-
 target/arm/helper.h        |  45 ++++++--
 target/arm/translate.h     |   4 +
 target/arm/helper.c        |  81 +++++++++-----
 target/arm/neon_helper.c   |  14 +--
 target/arm/translate-a64.c |  77 ++++++-------
 target/arm/translate-sve.c |   6 +-
 target/arm/translate.c     | 219 +++++++++++++++++++++++++++++--------
 target/arm/vec_helper.c    | 134 ++++++++++++++++++++++-
 9 files changed, 433 insertions(+), 152 deletions(-)

-- 
2.17.2
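
The saturation change mentioned in the cover letter (and spelled out in patch 11/12) rests on one observation: a lane saturated exactly when the saturating result differs from the plain wrapping result. The following is only a scalar sketch of that idea in portable C, not code from the series; `uqadd8` and `uqadd8_vec` are hypothetical names, and the real patches express the same comparison with TCG vector operations.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not QEMU code): unsigned saturating 8-bit add. */
static uint8_t uqadd8(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + b;
    return sum > UINT8_MAX ? UINT8_MAX : (uint8_t)sum;
}

/*
 * Saturating add over n byte lanes.  Returns true if any lane
 * saturated, which is detected by comparing the wrapped (modulo 256)
 * sum with the saturated sum, lane by lane.
 */
static bool uqadd8_vec(uint8_t *d, const uint8_t *a, const uint8_t *b,
                       size_t n)
{
    bool qc = false;

    for (size_t i = 0; i < n; i++) {
        uint8_t wrapped = (uint8_t)(a[i] + b[i]);
        uint8_t sat = uqadd8(a[i], b[i]);

        qc |= (wrapped != sat);   /* sticky: once set, stays set */
        d[i] = sat;
    }
    return qc;
}
```

A sum that lands exactly on the maximum (for example 200 + 55) is not reported as saturated, because the wrapped and saturated results agree.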
* [Qemu-devel] [PATCH v3 01/12] target/arm: Rely on optimization within tcg_gen_gvec_or 2019-02-09 3:38 [Qemu-devel] [PATCH v3 00/12] target/arm: tcg vector cleanups Richard Henderson @ 2019-02-09 3:38 ` Richard Henderson 2019-02-09 3:38 ` [Qemu-devel] [PATCH v3 02/12] target/arm: Use vector minmax expanders for aarch64 Richard Henderson ` (13 subsequent siblings) 14 siblings, 0 replies; 16+ messages in thread From: Richard Henderson @ 2019-02-09 3:38 UTC (permalink / raw) To: qemu-devel; +Cc: peter.maydell Since we're now handling a == b generically, we no longer need to do it by hand within target/arm/. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> --- target/arm/translate-a64.c | 6 +----- target/arm/translate-sve.c | 6 +----- target/arm/translate.c | 12 +++--------- 3 files changed, 5 insertions(+), 19 deletions(-) diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c index e002251ac6..a12bfac719 100644 --- a/target/arm/translate-a64.c +++ b/target/arm/translate-a64.c @@ -10648,11 +10648,7 @@ static void disas_simd_3same_logic(DisasContext *s, uint32_t insn) gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_andc, 0); return; case 2: /* ORR */ - if (rn == rm) { /* MOV */ - gen_gvec_fn2(s, is_q, rd, rn, tcg_gen_gvec_mov, 0); - } else { - gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_or, 0); - } + gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_or, 0); return; case 3: /* ORN */ gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_orc, 0); diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c index b15b615ceb..3a2eb51566 100644 --- a/target/arm/translate-sve.c +++ b/target/arm/translate-sve.c @@ -280,11 +280,7 @@ static bool trans_AND_zzz(DisasContext *s, arg_rrr_esz *a) static bool trans_ORR_zzz(DisasContext *s, arg_rrr_esz *a) { - if (a->rn == a->rm) { /* MOV */ - return do_mov_z(s, a->rd, a->rn); - } else { - return do_vector3_z(s, tcg_gen_gvec_or, 0, a->rd, a->rn, 
a->rm); - } + return do_vector3_z(s, tcg_gen_gvec_or, 0, a->rd, a->rn, a->rm); } static bool trans_EOR_zzz(DisasContext *s, arg_rrr_esz *a) diff --git a/target/arm/translate.c b/target/arm/translate.c index 66cf28c8cb..9d2dba7ed2 100644 --- a/target/arm/translate.c +++ b/target/arm/translate.c @@ -6294,15 +6294,9 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn) tcg_gen_gvec_andc(0, rd_ofs, rn_ofs, rm_ofs, vec_size, vec_size); break; - case 2: - if (rn == rm) { - /* VMOV */ - tcg_gen_gvec_mov(0, rd_ofs, rn_ofs, vec_size, vec_size); - } else { - /* VORR */ - tcg_gen_gvec_or(0, rd_ofs, rn_ofs, rm_ofs, - vec_size, vec_size); - } + case 2: /* VORR */ + tcg_gen_gvec_or(0, rd_ofs, rn_ofs, rm_ofs, + vec_size, vec_size); break; case 3: /* VORN */ tcg_gen_gvec_orc(0, rd_ofs, rn_ofs, rm_ofs, -- 2.17.2 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* [Qemu-devel] [PATCH v3 02/12] target/arm: Use vector minmax expanders for aarch64 2019-02-09 3:38 [Qemu-devel] [PATCH v3 00/12] target/arm: tcg vector cleanups Richard Henderson 2019-02-09 3:38 ` [Qemu-devel] [PATCH v3 01/12] target/arm: Rely on optimization within tcg_gen_gvec_or Richard Henderson @ 2019-02-09 3:38 ` Richard Henderson 2019-02-09 3:38 ` [Qemu-devel] [PATCH v3 03/12] target/arm: Use vector minmax expanders for aarch32 Richard Henderson ` (12 subsequent siblings) 14 siblings, 0 replies; 16+ messages in thread From: Richard Henderson @ 2019-02-09 3:38 UTC (permalink / raw) To: qemu-devel; +Cc: peter.maydell Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> --- target/arm/translate-a64.c | 35 ++++++++++++++--------------------- 1 file changed, 14 insertions(+), 21 deletions(-) diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c index a12bfac719..fd5ceb6613 100644 --- a/target/arm/translate-a64.c +++ b/target/arm/translate-a64.c @@ -10948,6 +10948,20 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn) } switch (opcode) { + case 0x0c: /* SMAX, UMAX */ + if (u) { + gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_umax, size); + } else { + gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_smax, size); + } + return; + case 0x0d: /* SMIN, UMIN */ + if (u) { + gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_umin, size); + } else { + gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_smin, size); + } + return; case 0x10: /* ADD, SUB */ if (u) { gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_sub, size); @@ -11109,27 +11123,6 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn) genenvfn = fns[size][u]; break; } - case 0xc: /* SMAX, UMAX */ - { - static NeonGenTwoOpFn * const fns[3][2] = { - { gen_helper_neon_max_s8, gen_helper_neon_max_u8 }, - { gen_helper_neon_max_s16, gen_helper_neon_max_u16 }, - { tcg_gen_smax_i32, tcg_gen_umax_i32 }, - }; - genfn = 
fns[size][u]; - break; - } - - case 0xd: /* SMIN, UMIN */ - { - static NeonGenTwoOpFn * const fns[3][2] = { - { gen_helper_neon_min_s8, gen_helper_neon_min_u8 }, - { gen_helper_neon_min_s16, gen_helper_neon_min_u16 }, - { tcg_gen_smin_i32, tcg_gen_umin_i32 }, - }; - genfn = fns[size][u]; - break; - } case 0xe: /* SABD, UABD */ case 0xf: /* SABA, UABA */ { -- 2.17.2 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* [Qemu-devel] [PATCH v3 03/12] target/arm: Use vector minmax expanders for aarch32 2019-02-09 3:38 [Qemu-devel] [PATCH v3 00/12] target/arm: tcg vector cleanups Richard Henderson 2019-02-09 3:38 ` [Qemu-devel] [PATCH v3 01/12] target/arm: Rely on optimization within tcg_gen_gvec_or Richard Henderson 2019-02-09 3:38 ` [Qemu-devel] [PATCH v3 02/12] target/arm: Use vector minmax expanders for aarch64 Richard Henderson @ 2019-02-09 3:38 ` Richard Henderson 2019-02-09 3:38 ` [Qemu-devel] [PATCH v3 04/12] target/arm: Use tcg integer min/max primitives for neon Richard Henderson ` (11 subsequent siblings) 14 siblings, 0 replies; 16+ messages in thread From: Richard Henderson @ 2019-02-09 3:38 UTC (permalink / raw) To: qemu-devel; +Cc: peter.maydell Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> --- target/arm/translate.c | 25 +++++++++++++++++++------ 1 file changed, 19 insertions(+), 6 deletions(-) diff --git a/target/arm/translate.c b/target/arm/translate.c index 9d2dba7ed2..df1cd3fa3e 100644 --- a/target/arm/translate.c +++ b/target/arm/translate.c @@ -6368,6 +6368,25 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn) tcg_gen_gvec_cmp(u ? 
TCG_COND_GEU : TCG_COND_GE, size, rd_ofs, rn_ofs, rm_ofs, vec_size, vec_size); return 0; + + case NEON_3R_VMAX: + if (u) { + tcg_gen_gvec_umax(size, rd_ofs, rn_ofs, rm_ofs, + vec_size, vec_size); + } else { + tcg_gen_gvec_smax(size, rd_ofs, rn_ofs, rm_ofs, + vec_size, vec_size); + } + return 0; + case NEON_3R_VMIN: + if (u) { + tcg_gen_gvec_umin(size, rd_ofs, rn_ofs, rm_ofs, + vec_size, vec_size); + } else { + tcg_gen_gvec_smin(size, rd_ofs, rn_ofs, rm_ofs, + vec_size, vec_size); + } + return 0; } if (size == 3) { @@ -6533,12 +6552,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn) case NEON_3R_VQRSHL: GEN_NEON_INTEGER_OP_ENV(qrshl); break; - case NEON_3R_VMAX: - GEN_NEON_INTEGER_OP(max); - break; - case NEON_3R_VMIN: - GEN_NEON_INTEGER_OP(min); - break; case NEON_3R_VABD: GEN_NEON_INTEGER_OP(abd); break; -- 2.17.2 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* [Qemu-devel] [PATCH v3 04/12] target/arm: Use tcg integer min/max primitives for neon
From: Richard Henderson @ 2019-02-09 3:38 UTC
To: qemu-devel; +Cc: peter.maydell

The 32-bit PMIN/PMAX has been decomposed to scalars,
and so can be trivially expanded inline.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index df1cd3fa3e..f0101d2788 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -4760,10 +4760,10 @@ static inline void gen_neon_rsb(int size, TCGv_i32 t0, TCGv_i32 t1)
 }
 
 /* 32-bit pairwise ops end up the same as the elementwise versions.  */
-#define gen_helper_neon_pmax_s32  gen_helper_neon_max_s32
-#define gen_helper_neon_pmax_u32  gen_helper_neon_max_u32
-#define gen_helper_neon_pmin_s32  gen_helper_neon_min_s32
-#define gen_helper_neon_pmin_u32  gen_helper_neon_min_u32
+#define gen_helper_neon_pmax_s32  tcg_gen_smax_i32
+#define gen_helper_neon_pmax_u32  tcg_gen_umax_i32
+#define gen_helper_neon_pmin_s32  tcg_gen_smin_i32
+#define gen_helper_neon_pmin_u32  tcg_gen_umin_i32
 
 #define GEN_NEON_INTEGER_OP_ENV(name) do { \
     switch ((size << 1) | u) { \
-- 
2.17.2
* [Qemu-devel] [PATCH v3 05/12] target/arm: Remove neon min/max helpers 2019-02-09 3:38 [Qemu-devel] [PATCH v3 00/12] target/arm: tcg vector cleanups Richard Henderson ` (3 preceding siblings ...) 2019-02-09 3:38 ` [Qemu-devel] [PATCH v3 04/12] target/arm: Use tcg integer min/max primitives for neon Richard Henderson @ 2019-02-09 3:38 ` Richard Henderson 2019-02-09 3:38 ` [Qemu-devel] [PATCH v3 06/12] target/arm: Fix vfp_gdb_get/set_reg vs FPSCR Richard Henderson ` (9 subsequent siblings) 14 siblings, 0 replies; 16+ messages in thread From: Richard Henderson @ 2019-02-09 3:38 UTC (permalink / raw) To: qemu-devel; +Cc: peter.maydell These are now unused. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> --- target/arm/helper.h | 12 ------------ target/arm/neon_helper.c | 12 ------------ 2 files changed, 24 deletions(-) diff --git a/target/arm/helper.h b/target/arm/helper.h index 53a38188c6..9874c35ea9 100644 --- a/target/arm/helper.h +++ b/target/arm/helper.h @@ -276,18 +276,6 @@ DEF_HELPER_2(neon_cge_s16, i32, i32, i32) DEF_HELPER_2(neon_cge_u32, i32, i32, i32) DEF_HELPER_2(neon_cge_s32, i32, i32, i32) -DEF_HELPER_2(neon_min_u8, i32, i32, i32) -DEF_HELPER_2(neon_min_s8, i32, i32, i32) -DEF_HELPER_2(neon_min_u16, i32, i32, i32) -DEF_HELPER_2(neon_min_s16, i32, i32, i32) -DEF_HELPER_2(neon_min_u32, i32, i32, i32) -DEF_HELPER_2(neon_min_s32, i32, i32, i32) -DEF_HELPER_2(neon_max_u8, i32, i32, i32) -DEF_HELPER_2(neon_max_s8, i32, i32, i32) -DEF_HELPER_2(neon_max_u16, i32, i32, i32) -DEF_HELPER_2(neon_max_s16, i32, i32, i32) -DEF_HELPER_2(neon_max_u32, i32, i32, i32) -DEF_HELPER_2(neon_max_s32, i32, i32, i32) DEF_HELPER_2(neon_pmin_u8, i32, i32, i32) DEF_HELPER_2(neon_pmin_s8, i32, i32, i32) DEF_HELPER_2(neon_pmin_u16, i32, i32, i32) diff --git a/target/arm/neon_helper.c b/target/arm/neon_helper.c index c2c6491a83..3249005b62 100644 --- a/target/arm/neon_helper.c +++ b/target/arm/neon_helper.c @@ -581,12 +581,6 @@ NEON_VOP(cge_u32, neon_u32, 1) #undef 
NEON_FN #define NEON_FN(dest, src1, src2) dest = (src1 < src2) ? src1 : src2 -NEON_VOP(min_s8, neon_s8, 4) -NEON_VOP(min_u8, neon_u8, 4) -NEON_VOP(min_s16, neon_s16, 2) -NEON_VOP(min_u16, neon_u16, 2) -NEON_VOP(min_s32, neon_s32, 1) -NEON_VOP(min_u32, neon_u32, 1) NEON_POP(pmin_s8, neon_s8, 4) NEON_POP(pmin_u8, neon_u8, 4) NEON_POP(pmin_s16, neon_s16, 2) @@ -594,12 +588,6 @@ NEON_POP(pmin_u16, neon_u16, 2) #undef NEON_FN #define NEON_FN(dest, src1, src2) dest = (src1 > src2) ? src1 : src2 -NEON_VOP(max_s8, neon_s8, 4) -NEON_VOP(max_u8, neon_u8, 4) -NEON_VOP(max_s16, neon_s16, 2) -NEON_VOP(max_u16, neon_u16, 2) -NEON_VOP(max_s32, neon_s32, 1) -NEON_VOP(max_u32, neon_u32, 1) NEON_POP(pmax_s8, neon_s8, 4) NEON_POP(pmax_u8, neon_u8, 4) NEON_POP(pmax_s16, neon_s16, 2) -- 2.17.2 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* [Qemu-devel] [PATCH v3 06/12] target/arm: Fix vfp_gdb_get/set_reg vs FPSCR
From: Richard Henderson @ 2019-02-09 3:38 UTC
To: qemu-devel; +Cc: peter.maydell

The components of this register are stored in several
different locations.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index 520ceea7a4..6ac81c2ca2 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -81,7 +81,7 @@ static int vfp_gdb_get_reg(CPUARMState *env, uint8_t *buf, int reg)
     }
     switch (reg - nregs) {
     case 0: stl_p(buf, env->vfp.xregs[ARM_VFP_FPSID]); return 4;
-    case 1: stl_p(buf, env->vfp.xregs[ARM_VFP_FPSCR]); return 4;
+    case 1: stl_p(buf, vfp_get_fpscr(env)); return 4;
     case 2: stl_p(buf, env->vfp.xregs[ARM_VFP_FPEXC]); return 4;
     }
     return 0;
@@ -107,7 +107,7 @@ static int vfp_gdb_set_reg(CPUARMState *env, uint8_t *buf, int reg)
     }
     switch (reg - nregs) {
     case 0: env->vfp.xregs[ARM_VFP_FPSID] = ldl_p(buf); return 4;
-    case 1: env->vfp.xregs[ARM_VFP_FPSCR] = ldl_p(buf); return 4;
+    case 1: vfp_set_fpscr(env, ldl_p(buf)); return 4;
     case 2: env->vfp.xregs[ARM_VFP_FPEXC] = ldl_p(buf) & (1 << 30); return 4;
     }
     return 0;
-- 
2.17.2
* [Qemu-devel] [PATCH v3 07/12] target/arm: Fix arm_cpu_dump_state vs FPSCR
From: Richard Henderson @ 2019-02-09 3:38 UTC
To: qemu-devel; +Cc: peter.maydell

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index f0101d2788..9b426f4271 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -13641,7 +13641,7 @@ void arm_cpu_dump_state(CPUState *cs, FILE *f, fprintf_function cpu_fprintf,
                         i * 2 + 1, (uint32_t)(v >> 32),
                         i, v);
         }
-        cpu_fprintf(f, "FPSCR: %08x\n", (int)env->vfp.xregs[ARM_VFP_FPSCR]);
+        cpu_fprintf(f, "FPSCR: %08x\n", vfp_get_fpscr(env));
     }
 }
 
-- 
2.17.2
* [Qemu-devel] [PATCH v3 08/12] target/arm: Split out flags setting from vfp compares 2019-02-09 3:38 [Qemu-devel] [PATCH v3 00/12] target/arm: tcg vector cleanups Richard Henderson ` (6 preceding siblings ...) 2019-02-09 3:38 ` [Qemu-devel] [PATCH v3 07/12] target/arm: Fix arm_cpu_dump_state " Richard Henderson @ 2019-02-09 3:38 ` Richard Henderson 2019-02-09 3:38 ` [Qemu-devel] [PATCH v3 09/12] target/arm: Fix set of bits kept in xregs[ARM_VFP_FPSCR] Richard Henderson ` (6 subsequent siblings) 14 siblings, 0 replies; 16+ messages in thread From: Richard Henderson @ 2019-02-09 3:38 UTC (permalink / raw) To: qemu-devel; +Cc: peter.maydell Minimize the code within a macro by splitting out a helper function. Use deposit32 instead of manual bit manipulation. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> --- target/arm/helper.c | 45 +++++++++++++++++++++++++++------------------ 1 file changed, 27 insertions(+), 18 deletions(-) diff --git a/target/arm/helper.c b/target/arm/helper.c index 6ac81c2ca2..51be3fa16f 100644 --- a/target/arm/helper.c +++ b/target/arm/helper.c @@ -12752,31 +12752,40 @@ float64 VFP_HELPER(sqrt, d)(float64 a, CPUARMState *env) return float64_sqrt(a, &env->vfp.fp_status); } +static void softfloat_to_vfp_compare(CPUARMState *env, int cmp) +{ + uint32_t flags; + switch (cmp) { + case float_relation_equal: + flags = 0x6; + break; + case float_relation_less: + flags = 0x8; + break; + case float_relation_greater: + flags = 0x2; + break; + case float_relation_unordered: + flags = 0x3; + break; + default: + g_assert_not_reached(); + } + env->vfp.xregs[ARM_VFP_FPSCR] = + deposit32(env->vfp.xregs[ARM_VFP_FPSCR], 28, 4, flags); +} + /* XXX: check quiet/signaling case */ #define DO_VFP_cmp(p, type) \ void VFP_HELPER(cmp, p)(type a, type b, CPUARMState *env) \ { \ - uint32_t flags; \ - switch(type ## _compare_quiet(a, b, &env->vfp.fp_status)) { \ - case 0: flags = 0x6; break; \ - case -1: flags = 0x8; break; \ - case 1: flags = 0x2; break; \ 
- default: case 2: flags = 0x3; break; \ - } \ - env->vfp.xregs[ARM_VFP_FPSCR] = (flags << 28) \ - | (env->vfp.xregs[ARM_VFP_FPSCR] & 0x0fffffff); \ + softfloat_to_vfp_compare(env, \ + type ## _compare_quiet(a, b, &env->vfp.fp_status)); \ } \ void VFP_HELPER(cmpe, p)(type a, type b, CPUARMState *env) \ { \ - uint32_t flags; \ - switch(type ## _compare(a, b, &env->vfp.fp_status)) { \ - case 0: flags = 0x6; break; \ - case -1: flags = 0x8; break; \ - case 1: flags = 0x2; break; \ - default: case 2: flags = 0x3; break; \ - } \ - env->vfp.xregs[ARM_VFP_FPSCR] = (flags << 28) \ - | (env->vfp.xregs[ARM_VFP_FPSCR] & 0x0fffffff); \ + softfloat_to_vfp_compare(env, \ + type ## _compare(a, b, &env->vfp.fp_status)); \ } DO_VFP_cmp(s, float32) DO_VFP_cmp(d, float64) -- 2.17.2 ^ permalink raw reply related [flat|nested] 16+ messages in thread
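
For readers unfamiliar with deposit32, the refactoring in patch 08/12 can be sketched outside QEMU as follows. This is an illustration under stated assumptions, not the patch's code: `deposit32_sketch` is a hypothetical stand-in for QEMU's `deposit32`, and the `CMP_*` constants mirror the values -1/0/1/2 that the removed switch statements matched on.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for QEMU's deposit32(): replace the 'len'-bit
 * field of 'value' starting at bit 'start' with 'fieldval'.
 */
static uint32_t deposit32_sketch(uint32_t value, int start, int len,
                                 uint32_t fieldval)
{
    assert(start >= 0 && len > 0 && len <= 32 - start);
    uint32_t mask = (~0U >> (32 - len)) << start;
    return (value & ~mask) | ((fieldval << start) & mask);
}

/* Illustrative names for the softfloat comparison results. */
enum {
    CMP_LESS = -1,
    CMP_EQUAL = 0,
    CMP_GREATER = 1,
    CMP_UNORDERED = 2,
};

/* Map a comparison result to the 4-bit NZCV value used by the patch. */
static uint32_t cmp_to_nzcv(int cmp)
{
    switch (cmp) {
    case CMP_EQUAL:
        return 0x6;
    case CMP_LESS:
        return 0x8;
    case CMP_GREATER:
        return 0x2;
    default: /* unordered */
        return 0x3;
    }
}

/* Write the flags into bits [31:28], leaving the other bits alone. */
static uint32_t set_fpscr_nzcv(uint32_t fpscr, int cmp)
{
    return deposit32_sketch(fpscr, 28, 4, cmp_to_nzcv(cmp));
}
```

The result is the same as the open-coded `(flags << 28) | (fpscr & 0x0fffffff)` that the patch removes; the field position and width are simply stated once instead of being encoded in two constants.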
* [Qemu-devel] [PATCH v3 09/12] target/arm: Fix set of bits kept in xregs[ARM_VFP_FPSCR] 2019-02-09 3:38 [Qemu-devel] [PATCH v3 00/12] target/arm: tcg vector cleanups Richard Henderson ` (7 preceding siblings ...) 2019-02-09 3:38 ` [Qemu-devel] [PATCH v3 08/12] target/arm: Split out flags setting from vfp compares Richard Henderson @ 2019-02-09 3:38 ` Richard Henderson 2019-02-09 3:38 ` [Qemu-devel] [PATCH v3 10/12] target/arm: Split out FPSCR.QC to a vector field Richard Henderson ` (5 subsequent siblings) 14 siblings, 0 replies; 16+ messages in thread From: Richard Henderson @ 2019-02-09 3:38 UTC (permalink / raw) To: qemu-devel; +Cc: peter.maydell Given that we mask bits properly on set, there is no reason to mask them again on get. We failed to clear the exception status bits, 0x9f, which means that the wrong value would be returned on get. Except in the (probably normal) case in which the set clears all of the bits. Simplify the code in set to also clear the RES0 bits. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> --- target/arm/helper.c | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/target/arm/helper.c b/target/arm/helper.c index 51be3fa16f..af22274bd9 100644 --- a/target/arm/helper.c +++ b/target/arm/helper.c @@ -12588,7 +12588,7 @@ uint32_t HELPER(vfp_get_fpscr)(CPUARMState *env) int i; uint32_t fpscr; - fpscr = (env->vfp.xregs[ARM_VFP_FPSCR] & 0xffc8ffff) + fpscr = env->vfp.xregs[ARM_VFP_FPSCR] | (env->vfp.vec_len << 16) | (env->vfp.vec_stride << 20); @@ -12630,7 +12630,7 @@ static inline int vfp_exceptbits_to_host(int target_bits) void HELPER(vfp_set_fpscr)(CPUARMState *env, uint32_t val) { int i; - uint32_t changed; + uint32_t changed = env->vfp.xregs[ARM_VFP_FPSCR]; /* When ARMv8.2-FP16 is not supported, FZ16 is RES0. 
*/ if (!cpu_isar_feature(aa64_fp16, arm_env_get_cpu(env))) { @@ -12639,12 +12639,13 @@ void HELPER(vfp_set_fpscr)(CPUARMState *env, uint32_t val) /* * We don't implement trapped exception handling, so the - * trap enable bits are all RAZ/WI (not RES0!) + * trap enable bits, IDE|IXE|UFE|OFE|DZE|IOE are all RAZ/WI (not RES0!) + * + * If we exclude the exception flags, IOC|DZC|OFC|UFC|IXC|IDC + * (which are stored in fp_status), and the other RES0 bits + * in between, then we clear all of the low 16 bits. */ - val &= ~(FPCR_IDE | FPCR_IXE | FPCR_UFE | FPCR_OFE | FPCR_DZE | FPCR_IOE); - - changed = env->vfp.xregs[ARM_VFP_FPSCR]; - env->vfp.xregs[ARM_VFP_FPSCR] = (val & 0xffc8ffff); + env->vfp.xregs[ARM_VFP_FPSCR] = val & 0xffc80000; env->vfp.vec_len = (val >> 16) & 7; env->vfp.vec_stride = (val >> 20) & 3; -- 2.17.2 ^ permalink raw reply related [flat|nested] 16+ messages in thread
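
The masking argument in patch 09/12 can be checked with a small model. The sketch below assumes a simplified state (the `fpscr_state` struct and both function names are hypothetical) and deliberately leaves out the exception flags held in fp_status and the FZ16 handling; it only shows why masking on set with 0xffc80000 makes a second mask on get unnecessary.

```c
#include <stdint.h>

/* Simplified, hypothetical model of where the FPSCR pieces live. */
struct fpscr_state {
    uint32_t kept;        /* bits stored directly, as in xregs[] */
    unsigned vec_len;     /* FPSCR bits [18:16] */
    unsigned vec_stride;  /* FPSCR bits [21:20] */
};

/*
 * Set: keep only bits [31:22] and bit 19.  The low 16 bits (trap
 * enables, exception flags, RES0) and the LEN/STRIDE fields are
 * cleared from the stored word; LEN and STRIDE are stored separately.
 */
static void fpscr_set(struct fpscr_state *s, uint32_t val)
{
    s->kept = val & 0xffc80000u;
    s->vec_len = (val >> 16) & 7;
    s->vec_stride = (val >> 20) & 3;
}

/* Get: the stored word is already clean, so just OR the pieces in. */
static uint32_t fpscr_get(const struct fpscr_state *s)
{
    return s->kept | (s->vec_len << 16) | (s->vec_stride << 20);
}
```

Writing all-ones and reading back yields 0xffff0000 in this model: the upper half round-trips through the three fields, and nothing from the low half is stored.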
* [Qemu-devel] [PATCH v3 10/12] target/arm: Split out FPSCR.QC to a vector field 2019-02-09 3:38 [Qemu-devel] [PATCH v3 00/12] target/arm: tcg vector cleanups Richard Henderson ` (8 preceding siblings ...) 2019-02-09 3:38 ` [Qemu-devel] [PATCH v3 09/12] target/arm: Fix set of bits kept in xregs[ARM_VFP_FPSCR] Richard Henderson @ 2019-02-09 3:38 ` Richard Henderson 2019-02-09 3:38 ` [Qemu-devel] [PATCH v3 11/12] target/arm: Use vector operations for saturation Richard Henderson ` (4 subsequent siblings) 14 siblings, 0 replies; 16+ messages in thread From: Richard Henderson @ 2019-02-09 3:38 UTC (permalink / raw) To: qemu-devel; +Cc: peter.maydell Change the representation of this field such that it is easy to set from vector code. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> --- target/arm/cpu.h | 5 ++++- target/arm/helper.c | 19 +++++++++++++++---- target/arm/neon_helper.c | 2 +- target/arm/vec_helper.c | 2 +- 4 files changed, 21 insertions(+), 7 deletions(-) diff --git a/target/arm/cpu.h b/target/arm/cpu.h index 47238e4245..b96463e8f1 100644 --- a/target/arm/cpu.h +++ b/target/arm/cpu.h @@ -577,11 +577,13 @@ typedef struct CPUARMState { ARMPredicateReg preg_tmp; #endif - uint32_t xregs[16]; /* We store these fpcsr fields separately for convenience. */ + uint32_t qc[4] QEMU_ALIGNED(16); int vec_len; int vec_stride; + uint32_t xregs[16]; + /* Scratch space for aa32 neon expansion. 
*/ uint32_t scratch[8]; @@ -1427,6 +1429,7 @@ void vfp_set_fpscr(CPUARMState *env, uint32_t val); #define FPCR_FZ16 (1 << 19) /* ARMv8.2+, FP16 flush-to-zero */ #define FPCR_FZ (1 << 24) /* Flush-to-zero enable bit */ #define FPCR_DN (1 << 25) /* Default NaN enable bit */ +#define FPCR_QC (1 << 27) /* Cumulative saturation bit */ static inline uint32_t vfp_get_fpsr(CPUARMState *env) { diff --git a/target/arm/helper.c b/target/arm/helper.c index af22274bd9..7ed9933663 100644 --- a/target/arm/helper.c +++ b/target/arm/helper.c @@ -12585,8 +12585,7 @@ static inline int vfp_exceptbits_from_host(int host_bits) uint32_t HELPER(vfp_get_fpscr)(CPUARMState *env) { - int i; - uint32_t fpscr; + uint32_t i, fpscr; fpscr = env->vfp.xregs[ARM_VFP_FPSCR] | (env->vfp.vec_len << 16) @@ -12597,8 +12596,11 @@ uint32_t HELPER(vfp_get_fpscr)(CPUARMState *env) /* FZ16 does not generate an input denormal exception. */ i |= (get_float_exception_flags(&env->vfp.fp_status_f16) & ~float_flag_input_denormal); - fpscr |= vfp_exceptbits_from_host(i); + + i = env->vfp.qc[0] | env->vfp.qc[1] | env->vfp.qc[2] | env->vfp.qc[3]; + fpscr |= i ? FPCR_QC : 0; + return fpscr; } @@ -12645,10 +12647,19 @@ void HELPER(vfp_set_fpscr)(CPUARMState *env, uint32_t val) * (which are stored in fp_status), and the other RES0 bits * in between, then we clear all of the low 16 bits. */ - env->vfp.xregs[ARM_VFP_FPSCR] = val & 0xffc80000; + env->vfp.xregs[ARM_VFP_FPSCR] = val & 0xf7c80000; env->vfp.vec_len = (val >> 16) & 7; env->vfp.vec_stride = (val >> 20) & 3; + /* + * The bit we set within fpscr_q is arbitrary; the register as a + * whole being zero/non-zero is what counts. 
+ */ + env->vfp.qc[0] = val & FPCR_QC; + env->vfp.qc[1] = 0; + env->vfp.qc[2] = 0; + env->vfp.qc[3] = 0; + changed ^= val; if (changed & (3 << 22)) { i = (val >> 22) & 3; diff --git a/target/arm/neon_helper.c b/target/arm/neon_helper.c index 3249005b62..ed1c6fc41c 100644 --- a/target/arm/neon_helper.c +++ b/target/arm/neon_helper.c @@ -15,7 +15,7 @@ #define SIGNBIT (uint32_t)0x80000000 #define SIGNBIT64 ((uint64_t)1 << 63) -#define SET_QC() env->vfp.xregs[ARM_VFP_FPSCR] |= CPSR_Q +#define SET_QC() env->vfp.qc[0] = 1 #define NEON_TYPE1(name, type) \ typedef struct \ diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c index 37f338732e..65a18af4e0 100644 --- a/target/arm/vec_helper.c +++ b/target/arm/vec_helper.c @@ -36,7 +36,7 @@ #define H4(x) (x) #endif -#define SET_QC() env->vfp.xregs[ARM_VFP_FPSCR] |= CPSR_Q +#define SET_QC() env->vfp.qc[0] = 1 static void clear_tail(void *vd, uintptr_t opr_sz, uintptr_t max_sz) { -- 2.17.2 ^ permalink raw reply related [flat|nested] 16+ messages in thread
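
The new QC representation in patch 10/12 treats the flag as "any bit set anywhere in a 16-byte field". A minimal sketch of that convention, with hypothetical names (`qc_state`, `qc_set_from_fpscr`, `qc_accumulate`, `qc_to_fpscr`) and `QC_BIT` standing in for FPCR_QC, might look like this; the accumulate step is an assumption about how vector code would use the field, shown here as a plain loop.

```c
#include <stdint.h>

#define QC_BIT (1u << 27)   /* same bit position as FPCR_QC in the patch */

/* Hypothetical model of the 4 x 32-bit QC field. */
struct qc_state {
    uint32_t qc[4];
};

/* Writing FPSCR: which bit is stored is arbitrary, only zero/non-zero counts. */
static void qc_set_from_fpscr(struct qc_state *s, uint32_t val)
{
    s->qc[0] = val & QC_BIT;
    s->qc[1] = 0;
    s->qc[2] = 0;
    s->qc[3] = 0;
}

/* Vector-style update: OR a per-lane "saturated" mask into the field. */
static void qc_accumulate(struct qc_state *s, const uint32_t lanes[4])
{
    for (int i = 0; i < 4; i++) {
        s->qc[i] |= lanes[i];
    }
}

/* Reading FPSCR: collapse the whole field to the single architectural bit. */
static uint32_t qc_to_fpscr(const struct qc_state *s)
{
    uint32_t any = s->qc[0] | s->qc[1] | s->qc[2] | s->qc[3];
    return any ? QC_BIT : 0;
}
```

Because the read side ORs all four words together, a vector operation can set the flag with a plain full-width OR into the field, without first reducing its comparison mask to one bit.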
* [Qemu-devel] [PATCH v3 11/12] target/arm: Use vector operations for saturation 2019-02-09 3:38 [Qemu-devel] [PATCH v3 00/12] target/arm: tcg vector cleanups Richard Henderson ` (9 preceding siblings ...) 2019-02-09 3:38 ` [Qemu-devel] [PATCH v3 10/12] target/arm: Split out FPSCR.QC to a vector field Richard Henderson @ 2019-02-09 3:38 ` Richard Henderson 2019-02-09 3:38 ` [Qemu-devel] [PATCH v3 12/12] target/arm: Add missing clear_tail calls Richard Henderson ` (3 subsequent siblings) 14 siblings, 0 replies; 16+ messages in thread From: Richard Henderson @ 2019-02-09 3:38 UTC (permalink / raw) To: qemu-devel; +Cc: peter.maydell For same-sign saturation, we have tcg vector operations. We can compute the QC bit by comparing the saturated value against the unsaturated value. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> --- target/arm/helper.h | 33 +++++++ target/arm/translate.h | 4 + target/arm/translate-a64.c | 36 ++++---- target/arm/translate.c | 172 +++++++++++++++++++++++++++++++------ target/arm/vec_helper.c | 130 ++++++++++++++++++++++++++++ 5 files changed, 331 insertions(+), 44 deletions(-) diff --git a/target/arm/helper.h b/target/arm/helper.h index 9874c35ea9..923e8e1525 100644 --- a/target/arm/helper.h +++ b/target/arm/helper.h @@ -641,6 +641,39 @@ DEF_HELPER_FLAGS_6(gvec_fmla_idx_s, TCG_CALL_NO_RWG, DEF_HELPER_FLAGS_6(gvec_fmla_idx_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(gvec_uqadd_b, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(gvec_uqadd_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(gvec_uqadd_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(gvec_uqadd_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(gvec_sqadd_b, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(gvec_sqadd_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(gvec_sqadd_s, TCG_CALL_NO_RWG, + 
                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_sqadd_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_uqsub_b, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_uqsub_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_uqsub_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_uqsub_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_sqsub_b, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_sqsub_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_sqsub_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_sqsub_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
 #ifdef TARGET_AARCH64
 #include "helper-a64.h"
 #include "helper-sve.h"
diff --git a/target/arm/translate.h b/target/arm/translate.h
index 17748ddfb9..f25fe75685 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -214,6 +214,10 @@ extern const GVecGen2i ssra_op[4];
 extern const GVecGen2i usra_op[4];
 extern const GVecGen2i sri_op[4];
 extern const GVecGen2i sli_op[4];
+extern const GVecGen4 uqadd_op[4];
+extern const GVecGen4 sqadd_op[4];
+extern const GVecGen4 uqsub_op[4];
+extern const GVecGen4 sqsub_op[4];
 void gen_cmtst_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b);
 
 /*
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index fd5ceb6613..af8e4fd4be 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -10948,6 +10948,22 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
     }
 
     switch (opcode) {
+    case 0x01: /* SQADD, UQADD */
+        tcg_gen_gvec_4(vec_full_reg_offset(s, rd),
+                       offsetof(CPUARMState, vfp.qc),
+                       vec_full_reg_offset(s, rn),
+                       vec_full_reg_offset(s, rm),
+                       is_q ? 16 : 8, vec_full_reg_size(s),
+                       (u ? uqadd_op : sqadd_op) + size);
+        return;
+    case 0x05: /* SQSUB, UQSUB */
+        tcg_gen_gvec_4(vec_full_reg_offset(s, rd),
+                       offsetof(CPUARMState, vfp.qc),
+                       vec_full_reg_offset(s, rn),
+                       vec_full_reg_offset(s, rm),
+                       is_q ? 16 : 8, vec_full_reg_size(s),
+                       (u ? uqsub_op : sqsub_op) + size);
+        return;
     case 0x0c: /* SMAX, UMAX */
         if (u) {
             gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_umax, size);
@@ -11043,16 +11059,6 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
                 genfn = fns[size][u];
                 break;
             }
-            case 0x1: /* SQADD, UQADD */
-            {
-                static NeonGenTwoOpEnvFn * const fns[3][2] = {
-                    { gen_helper_neon_qadd_s8, gen_helper_neon_qadd_u8 },
-                    { gen_helper_neon_qadd_s16, gen_helper_neon_qadd_u16 },
-                    { gen_helper_neon_qadd_s32, gen_helper_neon_qadd_u32 },
-                };
-                genenvfn = fns[size][u];
-                break;
-            }
             case 0x2: /* SRHADD, URHADD */
             {
                 static NeonGenTwoOpFn * const fns[3][2] = {
@@ -11073,16 +11079,6 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
                 genfn = fns[size][u];
                 break;
             }
-            case 0x5: /* SQSUB, UQSUB */
-            {
-                static NeonGenTwoOpEnvFn * const fns[3][2] = {
-                    { gen_helper_neon_qsub_s8, gen_helper_neon_qsub_u8 },
-                    { gen_helper_neon_qsub_s16, gen_helper_neon_qsub_u16 },
-                    { gen_helper_neon_qsub_s32, gen_helper_neon_qsub_u32 },
-                };
-                genenvfn = fns[size][u];
-                break;
-            }
             case 0x8: /* SSHL, USHL */
             {
                 static NeonGenTwoOpFn * const fns[3][2] = {
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 9b426f4271..dac737f6ca 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -6148,6 +6148,142 @@ const GVecGen3 cmtst_op[4] = {
       .vece = MO_64 },
 };
 
+static void gen_uqadd_vec(unsigned vece, TCGv_vec t, TCGv_vec sat,
+                          TCGv_vec a, TCGv_vec b)
+{
+    TCGv_vec x = tcg_temp_new_vec_matching(t);
+    tcg_gen_add_vec(vece, x, a, b);
+    tcg_gen_usadd_vec(vece, t, a, b);
+    tcg_gen_cmp_vec(TCG_COND_NE, vece, x, x, t);
+    tcg_gen_or_vec(vece, sat, sat, x);
+    tcg_temp_free_vec(x);
+}
+
+const GVecGen4 uqadd_op[4] = {
+    { .fniv = gen_uqadd_vec,
+      .fno = gen_helper_gvec_uqadd_b,
+      .opc = INDEX_op_usadd_vec,
+      .write_aofs = true,
+      .vece = MO_8 },
+    { .fniv = gen_uqadd_vec,
+      .fno = gen_helper_gvec_uqadd_h,
+      .opc = INDEX_op_usadd_vec,
+      .write_aofs = true,
+      .vece = MO_16 },
+    { .fniv = gen_uqadd_vec,
+      .fno = gen_helper_gvec_uqadd_s,
+      .opc = INDEX_op_usadd_vec,
+      .write_aofs = true,
+      .vece = MO_32 },
+    { .fniv = gen_uqadd_vec,
+      .fno = gen_helper_gvec_uqadd_d,
+      .opc = INDEX_op_usadd_vec,
+      .write_aofs = true,
+      .vece = MO_64 },
+};
+
+static void gen_sqadd_vec(unsigned vece, TCGv_vec t, TCGv_vec sat,
+                          TCGv_vec a, TCGv_vec b)
+{
+    TCGv_vec x = tcg_temp_new_vec_matching(t);
+    tcg_gen_add_vec(vece, x, a, b);
+    tcg_gen_ssadd_vec(vece, t, a, b);
+    tcg_gen_cmp_vec(TCG_COND_NE, vece, x, x, t);
+    tcg_gen_or_vec(vece, sat, sat, x);
+    tcg_temp_free_vec(x);
+}
+
+const GVecGen4 sqadd_op[4] = {
+    { .fniv = gen_sqadd_vec,
+      .fno = gen_helper_gvec_sqadd_b,
+      .opc = INDEX_op_ssadd_vec,
+      .write_aofs = true,
+      .vece = MO_8 },
+    { .fniv = gen_sqadd_vec,
+      .fno = gen_helper_gvec_sqadd_h,
+      .opc = INDEX_op_ssadd_vec,
+      .write_aofs = true,
+      .vece = MO_16 },
+    { .fniv = gen_sqadd_vec,
+      .fno = gen_helper_gvec_sqadd_s,
+      .opc = INDEX_op_ssadd_vec,
+      .write_aofs = true,
+      .vece = MO_32 },
+    { .fniv = gen_sqadd_vec,
+      .fno = gen_helper_gvec_sqadd_d,
+      .opc = INDEX_op_ssadd_vec,
+      .write_aofs = true,
+      .vece = MO_64 },
+};
+
+static void gen_uqsub_vec(unsigned vece, TCGv_vec t, TCGv_vec sat,
+                          TCGv_vec a, TCGv_vec b)
+{
+    TCGv_vec x = tcg_temp_new_vec_matching(t);
+    tcg_gen_sub_vec(vece, x, a, b);
+    tcg_gen_ussub_vec(vece, t, a, b);
+    tcg_gen_cmp_vec(TCG_COND_NE, vece, x, x, t);
+    tcg_gen_or_vec(vece, sat, sat, x);
+    tcg_temp_free_vec(x);
+}
+
+const GVecGen4 uqsub_op[4] = {
+    { .fniv = gen_uqsub_vec,
+      .fno = gen_helper_gvec_uqsub_b,
+      .opc = INDEX_op_ussub_vec,
+      .write_aofs = true,
+      .vece = MO_8 },
+    { .fniv = gen_uqsub_vec,
+      .fno = gen_helper_gvec_uqsub_h,
+      .opc = INDEX_op_ussub_vec,
+      .write_aofs = true,
+      .vece = MO_16 },
+    { .fniv = gen_uqsub_vec,
+      .fno = gen_helper_gvec_uqsub_s,
+      .opc = INDEX_op_ussub_vec,
+      .write_aofs = true,
+      .vece = MO_32 },
+    { .fniv = gen_uqsub_vec,
+      .fno = gen_helper_gvec_uqsub_d,
+      .opc = INDEX_op_ussub_vec,
+      .write_aofs = true,
+      .vece = MO_64 },
+};
+
+static void gen_sqsub_vec(unsigned vece, TCGv_vec t, TCGv_vec sat,
+                          TCGv_vec a, TCGv_vec b)
+{
+    TCGv_vec x = tcg_temp_new_vec_matching(t);
+    tcg_gen_sub_vec(vece, x, a, b);
+    tcg_gen_sssub_vec(vece, t, a, b);
+    tcg_gen_cmp_vec(TCG_COND_NE, vece, x, x, t);
+    tcg_gen_or_vec(vece, sat, sat, x);
+    tcg_temp_free_vec(x);
+}
+
+const GVecGen4 sqsub_op[4] = {
+    { .fniv = gen_sqsub_vec,
+      .fno = gen_helper_gvec_sqsub_b,
+      .opc = INDEX_op_sssub_vec,
+      .write_aofs = true,
+      .vece = MO_8 },
+    { .fniv = gen_sqsub_vec,
+      .fno = gen_helper_gvec_sqsub_h,
+      .opc = INDEX_op_sssub_vec,
+      .write_aofs = true,
+      .vece = MO_16 },
+    { .fniv = gen_sqsub_vec,
+      .fno = gen_helper_gvec_sqsub_s,
+      .opc = INDEX_op_sssub_vec,
+      .write_aofs = true,
+      .vece = MO_32 },
+    { .fniv = gen_sqsub_vec,
+      .fno = gen_helper_gvec_sqsub_d,
+      .opc = INDEX_op_sssub_vec,
+      .write_aofs = true,
+      .vece = MO_64 },
+};
+
 /* Translate a NEON data processing instruction.  Return nonzero if the
    instruction is invalid.
    We process data in a mixture of 32-bit and 64-bit chunks.
@@ -6331,6 +6467,18 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
             }
             return 0;
 
+        case NEON_3R_VQADD:
+            tcg_gen_gvec_4(rd_ofs, offsetof(CPUARMState, vfp.qc),
+                           rn_ofs, rm_ofs, vec_size, vec_size,
+                           (u ? uqadd_op : sqadd_op) + size);
+            break;
+
+        case NEON_3R_VQSUB:
+            tcg_gen_gvec_4(rd_ofs, offsetof(CPUARMState, vfp.qc),
+                           rn_ofs, rm_ofs, vec_size, vec_size,
+                           (u ? uqsub_op : sqsub_op) + size);
+            break;
+
         case NEON_3R_VMUL: /* VMUL */
             if (u) {
                 /* Polynomial case allows only P8 and is handled below.  */
@@ -6395,24 +6543,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                 neon_load_reg64(cpu_V0, rn + pass);
                 neon_load_reg64(cpu_V1, rm + pass);
                 switch (op) {
-                case NEON_3R_VQADD:
-                    if (u) {
-                        gen_helper_neon_qadd_u64(cpu_V0, cpu_env,
-                                                 cpu_V0, cpu_V1);
-                    } else {
-                        gen_helper_neon_qadd_s64(cpu_V0, cpu_env,
-                                                 cpu_V0, cpu_V1);
-                    }
-                    break;
-                case NEON_3R_VQSUB:
-                    if (u) {
-                        gen_helper_neon_qsub_u64(cpu_V0, cpu_env,
-                                                 cpu_V0, cpu_V1);
-                    } else {
-                        gen_helper_neon_qsub_s64(cpu_V0, cpu_env,
-                                                 cpu_V0, cpu_V1);
-                    }
-                    break;
                 case NEON_3R_VSHL:
                     if (u) {
                         gen_helper_neon_shl_u64(cpu_V0, cpu_V1, cpu_V0);
@@ -6528,18 +6658,12 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         case NEON_3R_VHADD:
             GEN_NEON_INTEGER_OP(hadd);
             break;
-        case NEON_3R_VQADD:
-            GEN_NEON_INTEGER_OP_ENV(qadd);
-            break;
         case NEON_3R_VRHADD:
             GEN_NEON_INTEGER_OP(rhadd);
             break;
         case NEON_3R_VHSUB:
             GEN_NEON_INTEGER_OP(hsub);
             break;
-        case NEON_3R_VQSUB:
-            GEN_NEON_INTEGER_OP_ENV(qsub);
-            break;
         case NEON_3R_VSHL:
             GEN_NEON_INTEGER_OP(shl);
             break;
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
index 65a18af4e0..10f17e4b5c 100644
--- a/target/arm/vec_helper.c
+++ b/target/arm/vec_helper.c
@@ -766,3 +766,133 @@ DO_FMLA_IDX(gvec_fmla_idx_s, float32, H4)
 DO_FMLA_IDX(gvec_fmla_idx_d, float64, )
 
 #undef DO_FMLA_IDX
+
+#define DO_SAT(NAME, WTYPE, TYPEN, TYPEM, OP, MIN, MAX) \
+void HELPER(NAME)(void *vd, void *vq, void *vn, void *vm, uint32_t desc) \
+{ \
+    intptr_t i, oprsz = simd_oprsz(desc); \
+    TYPEN *d = vd, *n = vn; TYPEM *m = vm; \
+    bool q = false; \
+    for (i = 0; i < oprsz / sizeof(TYPEN); i++) { \
+        WTYPE dd = (WTYPE)n[i] OP m[i]; \
+        if (dd < MIN) { \
+            dd = MIN; \
+            q = true; \
+        } else if (dd > MAX) { \
+            dd = MAX; \
+            q = true; \
+        } \
+        d[i] = dd; \
+    } \
+    if (q) { \
+        uint32_t *qc = vq; \
+        qc[0] = 1; \
+    } \
+    clear_tail(d, oprsz, simd_maxsz(desc)); \
+}
+
+DO_SAT(gvec_uqadd_b, int, uint8_t, uint8_t, +, 0, UINT8_MAX)
+DO_SAT(gvec_uqadd_h, int, uint16_t, uint16_t, +, 0, UINT16_MAX)
+DO_SAT(gvec_uqadd_s, int64_t, uint32_t, uint32_t, +, 0, UINT32_MAX)
+
+DO_SAT(gvec_sqadd_b, int, int8_t, int8_t, +, INT8_MIN, INT8_MAX)
+DO_SAT(gvec_sqadd_h, int, int16_t, int16_t, +, INT16_MIN, INT16_MAX)
+DO_SAT(gvec_sqadd_s, int64_t, int32_t, int32_t, +, INT32_MIN, INT32_MAX)
+
+DO_SAT(gvec_uqsub_b, int, uint8_t, uint8_t, -, 0, UINT8_MAX)
+DO_SAT(gvec_uqsub_h, int, uint16_t, uint16_t, -, 0, UINT16_MAX)
+DO_SAT(gvec_uqsub_s, int64_t, uint32_t, uint32_t, -, 0, UINT32_MAX)
+
+DO_SAT(gvec_sqsub_b, int, int8_t, int8_t, -, INT8_MIN, INT8_MAX)
+DO_SAT(gvec_sqsub_h, int, int16_t, int16_t, -, INT16_MIN, INT16_MAX)
+DO_SAT(gvec_sqsub_s, int64_t, int32_t, int32_t, -, INT32_MIN, INT32_MAX)
+
+#undef DO_SAT
+
+void HELPER(gvec_uqadd_d)(void *vd, void *vq, void *vn,
+                          void *vm, uint32_t desc)
+{
+    intptr_t i, oprsz = simd_oprsz(desc);
+    uint64_t *d = vd, *n = vn, *m = vm;
+    bool q = false;
+
+    for (i = 0; i < oprsz / 8; i++) {
+        uint64_t nn = n[i], mm = m[i], dd = nn + mm;
+        if (dd < nn) {
+            dd = UINT64_MAX;
+            q = true;
+        }
+        d[i] = dd;
+    }
+    if (q) {
+        uint32_t *qc = vq;
+        qc[0] = 1;
+    }
+    clear_tail(d, oprsz, simd_maxsz(desc));
+}
+
+void HELPER(gvec_uqsub_d)(void *vd, void *vq, void *vn,
+                          void *vm, uint32_t desc)
+{
+    intptr_t i, oprsz = simd_oprsz(desc);
+    uint64_t *d = vd, *n = vn, *m = vm;
+    bool q = false;
+
+    for (i = 0; i < oprsz / 8; i++) {
+        uint64_t nn = n[i], mm = m[i], dd = nn - mm;
+        if (nn < mm) {
+            dd = 0;
+            q = true;
+        }
+        d[i] = dd;
+    }
+    if (q) {
+        uint32_t *qc = vq;
+        qc[0] = 1;
+    }
+    clear_tail(d, oprsz, simd_maxsz(desc));
+}
+
+void HELPER(gvec_sqadd_d)(void *vd, void *vq, void *vn,
+                          void *vm, uint32_t desc)
+{
+    intptr_t i, oprsz = simd_oprsz(desc);
+    int64_t *d = vd, *n = vn, *m = vm;
+    bool q = false;
+
+    for (i = 0; i < oprsz / 8; i++) {
+        int64_t nn = n[i], mm = m[i], dd = nn + mm;
+        if (((dd ^ nn) & ~(nn ^ mm)) & INT64_MIN) {
+            dd = (nn >> 63) ^ ~INT64_MIN;
+            q = true;
+        }
+        d[i] = dd;
+    }
+    if (q) {
+        uint32_t *qc = vq;
+        qc[0] = 1;
+    }
+    clear_tail(d, oprsz, simd_maxsz(desc));
+}
+
+void HELPER(gvec_sqsub_d)(void *vd, void *vq, void *vn,
+                          void *vm, uint32_t desc)
+{
+    intptr_t i, oprsz = simd_oprsz(desc);
+    int64_t *d = vd, *n = vn, *m = vm;
+    bool q = false;
+
+    for (i = 0; i < oprsz / 8; i++) {
+        int64_t nn = n[i], mm = m[i], dd = nn - mm;
+        if (((dd ^ nn) & (nn ^ mm)) & INT64_MIN) {
+            dd = (nn >> 63) ^ ~INT64_MIN;
+            q = true;
+        }
+        d[i] = dd;
+    }
+    if (q) {
+        uint32_t *qc = vq;
+        qc[0] = 1;
+    }
+    clear_tail(d, oprsz, simd_maxsz(desc));
+}
-- 
2.17.2
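For readers following the patch above: the vector expanders appear to detect
saturation by computing both the wrapping result and the saturating result and
flagging any lane where the two differ, while the 64-bit signed helper uses a
sign-bit test. A minimal scalar sketch of both ideas follows. The names
uqadd8_lanes and sqadd64 are hypothetical and not part of QEMU, and the sketch
assumes a two's-complement host with arithmetic right shift.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical scalar model of the compare-based detection: for each
 * 8-bit lane, compute the wrapping sum and the saturating sum; if they
 * differ, the lane saturated and the sticky flag *qc is set.
 */
static void uqadd8_lanes(uint8_t *d, uint32_t *qc,
                         const uint8_t *a, const uint8_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        unsigned wide = (unsigned)a[i] + b[i];
        uint8_t wrapped = (uint8_t)wide;
        uint8_t sat = wide > UINT8_MAX ? UINT8_MAX : (uint8_t)wide;

        if (wrapped != sat) {
            *qc |= 1;           /* sticky: never cleared here */
        }
        d[i] = sat;
    }
}

/*
 * Hypothetical scalar model of the signed 64-bit case.  Overflow
 * happened iff both inputs share a sign and the wrapped sum has the
 * opposite sign.  The saturated value is INT64_MAX for a non-negative
 * n and INT64_MIN for a negative n.
 */
static int64_t sqadd64(int64_t n, int64_t m, bool *q)
{
    /* Add as unsigned so the wrap-around is well defined in ISO C. */
    int64_t d = (int64_t)((uint64_t)n + (uint64_t)m);

    if (((d ^ n) & ~(n ^ m)) & INT64_MIN) {
        d = (n >> 63) ^ ~INT64_MIN;
        *q = true;
    }
    return d;
}
```

The sketch adds via uint64_t only to keep the wrap-around defined in a
standalone program; it is not a claim about how the helper should be written.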
* [Qemu-devel] [PATCH v3 12/12] target/arm: Add missing clear_tail calls
From: Richard Henderson @ 2019-02-09 3:38 UTC
To: qemu-devel; +Cc: peter.maydell

Fortunately, the functions affected are so far only called from SVE,
so there is no tail to be cleared.  But as we convert more of AdvSIMD
to gvec, this will matter.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/vec_helper.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
index 10f17e4b5c..dfc635cf9a 100644
--- a/target/arm/vec_helper.c
+++ b/target/arm/vec_helper.c
@@ -638,6 +638,7 @@ void HELPER(NAME)(void *vd, void *vn, void *stat, uint32_t desc) \
     for (i = 0; i < oprsz / sizeof(TYPE); i++) { \
         d[i] = FUNC(n[i], stat); \
     } \
+    clear_tail(d, oprsz, simd_maxsz(desc)); \
 }
 
 DO_2OP(gvec_frecpe_h, helper_recpe_f16, float16)
@@ -688,6 +689,7 @@ void HELPER(NAME)(void *vd, void *vn, void *vm, void *stat, uint32_t desc) \
     for (i = 0; i < oprsz / sizeof(TYPE); i++) { \
         d[i] = FUNC(n[i], m[i], stat); \
     } \
+    clear_tail(d, oprsz, simd_maxsz(desc)); \
 }
 
 DO_3OP(gvec_fadd_h, float16_add, float16)
-- 
2.17.2
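To illustrate why the tail matters once an operation is narrower than the
register: a helper that writes only oprsz bytes must zero the bytes between
oprsz and maxsz, otherwise stale data from an earlier, wider write survives.
The sketch below is a standalone model under that assumption. The names
clear_tail_sketch and negate_bytes are hypothetical stand-ins and do not
reproduce QEMU's clear_tail() implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in: zero the bytes in [oprsz, maxsz) of vd. */
static void clear_tail_sketch(void *vd, size_t oprsz, size_t maxsz)
{
    if (maxsz > oprsz) {
        memset((uint8_t *)vd + oprsz, 0, maxsz - oprsz);
    }
}

/*
 * Toy two-operand helper: negate each of the first oprsz bytes, then
 * clear the remainder of a maxsz-byte destination register.
 */
static void negate_bytes(uint8_t *d, const uint8_t *n,
                         size_t oprsz, size_t maxsz)
{
    for (size_t i = 0; i < oprsz; i++) {
        d[i] = (uint8_t)-n[i];
    }
    clear_tail_sketch(d, oprsz, maxsz);
}
```

With oprsz equal to maxsz, which is the SVE case the commit message describes,
the clearing step is a no-op.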
* Re: [Qemu-devel] [PATCH v3 00/12] target/arm: tcg vector cleanups
From: no-reply @ 2019-02-09 3:56 UTC
To: richard.henderson; +Cc: fam, qemu-devel, peter.maydell

Patchew URL: https://patchew.org/QEMU/20190209033847.9014-1-richard.henderson@linaro.org/

Hi,

This series seems to have some coding style problems. See output below for
more information:

Subject: [Qemu-devel] [PATCH v3 00/12] target/arm: tcg vector cleanups
Type: series
Message-id: 20190209033847.9014-1-richard.henderson@linaro.org

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag] patchew/20190209033847.9014-1-richard.henderson@linaro.org -> patchew/20190209033847.9014-1-richard.henderson@linaro.org
Switched to a new branch 'test'
14ab3e4b2d target/arm: Add missing clear_tail calls
6e1ff3edb0 target/arm: Use vector operations for saturation
0e4e921a93 target/arm: Split out FPSCR.QC to a vector field
40b552d504 target/arm: Fix set of bits kept in xregs[ARM_VFP_FPSCR]
3ee1da71e1 target/arm: Split out flags setting from vfp compares
41e8694369 target/arm: Fix arm_cpu_dump_state vs FPSCR
23628c03ad target/arm: Fix vfp_gdb_get/set_reg vs FPSCR
780dc13c0e target/arm: Remove neon min/max helpers
fc636d7cdc target/arm: Use tcg integer min/max primitives for neon
8ff6492f62 target/arm: Use vector minmax expanders for aarch32
d86eaa3793 target/arm: Use vector minmax expanders for aarch64
fc0b82c4ab target/arm: Rely on optimization within tcg_gen_gvec_or

=== OUTPUT BEGIN ===
1/12 Checking commit fc0b82c4ab74 (target/arm: Rely on optimization within tcg_gen_gvec_or)
2/12 Checking commit d86eaa379343 (target/arm: Use vector minmax expanders for aarch64)
3/12 Checking commit 8ff6492f628b (target/arm: Use vector minmax expanders for aarch32)
4/12 Checking commit fc636d7cdc94 (target/arm: Use tcg integer min/max primitives for neon)
5/12 Checking commit 780dc13c0ef9 (target/arm: Remove neon min/max helpers)
6/12 Checking commit 23628c03ad1b (target/arm: Fix vfp_gdb_get/set_reg vs FPSCR)
ERROR: trailing statements should be on next line
#22: FILE: target/arm/helper.c:84:
+    case 1: stl_p(buf, vfp_get_fpscr(env)); return 4;

ERROR: trailing statements should be on next line
#31: FILE: target/arm/helper.c:110:
+    case 1: vfp_set_fpscr(env, ldl_p(buf)); return 4;

total: 2 errors, 0 warnings, 16 lines checked

Patch 6/12 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

7/12 Checking commit 41e8694369f0 (target/arm: Fix arm_cpu_dump_state vs FPSCR)
8/12 Checking commit 3ee1da71e10e (target/arm: Split out flags setting from vfp compares)
9/12 Checking commit 40b552d5049b (target/arm: Fix set of bits kept in xregs[ARM_VFP_FPSCR])
10/12 Checking commit 0e4e921a93e3 (target/arm: Split out FPSCR.QC to a vector field)
11/12 Checking commit 6e1ff3edb001 (target/arm: Use vector operations for saturation)
ERROR: spaces required around that '*' (ctx:WxV)
#357: FILE: target/arm/vec_helper.c:774:
+    TYPEN *d = vd, *n = vn; TYPEM *m = vm; \
           ^

total: 1 errors, 0 warnings, 438 lines checked

Patch 11/12 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

12/12 Checking commit 14ab3e4b2dd0 (target/arm: Add missing clear_tail calls)
=== OUTPUT END ===

Test command exited with code: 1

The full log is available at
http://patchew.org/logs/20190209033847.9014-1-richard.henderson@linaro.org/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-devel@redhat.com
* Re: [Qemu-devel] [PATCH v3 00/12] target/arm: tcg vector cleanups
From: Peter Maydell @ 2019-02-14 16:12 UTC
To: Richard Henderson; +Cc: QEMU Developers

On Sat, 9 Feb 2019 at 03:38, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Changes since v2:
>   * Fix some representational issues with FPSCR.
>   * Use host vector saturation for SQADD/UQADD.
>     This requires changing the internal representation of FPSR.QC.
>   * Fix a latent vector bug, noticed during the rest.
>
> Correct RISU results depend on Mark C-A's patch from today,
> "tcg/i386: fix unsigned vector saturating arithmetic",
> which will be in my next tcg pull.
>

Applied to target-arm.next, thanks. (That tcg fix is in master now.)

-- PMM