* [PATCH v3 00/33] target/arm: Convert a64 advsimd to decodetree (part 1b)
@ 2024-05-28 20:30 Richard Henderson
2024-05-28 20:30 ` [PATCH v3 01/33] target/arm: Diagnose UNPREDICTABLE operands to PLD, PLDW, PLI Richard Henderson
` (33 more replies)
0 siblings, 34 replies; 41+ messages in thread
From: Richard Henderson @ 2024-05-28 20:30 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm
Changes for v3:
* Reword prefetch unpredictable patch.
* Validate vector length when qc is an implied operand.
* Adjust some legacy decode based on review.
* Apply r-b.
Patches needing review:
01-target-arm-Diagnose-UNPREDICTABLE-operands-to-PLD.patch
03-target-arm-Assert-oprsz-in-range-when-using-vfp.q.patch
04-target-arm-Convert-SUQADD-and-USQADD-to-gvec.patch
10-target-arm-Convert-SRSHL-and-URSHL-register-to-gv.patch
12-target-arm-Convert-SQSHL-and-UQSHL-register-to-gv.patch
31-target-arm-Convert-SQDMULH-SQRDMULH-to-decodetree.patch
32-target-arm-Convert-FMADD-FMSUB-FNMADD-FNMSUB-to-d.patch
r~
Richard Henderson (33):
target/arm: Diagnose UNPREDICTABLE operands to PLD, PLDW, PLI
target/arm: Improve vector UQADD, UQSUB, SQADD, SQSUB
target/arm: Assert oprsz in range when using vfp.qc
target/arm: Convert SUQADD and USQADD to gvec
target/arm: Inline scalar SUQADD and USQADD
target/arm: Inline scalar SQADD, UQADD, SQSUB, UQSUB
target/arm: Convert SQADD, SQSUB, UQADD, UQSUB to decodetree
target/arm: Convert SUQADD, USQADD to decodetree
target/arm: Convert SSHL, USHL to decodetree
target/arm: Convert SRSHL and URSHL (register) to gvec
target/arm: Convert SRSHL, URSHL to decodetree
target/arm: Convert SQSHL and UQSHL (register) to gvec
target/arm: Convert SQSHL, UQSHL to decodetree
target/arm: Convert SQRSHL and UQRSHL (register) to gvec
target/arm: Convert SQRSHL, UQRSHL to decodetree
target/arm: Convert ADD, SUB (vector) to decodetree
target/arm: Convert CMGT, CMHI, CMGE, CMHS, CMTST, CMEQ to decodetree
target/arm: Use TCG_COND_TSTNE in gen_cmtst_{i32,i64}
target/arm: Use TCG_COND_TSTNE in gen_cmtst_vec
target/arm: Convert SHADD, UHADD to gvec
target/arm: Convert SHADD, UHADD to decodetree
target/arm: Convert SHSUB, UHSUB to gvec
target/arm: Convert SHSUB, UHSUB to decodetree
target/arm: Convert SRHADD, URHADD to gvec
target/arm: Convert SRHADD, URHADD to decodetree
target/arm: Convert SMAX, SMIN, UMAX, UMIN to decodetree
target/arm: Convert SABA, SABD, UABA, UABD to decodetree
target/arm: Convert MUL, PMUL to decodetree
target/arm: Convert MLA, MLS to decodetree
target/arm: Tidy SQDMULH, SQRDMULH (vector)
target/arm: Convert SQDMULH, SQRDMULH to decodetree
target/arm: Convert FMADD, FMSUB, FNMADD, FNMSUB to decodetree
target/arm: Convert FCSEL to decodetree
target/arm/helper.h | 96 ++-
target/arm/tcg/translate-a64.h | 14 +
target/arm/tcg/translate.h | 44 +
target/arm/tcg/a32-uncond.decode | 8 +-
target/arm/tcg/a64.decode | 115 +++
target/arm/tcg/neon-dp.decode | 37 +-
target/arm/tcg/t32.decode | 7 +-
target/arm/tcg/gengvec.c | 689 +++++++++++++++-
target/arm/tcg/gengvec64.c | 181 ++++
target/arm/tcg/neon_helper.c | 506 +++---------
target/arm/tcg/translate-a64.c | 1321 ++++++++++--------------------
target/arm/tcg/translate-neon.c | 118 +--
target/arm/tcg/translate.c | 58 ++
target/arm/tcg/vec_helper.c | 128 +++
14 files changed, 1829 insertions(+), 1493 deletions(-)
--
2.34.1
^ permalink raw reply [flat|nested] 41+ messages in thread
* [PATCH v3 01/33] target/arm: Diagnose UNPREDICTABLE operands to PLD, PLDW, PLI
2024-05-28 20:30 [PATCH v3 00/33] target/arm: Convert a64 advsimd to decodetree (part 1b) Richard Henderson
@ 2024-05-28 20:30 ` Richard Henderson
2024-05-28 20:30 ` [PATCH v3 02/33] target/arm: Improve vector UQADD, UQSUB, SQADD, SQSUB Richard Henderson
` (32 subsequent siblings)
33 siblings, 0 replies; 41+ messages in thread
From: Richard Henderson @ 2024-05-28 20:30 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm
For all, rm == 15 is UNPREDICTABLE.
Prior to v8, thumb with rm == 13 is UNPREDICTABLE.
For PLDW, rn == 15 is UNPREDICTABLE.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/tcg/a32-uncond.decode | 8 +++--
target/arm/tcg/t32.decode | 7 ++--
target/arm/tcg/translate.c | 58 ++++++++++++++++++++++++++++++++
3 files changed, 67 insertions(+), 6 deletions(-)
diff --git a/target/arm/tcg/a32-uncond.decode b/target/arm/tcg/a32-uncond.decode
index 2339de2e94..e1b1780d37 100644
--- a/target/arm/tcg/a32-uncond.decode
+++ b/target/arm/tcg/a32-uncond.decode
@@ -24,7 +24,9 @@
&empty !extern
&i !extern imm
+&r !extern rm
&setend E
+&nm rn rm
# Branch with Link and Exchange
@@ -61,9 +63,9 @@ PLD 1111 0101 -101 ---- 1111 ---- ---- ---- # (imm, lit) 5te
PLDW 1111 0101 -001 ---- 1111 ---- ---- ---- # (imm, lit) 7mp
PLI 1111 0100 -101 ---- 1111 ---- ---- ---- # (imm, lit) 7
-PLD 1111 0111 -101 ---- 1111 ----- -- 0 ---- # (register) 5te
-PLDW 1111 0111 -001 ---- 1111 ----- -- 0 ---- # (register) 7mp
-PLI 1111 0110 -101 ---- 1111 ----- -- 0 ---- # (register) 7
+PLD_rr 1111 0111 -101 ---- 1111 ----- -- 0 rm:4 &r
+PLDW_rr 1111 0111 -001 rn:4 1111 ----- -- 0 rm:4 &nm
+PLI_rr 1111 0110 -101 ---- 1111 ----- -- 0 rm:4 &r
# Unallocated memory hints
#
diff --git a/target/arm/tcg/t32.decode b/target/arm/tcg/t32.decode
index d327178829..1ec12442a4 100644
--- a/target/arm/tcg/t32.decode
+++ b/target/arm/tcg/t32.decode
@@ -28,6 +28,7 @@
&rrr_rot !extern rd rn rm rot
&rrr !extern rd rn rm
&rr !extern rd rm
+&nm !extern rn rm
&ri !extern rd imm
&r !extern rm
&i !extern imm
@@ -472,7 +473,7 @@ STR_ri 1111 1000 1100 .... .... ............ @ldst_ri_pos
}
LDRBT_ri 1111 1000 0001 .... .... 1110 ........ @ldst_ri_unp
{
- PLD 1111 1000 0001 ---- 1111 000000 -- ---- # (register)
+ PLD_rr 1111 1000 0001 ---- 1111 000000 -- rm:4 &r
LDRB_rr 1111 1000 0001 .... .... 000000 .. .... @ldst_rr
}
}
@@ -492,7 +493,7 @@ STR_ri 1111 1000 1100 .... .... ............ @ldst_ri_pos
}
LDRHT_ri 1111 1000 0011 .... .... 1110 ........ @ldst_ri_unp
{
- PLDW 1111 1000 0011 ---- 1111 000000 -- ---- # (register)
+ PLDW_rr 1111 1000 0011 rn:4 1111 000000 -- rm:4 &nm
LDRH_rr 1111 1000 0011 .... .... 000000 .. .... @ldst_rr
}
}
@@ -520,7 +521,7 @@ STR_ri 1111 1000 1100 .... .... ............ @ldst_ri_pos
}
LDRSBT_ri 1111 1001 0001 .... .... 1110 ........ @ldst_ri_unp
{
- PLI 1111 1001 0001 ---- 1111 000000 -- ---- # (register)
+ PLI_rr 1111 1001 0001 ---- 1111 000000 -- rm:4 &r
LDRSB_rr 1111 1001 0001 .... .... 000000 .. .... @ldst_rr
}
}
diff --git a/target/arm/tcg/translate.c b/target/arm/tcg/translate.c
index c5bc691d92..16b8609ec0 100644
--- a/target/arm/tcg/translate.c
+++ b/target/arm/tcg/translate.c
@@ -7187,6 +7187,64 @@ static bool trans_PLI(DisasContext *s, arg_PLI *a)
return ENABLE_ARCH_7;
}
+/* Check for UNPREDICTABLE rm for prefetch (register). */
+static bool prefetch_check_m(DisasContext *s, int rm)
+{
+ switch (rm) {
+ case 13:
+ /* SP allowed in v8 or with A1 encoding; rejected with T1. */
+ return ENABLE_ARCH_8 || !s->thumb;
+ case 15:
+ /* PC always rejected. */
+ return false;
+ default:
+ return true;
+ }
+}
+
+static bool trans_PLD_rr(DisasContext *s, arg_PLD_rr *a)
+{
+ if (!ENABLE_ARCH_5TE) {
+ return false;
+ }
+ /* Choose UNDEF for UNPREDICTABLE rm. */
+ if (!prefetch_check_m(s, a->rm)) {
+ unallocated_encoding(s);
+ }
+ return true;
+}
+
+static bool trans_PLDW_rr(DisasContext *s, arg_PLDW_rr *a)
+{
+ if (!arm_dc_feature(s, ARM_FEATURE_V7MP)) {
+ return false;
+ }
+ /*
+ * For A1, rn == 15 is UNPREDICTABLE.
+ * For T1, rn == 15 is PLD (literal), and already matched.
+ * Choose UNDEF for UNPREDICTABLE rn or rm.
+ */
+ if (a->rn == 15) {
+ assert(!s->thumb);
+ } else if (prefetch_check_m(s, a->rm)) {
+ return true;
+ }
+ unallocated_encoding(s);
+ return true;
+}
+
+static bool trans_PLI_rr(DisasContext *s, arg_PLI_rr *a)
+{
+ if (!ENABLE_ARCH_7) {
+ return false;
+ }
+ /* Choose UNDEF for UNPREDICTABLE rm. */
+ if (!prefetch_check_m(s, a->rm)) {
+ unallocated_encoding(s);
+ }
+ return true;
+}
+
/*
* If-then
*/
--
2.34.1
^ permalink raw reply related [flat|nested] 41+ messages in thread
* [PATCH v3 02/33] target/arm: Improve vector UQADD, UQSUB, SQADD, SQSUB
2024-05-28 20:30 [PATCH v3 00/33] target/arm: Convert a64 advsimd to decodetree (part 1b) Richard Henderson
2024-05-28 20:30 ` [PATCH v3 01/33] target/arm: Diagnose UNPREDICTABLE operands to PLD, PLDW, PLI Richard Henderson
@ 2024-05-28 20:30 ` Richard Henderson
2024-05-28 20:30 ` [PATCH v3 03/33] target/arm: Assert oprsz in range when using vfp.qc Richard Henderson
` (31 subsequent siblings)
33 siblings, 0 replies; 41+ messages in thread
From: Richard Henderson @ 2024-05-28 20:30 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm, Peter Maydell
No need for a full comparison; xor produces non-zero bits
for QC just fine.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/tcg/gengvec.c | 32 ++++++++++++++++----------------
1 file changed, 16 insertions(+), 16 deletions(-)
diff --git a/target/arm/tcg/gengvec.c b/target/arm/tcg/gengvec.c
index 22c9d17dce..bfe6885a01 100644
--- a/target/arm/tcg/gengvec.c
+++ b/target/arm/tcg/gengvec.c
@@ -1217,21 +1217,21 @@ void gen_gvec_sshl(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, opr_sz, max_sz, &ops[vece]);
}
-static void gen_uqadd_vec(unsigned vece, TCGv_vec t, TCGv_vec sat,
+static void gen_uqadd_vec(unsigned vece, TCGv_vec t, TCGv_vec qc,
TCGv_vec a, TCGv_vec b)
{
TCGv_vec x = tcg_temp_new_vec_matching(t);
tcg_gen_add_vec(vece, x, a, b);
tcg_gen_usadd_vec(vece, t, a, b);
- tcg_gen_cmp_vec(TCG_COND_NE, vece, x, x, t);
- tcg_gen_or_vec(vece, sat, sat, x);
+ tcg_gen_xor_vec(vece, x, x, t);
+ tcg_gen_or_vec(vece, qc, qc, x);
}
void gen_gvec_uqadd_qc(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz)
{
static const TCGOpcode vecop_list[] = {
- INDEX_op_usadd_vec, INDEX_op_cmp_vec, INDEX_op_add_vec, 0
+ INDEX_op_usadd_vec, INDEX_op_add_vec, 0
};
static const GVecGen4 ops[4] = {
{ .fniv = gen_uqadd_vec,
@@ -1259,21 +1259,21 @@ void gen_gvec_uqadd_qc(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
rn_ofs, rm_ofs, opr_sz, max_sz, &ops[vece]);
}
-static void gen_sqadd_vec(unsigned vece, TCGv_vec t, TCGv_vec sat,
+static void gen_sqadd_vec(unsigned vece, TCGv_vec t, TCGv_vec qc,
TCGv_vec a, TCGv_vec b)
{
TCGv_vec x = tcg_temp_new_vec_matching(t);
tcg_gen_add_vec(vece, x, a, b);
tcg_gen_ssadd_vec(vece, t, a, b);
- tcg_gen_cmp_vec(TCG_COND_NE, vece, x, x, t);
- tcg_gen_or_vec(vece, sat, sat, x);
+ tcg_gen_xor_vec(vece, x, x, t);
+ tcg_gen_or_vec(vece, qc, qc, x);
}
void gen_gvec_sqadd_qc(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz)
{
static const TCGOpcode vecop_list[] = {
- INDEX_op_ssadd_vec, INDEX_op_cmp_vec, INDEX_op_add_vec, 0
+ INDEX_op_ssadd_vec, INDEX_op_add_vec, 0
};
static const GVecGen4 ops[4] = {
{ .fniv = gen_sqadd_vec,
@@ -1301,21 +1301,21 @@ void gen_gvec_sqadd_qc(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
rn_ofs, rm_ofs, opr_sz, max_sz, &ops[vece]);
}
-static void gen_uqsub_vec(unsigned vece, TCGv_vec t, TCGv_vec sat,
+static void gen_uqsub_vec(unsigned vece, TCGv_vec t, TCGv_vec qc,
TCGv_vec a, TCGv_vec b)
{
TCGv_vec x = tcg_temp_new_vec_matching(t);
tcg_gen_sub_vec(vece, x, a, b);
tcg_gen_ussub_vec(vece, t, a, b);
- tcg_gen_cmp_vec(TCG_COND_NE, vece, x, x, t);
- tcg_gen_or_vec(vece, sat, sat, x);
+ tcg_gen_xor_vec(vece, x, x, t);
+ tcg_gen_or_vec(vece, qc, qc, x);
}
void gen_gvec_uqsub_qc(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz)
{
static const TCGOpcode vecop_list[] = {
- INDEX_op_ussub_vec, INDEX_op_cmp_vec, INDEX_op_sub_vec, 0
+ INDEX_op_ussub_vec, INDEX_op_sub_vec, 0
};
static const GVecGen4 ops[4] = {
{ .fniv = gen_uqsub_vec,
@@ -1343,21 +1343,21 @@ void gen_gvec_uqsub_qc(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
rn_ofs, rm_ofs, opr_sz, max_sz, &ops[vece]);
}
-static void gen_sqsub_vec(unsigned vece, TCGv_vec t, TCGv_vec sat,
+static void gen_sqsub_vec(unsigned vece, TCGv_vec t, TCGv_vec qc,
TCGv_vec a, TCGv_vec b)
{
TCGv_vec x = tcg_temp_new_vec_matching(t);
tcg_gen_sub_vec(vece, x, a, b);
tcg_gen_sssub_vec(vece, t, a, b);
- tcg_gen_cmp_vec(TCG_COND_NE, vece, x, x, t);
- tcg_gen_or_vec(vece, sat, sat, x);
+ tcg_gen_xor_vec(vece, x, x, t);
+ tcg_gen_or_vec(vece, qc, qc, x);
}
void gen_gvec_sqsub_qc(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz)
{
static const TCGOpcode vecop_list[] = {
- INDEX_op_sssub_vec, INDEX_op_cmp_vec, INDEX_op_sub_vec, 0
+ INDEX_op_sssub_vec, INDEX_op_sub_vec, 0
};
static const GVecGen4 ops[4] = {
{ .fniv = gen_sqsub_vec,
--
2.34.1
^ permalink raw reply related [flat|nested] 41+ messages in thread
* [PATCH v3 03/33] target/arm: Assert oprsz in range when using vfp.qc
2024-05-28 20:30 [PATCH v3 00/33] target/arm: Convert a64 advsimd to decodetree (part 1b) Richard Henderson
2024-05-28 20:30 ` [PATCH v3 01/33] target/arm: Diagnose UNPREDICTABLE operands to PLD, PLDW, PLI Richard Henderson
2024-05-28 20:30 ` [PATCH v3 02/33] target/arm: Improve vector UQADD, UQSUB, SQADD, SQSUB Richard Henderson
@ 2024-05-28 20:30 ` Richard Henderson
2024-05-30 14:14 ` Peter Maydell
2024-05-28 20:30 ` [PATCH v3 04/33] target/arm: Convert SUQADD and USQADD to gvec Richard Henderson
` (30 subsequent siblings)
33 siblings, 1 reply; 41+ messages in thread
From: Richard Henderson @ 2024-05-28 20:30 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm, Peter Maydell
Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/tcg/gengvec.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/target/arm/tcg/gengvec.c b/target/arm/tcg/gengvec.c
index bfe6885a01..3e2d3c21a1 100644
--- a/target/arm/tcg/gengvec.c
+++ b/target/arm/tcg/gengvec.c
@@ -29,6 +29,7 @@ static void gen_gvec_fn3_qc(uint32_t rd_ofs, uint32_t rn_ofs, uint32_t rm_ofs,
{
TCGv_ptr qc_ptr = tcg_temp_new_ptr();
+ tcg_debug_assert(opr_sz <= sizeof_field(CPUARMState, vfp.qc));
tcg_gen_addi_ptr(qc_ptr, tcg_env, offsetof(CPUARMState, vfp.qc));
tcg_gen_gvec_3_ptr(rd_ofs, rn_ofs, rm_ofs, qc_ptr,
opr_sz, max_sz, 0, fn);
@@ -1255,6 +1256,8 @@ void gen_gvec_uqadd_qc(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
.opt_opc = vecop_list,
.vece = MO_64 },
};
+
+ tcg_debug_assert(opr_sz <= sizeof_field(CPUARMState, vfp.qc));
tcg_gen_gvec_4(rd_ofs, offsetof(CPUARMState, vfp.qc),
rn_ofs, rm_ofs, opr_sz, max_sz, &ops[vece]);
}
@@ -1297,6 +1300,8 @@ void gen_gvec_sqadd_qc(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
.write_aofs = true,
.vece = MO_64 },
};
+
+ tcg_debug_assert(opr_sz <= sizeof_field(CPUARMState, vfp.qc));
tcg_gen_gvec_4(rd_ofs, offsetof(CPUARMState, vfp.qc),
rn_ofs, rm_ofs, opr_sz, max_sz, &ops[vece]);
}
@@ -1339,6 +1344,8 @@ void gen_gvec_uqsub_qc(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
.write_aofs = true,
.vece = MO_64 },
};
+
+ tcg_debug_assert(opr_sz <= sizeof_field(CPUARMState, vfp.qc));
tcg_gen_gvec_4(rd_ofs, offsetof(CPUARMState, vfp.qc),
rn_ofs, rm_ofs, opr_sz, max_sz, &ops[vece]);
}
@@ -1381,6 +1388,8 @@ void gen_gvec_sqsub_qc(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
.write_aofs = true,
.vece = MO_64 },
};
+
+ tcg_debug_assert(opr_sz <= sizeof_field(CPUARMState, vfp.qc));
tcg_gen_gvec_4(rd_ofs, offsetof(CPUARMState, vfp.qc),
rn_ofs, rm_ofs, opr_sz, max_sz, &ops[vece]);
}
--
2.34.1
^ permalink raw reply related [flat|nested] 41+ messages in thread
* [PATCH v3 04/33] target/arm: Convert SUQADD and USQADD to gvec
2024-05-28 20:30 [PATCH v3 00/33] target/arm: Convert a64 advsimd to decodetree (part 1b) Richard Henderson
` (2 preceding siblings ...)
2024-05-28 20:30 ` [PATCH v3 03/33] target/arm: Assert oprsz in range when using vfp.qc Richard Henderson
@ 2024-05-28 20:30 ` Richard Henderson
2024-05-30 14:18 ` Peter Maydell
2024-05-28 20:30 ` [PATCH v3 05/33] target/arm: Inline scalar SUQADD and USQADD Richard Henderson
` (29 subsequent siblings)
33 siblings, 1 reply; 41+ messages in thread
From: Richard Henderson @ 2024-05-28 20:30 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/helper.h | 16 +++++
target/arm/tcg/translate-a64.h | 6 ++
target/arm/tcg/gengvec64.c | 110 ++++++++++++++++++++++++++++++++
target/arm/tcg/translate-a64.c | 113 ++++++++++++++-------------------
target/arm/tcg/vec_helper.c | 64 +++++++++++++++++++
5 files changed, 245 insertions(+), 64 deletions(-)
diff --git a/target/arm/helper.h b/target/arm/helper.h
index f830531dd3..de2c5c9aef 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -836,6 +836,22 @@ DEF_HELPER_FLAGS_5(gvec_sqsub_s, TCG_CALL_NO_RWG,
void, ptr, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_5(gvec_sqsub_d, TCG_CALL_NO_RWG,
void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_usqadd_b, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_usqadd_h, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_usqadd_s, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_usqadd_d, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_suqadd_b, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_suqadd_h, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_suqadd_s, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_suqadd_d, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_5(gvec_fmlal_a32, TCG_CALL_NO_RWG,
void, ptr, ptr, ptr, ptr, i32)
diff --git a/target/arm/tcg/translate-a64.h b/target/arm/tcg/translate-a64.h
index 91750f0ca9..b5cb26f8a2 100644
--- a/target/arm/tcg/translate-a64.h
+++ b/target/arm/tcg/translate-a64.h
@@ -197,6 +197,12 @@ void gen_gvec_eor3(unsigned vece, uint32_t d, uint32_t n, uint32_t m,
uint32_t a, uint32_t oprsz, uint32_t maxsz);
void gen_gvec_bcax(unsigned vece, uint32_t d, uint32_t n, uint32_t m,
uint32_t a, uint32_t oprsz, uint32_t maxsz);
+void gen_gvec_suqadd_qc(unsigned vece, uint32_t rd_ofs,
+ uint32_t rn_ofs, uint32_t rm_ofs,
+ uint32_t opr_sz, uint32_t max_sz);
+void gen_gvec_usqadd_qc(unsigned vece, uint32_t rd_ofs,
+ uint32_t rn_ofs, uint32_t rm_ofs,
+ uint32_t opr_sz, uint32_t max_sz);
void gen_sve_ldr(DisasContext *s, TCGv_ptr, int vofs, int len, int rn, int imm);
void gen_sve_str(DisasContext *s, TCGv_ptr, int vofs, int len, int rn, int imm);
diff --git a/target/arm/tcg/gengvec64.c b/target/arm/tcg/gengvec64.c
index 093b498b13..b3afabd38b 100644
--- a/target/arm/tcg/gengvec64.c
+++ b/target/arm/tcg/gengvec64.c
@@ -188,3 +188,113 @@ void gen_gvec_bcax(unsigned vece, uint32_t d, uint32_t n, uint32_t m,
tcg_gen_gvec_4(d, n, m, a, oprsz, maxsz, &op);
}
+static void gen_suqadd_vec(unsigned vece, TCGv_vec t, TCGv_vec qc,
+ TCGv_vec a, TCGv_vec b)
+{
+ TCGv_vec max =
+ tcg_constant_vec_matching(t, vece, (1ull << ((8 << vece) - 1)) - 1);
+ TCGv_vec u = tcg_temp_new_vec_matching(t);
+
+ /* Maximum value that can be added to @a without overflow. */
+ tcg_gen_sub_vec(vece, u, max, a);
+
+ /* Constrain addend so that the next addition never overflows. */
+ tcg_gen_umin_vec(vece, u, u, b);
+ tcg_gen_add_vec(vece, t, u, a);
+
+ /* Compute QC by comparing the adjusted @b. */
+ tcg_gen_xor_vec(vece, u, u, b);
+ tcg_gen_or_vec(vece, qc, qc, u);
+}
+
+void gen_gvec_suqadd_qc(unsigned vece, uint32_t rd_ofs,
+ uint32_t rn_ofs, uint32_t rm_ofs,
+ uint32_t opr_sz, uint32_t max_sz)
+{
+ static const TCGOpcode vecop_list[] = {
+ INDEX_op_add_vec, INDEX_op_sub_vec, INDEX_op_umin_vec, 0
+ };
+ static const GVecGen4 ops[4] = {
+ { .fniv = gen_suqadd_vec,
+ .fno = gen_helper_gvec_suqadd_b,
+ .opt_opc = vecop_list,
+ .write_aofs = true,
+ .vece = MO_8 },
+ { .fniv = gen_suqadd_vec,
+ .fno = gen_helper_gvec_suqadd_h,
+ .opt_opc = vecop_list,
+ .write_aofs = true,
+ .vece = MO_16 },
+ { .fniv = gen_suqadd_vec,
+ .fno = gen_helper_gvec_suqadd_s,
+ .opt_opc = vecop_list,
+ .write_aofs = true,
+ .vece = MO_32 },
+ { .fniv = gen_suqadd_vec,
+ .fno = gen_helper_gvec_suqadd_d,
+ .opt_opc = vecop_list,
+ .write_aofs = true,
+ .vece = MO_64 },
+ };
+
+ tcg_debug_assert(opr_sz <= sizeof_field(CPUARMState, vfp.qc));
+ tcg_gen_gvec_4(rd_ofs, offsetof(CPUARMState, vfp.qc),
+ rn_ofs, rm_ofs, opr_sz, max_sz, &ops[vece]);
+}
+
+static void gen_usqadd_vec(unsigned vece, TCGv_vec t, TCGv_vec qc,
+ TCGv_vec a, TCGv_vec b)
+{
+ TCGv_vec u = tcg_temp_new_vec_matching(t);
+ TCGv_vec z = tcg_constant_vec_matching(t, vece, 0);
+
+ /* Compute unsigned saturation of add for +b and sub for -b. */
+ tcg_gen_neg_vec(vece, t, b);
+ tcg_gen_usadd_vec(vece, u, a, b);
+ tcg_gen_ussub_vec(vece, t, a, t);
+
+ /* Select the correct result depending on the sign of b. */
+ tcg_gen_cmpsel_vec(TCG_COND_LT, vece, t, b, z, t, u);
+
+ /* Compute QC by comparing against the non-saturated result. */
+ tcg_gen_add_vec(vece, u, a, b);
+ tcg_gen_xor_vec(vece, u, u, t);
+ tcg_gen_or_vec(vece, qc, qc, u);
+}
+
+void gen_gvec_usqadd_qc(unsigned vece, uint32_t rd_ofs,
+ uint32_t rn_ofs, uint32_t rm_ofs,
+ uint32_t opr_sz, uint32_t max_sz)
+{
+ static const TCGOpcode vecop_list[] = {
+ INDEX_op_neg_vec, INDEX_op_add_vec,
+ INDEX_op_usadd_vec, INDEX_op_ussub_vec,
+ INDEX_op_cmpsel_vec, 0
+ };
+ static const GVecGen4 ops[4] = {
+ { .fniv = gen_usqadd_vec,
+ .fno = gen_helper_gvec_usqadd_b,
+ .opt_opc = vecop_list,
+ .write_aofs = true,
+ .vece = MO_8 },
+ { .fniv = gen_usqadd_vec,
+ .fno = gen_helper_gvec_usqadd_h,
+ .opt_opc = vecop_list,
+ .write_aofs = true,
+ .vece = MO_16 },
+ { .fniv = gen_usqadd_vec,
+ .fno = gen_helper_gvec_usqadd_s,
+ .opt_opc = vecop_list,
+ .write_aofs = true,
+ .vece = MO_32 },
+ { .fniv = gen_usqadd_vec,
+ .fno = gen_helper_gvec_usqadd_d,
+ .opt_opc = vecop_list,
+ .write_aofs = true,
+ .vece = MO_64 },
+ };
+
+ tcg_debug_assert(opr_sz <= sizeof_field(CPUARMState, vfp.qc));
+ tcg_gen_gvec_4(rd_ofs, offsetof(CPUARMState, vfp.qc),
+ rn_ofs, rm_ofs, opr_sz, max_sz, &ops[vece]);
+}
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 9167e4d0bd..9f948e033e 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -9983,83 +9983,68 @@ static void handle_2misc_narrow(DisasContext *s, bool scalar,
/* Remaining saturating accumulating ops */
static void handle_2misc_satacc(DisasContext *s, bool is_scalar, bool is_u,
- bool is_q, int size, int rn, int rd)
+ bool is_q, unsigned size, int rn, int rd)
{
- bool is_double = (size == 3);
+ if (!is_scalar) {
+ gen_gvec_fn3(s, is_q, rd, rd, rn,
+ is_u ? gen_gvec_usqadd_qc : gen_gvec_suqadd_qc, size);
+ return;
+ }
- if (is_double) {
+ if (size == 3) {
TCGv_i64 tcg_rn = tcg_temp_new_i64();
TCGv_i64 tcg_rd = tcg_temp_new_i64();
- int pass;
- for (pass = 0; pass < (is_scalar ? 1 : 2); pass++) {
- read_vec_element(s, tcg_rn, rn, pass, MO_64);
- read_vec_element(s, tcg_rd, rd, pass, MO_64);
+ read_vec_element(s, tcg_rn, rn, 0, MO_64);
+ read_vec_element(s, tcg_rd, rd, 0, MO_64);
- if (is_u) { /* USQADD */
- gen_helper_neon_uqadd_s64(tcg_rd, tcg_env, tcg_rn, tcg_rd);
- } else { /* SUQADD */
- gen_helper_neon_sqadd_u64(tcg_rd, tcg_env, tcg_rn, tcg_rd);
- }
- write_vec_element(s, tcg_rd, rd, pass, MO_64);
+ if (is_u) { /* USQADD */
+ gen_helper_neon_uqadd_s64(tcg_rd, tcg_env, tcg_rn, tcg_rd);
+ } else { /* SUQADD */
+ gen_helper_neon_sqadd_u64(tcg_rd, tcg_env, tcg_rn, tcg_rd);
}
- clear_vec_high(s, !is_scalar, rd);
+ write_vec_element(s, tcg_rd, rd, 0, MO_64);
+ clear_vec_high(s, false, rd);
} else {
TCGv_i32 tcg_rn = tcg_temp_new_i32();
TCGv_i32 tcg_rd = tcg_temp_new_i32();
- int pass, maxpasses;
- if (is_scalar) {
- maxpasses = 1;
- } else {
- maxpasses = is_q ? 4 : 2;
+ read_vec_element_i32(s, tcg_rn, rn, 0, size);
+ read_vec_element_i32(s, tcg_rd, rd, 0, size);
+
+ if (is_u) { /* USQADD */
+ switch (size) {
+ case 0:
+ gen_helper_neon_uqadd_s8(tcg_rd, tcg_env, tcg_rn, tcg_rd);
+ break;
+ case 1:
+ gen_helper_neon_uqadd_s16(tcg_rd, tcg_env, tcg_rn, tcg_rd);
+ break;
+ case 2:
+ gen_helper_neon_uqadd_s32(tcg_rd, tcg_env, tcg_rn, tcg_rd);
+ break;
+ default:
+ g_assert_not_reached();
+ }
+ } else { /* SUQADD */
+ switch (size) {
+ case 0:
+ gen_helper_neon_sqadd_u8(tcg_rd, tcg_env, tcg_rn, tcg_rd);
+ break;
+ case 1:
+ gen_helper_neon_sqadd_u16(tcg_rd, tcg_env, tcg_rn, tcg_rd);
+ break;
+ case 2:
+ gen_helper_neon_sqadd_u32(tcg_rd, tcg_env, tcg_rn, tcg_rd);
+ break;
+ default:
+ g_assert_not_reached();
+ }
}
- for (pass = 0; pass < maxpasses; pass++) {
- if (is_scalar) {
- read_vec_element_i32(s, tcg_rn, rn, pass, size);
- read_vec_element_i32(s, tcg_rd, rd, pass, size);
- } else {
- read_vec_element_i32(s, tcg_rn, rn, pass, MO_32);
- read_vec_element_i32(s, tcg_rd, rd, pass, MO_32);
- }
-
- if (is_u) { /* USQADD */
- switch (size) {
- case 0:
- gen_helper_neon_uqadd_s8(tcg_rd, tcg_env, tcg_rn, tcg_rd);
- break;
- case 1:
- gen_helper_neon_uqadd_s16(tcg_rd, tcg_env, tcg_rn, tcg_rd);
- break;
- case 2:
- gen_helper_neon_uqadd_s32(tcg_rd, tcg_env, tcg_rn, tcg_rd);
- break;
- default:
- g_assert_not_reached();
- }
- } else { /* SUQADD */
- switch (size) {
- case 0:
- gen_helper_neon_sqadd_u8(tcg_rd, tcg_env, tcg_rn, tcg_rd);
- break;
- case 1:
- gen_helper_neon_sqadd_u16(tcg_rd, tcg_env, tcg_rn, tcg_rd);
- break;
- case 2:
- gen_helper_neon_sqadd_u32(tcg_rd, tcg_env, tcg_rn, tcg_rd);
- break;
- default:
- g_assert_not_reached();
- }
- }
-
- if (is_scalar) {
- write_vec_element(s, tcg_constant_i64(0), rd, 0, MO_64);
- }
- write_vec_element_i32(s, tcg_rd, rd, pass, MO_32);
- }
- clear_vec_high(s, is_q, rd);
+ write_vec_element(s, tcg_constant_i64(0), rd, 0, MO_64);
+ write_vec_element_i32(s, tcg_rd, rd, 0, MO_32);
+ clear_vec_high(s, false, rd);
}
}
diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
index 56fea14edb..d8e96386be 100644
--- a/target/arm/tcg/vec_helper.c
+++ b/target/arm/tcg/vec_helper.c
@@ -1555,6 +1555,14 @@ DO_SAT(gvec_sqsub_b, int, int8_t, int8_t, -, INT8_MIN, INT8_MAX)
DO_SAT(gvec_sqsub_h, int, int16_t, int16_t, -, INT16_MIN, INT16_MAX)
DO_SAT(gvec_sqsub_s, int64_t, int32_t, int32_t, -, INT32_MIN, INT32_MAX)
+DO_SAT(gvec_usqadd_b, int, uint8_t, int8_t, +, 0, UINT8_MAX)
+DO_SAT(gvec_usqadd_h, int, uint16_t, int16_t, +, 0, UINT16_MAX)
+DO_SAT(gvec_usqadd_s, int64_t, uint32_t, int32_t, +, 0, UINT32_MAX)
+
+DO_SAT(gvec_suqadd_b, int, int8_t, uint8_t, +, INT8_MIN, INT8_MAX)
+DO_SAT(gvec_suqadd_h, int, int16_t, uint16_t, +, INT16_MIN, INT16_MAX)
+DO_SAT(gvec_suqadd_s, int64_t, int32_t, uint32_t, +, INT32_MIN, INT32_MAX)
+
#undef DO_SAT
void HELPER(gvec_uqadd_d)(void *vd, void *vq, void *vn,
@@ -1645,6 +1653,62 @@ void HELPER(gvec_sqsub_d)(void *vd, void *vq, void *vn,
clear_tail(d, oprsz, simd_maxsz(desc));
}
+void HELPER(gvec_usqadd_d)(void *vd, void *vq, void *vn,
+ void *vm, uint32_t desc)
+{
+ intptr_t i, oprsz = simd_oprsz(desc);
+ uint64_t *d = vd, *n = vn, *m = vm;
+ bool q = false;
+
+ for (i = 0; i < oprsz / 8; i++) {
+ uint64_t nn = n[i];
+ int64_t mm = m[i];
+ uint64_t dd = nn + mm;
+
+ if (mm < 0) {
+ if (nn < (uint64_t)-mm) {
+ dd = 0;
+ q = true;
+ }
+ } else {
+ if (dd < nn) {
+ dd = UINT64_MAX;
+ q = true;
+ }
+ }
+ d[i] = dd;
+ }
+ if (q) {
+ uint32_t *qc = vq;
+ qc[0] = 1;
+ }
+ clear_tail(d, oprsz, simd_maxsz(desc));
+}
+
+void HELPER(gvec_suqadd_d)(void *vd, void *vq, void *vn,
+ void *vm, uint32_t desc)
+{
+ intptr_t i, oprsz = simd_oprsz(desc);
+ uint64_t *d = vd, *n = vn, *m = vm;
+ bool q = false;
+
+ for (i = 0; i < oprsz / 8; i++) {
+ int64_t nn = n[i];
+ uint64_t mm = m[i];
+ int64_t dd = nn + mm;
+
+ if (mm > (uint64_t)(INT64_MAX - nn)) {
+ dd = INT64_MAX;
+ q = true;
+ }
+ d[i] = dd;
+ }
+ if (q) {
+ uint32_t *qc = vq;
+ qc[0] = 1;
+ }
+ clear_tail(d, oprsz, simd_maxsz(desc));
+}
#define DO_SRA(NAME, TYPE) \
void HELPER(NAME)(void *vd, void *vn, uint32_t desc) \
--
2.34.1
^ permalink raw reply related [flat|nested] 41+ messages in thread
* [PATCH v3 05/33] target/arm: Inline scalar SUQADD and USQADD
2024-05-28 20:30 [PATCH v3 00/33] target/arm: Convert a64 advsimd to decodetree (part 1b) Richard Henderson
` (3 preceding siblings ...)
2024-05-28 20:30 ` [PATCH v3 04/33] target/arm: Convert SUQADD and USQADD to gvec Richard Henderson
@ 2024-05-28 20:30 ` Richard Henderson
2024-05-28 20:30 ` [PATCH v3 06/33] target/arm: Inline scalar SQADD, UQADD, SQSUB, UQSUB Richard Henderson
` (28 subsequent siblings)
33 siblings, 0 replies; 41+ messages in thread
From: Richard Henderson @ 2024-05-28 20:30 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm, Peter Maydell
This eliminates the last uses of these neon helpers.
Incorporate the MO_64 expanders as an option to the vector expander.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/helper.h | 8 --
target/arm/tcg/translate-a64.h | 8 ++
target/arm/tcg/gengvec64.c | 71 ++++++++++++++
target/arm/tcg/neon_helper.c | 165 ---------------------------------
target/arm/tcg/translate-a64.c | 73 +++++----------
5 files changed, 103 insertions(+), 222 deletions(-)
diff --git a/target/arm/helper.h b/target/arm/helper.h
index de2c5c9aef..c76158d6d3 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -274,14 +274,6 @@ DEF_HELPER_FLAGS_3(neon_qadd_u16, TCG_CALL_NO_RWG, i32, env, i32, i32)
DEF_HELPER_FLAGS_3(neon_qadd_s16, TCG_CALL_NO_RWG, i32, env, i32, i32)
DEF_HELPER_FLAGS_3(neon_qadd_u32, TCG_CALL_NO_RWG, i32, env, i32, i32)
DEF_HELPER_FLAGS_3(neon_qadd_s32, TCG_CALL_NO_RWG, i32, env, i32, i32)
-DEF_HELPER_FLAGS_3(neon_uqadd_s8, TCG_CALL_NO_RWG, i32, env, i32, i32)
-DEF_HELPER_FLAGS_3(neon_uqadd_s16, TCG_CALL_NO_RWG, i32, env, i32, i32)
-DEF_HELPER_FLAGS_3(neon_uqadd_s32, TCG_CALL_NO_RWG, i32, env, i32, i32)
-DEF_HELPER_FLAGS_3(neon_uqadd_s64, TCG_CALL_NO_RWG, i64, env, i64, i64)
-DEF_HELPER_FLAGS_3(neon_sqadd_u8, TCG_CALL_NO_RWG, i32, env, i32, i32)
-DEF_HELPER_FLAGS_3(neon_sqadd_u16, TCG_CALL_NO_RWG, i32, env, i32, i32)
-DEF_HELPER_FLAGS_3(neon_sqadd_u32, TCG_CALL_NO_RWG, i32, env, i32, i32)
-DEF_HELPER_FLAGS_3(neon_sqadd_u64, TCG_CALL_NO_RWG, i64, env, i64, i64)
DEF_HELPER_3(neon_qsub_u8, i32, env, i32, i32)
DEF_HELPER_3(neon_qsub_s8, i32, env, i32, i32)
DEF_HELPER_3(neon_qsub_u16, i32, env, i32, i32)
diff --git a/target/arm/tcg/translate-a64.h b/target/arm/tcg/translate-a64.h
index b5cb26f8a2..0fcf7cb63a 100644
--- a/target/arm/tcg/translate-a64.h
+++ b/target/arm/tcg/translate-a64.h
@@ -197,9 +197,17 @@ void gen_gvec_eor3(unsigned vece, uint32_t d, uint32_t n, uint32_t m,
uint32_t a, uint32_t oprsz, uint32_t maxsz);
void gen_gvec_bcax(unsigned vece, uint32_t d, uint32_t n, uint32_t m,
uint32_t a, uint32_t oprsz, uint32_t maxsz);
+
+void gen_suqadd_bhs(TCGv_i64 res, TCGv_i64 qc,
+ TCGv_i64 a, TCGv_i64 b, MemOp esz);
+void gen_suqadd_d(TCGv_i64 res, TCGv_i64 qc, TCGv_i64 a, TCGv_i64 b);
void gen_gvec_suqadd_qc(unsigned vece, uint32_t rd_ofs,
uint32_t rn_ofs, uint32_t rm_ofs,
uint32_t opr_sz, uint32_t max_sz);
+
+void gen_usqadd_bhs(TCGv_i64 res, TCGv_i64 qc,
+ TCGv_i64 a, TCGv_i64 b, MemOp esz);
+void gen_usqadd_d(TCGv_i64 res, TCGv_i64 qc, TCGv_i64 a, TCGv_i64 b);
void gen_gvec_usqadd_qc(unsigned vece, uint32_t rd_ofs,
uint32_t rn_ofs, uint32_t rm_ofs,
uint32_t opr_sz, uint32_t max_sz);
diff --git a/target/arm/tcg/gengvec64.c b/target/arm/tcg/gengvec64.c
index b3afabd38b..2617cde0a5 100644
--- a/target/arm/tcg/gengvec64.c
+++ b/target/arm/tcg/gengvec64.c
@@ -188,6 +188,38 @@ void gen_gvec_bcax(unsigned vece, uint32_t d, uint32_t n, uint32_t m,
tcg_gen_gvec_4(d, n, m, a, oprsz, maxsz, &op);
}
+/*
+ * Set @res to the correctly saturated result.
+ * Set @qc non-zero if saturation occured.
+ */
+void gen_suqadd_bhs(TCGv_i64 res, TCGv_i64 qc,
+ TCGv_i64 a, TCGv_i64 b, MemOp esz)
+{
+ TCGv_i64 max = tcg_constant_i64((1ull << ((8 << esz) - 1)) - 1);
+ TCGv_i64 t = tcg_temp_new_i64();
+
+ tcg_gen_add_i64(t, a, b);
+ tcg_gen_smin_i64(res, t, max);
+ tcg_gen_xor_i64(t, t, res);
+ tcg_gen_or_i64(qc, qc, t);
+}
+
+void gen_suqadd_d(TCGv_i64 res, TCGv_i64 qc, TCGv_i64 a, TCGv_i64 b)
+{
+ TCGv_i64 max = tcg_constant_i64(INT64_MAX);
+ TCGv_i64 t = tcg_temp_new_i64();
+
+ /* Maximum value that can be added to @a without overflow. */
+ tcg_gen_sub_i64(t, max, a);
+
+ /* Constrain addend so that the next addition never overflows. */
+ tcg_gen_umin_i64(t, t, b);
+ tcg_gen_add_i64(res, a, t);
+
+ tcg_gen_xor_i64(t, t, b);
+ tcg_gen_or_i64(qc, qc, t);
+}
+
static void gen_suqadd_vec(unsigned vece, TCGv_vec t, TCGv_vec qc,
TCGv_vec a, TCGv_vec b)
{
@@ -231,6 +263,7 @@ void gen_gvec_suqadd_qc(unsigned vece, uint32_t rd_ofs,
.write_aofs = true,
.vece = MO_32 },
{ .fniv = gen_suqadd_vec,
+ .fni8 = gen_suqadd_d,
.fno = gen_helper_gvec_suqadd_d,
.opt_opc = vecop_list,
.write_aofs = true,
@@ -242,6 +275,43 @@ void gen_gvec_suqadd_qc(unsigned vece, uint32_t rd_ofs,
rn_ofs, rm_ofs, opr_sz, max_sz, &ops[vece]);
}
+void gen_usqadd_bhs(TCGv_i64 res, TCGv_i64 qc,
+ TCGv_i64 a, TCGv_i64 b, MemOp esz)
+{
+ TCGv_i64 max = tcg_constant_i64(MAKE_64BIT_MASK(0, 8 << esz));
+ TCGv_i64 zero = tcg_constant_i64(0);
+ TCGv_i64 tmp = tcg_temp_new_i64();
+
+ tcg_gen_add_i64(tmp, a, b);
+ tcg_gen_smin_i64(res, tmp, max);
+ tcg_gen_smax_i64(res, res, zero);
+ tcg_gen_xor_i64(tmp, tmp, res);
+ tcg_gen_or_i64(qc, qc, tmp);
+}
+
+void gen_usqadd_d(TCGv_i64 res, TCGv_i64 qc, TCGv_i64 a, TCGv_i64 b)
+{
+ TCGv_i64 tmp = tcg_temp_new_i64();
+ TCGv_i64 tneg = tcg_temp_new_i64();
+ TCGv_i64 tpos = tcg_temp_new_i64();
+ TCGv_i64 max = tcg_constant_i64(UINT64_MAX);
+ TCGv_i64 zero = tcg_constant_i64(0);
+
+ tcg_gen_add_i64(tmp, a, b);
+
+ /* If @b is positive, saturate if (a + b) < a, aka unsigned overflow. */
+ tcg_gen_movcond_i64(TCG_COND_LTU, tpos, tmp, a, max, tmp);
+
+ /* If @b is negative, saturate if a < -b, ie subtraction is negative. */
+ tcg_gen_neg_i64(tneg, b);
+ tcg_gen_movcond_i64(TCG_COND_LTU, tneg, a, tneg, zero, tmp);
+
+ /* Select correct result from sign of @b. */
+ tcg_gen_movcond_i64(TCG_COND_LT, res, b, zero, tneg, tpos);
+ tcg_gen_xor_i64(tmp, tmp, res);
+ tcg_gen_or_i64(qc, qc, tmp);
+}
+
static void gen_usqadd_vec(unsigned vece, TCGv_vec t, TCGv_vec qc,
TCGv_vec a, TCGv_vec b)
{
@@ -288,6 +358,7 @@ void gen_gvec_usqadd_qc(unsigned vece, uint32_t rd_ofs,
.write_aofs = true,
.vece = MO_32 },
{ .fniv = gen_usqadd_vec,
+ .fni8 = gen_usqadd_d,
.fno = gen_helper_gvec_usqadd_d,
.opt_opc = vecop_list,
.write_aofs = true,
diff --git a/target/arm/tcg/neon_helper.c b/target/arm/tcg/neon_helper.c
index a0b51c8809..9505a5fd18 100644
--- a/target/arm/tcg/neon_helper.c
+++ b/target/arm/tcg/neon_helper.c
@@ -236,171 +236,6 @@ uint64_t HELPER(neon_qadd_s64)(CPUARMState *env, uint64_t src1, uint64_t src2)
return res;
}
-/* Unsigned saturating accumulate of signed value
- *
- * Op1/Rn is treated as signed
- * Op2/Rd is treated as unsigned
- *
- * Explicit casting is used to ensure the correct sign extension of
- * inputs. The result is treated as a unsigned value and saturated as such.
- *
- * We use a macro for the 8/16 bit cases which expects signed integers of va,
- * vb, and vr for interim calculation and an unsigned 32 bit result value r.
- */
-
-#define USATACC(bits, shift) \
- do { \
- va = sextract32(a, shift, bits); \
- vb = extract32(b, shift, bits); \
- vr = va + vb; \
- if (vr > UINT##bits##_MAX) { \
- SET_QC(); \
- vr = UINT##bits##_MAX; \
- } else if (vr < 0) { \
- SET_QC(); \
- vr = 0; \
- } \
- r = deposit32(r, shift, bits, vr); \
- } while (0)
-
-uint32_t HELPER(neon_uqadd_s8)(CPUARMState *env, uint32_t a, uint32_t b)
-{
- int16_t va, vb, vr;
- uint32_t r = 0;
-
- USATACC(8, 0);
- USATACC(8, 8);
- USATACC(8, 16);
- USATACC(8, 24);
- return r;
-}
-
-uint32_t HELPER(neon_uqadd_s16)(CPUARMState *env, uint32_t a, uint32_t b)
-{
- int32_t va, vb, vr;
- uint64_t r = 0;
-
- USATACC(16, 0);
- USATACC(16, 16);
- return r;
-}
-
-#undef USATACC
-
-uint32_t HELPER(neon_uqadd_s32)(CPUARMState *env, uint32_t a, uint32_t b)
-{
- int64_t va = (int32_t)a;
- int64_t vb = (uint32_t)b;
- int64_t vr = va + vb;
- if (vr > UINT32_MAX) {
- SET_QC();
- vr = UINT32_MAX;
- } else if (vr < 0) {
- SET_QC();
- vr = 0;
- }
- return vr;
-}
-
-uint64_t HELPER(neon_uqadd_s64)(CPUARMState *env, uint64_t a, uint64_t b)
-{
- uint64_t res;
- res = a + b;
- /* We only need to look at the pattern of SIGN bits to detect
- * +ve/-ve saturation
- */
- if (~a & b & ~res & SIGNBIT64) {
- SET_QC();
- res = UINT64_MAX;
- } else if (a & ~b & res & SIGNBIT64) {
- SET_QC();
- res = 0;
- }
- return res;
-}
-
-/* Signed saturating accumulate of unsigned value
- *
- * Op1/Rn is treated as unsigned
- * Op2/Rd is treated as signed
- *
- * The result is treated as a signed value and saturated as such
- *
- * We use a macro for the 8/16 bit cases which expects signed integers of va,
- * vb, and vr for interim calculation and an unsigned 32 bit result value r.
- */
-
-#define SSATACC(bits, shift) \
- do { \
- va = extract32(a, shift, bits); \
- vb = sextract32(b, shift, bits); \
- vr = va + vb; \
- if (vr > INT##bits##_MAX) { \
- SET_QC(); \
- vr = INT##bits##_MAX; \
- } else if (vr < INT##bits##_MIN) { \
- SET_QC(); \
- vr = INT##bits##_MIN; \
- } \
- r = deposit32(r, shift, bits, vr); \
- } while (0)
-
-uint32_t HELPER(neon_sqadd_u8)(CPUARMState *env, uint32_t a, uint32_t b)
-{
- int16_t va, vb, vr;
- uint32_t r = 0;
-
- SSATACC(8, 0);
- SSATACC(8, 8);
- SSATACC(8, 16);
- SSATACC(8, 24);
- return r;
-}
-
-uint32_t HELPER(neon_sqadd_u16)(CPUARMState *env, uint32_t a, uint32_t b)
-{
- int32_t va, vb, vr;
- uint32_t r = 0;
-
- SSATACC(16, 0);
- SSATACC(16, 16);
-
- return r;
-}
-
-#undef SSATACC
-
-uint32_t HELPER(neon_sqadd_u32)(CPUARMState *env, uint32_t a, uint32_t b)
-{
- int64_t res;
- int64_t op1 = (uint32_t)a;
- int64_t op2 = (int32_t)b;
- res = op1 + op2;
- if (res > INT32_MAX) {
- SET_QC();
- res = INT32_MAX;
- } else if (res < INT32_MIN) {
- SET_QC();
- res = INT32_MIN;
- }
- return res;
-}
-
-uint64_t HELPER(neon_sqadd_u64)(CPUARMState *env, uint64_t a, uint64_t b)
-{
- uint64_t res;
- res = a + b;
- /* We only need to look at the pattern of SIGN bits to detect an overflow */
- if (((a & res)
- | (~b & res)
- | (a & ~b)) & SIGNBIT64) {
- SET_QC();
- res = INT64_MAX;
- }
- return res;
-}
-
-
#define NEON_USAT(dest, src1, src2, type) do { \
uint32_t tmp = (uint32_t)src1 - (uint32_t)src2; \
if (tmp != (type)tmp) { \
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 9f948e033e..781b224972 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -9985,67 +9985,42 @@ static void handle_2misc_narrow(DisasContext *s, bool scalar,
static void handle_2misc_satacc(DisasContext *s, bool is_scalar, bool is_u,
bool is_q, unsigned size, int rn, int rd)
{
+ TCGv_i64 res, qc, a, b;
+
if (!is_scalar) {
gen_gvec_fn3(s, is_q, rd, rd, rn,
is_u ? gen_gvec_usqadd_qc : gen_gvec_suqadd_qc, size);
return;
}
- if (size == 3) {
- TCGv_i64 tcg_rn = tcg_temp_new_i64();
- TCGv_i64 tcg_rd = tcg_temp_new_i64();
+ res = tcg_temp_new_i64();
+ qc = tcg_temp_new_i64();
+ a = tcg_temp_new_i64();
+ b = tcg_temp_new_i64();
- read_vec_element(s, tcg_rn, rn, 0, MO_64);
- read_vec_element(s, tcg_rd, rd, 0, MO_64);
+ /* Read and extend scalar inputs to 64-bits. */
+ read_vec_element(s, a, rd, 0, size | (is_u ? 0 : MO_SIGN));
+ read_vec_element(s, b, rn, 0, size | (is_u ? MO_SIGN : 0));
+ tcg_gen_ld_i64(qc, tcg_env, offsetof(CPUARMState, vfp.qc));
- if (is_u) { /* USQADD */
- gen_helper_neon_uqadd_s64(tcg_rd, tcg_env, tcg_rn, tcg_rd);
- } else { /* SUQADD */
- gen_helper_neon_sqadd_u64(tcg_rd, tcg_env, tcg_rn, tcg_rd);
+ if (size == MO_64) {
+ if (is_u) {
+ gen_usqadd_d(res, qc, a, b);
+ } else {
+ gen_suqadd_d(res, qc, a, b);
}
- write_vec_element(s, tcg_rd, rd, 0, MO_64);
- clear_vec_high(s, false, rd);
} else {
- TCGv_i32 tcg_rn = tcg_temp_new_i32();
- TCGv_i32 tcg_rd = tcg_temp_new_i32();
-
- read_vec_element_i32(s, tcg_rn, rn, 0, size);
- read_vec_element_i32(s, tcg_rd, rd, 0, size);
-
- if (is_u) { /* USQADD */
- switch (size) {
- case 0:
- gen_helper_neon_uqadd_s8(tcg_rd, tcg_env, tcg_rn, tcg_rd);
- break;
- case 1:
- gen_helper_neon_uqadd_s16(tcg_rd, tcg_env, tcg_rn, tcg_rd);
- break;
- case 2:
- gen_helper_neon_uqadd_s32(tcg_rd, tcg_env, tcg_rn, tcg_rd);
- break;
- default:
- g_assert_not_reached();
- }
- } else { /* SUQADD */
- switch (size) {
- case 0:
- gen_helper_neon_sqadd_u8(tcg_rd, tcg_env, tcg_rn, tcg_rd);
- break;
- case 1:
- gen_helper_neon_sqadd_u16(tcg_rd, tcg_env, tcg_rn, tcg_rd);
- break;
- case 2:
- gen_helper_neon_sqadd_u32(tcg_rd, tcg_env, tcg_rn, tcg_rd);
- break;
- default:
- g_assert_not_reached();
- }
+ if (is_u) {
+ gen_usqadd_bhs(res, qc, a, b, size);
+ } else {
+ gen_suqadd_bhs(res, qc, a, b, size);
+ /* Truncate signed 64-bit result for writeback. */
+ tcg_gen_ext_i64(res, res, size);
}
-
- write_vec_element(s, tcg_constant_i64(0), rd, 0, MO_64);
- write_vec_element_i32(s, tcg_rd, rd, 0, MO_32);
- clear_vec_high(s, false, rd);
}
+
+ write_fp_dreg(s, rd, res);
+ tcg_gen_st_i64(qc, tcg_env, offsetof(CPUARMState, vfp.qc));
}
/* AdvSIMD scalar two reg misc
--
2.34.1
^ permalink raw reply related [flat|nested] 41+ messages in thread
* [PATCH v3 06/33] target/arm: Inline scalar SQADD, UQADD, SQSUB, UQSUB
2024-05-28 20:30 [PATCH v3 00/33] target/arm: Convert a64 advsimd to decodetree (part 1b) Richard Henderson
` (4 preceding siblings ...)
2024-05-28 20:30 ` [PATCH v3 05/33] target/arm: Inline scalar SUQADD and USQADD Richard Henderson
@ 2024-05-28 20:30 ` Richard Henderson
2024-05-28 20:30 ` [PATCH v3 07/33] target/arm: Convert SQADD, SQSUB, UQADD, UQSUB to decodetree Richard Henderson
` (27 subsequent siblings)
33 siblings, 0 replies; 41+ messages in thread
From: Richard Henderson @ 2024-05-28 20:30 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm, Peter Maydell
This eliminates the last uses of these neon helpers.
Incorporate the MO_64 expanders as an option to the vector expander.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/helper.h | 17 ----
target/arm/tcg/translate.h | 15 +++
target/arm/tcg/gengvec.c | 116 +++++++++++++++++++++++
target/arm/tcg/neon_helper.c | 162 ---------------------------------
target/arm/tcg/translate-a64.c | 67 ++++++++------
5 files changed, 169 insertions(+), 208 deletions(-)
diff --git a/target/arm/helper.h b/target/arm/helper.h
index c76158d6d3..a14c040451 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -268,23 +268,6 @@ DEF_HELPER_FLAGS_2(fjcvtzs, TCG_CALL_NO_RWG, i64, f64, ptr)
DEF_HELPER_FLAGS_3(check_hcr_el2_trap, TCG_CALL_NO_WG, void, env, i32, i32)
/* neon_helper.c */
-DEF_HELPER_FLAGS_3(neon_qadd_u8, TCG_CALL_NO_RWG, i32, env, i32, i32)
-DEF_HELPER_FLAGS_3(neon_qadd_s8, TCG_CALL_NO_RWG, i32, env, i32, i32)
-DEF_HELPER_FLAGS_3(neon_qadd_u16, TCG_CALL_NO_RWG, i32, env, i32, i32)
-DEF_HELPER_FLAGS_3(neon_qadd_s16, TCG_CALL_NO_RWG, i32, env, i32, i32)
-DEF_HELPER_FLAGS_3(neon_qadd_u32, TCG_CALL_NO_RWG, i32, env, i32, i32)
-DEF_HELPER_FLAGS_3(neon_qadd_s32, TCG_CALL_NO_RWG, i32, env, i32, i32)
-DEF_HELPER_3(neon_qsub_u8, i32, env, i32, i32)
-DEF_HELPER_3(neon_qsub_s8, i32, env, i32, i32)
-DEF_HELPER_3(neon_qsub_u16, i32, env, i32, i32)
-DEF_HELPER_3(neon_qsub_s16, i32, env, i32, i32)
-DEF_HELPER_3(neon_qsub_u32, i32, env, i32, i32)
-DEF_HELPER_3(neon_qsub_s32, i32, env, i32, i32)
-DEF_HELPER_3(neon_qadd_u64, i64, env, i64, i64)
-DEF_HELPER_3(neon_qadd_s64, i64, env, i64, i64)
-DEF_HELPER_3(neon_qsub_u64, i64, env, i64, i64)
-DEF_HELPER_3(neon_qsub_s64, i64, env, i64, i64)
-
DEF_HELPER_2(neon_hadd_s8, i32, i32, i32)
DEF_HELPER_2(neon_hadd_u8, i32, i32, i32)
DEF_HELPER_2(neon_hadd_s16, i32, i32, i32)
diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h
index 3abdbedfe5..87439dcc61 100644
--- a/target/arm/tcg/translate.h
+++ b/target/arm/tcg/translate.h
@@ -466,12 +466,27 @@ void gen_sshl_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b);
void gen_ushl_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b);
void gen_sshl_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b);
+void gen_uqadd_bhs(TCGv_i64 res, TCGv_i64 qc,
+ TCGv_i64 a, TCGv_i64 b, MemOp esz);
+void gen_uqadd_d(TCGv_i64 d, TCGv_i64 q, TCGv_i64 a, TCGv_i64 b);
void gen_gvec_uqadd_qc(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz);
+
+void gen_sqadd_bhs(TCGv_i64 res, TCGv_i64 qc,
+ TCGv_i64 a, TCGv_i64 b, MemOp esz);
+void gen_sqadd_d(TCGv_i64 d, TCGv_i64 q, TCGv_i64 a, TCGv_i64 b);
void gen_gvec_sqadd_qc(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz);
+
+void gen_uqsub_bhs(TCGv_i64 res, TCGv_i64 qc,
+ TCGv_i64 a, TCGv_i64 b, MemOp esz);
+void gen_uqsub_d(TCGv_i64 d, TCGv_i64 q, TCGv_i64 a, TCGv_i64 b);
void gen_gvec_uqsub_qc(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz);
+
+void gen_sqsub_bhs(TCGv_i64 res, TCGv_i64 qc,
+ TCGv_i64 a, TCGv_i64 b, MemOp esz);
+void gen_sqsub_d(TCGv_i64 d, TCGv_i64 q, TCGv_i64 a, TCGv_i64 b);
void gen_gvec_sqsub_qc(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz);
diff --git a/target/arm/tcg/gengvec.c b/target/arm/tcg/gengvec.c
index 3e2d3c21a1..740f3f864e 100644
--- a/target/arm/tcg/gengvec.c
+++ b/target/arm/tcg/gengvec.c
@@ -1218,6 +1218,28 @@ void gen_gvec_sshl(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, opr_sz, max_sz, &ops[vece]);
}
+void gen_uqadd_bhs(TCGv_i64 res, TCGv_i64 qc, TCGv_i64 a, TCGv_i64 b, MemOp esz)
+{
+ uint64_t max = MAKE_64BIT_MASK(0, 8 << esz);
+ TCGv_i64 tmp = tcg_temp_new_i64();
+
+ tcg_gen_add_i64(tmp, a, b);
+ tcg_gen_umin_i64(res, tmp, tcg_constant_i64(max));
+ tcg_gen_xor_i64(tmp, tmp, res);
+ tcg_gen_or_i64(qc, qc, tmp);
+}
+
+void gen_uqadd_d(TCGv_i64 res, TCGv_i64 qc, TCGv_i64 a, TCGv_i64 b)
+{
+ TCGv_i64 t = tcg_temp_new_i64();
+
+ tcg_gen_add_i64(t, a, b);
+ tcg_gen_movcond_i64(TCG_COND_LTU, res, t, a,
+ tcg_constant_i64(UINT64_MAX), t);
+ tcg_gen_xor_i64(t, t, res);
+ tcg_gen_or_i64(qc, qc, t);
+}
+
static void gen_uqadd_vec(unsigned vece, TCGv_vec t, TCGv_vec qc,
TCGv_vec a, TCGv_vec b)
{
@@ -1251,6 +1273,7 @@ void gen_gvec_uqadd_qc(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
.opt_opc = vecop_list,
.vece = MO_32 },
{ .fniv = gen_uqadd_vec,
+ .fni8 = gen_uqadd_d,
.fno = gen_helper_gvec_uqadd_d,
.write_aofs = true,
.opt_opc = vecop_list,
@@ -1262,6 +1285,41 @@ void gen_gvec_uqadd_qc(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
rn_ofs, rm_ofs, opr_sz, max_sz, &ops[vece]);
}
+void gen_sqadd_bhs(TCGv_i64 res, TCGv_i64 qc, TCGv_i64 a, TCGv_i64 b, MemOp esz)
+{
+ int64_t max = MAKE_64BIT_MASK(0, (8 << esz) - 1);
+ int64_t min = -1ll - max;
+ TCGv_i64 tmp = tcg_temp_new_i64();
+
+ tcg_gen_add_i64(tmp, a, b);
+ tcg_gen_smin_i64(res, tmp, tcg_constant_i64(max));
+ tcg_gen_smax_i64(res, res, tcg_constant_i64(min));
+ tcg_gen_xor_i64(tmp, tmp, res);
+ tcg_gen_or_i64(qc, qc, tmp);
+}
+
+void gen_sqadd_d(TCGv_i64 res, TCGv_i64 qc, TCGv_i64 a, TCGv_i64 b)
+{
+ TCGv_i64 t0 = tcg_temp_new_i64();
+ TCGv_i64 t1 = tcg_temp_new_i64();
+ TCGv_i64 t2 = tcg_temp_new_i64();
+
+ tcg_gen_add_i64(t0, a, b);
+
+ /* Compute signed overflow indication into T1 */
+ tcg_gen_xor_i64(t1, a, b);
+ tcg_gen_xor_i64(t2, t0, a);
+ tcg_gen_andc_i64(t1, t2, t1);
+
+ /* Compute saturated value into T2 */
+ tcg_gen_sari_i64(t2, a, 63);
+ tcg_gen_xori_i64(t2, t2, INT64_MAX);
+
+ tcg_gen_movcond_i64(TCG_COND_LT, res, t1, tcg_constant_i64(0), t2, t0);
+ tcg_gen_xor_i64(t0, t0, res);
+ tcg_gen_or_i64(qc, qc, t0);
+}
+
static void gen_sqadd_vec(unsigned vece, TCGv_vec t, TCGv_vec qc,
TCGv_vec a, TCGv_vec b)
{
@@ -1295,6 +1353,7 @@ void gen_gvec_sqadd_qc(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
.write_aofs = true,
.vece = MO_32 },
{ .fniv = gen_sqadd_vec,
+ .fni8 = gen_sqadd_d,
.fno = gen_helper_gvec_sqadd_d,
.opt_opc = vecop_list,
.write_aofs = true,
@@ -1306,6 +1365,26 @@ void gen_gvec_sqadd_qc(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
rn_ofs, rm_ofs, opr_sz, max_sz, &ops[vece]);
}
+void gen_uqsub_bhs(TCGv_i64 res, TCGv_i64 qc, TCGv_i64 a, TCGv_i64 b, MemOp esz)
+{
+ TCGv_i64 tmp = tcg_temp_new_i64();
+
+ tcg_gen_sub_i64(tmp, a, b);
+ tcg_gen_smax_i64(res, tmp, tcg_constant_i64(0));
+ tcg_gen_xor_i64(tmp, tmp, res);
+ tcg_gen_or_i64(qc, qc, tmp);
+}
+
+void gen_uqsub_d(TCGv_i64 res, TCGv_i64 qc, TCGv_i64 a, TCGv_i64 b)
+{
+ TCGv_i64 t = tcg_temp_new_i64();
+
+ tcg_gen_sub_i64(t, a, b);
+ tcg_gen_movcond_i64(TCG_COND_LTU, res, a, b, tcg_constant_i64(0), t);
+ tcg_gen_xor_i64(t, t, res);
+ tcg_gen_or_i64(qc, qc, t);
+}
+
static void gen_uqsub_vec(unsigned vece, TCGv_vec t, TCGv_vec qc,
TCGv_vec a, TCGv_vec b)
{
@@ -1339,6 +1418,7 @@ void gen_gvec_uqsub_qc(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
.write_aofs = true,
.vece = MO_32 },
{ .fniv = gen_uqsub_vec,
+ .fni8 = gen_uqsub_d,
.fno = gen_helper_gvec_uqsub_d,
.opt_opc = vecop_list,
.write_aofs = true,
@@ -1350,6 +1430,41 @@ void gen_gvec_uqsub_qc(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
rn_ofs, rm_ofs, opr_sz, max_sz, &ops[vece]);
}
+void gen_sqsub_bhs(TCGv_i64 res, TCGv_i64 qc, TCGv_i64 a, TCGv_i64 b, MemOp esz)
+{
+ int64_t max = MAKE_64BIT_MASK(0, (8 << esz) - 1);
+ int64_t min = -1ll - max;
+ TCGv_i64 tmp = tcg_temp_new_i64();
+
+ tcg_gen_sub_i64(tmp, a, b);
+ tcg_gen_smin_i64(res, tmp, tcg_constant_i64(max));
+ tcg_gen_smax_i64(res, res, tcg_constant_i64(min));
+ tcg_gen_xor_i64(tmp, tmp, res);
+ tcg_gen_or_i64(qc, qc, tmp);
+}
+
+void gen_sqsub_d(TCGv_i64 res, TCGv_i64 qc, TCGv_i64 a, TCGv_i64 b)
+{
+ TCGv_i64 t0 = tcg_temp_new_i64();
+ TCGv_i64 t1 = tcg_temp_new_i64();
+ TCGv_i64 t2 = tcg_temp_new_i64();
+
+ tcg_gen_sub_i64(t0, a, b);
+
+ /* Compute signed overflow indication into T1 */
+ tcg_gen_xor_i64(t1, a, b);
+ tcg_gen_xor_i64(t2, t0, a);
+ tcg_gen_and_i64(t1, t1, t2);
+
+ /* Compute saturated value into T2 */
+ tcg_gen_sari_i64(t2, a, 63);
+ tcg_gen_xori_i64(t2, t2, INT64_MAX);
+
+ tcg_gen_movcond_i64(TCG_COND_LT, res, t1, tcg_constant_i64(0), t2, t0);
+ tcg_gen_xor_i64(t0, t0, res);
+ tcg_gen_or_i64(qc, qc, t0);
+}
+
static void gen_sqsub_vec(unsigned vece, TCGv_vec t, TCGv_vec qc,
TCGv_vec a, TCGv_vec b)
{
@@ -1383,6 +1498,7 @@ void gen_gvec_sqsub_qc(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
.write_aofs = true,
.vece = MO_32 },
{ .fniv = gen_sqsub_vec,
+ .fni8 = gen_sqsub_d,
.fno = gen_helper_gvec_sqsub_d,
.opt_opc = vecop_list,
.write_aofs = true,
diff --git a/target/arm/tcg/neon_helper.c b/target/arm/tcg/neon_helper.c
index 9505a5fd18..0af15e9f6e 100644
--- a/target/arm/tcg/neon_helper.c
+++ b/target/arm/tcg/neon_helper.c
@@ -155,168 +155,6 @@ uint32_t HELPER(glue(neon_,name))(uint32_t arg) \
return arg; \
}
-
-#define NEON_USAT(dest, src1, src2, type) do { \
- uint32_t tmp = (uint32_t)src1 + (uint32_t)src2; \
- if (tmp != (type)tmp) { \
- SET_QC(); \
- dest = ~0; \
- } else { \
- dest = tmp; \
- }} while(0)
-#define NEON_FN(dest, src1, src2) NEON_USAT(dest, src1, src2, uint8_t)
-NEON_VOP_ENV(qadd_u8, neon_u8, 4)
-#undef NEON_FN
-#define NEON_FN(dest, src1, src2) NEON_USAT(dest, src1, src2, uint16_t)
-NEON_VOP_ENV(qadd_u16, neon_u16, 2)
-#undef NEON_FN
-#undef NEON_USAT
-
-uint32_t HELPER(neon_qadd_u32)(CPUARMState *env, uint32_t a, uint32_t b)
-{
- uint32_t res = a + b;
- if (res < a) {
- SET_QC();
- res = ~0;
- }
- return res;
-}
-
-uint64_t HELPER(neon_qadd_u64)(CPUARMState *env, uint64_t src1, uint64_t src2)
-{
- uint64_t res;
-
- res = src1 + src2;
- if (res < src1) {
- SET_QC();
- res = ~(uint64_t)0;
- }
- return res;
-}
-
-#define NEON_SSAT(dest, src1, src2, type) do { \
- int32_t tmp = (uint32_t)src1 + (uint32_t)src2; \
- if (tmp != (type)tmp) { \
- SET_QC(); \
- if (src2 > 0) { \
- tmp = (1 << (sizeof(type) * 8 - 1)) - 1; \
- } else { \
- tmp = 1 << (sizeof(type) * 8 - 1); \
- } \
- } \
- dest = tmp; \
- } while(0)
-#define NEON_FN(dest, src1, src2) NEON_SSAT(dest, src1, src2, int8_t)
-NEON_VOP_ENV(qadd_s8, neon_s8, 4)
-#undef NEON_FN
-#define NEON_FN(dest, src1, src2) NEON_SSAT(dest, src1, src2, int16_t)
-NEON_VOP_ENV(qadd_s16, neon_s16, 2)
-#undef NEON_FN
-#undef NEON_SSAT
-
-uint32_t HELPER(neon_qadd_s32)(CPUARMState *env, uint32_t a, uint32_t b)
-{
- uint32_t res = a + b;
- if (((res ^ a) & SIGNBIT) && !((a ^ b) & SIGNBIT)) {
- SET_QC();
- res = ~(((int32_t)a >> 31) ^ SIGNBIT);
- }
- return res;
-}
-
-uint64_t HELPER(neon_qadd_s64)(CPUARMState *env, uint64_t src1, uint64_t src2)
-{
- uint64_t res;
-
- res = src1 + src2;
- if (((res ^ src1) & SIGNBIT64) && !((src1 ^ src2) & SIGNBIT64)) {
- SET_QC();
- res = ((int64_t)src1 >> 63) ^ ~SIGNBIT64;
- }
- return res;
-}
-
-#define NEON_USAT(dest, src1, src2, type) do { \
- uint32_t tmp = (uint32_t)src1 - (uint32_t)src2; \
- if (tmp != (type)tmp) { \
- SET_QC(); \
- dest = 0; \
- } else { \
- dest = tmp; \
- }} while(0)
-#define NEON_FN(dest, src1, src2) NEON_USAT(dest, src1, src2, uint8_t)
-NEON_VOP_ENV(qsub_u8, neon_u8, 4)
-#undef NEON_FN
-#define NEON_FN(dest, src1, src2) NEON_USAT(dest, src1, src2, uint16_t)
-NEON_VOP_ENV(qsub_u16, neon_u16, 2)
-#undef NEON_FN
-#undef NEON_USAT
-
-uint32_t HELPER(neon_qsub_u32)(CPUARMState *env, uint32_t a, uint32_t b)
-{
- uint32_t res = a - b;
- if (res > a) {
- SET_QC();
- res = 0;
- }
- return res;
-}
-
-uint64_t HELPER(neon_qsub_u64)(CPUARMState *env, uint64_t src1, uint64_t src2)
-{
- uint64_t res;
-
- if (src1 < src2) {
- SET_QC();
- res = 0;
- } else {
- res = src1 - src2;
- }
- return res;
-}
-
-#define NEON_SSAT(dest, src1, src2, type) do { \
- int32_t tmp = (uint32_t)src1 - (uint32_t)src2; \
- if (tmp != (type)tmp) { \
- SET_QC(); \
- if (src2 < 0) { \
- tmp = (1 << (sizeof(type) * 8 - 1)) - 1; \
- } else { \
- tmp = 1 << (sizeof(type) * 8 - 1); \
- } \
- } \
- dest = tmp; \
- } while(0)
-#define NEON_FN(dest, src1, src2) NEON_SSAT(dest, src1, src2, int8_t)
-NEON_VOP_ENV(qsub_s8, neon_s8, 4)
-#undef NEON_FN
-#define NEON_FN(dest, src1, src2) NEON_SSAT(dest, src1, src2, int16_t)
-NEON_VOP_ENV(qsub_s16, neon_s16, 2)
-#undef NEON_FN
-#undef NEON_SSAT
-
-uint32_t HELPER(neon_qsub_s32)(CPUARMState *env, uint32_t a, uint32_t b)
-{
- uint32_t res = a - b;
- if (((res ^ a) & SIGNBIT) && ((a ^ b) & SIGNBIT)) {
- SET_QC();
- res = ~(((int32_t)a >> 31) ^ SIGNBIT);
- }
- return res;
-}
-
-uint64_t HELPER(neon_qsub_s64)(CPUARMState *env, uint64_t src1, uint64_t src2)
-{
- uint64_t res;
-
- res = src1 - src2;
- if (((res ^ src1) & SIGNBIT64) && ((src1 ^ src2) & SIGNBIT64)) {
- SET_QC();
- res = ((int64_t)src1 >> 63) ^ ~SIGNBIT64;
- }
- return res;
-}
-
#define NEON_FN(dest, src1, src2) dest = (src1 + src2) >> 1
NEON_VOP(hadd_s8, neon_s8, 4)
NEON_VOP(hadd_u8, neon_u8, 4)
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 781b224972..ca7ba6b1e8 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -9291,21 +9291,28 @@ static void handle_3same_64(DisasContext *s, int opcode, bool u,
* or scalar-three-reg-same groups.
*/
TCGCond cond;
+ TCGv_i64 qc;
switch (opcode) {
case 0x1: /* SQADD */
+ qc = tcg_temp_new_i64();
+ tcg_gen_ld_i64(qc, tcg_env, offsetof(CPUARMState, vfp.qc));
if (u) {
- gen_helper_neon_qadd_u64(tcg_rd, tcg_env, tcg_rn, tcg_rm);
+ gen_uqadd_d(tcg_rd, qc, tcg_rn, tcg_rm);
} else {
- gen_helper_neon_qadd_s64(tcg_rd, tcg_env, tcg_rn, tcg_rm);
+ gen_sqadd_d(tcg_rd, qc, tcg_rn, tcg_rm);
}
+ tcg_gen_st_i64(qc, tcg_env, offsetof(CPUARMState, vfp.qc));
break;
case 0x5: /* SQSUB */
+ qc = tcg_temp_new_i64();
+ tcg_gen_ld_i64(qc, tcg_env, offsetof(CPUARMState, vfp.qc));
if (u) {
- gen_helper_neon_qsub_u64(tcg_rd, tcg_env, tcg_rn, tcg_rm);
+ gen_uqsub_d(tcg_rd, qc, tcg_rn, tcg_rm);
} else {
- gen_helper_neon_qsub_s64(tcg_rd, tcg_env, tcg_rn, tcg_rm);
+ gen_sqsub_d(tcg_rd, qc, tcg_rn, tcg_rm);
}
+ tcg_gen_st_i64(qc, tcg_env, offsetof(CPUARMState, vfp.qc));
break;
case 0x6: /* CMGT, CMHI */
cond = u ? TCG_COND_GTU : TCG_COND_GT;
@@ -9425,35 +9432,16 @@ static void disas_simd_scalar_three_reg_same(DisasContext *s, uint32_t insn)
* OPTME: special-purpose helpers would avoid doing some
* unnecessary work in the helper for the 8 and 16 bit cases.
*/
- NeonGenTwoOpEnvFn *genenvfn;
- TCGv_i32 tcg_rn = tcg_temp_new_i32();
- TCGv_i32 tcg_rm = tcg_temp_new_i32();
- TCGv_i32 tcg_rd32 = tcg_temp_new_i32();
-
- read_vec_element_i32(s, tcg_rn, rn, 0, size);
- read_vec_element_i32(s, tcg_rm, rm, 0, size);
+ NeonGenTwoOpEnvFn *genenvfn = NULL;
+ void (*genfn)(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_i64, MemOp) = NULL;
switch (opcode) {
case 0x1: /* SQADD, UQADD */
- {
- static NeonGenTwoOpEnvFn * const fns[3][2] = {
- { gen_helper_neon_qadd_s8, gen_helper_neon_qadd_u8 },
- { gen_helper_neon_qadd_s16, gen_helper_neon_qadd_u16 },
- { gen_helper_neon_qadd_s32, gen_helper_neon_qadd_u32 },
- };
- genenvfn = fns[size][u];
+ genfn = u ? gen_uqadd_bhs : gen_sqadd_bhs;
break;
- }
case 0x5: /* SQSUB, UQSUB */
- {
- static NeonGenTwoOpEnvFn * const fns[3][2] = {
- { gen_helper_neon_qsub_s8, gen_helper_neon_qsub_u8 },
- { gen_helper_neon_qsub_s16, gen_helper_neon_qsub_u16 },
- { gen_helper_neon_qsub_s32, gen_helper_neon_qsub_u32 },
- };
- genenvfn = fns[size][u];
+ genfn = u ? gen_uqsub_bhs : gen_sqsub_bhs;
break;
- }
case 0x9: /* SQSHL, UQSHL */
{
static NeonGenTwoOpEnvFn * const fns[3][2] = {
@@ -9488,8 +9476,29 @@ static void disas_simd_scalar_three_reg_same(DisasContext *s, uint32_t insn)
g_assert_not_reached();
}
- genenvfn(tcg_rd32, tcg_env, tcg_rn, tcg_rm);
- tcg_gen_extu_i32_i64(tcg_rd, tcg_rd32);
+ if (genenvfn) {
+ TCGv_i32 tcg_rn = tcg_temp_new_i32();
+ TCGv_i32 tcg_rm = tcg_temp_new_i32();
+
+ read_vec_element_i32(s, tcg_rn, rn, 0, size);
+ read_vec_element_i32(s, tcg_rm, rm, 0, size);
+ genenvfn(tcg_rn, tcg_env, tcg_rn, tcg_rm);
+ tcg_gen_extu_i32_i64(tcg_rd, tcg_rn);
+ } else {
+ TCGv_i64 tcg_rn = tcg_temp_new_i64();
+ TCGv_i64 tcg_rm = tcg_temp_new_i64();
+ TCGv_i64 qc = tcg_temp_new_i64();
+
+ read_vec_element(s, tcg_rn, rn, 0, size | (u ? 0 : MO_SIGN));
+ read_vec_element(s, tcg_rm, rm, 0, size | (u ? 0 : MO_SIGN));
+ tcg_gen_ld_i64(qc, tcg_env, offsetof(CPUARMState, vfp.qc));
+ genfn(tcg_rd, qc, tcg_rn, tcg_rm, size);
+ tcg_gen_st_i64(qc, tcg_env, offsetof(CPUARMState, vfp.qc));
+ if (!u) {
+ /* Truncate signed 64-bit result for writeback. */
+ tcg_gen_ext_i64(tcg_rd, tcg_rd, size);
+ }
+ }
}
write_fp_dreg(s, rd, tcg_rd);
--
2.34.1
^ permalink raw reply related [flat|nested] 41+ messages in thread
* [PATCH v3 07/33] target/arm: Convert SQADD, SQSUB, UQADD, UQSUB to decodetree
2024-05-28 20:30 [PATCH v3 00/33] target/arm: Convert a64 advsimd to decodetree (part 1b) Richard Henderson
` (5 preceding siblings ...)
2024-05-28 20:30 ` [PATCH v3 06/33] target/arm: Inline scalar SQADD, UQADD, SQSUB, UQSUB Richard Henderson
@ 2024-05-28 20:30 ` Richard Henderson
2024-05-28 20:30 ` [PATCH v3 08/33] target/arm: Convert SUQADD, USQADD " Richard Henderson
` (26 subsequent siblings)
33 siblings, 0 replies; 41+ messages in thread
From: Richard Henderson @ 2024-05-28 20:30 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm, Peter Maydell
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/tcg/a64.decode | 11 ++++
target/arm/tcg/translate-a64.c | 96 +++++++++++++++++++---------------
2 files changed, 64 insertions(+), 43 deletions(-)
diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index f48adef5bb..19010af03b 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -44,6 +44,7 @@
@rrr_h ........ ... rm:5 ...... rn:5 rd:5 &rrr_e esz=1
@rrr_sd ........ ... rm:5 ...... rn:5 rd:5 &rrr_e esz=%esz_sd
@rrr_hsd ........ ... rm:5 ...... rn:5 rd:5 &rrr_e esz=%esz_hsd
+@rrr_e ........ esz:2 . rm:5 ...... rn:5 rd:5 &rrr_e
@rrx_h ........ .. .. rm:4 .... . . rn:5 rd:5 &rrx_e esz=1 idx=%hlm
@rrx_s ........ .. . rm:5 .... . . rn:5 rd:5 &rrx_e esz=2 idx=%hl
@@ -744,6 +745,11 @@ FRECPS_s 0101 1110 0.1 ..... 11111 1 ..... ..... @rrr_sd
FRSQRTS_s 0101 1110 110 ..... 00111 1 ..... ..... @rrr_h
FRSQRTS_s 0101 1110 1.1 ..... 11111 1 ..... ..... @rrr_sd
+SQADD_s 0101 1110 ..1 ..... 00001 1 ..... ..... @rrr_e
+UQADD_s 0111 1110 ..1 ..... 00001 1 ..... ..... @rrr_e
+SQSUB_s 0101 1110 ..1 ..... 00101 1 ..... ..... @rrr_e
+UQSUB_s 0111 1110 ..1 ..... 00101 1 ..... ..... @rrr_e
+
### Advanced SIMD scalar pairwise
FADDP_s 0101 1110 0011 0000 1101 10 ..... ..... @rr_h
@@ -857,6 +863,11 @@ BSL_v 0.10 1110 011 ..... 00011 1 ..... ..... @qrrr_b
BIT_v 0.10 1110 101 ..... 00011 1 ..... ..... @qrrr_b
BIF_v 0.10 1110 111 ..... 00011 1 ..... ..... @qrrr_b
+SQADD_v 0.00 1110 ..1 ..... 00001 1 ..... ..... @qrrr_e
+UQADD_v 0.10 1110 ..1 ..... 00001 1 ..... ..... @qrrr_e
+SQSUB_v 0.00 1110 ..1 ..... 00101 1 ..... ..... @qrrr_e
+UQSUB_v 0.10 1110 ..1 ..... 00101 1 ..... ..... @qrrr_e
+
### Advanced SIMD scalar x indexed element
FMUL_si 0101 1111 00 .. .... 1001 . 0 ..... ..... @rrx_h
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index ca7ba6b1e8..3956c41543 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -5060,6 +5060,43 @@ static const FPScalar f_scalar_frsqrts = {
};
TRANS(FRSQRTS_s, do_fp3_scalar, a, &f_scalar_frsqrts)
+static bool do_satacc_s(DisasContext *s, arg_rrr_e *a,
+ MemOp sgn_n, MemOp sgn_m,
+ void (*gen_bhs)(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_i64, MemOp),
+ void (*gen_d)(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_i64))
+{
+ TCGv_i64 t0, t1, t2, qc;
+ MemOp esz = a->esz;
+
+ if (!fp_access_check(s)) {
+ return true;
+ }
+
+ t0 = tcg_temp_new_i64();
+ t1 = tcg_temp_new_i64();
+ t2 = tcg_temp_new_i64();
+ qc = tcg_temp_new_i64();
+ read_vec_element(s, t1, a->rn, 0, esz | sgn_n);
+ read_vec_element(s, t2, a->rm, 0, esz | sgn_m);
+ tcg_gen_ld_i64(qc, tcg_env, offsetof(CPUARMState, vfp.qc));
+
+ if (esz == MO_64) {
+ gen_d(t0, qc, t1, t2);
+ } else {
+ gen_bhs(t0, qc, t1, t2, esz);
+ tcg_gen_ext_i64(t0, t0, esz);
+ }
+
+ write_fp_dreg(s, a->rd, t0);
+ tcg_gen_st_i64(qc, tcg_env, offsetof(CPUARMState, vfp.qc));
+ return true;
+}
+
+TRANS(SQADD_s, do_satacc_s, a, MO_SIGN, MO_SIGN, gen_sqadd_bhs, gen_sqadd_d)
+TRANS(SQSUB_s, do_satacc_s, a, MO_SIGN, MO_SIGN, gen_sqsub_bhs, gen_sqsub_d)
+TRANS(UQADD_s, do_satacc_s, a, 0, 0, gen_uqadd_bhs, gen_uqadd_d)
+TRANS(UQSUB_s, do_satacc_s, a, 0, 0, gen_uqsub_bhs, gen_uqsub_d)
+
static bool do_fp3_vector(DisasContext *s, arg_qrrr_e *a,
gen_helper_gvec_3_ptr * const fns[3])
{
@@ -5298,6 +5335,11 @@ TRANS(BSL_v, do_bitsel, a->q, a->rd, a->rd, a->rn, a->rm)
TRANS(BIT_v, do_bitsel, a->q, a->rd, a->rm, a->rn, a->rd)
TRANS(BIF_v, do_bitsel, a->q, a->rd, a->rm, a->rd, a->rn)
+TRANS(SQADD_v, do_gvec_fn3, a, gen_gvec_sqadd_qc)
+TRANS(UQADD_v, do_gvec_fn3, a, gen_gvec_uqadd_qc)
+TRANS(SQSUB_v, do_gvec_fn3, a, gen_gvec_sqsub_qc)
+TRANS(UQSUB_v, do_gvec_fn3, a, gen_gvec_uqsub_qc)
+
/*
* Advanced SIMD scalar/vector x indexed element
*/
@@ -9291,29 +9333,8 @@ static void handle_3same_64(DisasContext *s, int opcode, bool u,
* or scalar-three-reg-same groups.
*/
TCGCond cond;
- TCGv_i64 qc;
switch (opcode) {
- case 0x1: /* SQADD */
- qc = tcg_temp_new_i64();
- tcg_gen_ld_i64(qc, tcg_env, offsetof(CPUARMState, vfp.qc));
- if (u) {
- gen_uqadd_d(tcg_rd, qc, tcg_rn, tcg_rm);
- } else {
- gen_sqadd_d(tcg_rd, qc, tcg_rn, tcg_rm);
- }
- tcg_gen_st_i64(qc, tcg_env, offsetof(CPUARMState, vfp.qc));
- break;
- case 0x5: /* SQSUB */
- qc = tcg_temp_new_i64();
- tcg_gen_ld_i64(qc, tcg_env, offsetof(CPUARMState, vfp.qc));
- if (u) {
- gen_uqsub_d(tcg_rd, qc, tcg_rn, tcg_rm);
- } else {
- gen_sqsub_d(tcg_rd, qc, tcg_rn, tcg_rm);
- }
- tcg_gen_st_i64(qc, tcg_env, offsetof(CPUARMState, vfp.qc));
- break;
case 0x6: /* CMGT, CMHI */
cond = u ? TCG_COND_GTU : TCG_COND_GT;
do_cmop:
@@ -9366,6 +9387,8 @@ static void handle_3same_64(DisasContext *s, int opcode, bool u,
}
break;
default:
+ case 0x1: /* SQADD / UQADD */
+ case 0x5: /* SQSUB / UQSUB */
g_assert_not_reached();
}
}
@@ -9387,8 +9410,6 @@ static void disas_simd_scalar_three_reg_same(DisasContext *s, uint32_t insn)
TCGv_i64 tcg_rd;
switch (opcode) {
- case 0x1: /* SQADD, UQADD */
- case 0x5: /* SQSUB, UQSUB */
case 0x9: /* SQSHL, UQSHL */
case 0xb: /* SQRSHL, UQRSHL */
break;
@@ -9410,6 +9431,8 @@ static void disas_simd_scalar_three_reg_same(DisasContext *s, uint32_t insn)
}
break;
default:
+ case 0x1: /* SQADD, UQADD */
+ case 0x5: /* SQSUB, UQSUB */
unallocated_encoding(s);
return;
}
@@ -9436,12 +9459,6 @@ static void disas_simd_scalar_three_reg_same(DisasContext *s, uint32_t insn)
void (*genfn)(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_i64, MemOp) = NULL;
switch (opcode) {
- case 0x1: /* SQADD, UQADD */
- genfn = u ? gen_uqadd_bhs : gen_sqadd_bhs;
- break;
- case 0x5: /* SQSUB, UQSUB */
- genfn = u ? gen_uqsub_bhs : gen_sqsub_bhs;
- break;
case 0x9: /* SQSHL, UQSHL */
{
static NeonGenTwoOpEnvFn * const fns[3][2] = {
@@ -9473,6 +9490,8 @@ static void disas_simd_scalar_three_reg_same(DisasContext *s, uint32_t insn)
break;
}
default:
+ case 0x1: /* SQADD, UQADD */
+ case 0x5: /* SQSUB, UQSUB */
g_assert_not_reached();
}
@@ -10933,6 +10952,11 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
return;
}
break;
+
+ case 0x01: /* SQADD, UQADD */
+ case 0x05: /* SQSUB, UQSUB */
+ unallocated_encoding(s);
+ return;
}
if (!fp_access_check(s)) {
@@ -10940,20 +10964,6 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
}
switch (opcode) {
- case 0x01: /* SQADD, UQADD */
- if (u) {
- gen_gvec_fn3(s, is_q, rd, rn, rm, gen_gvec_uqadd_qc, size);
- } else {
- gen_gvec_fn3(s, is_q, rd, rn, rm, gen_gvec_sqadd_qc, size);
- }
- return;
- case 0x05: /* SQSUB, UQSUB */
- if (u) {
- gen_gvec_fn3(s, is_q, rd, rn, rm, gen_gvec_uqsub_qc, size);
- } else {
- gen_gvec_fn3(s, is_q, rd, rn, rm, gen_gvec_sqsub_qc, size);
- }
- return;
case 0x08: /* SSHL, USHL */
if (u) {
gen_gvec_fn3(s, is_q, rd, rn, rm, gen_gvec_ushl, size);
--
2.34.1
^ permalink raw reply related [flat|nested] 41+ messages in thread
* [PATCH v3 08/33] target/arm: Convert SUQADD, USQADD to decodetree
2024-05-28 20:30 [PATCH v3 00/33] target/arm: Convert a64 advsimd to decodetree (part 1b) Richard Henderson
` (6 preceding siblings ...)
2024-05-28 20:30 ` [PATCH v3 07/33] target/arm: Convert SQADD, SQSUB, UQADD, UQSUB to decodetree Richard Henderson
@ 2024-05-28 20:30 ` Richard Henderson
2024-05-28 20:30 ` [PATCH v3 09/33] target/arm: Convert SSHL, USHL " Richard Henderson
` (25 subsequent siblings)
33 siblings, 0 replies; 41+ messages in thread
From: Richard Henderson @ 2024-05-28 20:30 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm, Peter Maydell
These are faux 2-operand instructions, reading from rd.
Sort them next to the other three-operand same insns for clarity.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/tcg/a64.decode | 8 +++++
target/arm/tcg/translate-a64.c | 64 ++++------------------------------
2 files changed, 14 insertions(+), 58 deletions(-)
diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index 19010af03b..7c350ba833 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -45,6 +45,7 @@
@rrr_sd ........ ... rm:5 ...... rn:5 rd:5 &rrr_e esz=%esz_sd
@rrr_hsd ........ ... rm:5 ...... rn:5 rd:5 &rrr_e esz=%esz_hsd
@rrr_e ........ esz:2 . rm:5 ...... rn:5 rd:5 &rrr_e
+@r2r_e ........ esz:2 . ..... ...... rm:5 rd:5 &rrr_e rn=%rd
@rrx_h ........ .. .. rm:4 .... . . rn:5 rd:5 &rrx_e esz=1 idx=%hlm
@rrx_s ........ .. . rm:5 .... . . rn:5 rd:5 &rrx_e esz=2 idx=%hl
@@ -60,6 +61,7 @@
@qrrr_h . q:1 ...... ... rm:5 ...... rn:5 rd:5 &qrrr_e esz=1
@qrrr_sd . q:1 ...... ... rm:5 ...... rn:5 rd:5 &qrrr_e esz=%esz_sd
@qrrr_e . q:1 ...... esz:2 . rm:5 ...... rn:5 rd:5 &qrrr_e
+@qr2r_e . q:1 ...... esz:2 . ..... ...... rm:5 rd:5 &qrrr_e rn=%rd
@qrrx_h . q:1 .. .... .. .. rm:4 .... . . rn:5 rd:5 \
&qrrx_e esz=1 idx=%hlm
@@ -750,6 +752,9 @@ UQADD_s 0111 1110 ..1 ..... 00001 1 ..... ..... @rrr_e
SQSUB_s 0101 1110 ..1 ..... 00101 1 ..... ..... @rrr_e
UQSUB_s 0111 1110 ..1 ..... 00101 1 ..... ..... @rrr_e
+SUQADD_s 0101 1110 ..1 00000 00111 0 ..... ..... @r2r_e
+USQADD_s 0111 1110 ..1 00000 00111 0 ..... ..... @r2r_e
+
### Advanced SIMD scalar pairwise
FADDP_s 0101 1110 0011 0000 1101 10 ..... ..... @rr_h
@@ -868,6 +873,9 @@ UQADD_v 0.10 1110 ..1 ..... 00001 1 ..... ..... @qrrr_e
SQSUB_v 0.00 1110 ..1 ..... 00101 1 ..... ..... @qrrr_e
UQSUB_v 0.10 1110 ..1 ..... 00101 1 ..... ..... @qrrr_e
+SUQADD_v 0.00 1110 ..1 00000 00111 0 ..... ..... @qr2r_e
+USQADD_v 0.10 1110 ..1 00000 00111 0 ..... ..... @qr2r_e
+
### Advanced SIMD scalar x indexed element
FMUL_si 0101 1111 00 .. .... 1001 . 0 ..... ..... @rrx_h
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 3956c41543..c0637bda0f 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -5096,6 +5096,8 @@ TRANS(SQADD_s, do_satacc_s, a, MO_SIGN, MO_SIGN, gen_sqadd_bhs, gen_sqadd_d)
TRANS(SQSUB_s, do_satacc_s, a, MO_SIGN, MO_SIGN, gen_sqsub_bhs, gen_sqsub_d)
TRANS(UQADD_s, do_satacc_s, a, 0, 0, gen_uqadd_bhs, gen_uqadd_d)
TRANS(UQSUB_s, do_satacc_s, a, 0, 0, gen_uqsub_bhs, gen_uqsub_d)
+TRANS(SUQADD_s, do_satacc_s, a, MO_SIGN, 0, gen_suqadd_bhs, gen_suqadd_d)
+TRANS(USQADD_s, do_satacc_s, a, 0, MO_SIGN, gen_usqadd_bhs, gen_usqadd_d)
static bool do_fp3_vector(DisasContext *s, arg_qrrr_e *a,
gen_helper_gvec_3_ptr * const fns[3])
@@ -5339,6 +5341,8 @@ TRANS(SQADD_v, do_gvec_fn3, a, gen_gvec_sqadd_qc)
TRANS(UQADD_v, do_gvec_fn3, a, gen_gvec_uqadd_qc)
TRANS(SQSUB_v, do_gvec_fn3, a, gen_gvec_sqsub_qc)
TRANS(UQSUB_v, do_gvec_fn3, a, gen_gvec_uqsub_qc)
+TRANS(SUQADD_v, do_gvec_fn3, a, gen_gvec_suqadd_qc)
+TRANS(USQADD_v, do_gvec_fn3, a, gen_gvec_usqadd_qc)
/*
* Advanced SIMD scalar/vector x indexed element
@@ -10009,48 +10013,6 @@ static void handle_2misc_narrow(DisasContext *s, bool scalar,
clear_vec_high(s, is_q, rd);
}
-/* Remaining saturating accumulating ops */
-static void handle_2misc_satacc(DisasContext *s, bool is_scalar, bool is_u,
- bool is_q, unsigned size, int rn, int rd)
-{
- TCGv_i64 res, qc, a, b;
-
- if (!is_scalar) {
- gen_gvec_fn3(s, is_q, rd, rd, rn,
- is_u ? gen_gvec_usqadd_qc : gen_gvec_suqadd_qc, size);
- return;
- }
-
- res = tcg_temp_new_i64();
- qc = tcg_temp_new_i64();
- a = tcg_temp_new_i64();
- b = tcg_temp_new_i64();
-
- /* Read and extend scalar inputs to 64-bits. */
- read_vec_element(s, a, rd, 0, size | (is_u ? 0 : MO_SIGN));
- read_vec_element(s, b, rn, 0, size | (is_u ? MO_SIGN : 0));
- tcg_gen_ld_i64(qc, tcg_env, offsetof(CPUARMState, vfp.qc));
-
- if (size == MO_64) {
- if (is_u) {
- gen_usqadd_d(res, qc, a, b);
- } else {
- gen_suqadd_d(res, qc, a, b);
- }
- } else {
- if (is_u) {
- gen_usqadd_bhs(res, qc, a, b, size);
- } else {
- gen_suqadd_bhs(res, qc, a, b, size);
- /* Truncate signed 64-bit result for writeback. */
- tcg_gen_ext_i64(res, res, size);
- }
- }
-
- write_fp_dreg(s, rd, res);
- tcg_gen_st_i64(qc, tcg_env, offsetof(CPUARMState, vfp.qc));
-}
-
/* AdvSIMD scalar two reg misc
* 31 30 29 28 24 23 22 21 17 16 12 11 10 9 5 4 0
* +-----+---+-----------+------+-----------+--------+-----+------+------+
@@ -10070,12 +10032,6 @@ static void disas_simd_scalar_two_reg_misc(DisasContext *s, uint32_t insn)
TCGv_ptr tcg_fpstatus;
switch (opcode) {
- case 0x3: /* USQADD / SUQADD*/
- if (!fp_access_check(s)) {
- return;
- }
- handle_2misc_satacc(s, true, u, false, size, rn, rd);
- return;
case 0x7: /* SQABS / SQNEG */
break;
case 0xa: /* CMLT */
@@ -10175,6 +10131,7 @@ static void disas_simd_scalar_two_reg_misc(DisasContext *s, uint32_t insn)
}
break;
default:
+ case 0x3: /* USQADD / SUQADD */
unallocated_encoding(s);
return;
}
@@ -11662,16 +11619,6 @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn)
return;
}
break;
- case 0x3: /* SUQADD, USQADD */
- if (size == 3 && !is_q) {
- unallocated_encoding(s);
- return;
- }
- if (!fp_access_check(s)) {
- return;
- }
- handle_2misc_satacc(s, false, u, is_q, size, rn, rd);
- return;
case 0x7: /* SQABS, SQNEG */
if (size == 3 && !is_q) {
unallocated_encoding(s);
@@ -11846,6 +11793,7 @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn)
break;
}
default:
+ case 0x3: /* SUQADD, USQADD */
unallocated_encoding(s);
return;
}
--
2.34.1
^ permalink raw reply related [flat|nested] 41+ messages in thread
* [PATCH v3 09/33] target/arm: Convert SSHL, USHL to decodetree
2024-05-28 20:30 [PATCH v3 00/33] target/arm: Convert a64 advsimd to decodetree (part 1b) Richard Henderson
` (7 preceding siblings ...)
2024-05-28 20:30 ` [PATCH v3 08/33] target/arm: Convert SUQADD, USQADD " Richard Henderson
@ 2024-05-28 20:30 ` Richard Henderson
2024-05-28 20:30 ` [PATCH v3 10/33] target/arm: Convert SRSHL and URSHL (register) to gvec Richard Henderson
` (24 subsequent siblings)
33 siblings, 0 replies; 41+ messages in thread
From: Richard Henderson @ 2024-05-28 20:30 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm, Peter Maydell
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/tcg/a64.decode | 7 ++++++
target/arm/tcg/translate-a64.c | 40 +++++++++++++++++++++-------------
2 files changed, 32 insertions(+), 15 deletions(-)
diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index 7c350ba833..ea897d6732 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -42,6 +42,7 @@
@rr_sd ........ ... ..... ...... rn:5 rd:5 &rr_e esz=%esz_sd
@rrr_h ........ ... rm:5 ...... rn:5 rd:5 &rrr_e esz=1
+@rrr_d ........ ... rm:5 ...... rn:5 rd:5 &rrr_e esz=3
@rrr_sd ........ ... rm:5 ...... rn:5 rd:5 &rrr_e esz=%esz_sd
@rrr_hsd ........ ... rm:5 ...... rn:5 rd:5 &rrr_e esz=%esz_hsd
@rrr_e ........ esz:2 . rm:5 ...... rn:5 rd:5 &rrr_e
@@ -755,6 +756,9 @@ UQSUB_s 0111 1110 ..1 ..... 00101 1 ..... ..... @rrr_e
SUQADD_s 0101 1110 ..1 00000 00111 0 ..... ..... @r2r_e
USQADD_s 0111 1110 ..1 00000 00111 0 ..... ..... @r2r_e
+SSHL_s 0101 1110 111 ..... 01000 1 ..... ..... @rrr_d
+USHL_s 0111 1110 111 ..... 01000 1 ..... ..... @rrr_d
+
### Advanced SIMD scalar pairwise
FADDP_s 0101 1110 0011 0000 1101 10 ..... ..... @rr_h
@@ -876,6 +880,9 @@ UQSUB_v 0.10 1110 ..1 ..... 00101 1 ..... ..... @qrrr_e
SUQADD_v 0.00 1110 ..1 00000 00111 0 ..... ..... @qr2r_e
USQADD_v 0.10 1110 ..1 00000 00111 0 ..... ..... @qr2r_e
+SSHL_v 0.00 1110 ..1 ..... 01000 1 ..... ..... @qrrr_e
+USHL_v 0.10 1110 ..1 ..... 01000 1 ..... ..... @qrrr_e
+
### Advanced SIMD scalar x indexed element
FMUL_si 0101 1111 00 .. .... 1001 . 0 ..... ..... @rrx_h
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index c0637bda0f..7c7a22985b 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -5099,6 +5099,24 @@ TRANS(UQSUB_s, do_satacc_s, a, 0, 0, gen_uqsub_bhs, gen_uqsub_d)
TRANS(SUQADD_s, do_satacc_s, a, MO_SIGN, 0, gen_suqadd_bhs, gen_suqadd_d)
TRANS(USQADD_s, do_satacc_s, a, 0, MO_SIGN, gen_usqadd_bhs, gen_usqadd_d)
+static bool do_int3_scalar_d(DisasContext *s, arg_rrr_e *a,
+ void (*fn)(TCGv_i64, TCGv_i64, TCGv_i64))
+{
+ if (fp_access_check(s)) {
+ TCGv_i64 t0 = tcg_temp_new_i64();
+ TCGv_i64 t1 = tcg_temp_new_i64();
+
+ read_vec_element(s, t0, a->rn, 0, MO_64);
+ read_vec_element(s, t1, a->rm, 0, MO_64);
+ fn(t0, t0, t1);
+ write_fp_dreg(s, a->rd, t0);
+ }
+ return true;
+}
+
+TRANS(SSHL_s, do_int3_scalar_d, a, gen_sshl_i64)
+TRANS(USHL_s, do_int3_scalar_d, a, gen_ushl_i64)
+
static bool do_fp3_vector(DisasContext *s, arg_qrrr_e *a,
gen_helper_gvec_3_ptr * const fns[3])
{
@@ -5344,6 +5362,10 @@ TRANS(UQSUB_v, do_gvec_fn3, a, gen_gvec_uqsub_qc)
TRANS(SUQADD_v, do_gvec_fn3, a, gen_gvec_suqadd_qc)
TRANS(USQADD_v, do_gvec_fn3, a, gen_gvec_usqadd_qc)
+TRANS(SSHL_v, do_gvec_fn3, a, gen_gvec_sshl)
+TRANS(USHL_v, do_gvec_fn3, a, gen_gvec_ushl)
+
+
/*
* Advanced SIMD scalar/vector x indexed element
*/
@@ -9355,13 +9377,6 @@ static void handle_3same_64(DisasContext *s, int opcode, bool u,
}
gen_cmtst_i64(tcg_rd, tcg_rn, tcg_rm);
break;
- case 0x8: /* SSHL, USHL */
- if (u) {
- gen_ushl_i64(tcg_rd, tcg_rn, tcg_rm);
- } else {
- gen_sshl_i64(tcg_rd, tcg_rn, tcg_rm);
- }
- break;
case 0x9: /* SQSHL, UQSHL */
if (u) {
gen_helper_neon_qshl_u64(tcg_rd, tcg_env, tcg_rn, tcg_rm);
@@ -9393,6 +9408,7 @@ static void handle_3same_64(DisasContext *s, int opcode, bool u,
default:
case 0x1: /* SQADD / UQADD */
case 0x5: /* SQSUB / UQSUB */
+ case 0x8: /* SSHL, USHL */
g_assert_not_reached();
}
}
@@ -9417,7 +9433,6 @@ static void disas_simd_scalar_three_reg_same(DisasContext *s, uint32_t insn)
case 0x9: /* SQSHL, UQSHL */
case 0xb: /* SQRSHL, UQRSHL */
break;
- case 0x8: /* SSHL, USHL */
case 0xa: /* SRSHL, URSHL */
case 0x6: /* CMGT, CMHI */
case 0x7: /* CMGE, CMHS */
@@ -9437,6 +9452,7 @@ static void disas_simd_scalar_three_reg_same(DisasContext *s, uint32_t insn)
default:
case 0x1: /* SQADD, UQADD */
case 0x5: /* SQSUB, UQSUB */
+ case 0x8: /* SSHL, USHL */
unallocated_encoding(s);
return;
}
@@ -10912,6 +10928,7 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
case 0x01: /* SQADD, UQADD */
case 0x05: /* SQSUB, UQSUB */
+ case 0x08: /* SSHL, USHL */
unallocated_encoding(s);
return;
}
@@ -10921,13 +10938,6 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
}
switch (opcode) {
- case 0x08: /* SSHL, USHL */
- if (u) {
- gen_gvec_fn3(s, is_q, rd, rn, rm, gen_gvec_ushl, size);
- } else {
- gen_gvec_fn3(s, is_q, rd, rn, rm, gen_gvec_sshl, size);
- }
- return;
case 0x0c: /* SMAX, UMAX */
if (u) {
gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_umax, size);
--
2.34.1
^ permalink raw reply related [flat|nested] 41+ messages in thread
* [PATCH v3 10/33] target/arm: Convert SRSHL and URSHL (register) to gvec
2024-05-28 20:30 [PATCH v3 00/33] target/arm: Convert a64 advsimd to decodetree (part 1b) Richard Henderson
` (8 preceding siblings ...)
2024-05-28 20:30 ` [PATCH v3 09/33] target/arm: Convert SSHL, USHL " Richard Henderson
@ 2024-05-28 20:30 ` Richard Henderson
2024-05-30 14:12 ` Peter Maydell
2024-05-28 20:30 ` [PATCH v3 11/33] target/arm: Convert SRSHL, URSHL to decodetree Richard Henderson
` (23 subsequent siblings)
33 siblings, 1 reply; 41+ messages in thread
From: Richard Henderson @ 2024-05-28 20:30 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/helper.h | 10 +++++++++
target/arm/tcg/translate.h | 4 ++++
target/arm/tcg/neon-dp.decode | 10 ++-------
target/arm/tcg/gengvec.c | 22 +++++++++++++++++++
target/arm/tcg/neon_helper.c | 38 ++++++++++++++++++++++++++++++++-
target/arm/tcg/translate-a64.c | 17 ++++++---------
target/arm/tcg/translate-neon.c | 6 ++----
7 files changed, 84 insertions(+), 23 deletions(-)
diff --git a/target/arm/helper.h b/target/arm/helper.h
index a14c040451..25eb7bf5df 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -327,6 +327,16 @@ DEF_HELPER_3(neon_qrshl_s32, i32, env, i32, i32)
DEF_HELPER_3(neon_qrshl_u64, i64, env, i64, i64)
DEF_HELPER_3(neon_qrshl_s64, i64, env, i64, i64)
+DEF_HELPER_FLAGS_4(gvec_srshl_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_srshl_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_srshl_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_srshl_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(gvec_urshl_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_urshl_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_urshl_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_urshl_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
DEF_HELPER_2(neon_add_u8, i32, i32, i32)
DEF_HELPER_2(neon_add_u16, i32, i32, i32)
DEF_HELPER_2(neon_sub_u8, i32, i32, i32)
diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h
index 87439dcc61..ea63ffc47b 100644
--- a/target/arm/tcg/translate.h
+++ b/target/arm/tcg/translate.h
@@ -459,6 +459,10 @@ void gen_gvec_sshl(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz);
void gen_gvec_ushl(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz);
+void gen_gvec_srshl(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+ uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz);
+void gen_gvec_urshl(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+ uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz);
void gen_cmtst_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b);
void gen_ushl_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b);
diff --git a/target/arm/tcg/neon-dp.decode b/target/arm/tcg/neon-dp.decode
index fd3a01bfa0..8525c65c0d 100644
--- a/target/arm/tcg/neon-dp.decode
+++ b/target/arm/tcg/neon-dp.decode
@@ -117,14 +117,8 @@ VSHL_U_3s 1111 001 1 0 . .. .... .... 0100 . . . 0 .... @3same_rev
VQSHL_U64_3s 1111 001 1 0 . .. .... .... 0100 . . . 1 .... @3same_64_rev
VQSHL_U_3s 1111 001 1 0 . .. .... .... 0100 . . . 1 .... @3same_rev
}
-{
- VRSHL_S64_3s 1111 001 0 0 . .. .... .... 0101 . . . 0 .... @3same_64_rev
- VRSHL_S_3s 1111 001 0 0 . .. .... .... 0101 . . . 0 .... @3same_rev
-}
-{
- VRSHL_U64_3s 1111 001 1 0 . .. .... .... 0101 . . . 0 .... @3same_64_rev
- VRSHL_U_3s 1111 001 1 0 . .. .... .... 0101 . . . 0 .... @3same_rev
-}
+VRSHL_S_3s 1111 001 0 0 . .. .... .... 0101 . . . 0 .... @3same_rev
+VRSHL_U_3s 1111 001 1 0 . .. .... .... 0101 . . . 0 .... @3same_rev
{
VQRSHL_S64_3s 1111 001 0 0 . .. .... .... 0101 . . . 1 .... @3same_64_rev
VQRSHL_S_3s 1111 001 0 0 . .. .... .... 0101 . . . 1 .... @3same_rev
diff --git a/target/arm/tcg/gengvec.c b/target/arm/tcg/gengvec.c
index 740f3f864e..216a9f81e3 100644
--- a/target/arm/tcg/gengvec.c
+++ b/target/arm/tcg/gengvec.c
@@ -1218,6 +1218,28 @@ void gen_gvec_sshl(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, opr_sz, max_sz, &ops[vece]);
}
+void gen_gvec_srshl(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+ uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz)
+{
+ static gen_helper_gvec_3 * const fns[] = {
+ gen_helper_gvec_srshl_b, gen_helper_gvec_srshl_h,
+ gen_helper_gvec_srshl_s, gen_helper_gvec_srshl_d,
+ };
+ tcg_debug_assert(vece <= MO_64);
+ tcg_gen_gvec_3_ool(rd_ofs, rn_ofs, rm_ofs, opr_sz, max_sz, 0, fns[vece]);
+}
+
+void gen_gvec_urshl(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+ uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz)
+{
+ static gen_helper_gvec_3 * const fns[] = {
+ gen_helper_gvec_urshl_b, gen_helper_gvec_urshl_h,
+ gen_helper_gvec_urshl_s, gen_helper_gvec_urshl_d,
+ };
+ tcg_debug_assert(vece <= MO_64);
+ tcg_gen_gvec_3_ool(rd_ofs, rn_ofs, rm_ofs, opr_sz, max_sz, 0, fns[vece]);
+}
+
void gen_uqadd_bhs(TCGv_i64 res, TCGv_i64 qc, TCGv_i64 a, TCGv_i64 b, MemOp esz)
{
uint64_t max = MAKE_64BIT_MASK(0, 8 << esz);
diff --git a/target/arm/tcg/neon_helper.c b/target/arm/tcg/neon_helper.c
index 0af15e9f6e..516ecc1dcb 100644
--- a/target/arm/tcg/neon_helper.c
+++ b/target/arm/tcg/neon_helper.c
@@ -6,10 +6,11 @@
*
* This code is licensed under the GNU GPL v2.
*/
-#include "qemu/osdep.h"
+#include "qemu/osdep.h"
#include "cpu.h"
#include "exec/helper-proto.h"
+#include "tcg/tcg-gvec-desc.h"
#include "fpu/softfloat.h"
#include "vec_internal.h"
@@ -117,6 +118,17 @@ NEON_VOP_BODY(vtype, n)
uint32_t HELPER(glue(neon_,name))(CPUARMState *env, uint32_t arg1, uint32_t arg2) \
NEON_VOP_BODY(vtype, n)
+#define NEON_GVEC_VOP2(name, vtype) \
+void HELPER(name)(void *vd, void *vn, void *vm, uint32_t desc) \
+{ \
+ intptr_t i, opr_sz = simd_oprsz(desc); \
+ vtype *d = vd, *n = vn, *m = vm; \
+ for (i = 0; i < opr_sz / sizeof(vtype); i++) { \
+ NEON_FN(d[i], n[i], m[i]); \
+ } \
+ clear_tail(d, opr_sz, simd_maxsz(desc)); \
+}
+
/* Pairwise operations. */
/* For 32-bit elements each segment only contains a single element, so
the elementwise and pairwise operations are the same. */
@@ -263,11 +275,23 @@ NEON_VOP(shl_s16, neon_s16, 2)
#define NEON_FN(dest, src1, src2) \
(dest = do_sqrshl_bhs(src1, (int8_t)src2, 8, true, NULL))
NEON_VOP(rshl_s8, neon_s8, 4)
+NEON_GVEC_VOP2(gvec_srshl_b, int8_t)
#undef NEON_FN
#define NEON_FN(dest, src1, src2) \
(dest = do_sqrshl_bhs(src1, (int8_t)src2, 16, true, NULL))
NEON_VOP(rshl_s16, neon_s16, 2)
+NEON_GVEC_VOP2(gvec_srshl_h, int16_t)
+#undef NEON_FN
+
+#define NEON_FN(dest, src1, src2) \
+ (dest = do_sqrshl_bhs(src1, (int8_t)src2, 32, true, NULL))
+NEON_GVEC_VOP2(gvec_srshl_s, int32_t)
+#undef NEON_FN
+
+#define NEON_FN(dest, src1, src2) \
+ (dest = do_sqrshl_d(src1, (int8_t)src2, true, NULL))
+NEON_GVEC_VOP2(gvec_srshl_d, int64_t)
#undef NEON_FN
uint32_t HELPER(neon_rshl_s32)(uint32_t val, uint32_t shift)
@@ -283,11 +307,23 @@ uint64_t HELPER(neon_rshl_s64)(uint64_t val, uint64_t shift)
#define NEON_FN(dest, src1, src2) \
(dest = do_uqrshl_bhs(src1, (int8_t)src2, 8, true, NULL))
NEON_VOP(rshl_u8, neon_u8, 4)
+NEON_GVEC_VOP2(gvec_urshl_b, uint8_t)
#undef NEON_FN
#define NEON_FN(dest, src1, src2) \
(dest = do_uqrshl_bhs(src1, (int8_t)src2, 16, true, NULL))
NEON_VOP(rshl_u16, neon_u16, 2)
+NEON_GVEC_VOP2(gvec_urshl_h, uint16_t)
+#undef NEON_FN
+
+#define NEON_FN(dest, src1, src2) \
+ (dest = do_uqrshl_bhs(src1, (int8_t)src2, 32, true, NULL))
+NEON_GVEC_VOP2(gvec_urshl_s, int32_t)
+#undef NEON_FN
+
+#define NEON_FN(dest, src1, src2) \
+ (dest = do_uqrshl_d(src1, (int8_t)src2, true, NULL))
+NEON_GVEC_VOP2(gvec_urshl_d, int64_t)
#undef NEON_FN
uint32_t HELPER(neon_rshl_u32)(uint32_t val, uint32_t shift)
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 7c7a22985b..7e981f8d01 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -10938,6 +10938,13 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
}
switch (opcode) {
+ case 0x0a: /* SRSHL, URSHL */
+ if (u) {
+ gen_gvec_fn3(s, is_q, rd, rn, rm, gen_gvec_urshl, size);
+ } else {
+ gen_gvec_fn3(s, is_q, rd, rn, rm, gen_gvec_srshl, size);
+ }
+ return;
case 0x0c: /* SMAX, UMAX */
if (u) {
gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_umax, size);
@@ -11083,16 +11090,6 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
genenvfn = fns[size][u];
break;
}
- case 0xa: /* SRSHL, URSHL */
- {
- static NeonGenTwoOpFn * const fns[3][2] = {
- { gen_helper_neon_rshl_s8, gen_helper_neon_rshl_u8 },
- { gen_helper_neon_rshl_s16, gen_helper_neon_rshl_u16 },
- { gen_helper_neon_rshl_s32, gen_helper_neon_rshl_u32 },
- };
- genfn = fns[size][u];
- break;
- }
case 0xb: /* SQRSHL, UQRSHL */
{
static NeonGenTwoOpEnvFn * const fns[3][2] = {
diff --git a/target/arm/tcg/translate-neon.c b/target/arm/tcg/translate-neon.c
index 18b048611b..337488bbf1 100644
--- a/target/arm/tcg/translate-neon.c
+++ b/target/arm/tcg/translate-neon.c
@@ -794,6 +794,8 @@ DO_3SAME(VQADD_S, gen_gvec_sqadd_qc)
DO_3SAME(VQADD_U, gen_gvec_uqadd_qc)
DO_3SAME(VQSUB_S, gen_gvec_sqsub_qc)
DO_3SAME(VQSUB_U, gen_gvec_uqsub_qc)
+DO_3SAME(VRSHL_S, gen_gvec_srshl)
+DO_3SAME(VRSHL_U, gen_gvec_urshl)
/* These insns are all gvec_bitsel but with the inputs in various orders. */
#define DO_3SAME_BITSEL(INSN, O1, O2, O3) \
@@ -929,8 +931,6 @@ DO_SHA2(SHA256SU1, gen_helper_crypto_sha256su1)
} \
DO_3SAME_64(INSN, gen_##INSN##_elt)
-DO_3SAME_64(VRSHL_S64, gen_helper_neon_rshl_s64)
-DO_3SAME_64(VRSHL_U64, gen_helper_neon_rshl_u64)
DO_3SAME_64_ENV(VQSHL_S64, gen_helper_neon_qshl_s64)
DO_3SAME_64_ENV(VQSHL_U64, gen_helper_neon_qshl_u64)
DO_3SAME_64_ENV(VQRSHL_S64, gen_helper_neon_qrshl_s64)
@@ -999,8 +999,6 @@ DO_3SAME_32(VHSUB_S, hsub_s)
DO_3SAME_32(VHSUB_U, hsub_u)
DO_3SAME_32(VRHADD_S, rhadd_s)
DO_3SAME_32(VRHADD_U, rhadd_u)
-DO_3SAME_32(VRSHL_S, rshl_s)
-DO_3SAME_32(VRSHL_U, rshl_u)
DO_3SAME_32_ENV(VQSHL_S, qshl_s)
DO_3SAME_32_ENV(VQSHL_U, qshl_u)
--
2.34.1
^ permalink raw reply related [flat|nested] 41+ messages in thread
* [PATCH v3 11/33] target/arm: Convert SRSHL, URSHL to decodetree
2024-05-28 20:30 [PATCH v3 00/33] target/arm: Convert a64 advsimd to decodetree (part 1b) Richard Henderson
` (9 preceding siblings ...)
2024-05-28 20:30 ` [PATCH v3 10/33] target/arm: Convert SRSHL and URSHL (register) to gvec Richard Henderson
@ 2024-05-28 20:30 ` Richard Henderson
2024-05-28 20:30 ` [PATCH v3 12/33] target/arm: Convert SQSHL and UQSHL (register) to gvec Richard Henderson
` (22 subsequent siblings)
33 siblings, 0 replies; 41+ messages in thread
From: Richard Henderson @ 2024-05-28 20:30 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm, Peter Maydell
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/tcg/a64.decode | 4 ++++
target/arm/tcg/translate-a64.c | 22 +++++++---------------
2 files changed, 11 insertions(+), 15 deletions(-)
diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index ea897d6732..9e02776036 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -758,6 +758,8 @@ USQADD_s 0111 1110 ..1 00000 00111 0 ..... ..... @r2r_e
SSHL_s 0101 1110 111 ..... 01000 1 ..... ..... @rrr_d
USHL_s 0111 1110 111 ..... 01000 1 ..... ..... @rrr_d
+SRSHL_s 0101 1110 111 ..... 01010 1 ..... ..... @rrr_d
+URSHL_s 0111 1110 111 ..... 01010 1 ..... ..... @rrr_d
### Advanced SIMD scalar pairwise
@@ -882,6 +884,8 @@ USQADD_v 0.10 1110 ..1 00000 00111 0 ..... ..... @qr2r_e
SSHL_v 0.00 1110 ..1 ..... 01000 1 ..... ..... @qrrr_e
USHL_v 0.10 1110 ..1 ..... 01000 1 ..... ..... @qrrr_e
+SRSHL_v 0.00 1110 ..1 ..... 01010 1 ..... ..... @qrrr_e
+URSHL_v 0.10 1110 ..1 ..... 01010 1 ..... ..... @qrrr_e
### Advanced SIMD scalar x indexed element
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 7e981f8d01..c751da78ef 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -5116,6 +5116,8 @@ static bool do_int3_scalar_d(DisasContext *s, arg_rrr_e *a,
TRANS(SSHL_s, do_int3_scalar_d, a, gen_sshl_i64)
TRANS(USHL_s, do_int3_scalar_d, a, gen_ushl_i64)
+TRANS(SRSHL_s, do_int3_scalar_d, a, gen_helper_neon_rshl_s64)
+TRANS(URSHL_s, do_int3_scalar_d, a, gen_helper_neon_rshl_u64)
static bool do_fp3_vector(DisasContext *s, arg_qrrr_e *a,
gen_helper_gvec_3_ptr * const fns[3])
@@ -5364,6 +5366,8 @@ TRANS(USQADD_v, do_gvec_fn3, a, gen_gvec_usqadd_qc)
TRANS(SSHL_v, do_gvec_fn3, a, gen_gvec_sshl)
TRANS(USHL_v, do_gvec_fn3, a, gen_gvec_ushl)
+TRANS(SRSHL_v, do_gvec_fn3, a, gen_gvec_srshl)
+TRANS(URSHL_v, do_gvec_fn3, a, gen_gvec_urshl)
/*
@@ -9384,13 +9388,6 @@ static void handle_3same_64(DisasContext *s, int opcode, bool u,
gen_helper_neon_qshl_s64(tcg_rd, tcg_env, tcg_rn, tcg_rm);
}
break;
- case 0xa: /* SRSHL, URSHL */
- if (u) {
- gen_helper_neon_rshl_u64(tcg_rd, tcg_rn, tcg_rm);
- } else {
- gen_helper_neon_rshl_s64(tcg_rd, tcg_rn, tcg_rm);
- }
- break;
case 0xb: /* SQRSHL, UQRSHL */
if (u) {
gen_helper_neon_qrshl_u64(tcg_rd, tcg_env, tcg_rn, tcg_rm);
@@ -9409,6 +9406,7 @@ static void handle_3same_64(DisasContext *s, int opcode, bool u,
case 0x1: /* SQADD / UQADD */
case 0x5: /* SQSUB / UQSUB */
case 0x8: /* SSHL, USHL */
+ case 0xa: /* SRSHL, URSHL */
g_assert_not_reached();
}
}
@@ -9433,7 +9431,6 @@ static void disas_simd_scalar_three_reg_same(DisasContext *s, uint32_t insn)
case 0x9: /* SQSHL, UQSHL */
case 0xb: /* SQRSHL, UQRSHL */
break;
- case 0xa: /* SRSHL, URSHL */
case 0x6: /* CMGT, CMHI */
case 0x7: /* CMGE, CMHS */
case 0x11: /* CMTST, CMEQ */
@@ -9453,6 +9450,7 @@ static void disas_simd_scalar_three_reg_same(DisasContext *s, uint32_t insn)
case 0x1: /* SQADD, UQADD */
case 0x5: /* SQSUB, UQSUB */
case 0x8: /* SSHL, USHL */
+ case 0xa: /* SRSHL, URSHL */
unallocated_encoding(s);
return;
}
@@ -10929,6 +10927,7 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
case 0x01: /* SQADD, UQADD */
case 0x05: /* SQSUB, UQSUB */
case 0x08: /* SSHL, USHL */
+ case 0x0a: /* SRSHL, URSHL */
unallocated_encoding(s);
return;
}
@@ -10938,13 +10937,6 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
}
switch (opcode) {
- case 0x0a: /* SRSHL, URSHL */
- if (u) {
- gen_gvec_fn3(s, is_q, rd, rn, rm, gen_gvec_urshl, size);
- } else {
- gen_gvec_fn3(s, is_q, rd, rn, rm, gen_gvec_srshl, size);
- }
- return;
case 0x0c: /* SMAX, UMAX */
if (u) {
gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_umax, size);
--
2.34.1
^ permalink raw reply related [flat|nested] 41+ messages in thread
* [PATCH v3 12/33] target/arm: Convert SQSHL and UQSHL (register) to gvec
2024-05-28 20:30 [PATCH v3 00/33] target/arm: Convert a64 advsimd to decodetree (part 1b) Richard Henderson
` (10 preceding siblings ...)
2024-05-28 20:30 ` [PATCH v3 11/33] target/arm: Convert SRSHL, URSHL to decodetree Richard Henderson
@ 2024-05-28 20:30 ` Richard Henderson
2024-05-30 14:12 ` Peter Maydell
2024-05-28 20:30 ` [PATCH v3 13/33] target/arm: Convert SQSHL, UQSHL to decodetree Richard Henderson
` (21 subsequent siblings)
33 siblings, 1 reply; 41+ messages in thread
From: Richard Henderson @ 2024-05-28 20:30 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/helper.h | 8 ++++++++
target/arm/tcg/translate.h | 4 ++++
target/arm/tcg/neon-dp.decode | 10 ++-------
target/arm/tcg/gengvec.c | 24 ++++++++++++++++++++++
target/arm/tcg/neon_helper.c | 36 +++++++++++++++++++++++++++++++++
target/arm/tcg/translate-a64.c | 17 +++++++---------
target/arm/tcg/translate-neon.c | 6 ++----
7 files changed, 83 insertions(+), 22 deletions(-)
diff --git a/target/arm/helper.h b/target/arm/helper.h
index 25eb7bf5df..f345087ddb 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -326,6 +326,14 @@ DEF_HELPER_3(neon_qrshl_u32, i32, env, i32, i32)
DEF_HELPER_3(neon_qrshl_s32, i32, env, i32, i32)
DEF_HELPER_3(neon_qrshl_u64, i64, env, i64, i64)
DEF_HELPER_3(neon_qrshl_s64, i64, env, i64, i64)
+DEF_HELPER_FLAGS_5(neon_sqshl_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(neon_sqshl_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(neon_sqshl_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(neon_sqshl_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(neon_uqshl_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(neon_uqshl_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(neon_uqshl_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(neon_uqshl_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(gvec_srshl_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(gvec_srshl_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h
index ea63ffc47b..6c6d4d49e7 100644
--- a/target/arm/tcg/translate.h
+++ b/target/arm/tcg/translate.h
@@ -463,6 +463,10 @@ void gen_gvec_srshl(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz);
void gen_gvec_urshl(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz);
+void gen_neon_sqshl(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+ uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz);
+void gen_neon_uqshl(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+ uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz);
void gen_cmtst_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b);
void gen_ushl_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b);
diff --git a/target/arm/tcg/neon-dp.decode b/target/arm/tcg/neon-dp.decode
index 8525c65c0d..6d4996b8d8 100644
--- a/target/arm/tcg/neon-dp.decode
+++ b/target/arm/tcg/neon-dp.decode
@@ -109,14 +109,8 @@ VSHL_U_3s 1111 001 1 0 . .. .... .... 0100 . . . 0 .... @3same_rev
@3same_64_rev .... ... . . . 11 .... .... .... . q:1 . . .... \
&3same vm=%vn_dp vn=%vm_dp vd=%vd_dp size=3
-{
- VQSHL_S64_3s 1111 001 0 0 . .. .... .... 0100 . . . 1 .... @3same_64_rev
- VQSHL_S_3s 1111 001 0 0 . .. .... .... 0100 . . . 1 .... @3same_rev
-}
-{
- VQSHL_U64_3s 1111 001 1 0 . .. .... .... 0100 . . . 1 .... @3same_64_rev
- VQSHL_U_3s 1111 001 1 0 . .. .... .... 0100 . . . 1 .... @3same_rev
-}
+VQSHL_S_3s 1111 001 0 0 . .. .... .... 0100 . . . 1 .... @3same_rev
+VQSHL_U_3s 1111 001 1 0 . .. .... .... 0100 . . . 1 .... @3same_rev
VRSHL_S_3s 1111 001 0 0 . .. .... .... 0101 . . . 0 .... @3same_rev
VRSHL_U_3s 1111 001 1 0 . .. .... .... 0101 . . . 0 .... @3same_rev
{
diff --git a/target/arm/tcg/gengvec.c b/target/arm/tcg/gengvec.c
index 216a9f81e3..63c3ec2e73 100644
--- a/target/arm/tcg/gengvec.c
+++ b/target/arm/tcg/gengvec.c
@@ -1240,6 +1240,30 @@ void gen_gvec_urshl(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
tcg_gen_gvec_3_ool(rd_ofs, rn_ofs, rm_ofs, opr_sz, max_sz, 0, fns[vece]);
}
+void gen_neon_sqshl(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+ uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz)
+{
+ static gen_helper_gvec_3_ptr * const fns[] = {
+ gen_helper_neon_sqshl_b, gen_helper_neon_sqshl_h,
+ gen_helper_neon_sqshl_s, gen_helper_neon_sqshl_d,
+ };
+ tcg_debug_assert(vece <= MO_64);
+ tcg_gen_gvec_3_ptr(rd_ofs, rn_ofs, rm_ofs, tcg_env,
+ opr_sz, max_sz, 0, fns[vece]);
+}
+
+void gen_neon_uqshl(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+ uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz)
+{
+ static gen_helper_gvec_3_ptr * const fns[] = {
+ gen_helper_neon_uqshl_b, gen_helper_neon_uqshl_h,
+ gen_helper_neon_uqshl_s, gen_helper_neon_uqshl_d,
+ };
+ tcg_debug_assert(vece <= MO_64);
+ tcg_gen_gvec_3_ptr(rd_ofs, rn_ofs, rm_ofs, tcg_env,
+ opr_sz, max_sz, 0, fns[vece]);
+}
+
void gen_uqadd_bhs(TCGv_i64 res, TCGv_i64 qc, TCGv_i64 a, TCGv_i64 b, MemOp esz)
{
uint64_t max = MAKE_64BIT_MASK(0, 8 << esz);
diff --git a/target/arm/tcg/neon_helper.c b/target/arm/tcg/neon_helper.c
index 516ecc1dcb..88301f0dcb 100644
--- a/target/arm/tcg/neon_helper.c
+++ b/target/arm/tcg/neon_helper.c
@@ -129,6 +129,18 @@ void HELPER(name)(void *vd, void *vn, void *vm, uint32_t desc) \
clear_tail(d, opr_sz, simd_maxsz(desc)); \
}
+#define NEON_GVEC_VOP2_ENV(name, vtype) \
+void HELPER(name)(void *vd, void *vn, void *vm, void *venv, uint32_t desc) \
+{ \
+ intptr_t i, opr_sz = simd_oprsz(desc); \
+ vtype *d = vd, *n = vn, *m = vm; \
+ CPUARMState *env = venv; \
+ for (i = 0; i < opr_sz / sizeof(vtype); i++) { \
+ NEON_FN(d[i], n[i], m[i]); \
+ } \
+ clear_tail(d, opr_sz, simd_maxsz(desc)); \
+}
+
/* Pairwise operations. */
/* For 32-bit elements each segment only contains a single element, so
the elementwise and pairwise operations are the same. */
@@ -339,11 +351,23 @@ uint64_t HELPER(neon_rshl_u64)(uint64_t val, uint64_t shift)
#define NEON_FN(dest, src1, src2) \
(dest = do_uqrshl_bhs(src1, (int8_t)src2, 8, false, env->vfp.qc))
NEON_VOP_ENV(qshl_u8, neon_u8, 4)
+NEON_GVEC_VOP2_ENV(neon_uqshl_b, uint8_t)
#undef NEON_FN
#define NEON_FN(dest, src1, src2) \
(dest = do_uqrshl_bhs(src1, (int8_t)src2, 16, false, env->vfp.qc))
NEON_VOP_ENV(qshl_u16, neon_u16, 2)
+NEON_GVEC_VOP2_ENV(neon_uqshl_h, uint16_t)
+#undef NEON_FN
+
+#define NEON_FN(dest, src1, src2) \
+ (dest = do_uqrshl_bhs(src1, (int8_t)src2, 32, false, env->vfp.qc))
+NEON_GVEC_VOP2_ENV(neon_uqshl_s, uint32_t)
+#undef NEON_FN
+
+#define NEON_FN(dest, src1, src2) \
+ (dest = do_uqrshl_d(src1, (int8_t)src2, false, env->vfp.qc))
+NEON_GVEC_VOP2_ENV(neon_uqshl_d, uint64_t)
#undef NEON_FN
uint32_t HELPER(neon_qshl_u32)(CPUARMState *env, uint32_t val, uint32_t shift)
@@ -359,11 +383,23 @@ uint64_t HELPER(neon_qshl_u64)(CPUARMState *env, uint64_t val, uint64_t shift)
#define NEON_FN(dest, src1, src2) \
(dest = do_sqrshl_bhs(src1, (int8_t)src2, 8, false, env->vfp.qc))
NEON_VOP_ENV(qshl_s8, neon_s8, 4)
+NEON_GVEC_VOP2_ENV(neon_sqshl_b, int8_t)
#undef NEON_FN
#define NEON_FN(dest, src1, src2) \
(dest = do_sqrshl_bhs(src1, (int8_t)src2, 16, false, env->vfp.qc))
NEON_VOP_ENV(qshl_s16, neon_s16, 2)
+NEON_GVEC_VOP2_ENV(neon_sqshl_h, int16_t)
+#undef NEON_FN
+
+#define NEON_FN(dest, src1, src2) \
+ (dest = do_sqrshl_bhs(src1, (int8_t)src2, 32, false, env->vfp.qc))
+NEON_GVEC_VOP2_ENV(neon_sqshl_s, int32_t)
+#undef NEON_FN
+
+#define NEON_FN(dest, src1, src2) \
+ (dest = do_sqrshl_d(src1, (int8_t)src2, false, env->vfp.qc))
+NEON_GVEC_VOP2_ENV(neon_sqshl_d, int64_t)
#undef NEON_FN
uint32_t HELPER(neon_qshl_s32)(CPUARMState *env, uint32_t val, uint32_t shift)
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index c751da78ef..c88702dad6 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -10937,6 +10937,13 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
}
switch (opcode) {
+ case 0x09: /* SQSHL, UQSHL */
+ if (u) {
+ gen_gvec_fn3(s, is_q, rd, rn, rm, gen_neon_uqshl, size);
+ } else {
+ gen_gvec_fn3(s, is_q, rd, rn, rm, gen_neon_sqshl, size);
+ }
+ return;
case 0x0c: /* SMAX, UMAX */
if (u) {
gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_umax, size);
@@ -11072,16 +11079,6 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
genfn = fns[size][u];
break;
}
- case 0x9: /* SQSHL, UQSHL */
- {
- static NeonGenTwoOpEnvFn * const fns[3][2] = {
- { gen_helper_neon_qshl_s8, gen_helper_neon_qshl_u8 },
- { gen_helper_neon_qshl_s16, gen_helper_neon_qshl_u16 },
- { gen_helper_neon_qshl_s32, gen_helper_neon_qshl_u32 },
- };
- genenvfn = fns[size][u];
- break;
- }
case 0xb: /* SQRSHL, UQRSHL */
{
static NeonGenTwoOpEnvFn * const fns[3][2] = {
diff --git a/target/arm/tcg/translate-neon.c b/target/arm/tcg/translate-neon.c
index 337488bbf1..a3eec47092 100644
--- a/target/arm/tcg/translate-neon.c
+++ b/target/arm/tcg/translate-neon.c
@@ -796,6 +796,8 @@ DO_3SAME(VQSUB_S, gen_gvec_sqsub_qc)
DO_3SAME(VQSUB_U, gen_gvec_uqsub_qc)
DO_3SAME(VRSHL_S, gen_gvec_srshl)
DO_3SAME(VRSHL_U, gen_gvec_urshl)
+DO_3SAME(VQSHL_S, gen_neon_sqshl)
+DO_3SAME(VQSHL_U, gen_neon_uqshl)
/* These insns are all gvec_bitsel but with the inputs in various orders. */
#define DO_3SAME_BITSEL(INSN, O1, O2, O3) \
@@ -931,8 +933,6 @@ DO_SHA2(SHA256SU1, gen_helper_crypto_sha256su1)
} \
DO_3SAME_64(INSN, gen_##INSN##_elt)
-DO_3SAME_64_ENV(VQSHL_S64, gen_helper_neon_qshl_s64)
-DO_3SAME_64_ENV(VQSHL_U64, gen_helper_neon_qshl_u64)
DO_3SAME_64_ENV(VQRSHL_S64, gen_helper_neon_qrshl_s64)
DO_3SAME_64_ENV(VQRSHL_U64, gen_helper_neon_qrshl_u64)
@@ -1000,8 +1000,6 @@ DO_3SAME_32(VHSUB_U, hsub_u)
DO_3SAME_32(VRHADD_S, rhadd_s)
DO_3SAME_32(VRHADD_U, rhadd_u)
-DO_3SAME_32_ENV(VQSHL_S, qshl_s)
-DO_3SAME_32_ENV(VQSHL_U, qshl_u)
DO_3SAME_32_ENV(VQRSHL_S, qrshl_s)
DO_3SAME_32_ENV(VQRSHL_U, qrshl_u)
--
2.34.1
^ permalink raw reply related [flat|nested] 41+ messages in thread
* [PATCH v3 13/33] target/arm: Convert SQSHL, UQSHL to decodetree
2024-05-28 20:30 [PATCH v3 00/33] target/arm: Convert a64 advsimd to decodetree (part 1b) Richard Henderson
` (11 preceding siblings ...)
2024-05-28 20:30 ` [PATCH v3 12/33] target/arm: Convert SQSHL and UQSHL (register) to gvec Richard Henderson
@ 2024-05-28 20:30 ` Richard Henderson
2024-05-28 20:30 ` [PATCH v3 14/33] target/arm: Convert SQRSHL and UQRSHL (register) to gvec Richard Henderson
` (20 subsequent siblings)
33 siblings, 0 replies; 41+ messages in thread
From: Richard Henderson @ 2024-05-28 20:30 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm, Peter Maydell
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/tcg/a64.decode | 4 ++
target/arm/tcg/translate-a64.c | 74 ++++++++++++++++++++++------------
2 files changed, 53 insertions(+), 25 deletions(-)
diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index 9e02776036..85caf37948 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -760,6 +760,8 @@ SSHL_s 0101 1110 111 ..... 01000 1 ..... ..... @rrr_d
USHL_s 0111 1110 111 ..... 01000 1 ..... ..... @rrr_d
SRSHL_s 0101 1110 111 ..... 01010 1 ..... ..... @rrr_d
URSHL_s 0111 1110 111 ..... 01010 1 ..... ..... @rrr_d
+SQSHL_s 0101 1110 ..1 ..... 01001 1 ..... ..... @rrr_e
+UQSHL_s 0111 1110 ..1 ..... 01001 1 ..... ..... @rrr_e
### Advanced SIMD scalar pairwise
@@ -886,6 +888,8 @@ SSHL_v 0.00 1110 ..1 ..... 01000 1 ..... ..... @qrrr_e
USHL_v 0.10 1110 ..1 ..... 01000 1 ..... ..... @qrrr_e
SRSHL_v 0.00 1110 ..1 ..... 01010 1 ..... ..... @qrrr_e
URSHL_v 0.10 1110 ..1 ..... 01010 1 ..... ..... @qrrr_e
+SQSHL_v 0.00 1110 ..1 ..... 01001 1 ..... ..... @qrrr_e
+UQSHL_v 0.10 1110 ..1 ..... 01001 1 ..... ..... @qrrr_e
### Advanced SIMD scalar x indexed element
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index c88702dad6..97bd69eb3f 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -5119,6 +5119,49 @@ TRANS(USHL_s, do_int3_scalar_d, a, gen_ushl_i64)
TRANS(SRSHL_s, do_int3_scalar_d, a, gen_helper_neon_rshl_s64)
TRANS(URSHL_s, do_int3_scalar_d, a, gen_helper_neon_rshl_u64)
+typedef struct ENVScalar2 {
+ NeonGenTwoOpEnvFn *gen_bhs[3];
+ NeonGenTwo64OpEnvFn *gen_d;
+} ENVScalar2;
+
+static bool do_env_scalar2(DisasContext *s, arg_rrr_e *a, const ENVScalar2 *f)
+{
+ if (!fp_access_check(s)) {
+ return true;
+ }
+ if (a->esz == MO_64) {
+ TCGv_i64 t0 = read_fp_dreg(s, a->rn);
+ TCGv_i64 t1 = read_fp_dreg(s, a->rm);
+ f->gen_d(t0, tcg_env, t0, t1);
+ write_fp_dreg(s, a->rd, t0);
+ } else {
+ TCGv_i32 t0 = tcg_temp_new_i32();
+ TCGv_i32 t1 = tcg_temp_new_i32();
+
+ read_vec_element_i32(s, t0, a->rn, 0, a->esz);
+ read_vec_element_i32(s, t1, a->rm, 0, a->esz);
+ f->gen_bhs[a->esz](t0, tcg_env, t0, t1);
+ write_fp_sreg(s, a->rd, t0);
+ }
+ return true;
+}
+
+static const ENVScalar2 f_scalar_sqshl = {
+ { gen_helper_neon_qshl_s8,
+ gen_helper_neon_qshl_s16,
+ gen_helper_neon_qshl_s32 },
+ gen_helper_neon_qshl_s64,
+};
+TRANS(SQSHL_s, do_env_scalar2, a, &f_scalar_sqshl)
+
+static const ENVScalar2 f_scalar_uqshl = {
+ { gen_helper_neon_qshl_u8,
+ gen_helper_neon_qshl_u16,
+ gen_helper_neon_qshl_u32 },
+ gen_helper_neon_qshl_u64,
+};
+TRANS(UQSHL_s, do_env_scalar2, a, &f_scalar_uqshl)
+
static bool do_fp3_vector(DisasContext *s, arg_qrrr_e *a,
gen_helper_gvec_3_ptr * const fns[3])
{
@@ -5368,6 +5411,8 @@ TRANS(SSHL_v, do_gvec_fn3, a, gen_gvec_sshl)
TRANS(USHL_v, do_gvec_fn3, a, gen_gvec_ushl)
TRANS(SRSHL_v, do_gvec_fn3, a, gen_gvec_srshl)
TRANS(URSHL_v, do_gvec_fn3, a, gen_gvec_urshl)
+TRANS(SQSHL_v, do_gvec_fn3, a, gen_neon_sqshl)
+TRANS(UQSHL_v, do_gvec_fn3, a, gen_neon_uqshl)
/*
@@ -9381,13 +9426,6 @@ static void handle_3same_64(DisasContext *s, int opcode, bool u,
}
gen_cmtst_i64(tcg_rd, tcg_rn, tcg_rm);
break;
- case 0x9: /* SQSHL, UQSHL */
- if (u) {
- gen_helper_neon_qshl_u64(tcg_rd, tcg_env, tcg_rn, tcg_rm);
- } else {
- gen_helper_neon_qshl_s64(tcg_rd, tcg_env, tcg_rn, tcg_rm);
- }
- break;
case 0xb: /* SQRSHL, UQRSHL */
if (u) {
gen_helper_neon_qrshl_u64(tcg_rd, tcg_env, tcg_rn, tcg_rm);
@@ -9406,6 +9444,7 @@ static void handle_3same_64(DisasContext *s, int opcode, bool u,
case 0x1: /* SQADD / UQADD */
case 0x5: /* SQSUB / UQSUB */
case 0x8: /* SSHL, USHL */
+ case 0x9: /* SQSHL, UQSHL */
case 0xa: /* SRSHL, URSHL */
g_assert_not_reached();
}
@@ -9428,7 +9467,6 @@ static void disas_simd_scalar_three_reg_same(DisasContext *s, uint32_t insn)
TCGv_i64 tcg_rd;
switch (opcode) {
- case 0x9: /* SQSHL, UQSHL */
case 0xb: /* SQRSHL, UQRSHL */
break;
case 0x6: /* CMGT, CMHI */
@@ -9450,6 +9488,7 @@ static void disas_simd_scalar_three_reg_same(DisasContext *s, uint32_t insn)
case 0x1: /* SQADD, UQADD */
case 0x5: /* SQSUB, UQSUB */
case 0x8: /* SSHL, USHL */
+ case 0x9: /* SQSHL, UQSHL */
case 0xa: /* SRSHL, URSHL */
unallocated_encoding(s);
return;
@@ -9477,16 +9516,6 @@ static void disas_simd_scalar_three_reg_same(DisasContext *s, uint32_t insn)
void (*genfn)(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_i64, MemOp) = NULL;
switch (opcode) {
- case 0x9: /* SQSHL, UQSHL */
- {
- static NeonGenTwoOpEnvFn * const fns[3][2] = {
- { gen_helper_neon_qshl_s8, gen_helper_neon_qshl_u8 },
- { gen_helper_neon_qshl_s16, gen_helper_neon_qshl_u16 },
- { gen_helper_neon_qshl_s32, gen_helper_neon_qshl_u32 },
- };
- genenvfn = fns[size][u];
- break;
- }
case 0xb: /* SQRSHL, UQRSHL */
{
static NeonGenTwoOpEnvFn * const fns[3][2] = {
@@ -9510,6 +9539,7 @@ static void disas_simd_scalar_three_reg_same(DisasContext *s, uint32_t insn)
default:
case 0x1: /* SQADD, UQADD */
case 0x5: /* SQSUB, UQSUB */
+ case 0x9: /* SQSHL, UQSHL */
g_assert_not_reached();
}
@@ -10927,6 +10957,7 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
case 0x01: /* SQADD, UQADD */
case 0x05: /* SQSUB, UQSUB */
case 0x08: /* SSHL, USHL */
+ case 0x09: /* SQSHL, UQSHL */
case 0x0a: /* SRSHL, URSHL */
unallocated_encoding(s);
return;
@@ -10937,13 +10968,6 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
}
switch (opcode) {
- case 0x09: /* SQSHL, UQSHL */
- if (u) {
- gen_gvec_fn3(s, is_q, rd, rn, rm, gen_neon_uqshl, size);
- } else {
- gen_gvec_fn3(s, is_q, rd, rn, rm, gen_neon_sqshl, size);
- }
- return;
case 0x0c: /* SMAX, UMAX */
if (u) {
gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_umax, size);
--
2.34.1
^ permalink raw reply related [flat|nested] 41+ messages in thread
* [PATCH v3 14/33] target/arm: Convert SQRSHL and UQRSHL (register) to gvec
2024-05-28 20:30 [PATCH v3 00/33] target/arm: Convert a64 advsimd to decodetree (part 1b) Richard Henderson
` (12 preceding siblings ...)
2024-05-28 20:30 ` [PATCH v3 13/33] target/arm: Convert SQSHL, UQSHL to decodetree Richard Henderson
@ 2024-05-28 20:30 ` Richard Henderson
2024-05-28 20:30 ` [PATCH v3 15/33] target/arm: Convert SQRSHL, UQRSHL to decodetree Richard Henderson
` (19 subsequent siblings)
33 siblings, 0 replies; 41+ messages in thread
From: Richard Henderson @ 2024-05-28 20:30 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm, Peter Maydell
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/helper.h | 8 ++++++
target/arm/tcg/translate.h | 4 +++
target/arm/tcg/neon-dp.decode | 17 ++----------
target/arm/tcg/gengvec.c | 24 ++++++++++++++++
target/arm/tcg/neon_helper.c | 24 ++++++++++++++++
target/arm/tcg/translate-a64.c | 17 +++++-------
target/arm/tcg/translate-neon.c | 49 ++-------------------------------
7 files changed, 71 insertions(+), 72 deletions(-)
diff --git a/target/arm/helper.h b/target/arm/helper.h
index f345087ddb..9a89c9cea7 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -334,6 +334,14 @@ DEF_HELPER_FLAGS_5(neon_uqshl_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_5(neon_uqshl_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_5(neon_uqshl_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_5(neon_uqshl_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(neon_sqrshl_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(neon_sqrshl_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(neon_sqrshl_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(neon_sqrshl_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(neon_uqrshl_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(neon_uqrshl_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(neon_uqrshl_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(neon_uqrshl_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(gvec_srshl_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(gvec_srshl_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h
index 6c6d4d49e7..048cb45ebe 100644
--- a/target/arm/tcg/translate.h
+++ b/target/arm/tcg/translate.h
@@ -467,6 +467,10 @@ void gen_neon_sqshl(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz);
void gen_neon_uqshl(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz);
+void gen_neon_sqrshl(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+ uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz);
+void gen_neon_uqrshl(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+ uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz);
void gen_cmtst_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b);
void gen_ushl_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b);
diff --git a/target/arm/tcg/neon-dp.decode b/target/arm/tcg/neon-dp.decode
index 6d4996b8d8..788578c8fa 100644
--- a/target/arm/tcg/neon-dp.decode
+++ b/target/arm/tcg/neon-dp.decode
@@ -102,25 +102,12 @@ VCGE_U_3s 1111 001 1 0 . .. .... .... 0011 . . . 1 .... @3same
VSHL_S_3s 1111 001 0 0 . .. .... .... 0100 . . . 0 .... @3same_rev
VSHL_U_3s 1111 001 1 0 . .. .... .... 0100 . . . 0 .... @3same_rev
-
-# Insns operating on 64-bit elements (size!=0b11 handled elsewhere)
-# The _rev suffix indicates that Vn and Vm are reversed (as explained
-# by the comment for the @3same_rev format).
-@3same_64_rev .... ... . . . 11 .... .... .... . q:1 . . .... \
- &3same vm=%vn_dp vn=%vm_dp vd=%vd_dp size=3
-
VQSHL_S_3s 1111 001 0 0 . .. .... .... 0100 . . . 1 .... @3same_rev
VQSHL_U_3s 1111 001 1 0 . .. .... .... 0100 . . . 1 .... @3same_rev
VRSHL_S_3s 1111 001 0 0 . .. .... .... 0101 . . . 0 .... @3same_rev
VRSHL_U_3s 1111 001 1 0 . .. .... .... 0101 . . . 0 .... @3same_rev
-{
- VQRSHL_S64_3s 1111 001 0 0 . .. .... .... 0101 . . . 1 .... @3same_64_rev
- VQRSHL_S_3s 1111 001 0 0 . .. .... .... 0101 . . . 1 .... @3same_rev
-}
-{
- VQRSHL_U64_3s 1111 001 1 0 . .. .... .... 0101 . . . 1 .... @3same_64_rev
- VQRSHL_U_3s 1111 001 1 0 . .. .... .... 0101 . . . 1 .... @3same_rev
-}
+VQRSHL_S_3s 1111 001 0 0 . .. .... .... 0101 . . . 1 .... @3same_rev
+VQRSHL_U_3s 1111 001 1 0 . .. .... .... 0101 . . . 1 .... @3same_rev
VMAX_S_3s 1111 001 0 0 . .. .... .... 0110 . . . 0 .... @3same
VMAX_U_3s 1111 001 1 0 . .. .... .... 0110 . . . 0 .... @3same
diff --git a/target/arm/tcg/gengvec.c b/target/arm/tcg/gengvec.c
index 63c3ec2e73..6dc96269d5 100644
--- a/target/arm/tcg/gengvec.c
+++ b/target/arm/tcg/gengvec.c
@@ -1264,6 +1264,30 @@ void gen_neon_uqshl(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
opr_sz, max_sz, 0, fns[vece]);
}
+void gen_neon_sqrshl(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+ uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz)
+{
+ static gen_helper_gvec_3_ptr * const fns[] = {
+ gen_helper_neon_sqrshl_b, gen_helper_neon_sqrshl_h,
+ gen_helper_neon_sqrshl_s, gen_helper_neon_sqrshl_d,
+ };
+ tcg_debug_assert(vece <= MO_64);
+ tcg_gen_gvec_3_ptr(rd_ofs, rn_ofs, rm_ofs, tcg_env,
+ opr_sz, max_sz, 0, fns[vece]);
+}
+
+void gen_neon_uqrshl(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+ uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz)
+{
+ static gen_helper_gvec_3_ptr * const fns[] = {
+ gen_helper_neon_uqrshl_b, gen_helper_neon_uqrshl_h,
+ gen_helper_neon_uqrshl_s, gen_helper_neon_uqrshl_d,
+ };
+ tcg_debug_assert(vece <= MO_64);
+ tcg_gen_gvec_3_ptr(rd_ofs, rn_ofs, rm_ofs, tcg_env,
+ opr_sz, max_sz, 0, fns[vece]);
+}
+
void gen_uqadd_bhs(TCGv_i64 res, TCGv_i64 qc, TCGv_i64 a, TCGv_i64 b, MemOp esz)
{
uint64_t max = MAKE_64BIT_MASK(0, 8 << esz);
diff --git a/target/arm/tcg/neon_helper.c b/target/arm/tcg/neon_helper.c
index 88301f0dcb..b29a7c725f 100644
--- a/target/arm/tcg/neon_helper.c
+++ b/target/arm/tcg/neon_helper.c
@@ -435,11 +435,23 @@ uint64_t HELPER(neon_qshlu_s64)(CPUARMState *env, uint64_t val, uint64_t shift)
#define NEON_FN(dest, src1, src2) \
(dest = do_uqrshl_bhs(src1, (int8_t)src2, 8, true, env->vfp.qc))
NEON_VOP_ENV(qrshl_u8, neon_u8, 4)
+NEON_GVEC_VOP2_ENV(neon_uqrshl_b, uint8_t)
#undef NEON_FN
#define NEON_FN(dest, src1, src2) \
(dest = do_uqrshl_bhs(src1, (int8_t)src2, 16, true, env->vfp.qc))
NEON_VOP_ENV(qrshl_u16, neon_u16, 2)
+NEON_GVEC_VOP2_ENV(neon_uqrshl_h, uint16_t)
+#undef NEON_FN
+
+#define NEON_FN(dest, src1, src2) \
+ (dest = do_uqrshl_bhs(src1, (int8_t)src2, 32, true, env->vfp.qc))
+NEON_GVEC_VOP2_ENV(neon_uqrshl_s, uint32_t)
+#undef NEON_FN
+
+#define NEON_FN(dest, src1, src2) \
+ (dest = do_uqrshl_d(src1, (int8_t)src2, true, env->vfp.qc))
+NEON_GVEC_VOP2_ENV(neon_uqrshl_d, uint64_t)
#undef NEON_FN
uint32_t HELPER(neon_qrshl_u32)(CPUARMState *env, uint32_t val, uint32_t shift)
@@ -455,11 +467,23 @@ uint64_t HELPER(neon_qrshl_u64)(CPUARMState *env, uint64_t val, uint64_t shift)
#define NEON_FN(dest, src1, src2) \
(dest = do_sqrshl_bhs(src1, (int8_t)src2, 8, true, env->vfp.qc))
NEON_VOP_ENV(qrshl_s8, neon_s8, 4)
+NEON_GVEC_VOP2_ENV(neon_sqrshl_b, int8_t)
#undef NEON_FN
#define NEON_FN(dest, src1, src2) \
(dest = do_sqrshl_bhs(src1, (int8_t)src2, 16, true, env->vfp.qc))
NEON_VOP_ENV(qrshl_s16, neon_s16, 2)
+NEON_GVEC_VOP2_ENV(neon_sqrshl_h, int16_t)
+#undef NEON_FN
+
+#define NEON_FN(dest, src1, src2) \
+ (dest = do_sqrshl_bhs(src1, (int8_t)src2, 32, true, env->vfp.qc))
+NEON_GVEC_VOP2_ENV(neon_sqrshl_s, int32_t)
+#undef NEON_FN
+
+#define NEON_FN(dest, src1, src2) \
+ (dest = do_sqrshl_d(src1, (int8_t)src2, true, env->vfp.qc))
+NEON_GVEC_VOP2_ENV(neon_sqrshl_d, int64_t)
#undef NEON_FN
uint32_t HELPER(neon_qrshl_s32)(CPUARMState *env, uint32_t val, uint32_t shift)
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 97bd69eb3f..b9d577f620 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -10968,6 +10968,13 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
}
switch (opcode) {
+ case 0x0b: /* SQRSHL, UQRSHL */
+ if (u) {
+ gen_gvec_fn3(s, is_q, rd, rn, rm, gen_neon_uqrshl, size);
+ } else {
+ gen_gvec_fn3(s, is_q, rd, rn, rm, gen_neon_sqrshl, size);
+ }
+ return;
case 0x0c: /* SMAX, UMAX */
if (u) {
gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_umax, size);
@@ -11103,16 +11110,6 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
genfn = fns[size][u];
break;
}
- case 0xb: /* SQRSHL, UQRSHL */
- {
- static NeonGenTwoOpEnvFn * const fns[3][2] = {
- { gen_helper_neon_qrshl_s8, gen_helper_neon_qrshl_u8 },
- { gen_helper_neon_qrshl_s16, gen_helper_neon_qrshl_u16 },
- { gen_helper_neon_qrshl_s32, gen_helper_neon_qrshl_u32 },
- };
- genenvfn = fns[size][u];
- break;
- }
default:
g_assert_not_reached();
}
diff --git a/target/arm/tcg/translate-neon.c b/target/arm/tcg/translate-neon.c
index a3eec47092..5f1576393e 100644
--- a/target/arm/tcg/translate-neon.c
+++ b/target/arm/tcg/translate-neon.c
@@ -798,6 +798,8 @@ DO_3SAME(VRSHL_S, gen_gvec_srshl)
DO_3SAME(VRSHL_U, gen_gvec_urshl)
DO_3SAME(VQSHL_S, gen_neon_sqshl)
DO_3SAME(VQSHL_U, gen_neon_uqshl)
+DO_3SAME(VQRSHL_S, gen_neon_sqrshl)
+DO_3SAME(VQRSHL_U, gen_neon_uqrshl)
/* These insns are all gvec_bitsel but with the inputs in various orders. */
#define DO_3SAME_BITSEL(INSN, O1, O2, O3) \
@@ -916,26 +918,6 @@ DO_SHA2(SHA256H, gen_helper_crypto_sha256h)
DO_SHA2(SHA256H2, gen_helper_crypto_sha256h2)
DO_SHA2(SHA256SU1, gen_helper_crypto_sha256su1)
-#define DO_3SAME_64(INSN, FUNC) \
- static void gen_##INSN##_3s(unsigned vece, uint32_t rd_ofs, \
- uint32_t rn_ofs, uint32_t rm_ofs, \
- uint32_t oprsz, uint32_t maxsz) \
- { \
- static const GVecGen3 op = { .fni8 = FUNC }; \
- tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, oprsz, maxsz, &op); \
- } \
- DO_3SAME(INSN, gen_##INSN##_3s)
-
-#define DO_3SAME_64_ENV(INSN, FUNC) \
- static void gen_##INSN##_elt(TCGv_i64 d, TCGv_i64 n, TCGv_i64 m) \
- { \
- FUNC(d, tcg_env, n, m); \
- } \
- DO_3SAME_64(INSN, gen_##INSN##_elt)
-
-DO_3SAME_64_ENV(VQRSHL_S64, gen_helper_neon_qrshl_s64)
-DO_3SAME_64_ENV(VQRSHL_U64, gen_helper_neon_qrshl_u64)
-
#define DO_3SAME_32(INSN, FUNC) \
static void gen_##INSN##_3s(unsigned vece, uint32_t rd_ofs, \
uint32_t rn_ofs, uint32_t rm_ofs, \
@@ -969,30 +951,6 @@ DO_3SAME_64_ENV(VQRSHL_U64, gen_helper_neon_qrshl_u64)
FUNC(d, tcg_env, n, m); \
}
-#define DO_3SAME_32_ENV(INSN, FUNC) \
- WRAP_ENV_FN(gen_##INSN##_tramp8, gen_helper_neon_##FUNC##8); \
- WRAP_ENV_FN(gen_##INSN##_tramp16, gen_helper_neon_##FUNC##16); \
- WRAP_ENV_FN(gen_##INSN##_tramp32, gen_helper_neon_##FUNC##32); \
- static void gen_##INSN##_3s(unsigned vece, uint32_t rd_ofs, \
- uint32_t rn_ofs, uint32_t rm_ofs, \
- uint32_t oprsz, uint32_t maxsz) \
- { \
- static const GVecGen3 ops[4] = { \
- { .fni4 = gen_##INSN##_tramp8 }, \
- { .fni4 = gen_##INSN##_tramp16 }, \
- { .fni4 = gen_##INSN##_tramp32 }, \
- { 0 }, \
- }; \
- tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, oprsz, maxsz, &ops[vece]); \
- } \
- static bool trans_##INSN##_3s(DisasContext *s, arg_3same *a) \
- { \
- if (a->size > 2) { \
- return false; \
- } \
- return do_3same(s, a, gen_##INSN##_3s); \
- }
-
DO_3SAME_32(VHADD_S, hadd_s)
DO_3SAME_32(VHADD_U, hadd_u)
DO_3SAME_32(VHSUB_S, hsub_s)
@@ -1000,9 +958,6 @@ DO_3SAME_32(VHSUB_U, hsub_u)
DO_3SAME_32(VRHADD_S, rhadd_s)
DO_3SAME_32(VRHADD_U, rhadd_u)
-DO_3SAME_32_ENV(VQRSHL_S, qrshl_s)
-DO_3SAME_32_ENV(VQRSHL_U, qrshl_u)
-
#define DO_3SAME_VQDMULH(INSN, FUNC) \
WRAP_ENV_FN(gen_##INSN##_tramp16, gen_helper_neon_##FUNC##_s16); \
WRAP_ENV_FN(gen_##INSN##_tramp32, gen_helper_neon_##FUNC##_s32); \
--
2.34.1
^ permalink raw reply related [flat|nested] 41+ messages in thread
* [PATCH v3 15/33] target/arm: Convert SQRSHL, UQRSHL to decodetree
2024-05-28 20:30 [PATCH v3 00/33] target/arm: Convert a64 advsimd to decodetree (part 1b) Richard Henderson
` (13 preceding siblings ...)
2024-05-28 20:30 ` [PATCH v3 14/33] target/arm: Convert SQRSHL and UQRSHL (register) to gvec Richard Henderson
@ 2024-05-28 20:30 ` Richard Henderson
2024-05-28 20:30 ` [PATCH v3 16/33] target/arm: Convert ADD, SUB (vector) " Richard Henderson
` (18 subsequent siblings)
33 siblings, 0 replies; 41+ messages in thread
From: Richard Henderson @ 2024-05-28 20:30 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm, Peter Maydell
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/tcg/a64.decode | 4 +++
target/arm/tcg/translate-a64.c | 48 ++++++++++++++++------------------
2 files changed, 26 insertions(+), 26 deletions(-)
diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index 85caf37948..96ce35ad40 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -762,6 +762,8 @@ SRSHL_s 0101 1110 111 ..... 01010 1 ..... ..... @rrr_d
URSHL_s 0111 1110 111 ..... 01010 1 ..... ..... @rrr_d
SQSHL_s 0101 1110 ..1 ..... 01001 1 ..... ..... @rrr_e
UQSHL_s 0111 1110 ..1 ..... 01001 1 ..... ..... @rrr_e
+SQRSHL_s 0101 1110 ..1 ..... 01011 1 ..... ..... @rrr_e
+UQRSHL_s 0111 1110 ..1 ..... 01011 1 ..... ..... @rrr_e
### Advanced SIMD scalar pairwise
@@ -890,6 +892,8 @@ SRSHL_v 0.00 1110 ..1 ..... 01010 1 ..... ..... @qrrr_e
URSHL_v 0.10 1110 ..1 ..... 01010 1 ..... ..... @qrrr_e
SQSHL_v 0.00 1110 ..1 ..... 01001 1 ..... ..... @qrrr_e
UQSHL_v 0.10 1110 ..1 ..... 01001 1 ..... ..... @qrrr_e
+SQRSHL_v 0.00 1110 ..1 ..... 01011 1 ..... ..... @qrrr_e
+UQRSHL_v 0.10 1110 ..1 ..... 01011 1 ..... ..... @qrrr_e
### Advanced SIMD scalar x indexed element
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index b9d577f620..2424c6d314 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -5162,6 +5162,22 @@ static const ENVScalar2 f_scalar_uqshl = {
};
TRANS(UQSHL_s, do_env_scalar2, a, &f_scalar_uqshl)
+static const ENVScalar2 f_scalar_sqrshl = {
+ { gen_helper_neon_qrshl_s8,
+ gen_helper_neon_qrshl_s16,
+ gen_helper_neon_qrshl_s32 },
+ gen_helper_neon_qrshl_s64,
+};
+TRANS(SQRSHL_s, do_env_scalar2, a, &f_scalar_sqrshl)
+
+static const ENVScalar2 f_scalar_uqrshl = {
+ { gen_helper_neon_qrshl_u8,
+ gen_helper_neon_qrshl_u16,
+ gen_helper_neon_qrshl_u32 },
+ gen_helper_neon_qrshl_u64,
+};
+TRANS(UQRSHL_s, do_env_scalar2, a, &f_scalar_uqrshl)
+
static bool do_fp3_vector(DisasContext *s, arg_qrrr_e *a,
gen_helper_gvec_3_ptr * const fns[3])
{
@@ -5413,6 +5429,8 @@ TRANS(SRSHL_v, do_gvec_fn3, a, gen_gvec_srshl)
TRANS(URSHL_v, do_gvec_fn3, a, gen_gvec_urshl)
TRANS(SQSHL_v, do_gvec_fn3, a, gen_neon_sqshl)
TRANS(UQSHL_v, do_gvec_fn3, a, gen_neon_uqshl)
+TRANS(SQRSHL_v, do_gvec_fn3, a, gen_neon_sqrshl)
+TRANS(UQRSHL_v, do_gvec_fn3, a, gen_neon_uqrshl)
/*
@@ -9426,13 +9444,6 @@ static void handle_3same_64(DisasContext *s, int opcode, bool u,
}
gen_cmtst_i64(tcg_rd, tcg_rn, tcg_rm);
break;
- case 0xb: /* SQRSHL, UQRSHL */
- if (u) {
- gen_helper_neon_qrshl_u64(tcg_rd, tcg_env, tcg_rn, tcg_rm);
- } else {
- gen_helper_neon_qrshl_s64(tcg_rd, tcg_env, tcg_rn, tcg_rm);
- }
- break;
case 0x10: /* ADD, SUB */
if (u) {
tcg_gen_sub_i64(tcg_rd, tcg_rn, tcg_rm);
@@ -9446,6 +9457,7 @@ static void handle_3same_64(DisasContext *s, int opcode, bool u,
case 0x8: /* SSHL, USHL */
case 0x9: /* SQSHL, UQSHL */
case 0xa: /* SRSHL, URSHL */
+ case 0xb: /* SQRSHL, UQRSHL */
g_assert_not_reached();
}
}
@@ -9467,8 +9479,6 @@ static void disas_simd_scalar_three_reg_same(DisasContext *s, uint32_t insn)
TCGv_i64 tcg_rd;
switch (opcode) {
- case 0xb: /* SQRSHL, UQRSHL */
- break;
case 0x6: /* CMGT, CMHI */
case 0x7: /* CMGE, CMHS */
case 0x11: /* CMTST, CMEQ */
@@ -9490,6 +9500,7 @@ static void disas_simd_scalar_three_reg_same(DisasContext *s, uint32_t insn)
case 0x8: /* SSHL, USHL */
case 0x9: /* SQSHL, UQSHL */
case 0xa: /* SRSHL, URSHL */
+ case 0xb: /* SQRSHL, UQRSHL */
unallocated_encoding(s);
return;
}
@@ -9516,16 +9527,6 @@ static void disas_simd_scalar_three_reg_same(DisasContext *s, uint32_t insn)
void (*genfn)(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_i64, MemOp) = NULL;
switch (opcode) {
- case 0xb: /* SQRSHL, UQRSHL */
- {
- static NeonGenTwoOpEnvFn * const fns[3][2] = {
- { gen_helper_neon_qrshl_s8, gen_helper_neon_qrshl_u8 },
- { gen_helper_neon_qrshl_s16, gen_helper_neon_qrshl_u16 },
- { gen_helper_neon_qrshl_s32, gen_helper_neon_qrshl_u32 },
- };
- genenvfn = fns[size][u];
- break;
- }
case 0x16: /* SQDMULH, SQRDMULH */
{
static NeonGenTwoOpEnvFn * const fns[2][2] = {
@@ -9540,6 +9541,7 @@ static void disas_simd_scalar_three_reg_same(DisasContext *s, uint32_t insn)
case 0x1: /* SQADD, UQADD */
case 0x5: /* SQSUB, UQSUB */
case 0x9: /* SQSHL, UQSHL */
+ case 0xb: /* SQRSHL, UQRSHL */
g_assert_not_reached();
}
@@ -10959,6 +10961,7 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
case 0x08: /* SSHL, USHL */
case 0x09: /* SQSHL, UQSHL */
case 0x0a: /* SRSHL, URSHL */
+ case 0x0b: /* SQRSHL, UQRSHL */
unallocated_encoding(s);
return;
}
@@ -10968,13 +10971,6 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
}
switch (opcode) {
- case 0x0b: /* SQRSHL, UQRSHL */
- if (u) {
- gen_gvec_fn3(s, is_q, rd, rn, rm, gen_neon_uqrshl, size);
- } else {
- gen_gvec_fn3(s, is_q, rd, rn, rm, gen_neon_sqrshl, size);
- }
- return;
case 0x0c: /* SMAX, UMAX */
if (u) {
gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_umax, size);
--
2.34.1
^ permalink raw reply related [flat|nested] 41+ messages in thread
* [PATCH v3 16/33] target/arm: Convert ADD, SUB (vector) to decodetree
2024-05-28 20:30 [PATCH v3 00/33] target/arm: Convert a64 advsimd to decodetree (part 1b) Richard Henderson
` (14 preceding siblings ...)
2024-05-28 20:30 ` [PATCH v3 15/33] target/arm: Convert SQRSHL, UQRSHL to decodetree Richard Henderson
@ 2024-05-28 20:30 ` Richard Henderson
2024-05-28 20:30 ` [PATCH v3 17/33] target/arm: Convert CMGT, CMHI, CMGE, CMHS, CMTST, CMEQ " Richard Henderson
` (17 subsequent siblings)
33 siblings, 0 replies; 41+ messages in thread
From: Richard Henderson @ 2024-05-28 20:30 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm, Peter Maydell
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/tcg/a64.decode | 6 ++++++
target/arm/tcg/translate-a64.c | 22 +++++++---------------
2 files changed, 13 insertions(+), 15 deletions(-)
diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index 96ce35ad40..44383b4fc7 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -765,6 +765,9 @@ UQSHL_s 0111 1110 ..1 ..... 01001 1 ..... ..... @rrr_e
SQRSHL_s 0101 1110 ..1 ..... 01011 1 ..... ..... @rrr_e
UQRSHL_s 0111 1110 ..1 ..... 01011 1 ..... ..... @rrr_e
+ADD_s 0101 1110 111 ..... 10000 1 ..... ..... @rrr_d
+SUB_s 0111 1110 111 ..... 10000 1 ..... ..... @rrr_d
+
### Advanced SIMD scalar pairwise
FADDP_s 0101 1110 0011 0000 1101 10 ..... ..... @rr_h
@@ -895,6 +898,9 @@ UQSHL_v 0.10 1110 ..1 ..... 01001 1 ..... ..... @qrrr_e
SQRSHL_v 0.00 1110 ..1 ..... 01011 1 ..... ..... @qrrr_e
UQRSHL_v 0.10 1110 ..1 ..... 01011 1 ..... ..... @qrrr_e
+ADD_v 0.00 1110 ..1 ..... 10000 1 ..... ..... @qrrr_e
+SUB_v 0.10 1110 ..1 ..... 10000 1 ..... ..... @qrrr_e
+
### Advanced SIMD scalar x indexed element
FMUL_si 0101 1111 00 .. .... 1001 . 0 ..... ..... @rrx_h
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 2424c6d314..77a64923e7 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -5118,6 +5118,8 @@ TRANS(SSHL_s, do_int3_scalar_d, a, gen_sshl_i64)
TRANS(USHL_s, do_int3_scalar_d, a, gen_ushl_i64)
TRANS(SRSHL_s, do_int3_scalar_d, a, gen_helper_neon_rshl_s64)
TRANS(URSHL_s, do_int3_scalar_d, a, gen_helper_neon_rshl_u64)
+TRANS(ADD_s, do_int3_scalar_d, a, tcg_gen_add_i64)
+TRANS(SUB_s, do_int3_scalar_d, a, tcg_gen_sub_i64)
typedef struct ENVScalar2 {
NeonGenTwoOpEnvFn *gen_bhs[3];
@@ -5432,6 +5434,8 @@ TRANS(UQSHL_v, do_gvec_fn3, a, gen_neon_uqshl)
TRANS(SQRSHL_v, do_gvec_fn3, a, gen_neon_sqrshl)
TRANS(UQRSHL_v, do_gvec_fn3, a, gen_neon_uqrshl)
+TRANS(ADD_v, do_gvec_fn3, a, tcg_gen_gvec_add)
+TRANS(SUB_v, do_gvec_fn3, a, tcg_gen_gvec_sub)
/*
* Advanced SIMD scalar/vector x indexed element
@@ -9444,13 +9448,6 @@ static void handle_3same_64(DisasContext *s, int opcode, bool u,
}
gen_cmtst_i64(tcg_rd, tcg_rn, tcg_rm);
break;
- case 0x10: /* ADD, SUB */
- if (u) {
- tcg_gen_sub_i64(tcg_rd, tcg_rn, tcg_rm);
- } else {
- tcg_gen_add_i64(tcg_rd, tcg_rn, tcg_rm);
- }
- break;
default:
case 0x1: /* SQADD / UQADD */
case 0x5: /* SQSUB / UQSUB */
@@ -9458,6 +9455,7 @@ static void handle_3same_64(DisasContext *s, int opcode, bool u,
case 0x9: /* SQSHL, UQSHL */
case 0xa: /* SRSHL, URSHL */
case 0xb: /* SQRSHL, UQRSHL */
+ case 0x10: /* ADD, SUB */
g_assert_not_reached();
}
}
@@ -9482,7 +9480,6 @@ static void disas_simd_scalar_three_reg_same(DisasContext *s, uint32_t insn)
case 0x6: /* CMGT, CMHI */
case 0x7: /* CMGE, CMHS */
case 0x11: /* CMTST, CMEQ */
- case 0x10: /* ADD, SUB (vector) */
if (size != 3) {
unallocated_encoding(s);
return;
@@ -9501,6 +9498,7 @@ static void disas_simd_scalar_three_reg_same(DisasContext *s, uint32_t insn)
case 0x9: /* SQSHL, UQSHL */
case 0xa: /* SRSHL, URSHL */
case 0xb: /* SQRSHL, UQRSHL */
+ case 0x10: /* ADD, SUB (vector) */
unallocated_encoding(s);
return;
}
@@ -10962,6 +10960,7 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
case 0x09: /* SQSHL, UQSHL */
case 0x0a: /* SRSHL, URSHL */
case 0x0b: /* SQRSHL, UQRSHL */
+ case 0x10: /* ADD, SUB */
unallocated_encoding(s);
return;
}
@@ -10999,13 +10998,6 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
gen_gvec_fn3(s, is_q, rd, rn, rm, gen_gvec_saba, size);
}
return;
- case 0x10: /* ADD, SUB */
- if (u) {
- gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_sub, size);
- } else {
- gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_add, size);
- }
- return;
case 0x13: /* MUL, PMUL */
if (!u) { /* MUL */
gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_mul, size);
--
2.34.1
^ permalink raw reply related [flat|nested] 41+ messages in thread
* [PATCH v3 17/33] target/arm: Convert CMGT, CMHI, CMGE, CMHS, CMTST, CMEQ to decodetree
2024-05-28 20:30 [PATCH v3 00/33] target/arm: Convert a64 advsimd to decodetree (part 1b) Richard Henderson
` (15 preceding siblings ...)
2024-05-28 20:30 ` [PATCH v3 16/33] target/arm: Convert ADD, SUB (vector) " Richard Henderson
@ 2024-05-28 20:30 ` Richard Henderson
2024-05-28 20:30 ` [PATCH v3 18/33] target/arm: Use TCG_COND_TSTNE in gen_cmtst_{i32, i64} Richard Henderson
` (16 subsequent siblings)
33 siblings, 0 replies; 41+ messages in thread
From: Richard Henderson @ 2024-05-28 20:30 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm, Peter Maydell
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/tcg/a64.decode | 12 +++
target/arm/tcg/translate-a64.c | 132 ++++++++++++---------------------
2 files changed, 60 insertions(+), 84 deletions(-)
diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index 44383b4fc7..3061e26242 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -767,6 +767,12 @@ UQRSHL_s 0111 1110 ..1 ..... 01011 1 ..... ..... @rrr_e
ADD_s 0101 1110 111 ..... 10000 1 ..... ..... @rrr_d
SUB_s 0111 1110 111 ..... 10000 1 ..... ..... @rrr_d
+CMGT_s 0101 1110 111 ..... 00110 1 ..... ..... @rrr_d
+CMHI_s 0111 1110 111 ..... 00110 1 ..... ..... @rrr_d
+CMGE_s 0101 1110 111 ..... 00111 1 ..... ..... @rrr_d
+CMHS_s 0111 1110 111 ..... 00111 1 ..... ..... @rrr_d
+CMTST_s 0101 1110 111 ..... 10001 1 ..... ..... @rrr_d
+CMEQ_s 0111 1110 111 ..... 10001 1 ..... ..... @rrr_d
### Advanced SIMD scalar pairwise
@@ -900,6 +906,12 @@ UQRSHL_v 0.10 1110 ..1 ..... 01011 1 ..... ..... @qrrr_e
ADD_v 0.00 1110 ..1 ..... 10000 1 ..... ..... @qrrr_e
SUB_v 0.10 1110 ..1 ..... 10000 1 ..... ..... @qrrr_e
+CMGT_v 0.00 1110 ..1 ..... 00110 1 ..... ..... @qrrr_e
+CMHI_v 0.10 1110 ..1 ..... 00110 1 ..... ..... @qrrr_e
+CMGE_v 0.00 1110 ..1 ..... 00111 1 ..... ..... @qrrr_e
+CMHS_v 0.10 1110 ..1 ..... 00111 1 ..... ..... @qrrr_e
+CMTST_v 0.00 1110 ..1 ..... 10001 1 ..... ..... @qrrr_e
+CMEQ_v 0.10 1110 ..1 ..... 10001 1 ..... ..... @qrrr_e
### Advanced SIMD scalar x indexed element
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 77a64923e7..3c6cfc2952 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -5180,6 +5180,24 @@ static const ENVScalar2 f_scalar_uqrshl = {
};
TRANS(UQRSHL_s, do_env_scalar2, a, &f_scalar_uqrshl)
+static bool do_cmop_d(DisasContext *s, arg_rrr_e *a, TCGCond cond)
+{
+ if (fp_access_check(s)) {
+ TCGv_i64 t0 = read_fp_dreg(s, a->rn);
+ TCGv_i64 t1 = read_fp_dreg(s, a->rm);
+ tcg_gen_negsetcond_i64(cond, t0, t0, t1);
+ write_fp_dreg(s, a->rd, t0);
+ }
+ return true;
+}
+
+TRANS(CMGT_s, do_cmop_d, a, TCG_COND_GT)
+TRANS(CMHI_s, do_cmop_d, a, TCG_COND_GTU)
+TRANS(CMGE_s, do_cmop_d, a, TCG_COND_GE)
+TRANS(CMHS_s, do_cmop_d, a, TCG_COND_GEU)
+TRANS(CMEQ_s, do_cmop_d, a, TCG_COND_EQ)
+TRANS(CMTST_s, do_cmop_d, a, TCG_COND_TSTNE)
+
static bool do_fp3_vector(DisasContext *s, arg_qrrr_e *a,
gen_helper_gvec_3_ptr * const fns[3])
{
@@ -5437,6 +5455,28 @@ TRANS(UQRSHL_v, do_gvec_fn3, a, gen_neon_uqrshl)
TRANS(ADD_v, do_gvec_fn3, a, tcg_gen_gvec_add)
TRANS(SUB_v, do_gvec_fn3, a, tcg_gen_gvec_sub)
+static bool do_cmop_v(DisasContext *s, arg_qrrr_e *a, TCGCond cond)
+{
+ if (a->esz == MO_64 && !a->q) {
+ return false;
+ }
+ if (fp_access_check(s)) {
+ tcg_gen_gvec_cmp(cond, a->esz,
+ vec_full_reg_offset(s, a->rd),
+ vec_full_reg_offset(s, a->rn),
+ vec_full_reg_offset(s, a->rm),
+ a->q ? 16 : 8, vec_full_reg_size(s));
+ }
+ return true;
+}
+
+TRANS(CMGT_v, do_cmop_v, a, TCG_COND_GT)
+TRANS(CMHI_v, do_cmop_v, a, TCG_COND_GTU)
+TRANS(CMGE_v, do_cmop_v, a, TCG_COND_GE)
+TRANS(CMHS_v, do_cmop_v, a, TCG_COND_GEU)
+TRANS(CMEQ_v, do_cmop_v, a, TCG_COND_EQ)
+TRANS(CMTST_v, do_gvec_fn3, a, gen_gvec_cmtst)
+
/*
* Advanced SIMD scalar/vector x indexed element
*/
@@ -9421,45 +9461,6 @@ static void disas_simd_scalar_three_reg_diff(DisasContext *s, uint32_t insn)
}
}
-static void handle_3same_64(DisasContext *s, int opcode, bool u,
- TCGv_i64 tcg_rd, TCGv_i64 tcg_rn, TCGv_i64 tcg_rm)
-{
- /* Handle 64x64->64 opcodes which are shared between the scalar
- * and vector 3-same groups. We cover every opcode where size == 3
- * is valid in either the three-reg-same (integer, not pairwise)
- * or scalar-three-reg-same groups.
- */
- TCGCond cond;
-
- switch (opcode) {
- case 0x6: /* CMGT, CMHI */
- cond = u ? TCG_COND_GTU : TCG_COND_GT;
- do_cmop:
- /* 64 bit integer comparison, result = test ? -1 : 0. */
- tcg_gen_negsetcond_i64(cond, tcg_rd, tcg_rn, tcg_rm);
- break;
- case 0x7: /* CMGE, CMHS */
- cond = u ? TCG_COND_GEU : TCG_COND_GE;
- goto do_cmop;
- case 0x11: /* CMTST, CMEQ */
- if (u) {
- cond = TCG_COND_EQ;
- goto do_cmop;
- }
- gen_cmtst_i64(tcg_rd, tcg_rn, tcg_rm);
- break;
- default:
- case 0x1: /* SQADD / UQADD */
- case 0x5: /* SQSUB / UQSUB */
- case 0x8: /* SSHL, USHL */
- case 0x9: /* SQSHL, UQSHL */
- case 0xa: /* SRSHL, URSHL */
- case 0xb: /* SQRSHL, UQRSHL */
- case 0x10: /* ADD, SUB */
- g_assert_not_reached();
- }
-}
-
/* AdvSIMD scalar three same
* 31 30 29 28 24 23 22 21 20 16 15 11 10 9 5 4 0
* +-----+---+-----------+------+---+------+--------+---+------+------+
@@ -9477,14 +9478,6 @@ static void disas_simd_scalar_three_reg_same(DisasContext *s, uint32_t insn)
TCGv_i64 tcg_rd;
switch (opcode) {
- case 0x6: /* CMGT, CMHI */
- case 0x7: /* CMGE, CMHS */
- case 0x11: /* CMTST, CMEQ */
- if (size != 3) {
- unallocated_encoding(s);
- return;
- }
- break;
case 0x16: /* SQDMULH, SQRDMULH (vector) */
if (size != 1 && size != 2) {
unallocated_encoding(s);
@@ -9494,11 +9487,14 @@ static void disas_simd_scalar_three_reg_same(DisasContext *s, uint32_t insn)
default:
case 0x1: /* SQADD, UQADD */
case 0x5: /* SQSUB, UQSUB */
+ case 0x6: /* CMGT, CMHI */
+ case 0x7: /* CMGE, CMHS */
case 0x8: /* SSHL, USHL */
case 0x9: /* SQSHL, UQSHL */
case 0xa: /* SRSHL, URSHL */
case 0xb: /* SQRSHL, UQRSHL */
case 0x10: /* ADD, SUB (vector) */
+ case 0x11: /* CMTST, CMEQ */
unallocated_encoding(s);
return;
}
@@ -9510,10 +9506,7 @@ static void disas_simd_scalar_three_reg_same(DisasContext *s, uint32_t insn)
tcg_rd = tcg_temp_new_i64();
if (size == 3) {
- TCGv_i64 tcg_rn = read_fp_dreg(s, rn);
- TCGv_i64 tcg_rm = read_fp_dreg(s, rm);
-
- handle_3same_64(s, opcode, u, tcg_rd, tcg_rn, tcg_rm);
+ g_assert_not_reached();
} else {
/* Do a single operation on the lowest element in the vector.
* We use the standard Neon helpers and rely on 0 OP 0 == 0 with
@@ -10919,7 +10912,6 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
int rn = extract32(insn, 5, 5);
int rd = extract32(insn, 0, 5);
int pass;
- TCGCond cond;
switch (opcode) {
case 0x13: /* MUL, PMUL */
@@ -10956,11 +10948,14 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
case 0x01: /* SQADD, UQADD */
case 0x05: /* SQSUB, UQSUB */
+ case 0x06: /* CMGT, CMHI */
+ case 0x07: /* CMGE, CMHS */
case 0x08: /* SSHL, USHL */
case 0x09: /* SQSHL, UQSHL */
case 0x0a: /* SRSHL, URSHL */
case 0x0b: /* SQRSHL, UQRSHL */
case 0x10: /* ADD, SUB */
+ case 0x11: /* CMTST, CMEQ */
unallocated_encoding(s);
return;
}
@@ -11021,41 +11016,10 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
gen_gvec_op3_qc(s, is_q, rd, rn, rm, fns[size - 1][u]);
}
return;
- case 0x11:
- if (!u) { /* CMTST */
- gen_gvec_fn3(s, is_q, rd, rn, rm, gen_gvec_cmtst, size);
- return;
- }
- /* else CMEQ */
- cond = TCG_COND_EQ;
- goto do_gvec_cmp;
- case 0x06: /* CMGT, CMHI */
- cond = u ? TCG_COND_GTU : TCG_COND_GT;
- goto do_gvec_cmp;
- case 0x07: /* CMGE, CMHS */
- cond = u ? TCG_COND_GEU : TCG_COND_GE;
- do_gvec_cmp:
- tcg_gen_gvec_cmp(cond, size, vec_full_reg_offset(s, rd),
- vec_full_reg_offset(s, rn),
- vec_full_reg_offset(s, rm),
- is_q ? 16 : 8, vec_full_reg_size(s));
- return;
}
if (size == 3) {
- assert(is_q);
- for (pass = 0; pass < 2; pass++) {
- TCGv_i64 tcg_op1 = tcg_temp_new_i64();
- TCGv_i64 tcg_op2 = tcg_temp_new_i64();
- TCGv_i64 tcg_res = tcg_temp_new_i64();
-
- read_vec_element(s, tcg_op1, rn, pass, MO_64);
- read_vec_element(s, tcg_op2, rm, pass, MO_64);
-
- handle_3same_64(s, opcode, u, tcg_res, tcg_op1, tcg_op2);
-
- write_vec_element(s, tcg_res, rd, pass, MO_64);
- }
+ g_assert_not_reached();
} else {
for (pass = 0; pass < (is_q ? 4 : 2); pass++) {
TCGv_i32 tcg_op1 = tcg_temp_new_i32();
--
2.34.1
^ permalink raw reply related [flat|nested] 41+ messages in thread
* [PATCH v3 18/33] target/arm: Use TCG_COND_TSTNE in gen_cmtst_{i32, i64}
2024-05-28 20:30 [PATCH v3 00/33] target/arm: Convert a64 advsimd to decodetree (part 1b) Richard Henderson
` (16 preceding siblings ...)
2024-05-28 20:30 ` [PATCH v3 17/33] target/arm: Convert CMGT, CMHI, CMGE, CMHS, CMTST, CMEQ " Richard Henderson
@ 2024-05-28 20:30 ` Richard Henderson
2024-05-28 20:30 ` [PATCH v3 19/33] target/arm: Use TCG_COND_TSTNE in gen_cmtst_vec Richard Henderson
` (15 subsequent siblings)
33 siblings, 0 replies; 41+ messages in thread
From: Richard Henderson @ 2024-05-28 20:30 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm, Peter Maydell
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/tcg/gengvec.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/target/arm/tcg/gengvec.c b/target/arm/tcg/gengvec.c
index 6dc96269d5..e64ca02e0c 100644
--- a/target/arm/tcg/gengvec.c
+++ b/target/arm/tcg/gengvec.c
@@ -934,14 +934,12 @@ void gen_gvec_mls(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
/* CMTST : test is "if (X & Y != 0)". */
static void gen_cmtst_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b)
{
- tcg_gen_and_i32(d, a, b);
- tcg_gen_negsetcond_i32(TCG_COND_NE, d, d, tcg_constant_i32(0));
+ tcg_gen_negsetcond_i32(TCG_COND_TSTNE, d, a, b);
}
void gen_cmtst_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
{
- tcg_gen_and_i64(d, a, b);
- tcg_gen_negsetcond_i64(TCG_COND_NE, d, d, tcg_constant_i64(0));
+ tcg_gen_negsetcond_i64(TCG_COND_TSTNE, d, a, b);
}
static void gen_cmtst_vec(unsigned vece, TCGv_vec d, TCGv_vec a, TCGv_vec b)
--
2.34.1
^ permalink raw reply related [flat|nested] 41+ messages in thread
* [PATCH v3 19/33] target/arm: Use TCG_COND_TSTNE in gen_cmtst_vec
2024-05-28 20:30 [PATCH v3 00/33] target/arm: Convert a64 advsimd to decodetree (part 1b) Richard Henderson
` (17 preceding siblings ...)
2024-05-28 20:30 ` [PATCH v3 18/33] target/arm: Use TCG_COND_TSTNE in gen_cmtst_{i32, i64} Richard Henderson
@ 2024-05-28 20:30 ` Richard Henderson
2024-05-28 20:30 ` [PATCH v3 20/33] target/arm: Convert SHADD, UHADD to gvec Richard Henderson
` (14 subsequent siblings)
33 siblings, 0 replies; 41+ messages in thread
From: Richard Henderson @ 2024-05-28 20:30 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm, Peter Maydell
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/tcg/gengvec.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/target/arm/tcg/gengvec.c b/target/arm/tcg/gengvec.c
index e64ca02e0c..2451d23823 100644
--- a/target/arm/tcg/gengvec.c
+++ b/target/arm/tcg/gengvec.c
@@ -944,9 +944,7 @@ void gen_cmtst_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
static void gen_cmtst_vec(unsigned vece, TCGv_vec d, TCGv_vec a, TCGv_vec b)
{
- tcg_gen_and_vec(vece, d, a, b);
- tcg_gen_dupi_vec(vece, a, 0);
- tcg_gen_cmp_vec(TCG_COND_NE, vece, d, d, a);
+ tcg_gen_cmp_vec(TCG_COND_TSTNE, vece, d, a, b);
}
void gen_gvec_cmtst(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
--
2.34.1
^ permalink raw reply related [flat|nested] 41+ messages in thread
* [PATCH v3 20/33] target/arm: Convert SHADD, UHADD to gvec
2024-05-28 20:30 [PATCH v3 00/33] target/arm: Convert a64 advsimd to decodetree (part 1b) Richard Henderson
` (18 preceding siblings ...)
2024-05-28 20:30 ` [PATCH v3 19/33] target/arm: Use TCG_COND_TSTNE in gen_cmtst_vec Richard Henderson
@ 2024-05-28 20:30 ` Richard Henderson
2024-05-28 20:30 ` [PATCH v3 21/33] target/arm: Convert SHADD, UHADD to decodetree Richard Henderson
` (13 subsequent siblings)
33 siblings, 0 replies; 41+ messages in thread
From: Richard Henderson @ 2024-05-28 20:30 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm, Peter Maydell
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/helper.h | 6 --
target/arm/tcg/translate.h | 5 ++
target/arm/tcg/gengvec.c | 144 ++++++++++++++++++++++++++++++++
target/arm/tcg/neon_helper.c | 27 ------
target/arm/tcg/translate-a64.c | 17 ++--
target/arm/tcg/translate-neon.c | 4 +-
6 files changed, 158 insertions(+), 45 deletions(-)
diff --git a/target/arm/helper.h b/target/arm/helper.h
index 9a89c9cea7..b26bfcb079 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -268,12 +268,6 @@ DEF_HELPER_FLAGS_2(fjcvtzs, TCG_CALL_NO_RWG, i64, f64, ptr)
DEF_HELPER_FLAGS_3(check_hcr_el2_trap, TCG_CALL_NO_WG, void, env, i32, i32)
/* neon_helper.c */
-DEF_HELPER_2(neon_hadd_s8, i32, i32, i32)
-DEF_HELPER_2(neon_hadd_u8, i32, i32, i32)
-DEF_HELPER_2(neon_hadd_s16, i32, i32, i32)
-DEF_HELPER_2(neon_hadd_u16, i32, i32, i32)
-DEF_HELPER_2(neon_hadd_s32, s32, s32, s32)
-DEF_HELPER_2(neon_hadd_u32, i32, i32, i32)
DEF_HELPER_2(neon_rhadd_s8, i32, i32, i32)
DEF_HELPER_2(neon_rhadd_u8, i32, i32, i32)
DEF_HELPER_2(neon_rhadd_s16, i32, i32, i32)
diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h
index 048cb45ebe..dd99d76bf2 100644
--- a/target/arm/tcg/translate.h
+++ b/target/arm/tcg/translate.h
@@ -472,6 +472,11 @@ void gen_neon_sqrshl(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
void gen_neon_uqrshl(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz);
+void gen_gvec_shadd(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+ uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz);
+void gen_gvec_uhadd(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+ uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz);
+
void gen_cmtst_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b);
void gen_ushl_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b);
void gen_sshl_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b);
diff --git a/target/arm/tcg/gengvec.c b/target/arm/tcg/gengvec.c
index 2451d23823..c0627a787b 100644
--- a/target/arm/tcg/gengvec.c
+++ b/target/arm/tcg/gengvec.c
@@ -1861,3 +1861,147 @@ void gen_gvec_uminp(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
tcg_debug_assert(vece <= MO_32);
tcg_gen_gvec_3_ool(rd_ofs, rn_ofs, rm_ofs, opr_sz, max_sz, 0, fns[vece]);
}
+
+static void gen_shadd8_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
+{
+ TCGv_i64 t = tcg_temp_new_i64();
+
+ tcg_gen_and_i64(t, a, b);
+ tcg_gen_vec_sar8i_i64(a, a, 1);
+ tcg_gen_vec_sar8i_i64(b, b, 1);
+ tcg_gen_andi_i64(t, t, dup_const(MO_8, 1));
+ tcg_gen_vec_add8_i64(d, a, b);
+ tcg_gen_vec_add8_i64(d, d, t);
+}
+
+static void gen_shadd16_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
+{
+ TCGv_i64 t = tcg_temp_new_i64();
+
+ tcg_gen_and_i64(t, a, b);
+ tcg_gen_vec_sar16i_i64(a, a, 1);
+ tcg_gen_vec_sar16i_i64(b, b, 1);
+ tcg_gen_andi_i64(t, t, dup_const(MO_16, 1));
+ tcg_gen_vec_add16_i64(d, a, b);
+ tcg_gen_vec_add16_i64(d, d, t);
+}
+
+static void gen_shadd_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b)
+{
+ TCGv_i32 t = tcg_temp_new_i32();
+
+ tcg_gen_and_i32(t, a, b);
+ tcg_gen_sari_i32(a, a, 1);
+ tcg_gen_sari_i32(b, b, 1);
+ tcg_gen_andi_i32(t, t, 1);
+ tcg_gen_add_i32(d, a, b);
+ tcg_gen_add_i32(d, d, t);
+}
+
+static void gen_shadd_vec(unsigned vece, TCGv_vec d, TCGv_vec a, TCGv_vec b)
+{
+ TCGv_vec t = tcg_temp_new_vec_matching(d);
+
+ tcg_gen_and_vec(vece, t, a, b);
+ tcg_gen_sari_vec(vece, a, a, 1);
+ tcg_gen_sari_vec(vece, b, b, 1);
+ tcg_gen_and_vec(vece, t, t, tcg_constant_vec_matching(d, vece, 1));
+ tcg_gen_add_vec(vece, d, a, b);
+ tcg_gen_add_vec(vece, d, d, t);
+}
+
+void gen_gvec_shadd(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+ uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz)
+{
+ static const TCGOpcode vecop_list[] = {
+ INDEX_op_sari_vec, INDEX_op_add_vec, 0
+ };
+ static const GVecGen3 g[] = {
+ { .fni8 = gen_shadd8_i64,
+ .fniv = gen_shadd_vec,
+ .opt_opc = vecop_list,
+ .vece = MO_8 },
+ { .fni8 = gen_shadd16_i64,
+ .fniv = gen_shadd_vec,
+ .opt_opc = vecop_list,
+ .vece = MO_16 },
+ { .fni4 = gen_shadd_i32,
+ .fniv = gen_shadd_vec,
+ .opt_opc = vecop_list,
+ .vece = MO_32 },
+ };
+ tcg_debug_assert(vece <= MO_32);
+ tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, opr_sz, max_sz, &g[vece]);
+}
+
+static void gen_uhadd8_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
+{
+ TCGv_i64 t = tcg_temp_new_i64();
+
+ tcg_gen_and_i64(t, a, b);
+ tcg_gen_vec_shr8i_i64(a, a, 1);
+ tcg_gen_vec_shr8i_i64(b, b, 1);
+ tcg_gen_andi_i64(t, t, dup_const(MO_8, 1));
+ tcg_gen_vec_add8_i64(d, a, b);
+ tcg_gen_vec_add8_i64(d, d, t);
+}
+
+static void gen_uhadd16_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
+{
+ TCGv_i64 t = tcg_temp_new_i64();
+
+ tcg_gen_and_i64(t, a, b);
+ tcg_gen_vec_shr16i_i64(a, a, 1);
+ tcg_gen_vec_shr16i_i64(b, b, 1);
+ tcg_gen_andi_i64(t, t, dup_const(MO_16, 1));
+ tcg_gen_vec_add16_i64(d, a, b);
+ tcg_gen_vec_add16_i64(d, d, t);
+}
+
+static void gen_uhadd_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b)
+{
+ TCGv_i32 t = tcg_temp_new_i32();
+
+ tcg_gen_and_i32(t, a, b);
+ tcg_gen_shri_i32(a, a, 1);
+ tcg_gen_shri_i32(b, b, 1);
+ tcg_gen_andi_i32(t, t, 1);
+ tcg_gen_add_i32(d, a, b);
+ tcg_gen_add_i32(d, d, t);
+}
+
+static void gen_uhadd_vec(unsigned vece, TCGv_vec d, TCGv_vec a, TCGv_vec b)
+{
+ TCGv_vec t = tcg_temp_new_vec_matching(d);
+
+ tcg_gen_and_vec(vece, t, a, b);
+ tcg_gen_shri_vec(vece, a, a, 1);
+ tcg_gen_shri_vec(vece, b, b, 1);
+ tcg_gen_and_vec(vece, t, t, tcg_constant_vec_matching(d, vece, 1));
+ tcg_gen_add_vec(vece, d, a, b);
+ tcg_gen_add_vec(vece, d, d, t);
+}
+
+void gen_gvec_uhadd(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+ uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz)
+{
+ static const TCGOpcode vecop_list[] = {
+ INDEX_op_shri_vec, INDEX_op_add_vec, 0
+ };
+ static const GVecGen3 g[] = {
+ { .fni8 = gen_uhadd8_i64,
+ .fniv = gen_uhadd_vec,
+ .opt_opc = vecop_list,
+ .vece = MO_8 },
+ { .fni8 = gen_uhadd16_i64,
+ .fniv = gen_uhadd_vec,
+ .opt_opc = vecop_list,
+ .vece = MO_16 },
+ { .fni4 = gen_uhadd_i32,
+ .fniv = gen_uhadd_vec,
+ .opt_opc = vecop_list,
+ .vece = MO_32 },
+ };
+ tcg_debug_assert(vece <= MO_32);
+ tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, opr_sz, max_sz, &g[vece]);
+}
diff --git a/target/arm/tcg/neon_helper.c b/target/arm/tcg/neon_helper.c
index b29a7c725f..defd28a6f7 100644
--- a/target/arm/tcg/neon_helper.c
+++ b/target/arm/tcg/neon_helper.c
@@ -179,33 +179,6 @@ uint32_t HELPER(glue(neon_,name))(uint32_t arg) \
return arg; \
}
-#define NEON_FN(dest, src1, src2) dest = (src1 + src2) >> 1
-NEON_VOP(hadd_s8, neon_s8, 4)
-NEON_VOP(hadd_u8, neon_u8, 4)
-NEON_VOP(hadd_s16, neon_s16, 2)
-NEON_VOP(hadd_u16, neon_u16, 2)
-#undef NEON_FN
-
-int32_t HELPER(neon_hadd_s32)(int32_t src1, int32_t src2)
-{
- int32_t dest;
-
- dest = (src1 >> 1) + (src2 >> 1);
- if (src1 & src2 & 1)
- dest++;
- return dest;
-}
-
-uint32_t HELPER(neon_hadd_u32)(uint32_t src1, uint32_t src2)
-{
- uint32_t dest;
-
- dest = (src1 >> 1) + (src2 >> 1);
- if (src1 & src2 & 1)
- dest++;
- return dest;
-}
-
#define NEON_FN(dest, src1, src2) dest = (src1 + src2 + 1) >> 1
NEON_VOP(rhadd_s8, neon_s8, 4)
NEON_VOP(rhadd_u8, neon_u8, 4)
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 3c6cfc2952..5f3423513d 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -10965,6 +10965,13 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
}
switch (opcode) {
+ case 0x00: /* SHADD, UHADD */
+ if (u) {
+ gen_gvec_fn3(s, is_q, rd, rn, rm, gen_gvec_uhadd, size);
+ } else {
+ gen_gvec_fn3(s, is_q, rd, rn, rm, gen_gvec_shadd, size);
+ }
+ return;
case 0x0c: /* SMAX, UMAX */
if (u) {
gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_umax, size);
@@ -11032,16 +11039,6 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
read_vec_element_i32(s, tcg_op2, rm, pass, MO_32);
switch (opcode) {
- case 0x0: /* SHADD, UHADD */
- {
- static NeonGenTwoOpFn * const fns[3][2] = {
- { gen_helper_neon_hadd_s8, gen_helper_neon_hadd_u8 },
- { gen_helper_neon_hadd_s16, gen_helper_neon_hadd_u16 },
- { gen_helper_neon_hadd_s32, gen_helper_neon_hadd_u32 },
- };
- genfn = fns[size][u];
- break;
- }
case 0x2: /* SRHADD, URHADD */
{
static NeonGenTwoOpFn * const fns[3][2] = {
diff --git a/target/arm/tcg/translate-neon.c b/target/arm/tcg/translate-neon.c
index 5f1576393e..29e5c4a0a3 100644
--- a/target/arm/tcg/translate-neon.c
+++ b/target/arm/tcg/translate-neon.c
@@ -841,6 +841,8 @@ DO_3SAME_NO_SZ_3(VPMAX_S, gen_gvec_smaxp)
DO_3SAME_NO_SZ_3(VPMIN_S, gen_gvec_sminp)
DO_3SAME_NO_SZ_3(VPMAX_U, gen_gvec_umaxp)
DO_3SAME_NO_SZ_3(VPMIN_U, gen_gvec_uminp)
+DO_3SAME_NO_SZ_3(VHADD_S, gen_gvec_shadd)
+DO_3SAME_NO_SZ_3(VHADD_U, gen_gvec_uhadd)
#define DO_3SAME_CMP(INSN, COND) \
static void gen_##INSN##_3s(unsigned vece, uint32_t rd_ofs, \
@@ -951,8 +953,6 @@ DO_SHA2(SHA256SU1, gen_helper_crypto_sha256su1)
FUNC(d, tcg_env, n, m); \
}
-DO_3SAME_32(VHADD_S, hadd_s)
-DO_3SAME_32(VHADD_U, hadd_u)
DO_3SAME_32(VHSUB_S, hsub_s)
DO_3SAME_32(VHSUB_U, hsub_u)
DO_3SAME_32(VRHADD_S, rhadd_s)
--
2.34.1
^ permalink raw reply related [flat|nested] 41+ messages in thread
* [PATCH v3 21/33] target/arm: Convert SHADD, UHADD to decodetree
2024-05-28 20:30 [PATCH v3 00/33] target/arm: Convert a64 advsimd to decodetree (part 1b) Richard Henderson
` (19 preceding siblings ...)
2024-05-28 20:30 ` [PATCH v3 20/33] target/arm: Convert SHADD, UHADD to gvec Richard Henderson
@ 2024-05-28 20:30 ` Richard Henderson
2024-05-28 20:30 ` [PATCH v3 22/33] target/arm: Convert SHSUB, UHSUB to gvec Richard Henderson
` (12 subsequent siblings)
33 siblings, 0 replies; 41+ messages in thread
From: Richard Henderson @ 2024-05-28 20:30 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm, Peter Maydell
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/tcg/a64.decode | 2 ++
target/arm/tcg/translate-a64.c | 11 +++--------
2 files changed, 5 insertions(+), 8 deletions(-)
diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index 3061e26242..e33d91fd0a 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -912,6 +912,8 @@ CMGE_v 0.00 1110 ..1 ..... 00111 1 ..... ..... @qrrr_e
CMHS_v 0.10 1110 ..1 ..... 00111 1 ..... ..... @qrrr_e
CMTST_v 0.00 1110 ..1 ..... 10001 1 ..... ..... @qrrr_e
CMEQ_v 0.10 1110 ..1 ..... 10001 1 ..... ..... @qrrr_e
+SHADD_v 0.00 1110 ..1 ..... 00000 1 ..... ..... @qrrr_e
+UHADD_v 0.10 1110 ..1 ..... 00000 1 ..... ..... @qrrr_e
### Advanced SIMD scalar x indexed element
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 5f3423513d..00c04425c1 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -5454,6 +5454,8 @@ TRANS(UQRSHL_v, do_gvec_fn3, a, gen_neon_uqrshl)
TRANS(ADD_v, do_gvec_fn3, a, tcg_gen_gvec_add)
TRANS(SUB_v, do_gvec_fn3, a, tcg_gen_gvec_sub)
+TRANS(SHADD_v, do_gvec_fn3_no64, a, gen_gvec_shadd)
+TRANS(UHADD_v, do_gvec_fn3_no64, a, gen_gvec_uhadd)
static bool do_cmop_v(DisasContext *s, arg_qrrr_e *a, TCGCond cond)
{
@@ -10920,7 +10922,6 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
return;
}
/* fall through */
- case 0x0: /* SHADD, UHADD */
case 0x2: /* SRHADD, URHADD */
case 0x4: /* SHSUB, UHSUB */
case 0xc: /* SMAX, UMAX */
@@ -10946,6 +10947,7 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
}
break;
+ case 0x0: /* SHADD, UHADD */
case 0x01: /* SQADD, UQADD */
case 0x05: /* SQSUB, UQSUB */
case 0x06: /* CMGT, CMHI */
@@ -10965,13 +10967,6 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
}
switch (opcode) {
- case 0x00: /* SHADD, UHADD */
- if (u) {
- gen_gvec_fn3(s, is_q, rd, rn, rm, gen_gvec_uhadd, size);
- } else {
- gen_gvec_fn3(s, is_q, rd, rn, rm, gen_gvec_shadd, size);
- }
- return;
case 0x0c: /* SMAX, UMAX */
if (u) {
gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_umax, size);
--
2.34.1
^ permalink raw reply related [flat|nested] 41+ messages in thread
* [PATCH v3 22/33] target/arm: Convert SHSUB, UHSUB to gvec
2024-05-28 20:30 [PATCH v3 00/33] target/arm: Convert a64 advsimd to decodetree (part 1b) Richard Henderson
` (20 preceding siblings ...)
2024-05-28 20:30 ` [PATCH v3 21/33] target/arm: Convert SHADD, UHADD to decodetree Richard Henderson
@ 2024-05-28 20:30 ` Richard Henderson
2024-05-28 20:30 ` [PATCH v3 23/33] target/arm: Convert SHSUB, UHSUB to decodetree Richard Henderson
` (11 subsequent siblings)
33 siblings, 0 replies; 41+ messages in thread
From: Richard Henderson @ 2024-05-28 20:30 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm, Peter Maydell
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/helper.h | 6 --
target/arm/tcg/translate.h | 4 +
target/arm/tcg/gengvec.c | 144 ++++++++++++++++++++++++++++++++
target/arm/tcg/neon_helper.c | 27 ------
target/arm/tcg/translate-a64.c | 17 ++--
target/arm/tcg/translate-neon.c | 4 +-
6 files changed, 157 insertions(+), 45 deletions(-)
diff --git a/target/arm/helper.h b/target/arm/helper.h
index b26bfcb079..b95f24ed0a 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -274,12 +274,6 @@ DEF_HELPER_2(neon_rhadd_s16, i32, i32, i32)
DEF_HELPER_2(neon_rhadd_u16, i32, i32, i32)
DEF_HELPER_2(neon_rhadd_s32, s32, s32, s32)
DEF_HELPER_2(neon_rhadd_u32, i32, i32, i32)
-DEF_HELPER_2(neon_hsub_s8, i32, i32, i32)
-DEF_HELPER_2(neon_hsub_u8, i32, i32, i32)
-DEF_HELPER_2(neon_hsub_s16, i32, i32, i32)
-DEF_HELPER_2(neon_hsub_u16, i32, i32, i32)
-DEF_HELPER_2(neon_hsub_s32, s32, s32, s32)
-DEF_HELPER_2(neon_hsub_u32, i32, i32, i32)
DEF_HELPER_2(neon_pmin_u8, i32, i32, i32)
DEF_HELPER_2(neon_pmin_s8, i32, i32, i32)
diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h
index dd99d76bf2..315e0afd04 100644
--- a/target/arm/tcg/translate.h
+++ b/target/arm/tcg/translate.h
@@ -476,6 +476,10 @@ void gen_gvec_shadd(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz);
void gen_gvec_uhadd(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz);
+void gen_gvec_shsub(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+ uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz);
+void gen_gvec_uhsub(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+ uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz);
void gen_cmtst_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b);
void gen_ushl_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b);
diff --git a/target/arm/tcg/gengvec.c b/target/arm/tcg/gengvec.c
index c0627a787b..c46365c3a6 100644
--- a/target/arm/tcg/gengvec.c
+++ b/target/arm/tcg/gengvec.c
@@ -2005,3 +2005,147 @@ void gen_gvec_uhadd(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
tcg_debug_assert(vece <= MO_32);
tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, opr_sz, max_sz, &g[vece]);
}
+
+static void gen_shsub8_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
+{
+ TCGv_i64 t = tcg_temp_new_i64();
+
+ tcg_gen_andc_i64(t, b, a);
+ tcg_gen_vec_sar8i_i64(a, a, 1);
+ tcg_gen_vec_sar8i_i64(b, b, 1);
+ tcg_gen_andi_i64(t, t, dup_const(MO_8, 1));
+ tcg_gen_vec_sub8_i64(d, a, b);
+ tcg_gen_vec_sub8_i64(d, d, t);
+}
+
+static void gen_shsub16_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
+{
+ TCGv_i64 t = tcg_temp_new_i64();
+
+ tcg_gen_andc_i64(t, b, a);
+ tcg_gen_vec_sar16i_i64(a, a, 1);
+ tcg_gen_vec_sar16i_i64(b, b, 1);
+ tcg_gen_andi_i64(t, t, dup_const(MO_16, 1));
+ tcg_gen_vec_sub16_i64(d, a, b);
+ tcg_gen_vec_sub16_i64(d, d, t);
+}
+
+static void gen_shsub_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b)
+{
+ TCGv_i32 t = tcg_temp_new_i32();
+
+ tcg_gen_andc_i32(t, b, a);
+ tcg_gen_sari_i32(a, a, 1);
+ tcg_gen_sari_i32(b, b, 1);
+ tcg_gen_andi_i32(t, t, 1);
+ tcg_gen_sub_i32(d, a, b);
+ tcg_gen_sub_i32(d, d, t);
+}
+
+static void gen_shsub_vec(unsigned vece, TCGv_vec d, TCGv_vec a, TCGv_vec b)
+{
+ TCGv_vec t = tcg_temp_new_vec_matching(d);
+
+ tcg_gen_andc_vec(vece, t, b, a);
+ tcg_gen_sari_vec(vece, a, a, 1);
+ tcg_gen_sari_vec(vece, b, b, 1);
+ tcg_gen_and_vec(vece, t, t, tcg_constant_vec_matching(d, vece, 1));
+ tcg_gen_sub_vec(vece, d, a, b);
+ tcg_gen_sub_vec(vece, d, d, t);
+}
+
+void gen_gvec_shsub(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+ uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz)
+{
+ static const TCGOpcode vecop_list[] = {
+ INDEX_op_sari_vec, INDEX_op_sub_vec, 0
+ };
+ static const GVecGen3 g[4] = {
+ { .fni8 = gen_shsub8_i64,
+ .fniv = gen_shsub_vec,
+ .opt_opc = vecop_list,
+ .vece = MO_8 },
+ { .fni8 = gen_shsub16_i64,
+ .fniv = gen_shsub_vec,
+ .opt_opc = vecop_list,
+ .vece = MO_16 },
+ { .fni4 = gen_shsub_i32,
+ .fniv = gen_shsub_vec,
+ .opt_opc = vecop_list,
+ .vece = MO_32 },
+ };
+ assert(vece <= MO_32);
+ tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, opr_sz, max_sz, &g[vece]);
+}
+
+static void gen_uhsub8_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
+{
+ TCGv_i64 t = tcg_temp_new_i64();
+
+ tcg_gen_andc_i64(t, b, a);
+ tcg_gen_vec_shr8i_i64(a, a, 1);
+ tcg_gen_vec_shr8i_i64(b, b, 1);
+ tcg_gen_andi_i64(t, t, dup_const(MO_8, 1));
+ tcg_gen_vec_sub8_i64(d, a, b);
+ tcg_gen_vec_sub8_i64(d, d, t);
+}
+
+static void gen_uhsub16_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
+{
+ TCGv_i64 t = tcg_temp_new_i64();
+
+ tcg_gen_andc_i64(t, b, a);
+ tcg_gen_vec_shr16i_i64(a, a, 1);
+ tcg_gen_vec_shr16i_i64(b, b, 1);
+ tcg_gen_andi_i64(t, t, dup_const(MO_16, 1));
+ tcg_gen_vec_sub16_i64(d, a, b);
+ tcg_gen_vec_sub16_i64(d, d, t);
+}
+
+static void gen_uhsub_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b)
+{
+ TCGv_i32 t = tcg_temp_new_i32();
+
+ tcg_gen_andc_i32(t, b, a);
+ tcg_gen_shri_i32(a, a, 1);
+ tcg_gen_shri_i32(b, b, 1);
+ tcg_gen_andi_i32(t, t, 1);
+ tcg_gen_sub_i32(d, a, b);
+ tcg_gen_sub_i32(d, d, t);
+}
+
+static void gen_uhsub_vec(unsigned vece, TCGv_vec d, TCGv_vec a, TCGv_vec b)
+{
+ TCGv_vec t = tcg_temp_new_vec_matching(d);
+
+ tcg_gen_andc_vec(vece, t, b, a);
+ tcg_gen_shri_vec(vece, a, a, 1);
+ tcg_gen_shri_vec(vece, b, b, 1);
+ tcg_gen_and_vec(vece, t, t, tcg_constant_vec_matching(d, vece, 1));
+ tcg_gen_sub_vec(vece, d, a, b);
+ tcg_gen_sub_vec(vece, d, d, t);
+}
+
+void gen_gvec_uhsub(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+ uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz)
+{
+ static const TCGOpcode vecop_list[] = {
+ INDEX_op_shri_vec, INDEX_op_sub_vec, 0
+ };
+ static const GVecGen3 g[4] = {
+ { .fni8 = gen_uhsub8_i64,
+ .fniv = gen_uhsub_vec,
+ .opt_opc = vecop_list,
+ .vece = MO_8 },
+ { .fni8 = gen_uhsub16_i64,
+ .fniv = gen_uhsub_vec,
+ .opt_opc = vecop_list,
+ .vece = MO_16 },
+ { .fni4 = gen_uhsub_i32,
+ .fniv = gen_uhsub_vec,
+ .opt_opc = vecop_list,
+ .vece = MO_32 },
+ };
+ assert(vece <= MO_32);
+ tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, opr_sz, max_sz, &g[vece]);
+}
diff --git a/target/arm/tcg/neon_helper.c b/target/arm/tcg/neon_helper.c
index defd28a6f7..d1641a5252 100644
--- a/target/arm/tcg/neon_helper.c
+++ b/target/arm/tcg/neon_helper.c
@@ -206,33 +206,6 @@ uint32_t HELPER(neon_rhadd_u32)(uint32_t src1, uint32_t src2)
return dest;
}
-#define NEON_FN(dest, src1, src2) dest = (src1 - src2) >> 1
-NEON_VOP(hsub_s8, neon_s8, 4)
-NEON_VOP(hsub_u8, neon_u8, 4)
-NEON_VOP(hsub_s16, neon_s16, 2)
-NEON_VOP(hsub_u16, neon_u16, 2)
-#undef NEON_FN
-
-int32_t HELPER(neon_hsub_s32)(int32_t src1, int32_t src2)
-{
- int32_t dest;
-
- dest = (src1 >> 1) - (src2 >> 1);
- if ((~src1) & src2 & 1)
- dest--;
- return dest;
-}
-
-uint32_t HELPER(neon_hsub_u32)(uint32_t src1, uint32_t src2)
-{
- uint32_t dest;
-
- dest = (src1 >> 1) - (src2 >> 1);
- if ((~src1) & src2 & 1)
- dest--;
- return dest;
-}
-
#define NEON_FN(dest, src1, src2) dest = (src1 < src2) ? src1 : src2
NEON_POP(pmin_s8, neon_s8, 4)
NEON_POP(pmin_u8, neon_u8, 4)
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 00c04425c1..63f7a59f94 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -10967,6 +10967,13 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
}
switch (opcode) {
+ case 0x04: /* SHSUB, UHSUB */
+ if (u) {
+ gen_gvec_fn3(s, is_q, rd, rn, rm, gen_gvec_uhsub, size);
+ } else {
+ gen_gvec_fn3(s, is_q, rd, rn, rm, gen_gvec_shsub, size);
+ }
+ return;
case 0x0c: /* SMAX, UMAX */
if (u) {
gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_umax, size);
@@ -11044,16 +11051,6 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
genfn = fns[size][u];
break;
}
- case 0x4: /* SHSUB, UHSUB */
- {
- static NeonGenTwoOpFn * const fns[3][2] = {
- { gen_helper_neon_hsub_s8, gen_helper_neon_hsub_u8 },
- { gen_helper_neon_hsub_s16, gen_helper_neon_hsub_u16 },
- { gen_helper_neon_hsub_s32, gen_helper_neon_hsub_u32 },
- };
- genfn = fns[size][u];
- break;
- }
default:
g_assert_not_reached();
}
diff --git a/target/arm/tcg/translate-neon.c b/target/arm/tcg/translate-neon.c
index 29e5c4a0a3..d59d5804c5 100644
--- a/target/arm/tcg/translate-neon.c
+++ b/target/arm/tcg/translate-neon.c
@@ -843,6 +843,8 @@ DO_3SAME_NO_SZ_3(VPMAX_U, gen_gvec_umaxp)
DO_3SAME_NO_SZ_3(VPMIN_U, gen_gvec_uminp)
DO_3SAME_NO_SZ_3(VHADD_S, gen_gvec_shadd)
DO_3SAME_NO_SZ_3(VHADD_U, gen_gvec_uhadd)
+DO_3SAME_NO_SZ_3(VHSUB_S, gen_gvec_shsub)
+DO_3SAME_NO_SZ_3(VHSUB_U, gen_gvec_uhsub)
#define DO_3SAME_CMP(INSN, COND) \
static void gen_##INSN##_3s(unsigned vece, uint32_t rd_ofs, \
@@ -953,8 +955,6 @@ DO_SHA2(SHA256SU1, gen_helper_crypto_sha256su1)
FUNC(d, tcg_env, n, m); \
}
-DO_3SAME_32(VHSUB_S, hsub_s)
-DO_3SAME_32(VHSUB_U, hsub_u)
DO_3SAME_32(VRHADD_S, rhadd_s)
DO_3SAME_32(VRHADD_U, rhadd_u)
--
2.34.1
^ permalink raw reply related [flat|nested] 41+ messages in thread
* [PATCH v3 23/33] target/arm: Convert SHSUB, UHSUB to decodetree
2024-05-28 20:30 [PATCH v3 00/33] target/arm: Convert a64 advsimd to decodetree (part 1b) Richard Henderson
` (21 preceding siblings ...)
2024-05-28 20:30 ` [PATCH v3 22/33] target/arm: Convert SHSUB, UHSUB to gvec Richard Henderson
@ 2024-05-28 20:30 ` Richard Henderson
2024-05-28 20:30 ` [PATCH v3 24/33] target/arm: Convert SRHADD, URHADD to gvec Richard Henderson
` (10 subsequent siblings)
33 siblings, 0 replies; 41+ messages in thread
From: Richard Henderson @ 2024-05-28 20:30 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm, Peter Maydell
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/tcg/a64.decode | 2 ++
target/arm/tcg/translate-a64.c | 11 +++--------
2 files changed, 5 insertions(+), 8 deletions(-)
diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index e33d91fd0a..b1bbcb144e 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -914,6 +914,8 @@ CMTST_v 0.00 1110 ..1 ..... 10001 1 ..... ..... @qrrr_e
CMEQ_v 0.10 1110 ..1 ..... 10001 1 ..... ..... @qrrr_e
SHADD_v 0.00 1110 ..1 ..... 00000 1 ..... ..... @qrrr_e
UHADD_v 0.10 1110 ..1 ..... 00000 1 ..... ..... @qrrr_e
+SHSUB_v 0.00 1110 ..1 ..... 00100 1 ..... ..... @qrrr_e
+UHSUB_v 0.10 1110 ..1 ..... 00100 1 ..... ..... @qrrr_e
### Advanced SIMD scalar x indexed element
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 63f7a59f94..6571b999f4 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -5456,6 +5456,8 @@ TRANS(ADD_v, do_gvec_fn3, a, tcg_gen_gvec_add)
TRANS(SUB_v, do_gvec_fn3, a, tcg_gen_gvec_sub)
TRANS(SHADD_v, do_gvec_fn3_no64, a, gen_gvec_shadd)
TRANS(UHADD_v, do_gvec_fn3_no64, a, gen_gvec_uhadd)
+TRANS(SHSUB_v, do_gvec_fn3_no64, a, gen_gvec_shsub)
+TRANS(UHSUB_v, do_gvec_fn3_no64, a, gen_gvec_uhsub)
static bool do_cmop_v(DisasContext *s, arg_qrrr_e *a, TCGCond cond)
{
@@ -10923,7 +10925,6 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
}
/* fall through */
case 0x2: /* SRHADD, URHADD */
- case 0x4: /* SHSUB, UHSUB */
case 0xc: /* SMAX, UMAX */
case 0xd: /* SMIN, UMIN */
case 0xe: /* SABD, UABD */
@@ -10949,6 +10950,7 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
case 0x0: /* SHADD, UHADD */
case 0x01: /* SQADD, UQADD */
+ case 0x04: /* SHSUB, UHSUB */
case 0x05: /* SQSUB, UQSUB */
case 0x06: /* CMGT, CMHI */
case 0x07: /* CMGE, CMHS */
@@ -10967,13 +10969,6 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
}
switch (opcode) {
- case 0x04: /* SHSUB, UHSUB */
- if (u) {
- gen_gvec_fn3(s, is_q, rd, rn, rm, gen_gvec_uhsub, size);
- } else {
- gen_gvec_fn3(s, is_q, rd, rn, rm, gen_gvec_shsub, size);
- }
- return;
case 0x0c: /* SMAX, UMAX */
if (u) {
gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_umax, size);
--
2.34.1
^ permalink raw reply related [flat|nested] 41+ messages in thread
* [PATCH v3 24/33] target/arm: Convert SRHADD, URHADD to gvec
2024-05-28 20:30 [PATCH v3 00/33] target/arm: Convert a64 advsimd to decodetree (part 1b) Richard Henderson
` (22 preceding siblings ...)
2024-05-28 20:30 ` [PATCH v3 23/33] target/arm: Convert SHSUB, UHSUB to decodetree Richard Henderson
@ 2024-05-28 20:30 ` Richard Henderson
2024-05-28 20:30 ` [PATCH v3 25/33] target/arm: Convert SRHADD, URHADD to decodetree Richard Henderson
` (9 subsequent siblings)
33 siblings, 0 replies; 41+ messages in thread
From: Richard Henderson @ 2024-05-28 20:30 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm, Peter Maydell
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/helper.h | 7 --
target/arm/tcg/translate.h | 4 +
target/arm/tcg/gengvec.c | 144 ++++++++++++++++++++++++++++++++
target/arm/tcg/neon_helper.c | 27 ------
target/arm/tcg/translate-a64.c | 48 ++---------
target/arm/tcg/translate-neon.c | 26 +-----
6 files changed, 158 insertions(+), 98 deletions(-)
diff --git a/target/arm/helper.h b/target/arm/helper.h
index b95f24ed0a..85f9302563 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -268,13 +268,6 @@ DEF_HELPER_FLAGS_2(fjcvtzs, TCG_CALL_NO_RWG, i64, f64, ptr)
DEF_HELPER_FLAGS_3(check_hcr_el2_trap, TCG_CALL_NO_WG, void, env, i32, i32)
/* neon_helper.c */
-DEF_HELPER_2(neon_rhadd_s8, i32, i32, i32)
-DEF_HELPER_2(neon_rhadd_u8, i32, i32, i32)
-DEF_HELPER_2(neon_rhadd_s16, i32, i32, i32)
-DEF_HELPER_2(neon_rhadd_u16, i32, i32, i32)
-DEF_HELPER_2(neon_rhadd_s32, s32, s32, s32)
-DEF_HELPER_2(neon_rhadd_u32, i32, i32, i32)
-
DEF_HELPER_2(neon_pmin_u8, i32, i32, i32)
DEF_HELPER_2(neon_pmin_s8, i32, i32, i32)
DEF_HELPER_2(neon_pmin_u16, i32, i32, i32)
diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h
index 315e0afd04..3b1e68b779 100644
--- a/target/arm/tcg/translate.h
+++ b/target/arm/tcg/translate.h
@@ -480,6 +480,10 @@ void gen_gvec_shsub(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz);
void gen_gvec_uhsub(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz);
+void gen_gvec_srhadd(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+ uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz);
+void gen_gvec_urhadd(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+ uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz);
void gen_cmtst_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b);
void gen_ushl_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b);
diff --git a/target/arm/tcg/gengvec.c b/target/arm/tcg/gengvec.c
index c46365c3a6..119826bf28 100644
--- a/target/arm/tcg/gengvec.c
+++ b/target/arm/tcg/gengvec.c
@@ -2149,3 +2149,147 @@ void gen_gvec_uhsub(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
assert(vece <= MO_32);
tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, opr_sz, max_sz, &g[vece]);
}
+
+static void gen_srhadd8_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
+{
+ TCGv_i64 t = tcg_temp_new_i64();
+
+ tcg_gen_or_i64(t, a, b);
+ tcg_gen_vec_sar8i_i64(a, a, 1);
+ tcg_gen_vec_sar8i_i64(b, b, 1);
+ tcg_gen_andi_i64(t, t, dup_const(MO_8, 1));
+ tcg_gen_vec_add8_i64(d, a, b);
+ tcg_gen_vec_add8_i64(d, d, t);
+}
+
+static void gen_srhadd16_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
+{
+ TCGv_i64 t = tcg_temp_new_i64();
+
+ tcg_gen_or_i64(t, a, b);
+ tcg_gen_vec_sar16i_i64(a, a, 1);
+ tcg_gen_vec_sar16i_i64(b, b, 1);
+ tcg_gen_andi_i64(t, t, dup_const(MO_16, 1));
+ tcg_gen_vec_add16_i64(d, a, b);
+ tcg_gen_vec_add16_i64(d, d, t);
+}
+
+static void gen_srhadd_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b)
+{
+ TCGv_i32 t = tcg_temp_new_i32();
+
+ tcg_gen_or_i32(t, a, b);
+ tcg_gen_sari_i32(a, a, 1);
+ tcg_gen_sari_i32(b, b, 1);
+ tcg_gen_andi_i32(t, t, 1);
+ tcg_gen_add_i32(d, a, b);
+ tcg_gen_add_i32(d, d, t);
+}
+
+static void gen_srhadd_vec(unsigned vece, TCGv_vec d, TCGv_vec a, TCGv_vec b)
+{
+ TCGv_vec t = tcg_temp_new_vec_matching(d);
+
+ tcg_gen_or_vec(vece, t, a, b);
+ tcg_gen_sari_vec(vece, a, a, 1);
+ tcg_gen_sari_vec(vece, b, b, 1);
+ tcg_gen_and_vec(vece, t, t, tcg_constant_vec_matching(d, vece, 1));
+ tcg_gen_add_vec(vece, d, a, b);
+ tcg_gen_add_vec(vece, d, d, t);
+}
+
+void gen_gvec_srhadd(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+ uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz)
+{
+ static const TCGOpcode vecop_list[] = {
+ INDEX_op_sari_vec, INDEX_op_add_vec, 0
+ };
+ static const GVecGen3 g[] = {
+ { .fni8 = gen_srhadd8_i64,
+ .fniv = gen_srhadd_vec,
+ .opt_opc = vecop_list,
+ .vece = MO_8 },
+ { .fni8 = gen_srhadd16_i64,
+ .fniv = gen_srhadd_vec,
+ .opt_opc = vecop_list,
+ .vece = MO_16 },
+ { .fni4 = gen_srhadd_i32,
+ .fniv = gen_srhadd_vec,
+ .opt_opc = vecop_list,
+ .vece = MO_32 },
+ };
+ assert(vece <= MO_32);
+ tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, opr_sz, max_sz, &g[vece]);
+}
+
+static void gen_urhadd8_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
+{
+ TCGv_i64 t = tcg_temp_new_i64();
+
+ tcg_gen_or_i64(t, a, b);
+ tcg_gen_vec_shr8i_i64(a, a, 1);
+ tcg_gen_vec_shr8i_i64(b, b, 1);
+ tcg_gen_andi_i64(t, t, dup_const(MO_8, 1));
+ tcg_gen_vec_add8_i64(d, a, b);
+ tcg_gen_vec_add8_i64(d, d, t);
+}
+
+static void gen_urhadd16_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
+{
+ TCGv_i64 t = tcg_temp_new_i64();
+
+ tcg_gen_or_i64(t, a, b);
+ tcg_gen_vec_shr16i_i64(a, a, 1);
+ tcg_gen_vec_shr16i_i64(b, b, 1);
+ tcg_gen_andi_i64(t, t, dup_const(MO_16, 1));
+ tcg_gen_vec_add16_i64(d, a, b);
+ tcg_gen_vec_add16_i64(d, d, t);
+}
+
+static void gen_urhadd_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b)
+{
+ TCGv_i32 t = tcg_temp_new_i32();
+
+ tcg_gen_or_i32(t, a, b);
+ tcg_gen_shri_i32(a, a, 1);
+ tcg_gen_shri_i32(b, b, 1);
+ tcg_gen_andi_i32(t, t, 1);
+ tcg_gen_add_i32(d, a, b);
+ tcg_gen_add_i32(d, d, t);
+}
+
+static void gen_urhadd_vec(unsigned vece, TCGv_vec d, TCGv_vec a, TCGv_vec b)
+{
+ TCGv_vec t = tcg_temp_new_vec_matching(d);
+
+ tcg_gen_or_vec(vece, t, a, b);
+ tcg_gen_shri_vec(vece, a, a, 1);
+ tcg_gen_shri_vec(vece, b, b, 1);
+ tcg_gen_and_vec(vece, t, t, tcg_constant_vec_matching(d, vece, 1));
+ tcg_gen_add_vec(vece, d, a, b);
+ tcg_gen_add_vec(vece, d, d, t);
+}
+
+void gen_gvec_urhadd(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+ uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz)
+{
+ static const TCGOpcode vecop_list[] = {
+ INDEX_op_shri_vec, INDEX_op_add_vec, 0
+ };
+ static const GVecGen3 g[] = {
+ { .fni8 = gen_urhadd8_i64,
+ .fniv = gen_urhadd_vec,
+ .opt_opc = vecop_list,
+ .vece = MO_8 },
+ { .fni8 = gen_urhadd16_i64,
+ .fniv = gen_urhadd_vec,
+ .opt_opc = vecop_list,
+ .vece = MO_16 },
+ { .fni4 = gen_urhadd_i32,
+ .fniv = gen_urhadd_vec,
+ .opt_opc = vecop_list,
+ .vece = MO_32 },
+ };
+ assert(vece <= MO_32);
+ tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, opr_sz, max_sz, &g[vece]);
+}
diff --git a/target/arm/tcg/neon_helper.c b/target/arm/tcg/neon_helper.c
index d1641a5252..082bfd88ad 100644
--- a/target/arm/tcg/neon_helper.c
+++ b/target/arm/tcg/neon_helper.c
@@ -179,33 +179,6 @@ uint32_t HELPER(glue(neon_,name))(uint32_t arg) \
return arg; \
}
-#define NEON_FN(dest, src1, src2) dest = (src1 + src2 + 1) >> 1
-NEON_VOP(rhadd_s8, neon_s8, 4)
-NEON_VOP(rhadd_u8, neon_u8, 4)
-NEON_VOP(rhadd_s16, neon_s16, 2)
-NEON_VOP(rhadd_u16, neon_u16, 2)
-#undef NEON_FN
-
-int32_t HELPER(neon_rhadd_s32)(int32_t src1, int32_t src2)
-{
- int32_t dest;
-
- dest = (src1 >> 1) + (src2 >> 1);
- if ((src1 | src2) & 1)
- dest++;
- return dest;
-}
-
-uint32_t HELPER(neon_rhadd_u32)(uint32_t src1, uint32_t src2)
-{
- uint32_t dest;
-
- dest = (src1 >> 1) + (src2 >> 1);
- if ((src1 | src2) & 1)
- dest++;
- return dest;
-}
-
#define NEON_FN(dest, src1, src2) dest = (src1 < src2) ? src1 : src2
NEON_POP(pmin_s8, neon_s8, 4)
NEON_POP(pmin_u8, neon_u8, 4)
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 6571b999f4..40aa7a9d57 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -10915,7 +10915,6 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
int rm = extract32(insn, 16, 5);
int rn = extract32(insn, 5, 5);
int rd = extract32(insn, 0, 5);
- int pass;
switch (opcode) {
case 0x13: /* MUL, PMUL */
@@ -10969,6 +10968,13 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
}
switch (opcode) {
+ case 0x02: /* SRHADD, URHADD */
+ if (u) {
+ gen_gvec_fn3(s, is_q, rd, rn, rm, gen_gvec_urhadd, size);
+ } else {
+ gen_gvec_fn3(s, is_q, rd, rn, rm, gen_gvec_srhadd, size);
+ }
+ return;
case 0x0c: /* SMAX, UMAX */
if (u) {
gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_umax, size);
@@ -11021,45 +11027,7 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
}
return;
}
-
- if (size == 3) {
- g_assert_not_reached();
- } else {
- for (pass = 0; pass < (is_q ? 4 : 2); pass++) {
- TCGv_i32 tcg_op1 = tcg_temp_new_i32();
- TCGv_i32 tcg_op2 = tcg_temp_new_i32();
- TCGv_i32 tcg_res = tcg_temp_new_i32();
- NeonGenTwoOpFn *genfn = NULL;
- NeonGenTwoOpEnvFn *genenvfn = NULL;
-
- read_vec_element_i32(s, tcg_op1, rn, pass, MO_32);
- read_vec_element_i32(s, tcg_op2, rm, pass, MO_32);
-
- switch (opcode) {
- case 0x2: /* SRHADD, URHADD */
- {
- static NeonGenTwoOpFn * const fns[3][2] = {
- { gen_helper_neon_rhadd_s8, gen_helper_neon_rhadd_u8 },
- { gen_helper_neon_rhadd_s16, gen_helper_neon_rhadd_u16 },
- { gen_helper_neon_rhadd_s32, gen_helper_neon_rhadd_u32 },
- };
- genfn = fns[size][u];
- break;
- }
- default:
- g_assert_not_reached();
- }
-
- if (genenvfn) {
- genenvfn(tcg_res, tcg_env, tcg_op1, tcg_op2);
- } else {
- genfn(tcg_res, tcg_op1, tcg_op2);
- }
-
- write_vec_element_i32(s, tcg_res, rd, pass, MO_32);
- }
- }
- clear_vec_high(s, is_q, rd);
+ g_assert_not_reached();
}
/* AdvSIMD three same
diff --git a/target/arm/tcg/translate-neon.c b/target/arm/tcg/translate-neon.c
index d59d5804c5..f9a8753906 100644
--- a/target/arm/tcg/translate-neon.c
+++ b/target/arm/tcg/translate-neon.c
@@ -845,6 +845,8 @@ DO_3SAME_NO_SZ_3(VHADD_S, gen_gvec_shadd)
DO_3SAME_NO_SZ_3(VHADD_U, gen_gvec_uhadd)
DO_3SAME_NO_SZ_3(VHSUB_S, gen_gvec_shsub)
DO_3SAME_NO_SZ_3(VHSUB_U, gen_gvec_uhsub)
+DO_3SAME_NO_SZ_3(VRHADD_S, gen_gvec_srhadd)
+DO_3SAME_NO_SZ_3(VRHADD_U, gen_gvec_urhadd)
#define DO_3SAME_CMP(INSN, COND) \
static void gen_##INSN##_3s(unsigned vece, uint32_t rd_ofs, \
@@ -922,27 +924,6 @@ DO_SHA2(SHA256H, gen_helper_crypto_sha256h)
DO_SHA2(SHA256H2, gen_helper_crypto_sha256h2)
DO_SHA2(SHA256SU1, gen_helper_crypto_sha256su1)
-#define DO_3SAME_32(INSN, FUNC) \
- static void gen_##INSN##_3s(unsigned vece, uint32_t rd_ofs, \
- uint32_t rn_ofs, uint32_t rm_ofs, \
- uint32_t oprsz, uint32_t maxsz) \
- { \
- static const GVecGen3 ops[4] = { \
- { .fni4 = gen_helper_neon_##FUNC##8 }, \
- { .fni4 = gen_helper_neon_##FUNC##16 }, \
- { .fni4 = gen_helper_neon_##FUNC##32 }, \
- { 0 }, \
- }; \
- tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, oprsz, maxsz, &ops[vece]); \
- } \
- static bool trans_##INSN##_3s(DisasContext *s, arg_3same *a) \
- { \
- if (a->size > 2) { \
- return false; \
- } \
- return do_3same(s, a, gen_##INSN##_3s); \
- }
-
/*
* Some helper functions need to be passed the tcg_env. In order
* to use those with the gvec APIs like tcg_gen_gvec_3() we need
@@ -955,9 +936,6 @@ DO_SHA2(SHA256SU1, gen_helper_crypto_sha256su1)
FUNC(d, tcg_env, n, m); \
}
-DO_3SAME_32(VRHADD_S, rhadd_s)
-DO_3SAME_32(VRHADD_U, rhadd_u)
-
#define DO_3SAME_VQDMULH(INSN, FUNC) \
WRAP_ENV_FN(gen_##INSN##_tramp16, gen_helper_neon_##FUNC##_s16); \
WRAP_ENV_FN(gen_##INSN##_tramp32, gen_helper_neon_##FUNC##_s32); \
--
2.34.1
^ permalink raw reply related [flat|nested] 41+ messages in thread
* [PATCH v3 25/33] target/arm: Convert SRHADD, URHADD to decodetree
2024-05-28 20:30 [PATCH v3 00/33] target/arm: Convert a64 advsimd to decodetree (part 1b) Richard Henderson
` (23 preceding siblings ...)
2024-05-28 20:30 ` [PATCH v3 24/33] target/arm: Convert SRHADD, URHADD to gvec Richard Henderson
@ 2024-05-28 20:30 ` Richard Henderson
2024-05-28 20:30 ` [PATCH v3 26/33] target/arm: Convert SMAX, SMIN, UMAX, UMIN " Richard Henderson
` (8 subsequent siblings)
33 siblings, 0 replies; 41+ messages in thread
From: Richard Henderson @ 2024-05-28 20:30 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm, Peter Maydell
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/tcg/a64.decode | 2 ++
target/arm/tcg/translate-a64.c | 11 +++--------
2 files changed, 5 insertions(+), 8 deletions(-)
diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index b1bbcb144e..1c448b4f7c 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -916,6 +916,8 @@ SHADD_v 0.00 1110 ..1 ..... 00000 1 ..... ..... @qrrr_e
UHADD_v 0.10 1110 ..1 ..... 00000 1 ..... ..... @qrrr_e
SHSUB_v 0.00 1110 ..1 ..... 00100 1 ..... ..... @qrrr_e
UHSUB_v 0.10 1110 ..1 ..... 00100 1 ..... ..... @qrrr_e
+SRHADD_v 0.00 1110 ..1 ..... 00010 1 ..... ..... @qrrr_e
+URHADD_v 0.10 1110 ..1 ..... 00010 1 ..... ..... @qrrr_e
### Advanced SIMD scalar x indexed element
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 40aa7a9d57..9ef5de6755 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -5458,6 +5458,8 @@ TRANS(SHADD_v, do_gvec_fn3_no64, a, gen_gvec_shadd)
TRANS(UHADD_v, do_gvec_fn3_no64, a, gen_gvec_uhadd)
TRANS(SHSUB_v, do_gvec_fn3_no64, a, gen_gvec_shsub)
TRANS(UHSUB_v, do_gvec_fn3_no64, a, gen_gvec_uhsub)
+TRANS(SRHADD_v, do_gvec_fn3_no64, a, gen_gvec_srhadd)
+TRANS(URHADD_v, do_gvec_fn3_no64, a, gen_gvec_urhadd)
static bool do_cmop_v(DisasContext *s, arg_qrrr_e *a, TCGCond cond)
{
@@ -10923,7 +10925,6 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
return;
}
/* fall through */
- case 0x2: /* SRHADD, URHADD */
case 0xc: /* SMAX, UMAX */
case 0xd: /* SMIN, UMIN */
case 0xe: /* SABD, UABD */
@@ -10949,6 +10950,7 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
case 0x0: /* SHADD, UHADD */
case 0x01: /* SQADD, UQADD */
+ case 0x02: /* SRHADD, URHADD */
case 0x04: /* SHSUB, UHSUB */
case 0x05: /* SQSUB, UQSUB */
case 0x06: /* CMGT, CMHI */
@@ -10968,13 +10970,6 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
}
switch (opcode) {
- case 0x02: /* SRHADD, URHADD */
- if (u) {
- gen_gvec_fn3(s, is_q, rd, rn, rm, gen_gvec_urhadd, size);
- } else {
- gen_gvec_fn3(s, is_q, rd, rn, rm, gen_gvec_srhadd, size);
- }
- return;
case 0x0c: /* SMAX, UMAX */
if (u) {
gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_umax, size);
--
2.34.1
^ permalink raw reply related [flat|nested] 41+ messages in thread
* [PATCH v3 26/33] target/arm: Convert SMAX, SMIN, UMAX, UMIN to decodetree
2024-05-28 20:30 [PATCH v3 00/33] target/arm: Convert a64 advsimd to decodetree (part 1b) Richard Henderson
` (24 preceding siblings ...)
2024-05-28 20:30 ` [PATCH v3 25/33] target/arm: Convert SRHADD, URHADD to decodetree Richard Henderson
@ 2024-05-28 20:30 ` Richard Henderson
2024-05-28 20:30 ` [PATCH v3 27/33] target/arm: Convert SABA, SABD, UABA, UABD " Richard Henderson
` (7 subsequent siblings)
33 siblings, 0 replies; 41+ messages in thread
From: Richard Henderson @ 2024-05-28 20:30 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm, Peter Maydell
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/tcg/a64.decode | 4 ++++
target/arm/tcg/translate-a64.c | 22 ++++++----------------
2 files changed, 10 insertions(+), 16 deletions(-)
diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index 1c448b4f7c..bc98963bc5 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -918,6 +918,10 @@ SHSUB_v 0.00 1110 ..1 ..... 00100 1 ..... ..... @qrrr_e
UHSUB_v 0.10 1110 ..1 ..... 00100 1 ..... ..... @qrrr_e
SRHADD_v 0.00 1110 ..1 ..... 00010 1 ..... ..... @qrrr_e
URHADD_v 0.10 1110 ..1 ..... 00010 1 ..... ..... @qrrr_e
+SMAX_v 0.00 1110 ..1 ..... 01100 1 ..... ..... @qrrr_e
+UMAX_v 0.10 1110 ..1 ..... 01100 1 ..... ..... @qrrr_e
+SMIN_v 0.00 1110 ..1 ..... 01101 1 ..... ..... @qrrr_e
+UMIN_v 0.10 1110 ..1 ..... 01101 1 ..... ..... @qrrr_e
### Advanced SIMD scalar x indexed element
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 9ef5de6755..db6f59df17 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -5460,6 +5460,10 @@ TRANS(SHSUB_v, do_gvec_fn3_no64, a, gen_gvec_shsub)
TRANS(UHSUB_v, do_gvec_fn3_no64, a, gen_gvec_uhsub)
TRANS(SRHADD_v, do_gvec_fn3_no64, a, gen_gvec_srhadd)
TRANS(URHADD_v, do_gvec_fn3_no64, a, gen_gvec_urhadd)
+TRANS(SMAX_v, do_gvec_fn3_no64, a, tcg_gen_gvec_smax)
+TRANS(UMAX_v, do_gvec_fn3_no64, a, tcg_gen_gvec_umax)
+TRANS(SMIN_v, do_gvec_fn3_no64, a, tcg_gen_gvec_smin)
+TRANS(UMIN_v, do_gvec_fn3_no64, a, tcg_gen_gvec_umin)
static bool do_cmop_v(DisasContext *s, arg_qrrr_e *a, TCGCond cond)
{
@@ -10925,8 +10929,6 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
return;
}
/* fall through */
- case 0xc: /* SMAX, UMAX */
- case 0xd: /* SMIN, UMIN */
case 0xe: /* SABD, UABD */
case 0xf: /* SABA, UABA */
case 0x12: /* MLA, MLS */
@@ -10959,6 +10961,8 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
case 0x09: /* SQSHL, UQSHL */
case 0x0a: /* SRSHL, URSHL */
case 0x0b: /* SQRSHL, UQRSHL */
+ case 0x0c: /* SMAX, UMAX */
+ case 0x0d: /* SMIN, UMIN */
case 0x10: /* ADD, SUB */
case 0x11: /* CMTST, CMEQ */
unallocated_encoding(s);
@@ -10970,20 +10974,6 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
}
switch (opcode) {
- case 0x0c: /* SMAX, UMAX */
- if (u) {
- gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_umax, size);
- } else {
- gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_smax, size);
- }
- return;
- case 0x0d: /* SMIN, UMIN */
- if (u) {
- gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_umin, size);
- } else {
- gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_smin, size);
- }
- return;
case 0xe: /* SABD, UABD */
if (u) {
gen_gvec_fn3(s, is_q, rd, rn, rm, gen_gvec_uabd, size);
--
2.34.1
^ permalink raw reply related [flat|nested] 41+ messages in thread
* [PATCH v3 27/33] target/arm: Convert SABA, SABD, UABA, UABD to decodetree
2024-05-28 20:30 [PATCH v3 00/33] target/arm: Convert a64 advsimd to decodetree (part 1b) Richard Henderson
` (25 preceding siblings ...)
2024-05-28 20:30 ` [PATCH v3 26/33] target/arm: Convert SMAX, SMIN, UMAX, UMIN " Richard Henderson
@ 2024-05-28 20:30 ` Richard Henderson
2024-05-28 20:30 ` [PATCH v3 28/33] target/arm: Convert MUL, PMUL " Richard Henderson
` (6 subsequent siblings)
33 siblings, 0 replies; 41+ messages in thread
From: Richard Henderson @ 2024-05-28 20:30 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm, Peter Maydell
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/tcg/a64.decode | 4 ++++
target/arm/tcg/translate-a64.c | 22 ++++++----------------
2 files changed, 10 insertions(+), 16 deletions(-)
diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index bc98963bc5..07b604ec30 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -922,6 +922,10 @@ SMAX_v 0.00 1110 ..1 ..... 01100 1 ..... ..... @qrrr_e
UMAX_v 0.10 1110 ..1 ..... 01100 1 ..... ..... @qrrr_e
SMIN_v 0.00 1110 ..1 ..... 01101 1 ..... ..... @qrrr_e
UMIN_v 0.10 1110 ..1 ..... 01101 1 ..... ..... @qrrr_e
+SABD_v 0.00 1110 ..1 ..... 01110 1 ..... ..... @qrrr_e
+UABD_v 0.10 1110 ..1 ..... 01110 1 ..... ..... @qrrr_e
+SABA_v 0.00 1110 ..1 ..... 01111 1 ..... ..... @qrrr_e
+UABA_v 0.10 1110 ..1 ..... 01111 1 ..... ..... @qrrr_e
### Advanced SIMD scalar x indexed element
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index db6f59df17..61afbc434f 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -5464,6 +5464,10 @@ TRANS(SMAX_v, do_gvec_fn3_no64, a, tcg_gen_gvec_smax)
TRANS(UMAX_v, do_gvec_fn3_no64, a, tcg_gen_gvec_umax)
TRANS(SMIN_v, do_gvec_fn3_no64, a, tcg_gen_gvec_smin)
TRANS(UMIN_v, do_gvec_fn3_no64, a, tcg_gen_gvec_umin)
+TRANS(SABA_v, do_gvec_fn3_no64, a, gen_gvec_saba)
+TRANS(UABA_v, do_gvec_fn3_no64, a, gen_gvec_uaba)
+TRANS(SABD_v, do_gvec_fn3_no64, a, gen_gvec_sabd)
+TRANS(UABD_v, do_gvec_fn3_no64, a, gen_gvec_uabd)
static bool do_cmop_v(DisasContext *s, arg_qrrr_e *a, TCGCond cond)
{
@@ -10929,8 +10933,6 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
return;
}
/* fall through */
- case 0xe: /* SABD, UABD */
- case 0xf: /* SABA, UABA */
case 0x12: /* MLA, MLS */
if (size == 3) {
unallocated_encoding(s);
@@ -10963,6 +10965,8 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
case 0x0b: /* SQRSHL, UQRSHL */
case 0x0c: /* SMAX, UMAX */
case 0x0d: /* SMIN, UMIN */
+ case 0x0e: /* SABD, UABD */
+ case 0x0f: /* SABA, UABA */
case 0x10: /* ADD, SUB */
case 0x11: /* CMTST, CMEQ */
unallocated_encoding(s);
@@ -10974,20 +10978,6 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
}
switch (opcode) {
- case 0xe: /* SABD, UABD */
- if (u) {
- gen_gvec_fn3(s, is_q, rd, rn, rm, gen_gvec_uabd, size);
- } else {
- gen_gvec_fn3(s, is_q, rd, rn, rm, gen_gvec_sabd, size);
- }
- return;
- case 0xf: /* SABA, UABA */
- if (u) {
- gen_gvec_fn3(s, is_q, rd, rn, rm, gen_gvec_uaba, size);
- } else {
- gen_gvec_fn3(s, is_q, rd, rn, rm, gen_gvec_saba, size);
- }
- return;
case 0x13: /* MUL, PMUL */
if (!u) { /* MUL */
gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_mul, size);
--
2.34.1
^ permalink raw reply related [flat|nested] 41+ messages in thread
* [PATCH v3 28/33] target/arm: Convert MUL, PMUL to decodetree
2024-05-28 20:30 [PATCH v3 00/33] target/arm: Convert a64 advsimd to decodetree (part 1b) Richard Henderson
` (26 preceding siblings ...)
2024-05-28 20:30 ` [PATCH v3 27/33] target/arm: Convert SABA, SABD, UABA, UABD " Richard Henderson
@ 2024-05-28 20:30 ` Richard Henderson
2024-05-28 20:30 ` [PATCH v3 29/33] target/arm: Convert MLA, MLS " Richard Henderson
` (5 subsequent siblings)
33 siblings, 0 replies; 41+ messages in thread
From: Richard Henderson @ 2024-05-28 20:30 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm, Peter Maydell
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/tcg/a64.decode | 5 ++++
target/arm/tcg/translate-a64.c | 51 +++++++++++++---------------------
2 files changed, 25 insertions(+), 31 deletions(-)
diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index 07b604ec30..3ea0643370 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -926,6 +926,8 @@ SABD_v 0.00 1110 ..1 ..... 01110 1 ..... ..... @qrrr_e
UABD_v 0.10 1110 ..1 ..... 01110 1 ..... ..... @qrrr_e
SABA_v 0.00 1110 ..1 ..... 01111 1 ..... ..... @qrrr_e
UABA_v 0.10 1110 ..1 ..... 01111 1 ..... ..... @qrrr_e
+MUL_v 0.00 1110 ..1 ..... 10011 1 ..... ..... @qrrr_e
+PMUL_v 0.10 1110 001 ..... 10011 1 ..... ..... @qrrr_b
### Advanced SIMD scalar x indexed element
@@ -967,3 +969,6 @@ FMLAL_vi 0.00 1111 10 .. .... 0000 . 0 ..... ..... @qrrx_h
FMLSL_vi 0.00 1111 10 .. .... 0100 . 0 ..... ..... @qrrx_h
FMLAL2_vi 0.10 1111 10 .. .... 1000 . 0 ..... ..... @qrrx_h
FMLSL2_vi 0.10 1111 10 .. .... 1100 . 0 ..... ..... @qrrx_h
+
+MUL_vi 0.00 1111 01 .. .... 1000 . 0 ..... ..... @qrrx_h
+MUL_vi 0.00 1111 10 . ..... 1000 . 0 ..... ..... @qrrx_s
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 61afbc434f..1909d1426c 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -5468,6 +5468,8 @@ TRANS(SABA_v, do_gvec_fn3_no64, a, gen_gvec_saba)
TRANS(UABA_v, do_gvec_fn3_no64, a, gen_gvec_uaba)
TRANS(SABD_v, do_gvec_fn3_no64, a, gen_gvec_sabd)
TRANS(UABD_v, do_gvec_fn3_no64, a, gen_gvec_uabd)
+TRANS(MUL_v, do_gvec_fn3_no64, a, tcg_gen_gvec_mul)
+TRANS(PMUL_v, do_gvec_op3_ool, a, 0, gen_helper_gvec_pmul_b)
static bool do_cmop_v(DisasContext *s, arg_qrrr_e *a, TCGCond cond)
{
@@ -5694,6 +5696,22 @@ TRANS_FEAT(FMLSL_vi, aa64_fhm, do_fmlal_idx, a, true, false)
TRANS_FEAT(FMLAL2_vi, aa64_fhm, do_fmlal_idx, a, false, true)
TRANS_FEAT(FMLSL2_vi, aa64_fhm, do_fmlal_idx, a, true, true)
+static bool do_int3_vector_idx(DisasContext *s, arg_qrrx_e *a,
+ gen_helper_gvec_3 * const fns[2])
+{
+ assert(a->esz == MO_16 || a->esz == MO_32);
+ if (fp_access_check(s)) {
+ gen_gvec_op3_ool(s, a->q, a->rd, a->rn, a->rm, a->idx, fns[a->esz - 1]);
+ }
+ return true;
+}
+
+static gen_helper_gvec_3 * const f_vector_idx_mul[2] = {
+ gen_helper_gvec_mul_idx_h,
+ gen_helper_gvec_mul_idx_s,
+};
+TRANS(MUL_vi, do_int3_vector_idx, a, f_vector_idx_mul)
+
/*
* Advanced SIMD scalar pairwise
*/
@@ -10927,12 +10945,6 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
int rd = extract32(insn, 0, 5);
switch (opcode) {
- case 0x13: /* MUL, PMUL */
- if (u && size != 0) {
- unallocated_encoding(s);
- return;
- }
- /* fall through */
case 0x12: /* MLA, MLS */
if (size == 3) {
unallocated_encoding(s);
@@ -10969,6 +10981,7 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
case 0x0f: /* SABA, UABA */
case 0x10: /* ADD, SUB */
case 0x11: /* CMTST, CMEQ */
+ case 0x13: /* MUL, PMUL */
unallocated_encoding(s);
return;
}
@@ -10978,13 +10991,6 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
}
switch (opcode) {
- case 0x13: /* MUL, PMUL */
- if (!u) { /* MUL */
- gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_mul, size);
- } else { /* PMUL */
- gen_gvec_op3_ool(s, is_q, rd, rn, rm, 0, gen_helper_gvec_pmul_b);
- }
- return;
case 0x12: /* MLA, MLS */
if (u) {
gen_gvec_fn3(s, is_q, rd, rn, rm, gen_gvec_mls, size);
@@ -12198,7 +12204,6 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
TCGv_ptr fpst;
switch (16 * u + opcode) {
- case 0x08: /* MUL */
case 0x10: /* MLA */
case 0x14: /* MLS */
if (is_scalar) {
@@ -12285,6 +12290,7 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
case 0x01: /* FMLA */
case 0x04: /* FMLSL */
case 0x05: /* FMLS */
+ case 0x08: /* MUL */
case 0x09: /* FMUL */
case 0x18: /* FMLAL2 */
case 0x19: /* FMULX */
@@ -12407,22 +12413,6 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
}
return;
- case 0x08: /* MUL */
- if (!is_long && !is_scalar) {
- static gen_helper_gvec_3 * const fns[3] = {
- gen_helper_gvec_mul_idx_h,
- gen_helper_gvec_mul_idx_s,
- gen_helper_gvec_mul_idx_d,
- };
- tcg_gen_gvec_3_ool(vec_full_reg_offset(s, rd),
- vec_full_reg_offset(s, rn),
- vec_full_reg_offset(s, rm),
- is_q ? 16 : 8, vec_full_reg_size(s),
- index, fns[size - 1]);
- return;
- }
- break;
-
case 0x10: /* MLA */
if (!is_long && !is_scalar) {
static gen_helper_gvec_4 * const fns[3] = {
@@ -12491,7 +12481,6 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
read_vec_element_i32(s, tcg_op, rn, pass, is_scalar ? size : MO_32);
switch (16 * u + opcode) {
- case 0x08: /* MUL */
case 0x10: /* MLA */
case 0x14: /* MLS */
{
--
2.34.1
^ permalink raw reply related [flat|nested] 41+ messages in thread
* [PATCH v3 29/33] target/arm: Convert MLA, MLS to decodetree
2024-05-28 20:30 [PATCH v3 00/33] target/arm: Convert a64 advsimd to decodetree (part 1b) Richard Henderson
` (27 preceding siblings ...)
2024-05-28 20:30 ` [PATCH v3 28/33] target/arm: Convert MUL, PMUL " Richard Henderson
@ 2024-05-28 20:30 ` Richard Henderson
2024-05-28 20:30 ` [PATCH v3 30/33] target/arm: Tidy SQDMULH, SQRDMULH (vector) Richard Henderson
` (4 subsequent siblings)
33 siblings, 0 replies; 41+ messages in thread
From: Richard Henderson @ 2024-05-28 20:30 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm, Peter Maydell
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/tcg/a64.decode | 8 ++++
target/arm/tcg/translate-a64.c | 77 ++++++++++------------------------
2 files changed, 31 insertions(+), 54 deletions(-)
diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index 3ea0643370..2dea68a0a9 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -928,6 +928,8 @@ SABA_v 0.00 1110 ..1 ..... 01111 1 ..... ..... @qrrr_e
UABA_v 0.10 1110 ..1 ..... 01111 1 ..... ..... @qrrr_e
MUL_v 0.00 1110 ..1 ..... 10011 1 ..... ..... @qrrr_e
PMUL_v 0.10 1110 001 ..... 10011 1 ..... ..... @qrrr_b
+MLA_v 0.00 1110 ..1 ..... 10010 1 ..... ..... @qrrr_e
+MLS_v 0.10 1110 ..1 ..... 10010 1 ..... ..... @qrrr_e
### Advanced SIMD scalar x indexed element
@@ -972,3 +974,9 @@ FMLSL2_vi 0.10 1111 10 .. .... 1100 . 0 ..... ..... @qrrx_h
MUL_vi 0.00 1111 01 .. .... 1000 . 0 ..... ..... @qrrx_h
MUL_vi 0.00 1111 10 . ..... 1000 . 0 ..... ..... @qrrx_s
+
+MLA_vi 0.10 1111 01 .. .... 0000 . 0 ..... ..... @qrrx_h
+MLA_vi 0.10 1111 10 . ..... 0000 . 0 ..... ..... @qrrx_s
+
+MLS_vi 0.10 1111 01 .. .... 0100 . 0 ..... ..... @qrrx_h
+MLS_vi 0.10 1111 10 . ..... 0100 . 0 ..... ..... @qrrx_s
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 1909d1426c..c4601cde2f 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -5470,6 +5470,8 @@ TRANS(SABD_v, do_gvec_fn3_no64, a, gen_gvec_sabd)
TRANS(UABD_v, do_gvec_fn3_no64, a, gen_gvec_uabd)
TRANS(MUL_v, do_gvec_fn3_no64, a, tcg_gen_gvec_mul)
TRANS(PMUL_v, do_gvec_op3_ool, a, 0, gen_helper_gvec_pmul_b)
+TRANS(MLA_v, do_gvec_fn3_no64, a, gen_gvec_mla)
+TRANS(MLS_v, do_gvec_fn3_no64, a, gen_gvec_mls)
static bool do_cmop_v(DisasContext *s, arg_qrrr_e *a, TCGCond cond)
{
@@ -5712,6 +5714,24 @@ static gen_helper_gvec_3 * const f_vector_idx_mul[2] = {
};
TRANS(MUL_vi, do_int3_vector_idx, a, f_vector_idx_mul)
+static bool do_mla_vector_idx(DisasContext *s, arg_qrrx_e *a, bool sub)
+{
+ static gen_helper_gvec_4 * const fns[2][2] = {
+ { gen_helper_gvec_mla_idx_h, gen_helper_gvec_mls_idx_h },
+ { gen_helper_gvec_mla_idx_s, gen_helper_gvec_mls_idx_s },
+ };
+
+ assert(a->esz == MO_16 || a->esz == MO_32);
+ if (fp_access_check(s)) {
+ gen_gvec_op4_ool(s, a->q, a->rd, a->rn, a->rm, a->rd,
+ a->idx, fns[a->esz - 1][sub]);
+ }
+ return true;
+}
+
+TRANS(MLA_vi, do_mla_vector_idx, a, false)
+TRANS(MLS_vi, do_mla_vector_idx, a, true)
+
/*
* Advanced SIMD scalar pairwise
*/
@@ -10945,12 +10965,6 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
int rd = extract32(insn, 0, 5);
switch (opcode) {
- case 0x12: /* MLA, MLS */
- if (size == 3) {
- unallocated_encoding(s);
- return;
- }
- break;
case 0x16: /* SQDMULH, SQRDMULH */
if (size == 0 || size == 3) {
unallocated_encoding(s);
@@ -10981,6 +10995,7 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
case 0x0f: /* SABA, UABA */
case 0x10: /* ADD, SUB */
case 0x11: /* CMTST, CMEQ */
+ case 0x12: /* MLA, MLS */
case 0x13: /* MUL, PMUL */
unallocated_encoding(s);
return;
@@ -10991,13 +11006,6 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
}
switch (opcode) {
- case 0x12: /* MLA, MLS */
- if (u) {
- gen_gvec_fn3(s, is_q, rd, rn, rm, gen_gvec_mls, size);
- } else {
- gen_gvec_fn3(s, is_q, rd, rn, rm, gen_gvec_mla, size);
- }
- return;
case 0x16: /* SQDMULH, SQRDMULH */
{
static gen_helper_gvec_3_ptr * const fns[2][2] = {
@@ -12204,13 +12212,6 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
TCGv_ptr fpst;
switch (16 * u + opcode) {
- case 0x10: /* MLA */
- case 0x14: /* MLS */
- if (is_scalar) {
- unallocated_encoding(s);
- return;
- }
- break;
case 0x02: /* SMLAL, SMLAL2 */
case 0x12: /* UMLAL, UMLAL2 */
case 0x06: /* SMLSL, SMLSL2 */
@@ -12292,6 +12293,8 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
case 0x05: /* FMLS */
case 0x08: /* MUL */
case 0x09: /* FMUL */
+ case 0x10: /* MLA */
+ case 0x14: /* MLS */
case 0x18: /* FMLAL2 */
case 0x19: /* FMULX */
case 0x1c: /* FMLSL2 */
@@ -12412,40 +12415,6 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
: gen_helper_gvec_fcmlah_idx);
}
return;
-
- case 0x10: /* MLA */
- if (!is_long && !is_scalar) {
- static gen_helper_gvec_4 * const fns[3] = {
- gen_helper_gvec_mla_idx_h,
- gen_helper_gvec_mla_idx_s,
- gen_helper_gvec_mla_idx_d,
- };
- tcg_gen_gvec_4_ool(vec_full_reg_offset(s, rd),
- vec_full_reg_offset(s, rn),
- vec_full_reg_offset(s, rm),
- vec_full_reg_offset(s, rd),
- is_q ? 16 : 8, vec_full_reg_size(s),
- index, fns[size - 1]);
- return;
- }
- break;
-
- case 0x14: /* MLS */
- if (!is_long && !is_scalar) {
- static gen_helper_gvec_4 * const fns[3] = {
- gen_helper_gvec_mls_idx_h,
- gen_helper_gvec_mls_idx_s,
- gen_helper_gvec_mls_idx_d,
- };
- tcg_gen_gvec_4_ool(vec_full_reg_offset(s, rd),
- vec_full_reg_offset(s, rn),
- vec_full_reg_offset(s, rm),
- vec_full_reg_offset(s, rd),
- is_q ? 16 : 8, vec_full_reg_size(s),
- index, fns[size - 1]);
- return;
- }
- break;
}
if (size == 3) {
--
2.34.1
^ permalink raw reply related [flat|nested] 41+ messages in thread
* [PATCH v3 30/33] target/arm: Tidy SQDMULH, SQRDMULH (vector)
2024-05-28 20:30 [PATCH v3 00/33] target/arm: Convert a64 advsimd to decodetree (part 1b) Richard Henderson
` (28 preceding siblings ...)
2024-05-28 20:30 ` [PATCH v3 29/33] target/arm: Convert MLA, MLS " Richard Henderson
@ 2024-05-28 20:30 ` Richard Henderson
2024-05-28 20:30 ` [PATCH v3 31/33] target/arm: Convert SQDMULH, SQRDMULH to decodetree Richard Henderson
` (3 subsequent siblings)
33 siblings, 0 replies; 41+ messages in thread
From: Richard Henderson @ 2024-05-28 20:30 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm, Peter Maydell
We already have a gvec helper for the operations, but we aren't
using it on the aa32 neon side. Create a unified expander for
use by both aa32 and aa64 translators.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/tcg/translate.h | 4 ++++
target/arm/tcg/gengvec.c | 20 ++++++++++++++++++++
target/arm/tcg/translate-a64.c | 23 ++++-------------------
target/arm/tcg/translate-neon.c | 23 +++--------------------
4 files changed, 31 insertions(+), 39 deletions(-)
diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h
index 3b1e68b779..aba21f730f 100644
--- a/target/arm/tcg/translate.h
+++ b/target/arm/tcg/translate.h
@@ -539,6 +539,10 @@ void gen_gvec_sri(unsigned vece, uint32_t rd_ofs, uint32_t rm_ofs,
void gen_gvec_sli(unsigned vece, uint32_t rd_ofs, uint32_t rm_ofs,
int64_t shift, uint32_t opr_sz, uint32_t max_sz);
+void gen_gvec_sqdmulh_qc(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+ uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz);
+void gen_gvec_sqrdmulh_qc(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+ uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz);
void gen_gvec_sqrdmlah_qc(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz);
void gen_gvec_sqrdmlsh_qc(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
diff --git a/target/arm/tcg/gengvec.c b/target/arm/tcg/gengvec.c
index 119826bf28..56a1dc1f75 100644
--- a/target/arm/tcg/gengvec.c
+++ b/target/arm/tcg/gengvec.c
@@ -35,6 +35,26 @@ static void gen_gvec_fn3_qc(uint32_t rd_ofs, uint32_t rn_ofs, uint32_t rm_ofs,
opr_sz, max_sz, 0, fn);
}
+void gen_gvec_sqdmulh_qc(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+ uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz)
+{
+ static gen_helper_gvec_3_ptr * const fns[2] = {
+ gen_helper_neon_sqdmulh_h, gen_helper_neon_sqdmulh_s
+ };
+ tcg_debug_assert(vece >= 1 && vece <= 2);
+ gen_gvec_fn3_qc(rd_ofs, rn_ofs, rm_ofs, opr_sz, max_sz, fns[vece - 1]);
+}
+
+void gen_gvec_sqrdmulh_qc(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+ uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz)
+{
+ static gen_helper_gvec_3_ptr * const fns[2] = {
+ gen_helper_neon_sqrdmulh_h, gen_helper_neon_sqrdmulh_s
+ };
+ tcg_debug_assert(vece >= 1 && vece <= 2);
+ gen_gvec_fn3_qc(rd_ofs, rn_ofs, rm_ofs, opr_sz, max_sz, fns[vece - 1]);
+}
+
void gen_gvec_sqrdmlah_qc(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz)
{
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index c4601cde2f..c673b95ec7 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -724,19 +724,6 @@ static void gen_gvec_op3_fpst(DisasContext *s, bool is_q, int rd, int rn,
is_q ? 16 : 8, vec_full_reg_size(s), data, fn);
}
-/* Expand a 3-operand + qc + operation using an out-of-line helper. */
-static void gen_gvec_op3_qc(DisasContext *s, bool is_q, int rd, int rn,
- int rm, gen_helper_gvec_3_ptr *fn)
-{
- TCGv_ptr qc_ptr = tcg_temp_new_ptr();
-
- tcg_gen_addi_ptr(qc_ptr, tcg_env, offsetof(CPUARMState, vfp.qc));
- tcg_gen_gvec_3_ptr(vec_full_reg_offset(s, rd),
- vec_full_reg_offset(s, rn),
- vec_full_reg_offset(s, rm), qc_ptr,
- is_q ? 16 : 8, vec_full_reg_size(s), 0, fn);
-}
-
/* Expand a 4-operand operation using an out-of-line helper. */
static void gen_gvec_op4_ool(DisasContext *s, bool is_q, int rd, int rn,
int rm, int ra, int data, gen_helper_gvec_4 *fn)
@@ -11007,12 +10994,10 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
switch (opcode) {
case 0x16: /* SQDMULH, SQRDMULH */
- {
- static gen_helper_gvec_3_ptr * const fns[2][2] = {
- { gen_helper_neon_sqdmulh_h, gen_helper_neon_sqrdmulh_h },
- { gen_helper_neon_sqdmulh_s, gen_helper_neon_sqrdmulh_s },
- };
- gen_gvec_op3_qc(s, is_q, rd, rn, rm, fns[size - 1][u]);
+ if (u) {
+ gen_gvec_fn3(s, is_q, rd, rn, rm, gen_gvec_sqrdmulh_qc, size);
+ } else {
+ gen_gvec_fn3(s, is_q, rd, rn, rm, gen_gvec_sqdmulh_qc, size);
}
return;
}
diff --git a/target/arm/tcg/translate-neon.c b/target/arm/tcg/translate-neon.c
index f9a8753906..915c9e56db 100644
--- a/target/arm/tcg/translate-neon.c
+++ b/target/arm/tcg/translate-neon.c
@@ -937,28 +937,11 @@ DO_SHA2(SHA256SU1, gen_helper_crypto_sha256su1)
}
#define DO_3SAME_VQDMULH(INSN, FUNC) \
- WRAP_ENV_FN(gen_##INSN##_tramp16, gen_helper_neon_##FUNC##_s16); \
- WRAP_ENV_FN(gen_##INSN##_tramp32, gen_helper_neon_##FUNC##_s32); \
- static void gen_##INSN##_3s(unsigned vece, uint32_t rd_ofs, \
- uint32_t rn_ofs, uint32_t rm_ofs, \
- uint32_t oprsz, uint32_t maxsz) \
- { \
- static const GVecGen3 ops[2] = { \
- { .fni4 = gen_##INSN##_tramp16 }, \
- { .fni4 = gen_##INSN##_tramp32 }, \
- }; \
- tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, oprsz, maxsz, &ops[vece - 1]); \
- } \
static bool trans_##INSN##_3s(DisasContext *s, arg_3same *a) \
- { \
- if (a->size != 1 && a->size != 2) { \
- return false; \
- } \
- return do_3same(s, a, gen_##INSN##_3s); \
- }
+ { return a->size >= 1 && a->size <= 2 && do_3same(s, a, FUNC); }
-DO_3SAME_VQDMULH(VQDMULH, qdmulh)
-DO_3SAME_VQDMULH(VQRDMULH, qrdmulh)
+DO_3SAME_VQDMULH(VQDMULH, gen_gvec_sqdmulh_qc)
+DO_3SAME_VQDMULH(VQRDMULH, gen_gvec_sqrdmulh_qc)
#define WRAP_FP_GVEC(WRAPNAME, FPST, FUNC) \
static void WRAPNAME(unsigned vece, uint32_t rd_ofs, \
--
2.34.1
^ permalink raw reply related [flat|nested] 41+ messages in thread
* [PATCH v3 31/33] target/arm: Convert SQDMULH, SQRDMULH to decodetree
2024-05-28 20:30 [PATCH v3 00/33] target/arm: Convert a64 advsimd to decodetree (part 1b) Richard Henderson
` (29 preceding siblings ...)
2024-05-28 20:30 ` [PATCH v3 30/33] target/arm: Tidy SQDMULH, SQRDMULH (vector) Richard Henderson
@ 2024-05-28 20:30 ` Richard Henderson
2024-05-30 14:10 ` Peter Maydell
2024-05-28 20:30 ` [PATCH v3 32/33] target/arm: Convert FMADD, FMSUB, FNMADD, FNMSUB " Richard Henderson
` (2 subsequent siblings)
33 siblings, 1 reply; 41+ messages in thread
From: Richard Henderson @ 2024-05-28 20:30 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm
These are the last instructions within disas_simd_three_reg_same
and disas_simd_scalar_three_reg_same, so remove them.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/helper.h | 10 ++
target/arm/tcg/a64.decode | 18 +++
target/arm/tcg/translate-a64.c | 276 ++++++++++-----------------------
target/arm/tcg/vec_helper.c | 64 ++++++++
4 files changed, 172 insertions(+), 196 deletions(-)
diff --git a/target/arm/helper.h b/target/arm/helper.h
index 85f9302563..24feecee9b 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -968,6 +968,16 @@ DEF_HELPER_FLAGS_5(neon_sqrdmulh_h, TCG_CALL_NO_RWG,
DEF_HELPER_FLAGS_5(neon_sqrdmulh_s, TCG_CALL_NO_RWG,
void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(neon_sqdmulh_idx_h, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(neon_sqdmulh_idx_s, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(neon_sqrdmulh_idx_h, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(neon_sqrdmulh_idx_s, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+
DEF_HELPER_FLAGS_4(sve2_sqdmulh_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(sve2_sqdmulh_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(sve2_sqdmulh_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index 2dea68a0a9..f7f897f9fc 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -774,6 +774,9 @@ CMHS_s 0111 1110 111 ..... 00111 1 ..... ..... @rrr_d
CMTST_s 0101 1110 111 ..... 10001 1 ..... ..... @rrr_d
CMEQ_s 0111 1110 111 ..... 10001 1 ..... ..... @rrr_d
+SQDMULH_s 0101 1110 ..1 ..... 10110 1 ..... ..... @rrr_e
+SQRDMULH_s 0111 1110 ..1 ..... 10110 1 ..... ..... @rrr_e
+
### Advanced SIMD scalar pairwise
FADDP_s 0101 1110 0011 0000 1101 10 ..... ..... @rr_h
@@ -931,6 +934,9 @@ PMUL_v 0.10 1110 001 ..... 10011 1 ..... ..... @qrrr_b
MLA_v 0.00 1110 ..1 ..... 10010 1 ..... ..... @qrrr_e
MLS_v 0.10 1110 ..1 ..... 10010 1 ..... ..... @qrrr_e
+SQDMULH_v 0.00 1110 ..1 ..... 10110 1 ..... ..... @qrrr_e
+SQRDMULH_v 0.10 1110 ..1 ..... 10110 1 ..... ..... @qrrr_e
+
### Advanced SIMD scalar x indexed element
FMUL_si 0101 1111 00 .. .... 1001 . 0 ..... ..... @rrx_h
@@ -949,6 +955,12 @@ FMULX_si 0111 1111 00 .. .... 1001 . 0 ..... ..... @rrx_h
FMULX_si 0111 1111 10 . ..... 1001 . 0 ..... ..... @rrx_s
FMULX_si 0111 1111 11 0 ..... 1001 . 0 ..... ..... @rrx_d
+SQDMULH_si 0101 1111 01 .. .... 1100 . 0 ..... ..... @rrx_h
+SQDMULH_si 0101 1111 10 .. .... 1100 . 0 ..... ..... @rrx_s
+
+SQRDMULH_si 0101 1111 01 .. .... 1101 . 0 ..... ..... @rrx_h
+SQRDMULH_si 0101 1111 10 . ..... 1101 . 0 ..... ..... @rrx_s
+
### Advanced SIMD vector x indexed element
FMUL_vi 0.00 1111 00 .. .... 1001 . 0 ..... ..... @qrrx_h
@@ -980,3 +992,9 @@ MLA_vi 0.10 1111 10 . ..... 0000 . 0 ..... ..... @qrrx_s
MLS_vi 0.10 1111 01 .. .... 0100 . 0 ..... ..... @qrrx_h
MLS_vi 0.10 1111 10 . ..... 0100 . 0 ..... ..... @qrrx_s
+
+SQDMULH_vi 0.00 1111 01 .. .... 1100 . 0 ..... ..... @qrrx_h
+SQDMULH_vi 0.00 1111 10 . ..... 1100 . 0 ..... ..... @qrrx_s
+
+SQRDMULH_vi 0.00 1111 01 .. .... 1101 . 0 ..... ..... @qrrx_h
+SQRDMULH_vi 0.00 1111 10 . ..... 1101 . 0 ..... ..... @qrrx_s
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index c673b95ec7..14226c56cf 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -1350,6 +1350,14 @@ static bool do_gvec_fn3_no64(DisasContext *s, arg_qrrr_e *a, GVecGen3Fn *fn)
return true;
}
+static bool do_gvec_fn3_no8_no64(DisasContext *s, arg_qrrr_e *a, GVecGen3Fn *fn)
+{
+ if (a->esz == MO_8) {
+ return false;
+ }
+ return do_gvec_fn3_no64(s, a, fn);
+}
+
static bool do_gvec_fn4(DisasContext *s, arg_qrrrr_e *a, GVecGen4Fn *fn)
{
if (!a->q && a->esz == MO_64) {
@@ -5167,6 +5175,25 @@ static const ENVScalar2 f_scalar_uqrshl = {
};
TRANS(UQRSHL_s, do_env_scalar2, a, &f_scalar_uqrshl)
+static bool do_env_scalar2_hs(DisasContext *s, arg_rrr_e *a,
+ const ENVScalar2 *f)
+{
+ if (a->esz == MO_16 || a->esz == MO_32) {
+ return do_env_scalar2(s, a, f);
+ }
+ return false;
+}
+
+static const ENVScalar2 f_scalar_sqdmulh = {
+ { NULL, gen_helper_neon_qdmulh_s16, gen_helper_neon_qdmulh_s32 }
+};
+TRANS(SQDMULH_s, do_env_scalar2_hs, a, &f_scalar_sqdmulh)
+
+static const ENVScalar2 f_scalar_sqrdmulh = {
+ { NULL, gen_helper_neon_qrdmulh_s16, gen_helper_neon_qrdmulh_s32 }
+};
+TRANS(SQRDMULH_s, do_env_scalar2_hs, a, &f_scalar_sqrdmulh)
+
static bool do_cmop_d(DisasContext *s, arg_rrr_e *a, TCGCond cond)
{
if (fp_access_check(s)) {
@@ -5482,6 +5509,9 @@ TRANS(CMHS_v, do_cmop_v, a, TCG_COND_GEU)
TRANS(CMEQ_v, do_cmop_v, a, TCG_COND_EQ)
TRANS(CMTST_v, do_gvec_fn3, a, gen_gvec_cmtst)
+TRANS(SQDMULH_v, do_gvec_fn3_no8_no64, a, gen_gvec_sqdmulh_qc)
+TRANS(SQRDMULH_v, do_gvec_fn3_no8_no64, a, gen_gvec_sqrdmulh_qc)
+
/*
* Advanced SIMD scalar/vector x indexed element
*/
@@ -5589,6 +5619,27 @@ static bool do_fmla_scalar_idx(DisasContext *s, arg_rrx_e *a, bool neg)
TRANS(FMLA_si, do_fmla_scalar_idx, a, false)
TRANS(FMLS_si, do_fmla_scalar_idx, a, true)
+static bool do_env_scalar2_idx_hs(DisasContext *s, arg_rrx_e *a,
+ const ENVScalar2 *f)
+{
+ if (a->esz < MO_16 || a->esz > MO_32) {
+ return false;
+ }
+ if (fp_access_check(s)) {
+ TCGv_i32 t0 = tcg_temp_new_i32();
+ TCGv_i32 t1 = tcg_temp_new_i32();
+
+ read_vec_element_i32(s, t0, a->rn, 0, a->esz);
+ read_vec_element_i32(s, t1, a->rm, a->idx, a->esz);
+ f->gen_bhs[a->esz](t0, tcg_env, t0, t1);
+ write_fp_sreg(s, a->rd, t0);
+ }
+ return true;
+}
+
+TRANS(SQDMULH_si, do_env_scalar2_idx_hs, a, &f_scalar_sqdmulh)
+TRANS(SQRDMULH_si, do_env_scalar2_idx_hs, a, &f_scalar_sqrdmulh)
+
static bool do_fp3_vector_idx(DisasContext *s, arg_qrrx_e *a,
gen_helper_gvec_3_ptr * const fns[3])
{
@@ -5719,6 +5770,33 @@ static bool do_mla_vector_idx(DisasContext *s, arg_qrrx_e *a, bool sub)
TRANS(MLA_vi, do_mla_vector_idx, a, false)
TRANS(MLS_vi, do_mla_vector_idx, a, true)
+static bool do_int3_qc_vector_idx(DisasContext *s, arg_qrrx_e *a,
+ gen_helper_gvec_4 * const fns[2])
+{
+ assert(a->esz == MO_16 || a->esz == MO_32);
+ if (fp_access_check(s)) {
+ tcg_gen_gvec_4_ool(vec_full_reg_offset(s, a->rd),
+ vec_full_reg_offset(s, a->rn),
+ vec_full_reg_offset(s, a->rm),
+ offsetof(CPUARMState, vfp.qc),
+ a->q ? 16 : 8, vec_full_reg_size(s),
+ a->idx, fns[a->esz - 1]);
+ }
+ return true;
+}
+
+static gen_helper_gvec_4 * const f_vector_idx_sqdmulh[2] = {
+ gen_helper_neon_sqdmulh_idx_h,
+ gen_helper_neon_sqdmulh_idx_s,
+};
+TRANS(SQDMULH_vi, do_int3_qc_vector_idx, a, f_vector_idx_sqdmulh)
+
+static gen_helper_gvec_4 * const f_vector_idx_sqrdmulh[2] = {
+ gen_helper_neon_sqrdmulh_idx_h,
+ gen_helper_neon_sqrdmulh_idx_s,
+};
+TRANS(SQRDMULH_vi, do_int3_qc_vector_idx, a, f_vector_idx_sqrdmulh)
+
/*
* Advanced SIMD scalar pairwise
*/
@@ -9500,109 +9578,6 @@ static void disas_simd_scalar_three_reg_diff(DisasContext *s, uint32_t insn)
}
}
-/* AdvSIMD scalar three same
- * 31 30 29 28 24 23 22 21 20 16 15 11 10 9 5 4 0
- * +-----+---+-----------+------+---+------+--------+---+------+------+
- * | 0 1 | U | 1 1 1 1 0 | size | 1 | Rm | opcode | 1 | Rn | Rd |
- * +-----+---+-----------+------+---+------+--------+---+------+------+
- */
-static void disas_simd_scalar_three_reg_same(DisasContext *s, uint32_t insn)
-{
- int rd = extract32(insn, 0, 5);
- int rn = extract32(insn, 5, 5);
- int opcode = extract32(insn, 11, 5);
- int rm = extract32(insn, 16, 5);
- int size = extract32(insn, 22, 2);
- bool u = extract32(insn, 29, 1);
- TCGv_i64 tcg_rd;
-
- switch (opcode) {
- case 0x16: /* SQDMULH, SQRDMULH (vector) */
- if (size != 1 && size != 2) {
- unallocated_encoding(s);
- return;
- }
- break;
- default:
- case 0x1: /* SQADD, UQADD */
- case 0x5: /* SQSUB, UQSUB */
- case 0x6: /* CMGT, CMHI */
- case 0x7: /* CMGE, CMHS */
- case 0x8: /* SSHL, USHL */
- case 0x9: /* SQSHL, UQSHL */
- case 0xa: /* SRSHL, URSHL */
- case 0xb: /* SQRSHL, UQRSHL */
- case 0x10: /* ADD, SUB (vector) */
- case 0x11: /* CMTST, CMEQ */
- unallocated_encoding(s);
- return;
- }
-
- if (!fp_access_check(s)) {
- return;
- }
-
- tcg_rd = tcg_temp_new_i64();
-
- if (size == 3) {
- g_assert_not_reached();
- } else {
- /* Do a single operation on the lowest element in the vector.
- * We use the standard Neon helpers and rely on 0 OP 0 == 0 with
- * no side effects for all these operations.
- * OPTME: special-purpose helpers would avoid doing some
- * unnecessary work in the helper for the 8 and 16 bit cases.
- */
- NeonGenTwoOpEnvFn *genenvfn = NULL;
- void (*genfn)(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_i64, MemOp) = NULL;
-
- switch (opcode) {
- case 0x16: /* SQDMULH, SQRDMULH */
- {
- static NeonGenTwoOpEnvFn * const fns[2][2] = {
- { gen_helper_neon_qdmulh_s16, gen_helper_neon_qrdmulh_s16 },
- { gen_helper_neon_qdmulh_s32, gen_helper_neon_qrdmulh_s32 },
- };
- assert(size == 1 || size == 2);
- genenvfn = fns[size - 1][u];
- break;
- }
- default:
- case 0x1: /* SQADD, UQADD */
- case 0x5: /* SQSUB, UQSUB */
- case 0x9: /* SQSHL, UQSHL */
- case 0xb: /* SQRSHL, UQRSHL */
- g_assert_not_reached();
- }
-
- if (genenvfn) {
- TCGv_i32 tcg_rn = tcg_temp_new_i32();
- TCGv_i32 tcg_rm = tcg_temp_new_i32();
-
- read_vec_element_i32(s, tcg_rn, rn, 0, size);
- read_vec_element_i32(s, tcg_rm, rm, 0, size);
- genenvfn(tcg_rn, tcg_env, tcg_rn, tcg_rm);
- tcg_gen_extu_i32_i64(tcg_rd, tcg_rn);
- } else {
- TCGv_i64 tcg_rn = tcg_temp_new_i64();
- TCGv_i64 tcg_rm = tcg_temp_new_i64();
- TCGv_i64 qc = tcg_temp_new_i64();
-
- read_vec_element(s, tcg_rn, rn, 0, size | (u ? 0 : MO_SIGN));
- read_vec_element(s, tcg_rm, rm, 0, size | (u ? 0 : MO_SIGN));
- tcg_gen_ld_i64(qc, tcg_env, offsetof(CPUARMState, vfp.qc));
- genfn(tcg_rd, qc, tcg_rn, tcg_rm, size);
- tcg_gen_st_i64(qc, tcg_env, offsetof(CPUARMState, vfp.qc));
- if (!u) {
- /* Truncate signed 64-bit result for writeback. */
- tcg_gen_ext_i64(tcg_rd, tcg_rd, size);
- }
- }
- }
-
- write_fp_dreg(s, rd, tcg_rd);
-}
-
/* AdvSIMD scalar three same extra
* 31 30 29 28 24 23 22 21 20 16 15 14 11 10 9 5 4 0
* +-----+---+-----------+------+---+------+---+--------+---+----+----+
@@ -10940,94 +10915,6 @@ static void disas_simd_three_reg_diff(DisasContext *s, uint32_t insn)
}
}
-/* Integer op subgroup of C3.6.16. */
-static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
-{
- int is_q = extract32(insn, 30, 1);
- int u = extract32(insn, 29, 1);
- int size = extract32(insn, 22, 2);
- int opcode = extract32(insn, 11, 5);
- int rm = extract32(insn, 16, 5);
- int rn = extract32(insn, 5, 5);
- int rd = extract32(insn, 0, 5);
-
- switch (opcode) {
- case 0x16: /* SQDMULH, SQRDMULH */
- if (size == 0 || size == 3) {
- unallocated_encoding(s);
- return;
- }
- break;
- default:
- if (size == 3 && !is_q) {
- unallocated_encoding(s);
- return;
- }
- break;
-
- case 0x0: /* SHADD, UHADD */
- case 0x01: /* SQADD, UQADD */
- case 0x02: /* SRHADD, URHADD */
- case 0x04: /* SHSUB, UHSUB */
- case 0x05: /* SQSUB, UQSUB */
- case 0x06: /* CMGT, CMHI */
- case 0x07: /* CMGE, CMHS */
- case 0x08: /* SSHL, USHL */
- case 0x09: /* SQSHL, UQSHL */
- case 0x0a: /* SRSHL, URSHL */
- case 0x0b: /* SQRSHL, UQRSHL */
- case 0x0c: /* SMAX, UMAX */
- case 0x0d: /* SMIN, UMIN */
- case 0x0e: /* SABD, UABD */
- case 0x0f: /* SABA, UABA */
- case 0x10: /* ADD, SUB */
- case 0x11: /* CMTST, CMEQ */
- case 0x12: /* MLA, MLS */
- case 0x13: /* MUL, PMUL */
- unallocated_encoding(s);
- return;
- }
-
- if (!fp_access_check(s)) {
- return;
- }
-
- switch (opcode) {
- case 0x16: /* SQDMULH, SQRDMULH */
- if (u) {
- gen_gvec_fn3(s, is_q, rd, rn, rm, gen_gvec_sqrdmulh_qc, size);
- } else {
- gen_gvec_fn3(s, is_q, rd, rn, rm, gen_gvec_sqdmulh_qc, size);
- }
- return;
- }
- g_assert_not_reached();
-}
-
-/* AdvSIMD three same
- * 31 30 29 28 24 23 22 21 20 16 15 11 10 9 5 4 0
- * +---+---+---+-----------+------+---+------+--------+---+------+------+
- * | 0 | Q | U | 0 1 1 1 0 | size | 1 | Rm | opcode | 1 | Rn | Rd |
- * +---+---+---+-----------+------+---+------+--------+---+------+------+
- */
-static void disas_simd_three_reg_same(DisasContext *s, uint32_t insn)
-{
- int opcode = extract32(insn, 11, 5);
-
- switch (opcode) {
- default:
- disas_simd_3same_int(s, insn);
- break;
- case 0x3: /* logic ops */
- case 0x14: /* SMAXP, UMAXP */
- case 0x15: /* SMINP, UMINP */
- case 0x17: /* ADDP */
- case 0x18 ... 0x31: /* floating point ops */
- unallocated_encoding(s);
- break;
- }
-}
-
/* AdvSIMD three same extra
* 31 30 29 28 24 23 22 21 20 16 15 14 11 10 9 5 4 0
* +---+---+---+-----------+------+---+------+---+--------+---+----+----+
@@ -12214,9 +12101,6 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
case 0x0b: /* SQDMULL, SQDMULL2 */
is_long = true;
break;
- case 0x0c: /* SQDMULH */
- case 0x0d: /* SQRDMULH */
- break;
case 0x1d: /* SQRDMLAH */
case 0x1f: /* SQRDMLSH */
if (!dc_isar_feature(aa64_rdm, s)) {
@@ -12278,6 +12162,8 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
case 0x05: /* FMLS */
case 0x08: /* MUL */
case 0x09: /* FMUL */
+ case 0x0c: /* SQDMULH */
+ case 0x0d: /* SQRDMULH */
case 0x10: /* MLA */
case 0x14: /* MLS */
case 0x18: /* FMLAL2 */
@@ -12683,7 +12569,6 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
*/
static const AArch64DecodeTable data_proc_simd[] = {
/* pattern , mask , fn */
- { 0x0e200400, 0x9f200400, disas_simd_three_reg_same },
{ 0x0e008400, 0x9f208400, disas_simd_three_reg_same_extra },
{ 0x0e200000, 0x9f200c00, disas_simd_three_reg_diff },
{ 0x0e200800, 0x9f3e0c00, disas_simd_two_reg_misc },
@@ -12695,7 +12580,6 @@ static const AArch64DecodeTable data_proc_simd[] = {
{ 0x0e000000, 0xbf208c00, disas_simd_tb },
{ 0x0e000800, 0xbf208c00, disas_simd_zip_trn },
{ 0x2e000000, 0xbf208400, disas_simd_ext },
- { 0x5e200400, 0xdf200400, disas_simd_scalar_three_reg_same },
{ 0x5e008400, 0xdf208400, disas_simd_scalar_three_reg_same_extra },
{ 0x5e200000, 0xdf200c00, disas_simd_scalar_three_reg_diff },
{ 0x5e200800, 0xdf3e0c00, disas_simd_scalar_two_reg_misc },
diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
index d8e96386be..b05922b425 100644
--- a/target/arm/tcg/vec_helper.c
+++ b/target/arm/tcg/vec_helper.c
@@ -311,6 +311,38 @@ void HELPER(neon_sqrdmulh_h)(void *vd, void *vn, void *vm,
clear_tail(d, opr_sz, simd_maxsz(desc));
}
+void HELPER(neon_sqdmulh_idx_h)(void *vd, void *vn, void *vm,
+ void *vq, uint32_t desc)
+{
+ intptr_t i, j, opr_sz = simd_oprsz(desc);
+ int idx = simd_data(desc);
+ int16_t *d = vd, *n = vn, *m = (int16_t *)vm + H2(idx);
+
+ for (i = 0; i < opr_sz / 2; i += 16 / 2) {
+ int16_t mm = m[i];
+ for (j = 0; j < 16 / 2; ++j) {
+ d[i + j] = do_sqrdmlah_h(n[i + j], mm, 0, false, false, vq);
+ }
+ }
+ clear_tail(d, opr_sz, simd_maxsz(desc));
+}
+
+void HELPER(neon_sqrdmulh_idx_h)(void *vd, void *vn, void *vm,
+ void *vq, uint32_t desc)
+{
+ intptr_t i, j, opr_sz = simd_oprsz(desc);
+ int idx = simd_data(desc);
+ int16_t *d = vd, *n = vn, *m = (int16_t *)vm + H2(idx);
+
+ for (i = 0; i < opr_sz / 2; i += 16 / 2) {
+ int16_t mm = m[i];
+ for (j = 0; j < 16 / 2; ++j) {
+ d[i + j] = do_sqrdmlah_h(n[i + j], mm, 0, false, true, vq);
+ }
+ }
+ clear_tail(d, opr_sz, simd_maxsz(desc));
+}
+
void HELPER(sve2_sqrdmlah_h)(void *vd, void *vn, void *vm,
void *va, uint32_t desc)
{
@@ -474,6 +506,38 @@ void HELPER(neon_sqrdmulh_s)(void *vd, void *vn, void *vm,
clear_tail(d, opr_sz, simd_maxsz(desc));
}
+void HELPER(neon_sqdmulh_idx_s)(void *vd, void *vn, void *vm,
+ void *vq, uint32_t desc)
+{
+ intptr_t i, j, opr_sz = simd_oprsz(desc);
+ int idx = simd_data(desc);
+ int32_t *d = vd, *n = vn, *m = (int32_t *)vm + H4(idx);
+
+ for (i = 0; i < opr_sz / 4; i += 16 / 4) {
+ int32_t mm = m[i];
+ for (j = 0; j < 16 / 4; ++j) {
+ d[i + j] = do_sqrdmlah_s(n[i + j], mm, 0, false, false, vq);
+ }
+ }
+ clear_tail(d, opr_sz, simd_maxsz(desc));
+}
+
+void HELPER(neon_sqrdmulh_idx_s)(void *vd, void *vn, void *vm,
+ void *vq, uint32_t desc)
+{
+ intptr_t i, j, opr_sz = simd_oprsz(desc);
+ int idx = simd_data(desc);
+ int32_t *d = vd, *n = vn, *m = (int32_t *)vm + H4(idx);
+
+ for (i = 0; i < opr_sz / 4; i += 16 / 4) {
+ int32_t mm = m[i];
+ for (j = 0; j < 16 / 4; ++j) {
+ d[i + j] = do_sqrdmlah_s(n[i + j], mm, 0, false, true, vq);
+ }
+ }
+ clear_tail(d, opr_sz, simd_maxsz(desc));
+}
+
void HELPER(sve2_sqrdmlah_s)(void *vd, void *vn, void *vm,
void *va, uint32_t desc)
{
--
2.34.1
^ permalink raw reply related [flat|nested] 41+ messages in thread
* [PATCH v3 32/33] target/arm: Convert FMADD, FMSUB, FNMADD, FNMSUB to decodetree
2024-05-28 20:30 [PATCH v3 00/33] target/arm: Convert a64 advsimd to decodetree (part 1b) Richard Henderson
` (30 preceding siblings ...)
2024-05-28 20:30 ` [PATCH v3 31/33] target/arm: Convert SQDMULH, SQRDMULH to decodetree Richard Henderson
@ 2024-05-28 20:30 ` Richard Henderson
2024-05-30 14:06 ` Peter Maydell
2024-05-28 20:30 ` [PATCH v3 33/33] target/arm: Convert FCSEL " Richard Henderson
2024-05-30 14:32 ` [PATCH v3 00/33] target/arm: Convert a64 advsimd to decodetree (part 1b) Peter Maydell
33 siblings, 1 reply; 41+ messages in thread
From: Richard Henderson @ 2024-05-28 20:30 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm
These are the only instructions in the 3 source scalar class.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/tcg/a64.decode | 10 ++
target/arm/tcg/translate-a64.c | 231 ++++++++++++---------------------
2 files changed, 93 insertions(+), 148 deletions(-)
diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index f7f897f9fc..6f6cd805b7 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -32,6 +32,7 @@
&rr_e rd rn esz
&rrr_e rd rn rm esz
&rrx_e rd rn rm idx esz
+&rrrr_e rd rn rm ra esz
&qrr_e q rd rn esz
&qrrr_e q rd rn rm esz
&qrrx_e q rd rn rm idx esz
@@ -998,3 +999,12 @@ SQDMULH_vi 0.00 1111 10 . ..... 1100 . 0 ..... ..... @qrrx_s
SQRDMULH_vi 0.00 1111 01 .. .... 1101 . 0 ..... ..... @qrrx_h
SQRDMULH_vi 0.00 1111 10 . ..... 1101 . 0 ..... ..... @qrrx_s
+
+# Floating-point data-processing (3 source)
+
+@rrrr_hsd .... .... .. . rm:5 . ra:5 rn:5 rd:5 &rrrr_e esz=%esz_hsd
+
+FMADD 0001 1111 .. 0 ..... 0 ..... ..... ..... @rrrr_hsd
+FMSUB 0001 1111 .. 0 ..... 1 ..... ..... ..... @rrrr_hsd
+FNMADD 0001 1111 .. 1 ..... 0 ..... ..... ..... @rrrr_hsd
+FNMSUB 0001 1111 .. 1 ..... 1 ..... ..... ..... @rrrr_hsd
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 14226c56cf..78a2e6d692 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -5866,6 +5866,88 @@ static bool trans_ADDP_s(DisasContext *s, arg_rr_e *a)
return true;
}
+/*
+ * Floating-point data-processing (3 source)
+ */
+
+static bool do_fmadd(DisasContext *s, arg_rrrr_e *a, bool neg_a, bool neg_n)
+{
+ TCGv_ptr fpst;
+
+ /*
+ * These are fused multiply-add. Note that doing the negations here
+ * as separate steps is correct: an input NaN should come out with
+ * its sign bit flipped if it is a negated-input.
+ */
+ switch (a->esz) {
+ case MO_64:
+ if (fp_access_check(s)) {
+ TCGv_i64 tn = read_fp_dreg(s, a->rn);
+ TCGv_i64 tm = read_fp_dreg(s, a->rm);
+ TCGv_i64 ta = read_fp_dreg(s, a->ra);
+
+ if (neg_a) {
+ gen_vfp_negd(ta, ta);
+ }
+ if (neg_n) {
+ gen_vfp_negd(tn, tn);
+ }
+ fpst = fpstatus_ptr(FPST_FPCR);
+ gen_helper_vfp_muladdd(ta, tn, tm, ta, fpst);
+ write_fp_dreg(s, a->rd, ta);
+ }
+ break;
+
+ case MO_32:
+ if (fp_access_check(s)) {
+ TCGv_i32 tn = read_fp_sreg(s, a->rn);
+ TCGv_i32 tm = read_fp_sreg(s, a->rm);
+ TCGv_i32 ta = read_fp_sreg(s, a->ra);
+
+ if (neg_a) {
+ gen_vfp_negs(ta, ta);
+ }
+ if (neg_n) {
+ gen_vfp_negs(tn, tn);
+ }
+ fpst = fpstatus_ptr(FPST_FPCR);
+ gen_helper_vfp_muladds(ta, tn, tm, ta, fpst);
+ write_fp_sreg(s, a->rd, ta);
+ }
+ break;
+
+ case MO_16:
+ if (!dc_isar_feature(aa64_fp16, s)) {
+ return false;
+ }
+ if (fp_access_check(s)) {
+ TCGv_i32 tn = read_fp_hreg(s, a->rn);
+ TCGv_i32 tm = read_fp_hreg(s, a->rm);
+ TCGv_i32 ta = read_fp_hreg(s, a->ra);
+
+ if (neg_a) {
+ gen_vfp_negh(ta, ta);
+ }
+ if (neg_n) {
+ gen_vfp_negh(tn, tn);
+ }
+ fpst = fpstatus_ptr(FPST_FPCR_F16);
+ gen_helper_advsimd_muladdh(ta, tn, tm, ta, fpst);
+ write_fp_sreg(s, a->rd, ta);
+ }
+ break;
+
+ default:
+ return false;
+ }
+ return true;
+}
+
+TRANS(FMADD, do_fmadd, a, false, false)
+TRANS(FNMADD, do_fmadd, a, true, true)
+TRANS(FMSUB, do_fmadd, a, false, true)
+TRANS(FNMSUB, do_fmadd, a, true, false)
+
/* Shift a TCGv src by TCGv shift_amount, put result in dst.
* Note that it is the caller's responsibility to ensure that the
* shift amount is in range (ie 0..31 or 0..63) and provide the ARM
@@ -7665,152 +7747,6 @@ static void disas_fp_1src(DisasContext *s, uint32_t insn)
}
}
-/* Floating-point data-processing (3 source) - single precision */
-static void handle_fp_3src_single(DisasContext *s, bool o0, bool o1,
- int rd, int rn, int rm, int ra)
-{
- TCGv_i32 tcg_op1, tcg_op2, tcg_op3;
- TCGv_i32 tcg_res = tcg_temp_new_i32();
- TCGv_ptr fpst = fpstatus_ptr(FPST_FPCR);
-
- tcg_op1 = read_fp_sreg(s, rn);
- tcg_op2 = read_fp_sreg(s, rm);
- tcg_op3 = read_fp_sreg(s, ra);
-
- /* These are fused multiply-add, and must be done as one
- * floating point operation with no rounding between the
- * multiplication and addition steps.
- * NB that doing the negations here as separate steps is
- * correct : an input NaN should come out with its sign bit
- * flipped if it is a negated-input.
- */
- if (o1 == true) {
- gen_vfp_negs(tcg_op3, tcg_op3);
- }
-
- if (o0 != o1) {
- gen_vfp_negs(tcg_op1, tcg_op1);
- }
-
- gen_helper_vfp_muladds(tcg_res, tcg_op1, tcg_op2, tcg_op3, fpst);
-
- write_fp_sreg(s, rd, tcg_res);
-}
-
-/* Floating-point data-processing (3 source) - double precision */
-static void handle_fp_3src_double(DisasContext *s, bool o0, bool o1,
- int rd, int rn, int rm, int ra)
-{
- TCGv_i64 tcg_op1, tcg_op2, tcg_op3;
- TCGv_i64 tcg_res = tcg_temp_new_i64();
- TCGv_ptr fpst = fpstatus_ptr(FPST_FPCR);
-
- tcg_op1 = read_fp_dreg(s, rn);
- tcg_op2 = read_fp_dreg(s, rm);
- tcg_op3 = read_fp_dreg(s, ra);
-
- /* These are fused multiply-add, and must be done as one
- * floating point operation with no rounding between the
- * multiplication and addition steps.
- * NB that doing the negations here as separate steps is
- * correct : an input NaN should come out with its sign bit
- * flipped if it is a negated-input.
- */
- if (o1 == true) {
- gen_vfp_negd(tcg_op3, tcg_op3);
- }
-
- if (o0 != o1) {
- gen_vfp_negd(tcg_op1, tcg_op1);
- }
-
- gen_helper_vfp_muladdd(tcg_res, tcg_op1, tcg_op2, tcg_op3, fpst);
-
- write_fp_dreg(s, rd, tcg_res);
-}
-
-/* Floating-point data-processing (3 source) - half precision */
-static void handle_fp_3src_half(DisasContext *s, bool o0, bool o1,
- int rd, int rn, int rm, int ra)
-{
- TCGv_i32 tcg_op1, tcg_op2, tcg_op3;
- TCGv_i32 tcg_res = tcg_temp_new_i32();
- TCGv_ptr fpst = fpstatus_ptr(FPST_FPCR_F16);
-
- tcg_op1 = read_fp_hreg(s, rn);
- tcg_op2 = read_fp_hreg(s, rm);
- tcg_op3 = read_fp_hreg(s, ra);
-
- /* These are fused multiply-add, and must be done as one
- * floating point operation with no rounding between the
- * multiplication and addition steps.
- * NB that doing the negations here as separate steps is
- * correct : an input NaN should come out with its sign bit
- * flipped if it is a negated-input.
- */
- if (o1 == true) {
- tcg_gen_xori_i32(tcg_op3, tcg_op3, 0x8000);
- }
-
- if (o0 != o1) {
- tcg_gen_xori_i32(tcg_op1, tcg_op1, 0x8000);
- }
-
- gen_helper_advsimd_muladdh(tcg_res, tcg_op1, tcg_op2, tcg_op3, fpst);
-
- write_fp_sreg(s, rd, tcg_res);
-}
-
-/* Floating point data-processing (3 source)
- * 31 30 29 28 24 23 22 21 20 16 15 14 10 9 5 4 0
- * +---+---+---+-----------+------+----+------+----+------+------+------+
- * | M | 0 | S | 1 1 1 1 1 | type | o1 | Rm | o0 | Ra | Rn | Rd |
- * +---+---+---+-----------+------+----+------+----+------+------+------+
- */
-static void disas_fp_3src(DisasContext *s, uint32_t insn)
-{
- int mos = extract32(insn, 29, 3);
- int type = extract32(insn, 22, 2);
- int rd = extract32(insn, 0, 5);
- int rn = extract32(insn, 5, 5);
- int ra = extract32(insn, 10, 5);
- int rm = extract32(insn, 16, 5);
- bool o0 = extract32(insn, 15, 1);
- bool o1 = extract32(insn, 21, 1);
-
- if (mos) {
- unallocated_encoding(s);
- return;
- }
-
- switch (type) {
- case 0:
- if (!fp_access_check(s)) {
- return;
- }
- handle_fp_3src_single(s, o0, o1, rd, rn, rm, ra);
- break;
- case 1:
- if (!fp_access_check(s)) {
- return;
- }
- handle_fp_3src_double(s, o0, o1, rd, rn, rm, ra);
- break;
- case 3:
- if (!dc_isar_feature(aa64_fp16, s)) {
- unallocated_encoding(s);
- return;
- }
- if (!fp_access_check(s)) {
- return;
- }
- handle_fp_3src_half(s, o0, o1, rd, rn, rm, ra);
- break;
- default:
- unallocated_encoding(s);
- }
-}
-
/* Floating point immediate
* 31 30 29 28 24 23 22 21 20 13 12 10 9 5 4 0
* +---+---+---+-----------+------+---+------------+-------+------+------+
@@ -8255,8 +8191,7 @@ static void disas_fp_int_conv(DisasContext *s, uint32_t insn)
static void disas_data_proc_fp(DisasContext *s, uint32_t insn)
{
if (extract32(insn, 24, 1)) {
- /* Floating point data-processing (3 source) */
- disas_fp_3src(s, insn);
+ unallocated_encoding(s); /* in decodetree */
} else if (extract32(insn, 21, 1) == 0) {
/* Floating point to fixed point conversions */
disas_fp_fixed_conv(s, insn);
--
2.34.1
^ permalink raw reply related [flat|nested] 41+ messages in thread
* [PATCH v3 33/33] target/arm: Convert FCSEL to decodetree
2024-05-28 20:30 [PATCH v3 00/33] target/arm: Convert a64 advsimd to decodetree (part 1b) Richard Henderson
` (31 preceding siblings ...)
2024-05-28 20:30 ` [PATCH v3 32/33] target/arm: Convert FMADD, FMSUB, FNMADD, FNMSUB " Richard Henderson
@ 2024-05-28 20:30 ` Richard Henderson
2024-05-30 14:32 ` [PATCH v3 00/33] target/arm: Convert a64 advsimd to decodetree (part 1b) Peter Maydell
33 siblings, 0 replies; 41+ messages in thread
From: Richard Henderson @ 2024-05-28 20:30 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm, Peter Maydell
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/tcg/a64.decode | 4 ++
target/arm/tcg/translate-a64.c | 108 ++++++++++++++-------------------
2 files changed, 49 insertions(+), 63 deletions(-)
diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index 6f6cd805b7..5dadbc74d7 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -1000,6 +1000,10 @@ SQDMULH_vi 0.00 1111 10 . ..... 1100 . 0 ..... ..... @qrrx_s
SQRDMULH_vi 0.00 1111 01 .. .... 1101 . 0 ..... ..... @qrrx_h
SQRDMULH_vi 0.00 1111 10 . ..... 1101 . 0 ..... ..... @qrrx_s
+# Floating-point conditional select
+
+FCSEL 0001 1110 .. 1 rm:5 cond:4 11 rn:5 rd:5 esz=%esz_hsd
+
# Floating-point data-processing (3 source)
@rrrr_hsd .... .... .. . rm:5 . ra:5 rn:5 rd:5 &rrrr_e esz=%esz_hsd
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 78a2e6d692..f1dea5834c 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -5866,6 +5866,50 @@ static bool trans_ADDP_s(DisasContext *s, arg_rr_e *a)
return true;
}
+/*
+ * Floating-point conditional select
+ */
+
+static bool trans_FCSEL(DisasContext *s, arg_FCSEL *a)
+{
+ TCGv_i64 t_true, t_false;
+ DisasCompare64 c;
+
+ switch (a->esz) {
+ case MO_32:
+ case MO_64:
+ break;
+ case MO_16:
+ if (!dc_isar_feature(aa64_fp16, s)) {
+ return false;
+ }
+ break;
+ default:
+ return false;
+ }
+
+ if (!fp_access_check(s)) {
+ return true;
+ }
+
+ /* Zero extend sreg & hreg inputs to 64 bits now. */
+ t_true = tcg_temp_new_i64();
+ t_false = tcg_temp_new_i64();
+ read_vec_element(s, t_true, a->rn, 0, a->esz);
+ read_vec_element(s, t_false, a->rm, 0, a->esz);
+
+ a64_test_cc(&c, a->cond);
+ tcg_gen_movcond_i64(c.cond, t_true, c.value, tcg_constant_i64(0),
+ t_true, t_false);
+
+ /*
+ * Note that sregs & hregs write back zeros to the high bits,
+ * and we've already done the zero-extension.
+ */
+ write_fp_dreg(s, a->rd, t_true);
+ return true;
+}
+
/*
* Floating-point data-processing (3 source)
*/
@@ -7332,68 +7376,6 @@ static void disas_fp_ccomp(DisasContext *s, uint32_t insn)
}
}
-/* Floating point conditional select
- * 31 30 29 28 24 23 22 21 20 16 15 12 11 10 9 5 4 0
- * +---+---+---+-----------+------+---+------+------+-----+------+------+
- * | M | 0 | S | 1 1 1 1 0 | type | 1 | Rm | cond | 1 1 | Rn | Rd |
- * +---+---+---+-----------+------+---+------+------+-----+------+------+
- */
-static void disas_fp_csel(DisasContext *s, uint32_t insn)
-{
- unsigned int mos, type, rm, cond, rn, rd;
- TCGv_i64 t_true, t_false;
- DisasCompare64 c;
- MemOp sz;
-
- mos = extract32(insn, 29, 3);
- type = extract32(insn, 22, 2);
- rm = extract32(insn, 16, 5);
- cond = extract32(insn, 12, 4);
- rn = extract32(insn, 5, 5);
- rd = extract32(insn, 0, 5);
-
- if (mos) {
- unallocated_encoding(s);
- return;
- }
-
- switch (type) {
- case 0:
- sz = MO_32;
- break;
- case 1:
- sz = MO_64;
- break;
- case 3:
- sz = MO_16;
- if (dc_isar_feature(aa64_fp16, s)) {
- break;
- }
- /* fallthru */
- default:
- unallocated_encoding(s);
- return;
- }
-
- if (!fp_access_check(s)) {
- return;
- }
-
- /* Zero extend sreg & hreg inputs to 64 bits now. */
- t_true = tcg_temp_new_i64();
- t_false = tcg_temp_new_i64();
- read_vec_element(s, t_true, rn, 0, sz);
- read_vec_element(s, t_false, rm, 0, sz);
-
- a64_test_cc(&c, cond);
- tcg_gen_movcond_i64(c.cond, t_true, c.value, tcg_constant_i64(0),
- t_true, t_false);
-
- /* Note that sregs & hregs write back zeros to the high bits,
- and we've already done the zero-extension. */
- write_fp_dreg(s, rd, t_true);
-}
-
/* Floating-point data-processing (1 source) - half precision */
static void handle_fp_1src_half(DisasContext *s, int opcode, int rd, int rn)
{
@@ -8207,7 +8189,7 @@ static void disas_data_proc_fp(DisasContext *s, uint32_t insn)
break;
case 3:
/* Floating point conditional select */
- disas_fp_csel(s, insn);
+ unallocated_encoding(s); /* in decodetree */
break;
case 0:
switch (ctz32(extract32(insn, 12, 4))) {
--
2.34.1
^ permalink raw reply related [flat|nested] 41+ messages in thread
* Re: [PATCH v3 32/33] target/arm: Convert FMADD, FMSUB, FNMADD, FNMSUB to decodetree
2024-05-28 20:30 ` [PATCH v3 32/33] target/arm: Convert FMADD, FMSUB, FNMADD, FNMSUB " Richard Henderson
@ 2024-05-30 14:06 ` Peter Maydell
0 siblings, 0 replies; 41+ messages in thread
From: Peter Maydell @ 2024-05-30 14:06 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel, qemu-arm
On Tue, 28 May 2024 at 21:32, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> These are the only instructions in the 3 source scalar class.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
thanks
-- PMM
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH v3 31/33] target/arm: Convert SQDMULH, SQRDMULH to decodetree
2024-05-28 20:30 ` [PATCH v3 31/33] target/arm: Convert SQDMULH, SQRDMULH to decodetree Richard Henderson
@ 2024-05-30 14:10 ` Peter Maydell
0 siblings, 0 replies; 41+ messages in thread
From: Peter Maydell @ 2024-05-30 14:10 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel, qemu-arm
On Tue, 28 May 2024 at 21:33, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> These are the last instructions within disas_simd_three_reg_same
> and disas_simd_scalar_three_reg_same, so remove them.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
thanks
-- PMM
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH v3 10/33] target/arm: Convert SRSHL and URSHL (register) to gvec
2024-05-28 20:30 ` [PATCH v3 10/33] target/arm: Convert SRSHL and URSHL (register) to gvec Richard Henderson
@ 2024-05-30 14:12 ` Peter Maydell
0 siblings, 0 replies; 41+ messages in thread
From: Peter Maydell @ 2024-05-30 14:12 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel, qemu-arm
On Tue, 28 May 2024 at 21:33, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
thanks
-- PMM
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH v3 12/33] target/arm: Convert SQSHL and UQSHL (register) to gvec
2024-05-28 20:30 ` [PATCH v3 12/33] target/arm: Convert SQSHL and UQSHL (register) to gvec Richard Henderson
@ 2024-05-30 14:12 ` Peter Maydell
0 siblings, 0 replies; 41+ messages in thread
From: Peter Maydell @ 2024-05-30 14:12 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel, qemu-arm
On Tue, 28 May 2024 at 21:33, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
thanks
-- PMM
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH v3 03/33] target/arm: Assert oprsz in range when using vfp.qc
2024-05-28 20:30 ` [PATCH v3 03/33] target/arm: Assert oprsz in range when using vfp.qc Richard Henderson
@ 2024-05-30 14:14 ` Peter Maydell
0 siblings, 0 replies; 41+ messages in thread
From: Peter Maydell @ 2024-05-30 14:14 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel, qemu-arm
On Tue, 28 May 2024 at 21:30, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Suggested-by: Peter Maydell <peter.maydell@linaro.org>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
thanks
-- PMM
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH v3 04/33] target/arm: Convert SUQADD and USQADD to gvec
2024-05-28 20:30 ` [PATCH v3 04/33] target/arm: Convert SUQADD and USQADD to gvec Richard Henderson
@ 2024-05-30 14:18 ` Peter Maydell
0 siblings, 0 replies; 41+ messages in thread
From: Peter Maydell @ 2024-05-30 14:18 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel, qemu-arm
On Tue, 28 May 2024 at 21:31, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> target/arm/helper.h | 16 +++++
> target/arm/tcg/translate-a64.h | 6 ++
> target/arm/tcg/gengvec64.c | 110 ++++++++++++++++++++++++++++++++
> target/arm/tcg/translate-a64.c | 113 ++++++++++++++-------------------
> target/arm/tcg/vec_helper.c | 64 +++++++++++++++++++
> 5 files changed, 245 insertions(+), 64 deletions(-)
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
thanks
-- PMM
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH v3 00/33] target/arm: Convert a64 advsimd to decodetree (part 1b)
2024-05-28 20:30 [PATCH v3 00/33] target/arm: Convert a64 advsimd to decodetree (part 1b) Richard Henderson
` (32 preceding siblings ...)
2024-05-28 20:30 ` [PATCH v3 33/33] target/arm: Convert FCSEL " Richard Henderson
@ 2024-05-30 14:32 ` Peter Maydell
33 siblings, 0 replies; 41+ messages in thread
From: Peter Maydell @ 2024-05-30 14:32 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel, qemu-arm
On Tue, 28 May 2024 at 21:31, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Changes for v3:
> * Reword prefetch unpredictable patch.
> * Validate vector length when qc is an implied operand.
> * Adjust some legacy decode based on review.
> * Apply r-b.
>
> Patches needing review:
> 01-target-arm-Diagnose-UNPREDICTABLE-operands-to-PLD.patch
> 03-target-arm-Assert-oprsz-in-range-when-using-vfp.q.patch
> 04-target-arm-Convert-SUQADD-and-USQADD-to-gvec.patch
> 10-target-arm-Convert-SRSHL-and-URSHL-register-to-gv.patch
> 12-target-arm-Convert-SQSHL-and-UQSHL-register-to-gv.patch
> 31-target-arm-Convert-SQDMULH-SQRDMULH-to-decodetree.patch
> 32-target-arm-Convert-FMADD-FMSUB-FNMADD-FNMSUB-to-d.patch
Applied 2-33 to target-arm.next, thanks. (Dropped the PLD
patch for the reasons we discussed in the other thread.)
-- PMM
^ permalink raw reply [flat|nested] 41+ messages in thread
end of thread, other threads:[~2024-05-30 14:33 UTC | newest]
Thread overview: 41+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-05-28 20:30 [PATCH v3 00/33] target/arm: Convert a64 advsimd to decodetree (part 1b) Richard Henderson
2024-05-28 20:30 ` [PATCH v3 01/33] target/arm: Diagnose UNPREDICTABLE operands to PLD, PLDW, PLI Richard Henderson
2024-05-28 20:30 ` [PATCH v3 02/33] target/arm: Improve vector UQADD, UQSUB, SQADD, SQSUB Richard Henderson
2024-05-28 20:30 ` [PATCH v3 03/33] target/arm: Assert oprsz in range when using vfp.qc Richard Henderson
2024-05-30 14:14 ` Peter Maydell
2024-05-28 20:30 ` [PATCH v3 04/33] target/arm: Convert SUQADD and USQADD to gvec Richard Henderson
2024-05-30 14:18 ` Peter Maydell
2024-05-28 20:30 ` [PATCH v3 05/33] target/arm: Inline scalar SUQADD and USQADD Richard Henderson
2024-05-28 20:30 ` [PATCH v3 06/33] target/arm: Inline scalar SQADD, UQADD, SQSUB, UQSUB Richard Henderson
2024-05-28 20:30 ` [PATCH v3 07/33] target/arm: Convert SQADD, SQSUB, UQADD, UQSUB to decodetree Richard Henderson
2024-05-28 20:30 ` [PATCH v3 08/33] target/arm: Convert SUQADD, USQADD " Richard Henderson
2024-05-28 20:30 ` [PATCH v3 09/33] target/arm: Convert SSHL, USHL " Richard Henderson
2024-05-28 20:30 ` [PATCH v3 10/33] target/arm: Convert SRSHL and URSHL (register) to gvec Richard Henderson
2024-05-30 14:12 ` Peter Maydell
2024-05-28 20:30 ` [PATCH v3 11/33] target/arm: Convert SRSHL, URSHL to decodetree Richard Henderson
2024-05-28 20:30 ` [PATCH v3 12/33] target/arm: Convert SQSHL and UQSHL (register) to gvec Richard Henderson
2024-05-30 14:12 ` Peter Maydell
2024-05-28 20:30 ` [PATCH v3 13/33] target/arm: Convert SQSHL, UQSHL to decodetree Richard Henderson
2024-05-28 20:30 ` [PATCH v3 14/33] target/arm: Convert SQRSHL and UQRSHL (register) to gvec Richard Henderson
2024-05-28 20:30 ` [PATCH v3 15/33] target/arm: Convert SQRSHL, UQRSHL to decodetree Richard Henderson
2024-05-28 20:30 ` [PATCH v3 16/33] target/arm: Convert ADD, SUB (vector) " Richard Henderson
2024-05-28 20:30 ` [PATCH v3 17/33] target/arm: Convert CMGT, CMHI, CMGE, CMHS, CMTST, CMEQ " Richard Henderson
2024-05-28 20:30 ` [PATCH v3 18/33] target/arm: Use TCG_COND_TSTNE in gen_cmtst_{i32, i64} Richard Henderson
2024-05-28 20:30 ` [PATCH v3 19/33] target/arm: Use TCG_COND_TSTNE in gen_cmtst_vec Richard Henderson
2024-05-28 20:30 ` [PATCH v3 20/33] target/arm: Convert SHADD, UHADD to gvec Richard Henderson
2024-05-28 20:30 ` [PATCH v3 21/33] target/arm: Convert SHADD, UHADD to decodetree Richard Henderson
2024-05-28 20:30 ` [PATCH v3 22/33] target/arm: Convert SHSUB, UHSUB to gvec Richard Henderson
2024-05-28 20:30 ` [PATCH v3 23/33] target/arm: Convert SHSUB, UHSUB to decodetree Richard Henderson
2024-05-28 20:30 ` [PATCH v3 24/33] target/arm: Convert SRHADD, URHADD to gvec Richard Henderson
2024-05-28 20:30 ` [PATCH v3 25/33] target/arm: Convert SRHADD, URHADD to decodetree Richard Henderson
2024-05-28 20:30 ` [PATCH v3 26/33] target/arm: Convert SMAX, SMIN, UMAX, UMIN " Richard Henderson
2024-05-28 20:30 ` [PATCH v3 27/33] target/arm: Convert SABA, SABD, UABA, UABD " Richard Henderson
2024-05-28 20:30 ` [PATCH v3 28/33] target/arm: Convert MUL, PMUL " Richard Henderson
2024-05-28 20:30 ` [PATCH v3 29/33] target/arm: Convert MLA, MLS " Richard Henderson
2024-05-28 20:30 ` [PATCH v3 30/33] target/arm: Tidy SQDMULH, SQRDMULH (vector) Richard Henderson
2024-05-28 20:30 ` [PATCH v3 31/33] target/arm: Convert SQDMULH, SQRDMULH to decodetree Richard Henderson
2024-05-30 14:10 ` Peter Maydell
2024-05-28 20:30 ` [PATCH v3 32/33] target/arm: Convert FMADD, FMSUB, FNMADD, FNMSUB " Richard Henderson
2024-05-30 14:06 ` Peter Maydell
2024-05-28 20:30 ` [PATCH v3 33/33] target/arm: Convert FCSEL " Richard Henderson
2024-05-30 14:32 ` [PATCH v3 00/33] target/arm: Convert a64 advsimd to decodetree (part 1b) Peter Maydell
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).