* [PATCH 00/17] target/arm: AdvSIMD decodetree conversion, part 4
@ 2024-07-17 6:08 Richard Henderson
2024-07-17 6:08 ` [PATCH 01/17] target/arm: Use tcg_gen_extract2_i64 for EXT Richard Henderson
` (18 more replies)
0 siblings, 19 replies; 29+ messages in thread
From: Richard Henderson @ 2024-07-17 6:08 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm
Flush before the queue gets too big.
Also, there's a bug fix in patch 14.
r~
Richard Henderson (17):
target/arm: Use tcg_gen_extract2_i64 for EXT
target/arm: Convert EXT to decodetree
target/arm: Convert TBL, TBX to decodetree
target/arm: Convert UZP, TRN, ZIP to decodetree
target/arm: Simplify do_reduction_op
target/arm: Convert ADDV, *ADDLV, *MAXV, *MINV to decodetree
target/arm: Convert FMAXNMV, FMINNMV, FMAXV, FMINV to decodetree
target/arm: Convert FMOVI (scalar, immediate) to decodetree
target/arm: Convert MOVI, FMOV, ORR, BIC (vector immediate) to
decodetree
target/arm: Introduce gen_gvec_sshr, gen_gvec_ushr
target/arm: Fix whitespace near gen_srshr64_i64
target/arm: Convert handle_vec_simd_shri to decodetree
target/arm: Convert handle_vec_simd_shli to decodetree
target/arm: Clear high SVE elements in handle_vec_simd_wshli
target/arm: Use {,s}extract in handle_vec_simd_wshli
target/arm: Convert SSHLL, USHLL to decodetree
target/arm: Push tcg_rnd into handle_shri_with_rndacc
target/arm/tcg/translate.h | 5 +
target/arm/tcg/gengvec.c | 21 +-
target/arm/tcg/translate-a64.c | 1123 +++++++++++--------------------
target/arm/tcg/translate-neon.c | 25 +-
target/arm/tcg/a64.decode | 87 +++
5 files changed, 520 insertions(+), 741 deletions(-)
--
2.43.0
^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH 01/17] target/arm: Use tcg_gen_extract2_i64 for EXT
2024-07-17 6:08 [PATCH 00/17] target/arm: AdvSIMD decodetree conversion, part 4 Richard Henderson
@ 2024-07-17 6:08 ` Richard Henderson
2024-07-17 9:50 ` Philippe Mathieu-Daudé
2024-07-17 6:08 ` [PATCH 02/17] target/arm: Convert EXT to decodetree Richard Henderson
` (17 subsequent siblings)
18 siblings, 1 reply; 29+ messages in thread
From: Richard Henderson @ 2024-07-17 6:08 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm
The TCG extract2 op performs the same operation as the local
do_ext64 helper, so use it directly and remove the helper.
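For reference, the operation both compute can be modeled in plain C.
This is a hedged sketch (ext64_model is a made-up name, not a QEMU
function): it stands in for tcg_gen_extract2_i64 with 0 < pos < 64,
which is the only range do_ext64 handled and the only range the
callers pass, since they guard with "if (pos != 0)".

```c
#include <assert.h>
#include <stdint.h>

/* Take 64 bits from the concatenation hi:lo, starting pos bits
 * into lo (the least significant side).  For 0 < pos < 64 this
 * is exactly what do_ext64 built from shri/shli/or, and what
 * extract2 computes in one op. */
static uint64_t ext64_model(uint64_t lo, uint64_t hi, unsigned pos)
{
    assert(pos > 0 && pos < 64);
    return (lo >> pos) | (hi << (64 - pos));
}
```

The win is not just line count: extract2 is a single TCG op that many
hosts can lower to one instruction (e.g. x86 SHRD/SHLD).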
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/tcg/translate-a64.c | 23 +++--------------------
1 file changed, 3 insertions(+), 20 deletions(-)
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 559a6cd799..e4c8a20f39 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -8849,23 +8849,6 @@ static void disas_data_proc_fp(DisasContext *s, uint32_t insn)
}
}
-static void do_ext64(DisasContext *s, TCGv_i64 tcg_left, TCGv_i64 tcg_right,
- int pos)
-{
- /* Extract 64 bits from the middle of two concatenated 64 bit
- * vector register slices left:right. The extracted bits start
- * at 'pos' bits into the right (least significant) side.
- * We return the result in tcg_right, and guarantee not to
- * trash tcg_left.
- */
- TCGv_i64 tcg_tmp = tcg_temp_new_i64();
- assert(pos > 0 && pos < 64);
-
- tcg_gen_shri_i64(tcg_right, tcg_right, pos);
- tcg_gen_shli_i64(tcg_tmp, tcg_left, 64 - pos);
- tcg_gen_or_i64(tcg_right, tcg_right, tcg_tmp);
-}
-
/* EXT
* 31 30 29 24 23 22 21 20 16 15 14 11 10 9 5 4 0
* +---+---+-------------+-----+---+------+---+------+---+------+------+
@@ -8903,7 +8886,7 @@ static void disas_simd_ext(DisasContext *s, uint32_t insn)
read_vec_element(s, tcg_resl, rn, 0, MO_64);
if (pos != 0) {
read_vec_element(s, tcg_resh, rm, 0, MO_64);
- do_ext64(s, tcg_resh, tcg_resl, pos);
+ tcg_gen_extract2_i64(tcg_resl, tcg_resl, tcg_resh, pos);
}
} else {
TCGv_i64 tcg_hh;
@@ -8924,10 +8907,10 @@ static void disas_simd_ext(DisasContext *s, uint32_t insn)
read_vec_element(s, tcg_resh, elt->reg, elt->elt, MO_64);
elt++;
if (pos != 0) {
- do_ext64(s, tcg_resh, tcg_resl, pos);
+ tcg_gen_extract2_i64(tcg_resl, tcg_resl, tcg_resh, pos);
tcg_hh = tcg_temp_new_i64();
read_vec_element(s, tcg_hh, elt->reg, elt->elt, MO_64);
- do_ext64(s, tcg_hh, tcg_resh, pos);
+ tcg_gen_extract2_i64(tcg_resh, tcg_resh, tcg_hh, pos);
}
}
--
2.43.0
* [PATCH 02/17] target/arm: Convert EXT to decodetree
2024-07-17 6:08 [PATCH 00/17] target/arm: AdvSIMD decodetree conversion, part 4 Richard Henderson
2024-07-17 6:08 ` [PATCH 01/17] target/arm: Use tcg_gen_extract2_i64 for EXT Richard Henderson
@ 2024-07-17 6:08 ` Richard Henderson
2024-07-17 9:55 ` Philippe Mathieu-Daudé
2024-07-17 6:08 ` [PATCH 03/17] target/arm: Convert TBL, TBX " Richard Henderson
` (16 subsequent siblings)
18 siblings, 1 reply; 29+ messages in thread
From: Richard Henderson @ 2024-07-17 6:08 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/tcg/translate-a64.c | 121 +++++++++++++--------------------
target/arm/tcg/a64.decode | 5 ++
2 files changed, 53 insertions(+), 73 deletions(-)
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index e4c8a20f39..6ca24d9842 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -6541,6 +6541,54 @@ static bool trans_FCSEL(DisasContext *s, arg_FCSEL *a)
return true;
}
+/*
+ * Advanced SIMD Extract
+ */
+
+static bool trans_EXT_d(DisasContext *s, arg_EXT_d *a)
+{
+ if (fp_access_check(s)) {
+ TCGv_i64 lo = read_fp_dreg(s, a->rn);
+ if (a->imm != 0) {
+ TCGv_i64 hi = read_fp_dreg(s, a->rm);
+ tcg_gen_extract2_i64(lo, lo, hi, a->imm * 8);
+ }
+ write_fp_dreg(s, a->rd, lo);
+ }
+ return true;
+}
+
+static bool trans_EXT_q(DisasContext *s, arg_EXT_q *a)
+{
+ TCGv_i64 lo, hi;
+ int pos = (a->imm & 7) * 8;
+ int elt = a->imm >> 3;
+
+ if (!fp_access_check(s)) {
+ return true;
+ }
+
+ lo = tcg_temp_new_i64();
+ hi = tcg_temp_new_i64();
+
+ read_vec_element(s, lo, a->rn, elt, MO_64);
+ elt++;
+ read_vec_element(s, hi, elt & 2 ? a->rm : a->rn, elt & 1, MO_64);
+ elt++;
+
+ if (pos != 0) {
+ TCGv_i64 hh = tcg_temp_new_i64();
+ tcg_gen_extract2_i64(lo, lo, hi, pos);
+ read_vec_element(s, hh, a->rm, elt & 1, MO_64);
+ tcg_gen_extract2_i64(hi, hi, hh, pos);
+ }
+
+ write_vec_element(s, lo, a->rd, 0, MO_64);
+ write_vec_element(s, hi, a->rd, 1, MO_64);
+ clear_vec_high(s, true, a->rd);
+ return true;
+}
+
/*
* Floating-point data-processing (3 source)
*/
@@ -8849,78 +8897,6 @@ static void disas_data_proc_fp(DisasContext *s, uint32_t insn)
}
}
-/* EXT
- * 31 30 29 24 23 22 21 20 16 15 14 11 10 9 5 4 0
- * +---+---+-------------+-----+---+------+---+------+---+------+------+
- * | 0 | Q | 1 0 1 1 1 0 | op2 | 0 | Rm | 0 | imm4 | 0 | Rn | Rd |
- * +---+---+-------------+-----+---+------+---+------+---+------+------+
- */
-static void disas_simd_ext(DisasContext *s, uint32_t insn)
-{
- int is_q = extract32(insn, 30, 1);
- int op2 = extract32(insn, 22, 2);
- int imm4 = extract32(insn, 11, 4);
- int rm = extract32(insn, 16, 5);
- int rn = extract32(insn, 5, 5);
- int rd = extract32(insn, 0, 5);
- int pos = imm4 << 3;
- TCGv_i64 tcg_resl, tcg_resh;
-
- if (op2 != 0 || (!is_q && extract32(imm4, 3, 1))) {
- unallocated_encoding(s);
- return;
- }
-
- if (!fp_access_check(s)) {
- return;
- }
-
- tcg_resh = tcg_temp_new_i64();
- tcg_resl = tcg_temp_new_i64();
-
- /* Vd gets bits starting at pos bits into Vm:Vn. This is
- * either extracting 128 bits from a 128:128 concatenation, or
- * extracting 64 bits from a 64:64 concatenation.
- */
- if (!is_q) {
- read_vec_element(s, tcg_resl, rn, 0, MO_64);
- if (pos != 0) {
- read_vec_element(s, tcg_resh, rm, 0, MO_64);
- tcg_gen_extract2_i64(tcg_resl, tcg_resl, tcg_resh, pos);
- }
- } else {
- TCGv_i64 tcg_hh;
- typedef struct {
- int reg;
- int elt;
- } EltPosns;
- EltPosns eltposns[] = { {rn, 0}, {rn, 1}, {rm, 0}, {rm, 1} };
- EltPosns *elt = eltposns;
-
- if (pos >= 64) {
- elt++;
- pos -= 64;
- }
-
- read_vec_element(s, tcg_resl, elt->reg, elt->elt, MO_64);
- elt++;
- read_vec_element(s, tcg_resh, elt->reg, elt->elt, MO_64);
- elt++;
- if (pos != 0) {
- tcg_gen_extract2_i64(tcg_resl, tcg_resl, tcg_resh, pos);
- tcg_hh = tcg_temp_new_i64();
- read_vec_element(s, tcg_hh, elt->reg, elt->elt, MO_64);
- tcg_gen_extract2_i64(tcg_resh, tcg_resh, tcg_hh, pos);
- }
- }
-
- write_vec_element(s, tcg_resl, rd, 0, MO_64);
- if (is_q) {
- write_vec_element(s, tcg_resh, rd, 1, MO_64);
- }
- clear_vec_high(s, is_q, rd);
-}
-
/* TBL/TBX
* 31 30 29 24 23 22 21 20 16 15 14 13 12 11 10 9 5 4 0
* +---+---+-------------+-----+---+------+---+-----+----+-----+------+------+
@@ -11818,7 +11794,6 @@ static const AArch64DecodeTable data_proc_simd[] = {
{ 0x0f000400, 0x9f800400, disas_simd_shift_imm },
{ 0x0e000000, 0xbf208c00, disas_simd_tb },
{ 0x0e000800, 0xbf208c00, disas_simd_zip_trn },
- { 0x2e000000, 0xbf208400, disas_simd_ext },
{ 0x5e200800, 0xdf3e0c00, disas_simd_scalar_two_reg_misc },
{ 0x5f000400, 0xdf800400, disas_simd_scalar_shift_imm },
{ 0x0e780800, 0x8f7e0c00, disas_simd_two_reg_misc_fp16 },
diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index 2922de700c..05927fade6 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -1136,3 +1136,8 @@ FMADD 0001 1111 .. 0 ..... 0 ..... ..... ..... @rrrr_hsd
FMSUB 0001 1111 .. 0 ..... 1 ..... ..... ..... @rrrr_hsd
FNMADD 0001 1111 .. 1 ..... 0 ..... ..... ..... @rrrr_hsd
FNMSUB 0001 1111 .. 1 ..... 1 ..... ..... ..... @rrrr_hsd
+
+# Advanced SIMD Extract
+
+EXT_d 0010 1110 00 0 rm:5 00 imm:3 0 rn:5 rd:5
+EXT_q 0110 1110 00 0 rm:5 0 imm:4 0 rn:5 rd:5
--
2.43.0
* [PATCH 03/17] target/arm: Convert TBL, TBX to decodetree
2024-07-17 6:08 [PATCH 00/17] target/arm: AdvSIMD decodetree conversion, part 4 Richard Henderson
2024-07-17 6:08 ` [PATCH 01/17] target/arm: Use tcg_gen_extract2_i64 for EXT Richard Henderson
2024-07-17 6:08 ` [PATCH 02/17] target/arm: Convert EXT to decodetree Richard Henderson
@ 2024-07-17 6:08 ` Richard Henderson
2024-07-17 9:56 ` Philippe Mathieu-Daudé
2024-07-17 6:08 ` [PATCH 04/17] target/arm: Convert UZP, TRN, ZIP " Richard Henderson
` (15 subsequent siblings)
18 siblings, 1 reply; 29+ messages in thread
From: Richard Henderson @ 2024-07-17 6:08 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/tcg/translate-a64.c | 47 ++++++++++------------------------
target/arm/tcg/a64.decode | 4 +++
2 files changed, 18 insertions(+), 33 deletions(-)
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 6ca24d9842..7e3bde93fe 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -4657,6 +4657,20 @@ static bool trans_EXTR(DisasContext *s, arg_extract *a)
return true;
}
+static bool trans_TBL_TBX(DisasContext *s, arg_TBL_TBX *a)
+{
+ if (fp_access_check(s)) {
+ int len = (a->len + 1) * 16;
+
+ tcg_gen_gvec_2_ptr(vec_full_reg_offset(s, a->rd),
+ vec_full_reg_offset(s, a->rm), tcg_env,
+ a->q ? 16 : 8, vec_full_reg_size(s),
+ (len << 6) | (a->tbx << 5) | a->rn,
+ gen_helper_simd_tblx);
+ }
+ return true;
+}
+
/*
* Cryptographic AES, SHA, SHA512
*/
@@ -8897,38 +8911,6 @@ static void disas_data_proc_fp(DisasContext *s, uint32_t insn)
}
}
-/* TBL/TBX
- * 31 30 29 24 23 22 21 20 16 15 14 13 12 11 10 9 5 4 0
- * +---+---+-------------+-----+---+------+---+-----+----+-----+------+------+
- * | 0 | Q | 0 0 1 1 1 0 | op2 | 0 | Rm | 0 | len | op | 0 0 | Rn | Rd |
- * +---+---+-------------+-----+---+------+---+-----+----+-----+------+------+
- */
-static void disas_simd_tb(DisasContext *s, uint32_t insn)
-{
- int op2 = extract32(insn, 22, 2);
- int is_q = extract32(insn, 30, 1);
- int rm = extract32(insn, 16, 5);
- int rn = extract32(insn, 5, 5);
- int rd = extract32(insn, 0, 5);
- int is_tbx = extract32(insn, 12, 1);
- int len = (extract32(insn, 13, 2) + 1) * 16;
-
- if (op2 != 0) {
- unallocated_encoding(s);
- return;
- }
-
- if (!fp_access_check(s)) {
- return;
- }
-
- tcg_gen_gvec_2_ptr(vec_full_reg_offset(s, rd),
- vec_full_reg_offset(s, rm), tcg_env,
- is_q ? 16 : 8, vec_full_reg_size(s),
- (len << 6) | (is_tbx << 5) | rn,
- gen_helper_simd_tblx);
-}
-
/* ZIP/UZP/TRN
* 31 30 29 24 23 22 21 20 16 15 14 12 11 10 9 5 4 0
* +---+---+-------------+------+---+------+---+------------------+------+
@@ -11792,7 +11774,6 @@ static const AArch64DecodeTable data_proc_simd[] = {
/* simd_mod_imm decode is a subset of simd_shift_imm, so must precede it */
{ 0x0f000400, 0x9ff80400, disas_simd_mod_imm },
{ 0x0f000400, 0x9f800400, disas_simd_shift_imm },
- { 0x0e000000, 0xbf208c00, disas_simd_tb },
{ 0x0e000800, 0xbf208c00, disas_simd_zip_trn },
{ 0x5e200800, 0xdf3e0c00, disas_simd_scalar_two_reg_misc },
{ 0x5f000400, 0xdf800400, disas_simd_scalar_shift_imm },
diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index 05927fade6..45896902d5 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -1141,3 +1141,7 @@ FNMSUB 0001 1111 .. 1 ..... 1 ..... ..... ..... @rrrr_hsd
EXT_d 0010 1110 00 0 rm:5 00 imm:3 0 rn:5 rd:5
EXT_q 0110 1110 00 0 rm:5 0 imm:4 0 rn:5 rd:5
+
+# Advanced SIMD Table Lookup
+
+TBL_TBX 0 q:1 00 1110 000 rm:5 0 len:2 tbx:1 00 rn:5 rd:5
--
2.43.0
* [PATCH 04/17] target/arm: Convert UZP, TRN, ZIP to decodetree
2024-07-17 6:08 [PATCH 00/17] target/arm: AdvSIMD decodetree conversion, part 4 Richard Henderson
` (2 preceding siblings ...)
2024-07-17 6:08 ` [PATCH 03/17] target/arm: Convert TBL, TBX " Richard Henderson
@ 2024-07-17 6:08 ` Richard Henderson
2024-07-17 6:08 ` [PATCH 05/17] target/arm: Simplify do_reduction_op Richard Henderson
` (14 subsequent siblings)
18 siblings, 0 replies; 29+ messages in thread
From: Richard Henderson @ 2024-07-17 6:08 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/tcg/translate-a64.c | 158 ++++++++++++++-------------------
target/arm/tcg/a64.decode | 9 ++
2 files changed, 77 insertions(+), 90 deletions(-)
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 7e3bde93fe..e0314a1253 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -4671,6 +4671,74 @@ static bool trans_TBL_TBX(DisasContext *s, arg_TBL_TBX *a)
return true;
}
+typedef int simd_permute_idx_fn(int i, int part, int elements);
+
+static bool do_simd_permute(DisasContext *s, arg_qrrr_e *a,
+ simd_permute_idx_fn *fn, int part)
+{
+ MemOp esz = a->esz;
+ int datasize = a->q ? 16 : 8;
+ int elements = datasize >> esz;
+ TCGv_i64 tcg_res[2], tcg_ele;
+
+ if (esz == MO_64 && !a->q) {
+ return false;
+ }
+ if (!fp_access_check(s)) {
+ return true;
+ }
+
+ tcg_res[0] = tcg_temp_new_i64();
+ tcg_res[1] = a->q ? tcg_temp_new_i64() : NULL;
+ tcg_ele = tcg_temp_new_i64();
+
+ for (int i = 0; i < elements; i++) {
+ int o, w, idx;
+
+ idx = fn(i, part, elements);
+ read_vec_element(s, tcg_ele, (idx & elements ? a->rm : a->rn),
+ idx & (elements - 1), esz);
+
+ w = (i << (esz + 3)) / 64;
+ o = (i << (esz + 3)) % 64;
+ if (o == 0) {
+ tcg_gen_mov_i64(tcg_res[w], tcg_ele);
+ } else {
+ tcg_gen_deposit_i64(tcg_res[w], tcg_res[w], tcg_ele, o, 8 << esz);
+ }
+ }
+
+ for (int i = a->q; i >= 0; --i) {
+ write_vec_element(s, tcg_res[i], a->rd, i, MO_64);
+ }
+ clear_vec_high(s, a->q, a->rd);
+ return true;
+}
+
+static int permute_load_uzp(int i, int part, int elements)
+{
+ return 2 * i + part;
+}
+
+TRANS(UZP1, do_simd_permute, a, permute_load_uzp, 0)
+TRANS(UZP2, do_simd_permute, a, permute_load_uzp, 1)
+
+static int permute_load_trn(int i, int part, int elements)
+{
+ return (i & 1) * elements + (i & ~1) + part;
+}
+
+TRANS(TRN1, do_simd_permute, a, permute_load_trn, 0)
+TRANS(TRN2, do_simd_permute, a, permute_load_trn, 1)
+
+static int permute_load_zip(int i, int part, int elements)
+{
+ return (i & 1) * elements + ((part * elements + i) >> 1);
+}
+
+TRANS(ZIP1, do_simd_permute, a, permute_load_zip, 0)
+TRANS(ZIP2, do_simd_permute, a, permute_load_zip, 1)
+
/*
* Cryptographic AES, SHA, SHA512
*/
@@ -8911,95 +8979,6 @@ static void disas_data_proc_fp(DisasContext *s, uint32_t insn)
}
}
-/* ZIP/UZP/TRN
- * 31 30 29 24 23 22 21 20 16 15 14 12 11 10 9 5 4 0
- * +---+---+-------------+------+---+------+---+------------------+------+
- * | 0 | Q | 0 0 1 1 1 0 | size | 0 | Rm | 0 | opc | 1 0 | Rn | Rd |
- * +---+---+-------------+------+---+------+---+------------------+------+
- */
-static void disas_simd_zip_trn(DisasContext *s, uint32_t insn)
-{
- int rd = extract32(insn, 0, 5);
- int rn = extract32(insn, 5, 5);
- int rm = extract32(insn, 16, 5);
- int size = extract32(insn, 22, 2);
- /* opc field bits [1:0] indicate ZIP/UZP/TRN;
- * bit 2 indicates 1 vs 2 variant of the insn.
- */
- int opcode = extract32(insn, 12, 2);
- bool part = extract32(insn, 14, 1);
- bool is_q = extract32(insn, 30, 1);
- int esize = 8 << size;
- int i;
- int datasize = is_q ? 128 : 64;
- int elements = datasize / esize;
- TCGv_i64 tcg_res[2], tcg_ele;
-
- if (opcode == 0 || (size == 3 && !is_q)) {
- unallocated_encoding(s);
- return;
- }
-
- if (!fp_access_check(s)) {
- return;
- }
-
- tcg_res[0] = tcg_temp_new_i64();
- tcg_res[1] = is_q ? tcg_temp_new_i64() : NULL;
- tcg_ele = tcg_temp_new_i64();
-
- for (i = 0; i < elements; i++) {
- int o, w;
-
- switch (opcode) {
- case 1: /* UZP1/2 */
- {
- int midpoint = elements / 2;
- if (i < midpoint) {
- read_vec_element(s, tcg_ele, rn, 2 * i + part, size);
- } else {
- read_vec_element(s, tcg_ele, rm,
- 2 * (i - midpoint) + part, size);
- }
- break;
- }
- case 2: /* TRN1/2 */
- if (i & 1) {
- read_vec_element(s, tcg_ele, rm, (i & ~1) + part, size);
- } else {
- read_vec_element(s, tcg_ele, rn, (i & ~1) + part, size);
- }
- break;
- case 3: /* ZIP1/2 */
- {
- int base = part * elements / 2;
- if (i & 1) {
- read_vec_element(s, tcg_ele, rm, base + (i >> 1), size);
- } else {
- read_vec_element(s, tcg_ele, rn, base + (i >> 1), size);
- }
- break;
- }
- default:
- g_assert_not_reached();
- }
-
- w = (i * esize) / 64;
- o = (i * esize) % 64;
- if (o == 0) {
- tcg_gen_mov_i64(tcg_res[w], tcg_ele);
- } else {
- tcg_gen_shli_i64(tcg_ele, tcg_ele, o);
- tcg_gen_or_i64(tcg_res[w], tcg_res[w], tcg_ele);
- }
- }
-
- for (i = 0; i <= is_q; ++i) {
- write_vec_element(s, tcg_res[i], rd, i, MO_64);
- }
- clear_vec_high(s, is_q, rd);
-}
-
/*
* do_reduction_op helper
*
@@ -11774,7 +11753,6 @@ static const AArch64DecodeTable data_proc_simd[] = {
/* simd_mod_imm decode is a subset of simd_shift_imm, so must precede it */
{ 0x0f000400, 0x9ff80400, disas_simd_mod_imm },
{ 0x0f000400, 0x9f800400, disas_simd_shift_imm },
- { 0x0e000800, 0xbf208c00, disas_simd_zip_trn },
{ 0x5e200800, 0xdf3e0c00, disas_simd_scalar_two_reg_misc },
{ 0x5f000400, 0xdf800400, disas_simd_scalar_shift_imm },
{ 0x0e780800, 0x8f7e0c00, disas_simd_two_reg_misc_fp16 },
diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index 45896902d5..5bd5603cd0 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -1145,3 +1145,12 @@ EXT_q 0110 1110 00 0 rm:5 0 imm:4 0 rn:5 rd:5
# Advanced SIMD Table Lookup
TBL_TBX 0 q:1 00 1110 000 rm:5 0 len:2 tbx:1 00 rn:5 rd:5
+
+# Advanced SIMD Permute
+
+UZP1 0.00 1110 .. 0 ..... 0 001 10 ..... ..... @qrrr_e
+UZP2 0.00 1110 .. 0 ..... 0 101 10 ..... ..... @qrrr_e
+TRN1 0.00 1110 .. 0 ..... 0 010 10 ..... ..... @qrrr_e
+TRN2 0.00 1110 .. 0 ..... 0 110 10 ..... ..... @qrrr_e
+ZIP1 0.00 1110 .. 0 ..... 0 011 10 ..... ..... @qrrr_e
+ZIP2 0.00 1110 .. 0 ..... 0 111 10 ..... ..... @qrrr_e
--
2.43.0
* [PATCH 05/17] target/arm: Simplify do_reduction_op
2024-07-17 6:08 [PATCH 00/17] target/arm: AdvSIMD decodetree conversion, part 4 Richard Henderson
` (3 preceding siblings ...)
2024-07-17 6:08 ` [PATCH 04/17] target/arm: Convert UZP, TRN, ZIP " Richard Henderson
@ 2024-07-17 6:08 ` Richard Henderson
2024-07-17 6:08 ` [PATCH 06/17] target/arm: Convert ADDV, *ADDLV, *MAXV, *MINV to decodetree Richard Henderson
` (13 subsequent siblings)
18 siblings, 0 replies; 29+ messages in thread
From: Richard Henderson @ 2024-07-17 6:08 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm
Use simple shift and add instead of ctpop, ctz, shift and mask.
Unlike SVE, there is no predicate to disable elements.
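As a sketch of the new scheme (assumed shape only, with a plain int
add standing in for the FP helper and its fpst argument): the
recursion now walks a contiguous element range [ebase, ebase + ecount)
and halves it, instead of decoding a vmap bitmask with ctpop/ctz.

```c
#include <assert.h>

/* Integer stand-in for do_reduction_op: reduce the elements
 * in [ebase, ebase + ecount) pairwise, ecount a power of two.
 * As in the patch, the high half is recursed before the low
 * half; with the commutative stand-in (+) the order does not
 * matter, but for the real FP min/max helpers it preserves the
 * ARM ARM Reduce() NaN-propagation order. */
static int reduce(const int *elem, int ebase, int ecount)
{
    if (ecount == 1) {
        return elem[ebase];
    } else {
        int half = ecount >> 1;
        int hi = reduce(elem, ebase + half, half);
        int lo = reduce(elem, ebase, half);
        return lo + hi;   /* fn(res, lo, hi) in the patch */
    }
}
```

Each recursion level just splits the range in half, so no per-level
bitmap bookkeeping is needed.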
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/tcg/translate-a64.c | 40 +++++++++++-----------------------
1 file changed, 13 insertions(+), 27 deletions(-)
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index e0314a1253..6d2e1a2d80 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -8986,34 +8986,23 @@ static void disas_data_proc_fp(DisasContext *s, uint32_t insn)
* important for correct NaN propagation that we do these
* operations in exactly the order specified by the pseudocode.
*
- * This is a recursive function, TCG temps should be freed by the
- * calling function once it is done with the values.
+ * This is a recursive function.
*/
static TCGv_i32 do_reduction_op(DisasContext *s, int fpopcode, int rn,
- int esize, int size, int vmap, TCGv_ptr fpst)
+ MemOp esz, int ebase, int ecount, TCGv_ptr fpst)
{
- if (esize == size) {
- int element;
- MemOp msize = esize == 16 ? MO_16 : MO_32;
- TCGv_i32 tcg_elem;
-
- /* We should have one register left here */
- assert(ctpop8(vmap) == 1);
- element = ctz32(vmap);
- assert(element < 8);
-
- tcg_elem = tcg_temp_new_i32();
- read_vec_element_i32(s, tcg_elem, rn, element, msize);
+ if (ecount == 1) {
+ TCGv_i32 tcg_elem = tcg_temp_new_i32();
+ read_vec_element_i32(s, tcg_elem, rn, ebase, esz);
return tcg_elem;
} else {
- int bits = size / 2;
- int shift = ctpop8(vmap) / 2;
- int vmap_lo = (vmap >> shift) & vmap;
- int vmap_hi = (vmap & ~vmap_lo);
+ int half = ecount >> 1;
TCGv_i32 tcg_hi, tcg_lo, tcg_res;
- tcg_hi = do_reduction_op(s, fpopcode, rn, esize, bits, vmap_hi, fpst);
- tcg_lo = do_reduction_op(s, fpopcode, rn, esize, bits, vmap_lo, fpst);
+ tcg_hi = do_reduction_op(s, fpopcode, rn, esz,
+ ebase + half, half, fpst);
+ tcg_lo = do_reduction_op(s, fpopcode, rn, esz,
+ ebase, half, fpst);
tcg_res = tcg_temp_new_i32();
switch (fpopcode) {
@@ -9064,7 +9053,6 @@ static void disas_simd_across_lanes(DisasContext *s, uint32_t insn)
bool is_u = extract32(insn, 29, 1);
bool is_fp = false;
bool is_min = false;
- int esize;
int elements;
int i;
TCGv_i64 tcg_res, tcg_elt;
@@ -9111,8 +9099,7 @@ static void disas_simd_across_lanes(DisasContext *s, uint32_t insn)
return;
}
- esize = 8 << size;
- elements = (is_q ? 128 : 64) / esize;
+ elements = (is_q ? 16 : 8) >> size;
tcg_res = tcg_temp_new_i64();
tcg_elt = tcg_temp_new_i64();
@@ -9167,9 +9154,8 @@ static void disas_simd_across_lanes(DisasContext *s, uint32_t insn)
*/
TCGv_ptr fpst = fpstatus_ptr(size == MO_16 ? FPST_FPCR_F16 : FPST_FPCR);
int fpopcode = opcode | is_min << 4 | is_u << 5;
- int vmap = (1 << elements) - 1;
- TCGv_i32 tcg_res32 = do_reduction_op(s, fpopcode, rn, esize,
- (is_q ? 128 : 64), vmap, fpst);
+ TCGv_i32 tcg_res32 = do_reduction_op(s, fpopcode, rn, size,
+ 0, elements, fpst);
tcg_gen_extu_i32_i64(tcg_res, tcg_res32);
}
--
2.43.0
* [PATCH 06/17] target/arm: Convert ADDV, *ADDLV, *MAXV, *MINV to decodetree
2024-07-17 6:08 [PATCH 00/17] target/arm: AdvSIMD decodetree conversion, part 4 Richard Henderson
` (4 preceding siblings ...)
2024-07-17 6:08 ` [PATCH 05/17] target/arm: Simplify do_reduction_op Richard Henderson
@ 2024-07-17 6:08 ` Richard Henderson
2024-07-17 6:08 ` [PATCH 07/17] target/arm: Convert FMAXNMV, FMINNMV, FMAXV, FMINV " Richard Henderson
` (12 subsequent siblings)
18 siblings, 0 replies; 29+ messages in thread
From: Richard Henderson @ 2024-07-17 6:08 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/tcg/translate-a64.c | 140 ++++++++++++---------------------
target/arm/tcg/a64.decode | 12 +++
2 files changed, 61 insertions(+), 91 deletions(-)
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 6d2e1a2d80..055ba4695e 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -6753,6 +6753,47 @@ TRANS(FNMADD, do_fmadd, a, true, true)
TRANS(FMSUB, do_fmadd, a, false, true)
TRANS(FNMSUB, do_fmadd, a, true, false)
+/*
+ * Advanced SIMD Across Lanes
+ */
+
+static bool do_int_reduction(DisasContext *s, arg_qrr_e *a, bool widen,
+ MemOp src_sign, NeonGenTwo64OpFn *fn)
+{
+ TCGv_i64 tcg_res, tcg_elt;
+ MemOp src_mop = a->esz | src_sign;
+ int elements = (a->q ? 16 : 8) >> a->esz;
+
+ /* Reject MO_64, and MO_32 without Q: a minimum of 4 elements. */
+ if (elements < 4) {
+ return false;
+ }
+ if (!fp_access_check(s)) {
+ return true;
+ }
+
+ tcg_res = tcg_temp_new_i64();
+ tcg_elt = tcg_temp_new_i64();
+
+ read_vec_element(s, tcg_res, a->rn, 0, src_mop);
+ for (int i = 1; i < elements; i++) {
+ read_vec_element(s, tcg_elt, a->rn, i, src_mop);
+ fn(tcg_res, tcg_res, tcg_elt);
+ }
+
+ tcg_gen_ext_i64(tcg_res, tcg_res, a->esz + widen);
+ write_fp_dreg(s, a->rd, tcg_res);
+ return true;
+}
+
+TRANS(ADDV, do_int_reduction, a, false, 0, tcg_gen_add_i64)
+TRANS(SADDLV, do_int_reduction, a, true, MO_SIGN, tcg_gen_add_i64)
+TRANS(UADDLV, do_int_reduction, a, true, 0, tcg_gen_add_i64)
+TRANS(SMAXV, do_int_reduction, a, false, MO_SIGN, tcg_gen_smax_i64)
+TRANS(UMAXV, do_int_reduction, a, false, 0, tcg_gen_umax_i64)
+TRANS(SMINV, do_int_reduction, a, false, MO_SIGN, tcg_gen_smin_i64)
+TRANS(UMINV, do_int_reduction, a, false, 0, tcg_gen_umin_i64)
+
/* Shift a TCGv src by TCGv shift_amount, put result in dst.
* Note that it is the caller's responsibility to ensure that the
* shift amount is in range (ie 0..31 or 0..63) and provide the ARM
@@ -9051,27 +9092,10 @@ static void disas_simd_across_lanes(DisasContext *s, uint32_t insn)
int opcode = extract32(insn, 12, 5);
bool is_q = extract32(insn, 30, 1);
bool is_u = extract32(insn, 29, 1);
- bool is_fp = false;
bool is_min = false;
int elements;
- int i;
- TCGv_i64 tcg_res, tcg_elt;
switch (opcode) {
- case 0x1b: /* ADDV */
- if (is_u) {
- unallocated_encoding(s);
- return;
- }
- /* fall through */
- case 0x3: /* SADDLV, UADDLV */
- case 0xa: /* SMAXV, UMAXV */
- case 0x1a: /* SMINV, UMINV */
- if (size == 3 || (size == 2 && !is_q)) {
- unallocated_encoding(s);
- return;
- }
- break;
case 0xc: /* FMAXNMV, FMINNMV */
case 0xf: /* FMAXV, FMINV */
/* Bit 1 of size field encodes min vs max and the actual size
@@ -9080,7 +9104,6 @@ static void disas_simd_across_lanes(DisasContext *s, uint32_t insn)
* precision.
*/
is_min = extract32(size, 1, 1);
- is_fp = true;
if (!is_u && dc_isar_feature(aa64_fp16, s)) {
size = 1;
} else if (!is_u || !is_q || extract32(size, 0, 1)) {
@@ -9091,6 +9114,10 @@ static void disas_simd_across_lanes(DisasContext *s, uint32_t insn)
}
break;
default:
+ case 0x3: /* SADDLV, UADDLV */
+ case 0xa: /* SMAXV, UMAXV */
+ case 0x1a: /* SMINV, UMINV */
+ case 0x1b: /* ADDV */
unallocated_encoding(s);
return;
}
@@ -9101,52 +9128,7 @@ static void disas_simd_across_lanes(DisasContext *s, uint32_t insn)
elements = (is_q ? 16 : 8) >> size;
- tcg_res = tcg_temp_new_i64();
- tcg_elt = tcg_temp_new_i64();
-
- /* These instructions operate across all lanes of a vector
- * to produce a single result. We can guarantee that a 64
- * bit intermediate is sufficient:
- * + for [US]ADDLV the maximum element size is 32 bits, and
- * the result type is 64 bits
- * + for FMAX*V, FMIN*V, ADDV the intermediate type is the
- * same as the element size, which is 32 bits at most
- * For the integer operations we can choose to work at 64
- * or 32 bits and truncate at the end; for simplicity
- * we use 64 bits always. The floating point
- * ops do require 32 bit intermediates, though.
- */
- if (!is_fp) {
- read_vec_element(s, tcg_res, rn, 0, size | (is_u ? 0 : MO_SIGN));
-
- for (i = 1; i < elements; i++) {
- read_vec_element(s, tcg_elt, rn, i, size | (is_u ? 0 : MO_SIGN));
-
- switch (opcode) {
- case 0x03: /* SADDLV / UADDLV */
- case 0x1b: /* ADDV */
- tcg_gen_add_i64(tcg_res, tcg_res, tcg_elt);
- break;
- case 0x0a: /* SMAXV / UMAXV */
- if (is_u) {
- tcg_gen_umax_i64(tcg_res, tcg_res, tcg_elt);
- } else {
- tcg_gen_smax_i64(tcg_res, tcg_res, tcg_elt);
- }
- break;
- case 0x1a: /* SMINV / UMINV */
- if (is_u) {
- tcg_gen_umin_i64(tcg_res, tcg_res, tcg_elt);
- } else {
- tcg_gen_smin_i64(tcg_res, tcg_res, tcg_elt);
- }
- break;
- default:
- g_assert_not_reached();
- }
-
- }
- } else {
+ {
/* Floating point vector reduction ops which work across 32
* bit (single) or 16 bit (half-precision) intermediates.
* Note that correct NaN propagation requires that we do these
@@ -9154,34 +9136,10 @@ static void disas_simd_across_lanes(DisasContext *s, uint32_t insn)
*/
TCGv_ptr fpst = fpstatus_ptr(size == MO_16 ? FPST_FPCR_F16 : FPST_FPCR);
int fpopcode = opcode | is_min << 4 | is_u << 5;
- TCGv_i32 tcg_res32 = do_reduction_op(s, fpopcode, rn, size,
- 0, elements, fpst);
- tcg_gen_extu_i32_i64(tcg_res, tcg_res32);
+ TCGv_i32 tcg_res = do_reduction_op(s, fpopcode, rn, size,
+ 0, elements, fpst);
+ write_fp_sreg(s, rd, tcg_res);
}
-
- /* Now truncate the result to the width required for the final output */
- if (opcode == 0x03) {
- /* SADDLV, UADDLV: result is 2*esize */
- size++;
- }
-
- switch (size) {
- case 0:
- tcg_gen_ext8u_i64(tcg_res, tcg_res);
- break;
- case 1:
- tcg_gen_ext16u_i64(tcg_res, tcg_res);
- break;
- case 2:
- tcg_gen_ext32u_i64(tcg_res, tcg_res);
- break;
- case 3:
- break;
- default:
- g_assert_not_reached();
- }
-
- write_fp_dreg(s, rd, tcg_res);
}
/* AdvSIMD modified immediate
diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index 5bd5603cd0..9c182eaff1 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -59,6 +59,8 @@
@rrr_q1e3 ........ ... rm:5 ...... rn:5 rd:5 &qrrr_e q=1 esz=3
@rrrr_q1e3 ........ ... rm:5 . ra:5 rn:5 rd:5 &qrrrr_e q=1 esz=3
+@qrr_e . q:1 ...... esz:2 ...... ...... rn:5 rd:5 &qrr_e
+
@qrrr_b . q:1 ...... ... rm:5 ...... rn:5 rd:5 &qrrr_e esz=0
@qrrr_h . q:1 ...... ... rm:5 ...... rn:5 rd:5 &qrrr_e esz=1
@qrrr_s . q:1 ...... ... rm:5 ...... rn:5 rd:5 &qrrr_e esz=2
@@ -1154,3 +1156,13 @@ TRN1 0.00 1110 .. 0 ..... 0 010 10 ..... ..... @qrrr_e
TRN2 0.00 1110 .. 0 ..... 0 110 10 ..... ..... @qrrr_e
ZIP1 0.00 1110 .. 0 ..... 0 011 10 ..... ..... @qrrr_e
ZIP2 0.00 1110 .. 0 ..... 0 111 10 ..... ..... @qrrr_e
+
+# Advanced SIMD Across Lanes
+
+ADDV 0.00 1110 .. 11000 11011 10 ..... ..... @qrr_e
+SADDLV 0.00 1110 .. 11000 00011 10 ..... ..... @qrr_e
+UADDLV 0.10 1110 .. 11000 00011 10 ..... ..... @qrr_e
+SMAXV 0.00 1110 .. 11000 01010 10 ..... ..... @qrr_e
+UMAXV 0.10 1110 .. 11000 01010 10 ..... ..... @qrr_e
+SMINV 0.00 1110 .. 11000 11010 10 ..... ..... @qrr_e
+UMINV 0.10 1110 .. 11000 11010 10 ..... ..... @qrr_e
--
2.43.0
* [PATCH 07/17] target/arm: Convert FMAXNMV, FMINNMV, FMAXV, FMINV to decodetree
2024-07-17 6:08 [PATCH 00/17] target/arm: AdvSIMD decodetree conversion, part 4 Richard Henderson
` (5 preceding siblings ...)
2024-07-17 6:08 ` [PATCH 06/17] target/arm: Convert ADDV, *ADDLV, *MAXV, *MINV to decodetree Richard Henderson
@ 2024-07-17 6:08 ` Richard Henderson
2024-07-17 6:08 ` [PATCH 08/17] target/arm: Convert FMOVI (scalar, immediate) " Richard Henderson
` (11 subsequent siblings)
18 siblings, 0 replies; 29+ messages in thread
From: Richard Henderson @ 2024-07-17 6:08 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/tcg/translate-a64.c | 176 ++++++++++-----------------------
target/arm/tcg/a64.decode | 14 +++
2 files changed, 67 insertions(+), 123 deletions(-)
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 055ba4695e..2964279c00 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -6794,6 +6794,59 @@ TRANS(UMAXV, do_int_reduction, a, false, 0, tcg_gen_umax_i64)
TRANS(SMINV, do_int_reduction, a, false, MO_SIGN, tcg_gen_smin_i64)
TRANS(UMINV, do_int_reduction, a, false, 0, tcg_gen_umin_i64)
+/*
+ * do_fp_reduction helper
+ *
+ * This mirrors the Reduce() pseudocode in the ARM ARM. It is
+ * important for correct NaN propagation that we do these
+ * operations in exactly the order specified by the pseudocode.
+ *
+ * This is a recursive function.
+ */
+static TCGv_i32 do_reduction_op(DisasContext *s, int rn, MemOp esz,
+ int ebase, int ecount, TCGv_ptr fpst,
+ NeonGenTwoSingleOpFn *fn)
+{
+ if (ecount == 1) {
+ TCGv_i32 tcg_elem = tcg_temp_new_i32();
+ read_vec_element_i32(s, tcg_elem, rn, ebase, esz);
+ return tcg_elem;
+ } else {
+ int half = ecount >> 1;
+ TCGv_i32 tcg_hi, tcg_lo, tcg_res;
+
+ tcg_hi = do_reduction_op(s, rn, esz, ebase + half, half, fpst, fn);
+ tcg_lo = do_reduction_op(s, rn, esz, ebase, half, fpst, fn);
+ tcg_res = tcg_temp_new_i32();
+
+ fn(tcg_res, tcg_lo, tcg_hi, fpst);
+ return tcg_res;
+ }
+}
+
+static bool do_fp_reduction(DisasContext *s, arg_qrr_e *a,
+ NeonGenTwoSingleOpFn *fn)
+{
+ if (fp_access_check(s)) {
+ MemOp esz = a->esz;
+ int elts = (a->q ? 16 : 8) >> esz;
+ TCGv_ptr fpst = fpstatus_ptr(esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR);
+ TCGv_i32 res = do_reduction_op(s, a->rn, esz, 0, elts, fpst, fn);
+ write_fp_sreg(s, a->rd, res);
+ }
+ return true;
+}
+
+TRANS_FEAT(FMAXNMV_h, aa64_fp16, do_fp_reduction, a, gen_helper_advsimd_maxnumh)
+TRANS_FEAT(FMINNMV_h, aa64_fp16, do_fp_reduction, a, gen_helper_advsimd_minnumh)
+TRANS_FEAT(FMAXV_h, aa64_fp16, do_fp_reduction, a, gen_helper_advsimd_maxh)
+TRANS_FEAT(FMINV_h, aa64_fp16, do_fp_reduction, a, gen_helper_advsimd_minh)
+
+TRANS(FMAXNMV_s, do_fp_reduction, a, gen_helper_vfp_maxnums)
+TRANS(FMINNMV_s, do_fp_reduction, a, gen_helper_vfp_minnums)
+TRANS(FMAXV_s, do_fp_reduction, a, gen_helper_vfp_maxs)
+TRANS(FMINV_s, do_fp_reduction, a, gen_helper_vfp_mins)
+
/* Shift a TCGv src by TCGv shift_amount, put result in dst.
* Note that it is the caller's responsibility to ensure that the
* shift amount is in range (ie 0..31 or 0..63) and provide the ARM
@@ -9020,128 +9073,6 @@ static void disas_data_proc_fp(DisasContext *s, uint32_t insn)
}
}
-/*
- * do_reduction_op helper
- *
- * This mirrors the Reduce() pseudocode in the ARM ARM. It is
- * important for correct NaN propagation that we do these
- * operations in exactly the order specified by the pseudocode.
- *
- * This is a recursive function.
- */
-static TCGv_i32 do_reduction_op(DisasContext *s, int fpopcode, int rn,
- MemOp esz, int ebase, int ecount, TCGv_ptr fpst)
-{
- if (ecount == 1) {
- TCGv_i32 tcg_elem = tcg_temp_new_i32();
- read_vec_element_i32(s, tcg_elem, rn, ebase, esz);
- return tcg_elem;
- } else {
- int half = ecount >> 1;
- TCGv_i32 tcg_hi, tcg_lo, tcg_res;
-
- tcg_hi = do_reduction_op(s, fpopcode, rn, esz,
- ebase + half, half, fpst);
- tcg_lo = do_reduction_op(s, fpopcode, rn, esz,
- ebase, half, fpst);
- tcg_res = tcg_temp_new_i32();
-
- switch (fpopcode) {
- case 0x0c: /* fmaxnmv half-precision */
- gen_helper_advsimd_maxnumh(tcg_res, tcg_lo, tcg_hi, fpst);
- break;
- case 0x0f: /* fmaxv half-precision */
- gen_helper_advsimd_maxh(tcg_res, tcg_lo, tcg_hi, fpst);
- break;
- case 0x1c: /* fminnmv half-precision */
- gen_helper_advsimd_minnumh(tcg_res, tcg_lo, tcg_hi, fpst);
- break;
- case 0x1f: /* fminv half-precision */
- gen_helper_advsimd_minh(tcg_res, tcg_lo, tcg_hi, fpst);
- break;
- case 0x2c: /* fmaxnmv */
- gen_helper_vfp_maxnums(tcg_res, tcg_lo, tcg_hi, fpst);
- break;
- case 0x2f: /* fmaxv */
- gen_helper_vfp_maxs(tcg_res, tcg_lo, tcg_hi, fpst);
- break;
- case 0x3c: /* fminnmv */
- gen_helper_vfp_minnums(tcg_res, tcg_lo, tcg_hi, fpst);
- break;
- case 0x3f: /* fminv */
- gen_helper_vfp_mins(tcg_res, tcg_lo, tcg_hi, fpst);
- break;
- default:
- g_assert_not_reached();
- }
- return tcg_res;
- }
-}
-
-/* AdvSIMD across lanes
- * 31 30 29 28 24 23 22 21 17 16 12 11 10 9 5 4 0
- * +---+---+---+-----------+------+-----------+--------+-----+------+------+
- * | 0 | Q | U | 0 1 1 1 0 | size | 1 1 0 0 0 | opcode | 1 0 | Rn | Rd |
- * +---+---+---+-----------+------+-----------+--------+-----+------+------+
- */
-static void disas_simd_across_lanes(DisasContext *s, uint32_t insn)
-{
- int rd = extract32(insn, 0, 5);
- int rn = extract32(insn, 5, 5);
- int size = extract32(insn, 22, 2);
- int opcode = extract32(insn, 12, 5);
- bool is_q = extract32(insn, 30, 1);
- bool is_u = extract32(insn, 29, 1);
- bool is_min = false;
- int elements;
-
- switch (opcode) {
- case 0xc: /* FMAXNMV, FMINNMV */
- case 0xf: /* FMAXV, FMINV */
- /* Bit 1 of size field encodes min vs max and the actual size
- * depends on the encoding of the U bit. If not set (and FP16
- * enabled) then we do half-precision float instead of single
- * precision.
- */
- is_min = extract32(size, 1, 1);
- if (!is_u && dc_isar_feature(aa64_fp16, s)) {
- size = 1;
- } else if (!is_u || !is_q || extract32(size, 0, 1)) {
- unallocated_encoding(s);
- return;
- } else {
- size = 2;
- }
- break;
- default:
- case 0x3: /* SADDLV, UADDLV */
- case 0xa: /* SMAXV, UMAXV */
- case 0x1a: /* SMINV, UMINV */
- case 0x1b: /* ADDV */
- unallocated_encoding(s);
- return;
- }
-
- if (!fp_access_check(s)) {
- return;
- }
-
- elements = (is_q ? 16 : 8) >> size;
-
- {
- /* Floating point vector reduction ops which work across 32
- * bit (single) or 16 bit (half-precision) intermediates.
- * Note that correct NaN propagation requires that we do these
- * operations in exactly the order specified by the pseudocode.
- */
- TCGv_ptr fpst = fpstatus_ptr(size == MO_16 ? FPST_FPCR_F16 : FPST_FPCR);
- int fpopcode = opcode | is_min << 4 | is_u << 5;
- TCGv_i32 tcg_res = do_reduction_op(s, fpopcode, rn, size,
- 0, elements, fpst);
- write_fp_sreg(s, rd, tcg_res);
- }
-}
-
/* AdvSIMD modified immediate
* 31 30 29 28 19 18 16 15 12 11 10 9 5 4 0
* +---+---+----+---------------------+-----+-------+----+---+-------+------+
@@ -11693,7 +11624,6 @@ static void disas_simd_two_reg_misc_fp16(DisasContext *s, uint32_t insn)
static const AArch64DecodeTable data_proc_simd[] = {
/* pattern , mask , fn */
{ 0x0e200800, 0x9f3e0c00, disas_simd_two_reg_misc },
- { 0x0e300800, 0x9f3e0c00, disas_simd_across_lanes },
/* simd_mod_imm decode is a subset of simd_shift_imm, so must precede it */
{ 0x0f000400, 0x9ff80400, disas_simd_mod_imm },
{ 0x0f000400, 0x9f800400, disas_simd_shift_imm },
diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index 9c182eaff1..117269803d 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -54,11 +54,13 @@
@rrx_d ........ .. . rm:5 .... idx:1 . rn:5 rd:5 &rrx_e esz=3
@rr_q1e0 ........ ........ ...... rn:5 rd:5 &qrr_e q=1 esz=0
+@rr_q1e2 ........ ........ ...... rn:5 rd:5 &qrr_e q=1 esz=2
@r2r_q1e0 ........ ........ ...... rm:5 rd:5 &qrrr_e rn=%rd q=1 esz=0
@rrr_q1e0 ........ ... rm:5 ...... rn:5 rd:5 &qrrr_e q=1 esz=0
@rrr_q1e3 ........ ... rm:5 ...... rn:5 rd:5 &qrrr_e q=1 esz=3
@rrrr_q1e3 ........ ... rm:5 . ra:5 rn:5 rd:5 &qrrrr_e q=1 esz=3
+@qrr_h . q:1 ...... .. ...... ...... rn:5 rd:5 &qrr_e esz=1
@qrr_e . q:1 ...... esz:2 ...... ...... rn:5 rd:5 &qrr_e
@qrrr_b . q:1 ...... ... rm:5 ...... rn:5 rd:5 &qrrr_e esz=0
@@ -1166,3 +1168,15 @@ SMAXV 0.00 1110 .. 11000 01010 10 ..... ..... @qrr_e
UMAXV 0.10 1110 .. 11000 01010 10 ..... ..... @qrr_e
SMINV 0.00 1110 .. 11000 11010 10 ..... ..... @qrr_e
UMINV 0.10 1110 .. 11000 11010 10 ..... ..... @qrr_e
+
+FMAXNMV_h 0.00 1110 00 11000 01100 10 ..... ..... @qrr_h
+FMAXNMV_s 0110 1110 00 11000 01100 10 ..... ..... @rr_q1e2
+
+FMINNMV_h 0.00 1110 10 11000 01100 10 ..... ..... @qrr_h
+FMINNMV_s 0110 1110 10 11000 01100 10 ..... ..... @rr_q1e2
+
+FMAXV_h 0.00 1110 00 11000 01111 10 ..... ..... @qrr_h
+FMAXV_s 0110 1110 00 11000 01111 10 ..... ..... @rr_q1e2
+
+FMINV_h 0.00 1110 10 11000 01111 10 ..... ..... @qrr_h
+FMINV_s 0110 1110 10 11000 01111 10 ..... ..... @rr_q1e2
--
2.43.0
* [PATCH 08/17] target/arm: Convert FMOVI (scalar, immediate) to decodetree
2024-07-17 6:08 [PATCH 00/17] target/arm: AdvSIMD decodetree conversion, part 4 Richard Henderson
` (6 preceding siblings ...)
2024-07-17 6:08 ` [PATCH 07/17] target/arm: Convert FMAXNMV, FMINNMV, FMAXV, FMINV " Richard Henderson
@ 2024-07-17 6:08 ` Richard Henderson
2024-07-17 10:00 ` Philippe Mathieu-Daudé
2024-07-17 6:08 ` [PATCH 09/17] target/arm: Convert MOVI, FMOV, ORR, BIC (vector " Richard Henderson
` (10 subsequent siblings)
18 siblings, 1 reply; 29+ messages in thread
From: Richard Henderson @ 2024-07-17 6:08 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/tcg/translate-a64.c | 74 ++++++++++++----------------------
target/arm/tcg/a64.decode | 4 ++
2 files changed, 30 insertions(+), 48 deletions(-)
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 2964279c00..6582816e4e 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -6847,6 +6847,31 @@ TRANS(FMINNMV_s, do_fp_reduction, a, gen_helper_vfp_minnums)
TRANS(FMAXV_s, do_fp_reduction, a, gen_helper_vfp_maxs)
TRANS(FMINV_s, do_fp_reduction, a, gen_helper_vfp_mins)
+/*
+ * Floating-point Immediate
+ */
+
+static bool trans_FMOVI_s(DisasContext *s, arg_FMOVI_s *a)
+{
+ switch (a->esz) {
+ case MO_32:
+ case MO_64:
+ break;
+ case MO_16:
+ if (!dc_isar_feature(aa64_fp16, s)) {
+ return false;
+ }
+ break;
+ default:
+ return false;
+ }
+ if (fp_access_check(s)) {
+ uint64_t imm = vfp_expand_imm(a->esz, a->imm);
+ write_fp_dreg(s, a->rd, tcg_constant_i64(imm));
+ }
+ return true;
+}
+
/* Shift a TCGv src by TCGv shift_amount, put result in dst.
* Note that it is the caller's responsibility to ensure that the
* shift amount is in range (ie 0..31 or 0..63) and provide the ARM
@@ -8584,53 +8609,6 @@ static void disas_fp_1src(DisasContext *s, uint32_t insn)
}
}
-/* Floating point immediate
- * 31 30 29 28 24 23 22 21 20 13 12 10 9 5 4 0
- * +---+---+---+-----------+------+---+------------+-------+------+------+
- * | M | 0 | S | 1 1 1 1 0 | type | 1 | imm8 | 1 0 0 | imm5 | Rd |
- * +---+---+---+-----------+------+---+------------+-------+------+------+
- */
-static void disas_fp_imm(DisasContext *s, uint32_t insn)
-{
- int rd = extract32(insn, 0, 5);
- int imm5 = extract32(insn, 5, 5);
- int imm8 = extract32(insn, 13, 8);
- int type = extract32(insn, 22, 2);
- int mos = extract32(insn, 29, 3);
- uint64_t imm;
- MemOp sz;
-
- if (mos || imm5) {
- unallocated_encoding(s);
- return;
- }
-
- switch (type) {
- case 0:
- sz = MO_32;
- break;
- case 1:
- sz = MO_64;
- break;
- case 3:
- sz = MO_16;
- if (dc_isar_feature(aa64_fp16, s)) {
- break;
- }
- /* fallthru */
- default:
- unallocated_encoding(s);
- return;
- }
-
- if (!fp_access_check(s)) {
- return;
- }
-
- imm = vfp_expand_imm(sz, imm8);
- write_fp_dreg(s, rd, tcg_constant_i64(imm));
-}
-
/* Handle floating point <=> fixed point conversions. Note that we can
* also deal with fp <=> integer conversions as a special case (scale == 64)
* OPTME: consider handling that special case specially or at least skipping
@@ -9050,7 +9028,7 @@ static void disas_data_proc_fp(DisasContext *s, uint32_t insn)
switch (ctz32(extract32(insn, 12, 4))) {
case 0: /* [15:12] == xxx1 */
/* Floating point immediate */
- disas_fp_imm(s, insn);
+ unallocated_encoding(s); /* in decodetree */
break;
case 1: /* [15:12] == xx10 */
/* Floating point compare */
diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index 117269803d..de763d3f12 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -1180,3 +1180,7 @@ FMAXV_s 0110 1110 00 11000 01111 10 ..... ..... @rr_q1e2
FMINV_h 0.00 1110 10 11000 01111 10 ..... ..... @qrr_h
FMINV_s 0110 1110 10 11000 01111 10 ..... ..... @rr_q1e2
+
+# Floating-point Immediate
+
+FMOVI_s 0001 1110 .. 1 imm:8 100 00000 rd:5 esz=%esz_hsd
--
2.43.0
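The `vfp_expand_imm(a->esz, a->imm)` call in `trans_FMOVI_s` above expands the 8-bit `abcdefgh` immediate per the VFPExpandImm pseudocode: sign from `a`, a small exponent from `bcd`, and a fraction `1.efgh`. A hedged sketch of the value it encodes, as a plain double rather than a packed bit pattern (the function name and the XOR-based exponent rewrite are mine, not QEMU's):

```c
#include <stdint.h>
#include <assert.h>

/* Illustrative expansion of the AArch64 8-bit FP immediate abcdefgh:
 * value = (-1)^a * (16 + efgh)/16 * 2^((bcd XOR 4) - 3),
 * i.e. n/16 * 2^r with n in 16..31 and r in -3..4. Sketch only. */
static double expand_fp_imm8(uint8_t imm8)
{
    int sign = imm8 >> 7;
    int exp  = (((imm8 >> 4) & 7) ^ 4) - 3;   /* unbiased, -3..4 */
    double v = (16 + (imm8 & 0xf)) / 16.0;    /* 1.0 .. 1.9375 */
    while (exp > 0) { v *= 2; exp--; }
    while (exp < 0) { v /= 2; exp++; }
    return sign ? -v : v;
}
```

For instance, `FMOV Sd, #1.0` encodes imm8 = 0x70.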
* [PATCH 09/17] target/arm: Convert MOVI, FMOV, ORR, BIC (vector immediate) to decodetree
2024-07-17 6:08 [PATCH 00/17] target/arm: AdvSIMD decodetree conversion, part 4 Richard Henderson
` (7 preceding siblings ...)
2024-07-17 6:08 ` [PATCH 08/17] target/arm: Convert FMOVI (scalar, immediate) " Richard Henderson
@ 2024-07-17 6:08 ` Richard Henderson
2024-07-17 6:08 ` [PATCH 10/17] target/arm: Introduce gen_gvec_sshr, gen_gvec_ushr Richard Henderson
` (9 subsequent siblings)
18 siblings, 0 replies; 29+ messages in thread
From: Richard Henderson @ 2024-07-17 6:08 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/tcg/translate-a64.c | 117 ++++++++++++++-------------------
target/arm/tcg/a64.decode | 9 +++
2 files changed, 59 insertions(+), 67 deletions(-)
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 6582816e4e..1fa9dc3172 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -6872,6 +6872,52 @@ static bool trans_FMOVI_s(DisasContext *s, arg_FMOVI_s *a)
return true;
}
+/*
+ * Advanced SIMD Modified Immediate
+ */
+
+static bool trans_FMOVI_v_h(DisasContext *s, arg_FMOVI_v_h *a)
+{
+ if (!dc_isar_feature(aa64_fp16, s)) {
+ return false;
+ }
+ if (fp_access_check(s)) {
+ tcg_gen_gvec_dup_imm(MO_16, vec_full_reg_offset(s, a->rd),
+ a->q ? 16 : 8, vec_full_reg_size(s),
+ vfp_expand_imm(MO_16, a->abcdefgh));
+ }
+ return true;
+}
+
+static void gen_movi(unsigned vece, uint32_t dofs, uint32_t aofs,
+ int64_t c, uint32_t oprsz, uint32_t maxsz)
+{
+ tcg_gen_gvec_dup_imm(MO_64, dofs, oprsz, maxsz, c);
+}
+
+static bool trans_Vimm(DisasContext *s, arg_Vimm *a)
+{
+ GVecGen2iFn *fn;
+
+ /* Handle decode of cmode/op here between ORR/BIC/MOVI */
+ if ((a->cmode & 1) && a->cmode < 12) {
+ /* For op=1, the imm will be inverted, so BIC becomes AND. */
+ fn = a->op ? tcg_gen_gvec_andi : tcg_gen_gvec_ori;
+ } else {
+ /* There is one unallocated cmode/op combination in this space */
+ if (a->cmode == 15 && a->op == 1 && a->q == 0) {
+ return false;
+ }
+ fn = gen_movi;
+ }
+
+ if (fp_access_check(s)) {
+ uint64_t imm = asimd_imm_const(a->abcdefgh, a->cmode, a->op);
+ gen_gvec_fn2i(s, a->q, a->rd, a->rd, imm, fn, MO_64);
+ }
+ return true;
+}
+
/* Shift a TCGv src by TCGv shift_amount, put result in dst.
* Note that it is the caller's responsibility to ensure that the
* shift amount is in range (ie 0..31 or 0..63) and provide the ARM
@@ -9051,69 +9097,6 @@ static void disas_data_proc_fp(DisasContext *s, uint32_t insn)
}
}
-/* AdvSIMD modified immediate
- * 31 30 29 28 19 18 16 15 12 11 10 9 5 4 0
- * +---+---+----+---------------------+-----+-------+----+---+-------+------+
- * | 0 | Q | op | 0 1 1 1 1 0 0 0 0 0 | abc | cmode | o2 | 1 | defgh | Rd |
- * +---+---+----+---------------------+-----+-------+----+---+-------+------+
- *
- * There are a number of operations that can be carried out here:
- * MOVI - move (shifted) imm into register
- * MVNI - move inverted (shifted) imm into register
- * ORR - bitwise OR of (shifted) imm with register
- * BIC - bitwise clear of (shifted) imm with register
- * With ARMv8.2 we also have:
- * FMOV half-precision
- */
-static void disas_simd_mod_imm(DisasContext *s, uint32_t insn)
-{
- int rd = extract32(insn, 0, 5);
- int cmode = extract32(insn, 12, 4);
- int o2 = extract32(insn, 11, 1);
- uint64_t abcdefgh = extract32(insn, 5, 5) | (extract32(insn, 16, 3) << 5);
- bool is_neg = extract32(insn, 29, 1);
- bool is_q = extract32(insn, 30, 1);
- uint64_t imm = 0;
-
- if (o2) {
- if (cmode != 0xf || is_neg) {
- unallocated_encoding(s);
- return;
- }
- /* FMOV (vector, immediate) - half-precision */
- if (!dc_isar_feature(aa64_fp16, s)) {
- unallocated_encoding(s);
- return;
- }
- imm = vfp_expand_imm(MO_16, abcdefgh);
- /* now duplicate across the lanes */
- imm = dup_const(MO_16, imm);
- } else {
- if (cmode == 0xf && is_neg && !is_q) {
- unallocated_encoding(s);
- return;
- }
- imm = asimd_imm_const(abcdefgh, cmode, is_neg);
- }
-
- if (!fp_access_check(s)) {
- return;
- }
-
- if (!((cmode & 0x9) == 0x1 || (cmode & 0xd) == 0x9)) {
- /* MOVI or MVNI, with MVNI negation handled above. */
- tcg_gen_gvec_dup_imm(MO_64, vec_full_reg_offset(s, rd), is_q ? 16 : 8,
- vec_full_reg_size(s), imm);
- } else {
- /* ORR or BIC, with BIC negation to AND handled above. */
- if (is_neg) {
- gen_gvec_fn2i(s, is_q, rd, rd, imm, tcg_gen_gvec_andi, MO_64);
- } else {
- gen_gvec_fn2i(s, is_q, rd, rd, imm, tcg_gen_gvec_ori, MO_64);
- }
- }
-}
-
/*
* Common SSHR[RA]/USHR[RA] - Shift right (optional rounding/accumulate)
*
@@ -10593,8 +10576,10 @@ static void disas_simd_shift_imm(DisasContext *s, uint32_t insn)
bool is_u = extract32(insn, 29, 1);
bool is_q = extract32(insn, 30, 1);
- /* data_proc_simd[] has sent immh == 0 to disas_simd_mod_imm. */
- assert(immh != 0);
+ if (immh == 0) {
+ unallocated_encoding(s);
+ return;
+ }
switch (opcode) {
case 0x08: /* SRI */
@@ -11602,8 +11587,6 @@ static void disas_simd_two_reg_misc_fp16(DisasContext *s, uint32_t insn)
static const AArch64DecodeTable data_proc_simd[] = {
/* pattern , mask , fn */
{ 0x0e200800, 0x9f3e0c00, disas_simd_two_reg_misc },
- /* simd_mod_imm decode is a subset of simd_shift_imm, so must precede it */
- { 0x0f000400, 0x9ff80400, disas_simd_mod_imm },
{ 0x0f000400, 0x9f800400, disas_simd_shift_imm },
{ 0x5e200800, 0xdf3e0c00, disas_simd_scalar_two_reg_misc },
{ 0x5f000400, 0xdf800400, disas_simd_scalar_shift_imm },
diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index de763d3f12..d4dfc5f772 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -1184,3 +1184,12 @@ FMINV_s 0110 1110 10 11000 01111 10 ..... ..... @rr_q1e2
# Floating-point Immediate
FMOVI_s 0001 1110 .. 1 imm:8 100 00000 rd:5 esz=%esz_hsd
+
+# Advanced SIMD Modified Immediate
+
+%abcdefgh 16:3 5:5
+
+FMOVI_v_h 0 q:1 00 1111 00000 ... 1111 11 ..... rd:5 %abcdefgh
+
+# MOVI, MVNI, ORR, BIC, FMOV are all intermixed via cmode.
+Vimm 0 q:1 op:1 0 1111 00000 ... cmode:4 01 ..... rd:5 %abcdefgh
--
2.43.0
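The `trans_Vimm` function above folds MOVI/MVNI/ORR/BIC into one pattern and splits them on cmode/op at translate time: odd cmode values below 12 are the logical forms, and the immediate inversion done by `asimd_imm_const` for op=1 turns BIC into a plain AND. The decision can be sketched as a small classifier (the enum and function names are illustrative, not QEMU identifiers):

```c
#include <assert.h>

enum vimm_kind { VIMM_ORR, VIMM_BIC, VIMM_MOVI, VIMM_RESERVED };

/* Sketch of the cmode/op split in trans_Vimm: odd cmode < 12 selects
 * ORR (op=0) or BIC (op=1); cmode=15,op=1,q=0 is the one reserved
 * combination; everything else is MOVI (MVNI handled by inverting
 * the expanded immediate). */
static enum vimm_kind classify_vimm(int cmode, int op, int q)
{
    if ((cmode & 1) && cmode < 12) {
        return op ? VIMM_BIC : VIMM_ORR;
    }
    if (cmode == 15 && op == 1 && q == 0) {
        return VIMM_RESERVED;
    }
    return VIMM_MOVI;
}
```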
* [PATCH 10/17] target/arm: Introduce gen_gvec_sshr, gen_gvec_ushr
2024-07-17 6:08 [PATCH 00/17] target/arm: AdvSIMD decodetree conversion, part 4 Richard Henderson
` (8 preceding siblings ...)
2024-07-17 6:08 ` [PATCH 09/17] target/arm: Convert MOVI, FMOV, ORR, BIC (vector " Richard Henderson
@ 2024-07-17 6:08 ` Richard Henderson
2024-07-17 6:08 ` [PATCH 11/17] target/arm: Fix whitespace near gen_srshr64_i64 Richard Henderson
` (8 subsequent siblings)
18 siblings, 0 replies; 29+ messages in thread
From: Richard Henderson @ 2024-07-17 6:08 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm
Handle the two special cases within these new
functions instead of higher in the call stack.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/tcg/translate.h | 5 +++++
target/arm/tcg/gengvec.c | 19 +++++++++++++++++++
target/arm/tcg/translate-a64.c | 16 +---------------
target/arm/tcg/translate-neon.c | 25 ++-----------------------
4 files changed, 27 insertions(+), 38 deletions(-)
diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h
index a8672c857c..d1a836ca6f 100644
--- a/target/arm/tcg/translate.h
+++ b/target/arm/tcg/translate.h
@@ -514,6 +514,11 @@ void gen_sqsub_d(TCGv_i64 d, TCGv_i64 q, TCGv_i64 a, TCGv_i64 b);
void gen_gvec_sqsub_qc(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz);
+void gen_gvec_sshr(unsigned vece, uint32_t rd_ofs, uint32_t rm_ofs,
+ int64_t shift, uint32_t opr_sz, uint32_t max_sz);
+void gen_gvec_ushr(unsigned vece, uint32_t rd_ofs, uint32_t rm_ofs,
+ int64_t shift, uint32_t opr_sz, uint32_t max_sz);
+
void gen_gvec_ssra(unsigned vece, uint32_t rd_ofs, uint32_t rm_ofs,
int64_t shift, uint32_t opr_sz, uint32_t max_sz);
void gen_gvec_usra(unsigned vece, uint32_t rd_ofs, uint32_t rm_ofs,
diff --git a/target/arm/tcg/gengvec.c b/target/arm/tcg/gengvec.c
index 56a1dc1f75..47ac2634ce 100644
--- a/target/arm/tcg/gengvec.c
+++ b/target/arm/tcg/gengvec.c
@@ -88,6 +88,25 @@ GEN_CMP0(gen_gvec_cgt0, TCG_COND_GT)
#undef GEN_CMP0
+void gen_gvec_sshr(unsigned vece, uint32_t rd_ofs, uint32_t rm_ofs,
+ int64_t shift, uint32_t opr_sz, uint32_t max_sz)
+{
+ /* Signed shift out of range results in all-sign-bits */
+ shift = MIN(shift, (8 << vece) - 1);
+ tcg_gen_gvec_sari(vece, rd_ofs, rm_ofs, shift, opr_sz, max_sz);
+}
+
+void gen_gvec_ushr(unsigned vece, uint32_t rd_ofs, uint32_t rm_ofs,
+ int64_t shift, uint32_t opr_sz, uint32_t max_sz)
+{
+ /* Unsigned shift out of range results in all-zero-bits */
+ if (shift >= (8 << vece)) {
+ tcg_gen_gvec_dup_imm(vece, rd_ofs, opr_sz, max_sz, 0);
+ } else {
+ tcg_gen_gvec_shri(vece, rd_ofs, rm_ofs, shift, opr_sz, max_sz);
+ }
+}
+
static void gen_ssra8_i64(TCGv_i64 d, TCGv_i64 a, int64_t shift)
{
tcg_gen_vec_sar8i_i64(a, a, shift);
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 1fa9dc3172..d0a3450d75 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -10411,21 +10411,7 @@ static void handle_vec_simd_shri(DisasContext *s, bool is_q, bool is_u,
break;
case 0x00: /* SSHR / USHR */
- if (is_u) {
- if (shift == 8 << size) {
- /* Shift count the same size as element size produces zero. */
- tcg_gen_gvec_dup_imm(size, vec_full_reg_offset(s, rd),
- is_q ? 16 : 8, vec_full_reg_size(s), 0);
- return;
- }
- gvec_fn = tcg_gen_gvec_shri;
- } else {
- /* Shift count the same size as element size produces all sign. */
- if (shift == 8 << size) {
- shift -= 1;
- }
- gvec_fn = tcg_gen_gvec_sari;
- }
+ gvec_fn = is_u ? gen_gvec_ushr : gen_gvec_sshr;
break;
case 0x04: /* SRSHR / URSHR (rounding) */
diff --git a/target/arm/tcg/translate-neon.c b/target/arm/tcg/translate-neon.c
index 915c9e56db..05d4016633 100644
--- a/target/arm/tcg/translate-neon.c
+++ b/target/arm/tcg/translate-neon.c
@@ -1068,29 +1068,8 @@ DO_2SH(VRSHR_S, gen_gvec_srshr)
DO_2SH(VRSHR_U, gen_gvec_urshr)
DO_2SH(VRSRA_S, gen_gvec_srsra)
DO_2SH(VRSRA_U, gen_gvec_ursra)
-
-static bool trans_VSHR_S_2sh(DisasContext *s, arg_2reg_shift *a)
-{
- /* Signed shift out of range results in all-sign-bits */
- a->shift = MIN(a->shift, (8 << a->size) - 1);
- return do_vector_2sh(s, a, tcg_gen_gvec_sari);
-}
-
-static void gen_zero_rd_2sh(unsigned vece, uint32_t rd_ofs, uint32_t rm_ofs,
- int64_t shift, uint32_t oprsz, uint32_t maxsz)
-{
- tcg_gen_gvec_dup_imm(vece, rd_ofs, oprsz, maxsz, 0);
-}
-
-static bool trans_VSHR_U_2sh(DisasContext *s, arg_2reg_shift *a)
-{
- /* Shift out of range is architecturally valid and results in zero. */
- if (a->shift >= (8 << a->size)) {
- return do_vector_2sh(s, a, gen_zero_rd_2sh);
- } else {
- return do_vector_2sh(s, a, tcg_gen_gvec_shri);
- }
-}
+DO_2SH(VSHR_S, gen_gvec_sshr)
+DO_2SH(VSHR_U, gen_gvec_ushr)
static bool do_2shift_env_64(DisasContext *s, arg_2reg_shift *a,
NeonGenTwo64OpEnvFn *fn)
--
2.43.0
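The two special cases that `gen_gvec_sshr` and `gen_gvec_ushr` now absorb are the architecturally valid shifts by the full element width: all sign bits for the signed form, zero for the unsigned form. A scalar model of that behaviour (hypothetical helper names; note that in C, right-shifting a negative value is implementation-defined, though arithmetic shift on the hosts QEMU targets):

```c
#include <stdint.h>
#include <assert.h>

/* Signed shift: a count of 'bits' or more is clamped to bits-1,
 * yielding an all-sign-bits result. */
static int64_t sshr_model(int64_t x, int shift, int bits)
{
    if (shift > bits - 1) {
        shift = bits - 1;
    }
    return x >> shift;
}

/* Unsigned shift: a count of 'bits' shifts everything out -> zero. */
static uint64_t ushr_model(uint64_t x, int shift, int bits)
{
    if (shift >= bits) {
        return 0;
    }
    return x >> shift;
}
```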
* [PATCH 11/17] target/arm: Fix whitespace near gen_srshr64_i64
2024-07-17 6:08 [PATCH 00/17] target/arm: AdvSIMD decodetree conversion, part 4 Richard Henderson
` (9 preceding siblings ...)
2024-07-17 6:08 ` [PATCH 10/17] target/arm: Introduce gen_gvec_sshr, gen_gvec_ushr Richard Henderson
@ 2024-07-17 6:08 ` Richard Henderson
2024-07-17 10:00 ` Philippe Mathieu-Daudé
2024-07-17 6:08 ` [PATCH 12/17] target/arm: Convert handle_vec_simd_shri to decodetree Richard Henderson
` (7 subsequent siblings)
18 siblings, 1 reply; 29+ messages in thread
From: Richard Henderson @ 2024-07-17 6:08 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/tcg/gengvec.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/target/arm/tcg/gengvec.c b/target/arm/tcg/gengvec.c
index 47ac2634ce..b6c0d86bad 100644
--- a/target/arm/tcg/gengvec.c
+++ b/target/arm/tcg/gengvec.c
@@ -304,7 +304,7 @@ void gen_srshr32_i32(TCGv_i32 d, TCGv_i32 a, int32_t sh)
tcg_gen_add_i32(d, d, t);
}
- void gen_srshr64_i64(TCGv_i64 d, TCGv_i64 a, int64_t sh)
+void gen_srshr64_i64(TCGv_i64 d, TCGv_i64 a, int64_t sh)
{
TCGv_i64 t = tcg_temp_new_i64();
--
2.43.0
* [PATCH 12/17] target/arm: Convert handle_vec_simd_shri to decodetree
2024-07-17 6:08 [PATCH 00/17] target/arm: AdvSIMD decodetree conversion, part 4 Richard Henderson
` (10 preceding siblings ...)
2024-07-17 6:08 ` [PATCH 11/17] target/arm: Fix whitespace near gen_srshr64_i64 Richard Henderson
@ 2024-07-17 6:08 ` Richard Henderson
2024-08-12 13:14 ` Peter Maydell
2024-07-17 6:08 ` [PATCH 13/17] target/arm: Convert handle_vec_simd_shli " decodetree Richard Henderson
` (6 subsequent siblings)
18 siblings, 1 reply; 29+ messages in thread
From: Richard Henderson @ 2024-07-17 6:08 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm
This includes SSHR, USHR, SSRA, USRA, SRSHR, URSHR, SRSRA, URSRA, SRI.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/tcg/translate-a64.c | 109 +++++++++++++++------------------
target/arm/tcg/a64.decode | 27 +++++++-
2 files changed, 74 insertions(+), 62 deletions(-)
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index d0a3450d75..1e482477c5 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -68,6 +68,22 @@ static int scale_by_log2_tag_granule(DisasContext *s, int x)
return x << LOG2_TAG_GRANULE;
}
+/*
+ * For Advanced SIMD shift by immediate, extract esz from immh.
+ * The result must be validated by the translator: MO_8 <= x <= MO_64.
+ */
+static int esz_immh(DisasContext *s, int x)
+{
+ return 32 - clz32(x) - 1;
+}
+
+/* For Advanced SIMD shift by immediate, right shift count. */
+static int rcount_immhb(DisasContext *s, int x)
+{
+ int size = esz_immh(s, x >> 3);
+ return (16 << size) - x;
+}
+
/*
* Include the generated decoders.
*/
@@ -6918,6 +6934,35 @@ static bool trans_Vimm(DisasContext *s, arg_Vimm *a)
return true;
}
+/*
+ * Advanced SIMD Shift by Immediate
+ */
+
+static bool do_vec_shift_imm(DisasContext *s, arg_qrri_e *a, GVecGen2iFn *fn)
+{
+ /* Validate result of esz_immh, for invalid immh == 0. */
+ if (a->esz < 0) {
+ return false;
+ }
+ if (a->esz == MO_64 && !a->q) {
+ return false;
+ }
+ if (fp_access_check(s)) {
+ gen_gvec_fn2i(s, a->q, a->rd, a->rn, a->imm, fn, a->esz);
+ }
+ return true;
+}
+
+TRANS(SSHR_v, do_vec_shift_imm, a, gen_gvec_sshr)
+TRANS(USHR_v, do_vec_shift_imm, a, gen_gvec_ushr)
+TRANS(SSRA_v, do_vec_shift_imm, a, gen_gvec_ssra)
+TRANS(USRA_v, do_vec_shift_imm, a, gen_gvec_usra)
+TRANS(SRSHR_v, do_vec_shift_imm, a, gen_gvec_srshr)
+TRANS(URSHR_v, do_vec_shift_imm, a, gen_gvec_urshr)
+TRANS(SRSRA_v, do_vec_shift_imm, a, gen_gvec_srsra)
+TRANS(URSRA_v, do_vec_shift_imm, a, gen_gvec_ursra)
+TRANS(SRI_v, do_vec_shift_imm, a, gen_gvec_sri)
+
/* Shift a TCGv src by TCGv shift_amount, put result in dst.
* Note that it is the caller's responsibility to ensure that the
* shift amount is in range (ie 0..31 or 0..63) and provide the ARM
@@ -10382,53 +10427,6 @@ static void disas_simd_scalar_two_reg_misc(DisasContext *s, uint32_t insn)
}
}
-/* SSHR[RA]/USHR[RA] - Vector shift right (optional rounding/accumulate) */
-static void handle_vec_simd_shri(DisasContext *s, bool is_q, bool is_u,
- int immh, int immb, int opcode, int rn, int rd)
-{
- int size = 32 - clz32(immh) - 1;
- int immhb = immh << 3 | immb;
- int shift = 2 * (8 << size) - immhb;
- GVecGen2iFn *gvec_fn;
-
- if (extract32(immh, 3, 1) && !is_q) {
- unallocated_encoding(s);
- return;
- }
- tcg_debug_assert(size <= 3);
-
- if (!fp_access_check(s)) {
- return;
- }
-
- switch (opcode) {
- case 0x02: /* SSRA / USRA (accumulate) */
- gvec_fn = is_u ? gen_gvec_usra : gen_gvec_ssra;
- break;
-
- case 0x08: /* SRI */
- gvec_fn = gen_gvec_sri;
- break;
-
- case 0x00: /* SSHR / USHR */
- gvec_fn = is_u ? gen_gvec_ushr : gen_gvec_sshr;
- break;
-
- case 0x04: /* SRSHR / URSHR (rounding) */
- gvec_fn = is_u ? gen_gvec_urshr : gen_gvec_srshr;
- break;
-
- case 0x06: /* SRSRA / URSRA (accum + rounding) */
- gvec_fn = is_u ? gen_gvec_ursra : gen_gvec_srsra;
- break;
-
- default:
- g_assert_not_reached();
- }
-
- gen_gvec_fn2i(s, is_q, rd, rn, shift, gvec_fn, size);
-}
-
/* SHL/SLI - Vector shift left */
static void handle_vec_simd_shli(DisasContext *s, bool is_q, bool insert,
int immh, int immb, int opcode, int rn, int rd)
@@ -10568,18 +10566,6 @@ static void disas_simd_shift_imm(DisasContext *s, uint32_t insn)
}
switch (opcode) {
- case 0x08: /* SRI */
- if (!is_u) {
- unallocated_encoding(s);
- return;
- }
- /* fall through */
- case 0x00: /* SSHR / USHR */
- case 0x02: /* SSRA / USRA (accumulate) */
- case 0x04: /* SRSHR / URSHR (rounding) */
- case 0x06: /* SRSRA / URSRA (accum + rounding) */
- handle_vec_simd_shri(s, is_q, is_u, immh, immb, opcode, rn, rd);
- break;
case 0x0a: /* SHL / SLI */
handle_vec_simd_shli(s, is_q, is_u, immh, immb, opcode, rn, rd);
break;
@@ -10618,6 +10604,11 @@ static void disas_simd_shift_imm(DisasContext *s, uint32_t insn)
handle_simd_shift_fpint_conv(s, false, is_q, is_u, immh, immb, rn, rd);
return;
default:
+ case 0x00: /* SSHR / USHR */
+ case 0x02: /* SSRA / USRA (accumulate) */
+ case 0x04: /* SRSHR / URSHR (rounding) */
+ case 0x06: /* SRSRA / URSRA (accum + rounding) */
+ case 0x08: /* SRI */
unallocated_encoding(s);
return;
}
diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index d4dfc5f772..c525f5fc35 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -34,6 +34,7 @@
&rrx_e rd rn rm idx esz
&rrrr_e rd rn rm ra esz
&qrr_e q rd rn esz
+&qrri_e q rd rn imm esz
&qrrr_e q rd rn rm esz
&qrrx_e q rd rn rm idx esz
&qrrrr_e q rd rn rm ra esz
@@ -1185,11 +1186,31 @@ FMINV_s 0110 1110 10 11000 01111 10 ..... ..... @rr_q1e2
FMOVI_s 0001 1110 .. 1 imm:8 100 00000 rd:5 esz=%esz_hsd
-# Advanced SIMD Modified Immediate
+# Advanced SIMD Modified Immediate / Shift by Immediate
%abcdefgh 16:3 5:5
+%esz_immh 19:4 !function=esz_immh
+%rcount_immhb 16:7 !function=rcount_immhb
+
+@qrshifti . q:1 .. ..... .... ... ..... . rn:5 rd:5 \
+ &qrri_e esz=%esz_immh imm=%rcount_immhb
FMOVI_v_h 0 q:1 00 1111 00000 ... 1111 11 ..... rd:5 %abcdefgh
-# MOVI, MVNI, ORR, BIC, FMOV are all intermixed via cmode.
-Vimm 0 q:1 op:1 0 1111 00000 ... cmode:4 01 ..... rd:5 %abcdefgh
+{
+ # MOVI, MVNI, ORR, BIC, FMOV are all intermixed via cmode.
+ Vimm 0 q:1 op:1 0 1111 00000 ... cmode:4 01 ..... rd:5 %abcdefgh
+
+ # Shift by immediate requires immh==0, consumed by Vimm above.
+ [
+ SSHR_v 0.00 11110 .... ... 00000 1 ..... ..... @qrshifti
+ USHR_v 0.10 11110 .... ... 00000 1 ..... ..... @qrshifti
+ SSRA_v 0.00 11110 .... ... 00010 1 ..... ..... @qrshifti
+ USRA_v 0.10 11110 .... ... 00010 1 ..... ..... @qrshifti
+ SRSHR_v 0.00 11110 .... ... 00100 1 ..... ..... @qrshifti
+ URSHR_v 0.10 11110 .... ... 00100 1 ..... ..... @qrshifti
+ SRSRA_v 0.00 11110 .... ... 00110 1 ..... ..... @qrshifti
+ URSRA_v 0.10 11110 .... ... 00110 1 ..... ..... @qrshifti
+ SRI_v 0.10 11110 .... ... 01000 1 ..... ..... @qrshifti
+ ]
+}
--
2.43.0
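The `%esz_immh` and `%rcount_immhb` decode functions introduced in this patch recover the element size from the position of the leading 1 in the 4-bit immh field, and the right-shift count as `(2 * esize) - immh:immb`. A standalone sketch of the same arithmetic (using `__builtin_clz` in place of QEMU's `clz32`; helper names are mine):

```c
#include <assert.h>

/* Element size from immh: position of its leading 1 bit.
 * immh must be non-zero; immh==0 is rejected by the translator. */
static int esz_from_immh(int immh)
{
    return 32 - __builtin_clz(immh) - 1;
}

/* Right-shift count from the 7-bit immh:immb field:
 * shift = (2 * esize) - immhb, giving 1..esize. */
static int rshift_from_immhb(int immhb)
{
    int esz = esz_from_immh(immhb >> 3);
    return (16 << esz) - immhb;
}
```

For example, immh=0001, immb=001 (immhb=9) decodes as a byte-element shift right by 7, and immh=1000, immb=000 (immhb=64) as a 64-bit shift right by 64.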
* [PATCH 13/17] target/arm: Convert handle_vec_simd_shli to decodetree
2024-07-17 6:08 [PATCH 00/17] target/arm: AdvSIMD decodetree conversion, part 4 Richard Henderson
` (11 preceding siblings ...)
2024-07-17 6:08 ` [PATCH 12/17] target/arm: Convert handle_vec_simd_shri to decodetree Richard Henderson
@ 2024-07-17 6:08 ` Richard Henderson
2024-07-17 6:09 ` [PATCH 14/17] target/arm: Clear high SVE elements in handle_vec_simd_wshli Richard Henderson
` (5 subsequent siblings)
18 siblings, 0 replies; 29+ messages in thread
From: Richard Henderson @ 2024-07-17 6:08 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm
This includes SHL and SLI.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/tcg/translate-a64.c | 40 +++++++++-------------------------
target/arm/tcg/a64.decode | 6 +++++
2 files changed, 16 insertions(+), 30 deletions(-)
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 1e482477c5..fd90752dee 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -84,6 +84,13 @@ static int rcount_immhb(DisasContext *s, int x)
return (16 << size) - x;
}
+/* For Advanced SIMD shift by immediate, left shift count. */
+static int lcount_immhb(DisasContext *s, int x)
+{
+ int size = esz_immh(s, x >> 3);
+ return x - (8 << size);
+}
+
/*
* Include the generated decoders.
*/
@@ -6962,6 +6969,8 @@ TRANS(URSHR_v, do_vec_shift_imm, a, gen_gvec_urshr)
TRANS(SRSRA_v, do_vec_shift_imm, a, gen_gvec_srsra)
TRANS(URSRA_v, do_vec_shift_imm, a, gen_gvec_ursra)
TRANS(SRI_v, do_vec_shift_imm, a, gen_gvec_sri)
+TRANS(SHL_v, do_vec_shift_imm, a, tcg_gen_gvec_shli)
+TRANS(SLI_v, do_vec_shift_imm, a, gen_gvec_sli)
/* Shift a TCGv src by TCGv shift_amount, put result in dst.
* Note that it is the caller's responsibility to ensure that the
@@ -10427,33 +10436,6 @@ static void disas_simd_scalar_two_reg_misc(DisasContext *s, uint32_t insn)
}
}
-/* SHL/SLI - Vector shift left */
-static void handle_vec_simd_shli(DisasContext *s, bool is_q, bool insert,
- int immh, int immb, int opcode, int rn, int rd)
-{
- int size = 32 - clz32(immh) - 1;
- int immhb = immh << 3 | immb;
- int shift = immhb - (8 << size);
-
- /* Range of size is limited by decode: immh is a non-zero 4 bit field */
- assert(size >= 0 && size <= 3);
-
- if (extract32(immh, 3, 1) && !is_q) {
- unallocated_encoding(s);
- return;
- }
-
- if (!fp_access_check(s)) {
- return;
- }
-
- if (insert) {
- gen_gvec_fn2i(s, is_q, rd, rn, shift, gen_gvec_sli, size);
- } else {
- gen_gvec_fn2i(s, is_q, rd, rn, shift, tcg_gen_gvec_shli, size);
- }
-}
-
/* USHLL/SHLL - Vector shift left with widening */
static void handle_vec_simd_wshli(DisasContext *s, bool is_q, bool is_u,
int immh, int immb, int opcode, int rn, int rd)
@@ -10566,9 +10548,6 @@ static void disas_simd_shift_imm(DisasContext *s, uint32_t insn)
}
switch (opcode) {
- case 0x0a: /* SHL / SLI */
- handle_vec_simd_shli(s, is_q, is_u, immh, immb, opcode, rn, rd);
- break;
case 0x10: /* SHRN */
case 0x11: /* RSHRN / SQRSHRUN */
if (is_u) {
@@ -10609,6 +10588,7 @@ static void disas_simd_shift_imm(DisasContext *s, uint32_t insn)
case 0x04: /* SRSHR / URSHR (rounding) */
case 0x06: /* SRSRA / URSRA (accum + rounding) */
case 0x08: /* SRI */
+ case 0x0a: /* SHL / SLI */
unallocated_encoding(s);
return;
}
diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index c525f5fc35..6aa8a18240 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -1191,9 +1191,12 @@ FMOVI_s 0001 1110 .. 1 imm:8 100 00000 rd:5 esz=%esz_hsd
%abcdefgh 16:3 5:5
%esz_immh 19:4 !function=esz_immh
%rcount_immhb 16:7 !function=rcount_immhb
+%lcount_immhb 16:7 !function=lcount_immhb
@qrshifti . q:1 .. ..... .... ... ..... . rn:5 rd:5 \
&qrri_e esz=%esz_immh imm=%rcount_immhb
+@qlshifti . q:1 .. ..... .... ... ..... . rn:5 rd:5 \
+ &qrri_e esz=%esz_immh imm=%lcount_immhb
FMOVI_v_h 0 q:1 00 1111 00000 ... 1111 11 ..... rd:5 %abcdefgh
@@ -1212,5 +1215,8 @@ FMOVI_v_h 0 q:1 00 1111 00000 ... 1111 11 ..... rd:5 %abcdefgh
SRSRA_v 0.00 11110 .... ... 00110 1 ..... ..... @qrshifti
URSRA_v 0.10 11110 .... ... 00110 1 ..... ..... @qrshifti
SRI_v 0.10 11110 .... ... 01000 1 ..... ..... @qrshifti
+
+ SHL_v 0.00 11110 .... ... 01010 1 ..... ..... @qlshifti
+ SLI_v 0.10 11110 .... ... 01010 1 ..... ..... @qlshifti
]
}
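[To make the SHL/SLI distinction concrete, here is a hedged per-element model; the helper names are invented for exposition. SHL is a plain left shift, while SLI shifts the source left and inserts it into the destination, preserving the destination's low `shift` bits.]

```python
def shl_elt(src: int, shift: int, esize: int) -> int:
    # Plain left shift, truncated to the element width.
    return (src << shift) & ((1 << esize) - 1)

def sli_elt(dst: int, src: int, shift: int, esize: int) -> int:
    # Shift left and insert: bits below `shift` keep the old dst value.
    mask = ((1 << esize) - 1) & ~((1 << shift) - 1)
    return (dst & ~mask) | ((src << shift) & mask)
```

With shift == 0, SLI replaces the element entirely, which is why the same decode covers both cases.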
--
2.43.0
* [PATCH 14/17] target/arm: Clear high SVE elements in handle_vec_simd_wshli
2024-07-17 6:08 [PATCH 00/17] target/arm: AdvSIMD decodetree conversion, part 4 Richard Henderson
` (12 preceding siblings ...)
2024-07-17 6:08 ` [PATCH 13/17] target/arm: Convert handle_vec_simd_shli " Richard Henderson
@ 2024-07-17 6:09 ` Richard Henderson
2024-07-17 6:09 ` [PATCH 15/17] target/arm: Use {,s}extract " Richard Henderson
` (4 subsequent siblings)
18 siblings, 0 replies; 29+ messages in thread
From: Richard Henderson @ 2024-07-17 6:09 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm, qemu-stable
AdvSIMD instructions are supposed to zero bits beyond 128.
Affects SSHLL, USHLL, SSHLL2, USHLL2.
Cc: qemu-stable@nongnu.org
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/tcg/translate-a64.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index fd90752dee..d0ad6c90bc 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -10471,6 +10471,7 @@ static void handle_vec_simd_wshli(DisasContext *s, bool is_q, bool is_u,
tcg_gen_shli_i64(tcg_rd, tcg_rd, shift);
write_vec_element(s, tcg_rd, rd, i, size + 1);
}
+ clear_vec_high(s, true, rd);
}
/* SHRN/RSHRN - Shift right with narrowing (and potential rounding) */
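[A hedged sketch of the bug this one-liner fixes, modelling a vector register wider than 128 bits as a Python integer; the helper names are invented for illustration. AdvSIMD writes must zero everything above bit 127, but the widening-shift loop only wrote the low 128 bits, leaving stale SVE state behind.]

```python
V128_MASK = (1 << 128) - 1

def write_result_fixed(vreg: int, result128: int) -> int:
    # With clear_vec_high: everything above bit 127 is zeroed.
    return result128 & V128_MASK

def write_result_buggy(vreg: int, result128: int) -> int:
    # Without it: stale SVE state above bit 127 survives the write.
    return (vreg & ~V128_MASK) | (result128 & V128_MASK)
```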
--
2.43.0
* [PATCH 15/17] target/arm: Use {,s}extract in handle_vec_simd_wshli
2024-07-17 6:08 [PATCH 00/17] target/arm: AdvSIMD decodetree conversion, part 4 Richard Henderson
` (13 preceding siblings ...)
2024-07-17 6:09 ` [PATCH 14/17] target/arm: Clear high SVE elements in handle_vec_simd_wshli Richard Henderson
@ 2024-07-17 6:09 ` Richard Henderson
2024-07-17 6:09 ` [PATCH 16/17] target/arm: Convert SSHLL, USHLL to decodetree Richard Henderson
` (3 subsequent siblings)
18 siblings, 0 replies; 29+ messages in thread
From: Richard Henderson @ 2024-07-17 6:09 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm
Combine the right shift with the extension via
the tcg extract operations.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/tcg/translate-a64.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index d0ad6c90bc..627d4311bb 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -10466,8 +10466,11 @@ static void handle_vec_simd_wshli(DisasContext *s, bool is_q, bool is_u,
read_vec_element(s, tcg_rn, rn, is_q ? 1 : 0, MO_64);
for (i = 0; i < elements; i++) {
- tcg_gen_shri_i64(tcg_rd, tcg_rn, i * esize);
- ext_and_shift_reg(tcg_rd, tcg_rd, size | (!is_u << 2), 0);
+ if (is_u) {
+ tcg_gen_extract_i64(tcg_rd, tcg_rn, i * esize, esize);
+ } else {
+ tcg_gen_sextract_i64(tcg_rd, tcg_rn, i * esize, esize);
+ }
tcg_gen_shli_i64(tcg_rd, tcg_rd, shift);
write_vec_element(s, tcg_rd, rd, i, size + 1);
}
--
2.43.0
* [PATCH 16/17] target/arm: Convert SSHLL, USHLL to decodetree
2024-07-17 6:08 [PATCH 00/17] target/arm: AdvSIMD decodetree conversion, part 4 Richard Henderson
` (14 preceding siblings ...)
2024-07-17 6:09 ` [PATCH 15/17] target/arm: Use {,s}extract " Richard Henderson
@ 2024-07-17 6:09 ` Richard Henderson
2024-07-17 6:09 ` [PATCH 17/17] target/arm: Push tcg_rnd into handle_shri_with_rndacc Richard Henderson
` (2 subsequent siblings)
18 siblings, 0 replies; 29+ messages in thread
From: Richard Henderson @ 2024-07-17 6:09 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/tcg/translate-a64.c | 84 ++++++++++++++++------------------
target/arm/tcg/a64.decode | 3 ++
2 files changed, 43 insertions(+), 44 deletions(-)
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 627d4311bb..2a9cb3fbe0 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -6972,6 +6972,45 @@ TRANS(SRI_v, do_vec_shift_imm, a, gen_gvec_sri)
TRANS(SHL_v, do_vec_shift_imm, a, tcg_gen_gvec_shli)
TRANS(SLI_v, do_vec_shift_imm, a, gen_gvec_sli)
+static bool do_vec_shift_imm_wide(DisasContext *s, arg_qrri_e *a, bool is_u)
+{
+ TCGv_i64 tcg_rn, tcg_rd;
+ int esz = a->esz;
+ int esize;
+
+ if (esz < 0 || esz >= MO_64) {
+ return false;
+ }
+ if (!fp_access_check(s)) {
+ return true;
+ }
+
+ /*
+ * For the LL variants the store is larger than the load,
+ * so if rd == rn we would overwrite parts of our input.
+ * So load everything right now and use shifts in the main loop.
+ */
+ tcg_rd = tcg_temp_new_i64();
+ tcg_rn = tcg_temp_new_i64();
+ read_vec_element(s, tcg_rn, a->rn, a->q, MO_64);
+
+ esize = 8 << esz;
+ for (int i = 0, elements = 8 >> esz; i < elements; i++) {
+ if (is_u) {
+ tcg_gen_extract_i64(tcg_rd, tcg_rn, i * esize, esize);
+ } else {
+ tcg_gen_sextract_i64(tcg_rd, tcg_rn, i * esize, esize);
+ }
+ tcg_gen_shli_i64(tcg_rd, tcg_rd, a->imm);
+ write_vec_element(s, tcg_rd, a->rd, i, esz + 1);
+ }
+ clear_vec_high(s, true, a->rd);
+ return true;
+}
+
+TRANS(SSHLL_v, do_vec_shift_imm_wide, a, false)
+TRANS(USHLL_v, do_vec_shift_imm_wide, a, true)
+
/* Shift a TCGv src by TCGv shift_amount, put result in dst.
* Note that it is the caller's responsibility to ensure that the
* shift amount is in range (ie 0..31 or 0..63) and provide the ARM
@@ -10436,47 +10475,6 @@ static void disas_simd_scalar_two_reg_misc(DisasContext *s, uint32_t insn)
}
}
-/* USHLL/SHLL - Vector shift left with widening */
-static void handle_vec_simd_wshli(DisasContext *s, bool is_q, bool is_u,
- int immh, int immb, int opcode, int rn, int rd)
-{
- int size = 32 - clz32(immh) - 1;
- int immhb = immh << 3 | immb;
- int shift = immhb - (8 << size);
- int dsize = 64;
- int esize = 8 << size;
- int elements = dsize/esize;
- TCGv_i64 tcg_rn = tcg_temp_new_i64();
- TCGv_i64 tcg_rd = tcg_temp_new_i64();
- int i;
-
- if (size >= 3) {
- unallocated_encoding(s);
- return;
- }
-
- if (!fp_access_check(s)) {
- return;
- }
-
- /* For the LL variants the store is larger than the load,
- * so if rd == rn we would overwrite parts of our input.
- * So load everything right now and use shifts in the main loop.
- */
- read_vec_element(s, tcg_rn, rn, is_q ? 1 : 0, MO_64);
-
- for (i = 0; i < elements; i++) {
- if (is_u) {
- tcg_gen_extract_i64(tcg_rd, tcg_rn, i * esize, esize);
- } else {
- tcg_gen_sextract_i64(tcg_rd, tcg_rn, i * esize, esize);
- }
- tcg_gen_shli_i64(tcg_rd, tcg_rd, shift);
- write_vec_element(s, tcg_rd, rd, i, size + 1);
- }
- clear_vec_high(s, true, rd);
-}
-
/* SHRN/RSHRN - Shift right with narrowing (and potential rounding) */
static void handle_vec_simd_shrn(DisasContext *s, bool is_q,
int immh, int immb, int opcode, int rn, int rd)
@@ -10566,9 +10564,6 @@ static void disas_simd_shift_imm(DisasContext *s, uint32_t insn)
handle_vec_simd_sqshrn(s, false, is_q, is_u, is_u, immh, immb,
opcode, rn, rd);
break;
- case 0x14: /* SSHLL / USHLL */
- handle_vec_simd_wshli(s, is_q, is_u, immh, immb, opcode, rn, rd);
- break;
case 0x1c: /* SCVTF / UCVTF */
handle_simd_shift_intfp_conv(s, false, is_q, is_u, immh, immb,
opcode, rn, rd);
@@ -10593,6 +10588,7 @@ static void disas_simd_shift_imm(DisasContext *s, uint32_t insn)
case 0x06: /* SRSRA / URSRA (accum + rounding) */
case 0x08: /* SRI */
case 0x0a: /* SHL / SLI */
+ case 0x14: /* SSHLL / USHLL */
unallocated_encoding(s);
return;
}
diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index 6aa8a18240..d13d680589 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -1218,5 +1218,8 @@ FMOVI_v_h 0 q:1 00 1111 00000 ... 1111 11 ..... rd:5 %abcdefgh
SHL_v 0.00 11110 .... ... 01010 1 ..... ..... @qlshifti
SLI_v 0.10 11110 .... ... 01010 1 ..... ..... @qlshifti
+
+ SSHLL_v 0.00 11110 .... ... 10100 1 ..... ..... @qlshifti
+ USHLL_v 0.10 11110 .... ... 10100 1 ..... ..... @qlshifti
]
}
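[As a hedged functional model of the converted instructions — an exposition aid, not the QEMU code: SSHLL/USHLL take one 64-bit half of the source, widen each esize-bit element to 2*esize bits (sign- or zero-extending), and shift it left.]

```python
def ushll(src64: int, esz: int, shift: int) -> int:
    # Unsigned widening left shift: zero-extend each element, then shift.
    esize = 8 << esz
    out = 0
    for i in range(8 >> esz):
        elt = (src64 >> (i * esize)) & ((1 << esize) - 1)
        out |= (elt << shift) << (i * 2 * esize)
    return out

def sshll(src64: int, esz: int, shift: int) -> int:
    # Signed widening left shift: sign-extend each element, then shift.
    esize = 8 << esz
    out = 0
    for i in range(8 >> esz):
        elt = (src64 >> (i * esize)) & ((1 << esize) - 1)
        elt -= (elt & (1 << (esize - 1))) << 1  # sign-extend to a Python int
        out |= ((elt << shift) & ((1 << 2 * esize) - 1)) << (i * 2 * esize)
    return out
```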
--
2.43.0
* [PATCH 17/17] target/arm: Push tcg_rnd into handle_shri_with_rndacc
2024-07-17 6:08 [PATCH 00/17] target/arm: AdvSIMD decodetree conversion, part 4 Richard Henderson
` (15 preceding siblings ...)
2024-07-17 6:09 ` [PATCH 16/17] target/arm: Convert SSHLL, USHLL to decodetree Richard Henderson
@ 2024-07-17 6:09 ` Richard Henderson
2024-07-17 10:02 ` Philippe Mathieu-Daudé
2024-08-11 17:40 ` [PATCH 00/17] target/arm: AdvSIMD decodetree conversion, part 4 Michael Tokarev
2024-08-12 15:14 ` Peter Maydell
18 siblings, 1 reply; 29+ messages in thread
From: Richard Henderson @ 2024-07-17 6:09 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm
We always pass the same value for round; compute it
within common code.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/tcg/translate-a64.c | 32 ++++++--------------------------
1 file changed, 6 insertions(+), 26 deletions(-)
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 2a9cb3fbe0..f4ff698257 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -9197,11 +9197,10 @@ static void disas_data_proc_fp(DisasContext *s, uint32_t insn)
* the vector and scalar code.
*/
static void handle_shri_with_rndacc(TCGv_i64 tcg_res, TCGv_i64 tcg_src,
- TCGv_i64 tcg_rnd, bool accumulate,
+ bool round, bool accumulate,
bool is_u, int size, int shift)
{
bool extended_result = false;
- bool round = tcg_rnd != NULL;
int ext_lshift = 0;
TCGv_i64 tcg_src_hi;
@@ -9219,6 +9218,7 @@ static void handle_shri_with_rndacc(TCGv_i64 tcg_res, TCGv_i64 tcg_src,
/* Deal with the rounding step */
if (round) {
+ TCGv_i64 tcg_rnd = tcg_constant_i64(1ull << (shift - 1));
if (extended_result) {
TCGv_i64 tcg_zero = tcg_constant_i64(0);
if (!is_u) {
@@ -9286,7 +9286,6 @@ static void handle_scalar_simd_shri(DisasContext *s,
bool insert = false;
TCGv_i64 tcg_rn;
TCGv_i64 tcg_rd;
- TCGv_i64 tcg_round;
if (!extract32(immh, 3, 1)) {
unallocated_encoding(s);
@@ -9312,12 +9311,6 @@ static void handle_scalar_simd_shri(DisasContext *s,
break;
}
- if (round) {
- tcg_round = tcg_constant_i64(1ULL << (shift - 1));
- } else {
- tcg_round = NULL;
- }
-
tcg_rn = read_fp_dreg(s, rn);
tcg_rd = (accumulate || insert) ? read_fp_dreg(s, rd) : tcg_temp_new_i64();
@@ -9331,7 +9324,7 @@ static void handle_scalar_simd_shri(DisasContext *s,
tcg_gen_deposit_i64(tcg_rd, tcg_rd, tcg_rn, 0, esize - shift);
}
} else {
- handle_shri_with_rndacc(tcg_rd, tcg_rn, tcg_round,
+ handle_shri_with_rndacc(tcg_rd, tcg_rn, round,
accumulate, is_u, size, shift);
}
@@ -9384,7 +9377,7 @@ static void handle_vec_simd_sqshrn(DisasContext *s, bool is_scalar, bool is_q,
int elements = is_scalar ? 1 : (64 / esize);
bool round = extract32(opcode, 0, 1);
MemOp ldop = (size + 1) | (is_u_shift ? 0 : MO_SIGN);
- TCGv_i64 tcg_rn, tcg_rd, tcg_round;
+ TCGv_i64 tcg_rn, tcg_rd;
TCGv_i32 tcg_rd_narrowed;
TCGv_i64 tcg_final;
@@ -9429,15 +9422,9 @@ static void handle_vec_simd_sqshrn(DisasContext *s, bool is_scalar, bool is_q,
tcg_rd_narrowed = tcg_temp_new_i32();
tcg_final = tcg_temp_new_i64();
- if (round) {
- tcg_round = tcg_constant_i64(1ULL << (shift - 1));
- } else {
- tcg_round = NULL;
- }
-
for (i = 0; i < elements; i++) {
read_vec_element(s, tcg_rn, rn, i, ldop);
- handle_shri_with_rndacc(tcg_rd, tcg_rn, tcg_round,
+ handle_shri_with_rndacc(tcg_rd, tcg_rn, round,
false, is_u_shift, size+1, shift);
narrowfn(tcg_rd_narrowed, tcg_env, tcg_rd);
tcg_gen_extu_i32_i64(tcg_rd, tcg_rd_narrowed);
@@ -10487,7 +10474,6 @@ static void handle_vec_simd_shrn(DisasContext *s, bool is_q,
int shift = (2 * esize) - immhb;
bool round = extract32(opcode, 0, 1);
TCGv_i64 tcg_rn, tcg_rd, tcg_final;
- TCGv_i64 tcg_round;
int i;
if (extract32(immh, 3, 1)) {
@@ -10504,15 +10490,9 @@ static void handle_vec_simd_shrn(DisasContext *s, bool is_q,
tcg_final = tcg_temp_new_i64();
read_vec_element(s, tcg_final, rd, is_q ? 1 : 0, MO_64);
- if (round) {
- tcg_round = tcg_constant_i64(1ULL << (shift - 1));
- } else {
- tcg_round = NULL;
- }
-
for (i = 0; i < elements; i++) {
read_vec_element(s, tcg_rn, rn, i, size+1);
- handle_shri_with_rndacc(tcg_rd, tcg_rn, tcg_round,
+ handle_shri_with_rndacc(tcg_rd, tcg_rn, round,
false, true, size+1, shift);
tcg_gen_deposit_i64(tcg_final, tcg_final, tcg_rd, esize * i, esize);
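[The constant being pushed into the helper is the usual round-to-nearest bias for a right shift. A hedged scalar model of the helper's arithmetic (illustrative only; the name and signature are invented):]

```python
def shr_rndacc(src: int, acc: int, shift: int,
               round: bool, accumulate: bool) -> int:
    # Right shift with optional rounding and accumulation.
    if round:
        src += 1 << (shift - 1)  # the bias now computed in common code
    res = src >> shift
    return acc + res if accumulate else res
```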
--
2.43.0
* Re: [PATCH 01/17] target/arm: Use tcg_gen_extract2_i64 for EXT
2024-07-17 6:08 ` [PATCH 01/17] target/arm: Use tcg_gen_extract2_i64 for EXT Richard Henderson
@ 2024-07-17 9:50 ` Philippe Mathieu-Daudé
0 siblings, 0 replies; 29+ messages in thread
From: Philippe Mathieu-Daudé @ 2024-07-17 9:50 UTC (permalink / raw)
To: Richard Henderson, qemu-devel; +Cc: qemu-arm
On 17/7/24 08:08, Richard Henderson wrote:
> The extract2 tcg op performs the same operation
> as the do_ext64 function.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> target/arm/tcg/translate-a64.c | 23 +++--------------------
> 1 file changed, 3 insertions(+), 20 deletions(-)
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
* Re: [PATCH 02/17] target/arm: Convert EXT to decodetree
2024-07-17 6:08 ` [PATCH 02/17] target/arm: Convert EXT to decodetree Richard Henderson
@ 2024-07-17 9:55 ` Philippe Mathieu-Daudé
0 siblings, 0 replies; 29+ messages in thread
From: Philippe Mathieu-Daudé @ 2024-07-17 9:55 UTC (permalink / raw)
To: Richard Henderson, qemu-devel; +Cc: qemu-arm
On 17/7/24 08:08, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> target/arm/tcg/translate-a64.c | 121 +++++++++++++--------------------
> target/arm/tcg/a64.decode | 5 ++
> 2 files changed, 53 insertions(+), 73 deletions(-)
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
* Re: [PATCH 03/17] target/arm: Convert TBL, TBX to decodetree
2024-07-17 6:08 ` [PATCH 03/17] target/arm: Convert TBL, TBX " Richard Henderson
@ 2024-07-17 9:56 ` Philippe Mathieu-Daudé
0 siblings, 0 replies; 29+ messages in thread
From: Philippe Mathieu-Daudé @ 2024-07-17 9:56 UTC (permalink / raw)
To: Richard Henderson, qemu-devel; +Cc: qemu-arm
On 17/7/24 08:08, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> target/arm/tcg/translate-a64.c | 47 ++++++++++------------------------
> target/arm/tcg/a64.decode | 4 +++
> 2 files changed, 18 insertions(+), 33 deletions(-)
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
* Re: [PATCH 08/17] target/arm: Convert FMOVI (scalar, immediate) to decodetree
2024-07-17 6:08 ` [PATCH 08/17] target/arm: Convert FMOVI (scalar, immediate) " Richard Henderson
@ 2024-07-17 10:00 ` Philippe Mathieu-Daudé
0 siblings, 0 replies; 29+ messages in thread
From: Philippe Mathieu-Daudé @ 2024-07-17 10:00 UTC (permalink / raw)
To: Richard Henderson, qemu-devel; +Cc: qemu-arm
On 17/7/24 08:08, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> target/arm/tcg/translate-a64.c | 74 ++++++++++++----------------------
> target/arm/tcg/a64.decode | 4 ++
> 2 files changed, 30 insertions(+), 48 deletions(-)
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
* Re: [PATCH 11/17] target/arm: Fix whitespace near gen_srshr64_i64
2024-07-17 6:08 ` [PATCH 11/17] target/arm: Fix whitespace near gen_srshr64_i64 Richard Henderson
@ 2024-07-17 10:00 ` Philippe Mathieu-Daudé
0 siblings, 0 replies; 29+ messages in thread
From: Philippe Mathieu-Daudé @ 2024-07-17 10:00 UTC (permalink / raw)
To: Richard Henderson, qemu-devel; +Cc: qemu-arm
On 17/7/24 08:08, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> target/arm/tcg/gengvec.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
* Re: [PATCH 17/17] target/arm: Push tcg_rnd into handle_shri_with_rndacc
2024-07-17 6:09 ` [PATCH 17/17] target/arm: Push tcg_rnd into handle_shri_with_rndacc Richard Henderson
@ 2024-07-17 10:02 ` Philippe Mathieu-Daudé
0 siblings, 0 replies; 29+ messages in thread
From: Philippe Mathieu-Daudé @ 2024-07-17 10:02 UTC (permalink / raw)
To: Richard Henderson, qemu-devel; +Cc: qemu-arm
On 17/7/24 08:09, Richard Henderson wrote:
> We always pass the same value for round; compute it
> within common code.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> target/arm/tcg/translate-a64.c | 32 ++++++--------------------------
> 1 file changed, 6 insertions(+), 26 deletions(-)
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
* Re: [PATCH 00/17] target/arm: AdvSIMD decodetree conversion, part 4
2024-07-17 6:08 [PATCH 00/17] target/arm: AdvSIMD decodetree conversion, part 4 Richard Henderson
` (16 preceding siblings ...)
2024-07-17 6:09 ` [PATCH 17/17] target/arm: Push tcg_rnd into handle_shri_with_rndacc Richard Henderson
@ 2024-08-11 17:40 ` Michael Tokarev
2024-08-12 14:11 ` Peter Maydell
2024-08-12 15:14 ` Peter Maydell
18 siblings, 1 reply; 29+ messages in thread
From: Michael Tokarev @ 2024-08-11 17:40 UTC (permalink / raw)
To: Richard Henderson, qemu-devel; +Cc: qemu-arm
17.07.2024 09:08, Richard Henderson wrote:
> Flush before the queue gets too big.
> Also, there's a bug fix in patch 14.
Hi!
Has this patchset (together with the bugfix) been forgotten?
Maybe we should include at least the bug fix for 9.1?
Thanks,
/mjt
> r~
>
> Richard Henderson (17):
> target/arm: Use tcg_gen_extract2_i64 for EXT
> target/arm: Convert EXT to decodetree
> target/arm: Convert TBL, TBX to decodetree
> target/arm: Convert UZP, TRN, ZIP to decodetree
> target/arm: Simplify do_reduction_op
> target/arm: Convert ADDV, *ADDLV, *MAXV, *MINV to decodetree
> target/arm: Convert FMAXNMV, FMINNMV, FMAXV, FMINV to decodetree
> target/arm: Convert FMOVI (scalar, immediate) to decodetree
> target/arm: Convert MOVI, FMOV, ORR, BIC (vector immediate) to
> decodetree
> target/arm: Introduce gen_gvec_sshr, gen_gvec_ushr
> target/arm: Fix whitespace near gen_srshr64_i64
> target/arm: Convert handle_vec_simd_shri to decodetree
> target/arm: Convert handle_vec_simd_shli to decodetree
> target/arm: Clear high SVE elements in handle_vec_simd_wshli
> target/arm: Use {,s}extract in handle_vec_simd_wshli
> target/arm: Convert SSHLL, USHLL to decodetree
> target/arm: Push tcg_rnd into handle_shri_with_rndacc
>
> target/arm/tcg/translate.h | 5 +
> target/arm/tcg/gengvec.c | 21 +-
> target/arm/tcg/translate-a64.c | 1123 +++++++++++--------------------
> target/arm/tcg/translate-neon.c | 25 +-
> target/arm/tcg/a64.decode | 87 +++
> 5 files changed, 520 insertions(+), 741 deletions(-)
>
--
GPG Key transition (from rsa2048 to rsa4096) since 2024-04-24.
New key: rsa4096/61AD3D98ECDF2C8E 9D8B E14E 3F2A 9DD7 9199 28F1 61AD 3D98 ECDF 2C8E
Old key: rsa2048/457CE0A0804465C5 6EE1 95D1 886E 8FFB 810D 4324 457C E0A0 8044 65C5
Transition statement: http://www.corpit.ru/mjt/gpg-transition-2024.txt
* Re: [PATCH 12/17] target/arm: Convert handle_vec_simd_shri to decodetree
2024-07-17 6:08 ` [PATCH 12/17] target/arm: Convert handle_vec_simd_shri to decodetree Richard Henderson
@ 2024-08-12 13:14 ` Peter Maydell
2024-08-12 22:07 ` Richard Henderson
0 siblings, 1 reply; 29+ messages in thread
From: Peter Maydell @ 2024-08-12 13:14 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel, qemu-arm
On Wed, 17 Jul 2024 at 07:11, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> This includes SSHR, USHR, SSRA, USRA, SRSHR, URSHR, SRSRA, URSRA, SRI.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> target/arm/tcg/translate-a64.c | 109 +++++++++++++++------------------
> target/arm/tcg/a64.decode | 27 +++++++-
> 2 files changed, 74 insertions(+), 62 deletions(-)
>
> diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
> index d0a3450d75..1e482477c5 100644
> --- a/target/arm/tcg/translate-a64.c
> +++ b/target/arm/tcg/translate-a64.c
> @@ -68,6 +68,22 @@ static int scale_by_log2_tag_granule(DisasContext *s, int x)
> return x << LOG2_TAG_GRANULE;
> }
>
> +/*
> + * For Advanced SIMD shift by immediate, extract esz from immh.
> + * The result must be validated by the translator: MO_8 <= x <= MO_64.
> + */
> +static int esz_immh(DisasContext *s, int x)
> +{
> + return 32 - clz32(x) - 1;
> +}
> +
> +/* For Advanced SIMD shift by immediate, right shift count. */
> +static int rcount_immhb(DisasContext *s, int x)
> +{
> + int size = esz_immh(s, x >> 3);
> + return (16 << size) - x;
We need to avoid shift-by-negative-value if esz_immh()
returns < 0 here, right? (like commit 76916dfa8 did
for tszimm_esz())
-- PMM
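[A minimal Python sketch of the hazard raised here, mirroring the shape of the C helpers from the patch; the model itself is an exposition aid. With immh == 0, esz_immh yields -1, so (16 << size) in rcount_immhb would shift by a negative count — undefined behavior in C, which is why the decode must validate immh first.]

```python
def clz32(x: int) -> int:
    # Modelled as in QEMU: clz32 is defined to return 32 for x == 0.
    return 32 - x.bit_length()

def esz_immh(immh: int) -> int:
    # immh == 0 yields -1; feeding that to a C shift would be UB.
    return 32 - clz32(immh) - 1
```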
* Re: [PATCH 00/17] target/arm: AdvSIMD decodetree conversion, part 4
2024-08-11 17:40 ` [PATCH 00/17] target/arm: AdvSIMD decodetree conversion, part 4 Michael Tokarev
@ 2024-08-12 14:11 ` Peter Maydell
0 siblings, 0 replies; 29+ messages in thread
From: Peter Maydell @ 2024-08-12 14:11 UTC (permalink / raw)
To: Michael Tokarev; +Cc: Richard Henderson, qemu-devel, qemu-arm
On Sun, 11 Aug 2024 at 18:41, Michael Tokarev <mjt@tls.msk.ru> wrote:
>
> 17.07.2024 09:08, Richard Henderson wrote:
> > Flush before the queue gets too big.
> > Also, there's a bug fix in patch 14.
>
> Hi!
>
> Has this patchset (together with the bugfix) been forgotten?
> Maybe we should include at least the bug fix for 9.1?
Thanks for the ping -- I had indeed lost track of the
patchset. The series itself is not 9.1 material, but
the bugfix could go in. (I don't rate the bugfix as
very critical -- nobody's noticed it in the at least
five years it's been there.)
-- PMM
* Re: [PATCH 00/17] target/arm: AdvSIMD decodetree conversion, part 4
2024-07-17 6:08 [PATCH 00/17] target/arm: AdvSIMD decodetree conversion, part 4 Richard Henderson
` (17 preceding siblings ...)
2024-08-11 17:40 ` [PATCH 00/17] target/arm: AdvSIMD decodetree conversion, part 4 Michael Tokarev
@ 2024-08-12 15:14 ` Peter Maydell
18 siblings, 0 replies; 29+ messages in thread
From: Peter Maydell @ 2024-08-12 15:14 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel, qemu-arm
On Wed, 17 Jul 2024 at 07:09, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Flush before the queue gets too big.
> Also, there's a bug fix in patch 14.
>
> r~
>
> Richard Henderson (17):
> target/arm: Use tcg_gen_extract2_i64 for EXT
> target/arm: Convert EXT to decodetree
> target/arm: Convert TBL, TBX to decodetree
> target/arm: Convert UZP, TRN, ZIP to decodetree
> target/arm: Simplify do_reduction_op
> target/arm: Convert ADDV, *ADDLV, *MAXV, *MINV to decodetree
> target/arm: Convert FMAXNMV, FMINNMV, FMAXV, FMINV to decodetree
> target/arm: Convert FMOVI (scalar, immediate) to decodetree
> target/arm: Convert MOVI, FMOV, ORR, BIC (vector immediate) to
> decodetree
> target/arm: Introduce gen_gvec_sshr, gen_gvec_ushr
> target/arm: Fix whitespace near gen_srshr64_i64
> target/arm: Convert handle_vec_simd_shri to decodetree
target/arm: Convert handle_vec_simd_shli to decodetree
> target/arm: Clear high SVE elements in handle_vec_simd_wshli
> target/arm: Use {,s}extract in handle_vec_simd_wshli
> target/arm: Convert SSHLL, USHLL to decodetree
> target/arm: Push tcg_rnd into handle_shri_with_rndacc
Other than the need-to-avoid-shift-by-negative nits in
patches 12 and 13, whole series
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
thanks
-- PMM
* Re: [PATCH 12/17] target/arm: Convert handle_vec_simd_shri to decodetree
2024-08-12 13:14 ` Peter Maydell
@ 2024-08-12 22:07 ` Richard Henderson
0 siblings, 0 replies; 29+ messages in thread
From: Richard Henderson @ 2024-08-12 22:07 UTC (permalink / raw)
To: Peter Maydell; +Cc: qemu-devel, qemu-arm
On 8/12/24 23:14, Peter Maydell wrote:
> On Wed, 17 Jul 2024 at 07:11, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>>
>> This includes SSHR, USHR, SSRA, USRA, SRSHR, URSHR, SRSRA, URSRA, SRI.
>>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>> target/arm/tcg/translate-a64.c | 109 +++++++++++++++------------------
>> target/arm/tcg/a64.decode | 27 +++++++-
>> 2 files changed, 74 insertions(+), 62 deletions(-)
>>
>> diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
>> index d0a3450d75..1e482477c5 100644
>> --- a/target/arm/tcg/translate-a64.c
>> +++ b/target/arm/tcg/translate-a64.c
>> @@ -68,6 +68,22 @@ static int scale_by_log2_tag_granule(DisasContext *s, int x)
>> return x << LOG2_TAG_GRANULE;
>> }
>>
>> +/*
>> + * For Advanced SIMD shift by immediate, extract esz from immh.
>> + * The result must be validated by the translator: MO_8 <= x <= MO_64.
>> + */
>> +static int esz_immh(DisasContext *s, int x)
>> +{
>> + return 32 - clz32(x) - 1;
>> +}
>> +
>> +/* For Advanced SIMD shift by immediate, right shift count. */
>> +static int rcount_immhb(DisasContext *s, int x)
>> +{
>> + int size = esz_immh(s, x >> 3);
>> + return (16 << size) - x;
>
> We need to avoid shift-by-negative-value if esz_immh()
> returns < 0 here, right? (like commit 76916dfa8 did
> for tszimm_esz())
In the interim, I have rewritten this to be more like neon-dp.decode, to decode each
element size separately.
r~