* [PATCH 01/13] target/arm: Fix VCMLA Dd, Dn, Dm[idx]
2024-06-25 5:07 [PATCH 00/13] target/arm: AdvSIMD conversion, part 2 Richard Henderson
@ 2024-06-25 5:07 ` Richard Henderson
2024-06-25 11:42 ` Peter Maydell
2024-06-25 5:07 ` [PATCH 02/13] target/arm: Fix SQDMULH (by element) with Q=0 Richard Henderson
` (11 subsequent siblings)
12 siblings, 1 reply; 29+ messages in thread
From: Richard Henderson @ 2024-06-25 5:07 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm, qemu-stable
The inner loop, bounded by eltspersegment, must not be
larger than the outer loop, bounded by elements.
Cc: qemu-stable@nongnu.org
Fixes: 18fc2405781 ("target/arm: Implement SVE fp complex multiply add (indexed)")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2376
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/tcg/vec_helper.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
index b05922b425..7b34cc98af 100644
--- a/target/arm/tcg/vec_helper.c
+++ b/target/arm/tcg/vec_helper.c
@@ -907,7 +907,7 @@ void HELPER(gvec_fcmlah_idx)(void *vd, void *vn, void *vm, void *va,
intptr_t index = extract32(desc, SIMD_DATA_SHIFT + 2, 2);
uint32_t neg_real = flip ^ neg_imag;
intptr_t elements = opr_sz / sizeof(float16);
- intptr_t eltspersegment = 16 / sizeof(float16);
+ intptr_t eltspersegment = MIN(16 / sizeof(float16), elements);
intptr_t i, j;
/* Shift boolean to the sign bit so we can xor to negate. */
@@ -969,7 +969,7 @@ void HELPER(gvec_fcmlas_idx)(void *vd, void *vn, void *vm, void *va,
intptr_t index = extract32(desc, SIMD_DATA_SHIFT + 2, 2);
uint32_t neg_real = flip ^ neg_imag;
intptr_t elements = opr_sz / sizeof(float32);
- intptr_t eltspersegment = 16 / sizeof(float32);
+ intptr_t eltspersegment = MIN(16 / sizeof(float32), elements);
intptr_t i, j;
/* Shift boolean to the sign bit so we can xor to negate. */
--
2.34.1
^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [PATCH 01/13] target/arm: Fix VCMLA Dd, Dn, Dm[idx]
2024-06-25 5:07 ` [PATCH 01/13] target/arm: Fix VCMLA Dd, Dn, Dm[idx] Richard Henderson
@ 2024-06-25 11:42 ` Peter Maydell
0 siblings, 0 replies; 29+ messages in thread
From: Peter Maydell @ 2024-06-25 11:42 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel, qemu-arm, qemu-stable
On Tue, 25 Jun 2024 at 06:09, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> The inner loop, bounded by eltspersegment, must not be
> larger than the outer loop, bounded by elements.
>
> Cc: qemu-stable@nongnu.org
> Fixes: 18fc2405781 ("target/arm: Implement SVE fp complex multiply add (indexed)")
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2376
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
thanks
-- PMM
^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH 02/13] target/arm: Fix SQDMULH (by element) with Q=0
2024-06-25 5:07 [PATCH 00/13] target/arm: AdvSIMD conversion, part 2 Richard Henderson
2024-06-25 5:07 ` [PATCH 01/13] target/arm: Fix VCMLA Dd, Dn, Dm[idx] Richard Henderson
@ 2024-06-25 5:07 ` Richard Henderson
2024-06-25 11:43 ` Peter Maydell
2024-06-25 5:08 ` [PATCH 03/13] target/arm: Fix FJCVTZS vs flush-to-zero Richard Henderson
` (10 subsequent siblings)
12 siblings, 1 reply; 29+ messages in thread
From: Richard Henderson @ 2024-06-25 5:07 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm, qemu-stable
The inner loop, bounded by eltspersegment, must not be
larger than the outer loop, bounded by elements.
Cc: qemu-stable@nongnu.org
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/tcg/vec_helper.c | 24 ++++++++++++++++--------
1 file changed, 16 insertions(+), 8 deletions(-)
diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
index 7b34cc98af..d477479bb1 100644
--- a/target/arm/tcg/vec_helper.c
+++ b/target/arm/tcg/vec_helper.c
@@ -317,10 +317,12 @@ void HELPER(neon_sqdmulh_idx_h)(void *vd, void *vn, void *vm,
intptr_t i, j, opr_sz = simd_oprsz(desc);
int idx = simd_data(desc);
int16_t *d = vd, *n = vn, *m = (int16_t *)vm + H2(idx);
+ intptr_t elements = opr_sz / 2;
+ intptr_t eltspersegment = MIN(16 / 2, elements);
- for (i = 0; i < opr_sz / 2; i += 16 / 2) {
+ for (i = 0; i < elements; i += 16 / 2) {
int16_t mm = m[i];
- for (j = 0; j < 16 / 2; ++j) {
+ for (j = 0; j < eltspersegment; ++j) {
d[i + j] = do_sqrdmlah_h(n[i + j], mm, 0, false, false, vq);
}
}
@@ -333,10 +335,12 @@ void HELPER(neon_sqrdmulh_idx_h)(void *vd, void *vn, void *vm,
intptr_t i, j, opr_sz = simd_oprsz(desc);
int idx = simd_data(desc);
int16_t *d = vd, *n = vn, *m = (int16_t *)vm + H2(idx);
+ intptr_t elements = opr_sz / 2;
+ intptr_t eltspersegment = MIN(16 / 2, elements);
- for (i = 0; i < opr_sz / 2; i += 16 / 2) {
+ for (i = 0; i < elements; i += 16 / 2) {
int16_t mm = m[i];
- for (j = 0; j < 16 / 2; ++j) {
+ for (j = 0; j < eltspersegment; ++j) {
d[i + j] = do_sqrdmlah_h(n[i + j], mm, 0, false, true, vq);
}
}
@@ -512,10 +516,12 @@ void HELPER(neon_sqdmulh_idx_s)(void *vd, void *vn, void *vm,
intptr_t i, j, opr_sz = simd_oprsz(desc);
int idx = simd_data(desc);
int32_t *d = vd, *n = vn, *m = (int32_t *)vm + H4(idx);
+ intptr_t elements = opr_sz / 4;
+ intptr_t eltspersegment = MIN(16 / 4, elements);
- for (i = 0; i < opr_sz / 4; i += 16 / 4) {
+ for (i = 0; i < elements; i += 16 / 4) {
int32_t mm = m[i];
- for (j = 0; j < 16 / 4; ++j) {
+ for (j = 0; j < eltspersegment; ++j) {
d[i + j] = do_sqrdmlah_s(n[i + j], mm, 0, false, false, vq);
}
}
@@ -528,10 +534,12 @@ void HELPER(neon_sqrdmulh_idx_s)(void *vd, void *vn, void *vm,
intptr_t i, j, opr_sz = simd_oprsz(desc);
int idx = simd_data(desc);
int32_t *d = vd, *n = vn, *m = (int32_t *)vm + H4(idx);
+ intptr_t elements = opr_sz / 4;
+ intptr_t eltspersegment = MIN(16 / 4, elements);
- for (i = 0; i < opr_sz / 4; i += 16 / 4) {
+ for (i = 0; i < elements; i += 16 / 4) {
int32_t mm = m[i];
- for (j = 0; j < 16 / 4; ++j) {
+ for (j = 0; j < eltspersegment; ++j) {
d[i + j] = do_sqrdmlah_s(n[i + j], mm, 0, false, true, vq);
}
}
--
2.34.1
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH 03/13] target/arm: Fix FJCVTZS vs flush-to-zero
2024-06-25 5:07 [PATCH 00/13] target/arm: AdvSIMD conversion, part 2 Richard Henderson
2024-06-25 5:07 ` [PATCH 01/13] target/arm: Fix VCMLA Dd, Dn, Dm[idx] Richard Henderson
2024-06-25 5:07 ` [PATCH 02/13] target/arm: Fix SQDMULH (by element) with Q=0 Richard Henderson
@ 2024-06-25 5:08 ` Richard Henderson
2024-06-25 11:56 ` Peter Maydell
2024-06-25 5:08 ` [PATCH 04/13] target/arm: Convert SQRDMLAH, SQRDMLSH to decodetree Richard Henderson
` (9 subsequent siblings)
12 siblings, 1 reply; 29+ messages in thread
From: Richard Henderson @ 2024-06-25 5:08 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm, qemu-stable
Input denormals cause the Javascript inexact bit
(output to Z) to be set.
Cc: qemu-stable@nongnu.org
Fixes: 6c1f6f2733a ("target/arm: Implement ARMv8.3-JSConv")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2375
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/vfp_helper.c | 18 +++++++++---------
tests/tcg/aarch64/test-2375.c | 20 ++++++++++++++++++++
tests/tcg/aarch64/Makefile.target | 3 ++-
3 files changed, 31 insertions(+), 10 deletions(-)
create mode 100644 tests/tcg/aarch64/test-2375.c
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index ce26b8a71a..50d7042fa9 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -1091,8 +1091,8 @@ const FloatRoundMode arm_rmode_to_sf_map[] = {
uint64_t HELPER(fjcvtzs)(float64 value, void *vstatus)
{
float_status *status = vstatus;
- uint32_t inexact, frac;
- uint32_t e_old, e_new;
+ uint32_t frac, e_old, e_new;
+ bool inexact;
e_old = get_float_exception_flags(status);
set_float_exception_flags(0, status);
@@ -1100,13 +1100,13 @@ uint64_t HELPER(fjcvtzs)(float64 value, void *vstatus)
e_new = get_float_exception_flags(status);
set_float_exception_flags(e_old | e_new, status);
- if (value == float64_chs(float64_zero)) {
- /* While not inexact for IEEE FP, -0.0 is inexact for JavaScript. */
- inexact = 1;
- } else {
- /* Normal inexact or overflow or NaN */
- inexact = e_new & (float_flag_inexact | float_flag_invalid);
- }
+ /* Normal inexact, denormal with flush-to-zero, or overflow or NaN */
+ inexact = e_new & (float_flag_inexact |
+ float_flag_input_denormal |
+ float_flag_invalid);
+
+ /* While not inexact for IEEE FP, -0.0 is inexact for JavaScript. */
+ inexact |= value == float64_chs(float64_zero);
/* Pack the result and the env->ZF representation of Z together. */
return deposit64(frac, 32, 32, inexact);
diff --git a/tests/tcg/aarch64/test-2375.c b/tests/tcg/aarch64/test-2375.c
new file mode 100644
index 0000000000..f83af8b3ea
--- /dev/null
+++ b/tests/tcg/aarch64/test-2375.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/* See https://gitlab.com/qemu-project/qemu/-/issues/2375 */
+
+#include <assert.h>
+
+int main()
+{
+ int r, z;
+
+ asm("msr fpcr, %2\n\t"
+ "fjcvtzs %w0, %d3\n\t"
+ "cset %1, eq"
+ : "=r"(r), "=r"(z)
+ : "r"(0x01000000L), /* FZ = 1 */
+ "w"(0xfcff00L)); /* denormal */
+
+ assert(r == 0);
+ assert(z == 0);
+ return 0;
+}
diff --git a/tests/tcg/aarch64/Makefile.target b/tests/tcg/aarch64/Makefile.target
index 70d728ae9a..4ecbca6a41 100644
--- a/tests/tcg/aarch64/Makefile.target
+++ b/tests/tcg/aarch64/Makefile.target
@@ -41,8 +41,9 @@ endif
# Pauth Tests
ifneq ($(CROSS_CC_HAS_ARMV8_3),)
-AARCH64_TESTS += pauth-1 pauth-2 pauth-4 pauth-5
+AARCH64_TESTS += pauth-1 pauth-2 pauth-4 pauth-5 test-2375
pauth-%: CFLAGS += -march=armv8.3-a
+test-2375: CFLAGS += -march=armv8.3-a
run-pauth-1: QEMU_OPTS += -cpu max
run-pauth-2: QEMU_OPTS += -cpu max
# Choose a cpu with FEAT_Pauth but without FEAT_FPAC for pauth-[45].
--
2.34.1
^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [PATCH 03/13] target/arm: Fix FJCVTZS vs flush-to-zero
2024-06-25 5:08 ` [PATCH 03/13] target/arm: Fix FJCVTZS vs flush-to-zero Richard Henderson
@ 2024-06-25 11:56 ` Peter Maydell
0 siblings, 0 replies; 29+ messages in thread
From: Peter Maydell @ 2024-06-25 11:56 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel, qemu-arm, qemu-stable
On Tue, 25 Jun 2024 at 06:09, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Input denormals cause the Javascript inexact bit
> (output to Z) to be set.
>
> Cc: qemu-stable@nongnu.org
> Fixes: 6c1f6f2733a ("target/arm: Implement ARMv8.3-JSConv")
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2375
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> --- /dev/null
> +++ b/tests/tcg/aarch64/test-2375.c
> @@ -0,0 +1,20 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> +/* See https://gitlab.com/qemu-project/qemu/-/issues/2375 */
Not a project requirement, I guess, but on the Linaro end I
think we want a
/* Copyright (c) 2024 Linaro Ltd */
too.
> +
> +#include <assert.h>
> +
> +int main()
Missing "void".
> +{
> + int r, z;
> +
> + asm("msr fpcr, %2\n\t"
> + "fjcvtzs %w0, %d3\n\t"
> + "cset %1, eq"
> + : "=r"(r), "=r"(z)
> + : "r"(0x01000000L), /* FZ = 1 */
> + "w"(0xfcff00L)); /* denormal */
> +
> + assert(r == 0);
> + assert(z == 0);
> + return 0;
> +}
Otherwise
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
(If these are the only issues in the patchset I'll fix them
up when I apply it.)
thanks
-- PMM
^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH 04/13] target/arm: Convert SQRDMLAH, SQRDMLSH to decodetree
2024-06-25 5:07 [PATCH 00/13] target/arm: AdvSIMD conversion, part 2 Richard Henderson
` (2 preceding siblings ...)
2024-06-25 5:08 ` [PATCH 03/13] target/arm: Fix FJCVTZS vs flush-to-zero Richard Henderson
@ 2024-06-25 5:08 ` Richard Henderson
2024-06-25 12:38 ` Peter Maydell
2024-06-25 5:08 ` [PATCH 05/13] target/arm: Convert SDOT, UDOT " Richard Henderson
` (8 subsequent siblings)
12 siblings, 1 reply; 29+ messages in thread
From: Richard Henderson @ 2024-06-25 5:08 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/helper.h | 10 ++
target/arm/tcg/a64.decode | 16 +++
target/arm/tcg/translate-a64.c | 206 +++++++++++++--------------------
target/arm/tcg/vec_helper.c | 72 ++++++++++++
4 files changed, 180 insertions(+), 124 deletions(-)
diff --git a/target/arm/helper.h b/target/arm/helper.h
index eca2043fc2..970d059dec 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -979,6 +979,16 @@ DEF_HELPER_FLAGS_5(neon_sqrdmulh_idx_h, TCG_CALL_NO_RWG,
DEF_HELPER_FLAGS_5(neon_sqrdmulh_idx_s, TCG_CALL_NO_RWG,
void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(neon_sqrdmlah_idx_h, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(neon_sqrdmlah_idx_s, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(neon_sqrdmlsh_idx_h, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(neon_sqrdmlsh_idx_s, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+
DEF_HELPER_FLAGS_4(sve2_sqdmulh_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(sve2_sqdmulh_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(sve2_sqdmulh_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index 2b7a3254a0..613cc9365c 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -781,6 +781,8 @@ CMEQ_s 0111 1110 111 ..... 10001 1 ..... ..... @rrr_d
SQDMULH_s 0101 1110 ..1 ..... 10110 1 ..... ..... @rrr_e
SQRDMULH_s 0111 1110 ..1 ..... 10110 1 ..... ..... @rrr_e
+SQRDMLAH_s 0111 1110 ..0 ..... 10000 1 ..... ..... @rrr_e
+SQRDMLSH_s 0111 1110 ..0 ..... 10001 1 ..... ..... @rrr_e
### Advanced SIMD scalar pairwise
@@ -941,6 +943,8 @@ MLS_v 0.10 1110 ..1 ..... 10010 1 ..... ..... @qrrr_e
SQDMULH_v 0.00 1110 ..1 ..... 10110 1 ..... ..... @qrrr_e
SQRDMULH_v 0.10 1110 ..1 ..... 10110 1 ..... ..... @qrrr_e
+SQRDMLAH_v 0.10 1110 ..0 ..... 10000 1 ..... ..... @qrrr_e
+SQRDMLSH_v 0.10 1110 ..0 ..... 10001 1 ..... ..... @qrrr_e
### Advanced SIMD scalar x indexed element
@@ -966,6 +970,12 @@ SQDMULH_si 0101 1111 10 .. .... 1100 . 0 ..... ..... @rrx_s
SQRDMULH_si 0101 1111 01 .. .... 1101 . 0 ..... ..... @rrx_h
SQRDMULH_si 0101 1111 10 . ..... 1101 . 0 ..... ..... @rrx_s
+SQRDMLAH_si 0111 1111 01 .. .... 1101 . 0 ..... ..... @rrx_h
+SQRDMLAH_si 0111 1111 10 .. .... 1101 . 0 ..... ..... @rrx_s
+
+SQRDMLSH_si 0111 1111 01 .. .... 1111 . 0 ..... ..... @rrx_h
+SQRDMLSH_si 0111 1111 10 .. .... 1111 . 0 ..... ..... @rrx_s
+
### Advanced SIMD vector x indexed element
FMUL_vi 0.00 1111 00 .. .... 1001 . 0 ..... ..... @qrrx_h
@@ -1004,6 +1014,12 @@ SQDMULH_vi 0.00 1111 10 . ..... 1100 . 0 ..... ..... @qrrx_s
SQRDMULH_vi 0.00 1111 01 .. .... 1101 . 0 ..... ..... @qrrx_h
SQRDMULH_vi 0.00 1111 10 . ..... 1101 . 0 ..... ..... @qrrx_s
+SQRDMLAH_vi 0.10 1111 01 .. .... 1101 . 0 ..... ..... @qrrx_h
+SQRDMLAH_vi 0.10 1111 10 .. .... 1101 . 0 ..... ..... @qrrx_s
+
+SQRDMLSH_vi 0.10 1111 01 .. .... 1111 . 0 ..... ..... @qrrx_h
+SQRDMLSH_vi 0.10 1111 10 .. .... 1111 . 0 ..... ..... @qrrx_s
+
# Floating-point conditional select
FCSEL 0001 1110 .. 1 rm:5 cond:4 11 rn:5 rd:5 esz=%esz_hsd
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 93543da39c..32c24c7422 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -5235,6 +5235,43 @@ static const ENVScalar2 f_scalar_sqrdmulh = {
};
TRANS(SQRDMULH_s, do_env_scalar2_hs, a, &f_scalar_sqrdmulh)
+typedef struct ENVScalar3 {
+ NeonGenThreeOpEnvFn *gen_hs[2];
+} ENVScalar3;
+
+static bool do_env_scalar3_hs(DisasContext *s, arg_rrr_e *a,
+ const ENVScalar3 *f)
+{
+ TCGv_i32 t0, t1, t2;
+
+ if (a->esz != MO_16 && a->esz != MO_32) {
+ return false;
+ }
+ if (!fp_access_check(s)) {
+ return true;
+ }
+
+ t0 = tcg_temp_new_i32();
+ t1 = tcg_temp_new_i32();
+ t2 = tcg_temp_new_i32();
+ read_vec_element_i32(s, t0, a->rn, 0, a->esz);
+ read_vec_element_i32(s, t1, a->rm, 0, a->esz);
+ read_vec_element_i32(s, t2, a->rd, 0, a->esz);
+ f->gen_hs[a->esz - 1](t0, tcg_env, t0, t1, t2);
+ write_fp_sreg(s, a->rd, t0);
+ return true;
+}
+
+static const ENVScalar3 f_scalar_sqrdmlah = {
+ { gen_helper_neon_qrdmlah_s16, gen_helper_neon_qrdmlah_s32 }
+};
+TRANS_FEAT(SQRDMLAH_s, aa64_rdm, do_env_scalar3_hs, a, &f_scalar_sqrdmlah)
+
+static const ENVScalar3 f_scalar_sqrdmlsh = {
+ { gen_helper_neon_qrdmlsh_s16, gen_helper_neon_qrdmlsh_s32 }
+};
+TRANS_FEAT(SQRDMLSH_s, aa64_rdm, do_env_scalar3_hs, a, &f_scalar_sqrdmlsh)
+
static bool do_cmop_d(DisasContext *s, arg_rrr_e *a, TCGCond cond)
{
if (fp_access_check(s)) {
@@ -5552,6 +5589,8 @@ TRANS(CMTST_v, do_gvec_fn3, a, gen_gvec_cmtst)
TRANS(SQDMULH_v, do_gvec_fn3_no8_no64, a, gen_gvec_sqdmulh_qc)
TRANS(SQRDMULH_v, do_gvec_fn3_no8_no64, a, gen_gvec_sqrdmulh_qc)
+TRANS_FEAT(SQRDMLAH_v, aa64_rdm, do_gvec_fn3_no8_no64, a, gen_gvec_sqrdmlah_qc)
+TRANS_FEAT(SQRDMLSH_v, aa64_rdm, do_gvec_fn3_no8_no64, a, gen_gvec_sqrdmlsh_qc)
/*
* Advanced SIMD scalar/vector x indexed element
@@ -5681,6 +5720,29 @@ static bool do_env_scalar2_idx_hs(DisasContext *s, arg_rrx_e *a,
TRANS(SQDMULH_si, do_env_scalar2_idx_hs, a, &f_scalar_sqdmulh)
TRANS(SQRDMULH_si, do_env_scalar2_idx_hs, a, &f_scalar_sqrdmulh)
+static bool do_env_scalar3_idx_hs(DisasContext *s, arg_rrx_e *a,
+ const ENVScalar3 *f)
+{
+ if (a->esz < MO_16 || a->esz > MO_32) {
+ return false;
+ }
+ if (fp_access_check(s)) {
+ TCGv_i32 t0 = tcg_temp_new_i32();
+ TCGv_i32 t1 = tcg_temp_new_i32();
+ TCGv_i32 t2 = tcg_temp_new_i32();
+
+ read_vec_element_i32(s, t0, a->rn, 0, a->esz);
+ read_vec_element_i32(s, t1, a->rm, a->idx, a->esz);
+ read_vec_element_i32(s, t2, a->rd, 0, a->esz);
+ f->gen_hs[a->esz - 1](t0, tcg_env, t0, t1, t2);
+ write_fp_sreg(s, a->rd, t0);
+ }
+ return true;
+}
+
+TRANS_FEAT(SQRDMLAH_si, aa64_rdm, do_env_scalar3_idx_hs, a, &f_scalar_sqrdmlah)
+TRANS_FEAT(SQRDMLSH_si, aa64_rdm, do_env_scalar3_idx_hs, a, &f_scalar_sqrdmlsh)
+
static bool do_fp3_vector_idx(DisasContext *s, arg_qrrx_e *a,
gen_helper_gvec_3_ptr * const fns[3])
{
@@ -5838,6 +5900,20 @@ static gen_helper_gvec_4 * const f_vector_idx_sqrdmulh[2] = {
};
TRANS(SQRDMULH_vi, do_int3_qc_vector_idx, a, f_vector_idx_sqrdmulh)
+static gen_helper_gvec_4 * const f_vector_idx_sqrdmlah[2] = {
+ gen_helper_neon_sqrdmlah_idx_h,
+ gen_helper_neon_sqrdmlah_idx_s,
+};
+TRANS_FEAT(SQRDMLAH_vi, aa64_rdm, do_int3_qc_vector_idx, a,
+ f_vector_idx_sqrdmlah)
+
+static gen_helper_gvec_4 * const f_vector_idx_sqrdmlsh[2] = {
+ gen_helper_neon_sqrdmlsh_idx_h,
+ gen_helper_neon_sqrdmlsh_idx_s,
+};
+TRANS_FEAT(SQRDMLSH_vi, aa64_rdm, do_int3_qc_vector_idx, a,
+ f_vector_idx_sqrdmlsh)
+
/*
* Advanced SIMD scalar pairwise
*/
@@ -9536,84 +9612,6 @@ static void disas_simd_scalar_three_reg_diff(DisasContext *s, uint32_t insn)
}
}
-/* AdvSIMD scalar three same extra
- * 31 30 29 28 24 23 22 21 20 16 15 14 11 10 9 5 4 0
- * +-----+---+-----------+------+---+------+---+--------+---+----+----+
- * | 0 1 | U | 1 1 1 1 0 | size | 0 | Rm | 1 | opcode | 1 | Rn | Rd |
- * +-----+---+-----------+------+---+------+---+--------+---+----+----+
- */
-static void disas_simd_scalar_three_reg_same_extra(DisasContext *s,
- uint32_t insn)
-{
- int rd = extract32(insn, 0, 5);
- int rn = extract32(insn, 5, 5);
- int opcode = extract32(insn, 11, 4);
- int rm = extract32(insn, 16, 5);
- int size = extract32(insn, 22, 2);
- bool u = extract32(insn, 29, 1);
- TCGv_i32 ele1, ele2, ele3;
- TCGv_i64 res;
- bool feature;
-
- switch (u * 16 + opcode) {
- case 0x10: /* SQRDMLAH (vector) */
- case 0x11: /* SQRDMLSH (vector) */
- if (size != 1 && size != 2) {
- unallocated_encoding(s);
- return;
- }
- feature = dc_isar_feature(aa64_rdm, s);
- break;
- default:
- unallocated_encoding(s);
- return;
- }
- if (!feature) {
- unallocated_encoding(s);
- return;
- }
- if (!fp_access_check(s)) {
- return;
- }
-
- /* Do a single operation on the lowest element in the vector.
- * We use the standard Neon helpers and rely on 0 OP 0 == 0
- * with no side effects for all these operations.
- * OPTME: special-purpose helpers would avoid doing some
- * unnecessary work in the helper for the 16 bit cases.
- */
- ele1 = tcg_temp_new_i32();
- ele2 = tcg_temp_new_i32();
- ele3 = tcg_temp_new_i32();
-
- read_vec_element_i32(s, ele1, rn, 0, size);
- read_vec_element_i32(s, ele2, rm, 0, size);
- read_vec_element_i32(s, ele3, rd, 0, size);
-
- switch (opcode) {
- case 0x0: /* SQRDMLAH */
- if (size == 1) {
- gen_helper_neon_qrdmlah_s16(ele3, tcg_env, ele1, ele2, ele3);
- } else {
- gen_helper_neon_qrdmlah_s32(ele3, tcg_env, ele1, ele2, ele3);
- }
- break;
- case 0x1: /* SQRDMLSH */
- if (size == 1) {
- gen_helper_neon_qrdmlsh_s16(ele3, tcg_env, ele1, ele2, ele3);
- } else {
- gen_helper_neon_qrdmlsh_s32(ele3, tcg_env, ele1, ele2, ele3);
- }
- break;
- default:
- g_assert_not_reached();
- }
-
- res = tcg_temp_new_i64();
- tcg_gen_extu_i32_i64(res, ele3);
- write_fp_dreg(s, rd, res);
-}
-
static void handle_2misc_64(DisasContext *s, int opcode, bool u,
TCGv_i64 tcg_rd, TCGv_i64 tcg_rn,
TCGv_i32 tcg_rmode, TCGv_ptr tcg_fpstatus)
@@ -10892,14 +10890,6 @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
int rot;
switch (u * 16 + opcode) {
- case 0x10: /* SQRDMLAH (vector) */
- case 0x11: /* SQRDMLSH (vector) */
- if (size != 1 && size != 2) {
- unallocated_encoding(s);
- return;
- }
- feature = dc_isar_feature(aa64_rdm, s);
- break;
case 0x02: /* SDOT (vector) */
case 0x12: /* UDOT (vector) */
if (size != MO_32) {
@@ -10957,6 +10947,8 @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
}
break;
default:
+ case 0x10: /* SQRDMLAH (vector) */
+ case 0x11: /* SQRDMLSH (vector) */
unallocated_encoding(s);
return;
}
@@ -10969,14 +10961,6 @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
}
switch (opcode) {
- case 0x0: /* SQRDMLAH (vector) */
- gen_gvec_fn3(s, is_q, rd, rn, rm, gen_gvec_sqrdmlah_qc, size);
- return;
-
- case 0x1: /* SQRDMLSH (vector) */
- gen_gvec_fn3(s, is_q, rd, rn, rm, gen_gvec_sqrdmlsh_qc, size);
- return;
-
case 0x2: /* SDOT / UDOT */
gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, 0,
u ? gen_helper_gvec_udot_b : gen_helper_gvec_sdot_b);
@@ -12059,13 +12043,6 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
case 0x0b: /* SQDMULL, SQDMULL2 */
is_long = true;
break;
- case 0x1d: /* SQRDMLAH */
- case 0x1f: /* SQRDMLSH */
- if (!dc_isar_feature(aa64_rdm, s)) {
- unallocated_encoding(s);
- return;
- }
- break;
case 0x0e: /* SDOT */
case 0x1e: /* UDOT */
if (is_scalar || size != MO_32 || !dc_isar_feature(aa64_dp, s)) {
@@ -12127,6 +12104,8 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
case 0x18: /* FMLAL2 */
case 0x19: /* FMULX */
case 0x1c: /* FMLSL2 */
+ case 0x1d: /* SQRDMLAH */
+ case 0x1f: /* SQRDMLSH */
unallocated_encoding(s);
return;
}
@@ -12320,33 +12299,13 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
tcg_op, tcg_idx);
}
break;
- case 0x1d: /* SQRDMLAH */
- read_vec_element_i32(s, tcg_res, rd, pass,
- is_scalar ? size : MO_32);
- if (size == 1) {
- gen_helper_neon_qrdmlah_s16(tcg_res, tcg_env,
- tcg_op, tcg_idx, tcg_res);
- } else {
- gen_helper_neon_qrdmlah_s32(tcg_res, tcg_env,
- tcg_op, tcg_idx, tcg_res);
- }
- break;
- case 0x1f: /* SQRDMLSH */
- read_vec_element_i32(s, tcg_res, rd, pass,
- is_scalar ? size : MO_32);
- if (size == 1) {
- gen_helper_neon_qrdmlsh_s16(tcg_res, tcg_env,
- tcg_op, tcg_idx, tcg_res);
- } else {
- gen_helper_neon_qrdmlsh_s32(tcg_res, tcg_env,
- tcg_op, tcg_idx, tcg_res);
- }
- break;
default:
case 0x01: /* FMLA */
case 0x05: /* FMLS */
case 0x09: /* FMUL */
case 0x19: /* FMULX */
+ case 0x1d: /* SQRDMLAH */
+ case 0x1f: /* SQRDMLSH */
g_assert_not_reached();
}
@@ -12538,7 +12497,6 @@ static const AArch64DecodeTable data_proc_simd[] = {
{ 0x0e000000, 0xbf208c00, disas_simd_tb },
{ 0x0e000800, 0xbf208c00, disas_simd_zip_trn },
{ 0x2e000000, 0xbf208400, disas_simd_ext },
- { 0x5e008400, 0xdf208400, disas_simd_scalar_three_reg_same_extra },
{ 0x5e200000, 0xdf200c00, disas_simd_scalar_three_reg_diff },
{ 0x5e200800, 0xdf3e0c00, disas_simd_scalar_two_reg_misc },
{ 0x5f000000, 0xdf000400, disas_simd_indexed }, /* scalar indexed */
diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
index d477479bb1..98604d170f 100644
--- a/target/arm/tcg/vec_helper.c
+++ b/target/arm/tcg/vec_helper.c
@@ -347,6 +347,42 @@ void HELPER(neon_sqrdmulh_idx_h)(void *vd, void *vn, void *vm,
clear_tail(d, opr_sz, simd_maxsz(desc));
}
+void HELPER(neon_sqrdmlah_idx_h)(void *vd, void *vn, void *vm,
+ void *vq, uint32_t desc)
+{
+ intptr_t i, j, opr_sz = simd_oprsz(desc);
+ int idx = simd_data(desc);
+ int16_t *d = vd, *n = vn, *m = (int16_t *)vm + H2(idx);
+ intptr_t elements = opr_sz / 2;
+ intptr_t eltspersegment = MIN(16 / 2, elements);
+
+ for (i = 0; i < elements; i += 16 / 2) {
+ int16_t mm = m[i];
+ for (j = 0; j < eltspersegment; ++j) {
+ d[i + j] = do_sqrdmlah_h(n[i + j], mm, d[i + j], false, true, vq);
+ }
+ }
+ clear_tail(d, opr_sz, simd_maxsz(desc));
+}
+
+void HELPER(neon_sqrdmlsh_idx_h)(void *vd, void *vn, void *vm,
+ void *vq, uint32_t desc)
+{
+ intptr_t i, j, opr_sz = simd_oprsz(desc);
+ int idx = simd_data(desc);
+ int16_t *d = vd, *n = vn, *m = (int16_t *)vm + H2(idx);
+ intptr_t elements = opr_sz / 2;
+ intptr_t eltspersegment = MIN(16 / 2, elements);
+
+ for (i = 0; i < elements; i += 16 / 2) {
+ int16_t mm = m[i];
+ for (j = 0; j < eltspersegment; ++j) {
+ d[i + j] = do_sqrdmlah_h(n[i + j], mm, d[i + j], true, true, vq);
+ }
+ }
+ clear_tail(d, opr_sz, simd_maxsz(desc));
+}
+
void HELPER(sve2_sqrdmlah_h)(void *vd, void *vn, void *vm,
void *va, uint32_t desc)
{
@@ -546,6 +582,42 @@ void HELPER(neon_sqrdmulh_idx_s)(void *vd, void *vn, void *vm,
clear_tail(d, opr_sz, simd_maxsz(desc));
}
+void HELPER(neon_sqrdmlah_idx_s)(void *vd, void *vn, void *vm,
+ void *vq, uint32_t desc)
+{
+ intptr_t i, j, opr_sz = simd_oprsz(desc);
+ int idx = simd_data(desc);
+ int32_t *d = vd, *n = vn, *m = (int32_t *)vm + H4(idx);
+ intptr_t elements = opr_sz / 4;
+ intptr_t eltspersegment = MIN(16 / 4, elements);
+
+ for (i = 0; i < elements; i += 16 / 4) {
+ int32_t mm = m[i];
+ for (j = 0; j < eltspersegment; ++j) {
+ d[i + j] = do_sqrdmlah_s(n[i + j], mm, d[i + j], false, true, vq);
+ }
+ }
+ clear_tail(d, opr_sz, simd_maxsz(desc));
+}
+
+void HELPER(neon_sqrdmlsh_idx_s)(void *vd, void *vn, void *vm,
+ void *vq, uint32_t desc)
+{
+ intptr_t i, j, opr_sz = simd_oprsz(desc);
+ int idx = simd_data(desc);
+ int32_t *d = vd, *n = vn, *m = (int32_t *)vm + H4(idx);
+ intptr_t elements = opr_sz / 4;
+ intptr_t eltspersegment = MIN(16 / 4, elements);
+
+ for (i = 0; i < elements; i += 16 / 4) {
+ int32_t mm = m[i];
+ for (j = 0; j < eltspersegment; ++j) {
+ d[i + j] = do_sqrdmlah_s(n[i + j], mm, d[i + j], true, true, vq);
+ }
+ }
+ clear_tail(d, opr_sz, simd_maxsz(desc));
+}
+
void HELPER(sve2_sqrdmlah_s)(void *vd, void *vn, void *vm,
void *va, uint32_t desc)
{
--
2.34.1
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH 05/13] target/arm: Convert SDOT, UDOT to decodetree
2024-06-25 5:07 [PATCH 00/13] target/arm: AdvSIMD conversion, part 2 Richard Henderson
` (3 preceding siblings ...)
2024-06-25 5:08 ` [PATCH 04/13] target/arm: Convert SQRDMLAH, SQRDMLSH to decodetree Richard Henderson
@ 2024-06-25 5:08 ` Richard Henderson
2024-06-25 12:38 ` Peter Maydell
2024-06-25 5:08 ` [PATCH 06/13] target/arm: Convert SUDOT, USDOT " Richard Henderson
` (7 subsequent siblings)
12 siblings, 1 reply; 29+ messages in thread
From: Richard Henderson @ 2024-06-25 5:08 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/tcg/a64.decode | 7 +++++
target/arm/tcg/translate-a64.c | 54 ++++++++++++++++++----------------
2 files changed, 35 insertions(+), 26 deletions(-)
diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index 613cc9365c..7411d4ba97 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -61,6 +61,7 @@
@qrrr_b . q:1 ...... ... rm:5 ...... rn:5 rd:5 &qrrr_e esz=0
@qrrr_h . q:1 ...... ... rm:5 ...... rn:5 rd:5 &qrrr_e esz=1
+@qrrr_s . q:1 ...... ... rm:5 ...... rn:5 rd:5 &qrrr_e esz=2
@qrrr_sd . q:1 ...... ... rm:5 ...... rn:5 rd:5 &qrrr_e esz=%esz_sd
@qrrr_e . q:1 ...... esz:2 . rm:5 ...... rn:5 rd:5 &qrrr_e
@qr2r_e . q:1 ...... esz:2 . ..... ...... rm:5 rd:5 &qrrr_e rn=%rd
@@ -946,6 +947,9 @@ SQRDMULH_v 0.10 1110 ..1 ..... 10110 1 ..... ..... @qrrr_e
SQRDMLAH_v 0.10 1110 ..0 ..... 10000 1 ..... ..... @qrrr_e
SQRDMLSH_v 0.10 1110 ..0 ..... 10001 1 ..... ..... @qrrr_e
+SDOT_v 0.00 1110 100 ..... 10010 1 ..... ..... @qrrr_s
+UDOT_v 0.10 1110 100 ..... 10010 1 ..... ..... @qrrr_s
+
### Advanced SIMD scalar x indexed element
FMUL_si 0101 1111 00 .. .... 1001 . 0 ..... ..... @rrx_h
@@ -1020,6 +1024,9 @@ SQRDMLAH_vi 0.10 1111 10 .. .... 1101 . 0 ..... ..... @qrrx_s
SQRDMLSH_vi 0.10 1111 01 .. .... 1111 . 0 ..... ..... @qrrx_h
SQRDMLSH_vi 0.10 1111 10 .. .... 1111 . 0 ..... ..... @qrrx_s
+SDOT_vi 0.00 1111 10 .. .... 1110 . 0 ..... ..... @qrrx_s
+UDOT_vi 0.10 1111 10 .. .... 1110 . 0 ..... ..... @qrrx_s
+
# Floating-point conditional select
FCSEL 0001 1110 .. 1 rm:5 cond:4 11 rn:5 rd:5 esz=%esz_hsd
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 32c24c7422..f2e7d8d75c 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -5592,6 +5592,18 @@ TRANS(SQRDMULH_v, do_gvec_fn3_no8_no64, a, gen_gvec_sqrdmulh_qc)
TRANS_FEAT(SQRDMLAH_v, aa64_rdm, do_gvec_fn3_no8_no64, a, gen_gvec_sqrdmlah_qc)
TRANS_FEAT(SQRDMLSH_v, aa64_rdm, do_gvec_fn3_no8_no64, a, gen_gvec_sqrdmlsh_qc)
+static bool do_dot_vector(DisasContext *s, arg_qrrr_e *a,
+ gen_helper_gvec_4 *fn)
+{
+ if (fp_access_check(s)) {
+ gen_gvec_op4_ool(s, a->q, a->rd, a->rn, a->rm, a->rd, 0, fn);
+ }
+ return true;
+}
+
+TRANS_FEAT(SDOT_v, aa64_dp, do_dot_vector, a, gen_helper_gvec_sdot_b)
+TRANS_FEAT(UDOT_v, aa64_dp, do_dot_vector, a, gen_helper_gvec_udot_b)
+
/*
* Advanced SIMD scalar/vector x indexed element
*/
@@ -5914,6 +5926,18 @@ static gen_helper_gvec_4 * const f_vector_idx_sqrdmlsh[2] = {
TRANS_FEAT(SQRDMLSH_vi, aa64_rdm, do_int3_qc_vector_idx, a,
f_vector_idx_sqrdmlsh)
+static bool do_dot_vector_idx(DisasContext *s, arg_qrrx_e *a,
+ gen_helper_gvec_4 *fn)
+{
+ if (fp_access_check(s)) {
+ gen_gvec_op4_ool(s, a->q, a->rd, a->rn, a->rm, a->rd, a->idx, fn);
+ }
+ return true;
+}
+
+TRANS_FEAT(SDOT_vi, aa64_dp, do_dot_vector_idx, a, gen_helper_gvec_sdot_idx_b)
+TRANS_FEAT(UDOT_vi, aa64_dp, do_dot_vector_idx, a, gen_helper_gvec_udot_idx_b)
+
/*
* Advanced SIMD scalar pairwise
*/
@@ -10890,14 +10914,6 @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
int rot;
switch (u * 16 + opcode) {
- case 0x02: /* SDOT (vector) */
- case 0x12: /* UDOT (vector) */
- if (size != MO_32) {
- unallocated_encoding(s);
- return;
- }
- feature = dc_isar_feature(aa64_dp, s);
- break;
case 0x03: /* USDOT */
if (size != MO_32) {
unallocated_encoding(s);
@@ -10947,8 +10963,10 @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
}
break;
default:
+ case 0x02: /* SDOT (vector) */
case 0x10: /* SQRDMLAH (vector) */
case 0x11: /* SQRDMLSH (vector) */
+ case 0x12: /* UDOT (vector) */
unallocated_encoding(s);
return;
}
@@ -10961,11 +10979,6 @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
}
switch (opcode) {
- case 0x2: /* SDOT / UDOT */
- gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, 0,
- u ? gen_helper_gvec_udot_b : gen_helper_gvec_sdot_b);
- return;
-
case 0x3: /* USDOT */
gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, 0, gen_helper_gvec_usdot_b);
return;
@@ -12043,13 +12056,6 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
case 0x0b: /* SQDMULL, SQDMULL2 */
is_long = true;
break;
- case 0x0e: /* SDOT */
- case 0x1e: /* UDOT */
- if (is_scalar || size != MO_32 || !dc_isar_feature(aa64_dp, s)) {
- unallocated_encoding(s);
- return;
- }
- break;
case 0x0f:
switch (size) {
case 0: /* SUDOT */
@@ -12099,12 +12105,14 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
case 0x09: /* FMUL */
case 0x0c: /* SQDMULH */
case 0x0d: /* SQRDMULH */
+ case 0x0e: /* SDOT */
case 0x10: /* MLA */
case 0x14: /* MLS */
case 0x18: /* FMLAL2 */
case 0x19: /* FMULX */
case 0x1c: /* FMLSL2 */
case 0x1d: /* SQRDMLAH */
+ case 0x1e: /* UDOT */
case 0x1f: /* SQRDMLSH */
unallocated_encoding(s);
return;
@@ -12180,12 +12188,6 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
}
switch (16 * u + opcode) {
- case 0x0e: /* SDOT */
- case 0x1e: /* UDOT */
- gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, index,
- u ? gen_helper_gvec_udot_idx_b
- : gen_helper_gvec_sdot_idx_b);
- return;
case 0x0f:
switch (extract32(insn, 22, 2)) {
case 0: /* SUDOT */
--
2.34.1
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH 06/13] target/arm: Convert SUDOT, USDOT to decodetree
2024-06-25 5:07 [PATCH 00/13] target/arm: AdvSIMD conversion, part 2 Richard Henderson
` (4 preceding siblings ...)
2024-06-25 5:08 ` [PATCH 05/13] target/arm: Convert SDOT, UDOT " Richard Henderson
@ 2024-06-25 5:08 ` Richard Henderson
2024-06-25 12:37 ` Peter Maydell
2024-06-25 5:08 ` [PATCH 07/13] target/arm: Convert BFDOT " Richard Henderson
` (6 subsequent siblings)
12 siblings, 1 reply; 29+ messages in thread
From: Richard Henderson @ 2024-06-25 5:08 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/tcg/a64.decode | 3 +++
target/arm/tcg/translate-a64.c | 35 ++++++++--------------------------
2 files changed, 11 insertions(+), 27 deletions(-)
diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index 7411d4ba97..8a0251f83c 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -949,6 +949,7 @@ SQRDMLSH_v 0.10 1110 ..0 ..... 10001 1 ..... ..... @qrrr_e
SDOT_v 0.00 1110 100 ..... 10010 1 ..... ..... @qrrr_s
UDOT_v 0.10 1110 100 ..... 10010 1 ..... ..... @qrrr_s
+USDOT_v 0.00 1110 100 ..... 10011 1 ..... ..... @qrrr_s
### Advanced SIMD scalar x indexed element
@@ -1026,6 +1027,8 @@ SQRDMLSH_vi 0.10 1111 10 .. .... 1111 . 0 ..... ..... @qrrx_s
SDOT_vi 0.00 1111 10 .. .... 1110 . 0 ..... ..... @qrrx_s
UDOT_vi 0.10 1111 10 .. .... 1110 . 0 ..... ..... @qrrx_s
+SUDOT_vi 0.00 1111 00 .. .... 1111 . 0 ..... ..... @qrrx_s
+USDOT_vi 0.00 1111 10 .. .... 1111 . 0 ..... ..... @qrrx_s
# Floating-point conditional select
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index f2e7d8d75c..9a658ca876 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -5603,6 +5603,7 @@ static bool do_dot_vector(DisasContext *s, arg_qrrr_e *a,
TRANS_FEAT(SDOT_v, aa64_dp, do_dot_vector, a, gen_helper_gvec_sdot_b)
TRANS_FEAT(UDOT_v, aa64_dp, do_dot_vector, a, gen_helper_gvec_udot_b)
+TRANS_FEAT(USDOT_v, aa64_i8mm, do_dot_vector, a, gen_helper_gvec_usdot_b)
/*
* Advanced SIMD scalar/vector x indexed element
@@ -5937,6 +5938,10 @@ static bool do_dot_vector_idx(DisasContext *s, arg_qrrx_e *a,
TRANS_FEAT(SDOT_vi, aa64_dp, do_dot_vector_idx, a, gen_helper_gvec_sdot_idx_b)
TRANS_FEAT(UDOT_vi, aa64_dp, do_dot_vector_idx, a, gen_helper_gvec_udot_idx_b)
+TRANS_FEAT(SUDOT_vi, aa64_i8mm, do_dot_vector_idx, a,
+ gen_helper_gvec_sudot_idx_b)
+TRANS_FEAT(USDOT_vi, aa64_i8mm, do_dot_vector_idx, a,
+ gen_helper_gvec_usdot_idx_b)
/*
* Advanced SIMD scalar pairwise
@@ -10914,13 +10919,6 @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
int rot;
switch (u * 16 + opcode) {
- case 0x03: /* USDOT */
- if (size != MO_32) {
- unallocated_encoding(s);
- return;
- }
- feature = dc_isar_feature(aa64_i8mm, s);
- break;
case 0x04: /* SMMLA */
case 0x14: /* UMMLA */
case 0x05: /* USMMLA */
@@ -10964,6 +10962,7 @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
break;
default:
case 0x02: /* SDOT (vector) */
+ case 0x03: /* USDOT */
case 0x10: /* SQRDMLAH (vector) */
case 0x11: /* SQRDMLSH (vector) */
case 0x12: /* UDOT (vector) */
@@ -10979,10 +10978,6 @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
}
switch (opcode) {
- case 0x3: /* USDOT */
- gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, 0, gen_helper_gvec_usdot_b);
- return;
-
case 0x04: /* SMMLA, UMMLA */
gen_gvec_op4_ool(s, 1, rd, rn, rm, rd, 0,
u ? gen_helper_gvec_ummla_b
@@ -12058,14 +12053,6 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
break;
case 0x0f:
switch (size) {
- case 0: /* SUDOT */
- case 2: /* USDOT */
- if (is_scalar || !dc_isar_feature(aa64_i8mm, s)) {
- unallocated_encoding(s);
- return;
- }
- size = MO_32;
- break;
case 1: /* BFDOT */
if (is_scalar || !dc_isar_feature(aa64_bf16, s)) {
unallocated_encoding(s);
@@ -12082,6 +12069,8 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
size = MO_16;
break;
default:
+ case 0: /* SUDOT */
+ case 2: /* USDOT */
unallocated_encoding(s);
return;
}
@@ -12190,18 +12179,10 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
switch (16 * u + opcode) {
case 0x0f:
switch (extract32(insn, 22, 2)) {
- case 0: /* SUDOT */
- gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, index,
- gen_helper_gvec_sudot_idx_b);
- return;
case 1: /* BFDOT */
gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, index,
gen_helper_gvec_bfdot_idx);
return;
- case 2: /* USDOT */
- gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, index,
- gen_helper_gvec_usdot_idx_b);
- return;
case 3: /* BFMLAL{B,T} */
gen_gvec_op4_fpst(s, 1, rd, rn, rm, rd, 0, (index << 1) | is_q,
gen_helper_gvec_bfmlal_idx);
--
2.34.1
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH 07/13] target/arm: Convert BFDOT to decodetree
2024-06-25 5:07 [PATCH 00/13] target/arm: AdvSIMD conversion, part 2 Richard Henderson
` (5 preceding siblings ...)
2024-06-25 5:08 ` [PATCH 06/13] target/arm: Convert SUDOT, USDOT " Richard Henderson
@ 2024-06-25 5:08 ` Richard Henderson
2024-06-25 12:42 ` Peter Maydell
2024-06-25 5:08 ` [PATCH 08/13] target/arm: Convert BFMLALB, BFMLALT " Richard Henderson
` (5 subsequent siblings)
12 siblings, 1 reply; 29+ messages in thread
From: Richard Henderson @ 2024-06-25 5:08 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/tcg/a64.decode | 2 ++
target/arm/tcg/translate-a64.c | 20 +++++---------------
2 files changed, 7 insertions(+), 15 deletions(-)
diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index 8a0251f83c..6819fd2587 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -950,6 +950,7 @@ SQRDMLSH_v 0.10 1110 ..0 ..... 10001 1 ..... ..... @qrrr_e
SDOT_v 0.00 1110 100 ..... 10010 1 ..... ..... @qrrr_s
UDOT_v 0.10 1110 100 ..... 10010 1 ..... ..... @qrrr_s
USDOT_v 0.00 1110 100 ..... 10011 1 ..... ..... @qrrr_s
+BFDOT_v 0.10 1110 010 ..... 11111 1 ..... ..... @qrrr_s
### Advanced SIMD scalar x indexed element
@@ -1029,6 +1030,7 @@ SDOT_vi 0.00 1111 10 .. .... 1110 . 0 ..... ..... @qrrx_s
UDOT_vi 0.10 1111 10 .. .... 1110 . 0 ..... ..... @qrrx_s
SUDOT_vi 0.00 1111 00 .. .... 1111 . 0 ..... ..... @qrrx_s
USDOT_vi 0.00 1111 10 .. .... 1111 . 0 ..... ..... @qrrx_s
+BFDOT_vi 0.00 1111 01 .. .... 1111 . 0 ..... ..... @qrrx_s
# Floating-point conditional select
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 9a658ca876..0f44cd5aee 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -5604,6 +5604,7 @@ static bool do_dot_vector(DisasContext *s, arg_qrrr_e *a,
TRANS_FEAT(SDOT_v, aa64_dp, do_dot_vector, a, gen_helper_gvec_sdot_b)
TRANS_FEAT(UDOT_v, aa64_dp, do_dot_vector, a, gen_helper_gvec_udot_b)
TRANS_FEAT(USDOT_v, aa64_i8mm, do_dot_vector, a, gen_helper_gvec_usdot_b)
+TRANS_FEAT(BFDOT_v, aa64_bf16, do_dot_vector, a, gen_helper_gvec_bfdot)
/*
* Advanced SIMD scalar/vector x indexed element
@@ -5942,6 +5943,8 @@ TRANS_FEAT(SUDOT_vi, aa64_i8mm, do_dot_vector_idx, a,
gen_helper_gvec_sudot_idx_b)
TRANS_FEAT(USDOT_vi, aa64_i8mm, do_dot_vector_idx, a,
gen_helper_gvec_usdot_idx_b)
+TRANS_FEAT(BFDOT_vi, aa64_bf16, do_dot_vector_idx, a,
+ gen_helper_gvec_bfdot_idx)
/*
* Advanced SIMD scalar pairwise
@@ -10951,11 +10954,11 @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
break;
case 0x1f:
switch (size) {
- case 1: /* BFDOT */
case 3: /* BFMLAL{B,T} */
feature = dc_isar_feature(aa64_bf16, s);
break;
default:
+ case 1: /* BFDOT */
unallocated_encoding(s);
return;
}
@@ -11036,9 +11039,6 @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
return;
case 0xf:
switch (size) {
- case 1: /* BFDOT */
- gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, 0, gen_helper_gvec_bfdot);
- break;
case 3: /* BFMLAL{B,T} */
gen_gvec_op4_fpst(s, 1, rd, rn, rm, rd, false, is_q,
gen_helper_gvec_bfmlal);
@@ -12053,13 +12053,6 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
break;
case 0x0f:
switch (size) {
- case 1: /* BFDOT */
- if (is_scalar || !dc_isar_feature(aa64_bf16, s)) {
- unallocated_encoding(s);
- return;
- }
- size = MO_32;
- break;
case 3: /* BFMLAL{B,T} */
if (is_scalar || !dc_isar_feature(aa64_bf16, s)) {
unallocated_encoding(s);
@@ -12070,6 +12063,7 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
break;
default:
case 0: /* SUDOT */
+ case 1: /* BFDOT */
case 2: /* USDOT */
unallocated_encoding(s);
return;
@@ -12179,10 +12173,6 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
switch (16 * u + opcode) {
case 0x0f:
switch (extract32(insn, 22, 2)) {
- case 1: /* BFDOT */
- gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, index,
- gen_helper_gvec_bfdot_idx);
- return;
case 3: /* BFMLAL{B,T} */
gen_gvec_op4_fpst(s, 1, rd, rn, rm, rd, 0, (index << 1) | is_q,
gen_helper_gvec_bfmlal_idx);
--
2.34.1
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH 08/13] target/arm: Convert BFMLALB, BFMLALT to decodetree
2024-06-25 5:07 [PATCH 00/13] target/arm: AdvSIMD conversion, part 2 Richard Henderson
` (6 preceding siblings ...)
2024-06-25 5:08 ` [PATCH 07/13] target/arm: Convert BFDOT " Richard Henderson
@ 2024-06-25 5:08 ` Richard Henderson
2024-06-25 12:38 ` Peter Maydell
2024-06-25 5:08 ` [PATCH 09/13] target/arm: Convert BFMMLA, SMMLA, UMMLA, USMMLA " Richard Henderson
` (4 subsequent siblings)
12 siblings, 1 reply; 29+ messages in thread
From: Richard Henderson @ 2024-06-25 5:08 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/tcg/a64.decode | 2 +
target/arm/tcg/translate-a64.c | 77 +++++++++++++---------------------
2 files changed, 31 insertions(+), 48 deletions(-)
diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index 6819fd2587..15344a73de 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -951,6 +951,7 @@ SDOT_v 0.00 1110 100 ..... 10010 1 ..... ..... @qrrr_s
UDOT_v 0.10 1110 100 ..... 10010 1 ..... ..... @qrrr_s
USDOT_v 0.00 1110 100 ..... 10011 1 ..... ..... @qrrr_s
BFDOT_v 0.10 1110 010 ..... 11111 1 ..... ..... @qrrr_s
+BFMLAL_v 0.10 1110 110 ..... 11111 1 ..... ..... @qrrr_h
### Advanced SIMD scalar x indexed element
@@ -1031,6 +1032,7 @@ UDOT_vi 0.10 1111 10 .. .... 1110 . 0 ..... ..... @qrrx_s
SUDOT_vi 0.00 1111 00 .. .... 1111 . 0 ..... ..... @qrrx_s
USDOT_vi 0.00 1111 10 .. .... 1111 . 0 ..... ..... @qrrx_s
BFDOT_vi 0.00 1111 01 .. .... 1111 . 0 ..... ..... @qrrx_s
+BFMLAL_vi 0.00 1111 11 .. .... 1111 . 0 ..... ..... @qrrx_h
# Floating-point conditional select
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 0f44cd5aee..95be862dde 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -5606,6 +5606,19 @@ TRANS_FEAT(UDOT_v, aa64_dp, do_dot_vector, a, gen_helper_gvec_udot_b)
TRANS_FEAT(USDOT_v, aa64_i8mm, do_dot_vector, a, gen_helper_gvec_usdot_b)
TRANS_FEAT(BFDOT_v, aa64_bf16, do_dot_vector, a, gen_helper_gvec_bfdot)
+static bool trans_BFMLAL_v(DisasContext *s, arg_qrrr_e *a)
+{
+ if (!dc_isar_feature(aa64_bf16, s)) {
+ return false;
+ }
+ if (fp_access_check(s)) {
+ /* Q bit selects BFMLALB vs BFMLALT. */
+ gen_gvec_op4_fpst(s, true, a->rd, a->rn, a->rm, a->rd, false, a->q,
+ gen_helper_gvec_bfmlal);
+ }
+ return true;
+}
+
/*
* Advanced SIMD scalar/vector x indexed element
*/
@@ -5946,6 +5959,20 @@ TRANS_FEAT(USDOT_vi, aa64_i8mm, do_dot_vector_idx, a,
TRANS_FEAT(BFDOT_vi, aa64_bf16, do_dot_vector_idx, a,
gen_helper_gvec_bfdot_idx)
+static bool trans_BFMLAL_vi(DisasContext *s, arg_qrrx_e *a)
+{
+ if (!dc_isar_feature(aa64_bf16, s)) {
+ return false;
+ }
+ if (fp_access_check(s)) {
+ /* Q bit selects BFMLALB vs BFMLALT. */
+ gen_gvec_op4_fpst(s, true, a->rd, a->rn, a->rm, a->rd, 0,
+ (a->idx << 1) | a->q,
+ gen_helper_gvec_bfmlal_idx);
+ }
+ return true;
+}
+
/*
* Advanced SIMD scalar pairwise
*/
@@ -10952,23 +10979,13 @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
}
feature = dc_isar_feature(aa64_bf16, s);
break;
- case 0x1f:
- switch (size) {
- case 3: /* BFMLAL{B,T} */
- feature = dc_isar_feature(aa64_bf16, s);
- break;
- default:
- case 1: /* BFDOT */
- unallocated_encoding(s);
- return;
- }
- break;
default:
case 0x02: /* SDOT (vector) */
case 0x03: /* USDOT */
case 0x10: /* SQRDMLAH (vector) */
case 0x11: /* SQRDMLSH (vector) */
case 0x12: /* UDOT (vector) */
+ case 0x1f: /* BFDOT / BFMLAL */
unallocated_encoding(s);
return;
}
@@ -11037,17 +11054,6 @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
case 0xd: /* BFMMLA */
gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, 0, gen_helper_gvec_bfmmla);
return;
- case 0xf:
- switch (size) {
- case 3: /* BFMLAL{B,T} */
- gen_gvec_op4_fpst(s, 1, rd, rn, rm, rd, false, is_q,
- gen_helper_gvec_bfmlal);
- break;
- default:
- g_assert_not_reached();
- }
- return;
-
default:
g_assert_not_reached();
}
@@ -12051,24 +12057,6 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
case 0x0b: /* SQDMULL, SQDMULL2 */
is_long = true;
break;
- case 0x0f:
- switch (size) {
- case 3: /* BFMLAL{B,T} */
- if (is_scalar || !dc_isar_feature(aa64_bf16, s)) {
- unallocated_encoding(s);
- return;
- }
- /* can't set is_fp without other incorrect size checks */
- size = MO_16;
- break;
- default:
- case 0: /* SUDOT */
- case 1: /* BFDOT */
- case 2: /* USDOT */
- unallocated_encoding(s);
- return;
- }
- break;
case 0x11: /* FCMLA #0 */
case 0x13: /* FCMLA #90 */
case 0x15: /* FCMLA #180 */
@@ -12089,6 +12077,7 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
case 0x0c: /* SQDMULH */
case 0x0d: /* SQRDMULH */
case 0x0e: /* SDOT */
+ case 0x0f: /* SUDOT / BFDOT / USDOT / BFMLAL */
case 0x10: /* MLA */
case 0x14: /* MLS */
case 0x18: /* FMLAL2 */
@@ -12171,14 +12160,6 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
}
switch (16 * u + opcode) {
- case 0x0f:
- switch (extract32(insn, 22, 2)) {
- case 3: /* BFMLAL{B,T} */
- gen_gvec_op4_fpst(s, 1, rd, rn, rm, rd, 0, (index << 1) | is_q,
- gen_helper_gvec_bfmlal_idx);
- return;
- }
- g_assert_not_reached();
case 0x11: /* FCMLA #0 */
case 0x13: /* FCMLA #90 */
case 0x15: /* FCMLA #180 */
--
2.34.1
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH 09/13] target/arm: Convert BFMMLA, SMMLA, UMMLA, USMMLA to decodetree
2024-06-25 5:07 [PATCH 00/13] target/arm: AdvSIMD conversion, part 2 Richard Henderson
` (7 preceding siblings ...)
2024-06-25 5:08 ` [PATCH 08/13] target/arm: Convert BFMLALB, BFMLALT " Richard Henderson
@ 2024-06-25 5:08 ` Richard Henderson
2024-06-25 12:42 ` Peter Maydell
2024-06-25 5:08 ` [PATCH 10/13] target/arm: Add data argument to do_fp3_vector Richard Henderson
` (3 subsequent siblings)
12 siblings, 1 reply; 29+ messages in thread
From: Richard Henderson @ 2024-06-25 5:08 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/tcg/a64.decode | 4 ++++
target/arm/tcg/translate-a64.c | 36 ++++++++--------------------------
2 files changed, 12 insertions(+), 28 deletions(-)
diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index 15344a73de..b2c7e36969 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -952,6 +952,10 @@ UDOT_v 0.10 1110 100 ..... 10010 1 ..... ..... @qrrr_s
USDOT_v 0.00 1110 100 ..... 10011 1 ..... ..... @qrrr_s
BFDOT_v 0.10 1110 010 ..... 11111 1 ..... ..... @qrrr_s
BFMLAL_v 0.10 1110 110 ..... 11111 1 ..... ..... @qrrr_h
+BFMMLA 0110 1110 010 ..... 11101 1 ..... ..... @rrr_q1e0
+SMMLA 0100 1110 100 ..... 10100 1 ..... ..... @rrr_q1e0
+UMMLA 0110 1110 100 ..... 10100 1 ..... ..... @rrr_q1e0
+USMMLA 0100 1110 100 ..... 10101 1 ..... ..... @rrr_q1e0
### Advanced SIMD scalar x indexed element
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 95be862dde..2697c4b305 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -5605,6 +5605,10 @@ TRANS_FEAT(SDOT_v, aa64_dp, do_dot_vector, a, gen_helper_gvec_sdot_b)
TRANS_FEAT(UDOT_v, aa64_dp, do_dot_vector, a, gen_helper_gvec_udot_b)
TRANS_FEAT(USDOT_v, aa64_i8mm, do_dot_vector, a, gen_helper_gvec_usdot_b)
TRANS_FEAT(BFDOT_v, aa64_bf16, do_dot_vector, a, gen_helper_gvec_bfdot)
+TRANS_FEAT(BFMMLA, aa64_bf16, do_dot_vector, a, gen_helper_gvec_bfmmla)
+TRANS_FEAT(SMMLA, aa64_i8mm, do_dot_vector, a, gen_helper_gvec_smmla_b)
+TRANS_FEAT(UMMLA, aa64_i8mm, do_dot_vector, a, gen_helper_gvec_ummla_b)
+TRANS_FEAT(USMMLA, aa64_i8mm, do_dot_vector, a, gen_helper_gvec_usmmla_b)
static bool trans_BFMLAL_v(DisasContext *s, arg_qrrr_e *a)
{
@@ -10949,15 +10953,6 @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
int rot;
switch (u * 16 + opcode) {
- case 0x04: /* SMMLA */
- case 0x14: /* UMMLA */
- case 0x05: /* USMMLA */
- if (!is_q || size != MO_32) {
- unallocated_encoding(s);
- return;
- }
- feature = dc_isar_feature(aa64_i8mm, s);
- break;
case 0x18: /* FCMLA, #0 */
case 0x19: /* FCMLA, #90 */
case 0x1a: /* FCMLA, #180 */
@@ -10972,19 +10967,16 @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
}
feature = dc_isar_feature(aa64_fcma, s);
break;
- case 0x1d: /* BFMMLA */
- if (size != MO_16 || !is_q) {
- unallocated_encoding(s);
- return;
- }
- feature = dc_isar_feature(aa64_bf16, s);
- break;
default:
case 0x02: /* SDOT (vector) */
case 0x03: /* USDOT */
+ case 0x04: /* SMMLA */
+ case 0x05: /* USMMLA */
case 0x10: /* SQRDMLAH (vector) */
case 0x11: /* SQRDMLSH (vector) */
case 0x12: /* UDOT (vector) */
+ case 0x14: /* UMMLA */
+ case 0x1d: /* BFMMLA */
case 0x1f: /* BFDOT / BFMLAL */
unallocated_encoding(s);
return;
@@ -10998,15 +10990,6 @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
}
switch (opcode) {
- case 0x04: /* SMMLA, UMMLA */
- gen_gvec_op4_ool(s, 1, rd, rn, rm, rd, 0,
- u ? gen_helper_gvec_ummla_b
- : gen_helper_gvec_smmla_b);
- return;
- case 0x05: /* USMMLA */
- gen_gvec_op4_ool(s, 1, rd, rn, rm, rd, 0, gen_helper_gvec_usmmla_b);
- return;
-
case 0x8: /* FCMLA, #0 */
case 0x9: /* FCMLA, #90 */
case 0xa: /* FCMLA, #180 */
@@ -11051,9 +11034,6 @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
}
return;
- case 0xd: /* BFMMLA */
- gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, 0, gen_helper_gvec_bfmmla);
- return;
default:
g_assert_not_reached();
}
--
2.34.1
^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [PATCH 09/13] target/arm: Convert BFMMLA, SMMLA, UMMLA, USMMLA to decodetree
2024-06-25 5:08 ` [PATCH 09/13] target/arm: Convert BFMMLA, SMMLA, UMMLA, USMMLA " Richard Henderson
@ 2024-06-25 12:42 ` Peter Maydell
0 siblings, 0 replies; 29+ messages in thread
From: Peter Maydell @ 2024-06-25 12:42 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel, qemu-arm
On Tue, 25 Jun 2024 at 06:09, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> target/arm/tcg/a64.decode | 4 ++++
> target/arm/tcg/translate-a64.c | 36 ++++++++--------------------------
> 2 files changed, 12 insertions(+), 28 deletions(-)
>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
thanks
-- PMM
^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH 10/13] target/arm: Add data argument to do_fp3_vector
2024-06-25 5:07 [PATCH 00/13] target/arm: AdvSIMD conversion, part 2 Richard Henderson
` (8 preceding siblings ...)
2024-06-25 5:08 ` [PATCH 09/13] target/arm: Convert BFMMLA, SMMLA, UMMLA, USMMLA " Richard Henderson
@ 2024-06-25 5:08 ` Richard Henderson
2024-06-25 12:43 ` Peter Maydell
2024-06-25 5:08 ` [PATCH 11/13] target/arm: Convert FCADD to decodetree Richard Henderson
` (2 subsequent siblings)
12 siblings, 1 reply; 29+ messages in thread
From: Richard Henderson @ 2024-06-25 5:08 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/tcg/translate-a64.c | 52 +++++++++++++++++-----------------
1 file changed, 26 insertions(+), 26 deletions(-)
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 2697c4b305..57cdde008e 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -5290,7 +5290,7 @@ TRANS(CMHS_s, do_cmop_d, a, TCG_COND_GEU)
TRANS(CMEQ_s, do_cmop_d, a, TCG_COND_EQ)
TRANS(CMTST_s, do_cmop_d, a, TCG_COND_TSTNE)
-static bool do_fp3_vector(DisasContext *s, arg_qrrr_e *a,
+static bool do_fp3_vector(DisasContext *s, arg_qrrr_e *a, int data,
gen_helper_gvec_3_ptr * const fns[3])
{
MemOp esz = a->esz;
@@ -5313,7 +5313,7 @@ static bool do_fp3_vector(DisasContext *s, arg_qrrr_e *a,
}
if (fp_access_check(s)) {
gen_gvec_op3_fpst(s, a->q, a->rd, a->rn, a->rm,
- esz == MO_16, 0, fns[esz - 1]);
+ esz == MO_16, data, fns[esz - 1]);
}
return true;
}
@@ -5323,168 +5323,168 @@ static gen_helper_gvec_3_ptr * const f_vector_fadd[3] = {
gen_helper_gvec_fadd_s,
gen_helper_gvec_fadd_d,
};
-TRANS(FADD_v, do_fp3_vector, a, f_vector_fadd)
+TRANS(FADD_v, do_fp3_vector, a, 0, f_vector_fadd)
static gen_helper_gvec_3_ptr * const f_vector_fsub[3] = {
gen_helper_gvec_fsub_h,
gen_helper_gvec_fsub_s,
gen_helper_gvec_fsub_d,
};
-TRANS(FSUB_v, do_fp3_vector, a, f_vector_fsub)
+TRANS(FSUB_v, do_fp3_vector, a, 0, f_vector_fsub)
static gen_helper_gvec_3_ptr * const f_vector_fdiv[3] = {
gen_helper_gvec_fdiv_h,
gen_helper_gvec_fdiv_s,
gen_helper_gvec_fdiv_d,
};
-TRANS(FDIV_v, do_fp3_vector, a, f_vector_fdiv)
+TRANS(FDIV_v, do_fp3_vector, a, 0, f_vector_fdiv)
static gen_helper_gvec_3_ptr * const f_vector_fmul[3] = {
gen_helper_gvec_fmul_h,
gen_helper_gvec_fmul_s,
gen_helper_gvec_fmul_d,
};
-TRANS(FMUL_v, do_fp3_vector, a, f_vector_fmul)
+TRANS(FMUL_v, do_fp3_vector, a, 0, f_vector_fmul)
static gen_helper_gvec_3_ptr * const f_vector_fmax[3] = {
gen_helper_gvec_fmax_h,
gen_helper_gvec_fmax_s,
gen_helper_gvec_fmax_d,
};
-TRANS(FMAX_v, do_fp3_vector, a, f_vector_fmax)
+TRANS(FMAX_v, do_fp3_vector, a, 0, f_vector_fmax)
static gen_helper_gvec_3_ptr * const f_vector_fmin[3] = {
gen_helper_gvec_fmin_h,
gen_helper_gvec_fmin_s,
gen_helper_gvec_fmin_d,
};
-TRANS(FMIN_v, do_fp3_vector, a, f_vector_fmin)
+TRANS(FMIN_v, do_fp3_vector, a, 0, f_vector_fmin)
static gen_helper_gvec_3_ptr * const f_vector_fmaxnm[3] = {
gen_helper_gvec_fmaxnum_h,
gen_helper_gvec_fmaxnum_s,
gen_helper_gvec_fmaxnum_d,
};
-TRANS(FMAXNM_v, do_fp3_vector, a, f_vector_fmaxnm)
+TRANS(FMAXNM_v, do_fp3_vector, a, 0, f_vector_fmaxnm)
static gen_helper_gvec_3_ptr * const f_vector_fminnm[3] = {
gen_helper_gvec_fminnum_h,
gen_helper_gvec_fminnum_s,
gen_helper_gvec_fminnum_d,
};
-TRANS(FMINNM_v, do_fp3_vector, a, f_vector_fminnm)
+TRANS(FMINNM_v, do_fp3_vector, a, 0, f_vector_fminnm)
static gen_helper_gvec_3_ptr * const f_vector_fmulx[3] = {
gen_helper_gvec_fmulx_h,
gen_helper_gvec_fmulx_s,
gen_helper_gvec_fmulx_d,
};
-TRANS(FMULX_v, do_fp3_vector, a, f_vector_fmulx)
+TRANS(FMULX_v, do_fp3_vector, a, 0, f_vector_fmulx)
static gen_helper_gvec_3_ptr * const f_vector_fmla[3] = {
gen_helper_gvec_vfma_h,
gen_helper_gvec_vfma_s,
gen_helper_gvec_vfma_d,
};
-TRANS(FMLA_v, do_fp3_vector, a, f_vector_fmla)
+TRANS(FMLA_v, do_fp3_vector, a, 0, f_vector_fmla)
static gen_helper_gvec_3_ptr * const f_vector_fmls[3] = {
gen_helper_gvec_vfms_h,
gen_helper_gvec_vfms_s,
gen_helper_gvec_vfms_d,
};
-TRANS(FMLS_v, do_fp3_vector, a, f_vector_fmls)
+TRANS(FMLS_v, do_fp3_vector, a, 0, f_vector_fmls)
static gen_helper_gvec_3_ptr * const f_vector_fcmeq[3] = {
gen_helper_gvec_fceq_h,
gen_helper_gvec_fceq_s,
gen_helper_gvec_fceq_d,
};
-TRANS(FCMEQ_v, do_fp3_vector, a, f_vector_fcmeq)
+TRANS(FCMEQ_v, do_fp3_vector, a, 0, f_vector_fcmeq)
static gen_helper_gvec_3_ptr * const f_vector_fcmge[3] = {
gen_helper_gvec_fcge_h,
gen_helper_gvec_fcge_s,
gen_helper_gvec_fcge_d,
};
-TRANS(FCMGE_v, do_fp3_vector, a, f_vector_fcmge)
+TRANS(FCMGE_v, do_fp3_vector, a, 0, f_vector_fcmge)
static gen_helper_gvec_3_ptr * const f_vector_fcmgt[3] = {
gen_helper_gvec_fcgt_h,
gen_helper_gvec_fcgt_s,
gen_helper_gvec_fcgt_d,
};
-TRANS(FCMGT_v, do_fp3_vector, a, f_vector_fcmgt)
+TRANS(FCMGT_v, do_fp3_vector, a, 0, f_vector_fcmgt)
static gen_helper_gvec_3_ptr * const f_vector_facge[3] = {
gen_helper_gvec_facge_h,
gen_helper_gvec_facge_s,
gen_helper_gvec_facge_d,
};
-TRANS(FACGE_v, do_fp3_vector, a, f_vector_facge)
+TRANS(FACGE_v, do_fp3_vector, a, 0, f_vector_facge)
static gen_helper_gvec_3_ptr * const f_vector_facgt[3] = {
gen_helper_gvec_facgt_h,
gen_helper_gvec_facgt_s,
gen_helper_gvec_facgt_d,
};
-TRANS(FACGT_v, do_fp3_vector, a, f_vector_facgt)
+TRANS(FACGT_v, do_fp3_vector, a, 0, f_vector_facgt)
static gen_helper_gvec_3_ptr * const f_vector_fabd[3] = {
gen_helper_gvec_fabd_h,
gen_helper_gvec_fabd_s,
gen_helper_gvec_fabd_d,
};
-TRANS(FABD_v, do_fp3_vector, a, f_vector_fabd)
+TRANS(FABD_v, do_fp3_vector, a, 0, f_vector_fabd)
static gen_helper_gvec_3_ptr * const f_vector_frecps[3] = {
gen_helper_gvec_recps_h,
gen_helper_gvec_recps_s,
gen_helper_gvec_recps_d,
};
-TRANS(FRECPS_v, do_fp3_vector, a, f_vector_frecps)
+TRANS(FRECPS_v, do_fp3_vector, a, 0, f_vector_frecps)
static gen_helper_gvec_3_ptr * const f_vector_frsqrts[3] = {
gen_helper_gvec_rsqrts_h,
gen_helper_gvec_rsqrts_s,
gen_helper_gvec_rsqrts_d,
};
-TRANS(FRSQRTS_v, do_fp3_vector, a, f_vector_frsqrts)
+TRANS(FRSQRTS_v, do_fp3_vector, a, 0, f_vector_frsqrts)
static gen_helper_gvec_3_ptr * const f_vector_faddp[3] = {
gen_helper_gvec_faddp_h,
gen_helper_gvec_faddp_s,
gen_helper_gvec_faddp_d,
};
-TRANS(FADDP_v, do_fp3_vector, a, f_vector_faddp)
+TRANS(FADDP_v, do_fp3_vector, a, 0, f_vector_faddp)
static gen_helper_gvec_3_ptr * const f_vector_fmaxp[3] = {
gen_helper_gvec_fmaxp_h,
gen_helper_gvec_fmaxp_s,
gen_helper_gvec_fmaxp_d,
};
-TRANS(FMAXP_v, do_fp3_vector, a, f_vector_fmaxp)
+TRANS(FMAXP_v, do_fp3_vector, a, 0, f_vector_fmaxp)
static gen_helper_gvec_3_ptr * const f_vector_fminp[3] = {
gen_helper_gvec_fminp_h,
gen_helper_gvec_fminp_s,
gen_helper_gvec_fminp_d,
};
-TRANS(FMINP_v, do_fp3_vector, a, f_vector_fminp)
+TRANS(FMINP_v, do_fp3_vector, a, 0, f_vector_fminp)
static gen_helper_gvec_3_ptr * const f_vector_fmaxnmp[3] = {
gen_helper_gvec_fmaxnump_h,
gen_helper_gvec_fmaxnump_s,
gen_helper_gvec_fmaxnump_d,
};
-TRANS(FMAXNMP_v, do_fp3_vector, a, f_vector_fmaxnmp)
+TRANS(FMAXNMP_v, do_fp3_vector, a, 0, f_vector_fmaxnmp)
static gen_helper_gvec_3_ptr * const f_vector_fminnmp[3] = {
gen_helper_gvec_fminnump_h,
gen_helper_gvec_fminnump_s,
gen_helper_gvec_fminnump_d,
};
-TRANS(FMINNMP_v, do_fp3_vector, a, f_vector_fminnmp)
+TRANS(FMINNMP_v, do_fp3_vector, a, 0, f_vector_fminnmp)
static bool do_fmlal(DisasContext *s, arg_qrrr_e *a, bool is_s, bool is_2)
{
--
2.34.1
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH 11/13] target/arm: Convert FCADD to decodetree
2024-06-25 5:07 [PATCH 00/13] target/arm: AdvSIMD conversion, part 2 Richard Henderson
` (9 preceding siblings ...)
2024-06-25 5:08 ` [PATCH 10/13] target/arm: Add data argument to do_fp3_vector Richard Henderson
@ 2024-06-25 5:08 ` Richard Henderson
2024-06-25 12:41 ` Peter Maydell
2024-06-25 5:08 ` [PATCH 12/13] target/arm: Convert FCMLA " Richard Henderson
2024-06-25 5:08 ` [PATCH 13/13] target/arm: Delete dead code from disas_simd_indexed Richard Henderson
12 siblings, 1 reply; 29+ messages in thread
From: Richard Henderson @ 2024-06-25 5:08 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/tcg/a64.decode | 3 +++
target/arm/tcg/translate-a64.c | 33 ++++++++++-----------------------
2 files changed, 13 insertions(+), 23 deletions(-)
diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index b2c7e36969..f330919851 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -957,6 +957,9 @@ SMMLA 0100 1110 100 ..... 10100 1 ..... ..... @rrr_q1e0
UMMLA 0110 1110 100 ..... 10100 1 ..... ..... @rrr_q1e0
USMMLA 0100 1110 100 ..... 10101 1 ..... ..... @rrr_q1e0
+FCADD_90 0.10 1110 ..0 ..... 11100 1 ..... ..... @qrrr_e
+FCADD_270 0.10 1110 ..0 ..... 11110 1 ..... ..... @qrrr_e
+
### Advanced SIMD scalar x indexed element
FMUL_si 0101 1111 00 .. .... 1001 . 0 ..... ..... @rrx_h
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 57cdde008e..a1b338263f 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -5623,6 +5623,14 @@ static bool trans_BFMLAL_v(DisasContext *s, arg_qrrr_e *a)
return true;
}
+static gen_helper_gvec_3_ptr * const f_vector_fcadd[3] = {
+ gen_helper_gvec_fcaddh,
+ gen_helper_gvec_fcadds,
+ gen_helper_gvec_fcaddd,
+};
+TRANS_FEAT(FCADD_90, aa64_fcma, do_fp3_vector, a, 0, f_vector_fcadd)
+TRANS_FEAT(FCADD_270, aa64_fcma, do_fp3_vector, a, 1, f_vector_fcadd)
+
/*
* Advanced SIMD scalar/vector x indexed element
*/
@@ -10957,8 +10965,6 @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
case 0x19: /* FCMLA, #90 */
case 0x1a: /* FCMLA, #180 */
case 0x1b: /* FCMLA, #270 */
- case 0x1c: /* FCADD, #90 */
- case 0x1e: /* FCADD, #270 */
if (size == 0
|| (size == 1 && !dc_isar_feature(aa64_fp16, s))
|| (size == 3 && !is_q)) {
@@ -10976,7 +10982,9 @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
case 0x11: /* SQRDMLSH (vector) */
case 0x12: /* UDOT (vector) */
case 0x14: /* UMMLA */
+ case 0x1c: /* FCADD, #90 */
case 0x1d: /* BFMMLA */
+ case 0x1e: /* FCADD, #270 */
case 0x1f: /* BFDOT / BFMLAL */
unallocated_encoding(s);
return;
@@ -11013,27 +11021,6 @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
}
return;
- case 0xc: /* FCADD, #90 */
- case 0xe: /* FCADD, #270 */
- rot = extract32(opcode, 1, 1);
- switch (size) {
- case 1:
- gen_gvec_op3_fpst(s, is_q, rd, rn, rm, size == 1, rot,
- gen_helper_gvec_fcaddh);
- break;
- case 2:
- gen_gvec_op3_fpst(s, is_q, rd, rn, rm, size == 1, rot,
- gen_helper_gvec_fcadds);
- break;
- case 3:
- gen_gvec_op3_fpst(s, is_q, rd, rn, rm, size == 1, rot,
- gen_helper_gvec_fcaddd);
- break;
- default:
- g_assert_not_reached();
- }
- return;
-
default:
g_assert_not_reached();
}
--
2.34.1
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH 12/13] target/arm: Convert FCMLA to decodetree
2024-06-25 5:07 [PATCH 00/13] target/arm: AdvSIMD conversion, part 2 Richard Henderson
` (10 preceding siblings ...)
2024-06-25 5:08 ` [PATCH 11/13] target/arm: Convert FCADD to decodetree Richard Henderson
@ 2024-06-25 5:08 ` Richard Henderson
2024-06-25 12:35 ` Peter Maydell
2024-06-25 5:08 ` [PATCH 13/13] target/arm: Delete dead code from disas_simd_indexed Richard Henderson
12 siblings, 1 reply; 29+ messages in thread
From: Richard Henderson @ 2024-06-25 5:08 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/tcg/a64.decode | 5 +
target/arm/tcg/translate-a64.c | 241 ++++++++++-----------------------
2 files changed, 76 insertions(+), 170 deletions(-)
diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index f330919851..4b2a6ba302 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -960,6 +960,8 @@ USMMLA 0100 1110 100 ..... 10101 1 ..... ..... @rrr_q1e0
FCADD_90 0.10 1110 ..0 ..... 11100 1 ..... ..... @qrrr_e
FCADD_270 0.10 1110 ..0 ..... 11110 1 ..... ..... @qrrr_e
+FCMLA_v 0 q:1 10 1110 esz:2 0 rm:5 110 rot:2 1 rn:5 rd:5
+
### Advanced SIMD scalar x indexed element
FMUL_si 0101 1111 00 .. .... 1001 . 0 ..... ..... @rrx_h
@@ -1041,6 +1043,9 @@ USDOT_vi 0.00 1111 10 .. .... 1111 . 0 ..... ..... @qrrx_s
BFDOT_vi 0.00 1111 01 .. .... 1111 . 0 ..... ..... @qrrx_s
BFMLAL_vi 0.00 1111 11 .. .... 1111 . 0 ..... ..... @qrrx_h
+FCMLA_vi 0 q:1 10 1111 10 . rm:5 0 rot:2 1 . 0 rn:5 rd:5 esz=1 idx=%hl
+FCMLA_vi 0 q:1 10 1111 01 0 rm:5 0 rot:2 1 idx:1 0 rn:5 rd:5 esz=2
+
# Floating-point conditional select
FCSEL 0001 1110 .. 1 rm:5 cond:4 11 rn:5 rd:5 esz=%esz_hsd
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index a1b338263f..0a54a9ef8f 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -5631,6 +5631,39 @@ static gen_helper_gvec_3_ptr * const f_vector_fcadd[3] = {
TRANS_FEAT(FCADD_90, aa64_fcma, do_fp3_vector, a, 0, f_vector_fcadd)
TRANS_FEAT(FCADD_270, aa64_fcma, do_fp3_vector, a, 1, f_vector_fcadd)
+static bool trans_FCMLA_v(DisasContext *s, arg_FCMLA_v *a)
+{
+ gen_helper_gvec_4_ptr *fn;
+
+ if (!dc_isar_feature(aa64_fcma, s)) {
+ return false;
+ }
+ switch (a->esz) {
+ case MO_64:
+ if (!a->q) {
+ return false;
+ }
+ fn = gen_helper_gvec_fcmlad;
+ break;
+ case MO_32:
+ fn = gen_helper_gvec_fcmlas;
+ break;
+ case MO_16:
+ if (!dc_isar_feature(aa64_fp16, s)) {
+ return false;
+ }
+ fn = gen_helper_gvec_fcmlah;
+ break;
+ default:
+ return false;
+ }
+ if (fp_access_check(s)) {
+ gen_gvec_op4_fpst(s, a->q, a->rd, a->rn, a->rm, a->rd,
+ a->esz == MO_16, a->rot, fn);
+ }
+ return true;
+}
+
/*
* Advanced SIMD scalar/vector x indexed element
*/
@@ -5985,6 +6018,36 @@ static bool trans_BFMLAL_vi(DisasContext *s, arg_qrrx_e *a)
return true;
}
+static bool trans_FCMLA_vi(DisasContext *s, arg_FCMLA_vi *a)
+{
+ gen_helper_gvec_4_ptr *fn;
+
+ if (!dc_isar_feature(aa64_fcma, s)) {
+ return false;
+ }
+ switch (a->esz) {
+ case MO_16:
+ if (!dc_isar_feature(aa64_fp16, s)) {
+ return false;
+ }
+ fn = gen_helper_gvec_fcmlah_idx;
+ break;
+ case MO_32:
+ if (!a->q && a->idx) {
+ return false;
+ }
+ fn = gen_helper_gvec_fcmlas_idx;
+ break;
+ default:
+ g_assert_not_reached();
+ }
+ if (fp_access_check(s)) {
+ gen_gvec_op4_fpst(s, a->q, a->rd, a->rn, a->rm, a->rd,
+ a->esz == MO_16, (a->idx << 2) | a->rot, fn);
+ }
+ return true;
+}
+
/*
* Advanced SIMD scalar pairwise
*/
@@ -10942,90 +11005,6 @@ static void disas_simd_three_reg_diff(DisasContext *s, uint32_t insn)
}
}
-/* AdvSIMD three same extra
- * 31 30 29 28 24 23 22 21 20 16 15 14 11 10 9 5 4 0
- * +---+---+---+-----------+------+---+------+---+--------+---+----+----+
- * | 0 | Q | U | 0 1 1 1 0 | size | 0 | Rm | 1 | opcode | 1 | Rn | Rd |
- * +---+---+---+-----------+------+---+------+---+--------+---+----+----+
- */
-static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
-{
- int rd = extract32(insn, 0, 5);
- int rn = extract32(insn, 5, 5);
- int opcode = extract32(insn, 11, 4);
- int rm = extract32(insn, 16, 5);
- int size = extract32(insn, 22, 2);
- bool u = extract32(insn, 29, 1);
- bool is_q = extract32(insn, 30, 1);
- bool feature;
- int rot;
-
- switch (u * 16 + opcode) {
- case 0x18: /* FCMLA, #0 */
- case 0x19: /* FCMLA, #90 */
- case 0x1a: /* FCMLA, #180 */
- case 0x1b: /* FCMLA, #270 */
- if (size == 0
- || (size == 1 && !dc_isar_feature(aa64_fp16, s))
- || (size == 3 && !is_q)) {
- unallocated_encoding(s);
- return;
- }
- feature = dc_isar_feature(aa64_fcma, s);
- break;
- default:
- case 0x02: /* SDOT (vector) */
- case 0x03: /* USDOT */
- case 0x04: /* SMMLA */
- case 0x05: /* USMMLA */
- case 0x10: /* SQRDMLAH (vector) */
- case 0x11: /* SQRDMLSH (vector) */
- case 0x12: /* UDOT (vector) */
- case 0x14: /* UMMLA */
- case 0x1c: /* FCADD, #90 */
- case 0x1d: /* BFMMLA */
- case 0x1e: /* FCADD, #270 */
- case 0x1f: /* BFDOT / BFMLAL */
- unallocated_encoding(s);
- return;
- }
- if (!feature) {
- unallocated_encoding(s);
- return;
- }
- if (!fp_access_check(s)) {
- return;
- }
-
- switch (opcode) {
- case 0x8: /* FCMLA, #0 */
- case 0x9: /* FCMLA, #90 */
- case 0xa: /* FCMLA, #180 */
- case 0xb: /* FCMLA, #270 */
- rot = extract32(opcode, 0, 2);
- switch (size) {
- case 1:
- gen_gvec_op4_fpst(s, is_q, rd, rn, rm, rd, true, rot,
- gen_helper_gvec_fcmlah);
- break;
- case 2:
- gen_gvec_op4_fpst(s, is_q, rd, rn, rm, rd, false, rot,
- gen_helper_gvec_fcmlas);
- break;
- case 3:
- gen_gvec_op4_fpst(s, is_q, rd, rn, rm, rd, false, rot,
- gen_helper_gvec_fcmlad);
- break;
- default:
- g_assert_not_reached();
- }
- return;
-
- default:
- g_assert_not_reached();
- }
-}
-
static void handle_2misc_widening(DisasContext *s, int opcode, bool is_q,
int size, int rn, int rd)
{
@@ -12001,10 +11980,7 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
int rn = extract32(insn, 5, 5);
int rd = extract32(insn, 0, 5);
bool is_long = false;
- int is_fp = 0;
- bool is_fp16 = false;
int index;
- TCGv_ptr fpst;
switch (16 * u + opcode) {
case 0x02: /* SMLAL, SMLAL2 */
@@ -12024,16 +12000,6 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
case 0x0b: /* SQDMULL, SQDMULL2 */
is_long = true;
break;
- case 0x11: /* FCMLA #0 */
- case 0x13: /* FCMLA #90 */
- case 0x15: /* FCMLA #180 */
- case 0x17: /* FCMLA #270 */
- if (is_scalar || !dc_isar_feature(aa64_fcma, s)) {
- unallocated_encoding(s);
- return;
- }
- is_fp = 2;
- break;
default:
case 0x00: /* FMLAL */
case 0x01: /* FMLA */
@@ -12046,7 +12012,11 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
case 0x0e: /* SDOT */
case 0x0f: /* SUDOT / BFDOT / USDOT / BFMLAL */
case 0x10: /* MLA */
+ case 0x11: /* FCMLA #0 */
+ case 0x13: /* FCMLA #90 */
case 0x14: /* MLS */
+ case 0x15: /* FCMLA #180 */
+ case 0x17: /* FCMLA #270 */
case 0x18: /* FMLAL2 */
case 0x19: /* FMULX */
case 0x1c: /* FMLSL2 */
@@ -12057,46 +12027,12 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
return;
}
- switch (is_fp) {
- case 1: /* normal fp */
- unallocated_encoding(s); /* in decodetree */
- return;
-
- case 2: /* complex fp */
- /* Each indexable element is a complex pair. */
- size += 1;
- switch (size) {
- case MO_32:
- if (h && !is_q) {
- unallocated_encoding(s);
- return;
- }
- is_fp16 = true;
- break;
- case MO_64:
- break;
- default:
- unallocated_encoding(s);
- return;
- }
- break;
-
- default: /* integer */
- switch (size) {
- case MO_8:
- case MO_64:
- unallocated_encoding(s);
- return;
- }
- break;
- }
- if (is_fp16 && !dc_isar_feature(aa64_fp16, s)) {
- unallocated_encoding(s);
- return;
- }
-
/* Given MemOp size, adjust register and indexing. */
switch (size) {
+ case MO_8:
+ case MO_64:
+ unallocated_encoding(s);
+ return;
case MO_16:
index = h << 2 | l << 1 | m;
break;
@@ -12104,14 +12040,6 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
index = h << 1 | l;
rm |= m << 4;
break;
- case MO_64:
- if (l || !is_q) {
- unallocated_encoding(s);
- return;
- }
- index = h;
- rm |= m << 4;
- break;
default:
g_assert_not_reached();
}
@@ -12120,32 +12048,6 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
return;
}
- if (is_fp) {
- fpst = fpstatus_ptr(is_fp16 ? FPST_FPCR_F16 : FPST_FPCR);
- } else {
- fpst = NULL;
- }
-
- switch (16 * u + opcode) {
- case 0x11: /* FCMLA #0 */
- case 0x13: /* FCMLA #90 */
- case 0x15: /* FCMLA #180 */
- case 0x17: /* FCMLA #270 */
- {
- int rot = extract32(insn, 13, 2);
- int data = (index << 2) | rot;
- tcg_gen_gvec_4_ptr(vec_full_reg_offset(s, rd),
- vec_full_reg_offset(s, rn),
- vec_full_reg_offset(s, rm),
- vec_full_reg_offset(s, rd), fpst,
- is_q ? 16 : 8, vec_full_reg_size(s), data,
- size == MO_64
- ? gen_helper_gvec_fcmlas_idx
- : gen_helper_gvec_fcmlah_idx);
- }
- return;
- }
-
if (size == 3) {
g_assert_not_reached();
} else if (!is_long) {
@@ -12407,7 +12309,6 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
*/
static const AArch64DecodeTable data_proc_simd[] = {
/* pattern , mask , fn */
- { 0x0e008400, 0x9f208400, disas_simd_three_reg_same_extra },
{ 0x0e200000, 0x9f200c00, disas_simd_three_reg_diff },
{ 0x0e200800, 0x9f3e0c00, disas_simd_two_reg_misc },
{ 0x0e300800, 0x9f3e0c00, disas_simd_across_lanes },
--
2.34.1
^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [PATCH 12/13] target/arm: Convert FCMLA to decodetree
2024-06-25 5:08 ` [PATCH 12/13] target/arm: Convert FCMLA " Richard Henderson
@ 2024-06-25 12:35 ` Peter Maydell
0 siblings, 0 replies; 29+ messages in thread
From: Peter Maydell @ 2024-06-25 12:35 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel, qemu-arm
On Tue, 25 Jun 2024 at 06:09, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> target/arm/tcg/a64.decode | 5 +
> target/arm/tcg/translate-a64.c | 241 ++++++++++-----------------------
> 2 files changed, 76 insertions(+), 170 deletions(-)
>
> diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
> index f330919851..4b2a6ba302 100644
> --- a/target/arm/tcg/a64.decode
> +++ b/target/arm/tcg/a64.decode
> @@ -960,6 +960,8 @@ USMMLA 0100 1110 100 ..... 10101 1 ..... ..... @rrr_q1e0
> FCADD_90 0.10 1110 ..0 ..... 11100 1 ..... ..... @qrrr_e
> FCADD_270 0.10 1110 ..0 ..... 11110 1 ..... ..... @qrrr_e
>
> +FCMLA_v 0 q:1 10 1110 esz:2 0 rm:5 110 rot:2 1 rn:5 rd:5
> +
> ### Advanced SIMD scalar x indexed element
>
> FMUL_si 0101 1111 00 .. .... 1001 . 0 ..... ..... @rrx_h
> @@ -1041,6 +1043,9 @@ USDOT_vi 0.00 1111 10 .. .... 1111 . 0 ..... ..... @qrrx_s
> BFDOT_vi 0.00 1111 01 .. .... 1111 . 0 ..... ..... @qrrx_s
> BFMLAL_vi 0.00 1111 11 .. .... 1111 . 0 ..... ..... @qrrx_h
>
> +FCMLA_vi 0 q:1 10 1111 10 . rm:5 0 rot:2 1 . 0 rn:5 rd:5 esz=1 idx=%hl
> +FCMLA_vi 0 q:1 10 1111 01 0 rm:5 0 rot:2 1 idx:1 0 rn:5 rd:5 esz=2
This doesn't look right. Bits [23:22] are the size field, but
here you have 10 and esz=1, and 01 and esz=2. I think the 01
and 10 should be the other way around.
In the size 10 case (MO_32), we should UNDEF for L=1 or Q=0,
which is to say that L must be 0 and Q must be 1. We hard-wire
the L bit (bit 21) to 0 in the decode, but we leave q:1 as it
is; I think it would be better for the second line here to have
a forced 1 in the q bit and set q=1 at the end of the line.
We also don't have the check for H==1 && Q==0 for the size 01 case.
I think we can do that in the decode file by splitting the 01
case into two lines, one for Q=0 and one for Q=1.
> +static bool trans_FCMLA_vi(DisasContext *s, arg_FCMLA_vi *a)
> +{
> + gen_helper_gvec_4_ptr *fn;
> +
> + if (!dc_isar_feature(aa64_fcma, s)) {
> + return false;
> + }
> + switch (a->esz) {
> + case MO_16:
> + if (!dc_isar_feature(aa64_fp16, s)) {
> + return false;
> + }
> + fn = gen_helper_gvec_fcmlah_idx;
> + break;
> + case MO_32:
> + if (!a->q && a->idx) {
> + return false;
> + }
I suspect this was supposed to be the size 01 H==1 && Q==0
check, but it's in the wrong case.
So my suggested fixup for this patch is:
diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index 4b2a6ba302d..223eac3cac2 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -1043,8 +1043,9 @@ USDOT_vi 0.00 1111 10 .. .... 1111 . 0
..... ..... @qrrx_s
BFDOT_vi 0.00 1111 01 .. .... 1111 . 0 ..... ..... @qrrx_s
BFMLAL_vi 0.00 1111 11 .. .... 1111 . 0 ..... ..... @qrrx_h
-FCMLA_vi 0 q:1 10 1111 10 . rm:5 0 rot:2 1 . 0 rn:5 rd:5 esz=1 idx=%hl
-FCMLA_vi 0 q:1 10 1111 01 0 rm:5 0 rot:2 1 idx:1 0 rn:5 rd:5 esz=2
+FCMLA_vi 0 0 10 1111 01 idx:1 rm:5 0 rot:2 1 0 0 rn:5 rd:5 esz=1 q=0
+FCMLA_vi 0 1 10 1111 01 . rm:5 0 rot:2 1 . 0 rn:5 rd:5 esz=1 idx=%hl q=1
+FCMLA_vi 0 1 10 1111 10 0 rm:5 0 rot:2 1 idx:1 0 rn:5 rd:5 esz=2 q=1
# Floating-point conditional select
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 0a54a9ef8f7..161fa2659c4 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -6033,9 +6033,6 @@ static bool trans_FCMLA_vi(DisasContext *s,
arg_FCMLA_vi *a)
fn = gen_helper_gvec_fcmlah_idx;
break;
case MO_32:
- if (!a->q && a->idx) {
- return false;
- }
fn = gen_helper_gvec_fcmlas_idx;
break;
default:
thanks
-- PMM
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH 13/13] target/arm: Delete dead code from disas_simd_indexed
2024-06-25 5:07 [PATCH 00/13] target/arm: AdvSIMD conversion, part 2 Richard Henderson
` (11 preceding siblings ...)
2024-06-25 5:08 ` [PATCH 12/13] target/arm: Convert FCMLA " Richard Henderson
@ 2024-06-25 5:08 ` Richard Henderson
2024-06-25 12:41 ` Peter Maydell
12 siblings, 1 reply; 29+ messages in thread
From: Richard Henderson @ 2024-06-25 5:08 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm
The last insns in this block, MLA and MLS, were converted
with f80701cb44d, and this code should have been removed then.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/tcg/translate-a64.c | 93 ----------------------------------
1 file changed, 93 deletions(-)
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 0a54a9ef8f..11955c0c36 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -11979,7 +11979,6 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
int h = extract32(insn, 11, 1);
int rn = extract32(insn, 5, 5);
int rd = extract32(insn, 0, 5);
- bool is_long = false;
int index;
switch (16 * u + opcode) {
@@ -11993,12 +11992,10 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
unallocated_encoding(s);
return;
}
- is_long = true;
break;
case 0x03: /* SQDMLAL, SQDMLAL2 */
case 0x07: /* SQDMLSL, SQDMLSL2 */
case 0x0b: /* SQDMULL, SQDMULL2 */
- is_long = true;
break;
default:
case 0x00: /* FMLAL */
@@ -12050,96 +12047,6 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
if (size == 3) {
g_assert_not_reached();
- } else if (!is_long) {
- /* 32 bit floating point, or 16 or 32 bit integer.
- * For the 16 bit scalar case we use the usual Neon helpers and
- * rely on the fact that 0 op 0 == 0 with no side effects.
- */
- TCGv_i32 tcg_idx = tcg_temp_new_i32();
- int pass, maxpasses;
-
- if (is_scalar) {
- maxpasses = 1;
- } else {
- maxpasses = is_q ? 4 : 2;
- }
-
- read_vec_element_i32(s, tcg_idx, rm, index, size);
-
- if (size == 1 && !is_scalar) {
- /* The simplest way to handle the 16x16 indexed ops is to duplicate
- * the index into both halves of the 32 bit tcg_idx and then use
- * the usual Neon helpers.
- */
- tcg_gen_deposit_i32(tcg_idx, tcg_idx, tcg_idx, 16, 16);
- }
-
- for (pass = 0; pass < maxpasses; pass++) {
- TCGv_i32 tcg_op = tcg_temp_new_i32();
- TCGv_i32 tcg_res = tcg_temp_new_i32();
-
- read_vec_element_i32(s, tcg_op, rn, pass, is_scalar ? size : MO_32);
-
- switch (16 * u + opcode) {
- case 0x10: /* MLA */
- case 0x14: /* MLS */
- {
- static NeonGenTwoOpFn * const fns[2][2] = {
- { gen_helper_neon_add_u16, gen_helper_neon_sub_u16 },
- { tcg_gen_add_i32, tcg_gen_sub_i32 },
- };
- NeonGenTwoOpFn *genfn;
- bool is_sub = opcode == 0x4;
-
- if (size == 1) {
- gen_helper_neon_mul_u16(tcg_res, tcg_op, tcg_idx);
- } else {
- tcg_gen_mul_i32(tcg_res, tcg_op, tcg_idx);
- }
- if (opcode == 0x8) {
- break;
- }
- read_vec_element_i32(s, tcg_op, rd, pass, MO_32);
- genfn = fns[size - 1][is_sub];
- genfn(tcg_res, tcg_op, tcg_res);
- break;
- }
- case 0x0c: /* SQDMULH */
- if (size == 1) {
- gen_helper_neon_qdmulh_s16(tcg_res, tcg_env,
- tcg_op, tcg_idx);
- } else {
- gen_helper_neon_qdmulh_s32(tcg_res, tcg_env,
- tcg_op, tcg_idx);
- }
- break;
- case 0x0d: /* SQRDMULH */
- if (size == 1) {
- gen_helper_neon_qrdmulh_s16(tcg_res, tcg_env,
- tcg_op, tcg_idx);
- } else {
- gen_helper_neon_qrdmulh_s32(tcg_res, tcg_env,
- tcg_op, tcg_idx);
- }
- break;
- default:
- case 0x01: /* FMLA */
- case 0x05: /* FMLS */
- case 0x09: /* FMUL */
- case 0x19: /* FMULX */
- case 0x1d: /* SQRDMLAH */
- case 0x1f: /* SQRDMLSH */
- g_assert_not_reached();
- }
-
- if (is_scalar) {
- write_fp_sreg(s, rd, tcg_res);
- } else {
- write_vec_element_i32(s, tcg_res, rd, pass, MO_32);
- }
- }
-
- clear_vec_high(s, is_q, rd);
} else {
/* long ops: 16x16->32 or 32x32->64 */
TCGv_i64 tcg_res[2];
--
2.34.1
^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [PATCH 13/13] target/arm: Delete dead code from disas_simd_indexed
2024-06-25 5:08 ` [PATCH 13/13] target/arm: Delete dead code from disas_simd_indexed Richard Henderson
@ 2024-06-25 12:41 ` Peter Maydell
2024-06-25 14:18 ` Richard Henderson
0 siblings, 1 reply; 29+ messages in thread
From: Peter Maydell @ 2024-06-25 12:41 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel, qemu-arm
On Tue, 25 Jun 2024 at 06:09, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> The last insns in this block, MLA and MLS, were converted
> with f80701cb44d, and this code should have been removed then.
"MLA, MLS, SQDMULH, SQRDMULH, were converted with f80701cb44d
and f80701cb44d33", I think, since there's still code for
all four of those insns that we're deleting here ?
> - switch (16 * u + opcode) {
> - case 0x10: /* MLA */
> - case 0x14: /* MLS */
> - {
> - static NeonGenTwoOpFn * const fns[2][2] = {
> - { gen_helper_neon_add_u16, gen_helper_neon_sub_u16 },
> - { tcg_gen_add_i32, tcg_gen_sub_i32 },
> - };
> - NeonGenTwoOpFn *genfn;
> - bool is_sub = opcode == 0x4;
> -
> - if (size == 1) {
> - gen_helper_neon_mul_u16(tcg_res, tcg_op, tcg_idx);
> - } else {
> - tcg_gen_mul_i32(tcg_res, tcg_op, tcg_idx);
> - }
> - if (opcode == 0x8) {
> - break;
> - }
> - read_vec_element_i32(s, tcg_op, rd, pass, MO_32);
> - genfn = fns[size - 1][is_sub];
> - genfn(tcg_res, tcg_op, tcg_res);
> - break;
> - }
> - case 0x0c: /* SQDMULH */
> - if (size == 1) {
> - gen_helper_neon_qdmulh_s16(tcg_res, tcg_env,
> - tcg_op, tcg_idx);
> - } else {
> - gen_helper_neon_qdmulh_s32(tcg_res, tcg_env,
> - tcg_op, tcg_idx);
> - }
> - break;
> - case 0x0d: /* SQRDMULH */
> - if (size == 1) {
> - gen_helper_neon_qrdmulh_s16(tcg_res, tcg_env,
> - tcg_op, tcg_idx);
> - } else {
> - gen_helper_neon_qrdmulh_s32(tcg_res, tcg_env,
> - tcg_op, tcg_idx);
> - }
> - break;
Otherwise
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
thanks
-- PMM
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 13/13] target/arm: Delete dead code from disas_simd_indexed
2024-06-25 12:41 ` Peter Maydell
@ 2024-06-25 14:18 ` Richard Henderson
2024-06-25 14:21 ` Peter Maydell
0 siblings, 1 reply; 29+ messages in thread
From: Richard Henderson @ 2024-06-25 14:18 UTC (permalink / raw)
To: Peter Maydell; +Cc: qemu-devel, qemu-arm
On 6/25/24 05:41, Peter Maydell wrote:
> On Tue, 25 Jun 2024 at 06:09, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>>
>> The last insns in this block, MLA and MLS, were converted
>> with f80701cb44d, and this code should have been removed then.
>
> "MLA, MLS, SQDMULH, SQRDMULH, were converted with f80701cb44d
> and f80701cb44d33", I think, since there's still code for
> all four of those insns that we're deleting here ?
Yes.
r~
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 13/13] target/arm: Delete dead code from disas_simd_indexed
2024-06-25 14:18 ` Richard Henderson
@ 2024-06-25 14:21 ` Peter Maydell
0 siblings, 0 replies; 29+ messages in thread
From: Peter Maydell @ 2024-06-25 14:21 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel, qemu-arm
On Tue, 25 Jun 2024 at 15:18, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On 6/25/24 05:41, Peter Maydell wrote:
> > On Tue, 25 Jun 2024 at 06:09, Richard Henderson
> > <richard.henderson@linaro.org> wrote:
> >>
> >> The last insns in this block, MLA and MLS, were converted
> >> with f80701cb44d, and this code should have been removed then.
> >
> > "MLA, MLS, SQDMULH, SQRDMULH, were converted with f80701cb44d
> > and f80701cb44d33", I think, since there's still code for
> > all four of those insns that we're deleting here ?
>
> Yes.
...except "with 8db93dcd3def0ca and f80701cb44d".
-- PMM
^ permalink raw reply [flat|nested] 29+ messages in thread