* [PATCH v2 01/13] target/arm: Fix VCMLA Dd, Dn, Dm[idx]
2024-06-25 18:35 [PATCH v2 00/13] target/arm: AdvSIMD conversion, part 2 Richard Henderson
@ 2024-06-25 18:35 ` Richard Henderson
2024-06-25 18:35 ` [PATCH v2 02/13] target/arm: Fix SQDMULH (by element) with Q=0 Richard Henderson
` (12 subsequent siblings)
13 siblings, 0 replies; 17+ messages in thread
From: Richard Henderson @ 2024-06-25 18:35 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm, qemu-stable, Peter Maydell
The inner loop, bounded by eltspersegment, must not be
larger than the outer loop, bounded by elements.
Cc: qemu-stable@nongnu.org
Fixes: 18fc2405781 ("target/arm: Implement SVE fp complex multiply add (indexed)")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2376
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/tcg/vec_helper.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
index b05922b425..7b34cc98af 100644
--- a/target/arm/tcg/vec_helper.c
+++ b/target/arm/tcg/vec_helper.c
@@ -907,7 +907,7 @@ void HELPER(gvec_fcmlah_idx)(void *vd, void *vn, void *vm, void *va,
intptr_t index = extract32(desc, SIMD_DATA_SHIFT + 2, 2);
uint32_t neg_real = flip ^ neg_imag;
intptr_t elements = opr_sz / sizeof(float16);
- intptr_t eltspersegment = 16 / sizeof(float16);
+ intptr_t eltspersegment = MIN(16 / sizeof(float16), elements);
intptr_t i, j;
/* Shift boolean to the sign bit so we can xor to negate. */
@@ -969,7 +969,7 @@ void HELPER(gvec_fcmlas_idx)(void *vd, void *vn, void *vm, void *va,
intptr_t index = extract32(desc, SIMD_DATA_SHIFT + 2, 2);
uint32_t neg_real = flip ^ neg_imag;
intptr_t elements = opr_sz / sizeof(float32);
- intptr_t eltspersegment = 16 / sizeof(float32);
+ intptr_t eltspersegment = MIN(16 / sizeof(float32), elements);
intptr_t i, j;
/* Shift boolean to the sign bit so we can xor to negate. */
--
2.34.1
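
To illustrate the bound described in the commit message, here is a minimal sketch of the loop shape. It is not the QEMU helper: elements_visited() and min_iptr() are hypothetical names, and the function only counts how many elements the nested loops would touch. It assumes a 16-byte segment, as in the diff above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: smaller of two intptr_t values. */
static intptr_t min_iptr(intptr_t a, intptr_t b)
{
    return a < b ? a : b;
}

/*
 * Hypothetical stand-in for an indexed vector helper.  It counts the
 * elements visited for an operation of opr_sz bytes with elt_size-byte
 * elements.  With clamp == 0 the inner bound is the unclamped
 * 16 / elt_size; with clamp != 0 it is limited to the element count.
 */
static intptr_t elements_visited(intptr_t opr_sz, intptr_t elt_size, int clamp)
{
    intptr_t elements = opr_sz / elt_size;
    intptr_t per_segment = 16 / elt_size;
    intptr_t eltspersegment = clamp ? min_iptr(per_segment, elements)
                                    : per_segment;
    intptr_t i, j, visited = 0;

    for (i = 0; i < elements; i += per_segment) {
        for (j = 0; j < eltspersegment; ++j) {
            visited++;      /* a real helper would access d[i + j] here */
        }
    }
    return visited;
}
```

For an 8-byte (D register) operation on 2-byte elements, elements_visited(8, 2, 0) walks 8 elements although only 4 exist, while elements_visited(8, 2, 1) walks 4. For a full 16-byte operation the two bounds agree.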
* [PATCH v2 02/13] target/arm: Fix SQDMULH (by element) with Q=0
From: Richard Henderson @ 2024-06-25 18:35 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm, qemu-stable, Peter Maydell
The inner loop, bounded by eltspersegment, must not be
larger than the outer loop, bounded by elements.
Cc: qemu-stable@nongnu.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/tcg/vec_helper.c | 24 ++++++++++++++++--------
1 file changed, 16 insertions(+), 8 deletions(-)
diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
index 7b34cc98af..d477479bb1 100644
--- a/target/arm/tcg/vec_helper.c
+++ b/target/arm/tcg/vec_helper.c
@@ -317,10 +317,12 @@ void HELPER(neon_sqdmulh_idx_h)(void *vd, void *vn, void *vm,
intptr_t i, j, opr_sz = simd_oprsz(desc);
int idx = simd_data(desc);
int16_t *d = vd, *n = vn, *m = (int16_t *)vm + H2(idx);
+ intptr_t elements = opr_sz / 2;
+ intptr_t eltspersegment = MIN(16 / 2, elements);
- for (i = 0; i < opr_sz / 2; i += 16 / 2) {
+ for (i = 0; i < elements; i += 16 / 2) {
int16_t mm = m[i];
- for (j = 0; j < 16 / 2; ++j) {
+ for (j = 0; j < eltspersegment; ++j) {
d[i + j] = do_sqrdmlah_h(n[i + j], mm, 0, false, false, vq);
}
}
@@ -333,10 +335,12 @@ void HELPER(neon_sqrdmulh_idx_h)(void *vd, void *vn, void *vm,
intptr_t i, j, opr_sz = simd_oprsz(desc);
int idx = simd_data(desc);
int16_t *d = vd, *n = vn, *m = (int16_t *)vm + H2(idx);
+ intptr_t elements = opr_sz / 2;
+ intptr_t eltspersegment = MIN(16 / 2, elements);
- for (i = 0; i < opr_sz / 2; i += 16 / 2) {
+ for (i = 0; i < elements; i += 16 / 2) {
int16_t mm = m[i];
- for (j = 0; j < 16 / 2; ++j) {
+ for (j = 0; j < eltspersegment; ++j) {
d[i + j] = do_sqrdmlah_h(n[i + j], mm, 0, false, true, vq);
}
}
@@ -512,10 +516,12 @@ void HELPER(neon_sqdmulh_idx_s)(void *vd, void *vn, void *vm,
intptr_t i, j, opr_sz = simd_oprsz(desc);
int idx = simd_data(desc);
int32_t *d = vd, *n = vn, *m = (int32_t *)vm + H4(idx);
+ intptr_t elements = opr_sz / 4;
+ intptr_t eltspersegment = MIN(16 / 4, elements);
- for (i = 0; i < opr_sz / 4; i += 16 / 4) {
+ for (i = 0; i < elements; i += 16 / 4) {
int32_t mm = m[i];
- for (j = 0; j < 16 / 4; ++j) {
+ for (j = 0; j < eltspersegment; ++j) {
d[i + j] = do_sqrdmlah_s(n[i + j], mm, 0, false, false, vq);
}
}
@@ -528,10 +534,12 @@ void HELPER(neon_sqrdmulh_idx_s)(void *vd, void *vn, void *vm,
intptr_t i, j, opr_sz = simd_oprsz(desc);
int idx = simd_data(desc);
int32_t *d = vd, *n = vn, *m = (int32_t *)vm + H4(idx);
+ intptr_t elements = opr_sz / 4;
+ intptr_t eltspersegment = MIN(16 / 4, elements);
- for (i = 0; i < opr_sz / 4; i += 16 / 4) {
+ for (i = 0; i < elements; i += 16 / 4) {
int32_t mm = m[i];
- for (j = 0; j < 16 / 4; ++j) {
+ for (j = 0; j < eltspersegment; ++j) {
d[i + j] = do_sqrdmlah_s(n[i + j], mm, 0, false, true, vq);
}
}
--
2.34.1
* Re: [PATCH v2 02/13] target/arm: Fix SQDMULH (by element) with Q=0
From: Michael Tokarev @ 2024-07-02 6:48 UTC (permalink / raw)
To: Richard Henderson, qemu-devel; +Cc: qemu-arm, qemu-stable, Peter Maydell
25.06.2024 21:35, Richard Henderson wrote:
> The inner loop, bounded by eltspersegment, must not be
> larger than the outer loop, bounded by elements.
>
> Cc: qemu-stable@nongnu.org
> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> target/arm/tcg/vec_helper.c | 24 ++++++++++++++++--------
> 1 file changed, 16 insertions(+), 8 deletions(-)
If my understanding is correct, this one
Fixes: f80701cb44d3 ("target/arm: Convert SQDMULH, SQRDMULH to decodetree")
and before this commit, there was no issue.
Is my understanding correct?
Thanks,
/mjt
* Re: [PATCH v2 02/13] target/arm: Fix SQDMULH (by element) with Q=0
From: Richard Henderson @ 2024-07-02 14:37 UTC (permalink / raw)
To: Michael Tokarev, qemu-devel; +Cc: qemu-arm, qemu-stable, Peter Maydell
On 7/1/24 23:48, Michael Tokarev wrote:
> 25.06.2024 21:35, Richard Henderson wrote:
>> The inner loop, bounded by eltspersegment, must not be
>> larger than the outer loop, bounded by elements.
>>
>> Cc: qemu-stable@nongnu.org
>> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>> target/arm/tcg/vec_helper.c | 24 ++++++++++++++++--------
>> 1 file changed, 16 insertions(+), 8 deletions(-)
>
> If my understanding is correct, this one
>
> Fixes: f80701cb44d3 ("target/arm: Convert SQDMULH, SQRDMULH to decodetree")
>
> and before this commit, there was no issue.
>
> Is my understanding correct?
Yes. So, not as old a bug as I thought.
r~
* [PATCH v2 03/13] target/arm: Fix FJCVTZS vs flush-to-zero
From: Richard Henderson @ 2024-06-25 18:35 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm, qemu-stable, Peter Maydell
With flush-to-zero enabled, an input denormal must cause the
JavaScript inexact bit (output to Z) to be set.
Cc: qemu-stable@nongnu.org
Fixes: 6c1f6f2733a ("target/arm: Implement ARMv8.3-JSConv")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2375
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/vfp_helper.c | 18 +++++++++---------
tests/tcg/aarch64/test-2375.c | 21 +++++++++++++++++++++
tests/tcg/aarch64/Makefile.target | 3 ++-
3 files changed, 32 insertions(+), 10 deletions(-)
create mode 100644 tests/tcg/aarch64/test-2375.c
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index ce26b8a71a..50d7042fa9 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -1091,8 +1091,8 @@ const FloatRoundMode arm_rmode_to_sf_map[] = {
uint64_t HELPER(fjcvtzs)(float64 value, void *vstatus)
{
float_status *status = vstatus;
- uint32_t inexact, frac;
- uint32_t e_old, e_new;
+ uint32_t frac, e_old, e_new;
+ bool inexact;
e_old = get_float_exception_flags(status);
set_float_exception_flags(0, status);
@@ -1100,13 +1100,13 @@ uint64_t HELPER(fjcvtzs)(float64 value, void *vstatus)
e_new = get_float_exception_flags(status);
set_float_exception_flags(e_old | e_new, status);
- if (value == float64_chs(float64_zero)) {
- /* While not inexact for IEEE FP, -0.0 is inexact for JavaScript. */
- inexact = 1;
- } else {
- /* Normal inexact or overflow or NaN */
- inexact = e_new & (float_flag_inexact | float_flag_invalid);
- }
+ /* Normal inexact, denormal with flush-to-zero, or overflow or NaN */
+ inexact = e_new & (float_flag_inexact |
+ float_flag_input_denormal |
+ float_flag_invalid);
+
+ /* While not inexact for IEEE FP, -0.0 is inexact for JavaScript. */
+ inexact |= value == float64_chs(float64_zero);
/* Pack the result and the env->ZF representation of Z together. */
return deposit64(frac, 32, 32, inexact);
diff --git a/tests/tcg/aarch64/test-2375.c b/tests/tcg/aarch64/test-2375.c
new file mode 100644
index 0000000000..163eba422b
--- /dev/null
+++ b/tests/tcg/aarch64/test-2375.c
@@ -0,0 +1,21 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/* Copyright (c) 2024 Linaro Ltd */
+/* See https://gitlab.com/qemu-project/qemu/-/issues/2375 */
+
+#include <assert.h>
+
+int main(void)
+{
+ int r, z;
+
+ asm("msr fpcr, %2\n\t"
+ "fjcvtzs %w0, %d3\n\t"
+ "cset %1, eq"
+ : "=r"(r), "=r"(z)
+ : "r"(0x01000000L), /* FZ = 1 */
+ "w"(0xfcff00L)); /* denormal */
+
+ assert(r == 0);
+ assert(z == 0);
+ return 0;
+}
diff --git a/tests/tcg/aarch64/Makefile.target b/tests/tcg/aarch64/Makefile.target
index 70d728ae9a..4ecbca6a41 100644
--- a/tests/tcg/aarch64/Makefile.target
+++ b/tests/tcg/aarch64/Makefile.target
@@ -41,8 +41,9 @@ endif
# Pauth Tests
ifneq ($(CROSS_CC_HAS_ARMV8_3),)
-AARCH64_TESTS += pauth-1 pauth-2 pauth-4 pauth-5
+AARCH64_TESTS += pauth-1 pauth-2 pauth-4 pauth-5 test-2375
pauth-%: CFLAGS += -march=armv8.3-a
+test-2375: CFLAGS += -march=armv8.3-a
run-pauth-1: QEMU_OPTS += -cpu max
run-pauth-2: QEMU_OPTS += -cpu max
# Choose a cpu with FEAT_Pauth but without FEAT_FPAC for pauth-[45].
--
2.34.1
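
As an illustration of the flag computation in the vfp_helper.c hunk above, here is a minimal standalone sketch. The SKETCH_FLAG_* values and the function names are hypothetical; QEMU's softfloat defines its own flag constants, and the real helper uses deposit64() for the packing step.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits, chosen only for this sketch. */
enum {
    SKETCH_FLAG_INVALID        = 1u << 0,
    SKETCH_FLAG_INEXACT        = 1u << 1,
    SKETCH_FLAG_INPUT_DENORMAL = 1u << 2,
};

/*
 * Decide whether a conversion counts as inexact for JavaScript:
 * any of inexact, flushed input denormal, or invalid (overflow/NaN),
 * plus the special case of a -0.0 input.
 */
static bool js_inexact(uint32_t e_new, bool input_is_negative_zero)
{
    bool inexact = (e_new & (SKETCH_FLAG_INEXACT |
                             SKETCH_FLAG_INPUT_DENORMAL |
                             SKETCH_FLAG_INVALID)) != 0;

    inexact |= input_is_negative_zero;
    return inexact;
}

/* Pack the 32-bit result in the low half and the inexact bit above it. */
static uint64_t pack_result(uint32_t frac, bool inexact)
{
    return ((uint64_t)inexact << 32) | frac;
}
```

With only the input-denormal flag raised, js_inexact() returns true, which corresponds to the z == 0 assertion in test-2375.c: the Z flag is clear when the conversion is not exact.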
* [PATCH v2 04/13] target/arm: Convert SQRDMLAH, SQRDMLSH to decodetree
From: Richard Henderson @ 2024-06-25 18:35 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm, Peter Maydell
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/helper.h | 10 ++
target/arm/tcg/a64.decode | 16 +++
target/arm/tcg/translate-a64.c | 206 +++++++++++++--------------------
target/arm/tcg/vec_helper.c | 72 ++++++++++++
4 files changed, 180 insertions(+), 124 deletions(-)
diff --git a/target/arm/helper.h b/target/arm/helper.h
index eca2043fc2..970d059dec 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -979,6 +979,16 @@ DEF_HELPER_FLAGS_5(neon_sqrdmulh_idx_h, TCG_CALL_NO_RWG,
DEF_HELPER_FLAGS_5(neon_sqrdmulh_idx_s, TCG_CALL_NO_RWG,
void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(neon_sqrdmlah_idx_h, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(neon_sqrdmlah_idx_s, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(neon_sqrdmlsh_idx_h, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(neon_sqrdmlsh_idx_s, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+
DEF_HELPER_FLAGS_4(sve2_sqdmulh_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(sve2_sqdmulh_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(sve2_sqdmulh_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index 2b7a3254a0..613cc9365c 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -781,6 +781,8 @@ CMEQ_s 0111 1110 111 ..... 10001 1 ..... ..... @rrr_d
SQDMULH_s 0101 1110 ..1 ..... 10110 1 ..... ..... @rrr_e
SQRDMULH_s 0111 1110 ..1 ..... 10110 1 ..... ..... @rrr_e
+SQRDMLAH_s 0111 1110 ..0 ..... 10000 1 ..... ..... @rrr_e
+SQRDMLSH_s 0111 1110 ..0 ..... 10001 1 ..... ..... @rrr_e
### Advanced SIMD scalar pairwise
@@ -941,6 +943,8 @@ MLS_v 0.10 1110 ..1 ..... 10010 1 ..... ..... @qrrr_e
SQDMULH_v 0.00 1110 ..1 ..... 10110 1 ..... ..... @qrrr_e
SQRDMULH_v 0.10 1110 ..1 ..... 10110 1 ..... ..... @qrrr_e
+SQRDMLAH_v 0.10 1110 ..0 ..... 10000 1 ..... ..... @qrrr_e
+SQRDMLSH_v 0.10 1110 ..0 ..... 10001 1 ..... ..... @qrrr_e
### Advanced SIMD scalar x indexed element
@@ -966,6 +970,12 @@ SQDMULH_si 0101 1111 10 .. .... 1100 . 0 ..... ..... @rrx_s
SQRDMULH_si 0101 1111 01 .. .... 1101 . 0 ..... ..... @rrx_h
SQRDMULH_si 0101 1111 10 . ..... 1101 . 0 ..... ..... @rrx_s
+SQRDMLAH_si 0111 1111 01 .. .... 1101 . 0 ..... ..... @rrx_h
+SQRDMLAH_si 0111 1111 10 .. .... 1101 . 0 ..... ..... @rrx_s
+
+SQRDMLSH_si 0111 1111 01 .. .... 1111 . 0 ..... ..... @rrx_h
+SQRDMLSH_si 0111 1111 10 .. .... 1111 . 0 ..... ..... @rrx_s
+
### Advanced SIMD vector x indexed element
FMUL_vi 0.00 1111 00 .. .... 1001 . 0 ..... ..... @qrrx_h
@@ -1004,6 +1014,12 @@ SQDMULH_vi 0.00 1111 10 . ..... 1100 . 0 ..... ..... @qrrx_s
SQRDMULH_vi 0.00 1111 01 .. .... 1101 . 0 ..... ..... @qrrx_h
SQRDMULH_vi 0.00 1111 10 . ..... 1101 . 0 ..... ..... @qrrx_s
+SQRDMLAH_vi 0.10 1111 01 .. .... 1101 . 0 ..... ..... @qrrx_h
+SQRDMLAH_vi 0.10 1111 10 .. .... 1101 . 0 ..... ..... @qrrx_s
+
+SQRDMLSH_vi 0.10 1111 01 .. .... 1111 . 0 ..... ..... @qrrx_h
+SQRDMLSH_vi 0.10 1111 10 .. .... 1111 . 0 ..... ..... @qrrx_s
+
# Floating-point conditional select
FCSEL 0001 1110 .. 1 rm:5 cond:4 11 rn:5 rd:5 esz=%esz_hsd
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 93543da39c..32c24c7422 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -5235,6 +5235,43 @@ static const ENVScalar2 f_scalar_sqrdmulh = {
};
TRANS(SQRDMULH_s, do_env_scalar2_hs, a, &f_scalar_sqrdmulh)
+typedef struct ENVScalar3 {
+ NeonGenThreeOpEnvFn *gen_hs[2];
+} ENVScalar3;
+
+static bool do_env_scalar3_hs(DisasContext *s, arg_rrr_e *a,
+ const ENVScalar3 *f)
+{
+ TCGv_i32 t0, t1, t2;
+
+ if (a->esz != MO_16 && a->esz != MO_32) {
+ return false;
+ }
+ if (!fp_access_check(s)) {
+ return true;
+ }
+
+ t0 = tcg_temp_new_i32();
+ t1 = tcg_temp_new_i32();
+ t2 = tcg_temp_new_i32();
+ read_vec_element_i32(s, t0, a->rn, 0, a->esz);
+ read_vec_element_i32(s, t1, a->rm, 0, a->esz);
+ read_vec_element_i32(s, t2, a->rd, 0, a->esz);
+ f->gen_hs[a->esz - 1](t0, tcg_env, t0, t1, t2);
+ write_fp_sreg(s, a->rd, t0);
+ return true;
+}
+
+static const ENVScalar3 f_scalar_sqrdmlah = {
+ { gen_helper_neon_qrdmlah_s16, gen_helper_neon_qrdmlah_s32 }
+};
+TRANS_FEAT(SQRDMLAH_s, aa64_rdm, do_env_scalar3_hs, a, &f_scalar_sqrdmlah)
+
+static const ENVScalar3 f_scalar_sqrdmlsh = {
+ { gen_helper_neon_qrdmlsh_s16, gen_helper_neon_qrdmlsh_s32 }
+};
+TRANS_FEAT(SQRDMLSH_s, aa64_rdm, do_env_scalar3_hs, a, &f_scalar_sqrdmlsh)
+
static bool do_cmop_d(DisasContext *s, arg_rrr_e *a, TCGCond cond)
{
if (fp_access_check(s)) {
@@ -5552,6 +5589,8 @@ TRANS(CMTST_v, do_gvec_fn3, a, gen_gvec_cmtst)
TRANS(SQDMULH_v, do_gvec_fn3_no8_no64, a, gen_gvec_sqdmulh_qc)
TRANS(SQRDMULH_v, do_gvec_fn3_no8_no64, a, gen_gvec_sqrdmulh_qc)
+TRANS_FEAT(SQRDMLAH_v, aa64_rdm, do_gvec_fn3_no8_no64, a, gen_gvec_sqrdmlah_qc)
+TRANS_FEAT(SQRDMLSH_v, aa64_rdm, do_gvec_fn3_no8_no64, a, gen_gvec_sqrdmlsh_qc)
/*
* Advanced SIMD scalar/vector x indexed element
@@ -5681,6 +5720,29 @@ static bool do_env_scalar2_idx_hs(DisasContext *s, arg_rrx_e *a,
TRANS(SQDMULH_si, do_env_scalar2_idx_hs, a, &f_scalar_sqdmulh)
TRANS(SQRDMULH_si, do_env_scalar2_idx_hs, a, &f_scalar_sqrdmulh)
+static bool do_env_scalar3_idx_hs(DisasContext *s, arg_rrx_e *a,
+ const ENVScalar3 *f)
+{
+ if (a->esz < MO_16 || a->esz > MO_32) {
+ return false;
+ }
+ if (fp_access_check(s)) {
+ TCGv_i32 t0 = tcg_temp_new_i32();
+ TCGv_i32 t1 = tcg_temp_new_i32();
+ TCGv_i32 t2 = tcg_temp_new_i32();
+
+ read_vec_element_i32(s, t0, a->rn, 0, a->esz);
+ read_vec_element_i32(s, t1, a->rm, a->idx, a->esz);
+ read_vec_element_i32(s, t2, a->rd, 0, a->esz);
+ f->gen_hs[a->esz - 1](t0, tcg_env, t0, t1, t2);
+ write_fp_sreg(s, a->rd, t0);
+ }
+ return true;
+}
+
+TRANS_FEAT(SQRDMLAH_si, aa64_rdm, do_env_scalar3_idx_hs, a, &f_scalar_sqrdmlah)
+TRANS_FEAT(SQRDMLSH_si, aa64_rdm, do_env_scalar3_idx_hs, a, &f_scalar_sqrdmlsh)
+
static bool do_fp3_vector_idx(DisasContext *s, arg_qrrx_e *a,
gen_helper_gvec_3_ptr * const fns[3])
{
@@ -5838,6 +5900,20 @@ static gen_helper_gvec_4 * const f_vector_idx_sqrdmulh[2] = {
};
TRANS(SQRDMULH_vi, do_int3_qc_vector_idx, a, f_vector_idx_sqrdmulh)
+static gen_helper_gvec_4 * const f_vector_idx_sqrdmlah[2] = {
+ gen_helper_neon_sqrdmlah_idx_h,
+ gen_helper_neon_sqrdmlah_idx_s,
+};
+TRANS_FEAT(SQRDMLAH_vi, aa64_rdm, do_int3_qc_vector_idx, a,
+ f_vector_idx_sqrdmlah)
+
+static gen_helper_gvec_4 * const f_vector_idx_sqrdmlsh[2] = {
+ gen_helper_neon_sqrdmlsh_idx_h,
+ gen_helper_neon_sqrdmlsh_idx_s,
+};
+TRANS_FEAT(SQRDMLSH_vi, aa64_rdm, do_int3_qc_vector_idx, a,
+ f_vector_idx_sqrdmlsh)
+
/*
* Advanced SIMD scalar pairwise
*/
@@ -9536,84 +9612,6 @@ static void disas_simd_scalar_three_reg_diff(DisasContext *s, uint32_t insn)
}
}
-/* AdvSIMD scalar three same extra
- * 31 30 29 28 24 23 22 21 20 16 15 14 11 10 9 5 4 0
- * +-----+---+-----------+------+---+------+---+--------+---+----+----+
- * | 0 1 | U | 1 1 1 1 0 | size | 0 | Rm | 1 | opcode | 1 | Rn | Rd |
- * +-----+---+-----------+------+---+------+---+--------+---+----+----+
- */
-static void disas_simd_scalar_three_reg_same_extra(DisasContext *s,
- uint32_t insn)
-{
- int rd = extract32(insn, 0, 5);
- int rn = extract32(insn, 5, 5);
- int opcode = extract32(insn, 11, 4);
- int rm = extract32(insn, 16, 5);
- int size = extract32(insn, 22, 2);
- bool u = extract32(insn, 29, 1);
- TCGv_i32 ele1, ele2, ele3;
- TCGv_i64 res;
- bool feature;
-
- switch (u * 16 + opcode) {
- case 0x10: /* SQRDMLAH (vector) */
- case 0x11: /* SQRDMLSH (vector) */
- if (size != 1 && size != 2) {
- unallocated_encoding(s);
- return;
- }
- feature = dc_isar_feature(aa64_rdm, s);
- break;
- default:
- unallocated_encoding(s);
- return;
- }
- if (!feature) {
- unallocated_encoding(s);
- return;
- }
- if (!fp_access_check(s)) {
- return;
- }
-
- /* Do a single operation on the lowest element in the vector.
- * We use the standard Neon helpers and rely on 0 OP 0 == 0
- * with no side effects for all these operations.
- * OPTME: special-purpose helpers would avoid doing some
- * unnecessary work in the helper for the 16 bit cases.
- */
- ele1 = tcg_temp_new_i32();
- ele2 = tcg_temp_new_i32();
- ele3 = tcg_temp_new_i32();
-
- read_vec_element_i32(s, ele1, rn, 0, size);
- read_vec_element_i32(s, ele2, rm, 0, size);
- read_vec_element_i32(s, ele3, rd, 0, size);
-
- switch (opcode) {
- case 0x0: /* SQRDMLAH */
- if (size == 1) {
- gen_helper_neon_qrdmlah_s16(ele3, tcg_env, ele1, ele2, ele3);
- } else {
- gen_helper_neon_qrdmlah_s32(ele3, tcg_env, ele1, ele2, ele3);
- }
- break;
- case 0x1: /* SQRDMLSH */
- if (size == 1) {
- gen_helper_neon_qrdmlsh_s16(ele3, tcg_env, ele1, ele2, ele3);
- } else {
- gen_helper_neon_qrdmlsh_s32(ele3, tcg_env, ele1, ele2, ele3);
- }
- break;
- default:
- g_assert_not_reached();
- }
-
- res = tcg_temp_new_i64();
- tcg_gen_extu_i32_i64(res, ele3);
- write_fp_dreg(s, rd, res);
-}
-
static void handle_2misc_64(DisasContext *s, int opcode, bool u,
TCGv_i64 tcg_rd, TCGv_i64 tcg_rn,
TCGv_i32 tcg_rmode, TCGv_ptr tcg_fpstatus)
@@ -10892,14 +10890,6 @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
int rot;
switch (u * 16 + opcode) {
- case 0x10: /* SQRDMLAH (vector) */
- case 0x11: /* SQRDMLSH (vector) */
- if (size != 1 && size != 2) {
- unallocated_encoding(s);
- return;
- }
- feature = dc_isar_feature(aa64_rdm, s);
- break;
case 0x02: /* SDOT (vector) */
case 0x12: /* UDOT (vector) */
if (size != MO_32) {
@@ -10957,6 +10947,8 @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
}
break;
default:
+ case 0x10: /* SQRDMLAH (vector) */
+ case 0x11: /* SQRDMLSH (vector) */
unallocated_encoding(s);
return;
}
@@ -10969,14 +10961,6 @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
}
switch (opcode) {
- case 0x0: /* SQRDMLAH (vector) */
- gen_gvec_fn3(s, is_q, rd, rn, rm, gen_gvec_sqrdmlah_qc, size);
- return;
-
- case 0x1: /* SQRDMLSH (vector) */
- gen_gvec_fn3(s, is_q, rd, rn, rm, gen_gvec_sqrdmlsh_qc, size);
- return;
-
case 0x2: /* SDOT / UDOT */
gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, 0,
u ? gen_helper_gvec_udot_b : gen_helper_gvec_sdot_b);
@@ -12059,13 +12043,6 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
case 0x0b: /* SQDMULL, SQDMULL2 */
is_long = true;
break;
- case 0x1d: /* SQRDMLAH */
- case 0x1f: /* SQRDMLSH */
- if (!dc_isar_feature(aa64_rdm, s)) {
- unallocated_encoding(s);
- return;
- }
- break;
case 0x0e: /* SDOT */
case 0x1e: /* UDOT */
if (is_scalar || size != MO_32 || !dc_isar_feature(aa64_dp, s)) {
@@ -12127,6 +12104,8 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
case 0x18: /* FMLAL2 */
case 0x19: /* FMULX */
case 0x1c: /* FMLSL2 */
+ case 0x1d: /* SQRDMLAH */
+ case 0x1f: /* SQRDMLSH */
unallocated_encoding(s);
return;
}
@@ -12320,33 +12299,13 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
tcg_op, tcg_idx);
}
break;
- case 0x1d: /* SQRDMLAH */
- read_vec_element_i32(s, tcg_res, rd, pass,
- is_scalar ? size : MO_32);
- if (size == 1) {
- gen_helper_neon_qrdmlah_s16(tcg_res, tcg_env,
- tcg_op, tcg_idx, tcg_res);
- } else {
- gen_helper_neon_qrdmlah_s32(tcg_res, tcg_env,
- tcg_op, tcg_idx, tcg_res);
- }
- break;
- case 0x1f: /* SQRDMLSH */
- read_vec_element_i32(s, tcg_res, rd, pass,
- is_scalar ? size : MO_32);
- if (size == 1) {
- gen_helper_neon_qrdmlsh_s16(tcg_res, tcg_env,
- tcg_op, tcg_idx, tcg_res);
- } else {
- gen_helper_neon_qrdmlsh_s32(tcg_res, tcg_env,
- tcg_op, tcg_idx, tcg_res);
- }
- break;
default:
case 0x01: /* FMLA */
case 0x05: /* FMLS */
case 0x09: /* FMUL */
case 0x19: /* FMULX */
+ case 0x1d: /* SQRDMLAH */
+ case 0x1f: /* SQRDMLSH */
g_assert_not_reached();
}
@@ -12538,7 +12497,6 @@ static const AArch64DecodeTable data_proc_simd[] = {
{ 0x0e000000, 0xbf208c00, disas_simd_tb },
{ 0x0e000800, 0xbf208c00, disas_simd_zip_trn },
{ 0x2e000000, 0xbf208400, disas_simd_ext },
- { 0x5e008400, 0xdf208400, disas_simd_scalar_three_reg_same_extra },
{ 0x5e200000, 0xdf200c00, disas_simd_scalar_three_reg_diff },
{ 0x5e200800, 0xdf3e0c00, disas_simd_scalar_two_reg_misc },
{ 0x5f000000, 0xdf000400, disas_simd_indexed }, /* scalar indexed */
diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
index d477479bb1..98604d170f 100644
--- a/target/arm/tcg/vec_helper.c
+++ b/target/arm/tcg/vec_helper.c
@@ -347,6 +347,42 @@ void HELPER(neon_sqrdmulh_idx_h)(void *vd, void *vn, void *vm,
clear_tail(d, opr_sz, simd_maxsz(desc));
}
+void HELPER(neon_sqrdmlah_idx_h)(void *vd, void *vn, void *vm,
+ void *vq, uint32_t desc)
+{
+ intptr_t i, j, opr_sz = simd_oprsz(desc);
+ int idx = simd_data(desc);
+ int16_t *d = vd, *n = vn, *m = (int16_t *)vm + H2(idx);
+ intptr_t elements = opr_sz / 2;
+ intptr_t eltspersegment = MIN(16 / 2, elements);
+
+ for (i = 0; i < elements; i += 16 / 2) {
+ int16_t mm = m[i];
+ for (j = 0; j < eltspersegment; ++j) {
+ d[i + j] = do_sqrdmlah_h(n[i + j], mm, d[i + j], false, true, vq);
+ }
+ }
+ clear_tail(d, opr_sz, simd_maxsz(desc));
+}
+
+void HELPER(neon_sqrdmlsh_idx_h)(void *vd, void *vn, void *vm,
+ void *vq, uint32_t desc)
+{
+ intptr_t i, j, opr_sz = simd_oprsz(desc);
+ int idx = simd_data(desc);
+ int16_t *d = vd, *n = vn, *m = (int16_t *)vm + H2(idx);
+ intptr_t elements = opr_sz / 2;
+ intptr_t eltspersegment = MIN(16 / 2, elements);
+
+ for (i = 0; i < elements; i += 16 / 2) {
+ int16_t mm = m[i];
+ for (j = 0; j < eltspersegment; ++j) {
+ d[i + j] = do_sqrdmlah_h(n[i + j], mm, d[i + j], true, true, vq);
+ }
+ }
+ clear_tail(d, opr_sz, simd_maxsz(desc));
+}
+
void HELPER(sve2_sqrdmlah_h)(void *vd, void *vn, void *vm,
void *va, uint32_t desc)
{
@@ -546,6 +582,42 @@ void HELPER(neon_sqrdmulh_idx_s)(void *vd, void *vn, void *vm,
clear_tail(d, opr_sz, simd_maxsz(desc));
}
+void HELPER(neon_sqrdmlah_idx_s)(void *vd, void *vn, void *vm,
+ void *vq, uint32_t desc)
+{
+ intptr_t i, j, opr_sz = simd_oprsz(desc);
+ int idx = simd_data(desc);
+ int32_t *d = vd, *n = vn, *m = (int32_t *)vm + H4(idx);
+ intptr_t elements = opr_sz / 4;
+ intptr_t eltspersegment = MIN(16 / 4, elements);
+
+ for (i = 0; i < elements; i += 16 / 4) {
+ int32_t mm = m[i];
+ for (j = 0; j < eltspersegment; ++j) {
+ d[i + j] = do_sqrdmlah_s(n[i + j], mm, d[i + j], false, true, vq);
+ }
+ }
+ clear_tail(d, opr_sz, simd_maxsz(desc));
+}
+
+void HELPER(neon_sqrdmlsh_idx_s)(void *vd, void *vn, void *vm,
+ void *vq, uint32_t desc)
+{
+ intptr_t i, j, opr_sz = simd_oprsz(desc);
+ int idx = simd_data(desc);
+ int32_t *d = vd, *n = vn, *m = (int32_t *)vm + H4(idx);
+ intptr_t elements = opr_sz / 4;
+ intptr_t eltspersegment = MIN(16 / 4, elements);
+
+ for (i = 0; i < elements; i += 16 / 4) {
+ int32_t mm = m[i];
+ for (j = 0; j < eltspersegment; ++j) {
+ d[i + j] = do_sqrdmlah_s(n[i + j], mm, d[i + j], true, true, vq);
+ }
+ }
+ clear_tail(d, opr_sz, simd_maxsz(desc));
+}
+
void HELPER(sve2_sqrdmlah_s)(void *vd, void *vn, void *vm,
void *va, uint32_t desc)
{
--
2.34.1
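
For readers unfamiliar with the arithmetic behind the new helpers, here is a rough sketch of a 16-bit saturating rounding doubling multiply-accumulate. It is not QEMU's do_sqrdmlah_h: sketch_sqrdmlah_h() is a hypothetical name, the neg/round parameter meanings are inferred from the call sites above (false, true for SQRDMLAH; true, true for SQRDMLSH), and the saturation flag is a plain bool rather than the vq pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical sketch: compute (a * 2^15 +/- n * m + rounding) / 2^15
 * with floor semantics, then saturate to int16_t.  *sat is set to true
 * on saturation and left untouched otherwise (sticky).
 */
static int16_t sketch_sqrdmlah_h(int16_t n, int16_t m, int16_t a,
                                 bool neg, bool round, bool *sat)
{
    int64_t prod = (int64_t)n * m;
    int64_t sum, ret;

    if (neg) {
        prod = -prod;                   /* the "subtract" (MLS) form */
    }
    sum = prod + (int64_t)a * 32768 + (round ? 16384 : 0);

    /* Floor division by 2^15, avoiding a right shift of a negative value. */
    ret = sum / 32768;
    if (sum % 32768 < 0) {
        ret--;
    }

    if (ret > INT16_MAX) {
        *sat = true;
        return INT16_MAX;
    }
    if (ret < INT16_MIN) {
        *sat = true;
        return INT16_MIN;
    }
    return (int16_t)ret;
}
```

The only product that saturates with a zero accumulator is INT16_MIN times INT16_MIN; a nonzero accumulator can push other inputs out of range as well.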
* [PATCH v2 05/13] target/arm: Convert SDOT, UDOT to decodetree
From: Richard Henderson @ 2024-06-25 18:35 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm, Peter Maydell
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/tcg/a64.decode | 7 +++++
target/arm/tcg/translate-a64.c | 54 ++++++++++++++++++----------------
2 files changed, 35 insertions(+), 26 deletions(-)
diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index 613cc9365c..7411d4ba97 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -61,6 +61,7 @@
@qrrr_b . q:1 ...... ... rm:5 ...... rn:5 rd:5 &qrrr_e esz=0
@qrrr_h . q:1 ...... ... rm:5 ...... rn:5 rd:5 &qrrr_e esz=1
+@qrrr_s . q:1 ...... ... rm:5 ...... rn:5 rd:5 &qrrr_e esz=2
@qrrr_sd . q:1 ...... ... rm:5 ...... rn:5 rd:5 &qrrr_e esz=%esz_sd
@qrrr_e . q:1 ...... esz:2 . rm:5 ...... rn:5 rd:5 &qrrr_e
@qr2r_e . q:1 ...... esz:2 . ..... ...... rm:5 rd:5 &qrrr_e rn=%rd
@@ -946,6 +947,9 @@ SQRDMULH_v 0.10 1110 ..1 ..... 10110 1 ..... ..... @qrrr_e
SQRDMLAH_v 0.10 1110 ..0 ..... 10000 1 ..... ..... @qrrr_e
SQRDMLSH_v 0.10 1110 ..0 ..... 10001 1 ..... ..... @qrrr_e
+SDOT_v 0.00 1110 100 ..... 10010 1 ..... ..... @qrrr_s
+UDOT_v 0.10 1110 100 ..... 10010 1 ..... ..... @qrrr_s
+
### Advanced SIMD scalar x indexed element
FMUL_si 0101 1111 00 .. .... 1001 . 0 ..... ..... @rrx_h
@@ -1020,6 +1024,9 @@ SQRDMLAH_vi 0.10 1111 10 .. .... 1101 . 0 ..... ..... @qrrx_s
SQRDMLSH_vi 0.10 1111 01 .. .... 1111 . 0 ..... ..... @qrrx_h
SQRDMLSH_vi 0.10 1111 10 .. .... 1111 . 0 ..... ..... @qrrx_s
+SDOT_vi 0.00 1111 10 .. .... 1110 . 0 ..... ..... @qrrx_s
+UDOT_vi 0.10 1111 10 .. .... 1110 . 0 ..... ..... @qrrx_s
+
# Floating-point conditional select
FCSEL 0001 1110 .. 1 rm:5 cond:4 11 rn:5 rd:5 esz=%esz_hsd
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 32c24c7422..f2e7d8d75c 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -5592,6 +5592,18 @@ TRANS(SQRDMULH_v, do_gvec_fn3_no8_no64, a, gen_gvec_sqrdmulh_qc)
TRANS_FEAT(SQRDMLAH_v, aa64_rdm, do_gvec_fn3_no8_no64, a, gen_gvec_sqrdmlah_qc)
TRANS_FEAT(SQRDMLSH_v, aa64_rdm, do_gvec_fn3_no8_no64, a, gen_gvec_sqrdmlsh_qc)
+static bool do_dot_vector(DisasContext *s, arg_qrrr_e *a,
+ gen_helper_gvec_4 *fn)
+{
+ if (fp_access_check(s)) {
+ gen_gvec_op4_ool(s, a->q, a->rd, a->rn, a->rm, a->rd, 0, fn);
+ }
+ return true;
+}
+
+TRANS_FEAT(SDOT_v, aa64_dp, do_dot_vector, a, gen_helper_gvec_sdot_b)
+TRANS_FEAT(UDOT_v, aa64_dp, do_dot_vector, a, gen_helper_gvec_udot_b)
+
/*
* Advanced SIMD scalar/vector x indexed element
*/
@@ -5914,6 +5926,18 @@ static gen_helper_gvec_4 * const f_vector_idx_sqrdmlsh[2] = {
TRANS_FEAT(SQRDMLSH_vi, aa64_rdm, do_int3_qc_vector_idx, a,
f_vector_idx_sqrdmlsh)
+static bool do_dot_vector_idx(DisasContext *s, arg_qrrx_e *a,
+ gen_helper_gvec_4 *fn)
+{
+ if (fp_access_check(s)) {
+ gen_gvec_op4_ool(s, a->q, a->rd, a->rn, a->rm, a->rd, a->idx, fn);
+ }
+ return true;
+}
+
+TRANS_FEAT(SDOT_vi, aa64_dp, do_dot_vector_idx, a, gen_helper_gvec_sdot_idx_b)
+TRANS_FEAT(UDOT_vi, aa64_dp, do_dot_vector_idx, a, gen_helper_gvec_udot_idx_b)
+
/*
* Advanced SIMD scalar pairwise
*/
@@ -10890,14 +10914,6 @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
int rot;
switch (u * 16 + opcode) {
- case 0x02: /* SDOT (vector) */
- case 0x12: /* UDOT (vector) */
- if (size != MO_32) {
- unallocated_encoding(s);
- return;
- }
- feature = dc_isar_feature(aa64_dp, s);
- break;
case 0x03: /* USDOT */
if (size != MO_32) {
unallocated_encoding(s);
@@ -10947,8 +10963,10 @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
}
break;
default:
+ case 0x02: /* SDOT (vector) */
case 0x10: /* SQRDMLAH (vector) */
case 0x11: /* SQRDMLSH (vector) */
+ case 0x12: /* UDOT (vector) */
unallocated_encoding(s);
return;
}
@@ -10961,11 +10979,6 @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
}
switch (opcode) {
- case 0x2: /* SDOT / UDOT */
- gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, 0,
- u ? gen_helper_gvec_udot_b : gen_helper_gvec_sdot_b);
- return;
-
case 0x3: /* USDOT */
gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, 0, gen_helper_gvec_usdot_b);
return;
@@ -12043,13 +12056,6 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
case 0x0b: /* SQDMULL, SQDMULL2 */
is_long = true;
break;
- case 0x0e: /* SDOT */
- case 0x1e: /* UDOT */
- if (is_scalar || size != MO_32 || !dc_isar_feature(aa64_dp, s)) {
- unallocated_encoding(s);
- return;
- }
- break;
case 0x0f:
switch (size) {
case 0: /* SUDOT */
@@ -12099,12 +12105,14 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
case 0x09: /* FMUL */
case 0x0c: /* SQDMULH */
case 0x0d: /* SQRDMULH */
+ case 0x0e: /* SDOT */
case 0x10: /* MLA */
case 0x14: /* MLS */
case 0x18: /* FMLAL2 */
case 0x19: /* FMULX */
case 0x1c: /* FMLSL2 */
case 0x1d: /* SQRDMLAH */
+ case 0x1e: /* UDOT */
case 0x1f: /* SQRDMLSH */
unallocated_encoding(s);
return;
@@ -12180,12 +12188,6 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
}
switch (16 * u + opcode) {
- case 0x0e: /* SDOT */
- case 0x1e: /* UDOT */
- gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, index,
- u ? gen_helper_gvec_udot_idx_b
- : gen_helper_gvec_sdot_idx_b);
- return;
case 0x0f:
switch (extract32(insn, 22, 2)) {
case 0: /* SUDOT */
--
2.34.1
* [PATCH v2 06/13] target/arm: Convert SUDOT, USDOT to decodetree
From: Richard Henderson @ 2024-06-25 18:35 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm, Peter Maydell
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
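Reader's note, not part of the commit: SUDOT_vi and USDOT_vi share the
indexed opcode 1111 and differ only in bits 23:22, which is what the old
"switch (size)" in disas_simd_indexed tested. A minimal sketch of that
distinction follows, with the mask and values derived by hand from the two
new a64.decode lines. DotIdxKind and classify_dot_idx are hypothetical
names used only here.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { DOT_IDX_NONE, DOT_IDX_SUDOT, DOT_IDX_USDOT } DotIdxKind;

/* Fixed bits of "0.00 1111 ss .. .... 1111 . 0 ..... .....", ss included. */
#define DOT_VI_MASK    0xbfc0f400u
#define SUDOT_VI_VALUE 0x0f00f000u  /* bits 23:22 = 00 */
#define USDOT_VI_VALUE 0x0f80f000u  /* bits 23:22 = 10 */

/* A pattern value may only set bits that the mask constrains. */
static_assert((SUDOT_VI_VALUE & ~DOT_VI_MASK) == 0, "SUDOT value outside mask");
static_assert((USDOT_VI_VALUE & ~DOT_VI_MASK) == 0, "USDOT value outside mask");

static DotIdxKind classify_dot_idx(uint32_t insn)
{
    switch (insn & DOT_VI_MASK) {
    case SUDOT_VI_VALUE:
        return DOT_IDX_SUDOT;
    case USDOT_VI_VALUE:
        return DOT_IDX_USDOT;
    default:
        /* Anything else is not matched by these two patterns. */
        return DOT_IDX_NONE;
    }
}
```

Bits 23:22 = 01 and 11 under the same opcode (BFDOT and BFMLAL in the old
switch) fall through to DOT_IDX_NONE here, mirroring how they are still
left to the legacy decoder at this point in the series.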
target/arm/tcg/a64.decode | 3 +++
target/arm/tcg/translate-a64.c | 35 ++++++++--------------------------
2 files changed, 11 insertions(+), 27 deletions(-)
diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index 7411d4ba97..8a0251f83c 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -949,6 +949,7 @@ SQRDMLSH_v 0.10 1110 ..0 ..... 10001 1 ..... ..... @qrrr_e
SDOT_v 0.00 1110 100 ..... 10010 1 ..... ..... @qrrr_s
UDOT_v 0.10 1110 100 ..... 10010 1 ..... ..... @qrrr_s
+USDOT_v 0.00 1110 100 ..... 10011 1 ..... ..... @qrrr_s
### Advanced SIMD scalar x indexed element
@@ -1026,6 +1027,8 @@ SQRDMLSH_vi 0.10 1111 10 .. .... 1111 . 0 ..... ..... @qrrx_s
SDOT_vi 0.00 1111 10 .. .... 1110 . 0 ..... ..... @qrrx_s
UDOT_vi 0.10 1111 10 .. .... 1110 . 0 ..... ..... @qrrx_s
+SUDOT_vi 0.00 1111 00 .. .... 1111 . 0 ..... ..... @qrrx_s
+USDOT_vi 0.00 1111 10 .. .... 1111 . 0 ..... ..... @qrrx_s
# Floating-point conditional select
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index f2e7d8d75c..9a658ca876 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -5603,6 +5603,7 @@ static bool do_dot_vector(DisasContext *s, arg_qrrr_e *a,
TRANS_FEAT(SDOT_v, aa64_dp, do_dot_vector, a, gen_helper_gvec_sdot_b)
TRANS_FEAT(UDOT_v, aa64_dp, do_dot_vector, a, gen_helper_gvec_udot_b)
+TRANS_FEAT(USDOT_v, aa64_i8mm, do_dot_vector, a, gen_helper_gvec_usdot_b)
/*
* Advanced SIMD scalar/vector x indexed element
@@ -5937,6 +5938,10 @@ static bool do_dot_vector_idx(DisasContext *s, arg_qrrx_e *a,
TRANS_FEAT(SDOT_vi, aa64_dp, do_dot_vector_idx, a, gen_helper_gvec_sdot_idx_b)
TRANS_FEAT(UDOT_vi, aa64_dp, do_dot_vector_idx, a, gen_helper_gvec_udot_idx_b)
+TRANS_FEAT(SUDOT_vi, aa64_i8mm, do_dot_vector_idx, a,
+ gen_helper_gvec_sudot_idx_b)
+TRANS_FEAT(USDOT_vi, aa64_i8mm, do_dot_vector_idx, a,
+ gen_helper_gvec_usdot_idx_b)
/*
* Advanced SIMD scalar pairwise
@@ -10914,13 +10919,6 @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
int rot;
switch (u * 16 + opcode) {
- case 0x03: /* USDOT */
- if (size != MO_32) {
- unallocated_encoding(s);
- return;
- }
- feature = dc_isar_feature(aa64_i8mm, s);
- break;
case 0x04: /* SMMLA */
case 0x14: /* UMMLA */
case 0x05: /* USMMLA */
@@ -10964,6 +10962,7 @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
break;
default:
case 0x02: /* SDOT (vector) */
+ case 0x03: /* USDOT */
case 0x10: /* SQRDMLAH (vector) */
case 0x11: /* SQRDMLSH (vector) */
case 0x12: /* UDOT (vector) */
@@ -10979,10 +10978,6 @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
}
switch (opcode) {
- case 0x3: /* USDOT */
- gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, 0, gen_helper_gvec_usdot_b);
- return;
-
case 0x04: /* SMMLA, UMMLA */
gen_gvec_op4_ool(s, 1, rd, rn, rm, rd, 0,
u ? gen_helper_gvec_ummla_b
@@ -12058,14 +12053,6 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
break;
case 0x0f:
switch (size) {
- case 0: /* SUDOT */
- case 2: /* USDOT */
- if (is_scalar || !dc_isar_feature(aa64_i8mm, s)) {
- unallocated_encoding(s);
- return;
- }
- size = MO_32;
- break;
case 1: /* BFDOT */
if (is_scalar || !dc_isar_feature(aa64_bf16, s)) {
unallocated_encoding(s);
@@ -12082,6 +12069,8 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
size = MO_16;
break;
default:
+ case 0: /* SUDOT */
+ case 2: /* USDOT */
unallocated_encoding(s);
return;
}
@@ -12190,18 +12179,10 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
switch (16 * u + opcode) {
case 0x0f:
switch (extract32(insn, 22, 2)) {
- case 0: /* SUDOT */
- gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, index,
- gen_helper_gvec_sudot_idx_b);
- return;
case 1: /* BFDOT */
gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, index,
gen_helper_gvec_bfdot_idx);
return;
- case 2: /* USDOT */
- gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, index,
- gen_helper_gvec_usdot_idx_b);
- return;
case 3: /* BFMLAL{B,T} */
gen_gvec_op4_fpst(s, 1, rd, rn, rm, rd, 0, (index << 1) | is_q,
gen_helper_gvec_bfmlal_idx);
--
2.34.1
* [PATCH v2 07/13] target/arm: Convert BFDOT to decodetree
From: Richard Henderson @ 2024-06-25 18:35 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm, Peter Maydell
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/tcg/a64.decode | 2 ++
target/arm/tcg/translate-a64.c | 20 +++++---------------
2 files changed, 7 insertions(+), 15 deletions(-)
diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index 8a0251f83c..6819fd2587 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -950,6 +950,7 @@ SQRDMLSH_v 0.10 1110 ..0 ..... 10001 1 ..... ..... @qrrr_e
SDOT_v 0.00 1110 100 ..... 10010 1 ..... ..... @qrrr_s
UDOT_v 0.10 1110 100 ..... 10010 1 ..... ..... @qrrr_s
USDOT_v 0.00 1110 100 ..... 10011 1 ..... ..... @qrrr_s
+BFDOT_v 0.10 1110 010 ..... 11111 1 ..... ..... @qrrr_s
### Advanced SIMD scalar x indexed element
@@ -1029,6 +1030,7 @@ SDOT_vi 0.00 1111 10 .. .... 1110 . 0 ..... ..... @qrrx_s
UDOT_vi 0.10 1111 10 .. .... 1110 . 0 ..... ..... @qrrx_s
SUDOT_vi 0.00 1111 00 .. .... 1111 . 0 ..... ..... @qrrx_s
USDOT_vi 0.00 1111 10 .. .... 1111 . 0 ..... ..... @qrrx_s
+BFDOT_vi 0.00 1111 01 .. .... 1111 . 0 ..... ..... @qrrx_s
# Floating-point conditional select
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 9a658ca876..0f44cd5aee 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -5604,6 +5604,7 @@ static bool do_dot_vector(DisasContext *s, arg_qrrr_e *a,
TRANS_FEAT(SDOT_v, aa64_dp, do_dot_vector, a, gen_helper_gvec_sdot_b)
TRANS_FEAT(UDOT_v, aa64_dp, do_dot_vector, a, gen_helper_gvec_udot_b)
TRANS_FEAT(USDOT_v, aa64_i8mm, do_dot_vector, a, gen_helper_gvec_usdot_b)
+TRANS_FEAT(BFDOT_v, aa64_bf16, do_dot_vector, a, gen_helper_gvec_bfdot)
/*
* Advanced SIMD scalar/vector x indexed element
@@ -5942,6 +5943,8 @@ TRANS_FEAT(SUDOT_vi, aa64_i8mm, do_dot_vector_idx, a,
gen_helper_gvec_sudot_idx_b)
TRANS_FEAT(USDOT_vi, aa64_i8mm, do_dot_vector_idx, a,
gen_helper_gvec_usdot_idx_b)
+TRANS_FEAT(BFDOT_vi, aa64_bf16, do_dot_vector_idx, a,
+ gen_helper_gvec_bfdot_idx)
/*
* Advanced SIMD scalar pairwise
@@ -10951,11 +10954,11 @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
break;
case 0x1f:
switch (size) {
- case 1: /* BFDOT */
case 3: /* BFMLAL{B,T} */
feature = dc_isar_feature(aa64_bf16, s);
break;
default:
+ case 1: /* BFDOT */
unallocated_encoding(s);
return;
}
@@ -11036,9 +11039,6 @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
return;
case 0xf:
switch (size) {
- case 1: /* BFDOT */
- gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, 0, gen_helper_gvec_bfdot);
- break;
case 3: /* BFMLAL{B,T} */
gen_gvec_op4_fpst(s, 1, rd, rn, rm, rd, false, is_q,
gen_helper_gvec_bfmlal);
@@ -12053,13 +12053,6 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
break;
case 0x0f:
switch (size) {
- case 1: /* BFDOT */
- if (is_scalar || !dc_isar_feature(aa64_bf16, s)) {
- unallocated_encoding(s);
- return;
- }
- size = MO_32;
- break;
case 3: /* BFMLAL{B,T} */
if (is_scalar || !dc_isar_feature(aa64_bf16, s)) {
unallocated_encoding(s);
@@ -12070,6 +12063,7 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
break;
default:
case 0: /* SUDOT */
+ case 1: /* BFDOT */
case 2: /* USDOT */
unallocated_encoding(s);
return;
@@ -12179,10 +12173,6 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
switch (16 * u + opcode) {
case 0x0f:
switch (extract32(insn, 22, 2)) {
- case 1: /* BFDOT */
- gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, index,
- gen_helper_gvec_bfdot_idx);
- return;
case 3: /* BFMLAL{B,T} */
gen_gvec_op4_fpst(s, 1, rd, rn, rm, rd, 0, (index << 1) | is_q,
gen_helper_gvec_bfmlal_idx);
--
2.34.1
* [PATCH v2 08/13] target/arm: Convert BFMLALB, BFMLALT to decodetree
From: Richard Henderson @ 2024-06-25 18:35 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm, Peter Maydell
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
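Reader's note, not part of the commit: trans_BFMLAL_vi passes both the
element index and the Q bit to the helper in one data word, as
"(a->idx << 1) | a->q". The sketch below shows that packing and one
plausible way to undo it. The function names are hypothetical, and the
unpacking side is an assumption about how a consumer would read the word,
not a copy of gvec_bfmlal_idx.

```c
#include <assert.h>
#include <stdint.h>

/* Pack as in trans_BFMLAL_vi: bit 0 = Q (B/T select), bits above = index. */
static uint32_t bfmlal_idx_pack(uint32_t idx, uint32_t q)
{
    assert(q <= 1);
    return (idx << 1) | q;
}

/* Recover the BFMLALB/BFMLALT selector from the packed word. */
static uint32_t bfmlal_idx_sel(uint32_t data)
{
    return data & 1;
}

/* Recover the element index from the packed word. */
static uint32_t bfmlal_idx_index(uint32_t data)
{
    return data >> 1;
}
```

With idx=5 and q=1 the packed word is 11, which unpacks to selector 1 and
index 5. Keeping Q in bit 0 means the index width does not need to be
known when reading the selector.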
target/arm/tcg/a64.decode | 2 +
target/arm/tcg/translate-a64.c | 77 +++++++++++++---------------------
2 files changed, 31 insertions(+), 48 deletions(-)
diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index 6819fd2587..15344a73de 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -951,6 +951,7 @@ SDOT_v 0.00 1110 100 ..... 10010 1 ..... ..... @qrrr_s
UDOT_v 0.10 1110 100 ..... 10010 1 ..... ..... @qrrr_s
USDOT_v 0.00 1110 100 ..... 10011 1 ..... ..... @qrrr_s
BFDOT_v 0.10 1110 010 ..... 11111 1 ..... ..... @qrrr_s
+BFMLAL_v 0.10 1110 110 ..... 11111 1 ..... ..... @qrrr_h
### Advanced SIMD scalar x indexed element
@@ -1031,6 +1032,7 @@ UDOT_vi 0.10 1111 10 .. .... 1110 . 0 ..... ..... @qrrx_s
SUDOT_vi 0.00 1111 00 .. .... 1111 . 0 ..... ..... @qrrx_s
USDOT_vi 0.00 1111 10 .. .... 1111 . 0 ..... ..... @qrrx_s
BFDOT_vi 0.00 1111 01 .. .... 1111 . 0 ..... ..... @qrrx_s
+BFMLAL_vi 0.00 1111 11 .. .... 1111 . 0 ..... ..... @qrrx_h
# Floating-point conditional select
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 0f44cd5aee..95be862dde 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -5606,6 +5606,19 @@ TRANS_FEAT(UDOT_v, aa64_dp, do_dot_vector, a, gen_helper_gvec_udot_b)
TRANS_FEAT(USDOT_v, aa64_i8mm, do_dot_vector, a, gen_helper_gvec_usdot_b)
TRANS_FEAT(BFDOT_v, aa64_bf16, do_dot_vector, a, gen_helper_gvec_bfdot)
+static bool trans_BFMLAL_v(DisasContext *s, arg_qrrr_e *a)
+{
+ if (!dc_isar_feature(aa64_bf16, s)) {
+ return false;
+ }
+ if (fp_access_check(s)) {
+ /* Q bit selects BFMLALB vs BFMLALT. */
+ gen_gvec_op4_fpst(s, true, a->rd, a->rn, a->rm, a->rd, false, a->q,
+ gen_helper_gvec_bfmlal);
+ }
+ return true;
+}
+
/*
* Advanced SIMD scalar/vector x indexed element
*/
@@ -5946,6 +5959,20 @@ TRANS_FEAT(USDOT_vi, aa64_i8mm, do_dot_vector_idx, a,
TRANS_FEAT(BFDOT_vi, aa64_bf16, do_dot_vector_idx, a,
gen_helper_gvec_bfdot_idx)
+static bool trans_BFMLAL_vi(DisasContext *s, arg_qrrx_e *a)
+{
+ if (!dc_isar_feature(aa64_bf16, s)) {
+ return false;
+ }
+ if (fp_access_check(s)) {
+ /* Q bit selects BFMLALB vs BFMLALT. */
+ gen_gvec_op4_fpst(s, true, a->rd, a->rn, a->rm, a->rd, 0,
+ (a->idx << 1) | a->q,
+ gen_helper_gvec_bfmlal_idx);
+ }
+ return true;
+}
+
/*
* Advanced SIMD scalar pairwise
*/
@@ -10952,23 +10979,13 @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
}
feature = dc_isar_feature(aa64_bf16, s);
break;
- case 0x1f:
- switch (size) {
- case 3: /* BFMLAL{B,T} */
- feature = dc_isar_feature(aa64_bf16, s);
- break;
- default:
- case 1: /* BFDOT */
- unallocated_encoding(s);
- return;
- }
- break;
default:
case 0x02: /* SDOT (vector) */
case 0x03: /* USDOT */
case 0x10: /* SQRDMLAH (vector) */
case 0x11: /* SQRDMLSH (vector) */
case 0x12: /* UDOT (vector) */
+ case 0x1f: /* BFDOT / BFMLAL */
unallocated_encoding(s);
return;
}
@@ -11037,17 +11054,6 @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
case 0xd: /* BFMMLA */
gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, 0, gen_helper_gvec_bfmmla);
return;
- case 0xf:
- switch (size) {
- case 3: /* BFMLAL{B,T} */
- gen_gvec_op4_fpst(s, 1, rd, rn, rm, rd, false, is_q,
- gen_helper_gvec_bfmlal);
- break;
- default:
- g_assert_not_reached();
- }
- return;
-
default:
g_assert_not_reached();
}
@@ -12051,24 +12057,6 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
case 0x0b: /* SQDMULL, SQDMULL2 */
is_long = true;
break;
- case 0x0f:
- switch (size) {
- case 3: /* BFMLAL{B,T} */
- if (is_scalar || !dc_isar_feature(aa64_bf16, s)) {
- unallocated_encoding(s);
- return;
- }
- /* can't set is_fp without other incorrect size checks */
- size = MO_16;
- break;
- default:
- case 0: /* SUDOT */
- case 1: /* BFDOT */
- case 2: /* USDOT */
- unallocated_encoding(s);
- return;
- }
- break;
case 0x11: /* FCMLA #0 */
case 0x13: /* FCMLA #90 */
case 0x15: /* FCMLA #180 */
@@ -12089,6 +12077,7 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
case 0x0c: /* SQDMULH */
case 0x0d: /* SQRDMULH */
case 0x0e: /* SDOT */
+ case 0x0f: /* SUDOT / BFDOT / USDOT / BFMLAL */
case 0x10: /* MLA */
case 0x14: /* MLS */
case 0x18: /* FMLAL2 */
@@ -12171,14 +12160,6 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
}
switch (16 * u + opcode) {
- case 0x0f:
- switch (extract32(insn, 22, 2)) {
- case 3: /* BFMLAL{B,T} */
- gen_gvec_op4_fpst(s, 1, rd, rn, rm, rd, 0, (index << 1) | is_q,
- gen_helper_gvec_bfmlal_idx);
- return;
- }
- g_assert_not_reached();
case 0x11: /* FCMLA #0 */
case 0x13: /* FCMLA #90 */
case 0x15: /* FCMLA #180 */
--
2.34.1
* [PATCH v2 09/13] target/arm: Convert BFMMLA, SMMLA, UMMLA, USMMLA to decodetree
From: Richard Henderson @ 2024-06-25 18:35 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm, Peter Maydell
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/tcg/a64.decode | 4 ++++
target/arm/tcg/translate-a64.c | 36 ++++++++--------------------------
2 files changed, 12 insertions(+), 28 deletions(-)
diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index 15344a73de..b2c7e36969 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -952,6 +952,10 @@ UDOT_v 0.10 1110 100 ..... 10010 1 ..... ..... @qrrr_s
USDOT_v 0.00 1110 100 ..... 10011 1 ..... ..... @qrrr_s
BFDOT_v 0.10 1110 010 ..... 11111 1 ..... ..... @qrrr_s
BFMLAL_v 0.10 1110 110 ..... 11111 1 ..... ..... @qrrr_h
+BFMMLA 0110 1110 010 ..... 11101 1 ..... ..... @rrr_q1e0
+SMMLA 0100 1110 100 ..... 10100 1 ..... ..... @rrr_q1e0
+UMMLA 0110 1110 100 ..... 10100 1 ..... ..... @rrr_q1e0
+USMMLA 0100 1110 100 ..... 10101 1 ..... ..... @rrr_q1e0
### Advanced SIMD scalar x indexed element
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 95be862dde..2697c4b305 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -5605,6 +5605,10 @@ TRANS_FEAT(SDOT_v, aa64_dp, do_dot_vector, a, gen_helper_gvec_sdot_b)
TRANS_FEAT(UDOT_v, aa64_dp, do_dot_vector, a, gen_helper_gvec_udot_b)
TRANS_FEAT(USDOT_v, aa64_i8mm, do_dot_vector, a, gen_helper_gvec_usdot_b)
TRANS_FEAT(BFDOT_v, aa64_bf16, do_dot_vector, a, gen_helper_gvec_bfdot)
+TRANS_FEAT(BFMMLA, aa64_bf16, do_dot_vector, a, gen_helper_gvec_bfmmla)
+TRANS_FEAT(SMMLA, aa64_i8mm, do_dot_vector, a, gen_helper_gvec_smmla_b)
+TRANS_FEAT(UMMLA, aa64_i8mm, do_dot_vector, a, gen_helper_gvec_ummla_b)
+TRANS_FEAT(USMMLA, aa64_i8mm, do_dot_vector, a, gen_helper_gvec_usmmla_b)
static bool trans_BFMLAL_v(DisasContext *s, arg_qrrr_e *a)
{
@@ -10949,15 +10953,6 @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
int rot;
switch (u * 16 + opcode) {
- case 0x04: /* SMMLA */
- case 0x14: /* UMMLA */
- case 0x05: /* USMMLA */
- if (!is_q || size != MO_32) {
- unallocated_encoding(s);
- return;
- }
- feature = dc_isar_feature(aa64_i8mm, s);
- break;
case 0x18: /* FCMLA, #0 */
case 0x19: /* FCMLA, #90 */
case 0x1a: /* FCMLA, #180 */
@@ -10972,19 +10967,16 @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
}
feature = dc_isar_feature(aa64_fcma, s);
break;
- case 0x1d: /* BFMMLA */
- if (size != MO_16 || !is_q) {
- unallocated_encoding(s);
- return;
- }
- feature = dc_isar_feature(aa64_bf16, s);
- break;
default:
case 0x02: /* SDOT (vector) */
case 0x03: /* USDOT */
+ case 0x04: /* SMMLA */
+ case 0x05: /* USMMLA */
case 0x10: /* SQRDMLAH (vector) */
case 0x11: /* SQRDMLSH (vector) */
case 0x12: /* UDOT (vector) */
+ case 0x14: /* UMMLA */
+ case 0x1d: /* BFMMLA */
case 0x1f: /* BFDOT / BFMLAL */
unallocated_encoding(s);
return;
@@ -10998,15 +10990,6 @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
}
switch (opcode) {
- case 0x04: /* SMMLA, UMMLA */
- gen_gvec_op4_ool(s, 1, rd, rn, rm, rd, 0,
- u ? gen_helper_gvec_ummla_b
- : gen_helper_gvec_smmla_b);
- return;
- case 0x05: /* USMMLA */
- gen_gvec_op4_ool(s, 1, rd, rn, rm, rd, 0, gen_helper_gvec_usmmla_b);
- return;
-
case 0x8: /* FCMLA, #0 */
case 0x9: /* FCMLA, #90 */
case 0xa: /* FCMLA, #180 */
@@ -11051,9 +11034,6 @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
}
return;
- case 0xd: /* BFMMLA */
- gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, 0, gen_helper_gvec_bfmmla);
- return;
default:
g_assert_not_reached();
}
--
2.34.1
* [PATCH v2 10/13] target/arm: Add data argument to do_fp3_vector
From: Richard Henderson @ 2024-06-25 18:35 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm, Peter Maydell
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
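Reader's note, not part of the commit: do_fp3_vector selects a helper with
"fns[esz - 1]" and, after this patch, forwards a caller-supplied data value
instead of a hard-coded 0. The sketch below models only that dispatch
shape. Fp3Fn, the op_* functions and dispatch_fp3 are hypothetical
stand-ins, and the 1..3 range assumes the usual MemOp numbering where
MO_16 is 1 and MO_64 is 3.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for gen_helper_gvec_3_ptr: takes only the data value. */
typedef int (*Fp3Fn)(int data);

static int op_h(int data) { return 100 + data; }  /* half precision slot */
static int op_s(int data) { return 200 + data; }  /* single precision slot */
static int op_d(int data) { return 300 + data; }  /* double precision slot */

static Fp3Fn const demo_fns[3] = { op_h, op_s, op_d };

/*
 * Three-entry table indexed by esz - 1, so esz == 0 (bytes) has no slot
 * and must be rejected before indexing.
 */
static bool dispatch_fp3(int esz, int data, Fp3Fn const fns[3], int *out)
{
    assert(fns != NULL && out != NULL);
    if (esz < 1 || esz > 3) {
        return false;
    }
    *out = fns[esz - 1](data);
    return true;
}
```

Existing callers keep their behaviour by passing 0, as the TRANS lines in
this patch do. A later user can pass a nonzero value without needing a
second copy of the dispatch function.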
target/arm/tcg/translate-a64.c | 52 +++++++++++++++++-----------------
1 file changed, 26 insertions(+), 26 deletions(-)
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 2697c4b305..57cdde008e 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -5290,7 +5290,7 @@ TRANS(CMHS_s, do_cmop_d, a, TCG_COND_GEU)
TRANS(CMEQ_s, do_cmop_d, a, TCG_COND_EQ)
TRANS(CMTST_s, do_cmop_d, a, TCG_COND_TSTNE)
-static bool do_fp3_vector(DisasContext *s, arg_qrrr_e *a,
+static bool do_fp3_vector(DisasContext *s, arg_qrrr_e *a, int data,
gen_helper_gvec_3_ptr * const fns[3])
{
MemOp esz = a->esz;
@@ -5313,7 +5313,7 @@ static bool do_fp3_vector(DisasContext *s, arg_qrrr_e *a,
}
if (fp_access_check(s)) {
gen_gvec_op3_fpst(s, a->q, a->rd, a->rn, a->rm,
- esz == MO_16, 0, fns[esz - 1]);
+ esz == MO_16, data, fns[esz - 1]);
}
return true;
}
@@ -5323,168 +5323,168 @@ static gen_helper_gvec_3_ptr * const f_vector_fadd[3] = {
gen_helper_gvec_fadd_s,
gen_helper_gvec_fadd_d,
};
-TRANS(FADD_v, do_fp3_vector, a, f_vector_fadd)
+TRANS(FADD_v, do_fp3_vector, a, 0, f_vector_fadd)
static gen_helper_gvec_3_ptr * const f_vector_fsub[3] = {
gen_helper_gvec_fsub_h,
gen_helper_gvec_fsub_s,
gen_helper_gvec_fsub_d,
};
-TRANS(FSUB_v, do_fp3_vector, a, f_vector_fsub)
+TRANS(FSUB_v, do_fp3_vector, a, 0, f_vector_fsub)
static gen_helper_gvec_3_ptr * const f_vector_fdiv[3] = {
gen_helper_gvec_fdiv_h,
gen_helper_gvec_fdiv_s,
gen_helper_gvec_fdiv_d,
};
-TRANS(FDIV_v, do_fp3_vector, a, f_vector_fdiv)
+TRANS(FDIV_v, do_fp3_vector, a, 0, f_vector_fdiv)
static gen_helper_gvec_3_ptr * const f_vector_fmul[3] = {
gen_helper_gvec_fmul_h,
gen_helper_gvec_fmul_s,
gen_helper_gvec_fmul_d,
};
-TRANS(FMUL_v, do_fp3_vector, a, f_vector_fmul)
+TRANS(FMUL_v, do_fp3_vector, a, 0, f_vector_fmul)
static gen_helper_gvec_3_ptr * const f_vector_fmax[3] = {
gen_helper_gvec_fmax_h,
gen_helper_gvec_fmax_s,
gen_helper_gvec_fmax_d,
};
-TRANS(FMAX_v, do_fp3_vector, a, f_vector_fmax)
+TRANS(FMAX_v, do_fp3_vector, a, 0, f_vector_fmax)
static gen_helper_gvec_3_ptr * const f_vector_fmin[3] = {
gen_helper_gvec_fmin_h,
gen_helper_gvec_fmin_s,
gen_helper_gvec_fmin_d,
};
-TRANS(FMIN_v, do_fp3_vector, a, f_vector_fmin)
+TRANS(FMIN_v, do_fp3_vector, a, 0, f_vector_fmin)
static gen_helper_gvec_3_ptr * const f_vector_fmaxnm[3] = {
gen_helper_gvec_fmaxnum_h,
gen_helper_gvec_fmaxnum_s,
gen_helper_gvec_fmaxnum_d,
};
-TRANS(FMAXNM_v, do_fp3_vector, a, f_vector_fmaxnm)
+TRANS(FMAXNM_v, do_fp3_vector, a, 0, f_vector_fmaxnm)
static gen_helper_gvec_3_ptr * const f_vector_fminnm[3] = {
gen_helper_gvec_fminnum_h,
gen_helper_gvec_fminnum_s,
gen_helper_gvec_fminnum_d,
};
-TRANS(FMINNM_v, do_fp3_vector, a, f_vector_fminnm)
+TRANS(FMINNM_v, do_fp3_vector, a, 0, f_vector_fminnm)
static gen_helper_gvec_3_ptr * const f_vector_fmulx[3] = {
gen_helper_gvec_fmulx_h,
gen_helper_gvec_fmulx_s,
gen_helper_gvec_fmulx_d,
};
-TRANS(FMULX_v, do_fp3_vector, a, f_vector_fmulx)
+TRANS(FMULX_v, do_fp3_vector, a, 0, f_vector_fmulx)
static gen_helper_gvec_3_ptr * const f_vector_fmla[3] = {
gen_helper_gvec_vfma_h,
gen_helper_gvec_vfma_s,
gen_helper_gvec_vfma_d,
};
-TRANS(FMLA_v, do_fp3_vector, a, f_vector_fmla)
+TRANS(FMLA_v, do_fp3_vector, a, 0, f_vector_fmla)
static gen_helper_gvec_3_ptr * const f_vector_fmls[3] = {
gen_helper_gvec_vfms_h,
gen_helper_gvec_vfms_s,
gen_helper_gvec_vfms_d,
};
-TRANS(FMLS_v, do_fp3_vector, a, f_vector_fmls)
+TRANS(FMLS_v, do_fp3_vector, a, 0, f_vector_fmls)
static gen_helper_gvec_3_ptr * const f_vector_fcmeq[3] = {
gen_helper_gvec_fceq_h,
gen_helper_gvec_fceq_s,
gen_helper_gvec_fceq_d,
};
-TRANS(FCMEQ_v, do_fp3_vector, a, f_vector_fcmeq)
+TRANS(FCMEQ_v, do_fp3_vector, a, 0, f_vector_fcmeq)
static gen_helper_gvec_3_ptr * const f_vector_fcmge[3] = {
gen_helper_gvec_fcge_h,
gen_helper_gvec_fcge_s,
gen_helper_gvec_fcge_d,
};
-TRANS(FCMGE_v, do_fp3_vector, a, f_vector_fcmge)
+TRANS(FCMGE_v, do_fp3_vector, a, 0, f_vector_fcmge)
static gen_helper_gvec_3_ptr * const f_vector_fcmgt[3] = {
gen_helper_gvec_fcgt_h,
gen_helper_gvec_fcgt_s,
gen_helper_gvec_fcgt_d,
};
-TRANS(FCMGT_v, do_fp3_vector, a, f_vector_fcmgt)
+TRANS(FCMGT_v, do_fp3_vector, a, 0, f_vector_fcmgt)
static gen_helper_gvec_3_ptr * const f_vector_facge[3] = {
gen_helper_gvec_facge_h,
gen_helper_gvec_facge_s,
gen_helper_gvec_facge_d,
};
-TRANS(FACGE_v, do_fp3_vector, a, f_vector_facge)
+TRANS(FACGE_v, do_fp3_vector, a, 0, f_vector_facge)
static gen_helper_gvec_3_ptr * const f_vector_facgt[3] = {
gen_helper_gvec_facgt_h,
gen_helper_gvec_facgt_s,
gen_helper_gvec_facgt_d,
};
-TRANS(FACGT_v, do_fp3_vector, a, f_vector_facgt)
+TRANS(FACGT_v, do_fp3_vector, a, 0, f_vector_facgt)
static gen_helper_gvec_3_ptr * const f_vector_fabd[3] = {
gen_helper_gvec_fabd_h,
gen_helper_gvec_fabd_s,
gen_helper_gvec_fabd_d,
};
-TRANS(FABD_v, do_fp3_vector, a, f_vector_fabd)
+TRANS(FABD_v, do_fp3_vector, a, 0, f_vector_fabd)
static gen_helper_gvec_3_ptr * const f_vector_frecps[3] = {
gen_helper_gvec_recps_h,
gen_helper_gvec_recps_s,
gen_helper_gvec_recps_d,
};
-TRANS(FRECPS_v, do_fp3_vector, a, f_vector_frecps)
+TRANS(FRECPS_v, do_fp3_vector, a, 0, f_vector_frecps)
static gen_helper_gvec_3_ptr * const f_vector_frsqrts[3] = {
gen_helper_gvec_rsqrts_h,
gen_helper_gvec_rsqrts_s,
gen_helper_gvec_rsqrts_d,
};
-TRANS(FRSQRTS_v, do_fp3_vector, a, f_vector_frsqrts)
+TRANS(FRSQRTS_v, do_fp3_vector, a, 0, f_vector_frsqrts)
static gen_helper_gvec_3_ptr * const f_vector_faddp[3] = {
gen_helper_gvec_faddp_h,
gen_helper_gvec_faddp_s,
gen_helper_gvec_faddp_d,
};
-TRANS(FADDP_v, do_fp3_vector, a, f_vector_faddp)
+TRANS(FADDP_v, do_fp3_vector, a, 0, f_vector_faddp)
static gen_helper_gvec_3_ptr * const f_vector_fmaxp[3] = {
gen_helper_gvec_fmaxp_h,
gen_helper_gvec_fmaxp_s,
gen_helper_gvec_fmaxp_d,
};
-TRANS(FMAXP_v, do_fp3_vector, a, f_vector_fmaxp)
+TRANS(FMAXP_v, do_fp3_vector, a, 0, f_vector_fmaxp)
static gen_helper_gvec_3_ptr * const f_vector_fminp[3] = {
gen_helper_gvec_fminp_h,
gen_helper_gvec_fminp_s,
gen_helper_gvec_fminp_d,
};
-TRANS(FMINP_v, do_fp3_vector, a, f_vector_fminp)
+TRANS(FMINP_v, do_fp3_vector, a, 0, f_vector_fminp)
static gen_helper_gvec_3_ptr * const f_vector_fmaxnmp[3] = {
gen_helper_gvec_fmaxnump_h,
gen_helper_gvec_fmaxnump_s,
gen_helper_gvec_fmaxnump_d,
};
-TRANS(FMAXNMP_v, do_fp3_vector, a, f_vector_fmaxnmp)
+TRANS(FMAXNMP_v, do_fp3_vector, a, 0, f_vector_fmaxnmp)
static gen_helper_gvec_3_ptr * const f_vector_fminnmp[3] = {
gen_helper_gvec_fminnump_h,
gen_helper_gvec_fminnump_s,
gen_helper_gvec_fminnump_d,
};
-TRANS(FMINNMP_v, do_fp3_vector, a, f_vector_fminnmp)
+TRANS(FMINNMP_v, do_fp3_vector, a, 0, f_vector_fminnmp)
static bool do_fmlal(DisasContext *s, arg_qrrr_e *a, bool is_s, bool is_2)
{
--
2.34.1
* [PATCH v2 11/13] target/arm: Convert FCADD to decodetree
From: Richard Henderson @ 2024-06-25 18:35 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm, Peter Maydell
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
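Reader's note, not part of the commit: FCADD_90 and FCADD_270 share one
helper table and differ only in the data value (0 and 1), which replaces
the old "rot = extract32(opcode, 1, 1)". The sketch below shows what that
one bit means for a single complex pair, taking #90 as n + i*m and #270 as
n - i*m, which is my reading of the architectural definition. Complex32
and fcadd are hypothetical names, and the real helpers operate on whole
vectors with a float_status rather than on plain C floats.

```c
#include <assert.h>

typedef struct {
    float re, im;
} Complex32;  /* hypothetical (real, imaginary) pair */

/*
 * rot270 == 0: n + i*m (FCADD #90)
 * rot270 == 1: n - i*m (FCADD #270)
 */
static Complex32 fcadd(Complex32 n, Complex32 m, int rot270)
{
    Complex32 d;

    assert(rot270 == 0 || rot270 == 1);
    if (rot270) {
        d.re = n.re + m.im;
        d.im = n.im - m.re;
    } else {
        d.re = n.re - m.im;
        d.im = n.im + m.re;
    }
    return d;
}
```

With n = 1 + 2i and m = 3 + 4i, the #90 form gives -3 + 5i and the #270
form gives 5 - i.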
target/arm/tcg/a64.decode | 3 +++
target/arm/tcg/translate-a64.c | 33 ++++++++++-----------------------
2 files changed, 13 insertions(+), 23 deletions(-)
diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index b2c7e36969..f330919851 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -957,6 +957,9 @@ SMMLA 0100 1110 100 ..... 10100 1 ..... ..... @rrr_q1e0
UMMLA 0110 1110 100 ..... 10100 1 ..... ..... @rrr_q1e0
USMMLA 0100 1110 100 ..... 10101 1 ..... ..... @rrr_q1e0
+FCADD_90 0.10 1110 ..0 ..... 11100 1 ..... ..... @qrrr_e
+FCADD_270 0.10 1110 ..0 ..... 11110 1 ..... ..... @qrrr_e
+
### Advanced SIMD scalar x indexed element
FMUL_si 0101 1111 00 .. .... 1001 . 0 ..... ..... @rrx_h
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 57cdde008e..a1b338263f 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -5623,6 +5623,14 @@ static bool trans_BFMLAL_v(DisasContext *s, arg_qrrr_e *a)
return true;
}
+static gen_helper_gvec_3_ptr * const f_vector_fcadd[3] = {
+ gen_helper_gvec_fcaddh,
+ gen_helper_gvec_fcadds,
+ gen_helper_gvec_fcaddd,
+};
+TRANS_FEAT(FCADD_90, aa64_fcma, do_fp3_vector, a, 0, f_vector_fcadd)
+TRANS_FEAT(FCADD_270, aa64_fcma, do_fp3_vector, a, 1, f_vector_fcadd)
+
/*
* Advanced SIMD scalar/vector x indexed element
*/
@@ -10957,8 +10965,6 @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
case 0x19: /* FCMLA, #90 */
case 0x1a: /* FCMLA, #180 */
case 0x1b: /* FCMLA, #270 */
- case 0x1c: /* FCADD, #90 */
- case 0x1e: /* FCADD, #270 */
if (size == 0
|| (size == 1 && !dc_isar_feature(aa64_fp16, s))
|| (size == 3 && !is_q)) {
@@ -10976,7 +10982,9 @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
case 0x11: /* SQRDMLSH (vector) */
case 0x12: /* UDOT (vector) */
case 0x14: /* UMMLA */
+ case 0x1c: /* FCADD, #90 */
case 0x1d: /* BFMMLA */
+ case 0x1e: /* FCADD, #270 */
case 0x1f: /* BFDOT / BFMLAL */
unallocated_encoding(s);
return;
@@ -11013,27 +11021,6 @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
}
return;
- case 0xc: /* FCADD, #90 */
- case 0xe: /* FCADD, #270 */
- rot = extract32(opcode, 1, 1);
- switch (size) {
- case 1:
- gen_gvec_op3_fpst(s, is_q, rd, rn, rm, size == 1, rot,
- gen_helper_gvec_fcaddh);
- break;
- case 2:
- gen_gvec_op3_fpst(s, is_q, rd, rn, rm, size == 1, rot,
- gen_helper_gvec_fcadds);
- break;
- case 3:
- gen_gvec_op3_fpst(s, is_q, rd, rn, rm, size == 1, rot,
- gen_helper_gvec_fcaddd);
- break;
- default:
- g_assert_not_reached();
- }
- return;
-
default:
g_assert_not_reached();
}
--
2.34.1
* [PATCH v2 12/13] target/arm: Convert FCMLA to decodetree
2024-06-25 18:35 [PATCH v2 00/13] target/arm: AdvSIMD conversion, part 2 Richard Henderson
` (10 preceding siblings ...)
2024-06-25 18:35 ` [PATCH v2 11/13] target/arm: Convert FCADD to decodetree Richard Henderson
@ 2024-06-25 18:35 ` Richard Henderson
2024-06-25 18:35 ` [PATCH v2 13/13] target/arm: Delete dead code from disas_simd_indexed Richard Henderson
2024-06-28 14:40 ` [PATCH v2 00/13] target/arm: AdvSIMD conversion, part 2 Peter Maydell
13 siblings, 0 replies; 17+ messages in thread
From: Richard Henderson @ 2024-06-25 18:35 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/tcg/a64.decode | 6 +
target/arm/tcg/translate-a64.c | 238 ++++++++++-----------------------
2 files changed, 74 insertions(+), 170 deletions(-)
diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index f330919851..223eac3cac 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -960,6 +960,8 @@ USMMLA 0100 1110 100 ..... 10101 1 ..... ..... @rrr_q1e0
FCADD_90 0.10 1110 ..0 ..... 11100 1 ..... ..... @qrrr_e
FCADD_270 0.10 1110 ..0 ..... 11110 1 ..... ..... @qrrr_e
+FCMLA_v 0 q:1 10 1110 esz:2 0 rm:5 110 rot:2 1 rn:5 rd:5
+
### Advanced SIMD scalar x indexed element
FMUL_si 0101 1111 00 .. .... 1001 . 0 ..... ..... @rrx_h
@@ -1041,6 +1043,10 @@ USDOT_vi 0.00 1111 10 .. .... 1111 . 0 ..... ..... @qrrx_s
BFDOT_vi 0.00 1111 01 .. .... 1111 . 0 ..... ..... @qrrx_s
BFMLAL_vi 0.00 1111 11 .. .... 1111 . 0 ..... ..... @qrrx_h
+FCMLA_vi 0 0 10 1111 01 idx:1 rm:5 0 rot:2 1 0 0 rn:5 rd:5 esz=1 q=0
+FCMLA_vi 0 1 10 1111 01 . rm:5 0 rot:2 1 . 0 rn:5 rd:5 esz=1 idx=%hl q=1
+FCMLA_vi 0 1 10 1111 10 0 rm:5 0 rot:2 1 idx:1 0 rn:5 rd:5 esz=2 q=1
+
# Floating-point conditional select
FCSEL 0001 1110 .. 1 rm:5 cond:4 11 rn:5 rd:5 esz=%esz_hsd
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index a1b338263f..161fa2659c 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -5631,6 +5631,39 @@ static gen_helper_gvec_3_ptr * const f_vector_fcadd[3] = {
TRANS_FEAT(FCADD_90, aa64_fcma, do_fp3_vector, a, 0, f_vector_fcadd)
TRANS_FEAT(FCADD_270, aa64_fcma, do_fp3_vector, a, 1, f_vector_fcadd)
+static bool trans_FCMLA_v(DisasContext *s, arg_FCMLA_v *a)
+{
+ gen_helper_gvec_4_ptr *fn;
+
+ if (!dc_isar_feature(aa64_fcma, s)) {
+ return false;
+ }
+ switch (a->esz) {
+ case MO_64:
+ if (!a->q) {
+ return false;
+ }
+ fn = gen_helper_gvec_fcmlad;
+ break;
+ case MO_32:
+ fn = gen_helper_gvec_fcmlas;
+ break;
+ case MO_16:
+ if (!dc_isar_feature(aa64_fp16, s)) {
+ return false;
+ }
+ fn = gen_helper_gvec_fcmlah;
+ break;
+ default:
+ return false;
+ }
+ if (fp_access_check(s)) {
+ gen_gvec_op4_fpst(s, a->q, a->rd, a->rn, a->rm, a->rd,
+ a->esz == MO_16, a->rot, fn);
+ }
+ return true;
+}
+
/*
* Advanced SIMD scalar/vector x indexed element
*/
@@ -5985,6 +6018,33 @@ static bool trans_BFMLAL_vi(DisasContext *s, arg_qrrx_e *a)
return true;
}
+static bool trans_FCMLA_vi(DisasContext *s, arg_FCMLA_vi *a)
+{
+ gen_helper_gvec_4_ptr *fn;
+
+ if (!dc_isar_feature(aa64_fcma, s)) {
+ return false;
+ }
+ switch (a->esz) {
+ case MO_16:
+ if (!dc_isar_feature(aa64_fp16, s)) {
+ return false;
+ }
+ fn = gen_helper_gvec_fcmlah_idx;
+ break;
+ case MO_32:
+ fn = gen_helper_gvec_fcmlas_idx;
+ break;
+ default:
+ g_assert_not_reached();
+ }
+ if (fp_access_check(s)) {
+ gen_gvec_op4_fpst(s, a->q, a->rd, a->rn, a->rm, a->rd,
+ a->esz == MO_16, (a->idx << 2) | a->rot, fn);
+ }
+ return true;
+}
+
/*
* Advanced SIMD scalar pairwise
*/
@@ -10942,90 +11002,6 @@ static void disas_simd_three_reg_diff(DisasContext *s, uint32_t insn)
}
}
-/* AdvSIMD three same extra
- * 31 30 29 28 24 23 22 21 20 16 15 14 11 10 9 5 4 0
- * +---+---+---+-----------+------+---+------+---+--------+---+----+----+
- * | 0 | Q | U | 0 1 1 1 0 | size | 0 | Rm | 1 | opcode | 1 | Rn | Rd |
- * +---+---+---+-----------+------+---+------+---+--------+---+----+----+
- */
-static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
-{
- int rd = extract32(insn, 0, 5);
- int rn = extract32(insn, 5, 5);
- int opcode = extract32(insn, 11, 4);
- int rm = extract32(insn, 16, 5);
- int size = extract32(insn, 22, 2);
- bool u = extract32(insn, 29, 1);
- bool is_q = extract32(insn, 30, 1);
- bool feature;
- int rot;
-
- switch (u * 16 + opcode) {
- case 0x18: /* FCMLA, #0 */
- case 0x19: /* FCMLA, #90 */
- case 0x1a: /* FCMLA, #180 */
- case 0x1b: /* FCMLA, #270 */
- if (size == 0
- || (size == 1 && !dc_isar_feature(aa64_fp16, s))
- || (size == 3 && !is_q)) {
- unallocated_encoding(s);
- return;
- }
- feature = dc_isar_feature(aa64_fcma, s);
- break;
- default:
- case 0x02: /* SDOT (vector) */
- case 0x03: /* USDOT */
- case 0x04: /* SMMLA */
- case 0x05: /* USMMLA */
- case 0x10: /* SQRDMLAH (vector) */
- case 0x11: /* SQRDMLSH (vector) */
- case 0x12: /* UDOT (vector) */
- case 0x14: /* UMMLA */
- case 0x1c: /* FCADD, #90 */
- case 0x1d: /* BFMMLA */
- case 0x1e: /* FCADD, #270 */
- case 0x1f: /* BFDOT / BFMLAL */
- unallocated_encoding(s);
- return;
- }
- if (!feature) {
- unallocated_encoding(s);
- return;
- }
- if (!fp_access_check(s)) {
- return;
- }
-
- switch (opcode) {
- case 0x8: /* FCMLA, #0 */
- case 0x9: /* FCMLA, #90 */
- case 0xa: /* FCMLA, #180 */
- case 0xb: /* FCMLA, #270 */
- rot = extract32(opcode, 0, 2);
- switch (size) {
- case 1:
- gen_gvec_op4_fpst(s, is_q, rd, rn, rm, rd, true, rot,
- gen_helper_gvec_fcmlah);
- break;
- case 2:
- gen_gvec_op4_fpst(s, is_q, rd, rn, rm, rd, false, rot,
- gen_helper_gvec_fcmlas);
- break;
- case 3:
- gen_gvec_op4_fpst(s, is_q, rd, rn, rm, rd, false, rot,
- gen_helper_gvec_fcmlad);
- break;
- default:
- g_assert_not_reached();
- }
- return;
-
- default:
- g_assert_not_reached();
- }
-}
-
static void handle_2misc_widening(DisasContext *s, int opcode, bool is_q,
int size, int rn, int rd)
{
@@ -12001,10 +11977,7 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
int rn = extract32(insn, 5, 5);
int rd = extract32(insn, 0, 5);
bool is_long = false;
- int is_fp = 0;
- bool is_fp16 = false;
int index;
- TCGv_ptr fpst;
switch (16 * u + opcode) {
case 0x02: /* SMLAL, SMLAL2 */
@@ -12024,16 +11997,6 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
case 0x0b: /* SQDMULL, SQDMULL2 */
is_long = true;
break;
- case 0x11: /* FCMLA #0 */
- case 0x13: /* FCMLA #90 */
- case 0x15: /* FCMLA #180 */
- case 0x17: /* FCMLA #270 */
- if (is_scalar || !dc_isar_feature(aa64_fcma, s)) {
- unallocated_encoding(s);
- return;
- }
- is_fp = 2;
- break;
default:
case 0x00: /* FMLAL */
case 0x01: /* FMLA */
@@ -12046,7 +12009,11 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
case 0x0e: /* SDOT */
case 0x0f: /* SUDOT / BFDOT / USDOT / BFMLAL */
case 0x10: /* MLA */
+ case 0x11: /* FCMLA #0 */
+ case 0x13: /* FCMLA #90 */
case 0x14: /* MLS */
+ case 0x15: /* FCMLA #180 */
+ case 0x17: /* FCMLA #270 */
case 0x18: /* FMLAL2 */
case 0x19: /* FMULX */
case 0x1c: /* FMLSL2 */
@@ -12057,46 +12024,12 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
return;
}
- switch (is_fp) {
- case 1: /* normal fp */
- unallocated_encoding(s); /* in decodetree */
- return;
-
- case 2: /* complex fp */
- /* Each indexable element is a complex pair. */
- size += 1;
- switch (size) {
- case MO_32:
- if (h && !is_q) {
- unallocated_encoding(s);
- return;
- }
- is_fp16 = true;
- break;
- case MO_64:
- break;
- default:
- unallocated_encoding(s);
- return;
- }
- break;
-
- default: /* integer */
- switch (size) {
- case MO_8:
- case MO_64:
- unallocated_encoding(s);
- return;
- }
- break;
- }
- if (is_fp16 && !dc_isar_feature(aa64_fp16, s)) {
- unallocated_encoding(s);
- return;
- }
-
/* Given MemOp size, adjust register and indexing. */
switch (size) {
+ case MO_8:
+ case MO_64:
+ unallocated_encoding(s);
+ return;
case MO_16:
index = h << 2 | l << 1 | m;
break;
@@ -12104,14 +12037,6 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
index = h << 1 | l;
rm |= m << 4;
break;
- case MO_64:
- if (l || !is_q) {
- unallocated_encoding(s);
- return;
- }
- index = h;
- rm |= m << 4;
- break;
default:
g_assert_not_reached();
}
@@ -12120,32 +12045,6 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
return;
}
- if (is_fp) {
- fpst = fpstatus_ptr(is_fp16 ? FPST_FPCR_F16 : FPST_FPCR);
- } else {
- fpst = NULL;
- }
-
- switch (16 * u + opcode) {
- case 0x11: /* FCMLA #0 */
- case 0x13: /* FCMLA #90 */
- case 0x15: /* FCMLA #180 */
- case 0x17: /* FCMLA #270 */
- {
- int rot = extract32(insn, 13, 2);
- int data = (index << 2) | rot;
- tcg_gen_gvec_4_ptr(vec_full_reg_offset(s, rd),
- vec_full_reg_offset(s, rn),
- vec_full_reg_offset(s, rm),
- vec_full_reg_offset(s, rd), fpst,
- is_q ? 16 : 8, vec_full_reg_size(s), data,
- size == MO_64
- ? gen_helper_gvec_fcmlas_idx
- : gen_helper_gvec_fcmlah_idx);
- }
- return;
- }
-
if (size == 3) {
g_assert_not_reached();
} else if (!is_long) {
@@ -12407,7 +12306,6 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
*/
static const AArch64DecodeTable data_proc_simd[] = {
/* pattern , mask , fn */
- { 0x0e008400, 0x9f208400, disas_simd_three_reg_same_extra },
{ 0x0e200000, 0x9f200c00, disas_simd_three_reg_diff },
{ 0x0e200800, 0x9f3e0c00, disas_simd_two_reg_misc },
{ 0x0e300800, 0x9f3e0c00, disas_simd_across_lanes },
--
2.34.1
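
Note on the immediate data used by the indexed form: trans_FCMLA_vi passes
(a->idx << 2) | a->rot as the data argument, which matches the layout the
removed disas_simd_indexed code built by hand ((index << 2) | rot). The
following is only a minimal sketch of that packing, not QEMU code: it works
on the bare data field and ignores the SIMD_DATA_SHIFT offset inside the
descriptor, and extract_bits, fcmla_pack, fcmla_rot and fcmla_idx are
hypothetical names introduced here for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a bitfield extract: return `length` bits of
 * `value`, starting at bit `start`.
 */
static inline uint32_t extract_bits(uint32_t value, int start, int length)
{
    assert(start >= 0 && length > 0 && length <= 32 - start);
    return (value >> start) & (~0U >> (32 - length));
}

/* Pack the element index and rotation the way the translator does. */
static inline uint32_t fcmla_pack(uint32_t idx, uint32_t rot)
{
    assert(idx < 4 && rot < 4);   /* both are 2-bit fields */
    return (idx << 2) | rot;
}

/* Rotation lives in bits [1:0] of the data field. */
static inline uint32_t fcmla_rot(uint32_t data)
{
    return extract_bits(data, 0, 2);
}

/* Element index lives in bits [3:2] of the data field. */
static inline uint32_t fcmla_idx(uint32_t data)
{
    return extract_bits(data, 2, 2);
}
```

For example, idx = 1 with rot = 3 (the #270 rotation) packs to 7, and
unpacking 7 yields the same pair again.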
* [PATCH v2 13/13] target/arm: Delete dead code from disas_simd_indexed
2024-06-25 18:35 [PATCH v2 00/13] target/arm: AdvSIMD conversion, part 2 Richard Henderson
` (11 preceding siblings ...)
2024-06-25 18:35 ` [PATCH v2 12/13] target/arm: Convert FCMLA " Richard Henderson
@ 2024-06-25 18:35 ` Richard Henderson
2024-06-28 14:40 ` [PATCH v2 00/13] target/arm: AdvSIMD conversion, part 2 Peter Maydell
13 siblings, 0 replies; 17+ messages in thread
From: Richard Henderson @ 2024-06-25 18:35 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm
MLA, MLS, SQDMULH, and SQRDMULH were converted with 8db93dcd3def
and f80701cb44d, and this code should have been removed then.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/tcg/translate-a64.c | 93 ----------------------------------
1 file changed, 93 deletions(-)
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 161fa2659c..6c07aeaf3b 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -11976,7 +11976,6 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
int h = extract32(insn, 11, 1);
int rn = extract32(insn, 5, 5);
int rd = extract32(insn, 0, 5);
- bool is_long = false;
int index;
switch (16 * u + opcode) {
@@ -11990,12 +11989,10 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
unallocated_encoding(s);
return;
}
- is_long = true;
break;
case 0x03: /* SQDMLAL, SQDMLAL2 */
case 0x07: /* SQDMLSL, SQDMLSL2 */
case 0x0b: /* SQDMULL, SQDMULL2 */
- is_long = true;
break;
default:
case 0x00: /* FMLAL */
@@ -12047,96 +12044,6 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
if (size == 3) {
g_assert_not_reached();
- } else if (!is_long) {
- /* 32 bit floating point, or 16 or 32 bit integer.
- * For the 16 bit scalar case we use the usual Neon helpers and
- * rely on the fact that 0 op 0 == 0 with no side effects.
- */
- TCGv_i32 tcg_idx = tcg_temp_new_i32();
- int pass, maxpasses;
-
- if (is_scalar) {
- maxpasses = 1;
- } else {
- maxpasses = is_q ? 4 : 2;
- }
-
- read_vec_element_i32(s, tcg_idx, rm, index, size);
-
- if (size == 1 && !is_scalar) {
- /* The simplest way to handle the 16x16 indexed ops is to duplicate
- * the index into both halves of the 32 bit tcg_idx and then use
- * the usual Neon helpers.
- */
- tcg_gen_deposit_i32(tcg_idx, tcg_idx, tcg_idx, 16, 16);
- }
-
- for (pass = 0; pass < maxpasses; pass++) {
- TCGv_i32 tcg_op = tcg_temp_new_i32();
- TCGv_i32 tcg_res = tcg_temp_new_i32();
-
- read_vec_element_i32(s, tcg_op, rn, pass, is_scalar ? size : MO_32);
-
- switch (16 * u + opcode) {
- case 0x10: /* MLA */
- case 0x14: /* MLS */
- {
- static NeonGenTwoOpFn * const fns[2][2] = {
- { gen_helper_neon_add_u16, gen_helper_neon_sub_u16 },
- { tcg_gen_add_i32, tcg_gen_sub_i32 },
- };
- NeonGenTwoOpFn *genfn;
- bool is_sub = opcode == 0x4;
-
- if (size == 1) {
- gen_helper_neon_mul_u16(tcg_res, tcg_op, tcg_idx);
- } else {
- tcg_gen_mul_i32(tcg_res, tcg_op, tcg_idx);
- }
- if (opcode == 0x8) {
- break;
- }
- read_vec_element_i32(s, tcg_op, rd, pass, MO_32);
- genfn = fns[size - 1][is_sub];
- genfn(tcg_res, tcg_op, tcg_res);
- break;
- }
- case 0x0c: /* SQDMULH */
- if (size == 1) {
- gen_helper_neon_qdmulh_s16(tcg_res, tcg_env,
- tcg_op, tcg_idx);
- } else {
- gen_helper_neon_qdmulh_s32(tcg_res, tcg_env,
- tcg_op, tcg_idx);
- }
- break;
- case 0x0d: /* SQRDMULH */
- if (size == 1) {
- gen_helper_neon_qrdmulh_s16(tcg_res, tcg_env,
- tcg_op, tcg_idx);
- } else {
- gen_helper_neon_qrdmulh_s32(tcg_res, tcg_env,
- tcg_op, tcg_idx);
- }
- break;
- default:
- case 0x01: /* FMLA */
- case 0x05: /* FMLS */
- case 0x09: /* FMUL */
- case 0x19: /* FMULX */
- case 0x1d: /* SQRDMLAH */
- case 0x1f: /* SQRDMLSH */
- g_assert_not_reached();
- }
-
- if (is_scalar) {
- write_fp_sreg(s, rd, tcg_res);
- } else {
- write_vec_element_i32(s, tcg_res, rd, pass, MO_32);
- }
- }
-
- clear_vec_high(s, is_q, rd);
} else {
/* long ops: 16x16->32 or 32x32->64 */
TCGv_i64 tcg_res[2];
--
2.34.1
* Re: [PATCH v2 00/13] target/arm: AdvSIMD conversion, part 2
2024-06-25 18:35 [PATCH v2 00/13] target/arm: AdvSIMD conversion, part 2 Richard Henderson
` (12 preceding siblings ...)
2024-06-25 18:35 ` [PATCH v2 13/13] target/arm: Delete dead code from disas_simd_indexed Richard Henderson
@ 2024-06-28 14:40 ` Peter Maydell
13 siblings, 0 replies; 17+ messages in thread
From: Peter Maydell @ 2024-06-28 14:40 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel, qemu-arm
On Tue, 25 Jun 2024 at 19:41, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Convert another handful of instructions, plus fixes
> for two related issues.
>
>
Applied to target-arm.next, thanks.
-- PMM