* [PATCH v3 00/10] target/arm: SME1/SVE2 fixes
@ 2025-07-02 12:22 Richard Henderson
2025-07-02 12:22 ` [PATCH v3 01/10] target/arm: Fix SME vs AdvSIMD exception priority Richard Henderson
` (9 more replies)
0 siblings, 10 replies; 21+ messages in thread
From: Richard Henderson @ 2025-07-02 12:22 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm, peter.maydell
- A couple of fixes for EC_SMETRAP, plus some insns that missed
being updated for non-streaming.
(Removed the switch (bool) that PMM commented on, and as it
turns out our clang CI Werrors on; now perhaps clearer using
if + goto, where the label name helps clarify things.)
- Disable FEAT_F64MM if the command-line sve vector size cannot
support it (PMM)
- Fix a gvec assert in PSEL
- Fix NaN selection per FPDot pseudocode.
r~
Richard Henderson (10):
target/arm: Fix SME vs AdvSIMD exception priority
target/arm: Fix sve_access_check for SME
target/arm: Fix 128-bit element ZIP, UZP, TRN
target/arm: Replace @rda_rn_rm_e0 in sve.decode
target/arm: Fix FMMLA (64-bit element) for 128-bit VL
target/arm: Disable FEAT_F64MM if maximum SVE vector size too small
target/arm: Fix PSEL size operands to tcg_gen_gvec_ands
target/arm: Fix f16_dotadd vs nan selection
target/arm: Fix bfdotadd_ebf vs nan selection
target/arm: Remove CPUARMState.vfp.scratch
target/arm/cpu.h | 3 --
target/arm/cpu64.c | 6 +++
target/arm/tcg/sme_helper.c | 62 ++++++++++++++++++++--------
target/arm/tcg/translate-a64.c | 29 +++++++++----
target/arm/tcg/translate-sve.c | 67 +++++++++++++++++++++---------
target/arm/tcg/vec_helper.c | 75 ++++++++++++++++++++++++----------
target/arm/tcg/sve.decode | 48 +++++++++++-----------
7 files changed, 197 insertions(+), 93 deletions(-)
--
2.43.0
^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH v3 01/10] target/arm: Fix SME vs AdvSIMD exception priority
2025-07-02 12:22 [PATCH v3 00/10] target/arm: SME1/SVE2 fixes Richard Henderson
@ 2025-07-02 12:22 ` Richard Henderson
2025-07-02 12:22 ` [PATCH v3 02/10] target/arm: Fix sve_access_check for SME Richard Henderson
` (8 subsequent siblings)
9 siblings, 0 replies; 21+ messages in thread
From: Richard Henderson @ 2025-07-02 12:22 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm, peter.maydell, qemu-stable
We failed to raise an exception when
sme_excp_el == 0 and fp_excp_el == 1.
Cc: qemu-stable@nongnu.org
Fixes: 3d74825f4d6 ("target/arm: Add SME enablement checks")
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/tcg/translate-a64.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index ac80f572a2..bb49a2ce90 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -1494,7 +1494,8 @@ bool sme_enabled_check(DisasContext *s)
* to be zero when fp_excp_el has priority. This is because we need
* sme_excp_el by itself for cpregs access checks.
*/
- if (!s->fp_excp_el || s->sme_excp_el < s->fp_excp_el) {
+ if (s->sme_excp_el
+ && (!s->fp_excp_el || s->sme_excp_el <= s->fp_excp_el)) {
bool ret = sme_access_check(s);
s->fp_access_checked = (ret ? 1 : -1);
return ret;
--
2.43.0
^ permalink raw reply related [flat|nested] 21+ messages in thread
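To make the reported case concrete, here is a minimal sketch of the two conditions from the hunk above, lifted out into standalone predicates. The function names are hypothetical and only model which branch is taken; they do not reproduce what sme_access_check() itself does. The sketch assumes, as the surrounding comment suggests, that an exception level of 0 means "no trap configured".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: the condition as it stood before the patch. */
static bool takes_sme_path_old(int sme_excp_el, int fp_excp_el)
{
    return !fp_excp_el || sme_excp_el < fp_excp_el;
}

/* Hypothetical helper: the condition after the patch. */
static bool takes_sme_path_new(int sme_excp_el, int fp_excp_el)
{
    return sme_excp_el
           && (!fp_excp_el || sme_excp_el <= fp_excp_el);
}
```

With sme_excp_el == 0 and fp_excp_el == 1 the old predicate is true, so the SME branch ran even though no SME trap was configured, and the FP trap was never considered. The new predicate is false for that input, so control falls through to the FP check.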
* [PATCH v3 02/10] target/arm: Fix sve_access_check for SME
2025-07-02 12:22 [PATCH v3 00/10] target/arm: SME1/SVE2 fixes Richard Henderson
2025-07-02 12:22 ` [PATCH v3 01/10] target/arm: Fix SME vs AdvSIMD exception priority Richard Henderson
@ 2025-07-02 12:22 ` Richard Henderson
2025-07-02 12:22 ` [PATCH v3 03/10] target/arm: Fix 128-bit element ZIP, UZP, TRN Richard Henderson
` (7 subsequent siblings)
9 siblings, 0 replies; 21+ messages in thread
From: Richard Henderson @ 2025-07-02 12:22 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm, peter.maydell, qemu-stable
Do not assume SME implies SVE. Ensure that the non-streaming
check is present along the SME path, since it is not implied
by sme_*_enabled_check.
Cc: qemu-stable@nongnu.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/tcg/translate-a64.c | 26 +++++++++++++++++++-------
1 file changed, 19 insertions(+), 7 deletions(-)
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index bb49a2ce90..7f8671e2e8 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -1387,11 +1387,8 @@ static bool fp_access_check_only(DisasContext *s)
return true;
}
-static bool fp_access_check(DisasContext *s)
+static bool nonstreaming_check(DisasContext *s)
{
- if (!fp_access_check_only(s)) {
- return false;
- }
if (s->sme_trap_nonstreaming && s->is_nonstreaming) {
gen_exception_insn(s, 0, EXCP_UDEF,
syn_smetrap(SME_ET_Streaming, false));
@@ -1400,6 +1397,11 @@ static bool fp_access_check(DisasContext *s)
return true;
}
+static bool fp_access_check(DisasContext *s)
+{
+ return fp_access_check_only(s) && nonstreaming_check(s);
+}
+
/*
* Return <0 for non-supported element sizes, with MO_16 controlled by
* FEAT_FP16; return 0 for fp disabled; otherwise return >0 for success.
@@ -1450,14 +1452,24 @@ static int fp_access_check_vector_hsd(DisasContext *s, bool is_q, MemOp esz)
*/
bool sve_access_check(DisasContext *s)
{
- if (s->pstate_sm || !dc_isar_feature(aa64_sve, s)) {
+ if (dc_isar_feature(aa64_sme, s)) {
bool ret;
- assert(dc_isar_feature(aa64_sme, s));
- ret = sme_sm_enabled_check(s);
+ if (s->pstate_sm) {
+ ret = sme_enabled_check(s);
+ } else if (dc_isar_feature(aa64_sve, s)) {
+ goto continue_sve;
+ } else {
+ ret = sme_sm_enabled_check(s);
+ }
+ if (ret) {
+ ret = nonstreaming_check(s);
+ }
s->sve_access_checked = (ret ? 1 : -1);
return ret;
}
+
+ continue_sve:
if (s->sve_excp_el) {
/* Assert that we only raise one exception per instruction. */
assert(!s->sve_access_checked);
--
2.43.0
^ permalink raw reply related [flat|nested] 21+ messages in thread
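The branch structure that sve_access_check() has after this patch can be summarised as a small decision function. This is only a sketch of the control flow visible in the diff: the enum and function names are hypothetical, and the three booleans stand in for dc_isar_feature(aa64_sme, s), dc_isar_feature(aa64_sve, s) and s->pstate_sm.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the three ways the check can proceed. */
enum check_path {
    PATH_SME_ENABLED,     /* streaming mode: sme_enabled_check() */
    PATH_SME_SM_ENABLED,  /* SME without SVE: sme_sm_enabled_check() */
    PATH_SVE              /* ordinary SVE checks (label continue_sve) */
};

static enum check_path pick_path(bool has_sme, bool has_sve, bool pstate_sm)
{
    if (has_sme) {
        if (pstate_sm) {
            return PATH_SME_ENABLED;
        }
        if (!has_sve) {
            return PATH_SME_SM_ENABLED;
        }
    }
    return PATH_SVE;
}
```

On both SME paths the patch additionally runs nonstreaming_check() when the first check succeeds, which is the part that sme_*_enabled_check does not imply.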
* [PATCH v3 03/10] target/arm: Fix 128-bit element ZIP, UZP, TRN
2025-07-02 12:22 [PATCH v3 00/10] target/arm: SME1/SVE2 fixes Richard Henderson
2025-07-02 12:22 ` [PATCH v3 01/10] target/arm: Fix SME vs AdvSIMD exception priority Richard Henderson
2025-07-02 12:22 ` [PATCH v3 02/10] target/arm: Fix sve_access_check for SME Richard Henderson
@ 2025-07-02 12:22 ` Richard Henderson
2025-07-03 9:08 ` Peter Maydell
2025-07-02 12:22 ` [PATCH v3 04/10] target/arm: Replace @rda_rn_rm_e0 in sve.decode Richard Henderson
` (6 subsequent siblings)
9 siblings, 1 reply; 21+ messages in thread
From: Richard Henderson @ 2025-07-02 12:22 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm, peter.maydell, qemu-stable
We failed to make these instructions UDEF when the vector size is too small.
We missed marking the instructions non-streaming with SME.
Cc: qemu-stable@nongnu.org
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/tcg/translate-sve.c | 43 ++++++++++++++++++++++++----------
1 file changed, 30 insertions(+), 13 deletions(-)
diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
index f3cf028cb9..588a5b006b 100644
--- a/target/arm/tcg/translate-sve.c
+++ b/target/arm/tcg/translate-sve.c
@@ -2352,6 +2352,23 @@ TRANS_FEAT(PUNPKHI, aa64_sve, do_perm_pred2, a, 1, gen_helper_sve_punpk_p)
*** SVE Permute - Interleaving Group
*/
+static bool do_interleave_q(DisasContext *s, gen_helper_gvec_3 *fn,
+ arg_rrr_esz *a, int data)
+{
+ if (sve_access_check(s)) {
+ unsigned vsz = vec_full_reg_size(s);
+ if (vsz < 32) {
+ unallocated_encoding(s);
+ } else {
+ tcg_gen_gvec_3_ool(vec_full_reg_offset(s, a->rd),
+ vec_full_reg_offset(s, a->rn),
+ vec_full_reg_offset(s, a->rm),
+ vsz, vsz, data, fn);
+ }
+ }
+ return true;
+}
+
static gen_helper_gvec_3 * const zip_fns[4] = {
gen_helper_sve_zip_b, gen_helper_sve_zip_h,
gen_helper_sve_zip_s, gen_helper_sve_zip_d,
@@ -2361,11 +2378,11 @@ TRANS_FEAT(ZIP1_z, aa64_sve, gen_gvec_ool_arg_zzz,
TRANS_FEAT(ZIP2_z, aa64_sve, gen_gvec_ool_arg_zzz,
zip_fns[a->esz], a, vec_full_reg_size(s) / 2)
-TRANS_FEAT(ZIP1_q, aa64_sve_f64mm, gen_gvec_ool_arg_zzz,
- gen_helper_sve2_zip_q, a, 0)
-TRANS_FEAT(ZIP2_q, aa64_sve_f64mm, gen_gvec_ool_arg_zzz,
- gen_helper_sve2_zip_q, a,
- QEMU_ALIGN_DOWN(vec_full_reg_size(s), 32) / 2)
+TRANS_FEAT_NONSTREAMING(ZIP1_q, aa64_sve_f64mm, do_interleave_q,
+ gen_helper_sve2_zip_q, a, 0)
+TRANS_FEAT_NONSTREAMING(ZIP2_q, aa64_sve_f64mm, do_interleave_q,
+ gen_helper_sve2_zip_q, a,
+ QEMU_ALIGN_DOWN(vec_full_reg_size(s), 32) / 2)
static gen_helper_gvec_3 * const uzp_fns[4] = {
gen_helper_sve_uzp_b, gen_helper_sve_uzp_h,
@@ -2377,10 +2394,10 @@ TRANS_FEAT(UZP1_z, aa64_sve, gen_gvec_ool_arg_zzz,
TRANS_FEAT(UZP2_z, aa64_sve, gen_gvec_ool_arg_zzz,
uzp_fns[a->esz], a, 1 << a->esz)
-TRANS_FEAT(UZP1_q, aa64_sve_f64mm, gen_gvec_ool_arg_zzz,
- gen_helper_sve2_uzp_q, a, 0)
-TRANS_FEAT(UZP2_q, aa64_sve_f64mm, gen_gvec_ool_arg_zzz,
- gen_helper_sve2_uzp_q, a, 16)
+TRANS_FEAT_NONSTREAMING(UZP1_q, aa64_sve_f64mm, do_interleave_q,
+ gen_helper_sve2_uzp_q, a, 0)
+TRANS_FEAT_NONSTREAMING(UZP2_q, aa64_sve_f64mm, do_interleave_q,
+ gen_helper_sve2_uzp_q, a, 16)
static gen_helper_gvec_3 * const trn_fns[4] = {
gen_helper_sve_trn_b, gen_helper_sve_trn_h,
@@ -2392,10 +2409,10 @@ TRANS_FEAT(TRN1_z, aa64_sve, gen_gvec_ool_arg_zzz,
TRANS_FEAT(TRN2_z, aa64_sve, gen_gvec_ool_arg_zzz,
trn_fns[a->esz], a, 1 << a->esz)
-TRANS_FEAT(TRN1_q, aa64_sve_f64mm, gen_gvec_ool_arg_zzz,
- gen_helper_sve2_trn_q, a, 0)
-TRANS_FEAT(TRN2_q, aa64_sve_f64mm, gen_gvec_ool_arg_zzz,
- gen_helper_sve2_trn_q, a, 16)
+TRANS_FEAT_NONSTREAMING(TRN1_q, aa64_sve_f64mm, do_interleave_q,
+ gen_helper_sve2_trn_q, a, 0)
+TRANS_FEAT_NONSTREAMING(TRN2_q, aa64_sve_f64mm, do_interleave_q,
+ gen_helper_sve2_trn_q, a, 16)
/*
*** SVE Permute Vector - Predicated Group
--
2.43.0
^ permalink raw reply related [flat|nested] 21+ messages in thread
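The size arithmetic in do_interleave_q and in the ZIP2_q expansion can be checked in isolation. The sketch below assumes vsz is the vector length in bytes and a multiple of 16; both function names are hypothetical, and the second simply evaluates the expression QEMU_ALIGN_DOWN(vsz, 32) / 2 that is passed as the data argument.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: mirrors the "vsz < 32" test in do_interleave_q. */
static bool interleave_q_is_undefined(unsigned vsz)
{
    return vsz < 32;
}

/* Hypothetical helper: value of QEMU_ALIGN_DOWN(vsz, 32) / 2 for ZIP2_q. */
static unsigned zip2_q_data(unsigned vsz)
{
    unsigned aligned = vsz / 32 * 32;   /* round down to a multiple of 32 */
    return aligned / 2;
}
```

A 128-bit vector (vsz == 16) cannot hold two 128-bit elements, which is why that case is now routed to unallocated_encoding(). For a 384-bit vector (vsz == 48) the data value is 16, the same as for a 256-bit vector.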
* [PATCH v3 04/10] target/arm: Replace @rda_rn_rm_e0 in sve.decode
2025-07-02 12:22 [PATCH v3 00/10] target/arm: SME1/SVE2 fixes Richard Henderson
` (2 preceding siblings ...)
2025-07-02 12:22 ` [PATCH v3 03/10] target/arm: Fix 128-bit element ZIP, UZP, TRN Richard Henderson
@ 2025-07-02 12:22 ` Richard Henderson
2025-07-02 14:11 ` Richard Henderson
2025-07-03 9:14 ` Peter Maydell
2025-07-02 12:22 ` [PATCH v3 05/10] target/arm: Fix FMMLA (64-bit element) for 128-bit VL Richard Henderson
` (5 subsequent siblings)
9 siblings, 2 replies; 21+ messages in thread
From: Richard Henderson @ 2025-07-02 12:22 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm, peter.maydell
Replace @rda_rn_rm_e0 with @rda_rn_rm_ex, and require
users to supply an explicit esz.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/tcg/sve.decode | 48 +++++++++++++++++++--------------------
1 file changed, 24 insertions(+), 24 deletions(-)
diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode
index 04b6fcc0cf..3a99eb7299 100644
--- a/target/arm/tcg/sve.decode
+++ b/target/arm/tcg/sve.decode
@@ -131,11 +131,11 @@
@rda_rn_rm ........ esz:2 . rm:5 ... ... rn:5 rd:5 \
&rrrr_esz ra=%reg_movprfx
-# Four operand with unused vector element size
-@rda_rn_rm_e0 ........ ... rm:5 ... ... rn:5 rd:5 \
- &rrrr_esz esz=0 ra=%reg_movprfx
-@rdn_ra_rm_e0 ........ ... rm:5 ... ... ra:5 rd:5 \
- &rrrr_esz esz=0 rn=%reg_movprfx
+# Four operand with explicit vector element size
+@rda_rn_rm_ex ........ ... rm:5 ... ... rn:5 rd:5 \
+ &rrrr_esz ra=%reg_movprfx
+@rdn_ra_rm_ex ........ ... rm:5 ... ... ra:5 rd:5 \
+ &rrrr_esz rn=%reg_movprfx
# Three operand with "memory" size, aka immediate left shift
@rd_rn_msz_rm ........ ... rm:5 .... imm:2 rn:5 rd:5 &rrri
@@ -428,12 +428,12 @@ XAR 00000100 .. 1 ..... 001 101 rm:5 rd:5 &rrri_esz \
rn=%reg_movprfx esz=%tszimm16_esz imm=%tszimm16_shr
# SVE2 bitwise ternary operations
-EOR3 00000100 00 1 ..... 001 110 ..... ..... @rdn_ra_rm_e0
-BSL 00000100 00 1 ..... 001 111 ..... ..... @rdn_ra_rm_e0
-BCAX 00000100 01 1 ..... 001 110 ..... ..... @rdn_ra_rm_e0
-BSL1N 00000100 01 1 ..... 001 111 ..... ..... @rdn_ra_rm_e0
-BSL2N 00000100 10 1 ..... 001 111 ..... ..... @rdn_ra_rm_e0
-NBSL 00000100 11 1 ..... 001 111 ..... ..... @rdn_ra_rm_e0
+EOR3 00000100 00 1 ..... 001 110 ..... ..... @rdn_ra_rm_ex esz=0
+BSL 00000100 00 1 ..... 001 111 ..... ..... @rdn_ra_rm_ex esz=0
+BCAX 00000100 01 1 ..... 001 110 ..... ..... @rdn_ra_rm_ex esz=0
+BSL1N 00000100 01 1 ..... 001 111 ..... ..... @rdn_ra_rm_ex esz=0
+BSL2N 00000100 10 1 ..... 001 111 ..... ..... @rdn_ra_rm_ex esz=0
+NBSL 00000100 11 1 ..... 001 111 ..... ..... @rdn_ra_rm_ex esz=0
### SVE Index Generation Group
@@ -1450,9 +1450,9 @@ EORTB 01000101 .. 0 ..... 10010 1 ..... ..... @rd_rn_rm
## SVE integer matrix multiply accumulate
-SMMLA 01000101 00 0 ..... 10011 0 ..... ..... @rda_rn_rm_e0
-USMMLA 01000101 10 0 ..... 10011 0 ..... ..... @rda_rn_rm_e0
-UMMLA 01000101 11 0 ..... 10011 0 ..... ..... @rda_rn_rm_e0
+SMMLA 01000101 00 0 ..... 10011 0 ..... ..... @rda_rn_rm_ex esz=2
+USMMLA 01000101 10 0 ..... 10011 0 ..... ..... @rda_rn_rm_ex esz=2
+UMMLA 01000101 11 0 ..... 10011 0 ..... ..... @rda_rn_rm_ex esz=2
## SVE2 bitwise permute
@@ -1602,9 +1602,9 @@ SQRDCMLAH_zzzz 01000100 esz:2 0 rm:5 0011 rot:2 rn:5 rd:5 ra=%reg_movprfx
USDOT_zzzz 01000100 .. 0 ..... 011 110 ..... ..... @rda_rn_rm
### SVE2 floating point matrix multiply accumulate
-BFMMLA 01100100 01 1 ..... 111 001 ..... ..... @rda_rn_rm_e0
-FMMLA_s 01100100 10 1 ..... 111 001 ..... ..... @rda_rn_rm_e0
-FMMLA_d 01100100 11 1 ..... 111 001 ..... ..... @rda_rn_rm_e0
+BFMMLA 01100100 01 1 ..... 111 001 ..... ..... @rda_rn_rm_ex esz=1
+FMMLA_s 01100100 10 1 ..... 111 001 ..... ..... @rda_rn_rm_ex esz=2
+FMMLA_d 01100100 11 1 ..... 111 001 ..... ..... @rda_rn_rm_ex esz=3
### SVE2 Memory Gather Load Group
@@ -1654,16 +1654,16 @@ FCVTLT_sd 01100100 11 0010 11 101 ... ..... ..... @rd_pg_rn_e0
FLOGB 01100101 00 011 esz:2 0101 pg:3 rn:5 rd:5 &rpr_esz
### SVE2 floating-point multiply-add long (vectors)
-FMLALB_zzzw 01100100 10 1 ..... 10 0 00 0 ..... ..... @rda_rn_rm_e0
-FMLALT_zzzw 01100100 10 1 ..... 10 0 00 1 ..... ..... @rda_rn_rm_e0
-FMLSLB_zzzw 01100100 10 1 ..... 10 1 00 0 ..... ..... @rda_rn_rm_e0
-FMLSLT_zzzw 01100100 10 1 ..... 10 1 00 1 ..... ..... @rda_rn_rm_e0
+FMLALB_zzzw 01100100 10 1 ..... 10 0 00 0 ..... ..... @rda_rn_rm_ex esz=2
+FMLALT_zzzw 01100100 10 1 ..... 10 0 00 1 ..... ..... @rda_rn_rm_ex esz=2
+FMLSLB_zzzw 01100100 10 1 ..... 10 1 00 0 ..... ..... @rda_rn_rm_ex esz=2
+FMLSLT_zzzw 01100100 10 1 ..... 10 1 00 1 ..... ..... @rda_rn_rm_ex esz=2
-BFMLALB_zzzw 01100100 11 1 ..... 10 0 00 0 ..... ..... @rda_rn_rm_e0
-BFMLALT_zzzw 01100100 11 1 ..... 10 0 00 1 ..... ..... @rda_rn_rm_e0
+BFMLALB_zzzw 01100100 11 1 ..... 10 0 00 0 ..... ..... @rda_rn_rm_ex esz=2
+BFMLALT_zzzw 01100100 11 1 ..... 10 0 00 1 ..... ..... @rda_rn_rm_ex esz=2
### SVE2 floating-point bfloat16 dot-product
-BFDOT_zzzz 01100100 01 1 ..... 10 0 00 0 ..... ..... @rda_rn_rm_e0
+BFDOT_zzzz 01100100 01 1 ..... 10 0 00 0 ..... ..... @rda_rn_rm_ex esz=2
### SVE2 floating-point multiply-add long (indexed)
FMLALB_zzxw 01100100 10 1 ..... 0100.0 ..... ..... @rrxr_3a esz=2
--
2.43.0
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH v3 05/10] target/arm: Fix FMMLA (64-bit element) for 128-bit VL
2025-07-02 12:22 [PATCH v3 00/10] target/arm: SME1/SVE2 fixes Richard Henderson
` (3 preceding siblings ...)
2025-07-02 12:22 ` [PATCH v3 04/10] target/arm: Replace @rda_rn_rm_e0 in sve.decode Richard Henderson
@ 2025-07-02 12:22 ` Richard Henderson
2025-07-03 9:08 ` Peter Maydell
2025-07-02 12:22 ` [PATCH v3 06/10] target/arm: Disable FEAT_F64MM if maximum SVE vector size too small Richard Henderson
` (4 subsequent siblings)
9 siblings, 1 reply; 21+ messages in thread
From: Richard Henderson @ 2025-07-02 12:22 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm, peter.maydell
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/tcg/translate-sve.c | 23 ++++++++++++++++-------
1 file changed, 16 insertions(+), 7 deletions(-)
diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
index 588a5b006b..a0de5b488d 100644
--- a/target/arm/tcg/translate-sve.c
+++ b/target/arm/tcg/translate-sve.c
@@ -7025,17 +7025,26 @@ DO_ZPZZ_FP(FMINNMP, aa64_sve2, sve2_fminnmp_zpzz)
DO_ZPZZ_FP(FMAXP, aa64_sve2, sve2_fmaxp_zpzz)
DO_ZPZZ_FP(FMINP, aa64_sve2, sve2_fminp_zpzz)
+static bool do_fmmla(DisasContext *s, arg_rrrr_esz *a,
+ gen_helper_gvec_4_ptr *fn)
+{
+ if (sve_access_check(s)) {
+ if (vec_full_reg_size(s) < 4 * memop_size(a->esz)) {
+ unallocated_encoding(s);
+ } else {
+ gen_gvec_fpst_zzzz(s, fn, a->rd, a->rn, a->rm, a->ra, 0, FPST_A64);
+ }
+ }
+ return true;
+}
+
+TRANS_FEAT_NONSTREAMING(FMMLA_s, aa64_sve_f32mm, do_fmmla, a, gen_helper_fmmla_s)
+TRANS_FEAT_NONSTREAMING(FMMLA_d, aa64_sve_f64mm, do_fmmla, a, gen_helper_fmmla_d)
+
/*
* SVE Integer Multiply-Add (unpredicated)
*/
-TRANS_FEAT_NONSTREAMING(FMMLA_s, aa64_sve_f32mm, gen_gvec_fpst_zzzz,
- gen_helper_fmmla_s, a->rd, a->rn, a->rm, a->ra,
- 0, FPST_A64)
-TRANS_FEAT_NONSTREAMING(FMMLA_d, aa64_sve_f64mm, gen_gvec_fpst_zzzz,
- gen_helper_fmmla_d, a->rd, a->rn, a->rm, a->ra,
- 0, FPST_A64)
-
static gen_helper_gvec_4 * const sqdmlal_zzzw_fns[] = {
NULL, gen_helper_sve2_sqdmlal_zzzw_h,
gen_helper_sve2_sqdmlal_zzzw_s, gen_helper_sve2_sqdmlal_zzzw_d,
--
2.43.0
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH v3 06/10] target/arm: Disable FEAT_F64MM if maximum SVE vector size too small
2025-07-02 12:22 [PATCH v3 00/10] target/arm: SME1/SVE2 fixes Richard Henderson
` (4 preceding siblings ...)
2025-07-02 12:22 ` [PATCH v3 05/10] target/arm: Fix FMMLA (64-bit element) for 128-bit VL Richard Henderson
@ 2025-07-02 12:22 ` Richard Henderson
2025-07-02 18:15 ` Richard Henderson
2025-07-03 9:10 ` Peter Maydell
2025-07-02 12:22 ` [PATCH v3 07/10] target/arm: Fix PSEL size operands to tcg_gen_gvec_ands Richard Henderson
` (3 subsequent siblings)
9 siblings, 2 replies; 21+ messages in thread
From: Richard Henderson @ 2025-07-02 12:22 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm, peter.maydell
All F64MM instructions operate on a 256-bit vector.
If only 128-bit vectors are supported by the cpu,
then the cpu cannot enable F64MM.
Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/cpu64.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index 200da1c489..c5c289eadf 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -237,6 +237,12 @@ void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp)
/* From now on sve_max_vq is the actual maximum supported length. */
cpu->sve_max_vq = max_vq;
cpu->sve_vq.map = vq_map;
+
+ /* FEAT_F64MM requires the existence of a 256-bit vector size. */
+ if (max_vq < 2) {
+ cpu->isar.id_aa64zfr0 = FIELD_DP64(cpu->isar.id_aa64zfr0,
+ ID_AA64ZFR0, F64MM, 0);
+ }
}
/*
--
2.43.0
^ permalink raw reply related [flat|nested] 21+ messages in thread
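As a worked example of the size reasoning in patches 05 and 06: vq counts 128-bit quadwords, so max_vq < 2 means the largest vector is 128 bits. The sketch below uses hypothetical helper names and assumes that memop_size(esz) is 1 << esz bytes, so that the test in do_fmmla compares the vector length in bytes against four elements.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: vector length in bits for a given vq. */
static unsigned vq_to_bits(unsigned vq)
{
    return vq * 128;
}

/* Hypothetical helper: mirrors the "max_vq < 2" test in patch 06. */
static bool f64mm_possible(unsigned max_vq)
{
    return max_vq >= 2;
}

/*
 * Hypothetical helper: smallest vector, in bytes, accepted by the
 * do_fmmla check in patch 05, assuming memop_size(esz) == 1 << esz.
 */
static unsigned fmmla_min_bytes(unsigned esz)
{
    return 4u * (1u << esz);
}
```

Under that assumption FMMLA_s (esz=2) needs 16 bytes and so works at any vector length, while FMMLA_d (esz=3) needs 32 bytes, matching the 256-bit requirement stated in the commit message.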
* [PATCH v3 07/10] target/arm: Fix PSEL size operands to tcg_gen_gvec_ands
2025-07-02 12:22 [PATCH v3 00/10] target/arm: SME1/SVE2 fixes Richard Henderson
` (5 preceding siblings ...)
2025-07-02 12:22 ` [PATCH v3 06/10] target/arm: Disable FEAT_F64MM if maximum SVE vector size too small Richard Henderson
@ 2025-07-02 12:22 ` Richard Henderson
2025-07-03 9:09 ` Peter Maydell
2025-07-02 12:22 ` [PATCH v3 08/10] target/arm: Fix f16_dotadd vs nan selection Richard Henderson
` (2 subsequent siblings)
9 siblings, 1 reply; 21+ messages in thread
From: Richard Henderson @ 2025-07-02 12:22 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm, peter.maydell, qemu-stable
Gvec only operates on size 8 and multiples of 16.
Predicates may be any multiple of 2.
Round up the size using the appropriate function.
Cc: qemu-stable@nongnu.org
Fixes: 598ab0b24c0 ("target/arm: Implement PSEL")
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/tcg/translate-sve.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
index a0de5b488d..8403034a0e 100644
--- a/target/arm/tcg/translate-sve.c
+++ b/target/arm/tcg/translate-sve.c
@@ -7291,6 +7291,7 @@ static bool trans_PSEL(DisasContext *s, arg_psel *a)
tcg_gen_neg_i64(tmp, tmp);
/* Apply to either copy the source, or write zeros. */
+ pl = size_for_gvec(pl);
tcg_gen_gvec_ands(MO_64, pred_full_reg_offset(s, a->pd),
pred_full_reg_offset(s, a->pn), tmp, pl, pl);
return true;
--
2.43.0
^ permalink raw reply related [flat|nested] 21+ messages in thread
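The rounding rule described in the commit message can be sketched as follows. This is an illustration of "size 8 or a multiple of 16" written from the description alone; the function name is hypothetical, and the body of the real size_for_gvec() is not shown in this thread.

```c
#include <assert.h>

/*
 * Hypothetical helper: round a predicate size in bytes up to a size
 * that gvec accepts, i.e. 8 or the next multiple of 16.
 */
static unsigned round_size_for_gvec(unsigned size)
{
    if (size <= 8) {
        return 8;
    }
    return (size + 15) / 16 * 16;
}
```

For example, a 2-byte predicate is rounded up to 8, and a 10-byte predicate is rounded up to 16. Sizes such as 10 are multiples of 2 but are neither 8 nor a multiple of 16, which is the case the commit message describes.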
* [PATCH v3 08/10] target/arm: Fix f16_dotadd vs nan selection
2025-07-02 12:22 [PATCH v3 00/10] target/arm: SME1/SVE2 fixes Richard Henderson
` (6 preceding siblings ...)
2025-07-02 12:22 ` [PATCH v3 07/10] target/arm: Fix PSEL size operands to tcg_gen_gvec_ands Richard Henderson
@ 2025-07-02 12:22 ` Richard Henderson
2025-07-03 9:12 ` Peter Maydell
2025-07-02 12:22 ` [PATCH v3 09/10] target/arm: Fix bfdotadd_ebf " Richard Henderson
2025-07-02 12:22 ` [PATCH v3 10/10] target/arm: Remove CPUARMState.vfp.scratch Richard Henderson
9 siblings, 1 reply; 21+ messages in thread
From: Richard Henderson @ 2025-07-02 12:22 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm, peter.maydell, qemu-stable
Implement FPProcessNaNs4 within f16_dotadd, rather than
simply letting NaNs propagate through the function.
Cc: qemu-stable@nongnu.org
Fixes: 3916841ac75 ("target/arm: Implement FMOPA, FMOPS (widening)")
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/tcg/sme_helper.c | 62 +++++++++++++++++++++++++++----------
1 file changed, 46 insertions(+), 16 deletions(-)
diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c
index de0c6e54d4..8f33387e4b 100644
--- a/target/arm/tcg/sme_helper.c
+++ b/target/arm/tcg/sme_helper.c
@@ -1005,25 +1005,55 @@ static float32 f16_dotadd(float32 sum, uint32_t e1, uint32_t e2,
* - we have pre-set-up copy of s_std which is set to round-to-odd,
* for the multiply (see below)
*/
- float64 e1r = float16_to_float64(e1 & 0xffff, true, s_f16);
- float64 e1c = float16_to_float64(e1 >> 16, true, s_f16);
- float64 e2r = float16_to_float64(e2 & 0xffff, true, s_f16);
- float64 e2c = float16_to_float64(e2 >> 16, true, s_f16);
- float64 t64;
+ float16 h1r = e1 & 0xffff;
+ float16 h1c = e1 >> 16;
+ float16 h2r = e2 & 0xffff;
+ float16 h2c = e2 >> 16;
float32 t32;
- /*
- * The ARM pseudocode function FPDot performs both multiplies
- * and the add with a single rounding operation. Emulate this
- * by performing the first multiply in round-to-odd, then doing
- * the second multiply as fused multiply-add, and rounding to
- * float32 all in one step.
- */
- t64 = float64_mul(e1r, e2r, s_odd);
- t64 = float64r32_muladd(e1c, e2c, t64, 0, s_std);
+ /* C.f. FPProcessNaNs4 */
+ if (float16_is_any_nan(h1r) || float16_is_any_nan(h1c) ||
+ float16_is_any_nan(h2r) || float16_is_any_nan(h2c)) {
+ float16 t16;
- /* This conversion is exact, because we've already rounded. */
- t32 = float64_to_float32(t64, s_std);
+ if (float16_is_signaling_nan(h1r, s_f16)) {
+ t16 = h1r;
+ } else if (float16_is_signaling_nan(h1c, s_f16)) {
+ t16 = h1c;
+ } else if (float16_is_signaling_nan(h2r, s_f16)) {
+ t16 = h2r;
+ } else if (float16_is_signaling_nan(h2c, s_f16)) {
+ t16 = h2c;
+ } else if (float16_is_any_nan(h1r)) {
+ t16 = h1r;
+ } else if (float16_is_any_nan(h1c)) {
+ t16 = h1c;
+ } else if (float16_is_any_nan(h2r)) {
+ t16 = h2r;
+ } else {
+ t16 = h2c;
+ }
+ t32 = float16_to_float32(t16, true, s_f16);
+ } else {
+ float64 e1r = float16_to_float64(h1r, true, s_f16);
+ float64 e1c = float16_to_float64(h1c, true, s_f16);
+ float64 e2r = float16_to_float64(h2r, true, s_f16);
+ float64 e2c = float16_to_float64(h2c, true, s_f16);
+ float64 t64;
+
+ /*
+ * The ARM pseudocode function FPDot performs both multiplies
+ * and the add with a single rounding operation. Emulate this
+ * by performing the first multiply in round-to-odd, then doing
+ * the second multiply as fused multiply-add, and rounding to
+ * float32 all in one step.
+ */
+ t64 = float64_mul(e1r, e2r, s_odd);
+ t64 = float64r32_muladd(e1c, e2c, t64, 0, s_std);
+
+ /* This conversion is exact, because we've already rounded. */
+ t32 = float64_to_float32(t64, s_std);
+ }
/* The final accumulation step is not fused. */
return float32_add(sum, t32, s_std);
--
2.43.0
^ permalink raw reply related [flat|nested] 21+ messages in thread
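The selection order implemented by the if/else chain above is: the first signaling NaN in operand order wins, otherwise the first NaN of any kind. A compact sketch of that order on raw binary16 bit patterns follows. The helper names are hypothetical, and the signaling test assumes the usual convention that a clear top fraction bit marks a signaling NaN; it does not consult a float_status the way float16_is_signaling_nan() does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* binary16 NaN: exponent all ones and a non-zero fraction. */
static bool f16_is_nan(uint16_t x)
{
    return (x & 0x7c00) == 0x7c00 && (x & 0x03ff) != 0;
}

/* Assumed convention: top fraction bit clear means signaling. */
static bool f16_is_snan(uint16_t x)
{
    return f16_is_nan(x) && !(x & 0x0200);
}

/* Return the index (0..3) of the NaN to propagate, or -1 if none. */
static int pick_nan4(uint16_t a, uint16_t b, uint16_t c, uint16_t d)
{
    const uint16_t op[4] = { a, b, c, d };

    for (int i = 0; i < 4; i++) {
        if (f16_is_snan(op[i])) {
            return i;
        }
    }
    for (int i = 0; i < 4; i++) {
        if (f16_is_nan(op[i])) {
            return i;
        }
    }
    return -1;
}
```

Calling pick_nan4(h1r, h1c, h2r, h2c) follows the same operand order as the patch. A signaling NaN in a later operand is chosen over a quiet NaN in an earlier one.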
* [PATCH v3 09/10] target/arm: Fix bfdotadd_ebf vs nan selection
2025-07-02 12:22 [PATCH v3 00/10] target/arm: SME1/SVE2 fixes Richard Henderson
` (7 preceding siblings ...)
2025-07-02 12:22 ` [PATCH v3 08/10] target/arm: Fix f16_dotadd vs nan selection Richard Henderson
@ 2025-07-02 12:22 ` Richard Henderson
2025-07-03 9:13 ` Peter Maydell
2025-07-02 12:22 ` [PATCH v3 10/10] target/arm: Remove CPUARMState.vfp.scratch Richard Henderson
9 siblings, 1 reply; 21+ messages in thread
From: Richard Henderson @ 2025-07-02 12:22 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm, peter.maydell, qemu-stable
Implement FPProcessNaNs4 within bfdotadd_ebf, rather than
simply letting NaNs propagate through the function.
Cc: qemu-stable@nongnu.org
Fixes: 0e1850182a1 ("target/arm: Implement FPCR.EBF=1 semantics for bfdotadd()")
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/tcg/vec_helper.c | 75 ++++++++++++++++++++++++++-----------
1 file changed, 53 insertions(+), 22 deletions(-)
diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
index 986eaf8ffa..21c6175d2e 100644
--- a/target/arm/tcg/vec_helper.c
+++ b/target/arm/tcg/vec_helper.c
@@ -2989,31 +2989,62 @@ float32 bfdotadd(float32 sum, uint32_t e1, uint32_t e2, float_status *fpst)
float32 bfdotadd_ebf(float32 sum, uint32_t e1, uint32_t e2,
float_status *fpst, float_status *fpst_odd)
{
- /*
- * Compare f16_dotadd() in sme_helper.c, but here we have
- * bfloat16 inputs. In particular that means that we do not
- * want the FPCR.FZ16 flush semantics, so we use the normal
- * float_status for the input handling here.
- */
- float64 e1r = float32_to_float64(e1 << 16, fpst);
- float64 e1c = float32_to_float64(e1 & 0xffff0000u, fpst);
- float64 e2r = float32_to_float64(e2 << 16, fpst);
- float64 e2c = float32_to_float64(e2 & 0xffff0000u, fpst);
- float64 t64;
+ float32 s1r = e1 << 16;
+ float32 s1c = e1 & 0xffff0000u;
+ float32 s2r = e2 << 16;
+ float32 s2c = e2 & 0xffff0000u;
float32 t32;
- /*
- * The ARM pseudocode function FPDot performs both multiplies
- * and the add with a single rounding operation. Emulate this
- * by performing the first multiply in round-to-odd, then doing
- * the second multiply as fused multiply-add, and rounding to
- * float32 all in one step.
- */
- t64 = float64_mul(e1r, e2r, fpst_odd);
- t64 = float64r32_muladd(e1c, e2c, t64, 0, fpst);
+ /* C.f. FPProcessNaNs4 */
+ if (float32_is_any_nan(s1r) || float32_is_any_nan(s1c) ||
+ float32_is_any_nan(s2r) || float32_is_any_nan(s2c)) {
+ if (float32_is_signaling_nan(s1r, fpst)) {
+ t32 = s1r;
+ } else if (float32_is_signaling_nan(s1c, fpst)) {
+ t32 = s1c;
+ } else if (float32_is_signaling_nan(s2r, fpst)) {
+ t32 = s2r;
+ } else if (float32_is_signaling_nan(s2c, fpst)) {
+ t32 = s2c;
+ } else if (float32_is_any_nan(s1r)) {
+ t32 = s1r;
+ } else if (float32_is_any_nan(s1c)) {
+ t32 = s1c;
+ } else if (float32_is_any_nan(s2r)) {
+ t32 = s2r;
+ } else {
+ t32 = s2c;
+ }
+ /*
+ * FPConvertNaN(FPProcessNaN(t32)) will be done as part
+ * of the final addition below.
+ */
+ } else {
+ /*
+ * Compare f16_dotadd() in sme_helper.c, but here we have
+ * bfloat16 inputs. In particular that means that we do not
+ * want the FPCR.FZ16 flush semantics, so we use the normal
+ * float_status for the input handling here.
+ */
+ float64 e1r = float32_to_float64(s1r, fpst);
+ float64 e1c = float32_to_float64(s1c, fpst);
+ float64 e2r = float32_to_float64(s2r, fpst);
+ float64 e2c = float32_to_float64(s2c, fpst);
+ float64 t64;
- /* This conversion is exact, because we've already rounded. */
- t32 = float64_to_float32(t64, fpst);
+ /*
+ * The ARM pseudocode function FPDot performs both multiplies
+ * and the add with a single rounding operation. Emulate this
+ * by performing the first multiply in round-to-odd, then doing
+ * the second multiply as fused multiply-add, and rounding to
+ * float32 all in one step.
+ */
+ t64 = float64_mul(e1r, e2r, fpst_odd);
+ t64 = float64r32_muladd(e1c, e2c, t64, 0, fpst);
+
+ /* This conversion is exact, because we've already rounded. */
+ t32 = float64_to_float32(t64, fpst);
+ }
/* The final accumulation step is not fused. */
return float32_add(sum, t32, fpst);
--
2.43.0
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH v3 10/10] target/arm: Remove CPUARMState.vfp.scratch
2025-07-02 12:22 [PATCH v3 00/10] target/arm: SME1/SVE2 fixes Richard Henderson
` (8 preceding siblings ...)
2025-07-02 12:22 ` [PATCH v3 09/10] target/arm: Fix bfdotadd_ebf " Richard Henderson
@ 2025-07-02 12:22 ` Richard Henderson
2025-07-02 13:47 ` Alex Bennée
9 siblings, 1 reply; 21+ messages in thread
From: Richard Henderson @ 2025-07-02 12:22 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm, peter.maydell
The last use of this field was removed in b2fc7be972b9.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/cpu.h | 3 ---
1 file changed, 3 deletions(-)
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 302c24e232..15b47a5bfc 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -668,9 +668,6 @@ typedef struct CPUArchState {
uint32_t xregs[16];
- /* Scratch space for aa32 neon expansion. */
- uint32_t scratch[8];
-
/* There are a number of distinct float control structures. */
float_status fp_status[FPST_COUNT];
--
2.43.0
^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH v3 10/10] target/arm: Remove CPUARMState.vfp.scratch
2025-07-02 12:22 ` [PATCH v3 10/10] target/arm: Remove CPUARMState.vfp.scratch Richard Henderson
@ 2025-07-02 13:47 ` Alex Bennée
0 siblings, 0 replies; 21+ messages in thread
From: Alex Bennée @ 2025-07-02 13:47 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel, qemu-arm, peter.maydell
Richard Henderson <richard.henderson@linaro.org> writes:
> The last use of this field was removed in b2fc7be972b9.
>
> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
--
Alex Bennée
Virtualisation Tech Lead @ Linaro
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v3 04/10] target/arm: Replace @rda_rn_rm_e0 in sve.decode
2025-07-02 12:22 ` [PATCH v3 04/10] target/arm: Replace @rda_rn_rm_e0 in sve.decode Richard Henderson
@ 2025-07-02 14:11 ` Richard Henderson
2025-07-03 9:14 ` Peter Maydell
1 sibling, 0 replies; 21+ messages in thread
From: Richard Henderson @ 2025-07-02 14:11 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm, peter.maydell
On 7/2/25 06:22, Richard Henderson wrote:
> Replace @rda_rn_rm_e0 with @rda_rn_rm_ex, and require
> users to supply an explicit esz.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> target/arm/tcg/sve.decode | 48 +++++++++++++++++++--------------------
> 1 file changed, 24 insertions(+), 24 deletions(-)
Bah. Sorting error with too many patches.
This is not a SME1 bug fix, merely a code reorg for SME2.
r~
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v3 06/10] target/arm: Disable FEAT_F64MM if maximum SVE vector size too small
2025-07-02 12:22 ` [PATCH v3 06/10] target/arm: Disable FEAT_F64MM if maximum SVE vector size too small Richard Henderson
@ 2025-07-02 18:15 ` Richard Henderson
2025-07-03 9:10 ` Peter Maydell
1 sibling, 0 replies; 21+ messages in thread
From: Richard Henderson @ 2025-07-02 18:15 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-arm, peter.maydell
On 7/2/25 06:22, Richard Henderson wrote:
> All F64MM instructions operate on a 256-bit vector.
> If only 128-bit vectors are supported by the cpu,
> then the cpu cannot enable F64MM.
>
> Suggested-by: Peter Maydell <peter.maydell@linaro.org>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> target/arm/cpu64.c | 6 ++++++
> 1 file changed, 6 insertions(+)
Ho hum. The idregs reorg landed overnight. I will update my branch, but will not re-post right away.
r~
>
> diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
> index 200da1c489..c5c289eadf 100644
> --- a/target/arm/cpu64.c
> +++ b/target/arm/cpu64.c
> @@ -237,6 +237,12 @@ void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp)
> /* From now on sve_max_vq is the actual maximum supported length. */
> cpu->sve_max_vq = max_vq;
> cpu->sve_vq.map = vq_map;
> +
> + /* FEAT_F64MM requires the existence of a 256-bit vector size. */
> + if (max_vq < 2) {
> + cpu->isar.id_aa64zfr0 = FIELD_DP64(cpu->isar.id_aa64zfr0,
> + ID_AA64ZFR0, F64MM, 0);
> + }
> }
>
> /*
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v3 03/10] target/arm: Fix 128-bit element ZIP, UZP, TRN
2025-07-02 12:22 ` [PATCH v3 03/10] target/arm: Fix 128-bit element ZIP, UZP, TRN Richard Henderson
@ 2025-07-03 9:08 ` Peter Maydell
0 siblings, 0 replies; 21+ messages in thread
From: Peter Maydell @ 2025-07-03 9:08 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel, qemu-arm, qemu-stable
On Wed, 2 Jul 2025 at 13:22, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> We missed that these instructions UDEF when the vector size is too small.
> We missed marking the instructions non-streaming with SME.
>
> Cc: qemu-stable@nongnu.org
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
thanks
-- PMM
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v3 05/10] target/arm: Fix FMMLA (64-bit element) for 128-bit VL
2025-07-02 12:22 ` [PATCH v3 05/10] target/arm: Fix FMMLA (64-bit element) for 128-bit VL Richard Henderson
@ 2025-07-03 9:08 ` Peter Maydell
0 siblings, 0 replies; 21+ messages in thread
From: Peter Maydell @ 2025-07-03 9:08 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel, qemu-arm
On Wed, 2 Jul 2025 at 13:22, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> target/arm/tcg/translate-sve.c | 23 ++++++++++++++++-------
> 1 file changed, 16 insertions(+), 7 deletions(-)
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
thanks
-- PMM
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v3 07/10] target/arm: Fix PSEL size operands to tcg_gen_gvec_ands
2025-07-02 12:22 ` [PATCH v3 07/10] target/arm: Fix PSEL size operands to tcg_gen_gvec_ands Richard Henderson
@ 2025-07-03 9:09 ` Peter Maydell
0 siblings, 0 replies; 21+ messages in thread
From: Peter Maydell @ 2025-07-03 9:09 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel, qemu-arm, qemu-stable
On Wed, 2 Jul 2025 at 13:22, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Gvec only operates on size 8 and multiples of 16.
> Predicates may be any multiple of 2.
> Round up the size using the appropriate function.
>
> Cc: qemu-stable@nongnu.org
> Fixes: 598ab0b24c0 ("target/arm: Implement PSEL")
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> target/arm/tcg/translate-sve.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
> index a0de5b488d..8403034a0e 100644
> --- a/target/arm/tcg/translate-sve.c
> +++ b/target/arm/tcg/translate-sve.c
> @@ -7291,6 +7291,7 @@ static bool trans_PSEL(DisasContext *s, arg_psel *a)
> tcg_gen_neg_i64(tmp, tmp);
>
> /* Apply to either copy the source, or write zeros. */
> + pl = size_for_gvec(pl);
> tcg_gen_gvec_ands(MO_64, pred_full_reg_offset(s, a->pd),
> pred_full_reg_offset(s, a->pn), tmp, pl, pl);
> return true;
> --
> 2.43.0
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
thanks
-- PMM
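
For readers following the PSEL fix, the rounding rule in the commit message can be sketched as below. This is a standalone illustration, not the QEMU helper itself: round_size_for_gvec is a hypothetical name, and the behaviour (sizes up to 8 bytes become 8, larger sizes round up to the next multiple of 16) is an assumption drawn from the sentence "Gvec only operates on size 8 and multiples of 16".

```c
#include <stdio.h>

/*
 * Hypothetical stand-in for the rounding step described in the patch.
 * Assumption: gvec accepts a size of 8 bytes or any multiple of 16 bytes,
 * while a predicate register size may be any multiple of 2 bytes.
 */
static int round_size_for_gvec(int size)
{
    if (size <= 8) {
        /* Small predicates are widened to the single 8-byte case. */
        return 8;
    }
    /* Everything else rounds up to the next multiple of 16. */
    return (size + 15) & ~15;
}
```

Under these assumptions a 2-byte predicate (128-bit vector length) is operated on as 8 bytes and a 10-byte predicate as 16 bytes, so the size passed to the gvec operation is always one it accepts.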
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v3 06/10] target/arm: Disable FEAT_F64MM if maximum SVE vector size too small
2025-07-02 12:22 ` [PATCH v3 06/10] target/arm: Disable FEAT_F64MM if maximum SVE vector size too small Richard Henderson
2025-07-02 18:15 ` Richard Henderson
@ 2025-07-03 9:10 ` Peter Maydell
1 sibling, 0 replies; 21+ messages in thread
From: Peter Maydell @ 2025-07-03 9:10 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel, qemu-arm
On Wed, 2 Jul 2025 at 13:22, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> All F64MM instructions operate on a 256-bit vector.
> If only 128-bit vectors are supported by the cpu,
> then the cpu cannot enable F64MM.
>
> Suggested-by: Peter Maydell <peter.maydell@linaro.org>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> target/arm/cpu64.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
> index 200da1c489..c5c289eadf 100644
> --- a/target/arm/cpu64.c
> +++ b/target/arm/cpu64.c
> @@ -237,6 +237,12 @@ void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp)
> /* From now on sve_max_vq is the actual maximum supported length. */
> cpu->sve_max_vq = max_vq;
> cpu->sve_vq.map = vq_map;
> +
> + /* FEAT_F64MM requires the existence of a 256-bit vector size. */
> + if (max_vq < 2) {
> + cpu->isar.id_aa64zfr0 = FIELD_DP64(cpu->isar.id_aa64zfr0,
> + ID_AA64ZFR0, F64MM, 0);
> + }
> }
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
(with, as you say, the obvious fixup for the id register changes)
thanks
-- PMM
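
The gating in the hunk above amounts to zeroing one ID register field when the largest supported vector length is a single 128-bit quadword. A minimal standalone sketch follows; field_deposit64, finalize_zfr0 and the DEMO_* field position are hypothetical names and values chosen for illustration, and the real code uses QEMU's FIELD_DP64 with the register layout from its own headers.

```c
#include <stdint.h>

/* Deposit 'val' into the 'len'-bit field of 'reg' starting at bit 'shift'. */
static uint64_t field_deposit64(uint64_t reg, unsigned shift, unsigned len,
                                uint64_t val)
{
    uint64_t mask = (len >= 64 ? ~0ULL : ((1ULL << len) - 1)) << shift;
    return (reg & ~mask) | ((val << shift) & mask);
}

/* Illustrative field position only; check the architecture manual. */
#define DEMO_F64MM_SHIFT 56
#define DEMO_F64MM_LEN   4

/*
 * max_vq counts 128-bit quadwords, so max_vq < 2 means that no
 * 256-bit vector length exists and the feature field is cleared.
 */
static uint64_t finalize_zfr0(uint64_t zfr0, unsigned max_vq)
{
    if (max_vq < 2) {
        zfr0 = field_deposit64(zfr0, DEMO_F64MM_SHIFT, DEMO_F64MM_LEN, 0);
    }
    return zfr0;
}
```

With max_vq of 2 or more the register value is returned unchanged, so the feature stays advertised whenever a 256-bit vector length is available.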
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v3 08/10] target/arm: Fix f16_dotadd vs nan selection
2025-07-02 12:22 ` [PATCH v3 08/10] target/arm: Fix f16_dotadd vs nan selection Richard Henderson
@ 2025-07-03 9:12 ` Peter Maydell
0 siblings, 0 replies; 21+ messages in thread
From: Peter Maydell @ 2025-07-03 9:12 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel, qemu-arm, qemu-stable
On Wed, 2 Jul 2025 at 13:22, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Implement FPProcessNaNs4 within f16_dotadd, rather than
> simply letting NaNs propagate through the function.
>
> Cc: qemu-stable@nongnu.org
> Fixes: 3916841ac75 ("target/arm: Implement FMOPA, FMOPS (widening)")
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> target/arm/tcg/sme_helper.c | 62 +++++++++++++++++++++++++++----------
> 1 file changed, 46 insertions(+), 16 deletions(-)
>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
thanks
-- PMM
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v3 09/10] target/arm: Fix bfdotadd_ebf vs nan selection
2025-07-02 12:22 ` [PATCH v3 09/10] target/arm: Fix bfdotadd_ebf " Richard Henderson
@ 2025-07-03 9:13 ` Peter Maydell
0 siblings, 0 replies; 21+ messages in thread
From: Peter Maydell @ 2025-07-03 9:13 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel, qemu-arm, qemu-stable
On Wed, 2 Jul 2025 at 13:22, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Implement FPProcessNaNs4 within f16_dotadd, rather than
should be "bfdotadd_ebf" ?
> simply letting NaNs propagate through the function.
>
> Cc: qemu-stable@nongnu.org
> Fixes: 0e1850182a1 ("target/arm: Implement FPCR.EBF=1 semantics for bfdotadd()")
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> + /* C.f. FPProcessNaNs4 */
> + if (float32_is_any_nan(s1r) || float32_is_any_nan(s1c) ||
> + float32_is_any_nan(s2r) || float32_is_any_nan(s2c)) {
> + if (float32_is_signaling_nan(s2r, fpst)) {
> + t32 = s2r;
> + } else if (float32_is_signaling_nan(s2c, fpst)) {
> + t32 = s2c;
> + } else if (float32_is_signaling_nan(s2r, fpst)) {
> + t32 = s2r;
> + } else if (float32_is_signaling_nan(s2c, fpst)) {
> + t32 = s2c;
> + } else if (float32_is_any_nan(s2r)) {
> + t32 = s2r;
> + } else if (float32_is_any_nan(s2c)) {
> + t32 = s2c;
> + } else if (float32_is_any_nan(s2r)) {
> + t32 = s2r;
> + } else {
> + t32 = s2c;
> + }
Looks like a cut-and-paste error -- we check s2r and s2c
twice and never look at s1r and s1c.
-- PMM
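
To make the review comment concrete, here is a standalone sketch of a four-operand NaN selection in which every operand is examined: signaling NaNs take priority in operand order, then quiet NaNs in operand order. It is an illustration under stated assumptions, not the corrected patch. The helper names are hypothetical, operands are raw IEEE binary32 bit patterns, the operand order a, b, c, d is assumed to correspond to s1r, s1c, s2r, s2c, and the sketch neither quiets the chosen NaN nor handles default-NaN mode.

```c
#include <stdbool.h>
#include <stdint.h>

/* NaN: exponent all ones and a non-zero fraction. */
static bool f32_is_any_nan(uint32_t f)
{
    return (f & 0x7fffffffu) > 0x7f800000u;
}

/* Signaling NaN: a NaN whose top fraction bit (the quiet bit) is clear. */
static bool f32_is_snan(uint32_t f)
{
    return f32_is_any_nan(f) && !(f & 0x00400000u);
}

/*
 * Select the NaN to propagate from four operands.
 * Returns false when no operand is a NaN; otherwise stores the
 * selected operand in *out and returns true.
 */
static bool pick_nan4(uint32_t a, uint32_t b, uint32_t c, uint32_t d,
                      uint32_t *out)
{
    const uint32_t ops[4] = { a, b, c, d };

    /* First pass: the first signaling NaN wins. */
    for (int i = 0; i < 4; i++) {
        if (f32_is_snan(ops[i])) {
            *out = ops[i];
            return true;
        }
    }
    /* Second pass: otherwise the first NaN of any kind wins. */
    for (int i = 0; i < 4; i++) {
        if (f32_is_any_nan(ops[i])) {
            *out = ops[i];
            return true;
        }
    }
    return false;
}
```

Looping over an array of the operands, rather than writing out eight near-identical branches, makes it harder to repeat one operand name and skip another.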
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v3 04/10] target/arm: Replace @rda_rn_rm_e0 in sve.decode
2025-07-02 12:22 ` [PATCH v3 04/10] target/arm: Replace @rda_rn_rm_e0 in sve.decode Richard Henderson
2025-07-02 14:11 ` Richard Henderson
@ 2025-07-03 9:14 ` Peter Maydell
1 sibling, 0 replies; 21+ messages in thread
From: Peter Maydell @ 2025-07-03 9:14 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel, qemu-arm
On Wed, 2 Jul 2025 at 13:22, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Replace @rda_rn_rm_e0 with @rda_rn_rm_ex, and require
> users to supply an explicit esz.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> target/arm/tcg/sve.decode | 48 +++++++++++++++++++--------------------
> 1 file changed, 24 insertions(+), 24 deletions(-)
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
thanks
-- PMM
^ permalink raw reply [flat|nested] 21+ messages in thread