[PATCH v3 00/10] target/arm: SME1/SVE2 fixes

qemu-devel.nongnu.org archive mirror

* [PATCH v3 00/10] target/arm: SME1/SVE2 fixes
@ 2025-07-02 12:22 Richard Henderson
  2025-07-02 12:22 ` [PATCH v3 01/10] target/arm: Fix SME vs AdvSIMD exception priority Richard Henderson
                   ` (9 more replies)
  0 siblings, 10 replies; 21+ messages in thread
From: Richard Henderson @ 2025-07-02 12:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm, peter.maydell

- A couple of fixes for EC_SMETRAP, plus some insns that were
  never updated to be marked non-streaming.
  (Removed the switch (bool) that PMM commented on and that, as it
  turns out, our clang CI rejects under -Werror; the replacement
  uses if + goto, which is perhaps clearer because the label name
  documents the intent.)
- Disable FEAT_F64MM if the command-line SVE vector size cannot
  support it (PMM).
- Fix a gvec assert in PSEL.
- Fix NaN selection per the FPDot pseudocode.

r~

Richard Henderson (10):
  target/arm: Fix SME vs AdvSIMD exception priority
  target/arm: Fix sve_access_check for SME
  target/arm: Fix 128-bit element ZIP, UZP, TRN
  target/arm: Replace @rda_rn_rm_e0 in sve.decode
  target/arm: Fix FMMLA (64-bit element) for 128-bit VL
  target/arm: Disable FEAT_F64MM if maximum SVE vector size too small
  target/arm: Fix PSEL size operands to tcg_gen_gvec_ands
  target/arm: Fix f16_dotadd vs nan selection
  target/arm: Fix bfdotadd_ebf vs nan selection
  target/arm: Remove CPUARMState.vfp.scratch

 target/arm/cpu.h               |  3 --
 target/arm/cpu64.c             |  6 +++
 target/arm/tcg/sme_helper.c    | 62 ++++++++++++++++++++--------
 target/arm/tcg/translate-a64.c | 29 +++++++++----
 target/arm/tcg/translate-sve.c | 67 +++++++++++++++++++++---------
 target/arm/tcg/vec_helper.c    | 75 ++++++++++++++++++++++++----------
 target/arm/tcg/sve.decode      | 48 +++++++++++-----------
 7 files changed, 197 insertions(+), 93 deletions(-)

-- 
2.43.0




* [PATCH v3 01/10] target/arm: Fix SME vs AdvSIMD exception priority
  2025-07-02 12:22 [PATCH v3 00/10] target/arm: SME1/SVE2 fixes Richard Henderson
@ 2025-07-02 12:22 ` Richard Henderson
  2025-07-02 12:22 ` [PATCH v3 02/10] target/arm: Fix sve_access_check for SME Richard Henderson
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 21+ messages in thread
From: Richard Henderson @ 2025-07-02 12:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm, peter.maydell, qemu-stable

We failed to raise an exception when
sme_excp_el == 0 and fp_excp_el == 1.

Cc: qemu-stable@nongnu.org
Fixes: 3d74825f4d6 ("target/arm: Add SME enablement checks")
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
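The old and new conditions can be compared in isolation. The following
is a standalone sketch, not QEMU code: the two helper names are
hypothetical, and the DisasContext fields sme_excp_el and fp_excp_el
are passed as plain ints.

```c
#include <assert.h>
#include <stdbool.h>

/* Condition before this patch: take the SME access-check path? */
static bool sme_path_old(int sme_excp_el, int fp_excp_el)
{
    return !fp_excp_el || sme_excp_el < fp_excp_el;
}

/*
 * Condition after this patch: the SME path is taken only if
 * sme_excp_el is nonzero, and it now also wins a tie with fp_excp_el.
 */
static bool sme_path_new(int sme_excp_el, int fp_excp_el)
{
    return sme_excp_el && (!fp_excp_el || sme_excp_el <= fp_excp_el);
}
```

With sme_excp_el == 0 and fp_excp_el == 1, sme_path_old() is true while
sme_path_new() is false, which is the case named in the commit message.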
 target/arm/tcg/translate-a64.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index ac80f572a2..bb49a2ce90 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -1494,7 +1494,8 @@ bool sme_enabled_check(DisasContext *s)
      * to be zero when fp_excp_el has priority.  This is because we need
      * sme_excp_el by itself for cpregs access checks.
      */
-    if (!s->fp_excp_el || s->sme_excp_el < s->fp_excp_el) {
+    if (s->sme_excp_el
+        && (!s->fp_excp_el || s->sme_excp_el <= s->fp_excp_el)) {
         bool ret = sme_access_check(s);
         s->fp_access_checked = (ret ? 1 : -1);
         return ret;
-- 
2.43.0




* [PATCH v3 02/10] target/arm: Fix sve_access_check for SME
  2025-07-02 12:22 [PATCH v3 00/10] target/arm: SME1/SVE2 fixes Richard Henderson
  2025-07-02 12:22 ` [PATCH v3 01/10] target/arm: Fix SME vs AdvSIMD exception priority Richard Henderson
@ 2025-07-02 12:22 ` Richard Henderson
  2025-07-02 12:22 ` [PATCH v3 03/10] target/arm: Fix 128-bit element ZIP, UZP, TRN Richard Henderson
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 21+ messages in thread
From: Richard Henderson @ 2025-07-02 12:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm, peter.maydell, qemu-stable

Do not assume SME implies SVE.  Ensure that the non-streaming
check is present along the SME path, since it is not implied
by sme_*_enabled_check.

Cc: qemu-stable@nongnu.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
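The control flow that the patched sve_access_check() follows can be
summarised as a small decision function. This is an illustrative model
only: the enum and function names are hypothetical, and the feature
tests and PSTATE.SM are reduced to booleans.

```c
#include <assert.h>
#include <stdbool.h>

/* Which set of checks the patched function would run. */
typedef enum {
    PATH_SME_ENABLED,    /* sme_enabled_check, then nonstreaming_check */
    PATH_SME_SM_ENABLED, /* sme_sm_enabled_check, then nonstreaming_check */
    PATH_SVE,            /* the plain SVE checks at label continue_sve */
} AccessPath;

static AccessPath sve_access_path(bool has_sme, bool has_sve, bool pstate_sm)
{
    if (has_sme) {
        if (pstate_sm) {
            return PATH_SME_ENABLED;
        }
        if (!has_sve) {
            return PATH_SME_SM_ENABLED;
        }
        /* SME and SVE both present, not in streaming mode. */
    }
    return PATH_SVE;
}
```

On both SME paths the non-streaming check now runs after the enablement
check succeeds, which the previous code omitted.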
 target/arm/tcg/translate-a64.c | 26 +++++++++++++++++++-------
 1 file changed, 19 insertions(+), 7 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index bb49a2ce90..7f8671e2e8 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -1387,11 +1387,8 @@ static bool fp_access_check_only(DisasContext *s)
     return true;
 }
 
-static bool fp_access_check(DisasContext *s)
+static bool nonstreaming_check(DisasContext *s)
 {
-    if (!fp_access_check_only(s)) {
-        return false;
-    }
     if (s->sme_trap_nonstreaming && s->is_nonstreaming) {
         gen_exception_insn(s, 0, EXCP_UDEF,
                            syn_smetrap(SME_ET_Streaming, false));
@@ -1400,6 +1397,11 @@ static bool fp_access_check(DisasContext *s)
     return true;
 }
 
+static bool fp_access_check(DisasContext *s)
+{
+    return fp_access_check_only(s) && nonstreaming_check(s);
+}
+
 /*
  * Return <0 for non-supported element sizes, with MO_16 controlled by
  * FEAT_FP16; return 0 for fp disabled; otherwise return >0 for success.
@@ -1450,14 +1452,24 @@ static int fp_access_check_vector_hsd(DisasContext *s, bool is_q, MemOp esz)
  */
 bool sve_access_check(DisasContext *s)
 {
-    if (s->pstate_sm || !dc_isar_feature(aa64_sve, s)) {
+    if (dc_isar_feature(aa64_sme, s)) {
         bool ret;
 
-        assert(dc_isar_feature(aa64_sme, s));
-        ret = sme_sm_enabled_check(s);
+        if (s->pstate_sm) {
+            ret = sme_enabled_check(s);
+        } else if (dc_isar_feature(aa64_sve, s)) {
+            goto continue_sve;
+        } else {
+            ret = sme_sm_enabled_check(s);
+        }
+        if (ret) {
+            ret = nonstreaming_check(s);
+        }
         s->sve_access_checked = (ret ? 1 : -1);
         return ret;
     }
+
+ continue_sve:
     if (s->sve_excp_el) {
         /* Assert that we only raise one exception per instruction. */
         assert(!s->sve_access_checked);
-- 
2.43.0




* [PATCH v3 03/10] target/arm: Fix 128-bit element ZIP, UZP, TRN
  2025-07-02 12:22 [PATCH v3 00/10] target/arm: SME1/SVE2 fixes Richard Henderson
  2025-07-02 12:22 ` [PATCH v3 01/10] target/arm: Fix SME vs AdvSIMD exception priority Richard Henderson
  2025-07-02 12:22 ` [PATCH v3 02/10] target/arm: Fix sve_access_check for SME Richard Henderson
@ 2025-07-02 12:22 ` Richard Henderson
  2025-07-03  9:08   ` Peter Maydell
  2025-07-02 12:22 ` [PATCH v3 04/10] target/arm: Replace @rda_rn_rm_e0 in sve.decode Richard Henderson
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 21+ messages in thread
From: Richard Henderson @ 2025-07-02 12:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm, peter.maydell, qemu-stable

We failed to raise UDEF for these instructions when the vector
size is too small (less than 256 bits).
We also failed to mark them as non-streaming for SME.

Cc: qemu-stable@nongnu.org
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
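The size test and the ZIP2_q offset used below can be worked through
with plain integers. This is a sketch under stated assumptions: the
helper names are hypothetical, and align_down() stands in for
QEMU_ALIGN_DOWN for the unsigned inputs used here.

```c
#include <assert.h>
#include <stdbool.h>

/* Round n down to a multiple of m. */
static unsigned align_down(unsigned n, unsigned m)
{
    return n / m * m;
}

/* The 128-bit element permutes need at least 32 bytes (256 bits). */
static bool interleave_q_is_udef(unsigned vsz_bytes)
{
    return vsz_bytes < 32;
}

/*
 * 'data' argument for ZIP2_q: half of the vector size after rounding
 * it down to a multiple of 32 bytes.
 */
static unsigned zip2_q_data(unsigned vsz_bytes)
{
    return align_down(vsz_bytes, 32) / 2;
}
```

For a 16-byte vector the instruction is UDEF; for 32-byte and 48-byte
vectors zip2_q_data() gives 16, and for a 64-byte vector it gives 32.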
 target/arm/tcg/translate-sve.c | 43 ++++++++++++++++++++++++----------
 1 file changed, 30 insertions(+), 13 deletions(-)

diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
index f3cf028cb9..588a5b006b 100644
--- a/target/arm/tcg/translate-sve.c
+++ b/target/arm/tcg/translate-sve.c
@@ -2352,6 +2352,23 @@ TRANS_FEAT(PUNPKHI, aa64_sve, do_perm_pred2, a, 1, gen_helper_sve_punpk_p)
  *** SVE Permute - Interleaving Group
  */
 
+static bool do_interleave_q(DisasContext *s, gen_helper_gvec_3 *fn,
+                            arg_rrr_esz *a, int data)
+{
+    if (sve_access_check(s)) {
+        unsigned vsz = vec_full_reg_size(s);
+        if (vsz < 32) {
+            unallocated_encoding(s);
+        } else {
+            tcg_gen_gvec_3_ool(vec_full_reg_offset(s, a->rd),
+                               vec_full_reg_offset(s, a->rn),
+                               vec_full_reg_offset(s, a->rm),
+                               vsz, vsz, data, fn);
+        }
+    }
+    return true;
+}
+
 static gen_helper_gvec_3 * const zip_fns[4] = {
     gen_helper_sve_zip_b, gen_helper_sve_zip_h,
     gen_helper_sve_zip_s, gen_helper_sve_zip_d,
@@ -2361,11 +2378,11 @@ TRANS_FEAT(ZIP1_z, aa64_sve, gen_gvec_ool_arg_zzz,
 TRANS_FEAT(ZIP2_z, aa64_sve, gen_gvec_ool_arg_zzz,
            zip_fns[a->esz], a, vec_full_reg_size(s) / 2)
 
-TRANS_FEAT(ZIP1_q, aa64_sve_f64mm, gen_gvec_ool_arg_zzz,
-           gen_helper_sve2_zip_q, a, 0)
-TRANS_FEAT(ZIP2_q, aa64_sve_f64mm, gen_gvec_ool_arg_zzz,
-           gen_helper_sve2_zip_q, a,
-           QEMU_ALIGN_DOWN(vec_full_reg_size(s), 32) / 2)
+TRANS_FEAT_NONSTREAMING(ZIP1_q, aa64_sve_f64mm, do_interleave_q,
+                        gen_helper_sve2_zip_q, a, 0)
+TRANS_FEAT_NONSTREAMING(ZIP2_q, aa64_sve_f64mm, do_interleave_q,
+                        gen_helper_sve2_zip_q, a,
+                        QEMU_ALIGN_DOWN(vec_full_reg_size(s), 32) / 2)
 
 static gen_helper_gvec_3 * const uzp_fns[4] = {
     gen_helper_sve_uzp_b, gen_helper_sve_uzp_h,
@@ -2377,10 +2394,10 @@ TRANS_FEAT(UZP1_z, aa64_sve, gen_gvec_ool_arg_zzz,
 TRANS_FEAT(UZP2_z, aa64_sve, gen_gvec_ool_arg_zzz,
            uzp_fns[a->esz], a, 1 << a->esz)
 
-TRANS_FEAT(UZP1_q, aa64_sve_f64mm, gen_gvec_ool_arg_zzz,
-           gen_helper_sve2_uzp_q, a, 0)
-TRANS_FEAT(UZP2_q, aa64_sve_f64mm, gen_gvec_ool_arg_zzz,
-           gen_helper_sve2_uzp_q, a, 16)
+TRANS_FEAT_NONSTREAMING(UZP1_q, aa64_sve_f64mm, do_interleave_q,
+                        gen_helper_sve2_uzp_q, a, 0)
+TRANS_FEAT_NONSTREAMING(UZP2_q, aa64_sve_f64mm, do_interleave_q,
+                        gen_helper_sve2_uzp_q, a, 16)
 
 static gen_helper_gvec_3 * const trn_fns[4] = {
     gen_helper_sve_trn_b, gen_helper_sve_trn_h,
@@ -2392,10 +2409,10 @@ TRANS_FEAT(TRN1_z, aa64_sve, gen_gvec_ool_arg_zzz,
 TRANS_FEAT(TRN2_z, aa64_sve, gen_gvec_ool_arg_zzz,
            trn_fns[a->esz], a, 1 << a->esz)
 
-TRANS_FEAT(TRN1_q, aa64_sve_f64mm, gen_gvec_ool_arg_zzz,
-           gen_helper_sve2_trn_q, a, 0)
-TRANS_FEAT(TRN2_q, aa64_sve_f64mm, gen_gvec_ool_arg_zzz,
-           gen_helper_sve2_trn_q, a, 16)
+TRANS_FEAT_NONSTREAMING(TRN1_q, aa64_sve_f64mm, do_interleave_q,
+                        gen_helper_sve2_trn_q, a, 0)
+TRANS_FEAT_NONSTREAMING(TRN2_q, aa64_sve_f64mm, do_interleave_q,
+                        gen_helper_sve2_trn_q, a, 16)
 
 /*
  *** SVE Permute Vector - Predicated Group
-- 
2.43.0




* [PATCH v3 04/10] target/arm: Replace @rda_rn_rm_e0 in sve.decode
  2025-07-02 12:22 [PATCH v3 00/10] target/arm: SME1/SVE2 fixes Richard Henderson
                   ` (2 preceding siblings ...)
  2025-07-02 12:22 ` [PATCH v3 03/10] target/arm: Fix 128-bit element ZIP, UZP, TRN Richard Henderson
@ 2025-07-02 12:22 ` Richard Henderson
  2025-07-02 14:11   ` Richard Henderson
  2025-07-03  9:14   ` Peter Maydell
  2025-07-02 12:22 ` [PATCH v3 05/10] target/arm: Fix FMMLA (64-bit element) for 128-bit VL Richard Henderson
                   ` (5 subsequent siblings)
  9 siblings, 2 replies; 21+ messages in thread
From: Richard Henderson @ 2025-07-02 12:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm, peter.maydell

Replace @rda_rn_rm_e0 with @rda_rn_rm_ex, and require
users to supply an explicit esz.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/sve.decode | 48 +++++++++++++++++++--------------------
 1 file changed, 24 insertions(+), 24 deletions(-)

diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode
index 04b6fcc0cf..3a99eb7299 100644
--- a/target/arm/tcg/sve.decode
+++ b/target/arm/tcg/sve.decode
@@ -131,11 +131,11 @@
 @rda_rn_rm      ........ esz:2 . rm:5 ... ... rn:5 rd:5 \
                 &rrrr_esz ra=%reg_movprfx
 
-# Four operand with unused vector element size
-@rda_rn_rm_e0   ........ ... rm:5 ... ... rn:5 rd:5 \
-                &rrrr_esz esz=0 ra=%reg_movprfx
-@rdn_ra_rm_e0   ........ ... rm:5 ... ... ra:5 rd:5 \
-                &rrrr_esz esz=0 rn=%reg_movprfx
+# Four operand with explicit vector element size
+@rda_rn_rm_ex   ........ ... rm:5 ... ... rn:5 rd:5 \
+                &rrrr_esz ra=%reg_movprfx
+@rdn_ra_rm_ex   ........ ... rm:5 ... ... ra:5 rd:5 \
+                &rrrr_esz rn=%reg_movprfx
 
 # Three operand with "memory" size, aka immediate left shift
 @rd_rn_msz_rm   ........ ... rm:5 .... imm:2 rn:5 rd:5          &rrri
@@ -428,12 +428,12 @@ XAR             00000100 .. 1 ..... 001 101 rm:5  rd:5   &rrri_esz \
                 rn=%reg_movprfx esz=%tszimm16_esz imm=%tszimm16_shr
 
 # SVE2 bitwise ternary operations
-EOR3            00000100 00 1 ..... 001 110 ..... .....         @rdn_ra_rm_e0
-BSL             00000100 00 1 ..... 001 111 ..... .....         @rdn_ra_rm_e0
-BCAX            00000100 01 1 ..... 001 110 ..... .....         @rdn_ra_rm_e0
-BSL1N           00000100 01 1 ..... 001 111 ..... .....         @rdn_ra_rm_e0
-BSL2N           00000100 10 1 ..... 001 111 ..... .....         @rdn_ra_rm_e0
-NBSL            00000100 11 1 ..... 001 111 ..... .....         @rdn_ra_rm_e0
+EOR3            00000100 00 1 ..... 001 110 ..... .....     @rdn_ra_rm_ex esz=0
+BSL             00000100 00 1 ..... 001 111 ..... .....     @rdn_ra_rm_ex esz=0
+BCAX            00000100 01 1 ..... 001 110 ..... .....     @rdn_ra_rm_ex esz=0
+BSL1N           00000100 01 1 ..... 001 111 ..... .....     @rdn_ra_rm_ex esz=0
+BSL2N           00000100 10 1 ..... 001 111 ..... .....     @rdn_ra_rm_ex esz=0
+NBSL            00000100 11 1 ..... 001 111 ..... .....     @rdn_ra_rm_ex esz=0
 
 ### SVE Index Generation Group
 
@@ -1450,9 +1450,9 @@ EORTB           01000101 .. 0 ..... 10010 1 ..... .....  @rd_rn_rm
 
 ## SVE integer matrix multiply accumulate
 
-SMMLA           01000101 00 0 ..... 10011 0 ..... .....  @rda_rn_rm_e0
-USMMLA          01000101 10 0 ..... 10011 0 ..... .....  @rda_rn_rm_e0
-UMMLA           01000101 11 0 ..... 10011 0 ..... .....  @rda_rn_rm_e0
+SMMLA           01000101 00 0 ..... 10011 0 ..... .....  @rda_rn_rm_ex esz=2
+USMMLA          01000101 10 0 ..... 10011 0 ..... .....  @rda_rn_rm_ex esz=2
+UMMLA           01000101 11 0 ..... 10011 0 ..... .....  @rda_rn_rm_ex esz=2
 
 ## SVE2 bitwise permute
 
@@ -1602,9 +1602,9 @@ SQRDCMLAH_zzzz  01000100 esz:2 0 rm:5 0011 rot:2 rn:5 rd:5  ra=%reg_movprfx
 USDOT_zzzz      01000100 .. 0 ..... 011 110 ..... .....  @rda_rn_rm
 
 ### SVE2 floating point matrix multiply accumulate
-BFMMLA          01100100 01 1 ..... 111 001 ..... .....  @rda_rn_rm_e0
-FMMLA_s         01100100 10 1 ..... 111 001 ..... .....  @rda_rn_rm_e0
-FMMLA_d         01100100 11 1 ..... 111 001 ..... .....  @rda_rn_rm_e0
+BFMMLA          01100100 01 1 ..... 111 001 ..... .....  @rda_rn_rm_ex esz=1
+FMMLA_s         01100100 10 1 ..... 111 001 ..... .....  @rda_rn_rm_ex esz=2
+FMMLA_d         01100100 11 1 ..... 111 001 ..... .....  @rda_rn_rm_ex esz=3
 
 ### SVE2 Memory Gather Load Group
 
@@ -1654,16 +1654,16 @@ FCVTLT_sd       01100100 11 0010 11 101 ... ..... .....  @rd_pg_rn_e0
 FLOGB           01100101 00 011 esz:2 0101 pg:3 rn:5 rd:5  &rpr_esz
 
 ### SVE2 floating-point multiply-add long (vectors)
-FMLALB_zzzw     01100100 10 1 ..... 10 0 00 0 ..... .....  @rda_rn_rm_e0
-FMLALT_zzzw     01100100 10 1 ..... 10 0 00 1 ..... .....  @rda_rn_rm_e0
-FMLSLB_zzzw     01100100 10 1 ..... 10 1 00 0 ..... .....  @rda_rn_rm_e0
-FMLSLT_zzzw     01100100 10 1 ..... 10 1 00 1 ..... .....  @rda_rn_rm_e0
+FMLALB_zzzw     01100100 10 1 ..... 10 0 00 0 ..... .....  @rda_rn_rm_ex esz=2
+FMLALT_zzzw     01100100 10 1 ..... 10 0 00 1 ..... .....  @rda_rn_rm_ex esz=2
+FMLSLB_zzzw     01100100 10 1 ..... 10 1 00 0 ..... .....  @rda_rn_rm_ex esz=2
+FMLSLT_zzzw     01100100 10 1 ..... 10 1 00 1 ..... .....  @rda_rn_rm_ex esz=2
 
-BFMLALB_zzzw    01100100 11 1 ..... 10 0 00 0 ..... .....  @rda_rn_rm_e0
-BFMLALT_zzzw    01100100 11 1 ..... 10 0 00 1 ..... .....  @rda_rn_rm_e0
+BFMLALB_zzzw    01100100 11 1 ..... 10 0 00 0 ..... .....  @rda_rn_rm_ex esz=2
+BFMLALT_zzzw    01100100 11 1 ..... 10 0 00 1 ..... .....  @rda_rn_rm_ex esz=2
 
 ### SVE2 floating-point bfloat16 dot-product
-BFDOT_zzzz      01100100 01 1 ..... 10 0 00 0 ..... .....  @rda_rn_rm_e0
+BFDOT_zzzz      01100100 01 1 ..... 10 0 00 0 ..... .....  @rda_rn_rm_ex esz=2
 
 ### SVE2 floating-point multiply-add long (indexed)
 FMLALB_zzxw     01100100 10 1 ..... 0100.0 ..... .....     @rrxr_3a esz=2
-- 
2.43.0




* [PATCH v3 05/10] target/arm: Fix FMMLA (64-bit element) for 128-bit VL
  2025-07-02 12:22 [PATCH v3 00/10] target/arm: SME1/SVE2 fixes Richard Henderson
                   ` (3 preceding siblings ...)
  2025-07-02 12:22 ` [PATCH v3 04/10] target/arm: Replace @rda_rn_rm_e0 in sve.decode Richard Henderson
@ 2025-07-02 12:22 ` Richard Henderson
  2025-07-03  9:08   ` Peter Maydell
  2025-07-02 12:22 ` [PATCH v3 06/10] target/arm: Disable FEAT_F64MM if maximum SVE vector size too small Richard Henderson
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 21+ messages in thread
From: Richard Henderson @ 2025-07-02 12:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm, peter.maydell

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
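The new check in do_fmmla() compares the vector size against four
elements, since FMMLA works on 2x2 matrices. A minimal sketch of the
arithmetic, with hypothetical helper names and elt_bytes() assumed to
match what memop_size() returns for these element sizes:

```c
#include <assert.h>
#include <stdbool.h>

/* Bytes per element for a log2 element size (2 -> 4 bytes, 3 -> 8 bytes). */
static unsigned elt_bytes(unsigned esz)
{
    return 1u << esz;
}

/* A 2x2 matrix of elements must fit in the vector, else UDEF. */
static bool fmmla_is_udef(unsigned vsz_bytes, unsigned esz)
{
    return vsz_bytes < 4 * elt_bytes(esz);
}
```

With a 16-byte (128-bit) vector, esz=2 (FMMLA_s) passes the check while
esz=3 (FMMLA_d) is UDEF. This relies on the explicit esz values that the
previous patch added to sve.decode.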
 target/arm/tcg/translate-sve.c | 23 ++++++++++++++++-------
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
index 588a5b006b..a0de5b488d 100644
--- a/target/arm/tcg/translate-sve.c
+++ b/target/arm/tcg/translate-sve.c
@@ -7025,17 +7025,26 @@ DO_ZPZZ_FP(FMINNMP, aa64_sve2, sve2_fminnmp_zpzz)
 DO_ZPZZ_FP(FMAXP, aa64_sve2, sve2_fmaxp_zpzz)
 DO_ZPZZ_FP(FMINP, aa64_sve2, sve2_fminp_zpzz)
 
+static bool do_fmmla(DisasContext *s, arg_rrrr_esz *a,
+                     gen_helper_gvec_4_ptr *fn)
+{
+    if (sve_access_check(s)) {
+        if (vec_full_reg_size(s) < 4 * memop_size(a->esz)) {
+            unallocated_encoding(s);
+        } else {
+            gen_gvec_fpst_zzzz(s, fn, a->rd, a->rn, a->rm, a->ra, 0, FPST_A64);
+        }
+    }
+    return true;
+}
+
+TRANS_FEAT_NONSTREAMING(FMMLA_s, aa64_sve_f32mm, do_fmmla, a, gen_helper_fmmla_s)
+TRANS_FEAT_NONSTREAMING(FMMLA_d, aa64_sve_f64mm, do_fmmla, a, gen_helper_fmmla_d)
+
 /*
  * SVE Integer Multiply-Add (unpredicated)
  */
 
-TRANS_FEAT_NONSTREAMING(FMMLA_s, aa64_sve_f32mm, gen_gvec_fpst_zzzz,
-                        gen_helper_fmmla_s, a->rd, a->rn, a->rm, a->ra,
-                        0, FPST_A64)
-TRANS_FEAT_NONSTREAMING(FMMLA_d, aa64_sve_f64mm, gen_gvec_fpst_zzzz,
-                        gen_helper_fmmla_d, a->rd, a->rn, a->rm, a->ra,
-                        0, FPST_A64)
-
 static gen_helper_gvec_4 * const sqdmlal_zzzw_fns[] = {
     NULL,                           gen_helper_sve2_sqdmlal_zzzw_h,
     gen_helper_sve2_sqdmlal_zzzw_s, gen_helper_sve2_sqdmlal_zzzw_d,
-- 
2.43.0




* [PATCH v3 06/10] target/arm: Disable FEAT_F64MM if maximum SVE vector size too small
  2025-07-02 12:22 [PATCH v3 00/10] target/arm: SME1/SVE2 fixes Richard Henderson
                   ` (4 preceding siblings ...)
  2025-07-02 12:22 ` [PATCH v3 05/10] target/arm: Fix FMMLA (64-bit element) for 128-bit VL Richard Henderson
@ 2025-07-02 12:22 ` Richard Henderson
  2025-07-02 18:15   ` Richard Henderson
  2025-07-03  9:10   ` Peter Maydell
  2025-07-02 12:22 ` [PATCH v3 07/10] target/arm: Fix PSEL size operands to tcg_gen_gvec_ands Richard Henderson
                   ` (3 subsequent siblings)
  9 siblings, 2 replies; 21+ messages in thread
From: Richard Henderson @ 2025-07-02 12:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm, peter.maydell

All F64MM instructions operate on a 256-bit vector.
If only 128-bit vectors are supported by the CPU,
then the CPU cannot enable F64MM.

Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
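The register update can be modelled on a bare 64-bit value. This is a
sketch only: finalize_zfr0() is a hypothetical name, and the field
position (bits [59:56] of ID_AA64ZFR0_EL1) is an assumption stated here
rather than taken from this patch, which uses FIELD_DP64 instead.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed position of the F64MM field: bits [59:56]. */
#define F64MM_SHIFT 56
#define F64MM_MASK  (UINT64_C(0xf) << F64MM_SHIFT)

/*
 * max_vq counts 128-bit quadwords, so max_vq < 2 means that no
 * 256-bit vector length exists and F64MM must read as zero.
 */
static uint64_t finalize_zfr0(uint64_t zfr0, unsigned max_vq)
{
    if (max_vq < 2) {
        zfr0 &= ~F64MM_MASK;
    }
    return zfr0;
}
```

All other fields of the register value are left untouched.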
 target/arm/cpu64.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index 200da1c489..c5c289eadf 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -237,6 +237,12 @@ void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp)
     /* From now on sve_max_vq is the actual maximum supported length. */
     cpu->sve_max_vq = max_vq;
     cpu->sve_vq.map = vq_map;
+
+    /* FEAT_F64MM requires the existence of a 256-bit vector size. */
+    if (max_vq < 2) {
+        cpu->isar.id_aa64zfr0 = FIELD_DP64(cpu->isar.id_aa64zfr0,
+                                           ID_AA64ZFR0, F64MM, 0);
+    }
 }
 
 /*
-- 
2.43.0




* [PATCH v3 07/10] target/arm: Fix PSEL size operands to tcg_gen_gvec_ands
  2025-07-02 12:22 [PATCH v3 00/10] target/arm: SME1/SVE2 fixes Richard Henderson
                   ` (5 preceding siblings ...)
  2025-07-02 12:22 ` [PATCH v3 06/10] target/arm: Disable FEAT_F64MM if maximum SVE vector size too small Richard Henderson
@ 2025-07-02 12:22 ` Richard Henderson
  2025-07-03  9:09   ` Peter Maydell
  2025-07-02 12:22 ` [PATCH v3 08/10] target/arm: Fix f16_dotadd vs nan selection Richard Henderson
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 21+ messages in thread
From: Richard Henderson @ 2025-07-02 12:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm, peter.maydell, qemu-stable

Gvec only operates on sizes of 8 bytes or multiples of 16 bytes.
Predicate sizes may be any multiple of 2 bytes.
Round up the size using size_for_gvec.

Cc: qemu-stable@nongnu.org
Fixes: 598ab0b24c0 ("target/arm: Implement PSEL")
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
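The rounding rule described above can be illustrated as follows. The
function names are hypothetical and the rounding is modelled on the
commit message, so QEMU's own size_for_gvec() may differ in detail.
A predicate holds one bit per vector byte, hence 2 bytes per 128-bit
quadword.

```c
#include <assert.h>

/* Predicate register size in bytes for a vector of vq quadwords. */
static unsigned pred_bytes(unsigned vq)
{
    return vq * 2;
}

/* Round a size up to something gvec accepts: 8, or a multiple of 16. */
static unsigned round_for_gvec(unsigned size)
{
    if (size <= 8) {
        return 8;
    }
    return (size + 15) / 16 * 16;
}
```

For example, vq = 3 gives a 6-byte predicate, which is rounded up to 8,
and vq = 5 gives 10 bytes, which is rounded up to 16.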
 target/arm/tcg/translate-sve.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
index a0de5b488d..8403034a0e 100644
--- a/target/arm/tcg/translate-sve.c
+++ b/target/arm/tcg/translate-sve.c
@@ -7291,6 +7291,7 @@ static bool trans_PSEL(DisasContext *s, arg_psel *a)
     tcg_gen_neg_i64(tmp, tmp);
 
     /* Apply to either copy the source, or write zeros. */
+    pl = size_for_gvec(pl);
     tcg_gen_gvec_ands(MO_64, pred_full_reg_offset(s, a->pd),
                       pred_full_reg_offset(s, a->pn), tmp, pl, pl);
     return true;
-- 
2.43.0




* [PATCH v3 08/10] target/arm: Fix f16_dotadd vs nan selection
  2025-07-02 12:22 [PATCH v3 00/10] target/arm: SME1/SVE2 fixes Richard Henderson
                   ` (6 preceding siblings ...)
  2025-07-02 12:22 ` [PATCH v3 07/10] target/arm: Fix PSEL size operands to tcg_gen_gvec_ands Richard Henderson
@ 2025-07-02 12:22 ` Richard Henderson
  2025-07-03  9:12   ` Peter Maydell
  2025-07-02 12:22 ` [PATCH v3 09/10] target/arm: Fix bfdotadd_ebf " Richard Henderson
  2025-07-02 12:22 ` [PATCH v3 10/10] target/arm: Remove CPUARMState.vfp.scratch Richard Henderson
  9 siblings, 1 reply; 21+ messages in thread
From: Richard Henderson @ 2025-07-02 12:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm, peter.maydell, qemu-stable

Implement FPProcessNaNs4 within f16_dotadd, rather than
simply letting NaNs propagate through the function.

Cc: qemu-stable@nongnu.org
Fixes: 3916841ac75 ("target/arm: Implement FMOPA, FMOPS (widening)")
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
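The selection order implemented below (first signaling NaN in operand
order, otherwise first NaN of any kind) can be shown on raw binary16
bit patterns. This is a standalone sketch with hypothetical helper
names; it assumes the usual encoding in which a clear top fraction bit
marks a signaling NaN, and it returns an operand index instead of
converting the value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* binary16 NaN: exponent all ones and a nonzero fraction. */
static bool h_is_nan(uint16_t h)
{
    return (h & 0x7c00) == 0x7c00 && (h & 0x03ff) != 0;
}

/* Signaling NaN: a NaN whose top fraction bit is clear. */
static bool h_is_snan(uint16_t h)
{
    return h_is_nan(h) && !(h & 0x0200);
}

/* Index (0..3) of the selected NaN operand, or -1 if none is a NaN. */
static int select_nan4(uint16_t h1r, uint16_t h1c, uint16_t h2r, uint16_t h2c)
{
    const uint16_t op[4] = { h1r, h1c, h2r, h2c };

    for (int i = 0; i < 4; i++) {
        if (h_is_snan(op[i])) {
            return i;
        }
    }
    for (int i = 0; i < 4; i++) {
        if (h_is_nan(op[i])) {
            return i;
        }
    }
    return -1;
}
```

A signaling NaN in a later operand therefore takes priority over a
quiet NaN in an earlier one.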
 target/arm/tcg/sme_helper.c | 62 +++++++++++++++++++++++++++----------
 1 file changed, 46 insertions(+), 16 deletions(-)

diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c
index de0c6e54d4..8f33387e4b 100644
--- a/target/arm/tcg/sme_helper.c
+++ b/target/arm/tcg/sme_helper.c
@@ -1005,25 +1005,55 @@ static float32 f16_dotadd(float32 sum, uint32_t e1, uint32_t e2,
      *  - we have pre-set-up copy of s_std which is set to round-to-odd,
      *    for the multiply (see below)
      */
-    float64 e1r = float16_to_float64(e1 & 0xffff, true, s_f16);
-    float64 e1c = float16_to_float64(e1 >> 16, true, s_f16);
-    float64 e2r = float16_to_float64(e2 & 0xffff, true, s_f16);
-    float64 e2c = float16_to_float64(e2 >> 16, true, s_f16);
-    float64 t64;
+    float16 h1r = e1 & 0xffff;
+    float16 h1c = e1 >> 16;
+    float16 h2r = e2 & 0xffff;
+    float16 h2c = e2 >> 16;
     float32 t32;
 
-    /*
-     * The ARM pseudocode function FPDot performs both multiplies
-     * and the add with a single rounding operation.  Emulate this
-     * by performing the first multiply in round-to-odd, then doing
-     * the second multiply as fused multiply-add, and rounding to
-     * float32 all in one step.
-     */
-    t64 = float64_mul(e1r, e2r, s_odd);
-    t64 = float64r32_muladd(e1c, e2c, t64, 0, s_std);
+    /* C.f. FPProcessNaNs4 */
+    if (float16_is_any_nan(h1r) || float16_is_any_nan(h1c) ||
+        float16_is_any_nan(h2r) || float16_is_any_nan(h2c)) {
+        float16 t16;
 
-    /* This conversion is exact, because we've already rounded. */
-    t32 = float64_to_float32(t64, s_std);
+        if (float16_is_signaling_nan(h1r, s_f16)) {
+            t16 = h1r;
+        } else if (float16_is_signaling_nan(h1c, s_f16)) {
+            t16 = h1c;
+        } else if (float16_is_signaling_nan(h2r, s_f16)) {
+            t16 = h2r;
+        } else if (float16_is_signaling_nan(h2c, s_f16)) {
+            t16 = h2c;
+        } else if (float16_is_any_nan(h1r)) {
+            t16 = h1r;
+        } else if (float16_is_any_nan(h1c)) {
+            t16 = h1c;
+        } else if (float16_is_any_nan(h2r)) {
+            t16 = h2r;
+        } else {
+            t16 = h2c;
+        }
+        t32 = float16_to_float32(t16, true, s_f16);
+    } else {
+        float64 e1r = float16_to_float64(h1r, true, s_f16);
+        float64 e1c = float16_to_float64(h1c, true, s_f16);
+        float64 e2r = float16_to_float64(h2r, true, s_f16);
+        float64 e2c = float16_to_float64(h2c, true, s_f16);
+        float64 t64;
+
+        /*
+         * The ARM pseudocode function FPDot performs both multiplies
+         * and the add with a single rounding operation.  Emulate this
+         * by performing the first multiply in round-to-odd, then doing
+         * the second multiply as fused multiply-add, and rounding to
+         * float32 all in one step.
+         */
+        t64 = float64_mul(e1r, e2r, s_odd);
+        t64 = float64r32_muladd(e1c, e2c, t64, 0, s_std);
+
+        /* This conversion is exact, because we've already rounded. */
+        t32 = float64_to_float32(t64, s_std);
+    }
 
     /* The final accumulation step is not fused. */
     return float32_add(sum, t32, s_std);
-- 
2.43.0




* [PATCH v3 09/10] target/arm: Fix bfdotadd_ebf vs nan selection
  2025-07-02 12:22 [PATCH v3 00/10] target/arm: SME1/SVE2 fixes Richard Henderson
                   ` (7 preceding siblings ...)
  2025-07-02 12:22 ` [PATCH v3 08/10] target/arm: Fix f16_dotadd vs nan selection Richard Henderson
@ 2025-07-02 12:22 ` Richard Henderson
  2025-07-03  9:13   ` Peter Maydell
  2025-07-02 12:22 ` [PATCH v3 10/10] target/arm: Remove CPUARMState.vfp.scratch Richard Henderson
  9 siblings, 1 reply; 21+ messages in thread
From: Richard Henderson @ 2025-07-02 12:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm, peter.maydell, qemu-stable

Implement FPProcessNaNs4 within bfdotadd_ebf, rather than
simply letting NaNs propagate through the function.

Cc: qemu-stable@nongnu.org
Fixes: 0e1850182a1 ("target/arm: Implement FPCR.EBF=1 semantics for bfdotadd()")
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
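The patch classifies NaNs on float32 values because each bfloat16 half
of the packed input widens to float32 exactly, by placing it in the top
16 bits. A small sketch of that unpacking, with hypothetical helper
names and assuming the host float type is IEEE binary32:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Low bfloat16 of the pair, widened to float32 bits. */
static uint32_t bf16_pair_lo(uint32_t e)
{
    return e << 16;
}

/* High bfloat16 of the pair, widened to float32 bits. */
static uint32_t bf16_pair_hi(uint32_t e)
{
    return e & 0xffff0000u;
}

/* Reinterpret float32 bits as a host float. */
static float bits_to_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof(f));
    return f;
}
```

For the packed input 0x40003f80 the low half widens to 1.0f and the
high half to 2.0f.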
 target/arm/tcg/vec_helper.c | 75 ++++++++++++++++++++++++++-----------
 1 file changed, 53 insertions(+), 22 deletions(-)

diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
index 986eaf8ffa..21c6175d2e 100644
--- a/target/arm/tcg/vec_helper.c
+++ b/target/arm/tcg/vec_helper.c
@@ -2989,31 +2989,62 @@ float32 bfdotadd(float32 sum, uint32_t e1, uint32_t e2, float_status *fpst)
 float32 bfdotadd_ebf(float32 sum, uint32_t e1, uint32_t e2,
                      float_status *fpst, float_status *fpst_odd)
 {
-    /*
-     * Compare f16_dotadd() in sme_helper.c, but here we have
-     * bfloat16 inputs. In particular that means that we do not
-     * want the FPCR.FZ16 flush semantics, so we use the normal
-     * float_status for the input handling here.
-     */
-    float64 e1r = float32_to_float64(e1 << 16, fpst);
-    float64 e1c = float32_to_float64(e1 & 0xffff0000u, fpst);
-    float64 e2r = float32_to_float64(e2 << 16, fpst);
-    float64 e2c = float32_to_float64(e2 & 0xffff0000u, fpst);
-    float64 t64;
+    float32 s1r = e1 << 16;
+    float32 s1c = e1 & 0xffff0000u;
+    float32 s2r = e2 << 16;
+    float32 s2c = e2 & 0xffff0000u;
     float32 t32;
 
-    /*
-     * The ARM pseudocode function FPDot performs both multiplies
-     * and the add with a single rounding operation.  Emulate this
-     * by performing the first multiply in round-to-odd, then doing
-     * the second multiply as fused multiply-add, and rounding to
-     * float32 all in one step.
-     */
-    t64 = float64_mul(e1r, e2r, fpst_odd);
-    t64 = float64r32_muladd(e1c, e2c, t64, 0, fpst);
+    /* C.f. FPProcessNaNs4 */
+    if (float32_is_any_nan(s1r) || float32_is_any_nan(s1c) ||
+        float32_is_any_nan(s2r) || float32_is_any_nan(s2c)) {
+        if (float32_is_signaling_nan(s1r, fpst)) {
+            t32 = s1r;
+        } else if (float32_is_signaling_nan(s1c, fpst)) {
+            t32 = s1c;
+        } else if (float32_is_signaling_nan(s2r, fpst)) {
+            t32 = s2r;
+        } else if (float32_is_signaling_nan(s2c, fpst)) {
+            t32 = s2c;
+        } else if (float32_is_any_nan(s1r)) {
+            t32 = s1r;
+        } else if (float32_is_any_nan(s1c)) {
+            t32 = s1c;
+        } else if (float32_is_any_nan(s2r)) {
+            t32 = s2r;
+        } else {
+            t32 = s2c;
+        }
+        /*
+         * FPConvertNaN(FPProcessNaN(t32)) will be done as part
+         * of the final addition below.
+         */
+    } else {
+        /*
+         * Compare f16_dotadd() in sme_helper.c, but here we have
+         * bfloat16 inputs. In particular that means that we do not
+         * want the FPCR.FZ16 flush semantics, so we use the normal
+         * float_status for the input handling here.
+         */
+        float64 e1r = float32_to_float64(s1r, fpst);
+        float64 e1c = float32_to_float64(s1c, fpst);
+        float64 e2r = float32_to_float64(s2r, fpst);
+        float64 e2c = float32_to_float64(s2c, fpst);
+        float64 t64;
 
-    /* This conversion is exact, because we've already rounded. */
-    t32 = float64_to_float32(t64, fpst);
+        /*
+         * The ARM pseudocode function FPDot performs both multiplies
+         * and the add with a single rounding operation.  Emulate this
+         * by performing the first multiply in round-to-odd, then doing
+         * the second multiply as fused multiply-add, and rounding to
+         * float32 all in one step.
+         */
+        t64 = float64_mul(e1r, e2r, fpst_odd);
+        t64 = float64r32_muladd(e1c, e2c, t64, 0, fpst);
+
+        /* This conversion is exact, because we've already rounded. */
+        t32 = float64_to_float32(t64, fpst);
+    }
 
     /* The final accumulation step is not fused. */
     return float32_add(sum, t32, fpst);
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v3 10/10] target/arm: Remove CPUARMState.vfp.scratch
  2025-07-02 12:22 [PATCH v3 00/10] target/arm: SME1/SVE2 fixes Richard Henderson
                   ` (8 preceding siblings ...)
  2025-07-02 12:22 ` [PATCH v3 09/10] target/arm: Fix bfdotadd_ebf " Richard Henderson
@ 2025-07-02 12:22 ` Richard Henderson
  2025-07-02 13:47   ` Alex Bennée
  9 siblings, 1 reply; 21+ messages in thread
From: Richard Henderson @ 2025-07-02 12:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm, peter.maydell

The last use of this field was removed in b2fc7be972b9.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/cpu.h | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 302c24e232..15b47a5bfc 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -668,9 +668,6 @@ typedef struct CPUArchState {
 
         uint32_t xregs[16];
 
-        /* Scratch space for aa32 neon expansion.  */
-        uint32_t scratch[8];
-
         /* There are a number of distinct float control structures. */
         float_status fp_status[FPST_COUNT];
 
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 10/10] target/arm: Remove CPUARMState.vfp.scratch
  2025-07-02 12:22 ` [PATCH v3 10/10] target/arm: Remove CPUARMState.vfp.scratch Richard Henderson
@ 2025-07-02 13:47   ` Alex Bennée
  0 siblings, 0 replies; 21+ messages in thread
From: Alex Bennée @ 2025-07-02 13:47 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, qemu-arm, peter.maydell

Richard Henderson <richard.henderson@linaro.org> writes:

> The last use of this field was removed in b2fc7be972b9.
>
> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 04/10] target/arm: Replace @rda_rn_rm_e0 in sve.decode
  2025-07-02 12:22 ` [PATCH v3 04/10] target/arm: Replace @rda_rn_rm_e0 in sve.decode Richard Henderson
@ 2025-07-02 14:11   ` Richard Henderson
  2025-07-03  9:14   ` Peter Maydell
  1 sibling, 0 replies; 21+ messages in thread
From: Richard Henderson @ 2025-07-02 14:11 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm, peter.maydell

On 7/2/25 06:22, Richard Henderson wrote:
> Replace @rda_rn_rm_e0 with @rda_rn_rm_ex, and require
> users to supply an explicit esz.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   target/arm/tcg/sve.decode | 48 +++++++++++++++++++--------------------
>   1 file changed, 24 insertions(+), 24 deletions(-)

Bah.  Sorting error with too many patches.
This is not a SME1 bug fix, merely a code reorg for SME2.


r~


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 06/10] target/arm: Disable FEAT_F64MM if maximum SVE vector size too small
  2025-07-02 12:22 ` [PATCH v3 06/10] target/arm: Disable FEAT_F64MM if maximum SVE vector size too small Richard Henderson
@ 2025-07-02 18:15   ` Richard Henderson
  2025-07-03  9:10   ` Peter Maydell
  1 sibling, 0 replies; 21+ messages in thread
From: Richard Henderson @ 2025-07-02 18:15 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm, peter.maydell

On 7/2/25 06:22, Richard Henderson wrote:
> All F64MM instructions operate on a 256-bit vector.
> If only 128-bit vectors are supported by the cpu,
> then the cpu cannot enable F64MM.
> 
> Suggested-by: Peter Maydell <peter.maydell@linaro.org>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   target/arm/cpu64.c | 6 ++++++
>   1 file changed, 6 insertions(+)

Ho hum.  The idregs reorg landed overnight.  I will update my branch, but will not re-post 
right away.


r~

> 
> diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
> index 200da1c489..c5c289eadf 100644
> --- a/target/arm/cpu64.c
> +++ b/target/arm/cpu64.c
> @@ -237,6 +237,12 @@ void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp)
>       /* From now on sve_max_vq is the actual maximum supported length. */
>       cpu->sve_max_vq = max_vq;
>       cpu->sve_vq.map = vq_map;
> +
> +    /* FEAT_F64MM requires the existence of a 256-bit vector size. */
> +    if (max_vq < 2) {
> +        cpu->isar.id_aa64zfr0 = FIELD_DP64(cpu->isar.id_aa64zfr0,
> +                                           ID_AA64ZFR0, F64MM, 0);
> +    }
>   }
>   
>   /*



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 03/10] target/arm: Fix 128-bit element ZIP, UZP, TRN
  2025-07-02 12:22 ` [PATCH v3 03/10] target/arm: Fix 128-bit element ZIP, UZP, TRN Richard Henderson
@ 2025-07-03  9:08   ` Peter Maydell
  0 siblings, 0 replies; 21+ messages in thread
From: Peter Maydell @ 2025-07-03  9:08 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, qemu-arm, qemu-stable

On Wed, 2 Jul 2025 at 13:22, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> We missed the instructions UDEF when the vector size is too small.
> We missed marking the instructions non-streaming with SME.
>
> Cc: qemu-stable@nongnu.org
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 05/10] target/arm: Fix FMMLA (64-bit element) for 128-bit VL
  2025-07-02 12:22 ` [PATCH v3 05/10] target/arm: Fix FMMLA (64-bit element) for 128-bit VL Richard Henderson
@ 2025-07-03  9:08   ` Peter Maydell
  0 siblings, 0 replies; 21+ messages in thread
From: Peter Maydell @ 2025-07-03  9:08 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, qemu-arm

On Wed, 2 Jul 2025 at 13:22, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/tcg/translate-sve.c | 23 ++++++++++++++++-------
>  1 file changed, 16 insertions(+), 7 deletions(-)

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 07/10] target/arm: Fix PSEL size operands to tcg_gen_gvec_ands
  2025-07-02 12:22 ` [PATCH v3 07/10] target/arm: Fix PSEL size operands to tcg_gen_gvec_ands Richard Henderson
@ 2025-07-03  9:09   ` Peter Maydell
  0 siblings, 0 replies; 21+ messages in thread
From: Peter Maydell @ 2025-07-03  9:09 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, qemu-arm, qemu-stable

On Wed, 2 Jul 2025 at 13:22, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Gvec only operates on size 8 and multiples of 16.
> Predicates may be any multiple of 2.
> Round up the size using the appropriate function.
>
> Cc: qemu-stable@nongnu.org
> Fixes: 598ab0b24c0 ("target/arm: Implement PSEL")
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/tcg/translate-sve.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
> index a0de5b488d..8403034a0e 100644
> --- a/target/arm/tcg/translate-sve.c
> +++ b/target/arm/tcg/translate-sve.c
> @@ -7291,6 +7291,7 @@ static bool trans_PSEL(DisasContext *s, arg_psel *a)
>      tcg_gen_neg_i64(tmp, tmp);
>
>      /* Apply to either copy the source, or write zeros. */
> +    pl = size_for_gvec(pl);
>      tcg_gen_gvec_ands(MO_64, pred_full_reg_offset(s, a->pd),
>                        pred_full_reg_offset(s, a->pn), tmp, pl, pl);
>      return true;
> --
> 2.43.0

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM
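
For readers unfamiliar with the constraint in the commit message above, here is a minimal standalone sketch of the rounding rule it describes: gvec sizes must be 8 or a multiple of 16 bytes, while a predicate register size may be any multiple of 2 bytes. The names `gvec_size_ok` and `round_up_for_gvec` are hypothetical and exist only for this illustration; the patch itself calls QEMU's `size_for_gvec`, whose body is not shown in this thread.

```c
#include <stdbool.h>

/* Hypothetical helpers for illustration only, not the QEMU source. */

/* A size is acceptable to gvec if it is 8 or a multiple of 16. */
bool gvec_size_ok(int size)
{
    return size == 8 || (size > 0 && size % 16 == 0);
}

/* Round a predicate size in bytes up to the next gvec-acceptable size. */
int round_up_for_gvec(int size)
{
    if (size <= 8) {
        return 8;
    }
    return (size + 15) & ~15;   /* align up to a multiple of 16 */
}
```

With a 384-bit vector length the predicate size is 6 bytes, which this sketch rounds to 8; with 640 bits it is 10 bytes, which rounds to 16.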


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 06/10] target/arm: Disable FEAT_F64MM if maximum SVE vector size too small
  2025-07-02 12:22 ` [PATCH v3 06/10] target/arm: Disable FEAT_F64MM if maximum SVE vector size too small Richard Henderson
  2025-07-02 18:15   ` Richard Henderson
@ 2025-07-03  9:10   ` Peter Maydell
  1 sibling, 0 replies; 21+ messages in thread
From: Peter Maydell @ 2025-07-03  9:10 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, qemu-arm

On Wed, 2 Jul 2025 at 13:22, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> All F64MM instructions operate on a 256-bit vector.
> If only 128-bit vectors are supported by the cpu,
> then the cpu cannot enable F64MM.
>
> Suggested-by: Peter Maydell <peter.maydell@linaro.org>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/cpu64.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
> index 200da1c489..c5c289eadf 100644
> --- a/target/arm/cpu64.c
> +++ b/target/arm/cpu64.c
> @@ -237,6 +237,12 @@ void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp)
>      /* From now on sve_max_vq is the actual maximum supported length. */
>      cpu->sve_max_vq = max_vq;
>      cpu->sve_vq.map = vq_map;
> +
> +    /* FEAT_F64MM requires the existence of a 256-bit vector size. */
> +    if (max_vq < 2) {
> +        cpu->isar.id_aa64zfr0 = FIELD_DP64(cpu->isar.id_aa64zfr0,
> +                                           ID_AA64ZFR0, F64MM, 0);
> +    }
>  }

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

(with, as you say, the obvious fixup for the id register changes)

thanks
-- PMM
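
The `max_vq < 2` test in the patch follows from SVE vector lengths being counted in 128-bit quadwords, so `max_vq == 1` means only 128-bit vectors exist. A minimal standalone sketch of the same idea follows. The helper names are hypothetical, and the field position (bits [59:56] of ID_AA64ZFR0_EL1) is an assumption taken from my reading of the Arm ARM rather than from this thread; QEMU itself uses `FIELD_DP64` as shown in the patch.

```c
#include <stdint.h>

/* Assumed field position of F64MM in ID_AA64ZFR0_EL1: bits [59:56]. */
#define F64MM_SHIFT 56
#define F64MM_MASK  (UINT64_C(0xf) << F64MM_SHIFT)

/* Hypothetical helper: vq counts 128-bit quadwords. */
unsigned vq_to_bits(unsigned vq)
{
    return vq * 128;
}

/* Clear the F64MM field when no 256-bit vector length is available. */
uint64_t finalize_f64mm(uint64_t zfr0, unsigned max_vq)
{
    if (vq_to_bits(max_vq) < 256) {
        zfr0 &= ~F64MM_MASK;
    }
    return zfr0;
}
```

Other fields of the register are left untouched, which matches the intent of depositing 0 into a single field.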


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 08/10] target/arm: Fix f16_dotadd vs nan selection
  2025-07-02 12:22 ` [PATCH v3 08/10] target/arm: Fix f16_dotadd vs nan selection Richard Henderson
@ 2025-07-03  9:12   ` Peter Maydell
  0 siblings, 0 replies; 21+ messages in thread
From: Peter Maydell @ 2025-07-03  9:12 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, qemu-arm, qemu-stable

On Wed, 2 Jul 2025 at 13:22, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Implement FPProcessNaNs4 within f16_dotadd, rather than
> simply letting NaNs propagate through the function.
>
> Cc: qemu-stable@nongnu.org
> Fixes: 3916841ac75 ("target/arm: Implement FMOPA, FMOPS (widening)")
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/tcg/sme_helper.c | 62 +++++++++++++++++++++++++++----------
>  1 file changed, 46 insertions(+), 16 deletions(-)
>

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 09/10] target/arm: Fix bfdotadd_ebf vs nan selection
  2025-07-02 12:22 ` [PATCH v3 09/10] target/arm: Fix bfdotadd_ebf " Richard Henderson
@ 2025-07-03  9:13   ` Peter Maydell
  0 siblings, 0 replies; 21+ messages in thread
From: Peter Maydell @ 2025-07-03  9:13 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, qemu-arm, qemu-stable

On Wed, 2 Jul 2025 at 13:22, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Implement FPProcessNaNs4 within f16_dotadd, rather than

should be "bfdotadd_ebf" ?

> simply letting NaNs propagate through the function.
>
> Cc: qemu-stable@nongnu.org
> Fixes: 0e1850182a1 ("target/arm: Implement FPCR.EBF=1 semantics for bfdotadd()")
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>


> +    /* C.f. FPProcessNaNs4 */
> +    if (float32_is_any_nan(s1r) || float32_is_any_nan(s1c) ||
> +        float32_is_any_nan(s2r) || float32_is_any_nan(s2c)) {
> +        if (float32_is_signaling_nan(s2r, fpst)) {
> +            t32 = s2r;
> +        } else if (float32_is_signaling_nan(s2c, fpst)) {
> +            t32 = s2c;
> +        } else if (float32_is_signaling_nan(s2r, fpst)) {
> +            t32 = s2r;
> +        } else if (float32_is_signaling_nan(s2c, fpst)) {
> +            t32 = s2c;
> +        } else if (float32_is_any_nan(s2r)) {
> +            t32 = s2r;
> +        } else if (float32_is_any_nan(s2c)) {
> +            t32 = s2c;
> +        } else if (float32_is_any_nan(s2r)) {
> +            t32 = s2r;
> +        } else {
> +            t32 = s2c;
> +        }

Looks like a cut-and-paste error -- we check s2r and s2c
twice and never look at s1r and s1c.

-- PMM
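
To make the review comment concrete, here is a sketch of a selection order that looks at all four inputs: the first operand pair before the second, and signaling NaNs before quiet ones. This is my reading of FPProcessNaNs4 and of the comment above, not code from a later revision of the patch. All names are hypothetical, and the helpers work on raw IEEE binary32 bit patterns rather than QEMU's softfloat `float32` type, assuming the usual Arm convention that a set bit 22 marks a quiet NaN.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers on raw binary32 bit patterns. */
bool f32_is_any_nan(uint32_t a)
{
    /* Exponent all ones and a non-zero fraction. */
    return (a & 0x7fffffffu) > 0x7f800000u;
}

bool f32_is_signaling_nan(uint32_t a)
{
    /* Assumes quiet NaNs have bit 22 set. */
    return f32_is_any_nan(a) && !(a & 0x00400000u);
}

/*
 * Pick the NaN to propagate from four inputs, in the order
 * s1r, s1c, s2r, s2c: signaling NaNs first, then quiet NaNs.
 * The caller is expected to have checked that at least one
 * input is a NaN; 0 is returned otherwise.
 */
uint32_t select_nan4(uint32_t s1r, uint32_t s1c,
                     uint32_t s2r, uint32_t s2c)
{
    const uint32_t ops[4] = { s1r, s1c, s2r, s2c };

    for (int i = 0; i < 4; i++) {
        if (f32_is_signaling_nan(ops[i])) {
            return ops[i];
        }
    }
    for (int i = 0; i < 4; i++) {
        if (f32_is_any_nan(ops[i])) {
            return ops[i];
        }
    }
    return 0;
}
```

The case that distinguishes this from the quoted hunk is a NaN that appears only in `s1r` or `s1c`: the quoted code would fall through to the final `else` and return `s2c`, which is not a NaN at all.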


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 04/10] target/arm: Replace @rda_rn_rm_e0 in sve.decode
  2025-07-02 12:22 ` [PATCH v3 04/10] target/arm: Replace @rda_rn_rm_e0 in sve.decode Richard Henderson
  2025-07-02 14:11   ` Richard Henderson
@ 2025-07-03  9:14   ` Peter Maydell
  1 sibling, 0 replies; 21+ messages in thread
From: Peter Maydell @ 2025-07-03  9:14 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, qemu-arm

On Wed, 2 Jul 2025 at 13:22, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Replace @rda_rn_rm_e0 with @rda_rn_rm_ex, and require
> users to supply an explicit esz.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/tcg/sve.decode | 48 +++++++++++++++++++--------------------
>  1 file changed, 24 insertions(+), 24 deletions(-)

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM


^ permalink raw reply	[flat|nested] 21+ messages in thread
