[PATCH for-10.1 00/10] target/arm: Some SVE2p1 fixes

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

* [PATCH for-10.1 00/10] target/arm: Some SVE2p1 fixes
@ 2025-07-18 17:30 Peter Maydell
  2025-07-18 17:30 ` [PATCH for-10.1 01/10] target/arm: Add BFADD, BFSUB, BFMUL (unpredicated) Peter Maydell
                   ` (9 more replies)
  0 siblings, 10 replies; 22+ messages in thread
From: Peter Maydell @ 2025-07-18 17:30 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

This patchset fixes some problems with the recently landed SVE2p1
feature. Notably, we missed out several bfloat16 insns entirely...
This patchset adds in the missing insns, and also fixes some
bugs in the FMAXQV etc insns, and an incorrect assertion in LD1Q.

I know this is slightly pushing the boundary of what counts as a
bugfix for 10.1, but it would be much more disruptive to try to unwind
the SVE2p1/SME2p1/SME2 features, and these are fixing the bug that we
UNDEF when we should not :-). I've set the Fixes: tag to point at the
commit where we enabled an accidentally incomplete FEAT_SVE_B16B16.

thanks
-- PMM

Peter Maydell (10):
  target/arm: Add BFADD, BFSUB, BFMUL (unpredicated)
  target/arm: Add BFADD, BFSUB, BFMUL, BFMAXNM, BFMINNM (predicated)
  target/arm: Add BFMIN, BFMAX (predicated)
  target/arm: Add BFMUL (indexed)
  target/arm: Add BFMLA, BFMLS (vectors)
  target/arm: Add BFMLA, BFMLS (indexed)
  target/arm: Correct sense of FPCR.AH test for FMAXQV and FMINQV
  target/arm: Don't nest H() macro calls in SVE DO_REDUCE
  target/arm: Honour FPCR.AH=1 default NaN value in FMAXNMQV, FMINNMQV
  target/arm: Make LD1Q decode and trans fn agree about a->u

 target/arm/tcg/helper-sve.h    |  32 ++++++++++
 target/arm/tcg/helper.h        |   5 ++
 target/arm/tcg/sve.decode      |   5 +-
 target/arm/tcg/sve_helper.c    | 109 +++++++++++++++++++++++++++++----
 target/arm/tcg/translate-sve.c |  97 +++++++++++++++++++++--------
 target/arm/tcg/vec_helper.c    |   4 ++
 6 files changed, 212 insertions(+), 40 deletions(-)

-- 
2.43.0



^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH for-10.1 01/10] target/arm: Add BFADD, BFSUB, BFMUL (unpredicated)
  2025-07-18 17:30 [PATCH for-10.1 00/10] target/arm: Some SVE2p1 fixes Peter Maydell
@ 2025-07-18 17:30 ` Peter Maydell
  2025-07-18 21:29   ` Richard Henderson
  2025-07-18 17:30 ` [PATCH for-10.1 02/10] target/arm: Add BFADD, BFSUB, BFMUL, BFMAXNM, BFMINNM (predicated) Peter Maydell
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 22+ messages in thread
From: Peter Maydell @ 2025-07-18 17:30 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

FEAT_SVE_B16B16 adds bfloat16 versions of the SVE floating point
(unpredicated) instructions, which are encoded via sz==0b00.

Fixes: 7b1613a1020d2942 ("target/arm: Enable FEAT_SME2p1 on -cpu max")
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/helper.h        | 3 +++
 target/arm/tcg/translate-sve.c | 6 +++++-
 target/arm/tcg/vec_helper.c    | 3 +++
 3 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/target/arm/tcg/helper.h b/target/arm/tcg/helper.h
index 0a006d95142..d9ca5b7c56e 100644
--- a/target/arm/tcg/helper.h
+++ b/target/arm/tcg/helper.h
@@ -728,16 +728,19 @@ DEF_HELPER_FLAGS_4(gvec_fclt0_h, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_4(gvec_fclt0_s, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_4(gvec_fclt0_d, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32)
 
+DEF_HELPER_FLAGS_5(gvec_fadd_b16, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_5(gvec_fadd_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_5(gvec_fadd_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_5(gvec_fadd_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_5(gvec_bfadd, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32)
 
+DEF_HELPER_FLAGS_5(gvec_fsub_b16, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_5(gvec_fsub_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_5(gvec_fsub_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_5(gvec_fsub_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_5(gvec_bfsub, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32)
 
+DEF_HELPER_FLAGS_5(gvec_fmul_b16, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_5(gvec_fmul_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_5(gvec_fmul_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_5(gvec_fmul_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32)
diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
index 7b575734fde..f00cccf1548 100644
--- a/target/arm/tcg/translate-sve.c
+++ b/target/arm/tcg/translate-sve.c
@@ -190,6 +190,10 @@ static bool gen_gvec_fpst_zzz(DisasContext *s, gen_helper_gvec_3_ptr *fn,
 static bool gen_gvec_fpst_arg_zzz(DisasContext *s, gen_helper_gvec_3_ptr *fn,
                                   arg_rrr_esz *a, int data)
 {
+    /* These insns use MO_8 to encode BFloat16 */
+    if (a->esz == MO_8 && !dc_isar_feature(aa64_sve_b16b16, s)) {
+        return false;
+    }
     return gen_gvec_fpst_zzz(s, fn, a->rd, a->rn, a->rm, data,
                              a->esz == MO_16 ? FPST_A64_F16 : FPST_A64);
 }
@@ -4146,7 +4150,7 @@ static bool trans_FADDA(DisasContext *s, arg_rprr_esz *a)
 
 #define DO_FP3(NAME, name) \
     static gen_helper_gvec_3_ptr * const name##_fns[4] = {          \
-        NULL, gen_helper_gvec_##name##_h,                           \
+        gen_helper_gvec_##name##_b16, gen_helper_gvec_##name##_h,   \
         gen_helper_gvec_##name##_s, gen_helper_gvec_##name##_d      \
     };                                                              \
     TRANS_FEAT(NAME, aa64_sve, gen_gvec_fpst_arg_zzz, name##_fns[a->esz], a, 0)
diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
index bae6165b505..76a9ab0da39 100644
--- a/target/arm/tcg/vec_helper.c
+++ b/target/arm/tcg/vec_helper.c
@@ -1467,16 +1467,19 @@ void HELPER(NAME)(void *vd, void *vn, void *vm,                            \
     clear_tail(d, oprsz, simd_maxsz(desc));                                \
 }
 
+DO_3OP(gvec_fadd_b16, bfloat16_add, float16)
 DO_3OP(gvec_fadd_h, float16_add, float16)
 DO_3OP(gvec_fadd_s, float32_add, float32)
 DO_3OP(gvec_fadd_d, float64_add, float64)
 DO_3OP(gvec_bfadd, bfloat16_add, bfloat16)
 
+DO_3OP(gvec_fsub_b16, bfloat16_sub, float16)
 DO_3OP(gvec_fsub_h, float16_sub, float16)
 DO_3OP(gvec_fsub_s, float32_sub, float32)
 DO_3OP(gvec_fsub_d, float64_sub, float64)
 DO_3OP(gvec_bfsub, bfloat16_sub, bfloat16)
 
+DO_3OP(gvec_fmul_b16, bfloat16_mul, float16)
 DO_3OP(gvec_fmul_h, float16_mul, float16)
 DO_3OP(gvec_fmul_s, float32_mul, float32)
 DO_3OP(gvec_fmul_d, float64_mul, float64)
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH for-10.1 02/10] target/arm: Add BFADD, BFSUB, BFMUL, BFMAXNM, BFMINNM (predicated)
  2025-07-18 17:30 [PATCH for-10.1 00/10] target/arm: Some SVE2p1 fixes Peter Maydell
  2025-07-18 17:30 ` [PATCH for-10.1 01/10] target/arm: Add BFADD, BFSUB, BFMUL (unpredicated) Peter Maydell
@ 2025-07-18 17:30 ` Peter Maydell
  2025-07-18 21:32   ` Richard Henderson
  2025-07-18 17:30 ` [PATCH for-10.1 03/10] target/arm: Add BFMIN, BFMAX (predicated) Peter Maydell
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 22+ messages in thread
From: Peter Maydell @ 2025-07-18 17:30 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

FEAT_SVE_B16B16 adds bfloat16 versions of the SVE floating point
(predicated) instructions, which are encoded via sz=0b00.
Add BFADD, BFSUB, BFMUL, BFMAXNM, BFMINNM; these are all the insns
in this group which do not change behaviour for AH=1.

We will deal with BFMAX/BFMIN (which do have different AH=1
behaviour) in a following commit.

Fixes: 7b1613a1020d2942 ("target/arm: Enable FEAT_SME2p1 on -cpu max")
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/helper-sve.h    | 10 ++++++++++
 target/arm/tcg/sve_helper.c    |  5 +++++
 target/arm/tcg/translate-sve.c | 22 +++++++++++++++++-----
 3 files changed, 32 insertions(+), 5 deletions(-)

diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h
index c36090d13d1..d612bcaded3 100644
--- a/target/arm/tcg/helper-sve.h
+++ b/target/arm/tcg/helper-sve.h
@@ -1196,6 +1196,8 @@ DEF_HELPER_FLAGS_5(sve_fcmne0_s, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_5(sve_fcmne0_d, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, fpst, i32)
 
+DEF_HELPER_FLAGS_6(sve_fadd_b16, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_6(sve_fadd_h, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_6(sve_fadd_s, TCG_CALL_NO_RWG,
@@ -1203,6 +1205,8 @@ DEF_HELPER_FLAGS_6(sve_fadd_s, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_6(sve_fadd_d, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, fpst, i32)
 
+DEF_HELPER_FLAGS_6(sve_fsub_b16, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_6(sve_fsub_h, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_6(sve_fsub_s, TCG_CALL_NO_RWG,
@@ -1210,6 +1214,8 @@ DEF_HELPER_FLAGS_6(sve_fsub_s, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_6(sve_fsub_d, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, fpst, i32)
 
+DEF_HELPER_FLAGS_6(sve_fmul_b16, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_6(sve_fmul_h, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_6(sve_fmul_s, TCG_CALL_NO_RWG,
@@ -1252,6 +1258,8 @@ DEF_HELPER_FLAGS_6(sve_ah_fmax_s, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_6(sve_ah_fmax_d, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, fpst, i32)
 
+DEF_HELPER_FLAGS_6(sve_fminnum_b16, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_6(sve_fminnum_h, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_6(sve_fminnum_s, TCG_CALL_NO_RWG,
@@ -1259,6 +1267,8 @@ DEF_HELPER_FLAGS_6(sve_fminnum_s, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_6(sve_fminnum_d, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, fpst, i32)
 
+DEF_HELPER_FLAGS_6(sve_fmaxnum_b16, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_6(sve_fmaxnum_h, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_6(sve_fmaxnum_s, TCG_CALL_NO_RWG,
diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c
index 43b872c7fd6..a229503bc21 100644
--- a/target/arm/tcg/sve_helper.c
+++ b/target/arm/tcg/sve_helper.c
@@ -4629,14 +4629,17 @@ void HELPER(NAME)(void *vd, void *vn, void *vm, void *vg,       \
     } while (i != 0);                                           \
 }
 
+DO_ZPZZ_FP(sve_fadd_b16, uint16_t, H1_2, bfloat16_add)
 DO_ZPZZ_FP(sve_fadd_h, uint16_t, H1_2, float16_add)
 DO_ZPZZ_FP(sve_fadd_s, uint32_t, H1_4, float32_add)
 DO_ZPZZ_FP(sve_fadd_d, uint64_t, H1_8, float64_add)
 
+DO_ZPZZ_FP(sve_fsub_b16, uint16_t, H1_2, bfloat16_sub)
 DO_ZPZZ_FP(sve_fsub_h, uint16_t, H1_2, float16_sub)
 DO_ZPZZ_FP(sve_fsub_s, uint32_t, H1_4, float32_sub)
 DO_ZPZZ_FP(sve_fsub_d, uint64_t, H1_8, float64_sub)
 
+DO_ZPZZ_FP(sve_fmul_b16, uint16_t, H1_2, bfloat16_mul)
 DO_ZPZZ_FP(sve_fmul_h, uint16_t, H1_2, float16_mul)
 DO_ZPZZ_FP(sve_fmul_s, uint32_t, H1_4, float32_mul)
 DO_ZPZZ_FP(sve_fmul_d, uint64_t, H1_8, float64_mul)
@@ -4661,10 +4664,12 @@ DO_ZPZZ_FP(sve_ah_fmax_h, uint16_t, H1_2, helper_vfp_ah_maxh)
 DO_ZPZZ_FP(sve_ah_fmax_s, uint32_t, H1_4, helper_vfp_ah_maxs)
 DO_ZPZZ_FP(sve_ah_fmax_d, uint64_t, H1_8, helper_vfp_ah_maxd)
 
+DO_ZPZZ_FP(sve_fminnum_b16, uint16_t, H1_2, bfloat16_minnum)
 DO_ZPZZ_FP(sve_fminnum_h, uint16_t, H1_2, float16_minnum)
 DO_ZPZZ_FP(sve_fminnum_s, uint32_t, H1_4, float32_minnum)
 DO_ZPZZ_FP(sve_fminnum_d, uint64_t, H1_8, float64_minnum)
 
+DO_ZPZZ_FP(sve_fmaxnum_b16, uint16_t, H1_2, bfloat16_maxnum)
 DO_ZPZZ_FP(sve_fmaxnum_h, uint16_t, H1_2, float16_maxnum)
 DO_ZPZZ_FP(sve_fmaxnum_s, uint32_t, H1_4, float32_maxnum)
 DO_ZPZZ_FP(sve_fmaxnum_d, uint64_t, H1_8, float64_maxnum)
diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
index f00cccf1548..2739c226d73 100644
--- a/target/arm/tcg/translate-sve.c
+++ b/target/arm/tcg/translate-sve.c
@@ -407,6 +407,10 @@ static bool gen_gvec_fpst_zzzp(DisasContext *s, gen_helper_gvec_4_ptr *fn,
 static bool gen_gvec_fpst_arg_zpzz(DisasContext *s, gen_helper_gvec_4_ptr *fn,
                                    arg_rprr_esz *a)
 {
+    /* These insns use MO_8 to encode BFloat16. */
+    if (a->esz == MO_8 && !dc_isar_feature(aa64_sve_b16b16, s)) {
+        return false;
+    }
     return gen_gvec_fpst_zzzp(s, fn, a->rd, a->rn, a->rm, a->pg, 0,
                               a->esz == MO_16 ? FPST_A64_F16 : FPST_A64);
 }
@@ -4206,13 +4210,21 @@ TRANS_FEAT_NONSTREAMING(FTSMUL, aa64_sve, gen_gvec_fpst_arg_zzz,
                s->fpcr_ah ? name##_ah_zpzz_fns[a->esz] :                \
                name##_zpzz_fns[a->esz], a)
 
-DO_ZPZZ_FP(FADD_zpzz, aa64_sve, sve_fadd)
-DO_ZPZZ_FP(FSUB_zpzz, aa64_sve, sve_fsub)
-DO_ZPZZ_FP(FMUL_zpzz, aa64_sve, sve_fmul)
+/* Similar, but for insns where sz == 0 encodes bfloat16 */
+#define DO_ZPZZ_FP_B16(NAME, FEAT, name) \
+    static gen_helper_gvec_4_ptr * const name##_zpzz_fns[4] = { \
+        gen_helper_##name##_b16, gen_helper_##name##_h,         \
+        gen_helper_##name##_s, gen_helper_##name##_d            \
+    };                                                          \
+    TRANS_FEAT(NAME, FEAT, gen_gvec_fpst_arg_zpzz, name##_zpzz_fns[a->esz], a)
+
+DO_ZPZZ_FP_B16(FADD_zpzz, aa64_sve, sve_fadd)
+DO_ZPZZ_FP_B16(FSUB_zpzz, aa64_sve, sve_fsub)
+DO_ZPZZ_FP_B16(FMUL_zpzz, aa64_sve, sve_fmul)
 DO_ZPZZ_AH_FP(FMIN_zpzz, aa64_sve, sve_fmin, sve_ah_fmin)
 DO_ZPZZ_AH_FP(FMAX_zpzz, aa64_sve, sve_fmax, sve_ah_fmax)
-DO_ZPZZ_FP(FMINNM_zpzz, aa64_sve, sve_fminnum)
-DO_ZPZZ_FP(FMAXNM_zpzz, aa64_sve, sve_fmaxnum)
+DO_ZPZZ_FP_B16(FMINNM_zpzz, aa64_sve, sve_fminnum)
+DO_ZPZZ_FP_B16(FMAXNM_zpzz, aa64_sve, sve_fmaxnum)
 DO_ZPZZ_AH_FP(FABD, aa64_sve, sve_fabd, sve_ah_fabd)
 DO_ZPZZ_FP(FSCALE, aa64_sve, sve_fscalbn)
 DO_ZPZZ_FP(FDIV, aa64_sve, sve_fdiv)
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH for-10.1 03/10] target/arm: Add BFMIN, BFMAX (predicated)
  2025-07-18 17:30 [PATCH for-10.1 00/10] target/arm: Some SVE2p1 fixes Peter Maydell
  2025-07-18 17:30 ` [PATCH for-10.1 01/10] target/arm: Add BFADD, BFSUB, BFMUL (unpredicated) Peter Maydell
  2025-07-18 17:30 ` [PATCH for-10.1 02/10] target/arm: Add BFADD, BFSUB, BFMUL, BFMAXNM, BFMINNM (predicated) Peter Maydell
@ 2025-07-18 17:30 ` Peter Maydell
  2025-07-18 21:34   ` Richard Henderson
  2025-07-18 17:30 ` [PATCH for-10.1 04/10] target/arm: Add BFMUL (indexed) Peter Maydell
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 22+ messages in thread
From: Peter Maydell @ 2025-07-18 17:30 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

FEAT_SVE_B16B16 adds bfloat16 versions of the SVE floating point
(predicated) instructions, which are encoded via sz=0b00.  Add the
BFMAX and BFMIN insns.  These have separate behaviour for AH=1 and
AH=0; we have already implemented the AH=1 helper for the SME2
versions of these insns.

Fixes: 7b1613a1020d2942 ("target/arm: Enable FEAT_SME2p1 on -cpu max")
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/helper-sve.h    |  8 ++++++++
 target/arm/tcg/sve_helper.c    |  4 ++++
 target/arm/tcg/translate-sve.c | 17 +++++++++++++++--
 3 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h
index d612bcaded3..cb6c2355e58 100644
--- a/target/arm/tcg/helper-sve.h
+++ b/target/arm/tcg/helper-sve.h
@@ -1230,6 +1230,8 @@ DEF_HELPER_FLAGS_6(sve_fdiv_s, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_6(sve_fdiv_d, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, fpst, i32)
 
+DEF_HELPER_FLAGS_6(sve_fmin_b16, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_6(sve_fmin_h, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_6(sve_fmin_s, TCG_CALL_NO_RWG,
@@ -1237,6 +1239,8 @@ DEF_HELPER_FLAGS_6(sve_fmin_s, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_6(sve_fmin_d, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, fpst, i32)
 
+DEF_HELPER_FLAGS_6(sve_fmax_b16, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_6(sve_fmax_h, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_6(sve_fmax_s, TCG_CALL_NO_RWG,
@@ -1244,6 +1248,8 @@ DEF_HELPER_FLAGS_6(sve_fmax_s, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_6(sve_fmax_d, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, fpst, i32)
 
+DEF_HELPER_FLAGS_6(sve_ah_fmin_b16, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_6(sve_ah_fmin_h, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_6(sve_ah_fmin_s, TCG_CALL_NO_RWG,
@@ -1251,6 +1257,8 @@ DEF_HELPER_FLAGS_6(sve_ah_fmin_s, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_6(sve_ah_fmin_d, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, fpst, i32)
 
+DEF_HELPER_FLAGS_6(sve_ah_fmax_b16, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_6(sve_ah_fmax_h, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_6(sve_ah_fmax_s, TCG_CALL_NO_RWG,
diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c
index a229503bc21..1a56fa86d9c 100644
--- a/target/arm/tcg/sve_helper.c
+++ b/target/arm/tcg/sve_helper.c
@@ -4648,18 +4648,22 @@ DO_ZPZZ_FP(sve_fdiv_h, uint16_t, H1_2, float16_div)
 DO_ZPZZ_FP(sve_fdiv_s, uint32_t, H1_4, float32_div)
 DO_ZPZZ_FP(sve_fdiv_d, uint64_t, H1_8, float64_div)
 
+DO_ZPZZ_FP(sve_fmin_b16, uint16_t, H1_2, bfloat16_min)
 DO_ZPZZ_FP(sve_fmin_h, uint16_t, H1_2, float16_min)
 DO_ZPZZ_FP(sve_fmin_s, uint32_t, H1_4, float32_min)
 DO_ZPZZ_FP(sve_fmin_d, uint64_t, H1_8, float64_min)
 
+DO_ZPZZ_FP(sve_fmax_b16, uint16_t, H1_2, bfloat16_max)
 DO_ZPZZ_FP(sve_fmax_h, uint16_t, H1_2, float16_max)
 DO_ZPZZ_FP(sve_fmax_s, uint32_t, H1_4, float32_max)
 DO_ZPZZ_FP(sve_fmax_d, uint64_t, H1_8, float64_max)
 
+DO_ZPZZ_FP(sve_ah_fmin_b16, uint16_t, H1_2, helper_sme2_ah_fmin_b16)
 DO_ZPZZ_FP(sve_ah_fmin_h, uint16_t, H1_2, helper_vfp_ah_minh)
 DO_ZPZZ_FP(sve_ah_fmin_s, uint32_t, H1_4, helper_vfp_ah_mins)
 DO_ZPZZ_FP(sve_ah_fmin_d, uint64_t, H1_8, helper_vfp_ah_mind)
 
+DO_ZPZZ_FP(sve_ah_fmax_b16, uint16_t, H1_2, helper_sme2_ah_fmax_b16)
 DO_ZPZZ_FP(sve_ah_fmax_h, uint16_t, H1_2, helper_vfp_ah_maxh)
 DO_ZPZZ_FP(sve_ah_fmax_s, uint32_t, H1_4, helper_vfp_ah_maxs)
 DO_ZPZZ_FP(sve_ah_fmax_d, uint64_t, H1_8, helper_vfp_ah_maxd)
diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
index 2739c226d73..27af3df9a4b 100644
--- a/target/arm/tcg/translate-sve.c
+++ b/target/arm/tcg/translate-sve.c
@@ -4218,11 +4218,24 @@ TRANS_FEAT_NONSTREAMING(FTSMUL, aa64_sve, gen_gvec_fpst_arg_zzz,
     };                                                          \
     TRANS_FEAT(NAME, FEAT, gen_gvec_fpst_arg_zpzz, name##_zpzz_fns[a->esz], a)
 
+#define DO_ZPZZ_AH_FP_B16(NAME, FEAT, name, ah_name)                    \
+    static gen_helper_gvec_4_ptr * const name##_zpzz_fns[4] = {         \
+        gen_helper_##name##_b16, gen_helper_##name##_h,                 \
+        gen_helper_##name##_s, gen_helper_##name##_d                    \
+    };                                                                  \
+    static gen_helper_gvec_4_ptr * const name##_ah_zpzz_fns[4] = {      \
+        gen_helper_##ah_name##_b16, gen_helper_##ah_name##_h,           \
+        gen_helper_##ah_name##_s, gen_helper_##ah_name##_d              \
+    };                                                                  \
+    TRANS_FEAT(NAME, FEAT, gen_gvec_fpst_arg_zpzz,                      \
+               s->fpcr_ah ? name##_ah_zpzz_fns[a->esz] :                \
+               name##_zpzz_fns[a->esz], a)
+
 DO_ZPZZ_FP_B16(FADD_zpzz, aa64_sve, sve_fadd)
 DO_ZPZZ_FP_B16(FSUB_zpzz, aa64_sve, sve_fsub)
 DO_ZPZZ_FP_B16(FMUL_zpzz, aa64_sve, sve_fmul)
-DO_ZPZZ_AH_FP(FMIN_zpzz, aa64_sve, sve_fmin, sve_ah_fmin)
-DO_ZPZZ_AH_FP(FMAX_zpzz, aa64_sve, sve_fmax, sve_ah_fmax)
+DO_ZPZZ_AH_FP_B16(FMIN_zpzz, aa64_sve, sve_fmin, sve_ah_fmin)
+DO_ZPZZ_AH_FP_B16(FMAX_zpzz, aa64_sve, sve_fmax, sve_ah_fmax)
 DO_ZPZZ_FP_B16(FMINNM_zpzz, aa64_sve, sve_fminnum)
 DO_ZPZZ_FP_B16(FMAXNM_zpzz, aa64_sve, sve_fmaxnum)
 DO_ZPZZ_AH_FP(FABD, aa64_sve, sve_fabd, sve_ah_fabd)
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH for-10.1 04/10] target/arm: Add BFMUL (indexed)
  2025-07-18 17:30 [PATCH for-10.1 00/10] target/arm: Some SVE2p1 fixes Peter Maydell
                   ` (2 preceding siblings ...)
  2025-07-18 17:30 ` [PATCH for-10.1 03/10] target/arm: Add BFMIN, BFMAX (predicated) Peter Maydell
@ 2025-07-18 17:30 ` Peter Maydell
  2025-07-18 21:38   ` Richard Henderson
  2025-07-18 17:30 ` [PATCH for-10.1 05/10] target/arm: Add BFMLA, BFMLS (vectors) Peter Maydell
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 22+ messages in thread
From: Peter Maydell @ 2025-07-18 17:30 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

FEAT_SVE_B16B16 adds a bfloat16 version of the FMUL insn in the
floating-point multiply (indexed) instruction group. The encoding
is slightly bespoke; in our implementation we use MO_8 to indicate
bfloat16, as with the other B16B16 insns.

Fixes: 7b1613a1020d2942 ("target/arm: Enable FEAT_SME2p1 on -cpu max")
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/helper.h        | 2 ++
 target/arm/tcg/sve.decode      | 1 +
 target/arm/tcg/translate-sve.c | 2 +-
 target/arm/tcg/vec_helper.c    | 1 +
 4 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/target/arm/tcg/helper.h b/target/arm/tcg/helper.h
index d9ca5b7c56e..4da32db9021 100644
--- a/target/arm/tcg/helper.h
+++ b/target/arm/tcg/helper.h
@@ -823,6 +823,8 @@ DEF_HELPER_FLAGS_5(gvec_ftsmul_s, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_5(gvec_ftsmul_d, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, fpst, i32)
 
+DEF_HELPER_FLAGS_5(gvec_fmul_idx_b16, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_5(gvec_fmul_idx_h, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_5(gvec_fmul_idx_s, TCG_CALL_NO_RWG,
diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode
index 2efd5f57e45..a76f2236f43 100644
--- a/target/arm/tcg/sve.decode
+++ b/target/arm/tcg/sve.decode
@@ -1062,6 +1062,7 @@ FMLS_zzxz       01100100 11 1 ..... 000001 ..... .....  @rrxr_1 esz=3
 ### SVE FP Multiply Indexed Group
 
 # SVE floating-point multiply (indexed)
+FMUL_zzx        01100100 0. 1 ..... 001010 ..... .....   @rrx_3 esz=0
 FMUL_zzx        01100100 0. 1 ..... 001000 ..... .....   @rrx_3 esz=1
 FMUL_zzx        01100100 10 1 ..... 001000 ..... .....   @rrx_2 esz=2
 FMUL_zzx        01100100 11 1 ..... 001000 ..... .....   @rrx_1 esz=3
diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
index 27af3df9a4b..918cf6e1bd4 100644
--- a/target/arm/tcg/translate-sve.c
+++ b/target/arm/tcg/translate-sve.c
@@ -3907,7 +3907,7 @@ TRANS_FEAT(FMLS_zzxz, aa64_sve, gen_gvec_fpst_zzzz,
  */
 
 static gen_helper_gvec_3_ptr * const fmul_idx_fns[4] = {
-    NULL,                       gen_helper_gvec_fmul_idx_h,
+    gen_helper_gvec_fmul_idx_b16, gen_helper_gvec_fmul_idx_h,
     gen_helper_gvec_fmul_idx_s, gen_helper_gvec_fmul_idx_d,
 };
 TRANS_FEAT(FMUL_zzx, aa64_sve, gen_gvec_fpst_zzz,
diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
index 76a9ab0da39..33a136b90a6 100644
--- a/target/arm/tcg/vec_helper.c
+++ b/target/arm/tcg/vec_helper.c
@@ -1785,6 +1785,7 @@ void HELPER(NAME)(void *vd, void *vn, void *vm,                            \
 
 #define nop(N, M, S) (M)
 
+DO_FMUL_IDX(gvec_fmul_idx_b16, nop, bfloat16_mul, float16, H2)
 DO_FMUL_IDX(gvec_fmul_idx_h, nop, float16_mul, float16, H2)
 DO_FMUL_IDX(gvec_fmul_idx_s, nop, float32_mul, float32, H4)
 DO_FMUL_IDX(gvec_fmul_idx_d, nop, float64_mul, float64, H8)
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH for-10.1 05/10] target/arm: Add BFMLA, BFMLS (vectors)
  2025-07-18 17:30 [PATCH for-10.1 00/10] target/arm: Some SVE2p1 fixes Peter Maydell
                   ` (3 preceding siblings ...)
  2025-07-18 17:30 ` [PATCH for-10.1 04/10] target/arm: Add BFMUL (indexed) Peter Maydell
@ 2025-07-18 17:30 ` Peter Maydell
  2025-07-18 21:40   ` Richard Henderson
  2025-07-18 17:30 ` [PATCH for-10.1 06/10] target/arm: Add BFMLA, BFMLS (indexed) Peter Maydell
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 22+ messages in thread
From: Peter Maydell @ 2025-07-18 17:30 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

FEAT_SVE_B16B16 adds bfloat16 versions of the FMLA and FMLS insns in
the "SVE floating-point multiply-accumulate writing addend" group,
encoded as sz=0b00.

Fixes: 7b1613a1020d2942 ("target/arm: Enable FEAT_SME2p1 on -cpu max")
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/helper-sve.h    | 14 +++++++
 target/arm/tcg/sve_helper.c    | 69 ++++++++++++++++++++++++++++++++++
 target/arm/tcg/translate-sve.c | 21 ++++++++---
 3 files changed, 98 insertions(+), 6 deletions(-)

diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h
index cb6c2355e58..5e4b7fd8cf4 100644
--- a/target/arm/tcg/helper-sve.h
+++ b/target/arm/tcg/helper-sve.h
@@ -1541,6 +1541,8 @@ DEF_HELPER_FLAGS_6(sve_fcadd_s, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_6(sve_fcadd_d, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, fpst, i32)
 
+DEF_HELPER_FLAGS_7(sve_fmla_zpzzz_b16, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_7(sve_fmla_zpzzz_h, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_7(sve_fmla_zpzzz_s, TCG_CALL_NO_RWG,
@@ -1548,6 +1550,8 @@ DEF_HELPER_FLAGS_7(sve_fmla_zpzzz_s, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_7(sve_fmla_zpzzz_d, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
 
+DEF_HELPER_FLAGS_7(sve_fmls_zpzzz_b16, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_7(sve_fmls_zpzzz_h, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_7(sve_fmls_zpzzz_s, TCG_CALL_NO_RWG,
@@ -1555,6 +1559,8 @@ DEF_HELPER_FLAGS_7(sve_fmls_zpzzz_s, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_7(sve_fmls_zpzzz_d, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
 
+DEF_HELPER_FLAGS_7(sve_fnmla_zpzzz_b16, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_7(sve_fnmla_zpzzz_h, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_7(sve_fnmla_zpzzz_s, TCG_CALL_NO_RWG,
@@ -1562,6 +1568,8 @@ DEF_HELPER_FLAGS_7(sve_fnmla_zpzzz_s, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_7(sve_fnmla_zpzzz_d, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
 
+DEF_HELPER_FLAGS_7(sve_fnmls_zpzzz_b16, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_7(sve_fnmls_zpzzz_h, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_7(sve_fnmls_zpzzz_s, TCG_CALL_NO_RWG,
@@ -1569,6 +1577,8 @@ DEF_HELPER_FLAGS_7(sve_fnmls_zpzzz_s, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_7(sve_fnmls_zpzzz_d, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
 
+DEF_HELPER_FLAGS_7(sve_ah_fmls_zpzzz_b16, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_7(sve_ah_fmls_zpzzz_h, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_7(sve_ah_fmls_zpzzz_s, TCG_CALL_NO_RWG,
@@ -1576,6 +1586,8 @@ DEF_HELPER_FLAGS_7(sve_ah_fmls_zpzzz_s, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_7(sve_ah_fmls_zpzzz_d, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
 
+DEF_HELPER_FLAGS_7(sve_ah_fnmla_zpzzz_b16, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_7(sve_ah_fnmla_zpzzz_h, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_7(sve_ah_fnmla_zpzzz_s, TCG_CALL_NO_RWG,
@@ -1583,6 +1595,8 @@ DEF_HELPER_FLAGS_7(sve_ah_fnmla_zpzzz_s, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_7(sve_ah_fnmla_zpzzz_d, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
 
+DEF_HELPER_FLAGS_7(sve_ah_fnmls_zpzzz_b16, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_7(sve_ah_fnmls_zpzzz_h, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, ptr, fpst, i32)
 DEF_HELPER_FLAGS_7(sve_ah_fnmls_zpzzz_s, TCG_CALL_NO_RWG,
diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c
index 1a56fa86d9c..105cc5dd122 100644
--- a/target/arm/tcg/sve_helper.c
+++ b/target/arm/tcg/sve_helper.c
@@ -5099,6 +5099,75 @@ DO_ZPZ_FP(flogb_d, float64, H1_8, do_float64_logb_as_int)
 
 #undef DO_ZPZ_FP
 
+static void do_fmla_zpzzz_b16(void *vd, void *vn, void *vm, void *va, void *vg,
+                              float_status *status, uint32_t desc,
+                              uint16_t neg1, uint16_t neg3, int flags)
+{
+    intptr_t i = simd_oprsz(desc);
+    uint64_t *g = vg;
+
+    do {
+        uint64_t pg = g[(i - 1) >> 6];
+        do {
+            i -= 2;
+            if (likely((pg >> (i & 63)) & 1)) {
+                float16 e1, e2, e3, r;
+
+                e1 = *(uint16_t *)(vn + H1_2(i)) ^ neg1;
+                e2 = *(uint16_t *)(vm + H1_2(i));
+                e3 = *(uint16_t *)(va + H1_2(i)) ^ neg3;
+                r = bfloat16_muladd(e1, e2, e3, flags, status);
+                *(uint16_t *)(vd + H1_2(i)) = r;
+            }
+        } while (i & 63);
+    } while (i != 0);
+}
+
+void HELPER(sve_fmla_zpzzz_b16)(void *vd, void *vn, void *vm, void *va,
+                              void *vg, float_status *status, uint32_t desc)
+{
+    do_fmla_zpzzz_b16(vd, vn, vm, va, vg, status, desc, 0, 0, 0);
+}
+
+void HELPER(sve_fmls_zpzzz_b16)(void *vd, void *vn, void *vm, void *va,
+                              void *vg, float_status *status, uint32_t desc)
+{
+    do_fmla_zpzzz_b16(vd, vn, vm, va, vg, status, desc, 0x8000, 0, 0);
+}
+
+void HELPER(sve_fnmla_zpzzz_b16)(void *vd, void *vn, void *vm, void *va,
+                               void *vg, float_status *status, uint32_t desc)
+{
+    do_fmla_zpzzz_b16(vd, vn, vm, va, vg, status, desc, 0x8000, 0x8000, 0);
+}
+
+void HELPER(sve_fnmls_zpzzz_b16)(void *vd, void *vn, void *vm, void *va,
+                               void *vg, float_status *status, uint32_t desc)
+{
+    do_fmla_zpzzz_b16(vd, vn, vm, va, vg, status, desc, 0, 0x8000, 0);
+}
+
+void HELPER(sve_ah_fmls_zpzzz_b16)(void *vd, void *vn, void *vm, void *va,
+                              void *vg, float_status *status, uint32_t desc)
+{
+    do_fmla_zpzzz_b16(vd, vn, vm, va, vg, status, desc, 0, 0,
+                      float_muladd_negate_product);
+}
+
+void HELPER(sve_ah_fnmla_zpzzz_b16)(void *vd, void *vn, void *vm, void *va,
+                               void *vg, float_status *status, uint32_t desc)
+{
+    do_fmla_zpzzz_b16(vd, vn, vm, va, vg, status, desc, 0, 0,
+                      float_muladd_negate_product | float_muladd_negate_c);
+}
+
+void HELPER(sve_ah_fnmls_zpzzz_b16)(void *vd, void *vn, void *vm, void *va,
+                               void *vg, float_status *status, uint32_t desc)
+{
+    do_fmla_zpzzz_b16(vd, vn, vm, va, vg, status, desc, 0, 0,
+                      float_muladd_negate_c);
+}
+
 static void do_fmla_zpzzz_h(void *vd, void *vn, void *vm, void *va, void *vg,
                             float_status *status, uint32_t desc,
                             uint16_t neg1, uint16_t neg3, int flags)
diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
index 918cf6e1bd4..37ecbc2b7c0 100644
--- a/target/arm/tcg/translate-sve.c
+++ b/target/arm/tcg/translate-sve.c
@@ -4368,19 +4368,28 @@ TRANS_FEAT(FCADD, aa64_sve, gen_gvec_fpst_zzzp, fcadd_fns[a->esz],
            a->rd, a->rn, a->rm, a->pg, a->rot | (s->fpcr_ah << 1),
            a->esz == MO_16 ? FPST_A64_F16 : FPST_A64)
 
+static bool do_fmla_zpzzz(DisasContext *s, arg_rprrr_esz *a,
+                          gen_helper_gvec_5_ptr *fn)
+{
+    /* These insns use MO_8 to encode BFloat16 */
+    if (a->esz == MO_8 && !dc_isar_feature(aa64_sve_b16b16, s)) {
+        return false;
+    }
+    return gen_gvec_fpst_zzzzp(s, fn, a->rd, a->rn, a->rm, a->ra, a->pg, 0,
+                               a->esz == MO_16 ? FPST_A64_F16 : FPST_A64);
+}
+
 #define DO_FMLA(NAME, name, ah_name)                                    \
     static gen_helper_gvec_5_ptr * const name##_fns[4] = {              \
-        NULL, gen_helper_sve_##name##_h,                                \
+        gen_helper_sve_##name##_b16, gen_helper_sve_##name##_h,         \
         gen_helper_sve_##name##_s, gen_helper_sve_##name##_d            \
     };                                                                  \
     static gen_helper_gvec_5_ptr * const name##_ah_fns[4] = {           \
-        NULL, gen_helper_sve_##ah_name##_h,                             \
+        gen_helper_sve_##ah_name##_b16, gen_helper_sve_##ah_name##_h,   \
         gen_helper_sve_##ah_name##_s, gen_helper_sve_##ah_name##_d      \
     };                                                                  \
-    TRANS_FEAT(NAME, aa64_sve, gen_gvec_fpst_zzzzp,                     \
-               s->fpcr_ah ? name##_ah_fns[a->esz] : name##_fns[a->esz], \
-               a->rd, a->rn, a->rm, a->ra, a->pg, 0,                    \
-               a->esz == MO_16 ? FPST_A64_F16 : FPST_A64)
+    TRANS_FEAT(NAME, aa64_sve, do_fmla_zpzzz, a,                        \
+               s->fpcr_ah ? name##_ah_fns[a->esz] : name##_fns[a->esz])
 
 /* We don't need an ah_fmla_zpzzz because fmla doesn't negate anything */
 DO_FMLA(FMLA_zpzzz, fmla_zpzzz, fmla_zpzzz)
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH for-10.1 06/10] target/arm: Add BFMLA, BFMLS (indexed)
  2025-07-18 17:30 [PATCH for-10.1 00/10] target/arm: Some SVE2p1 fixes Peter Maydell
                   ` (4 preceding siblings ...)
  2025-07-18 17:30 ` [PATCH for-10.1 05/10] target/arm: Add BFMLA, BFMLS (vectors) Peter Maydell
@ 2025-07-18 17:30 ` Peter Maydell
  2025-07-18 21:42   ` Richard Henderson
  2025-07-18 17:30 ` [PATCH for-10.1 07/10] target/arm: Correct sense of FPCR.AH test for FMAXQV and FMINQV Peter Maydell
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 22+ messages in thread
From: Peter Maydell @ 2025-07-18 17:30 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

FEAT_SVE_B16B16 adds bfloat16 versions of the FMLA and FMLS insns in
the SVE floating-point multiply-add (indexed) insn group.  Implement
these.

Fixes: 7b1613a1020d2942 ("target/arm: Enable FEAT_SME2p1 on -cpu max")
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/sve.decode      |  2 ++
 target/arm/tcg/translate-sve.c | 25 ++++++++++++++++---------
 2 files changed, 18 insertions(+), 9 deletions(-)

diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode
index a76f2236f43..a77b725c876 100644
--- a/target/arm/tcg/sve.decode
+++ b/target/arm/tcg/sve.decode
@@ -1052,9 +1052,11 @@ FCMLA_zzxz      01100100 11 1 index:1 rm:4 0001 rot:2 rn:5 rd:5 \
 ### SVE FP Multiply-Add Indexed Group
 
 # SVE floating-point multiply-add (indexed)
+FMLA_zzxz       01100100 0. 1 ..... 000010 ..... .....  @rrxr_3 esz=0
 FMLA_zzxz       01100100 0. 1 ..... 000000 ..... .....  @rrxr_3 esz=1
 FMLA_zzxz       01100100 10 1 ..... 000000 ..... .....  @rrxr_2 esz=2
 FMLA_zzxz       01100100 11 1 ..... 000000 ..... .....  @rrxr_1 esz=3
+FMLS_zzxz       01100100 0. 1 ..... 000011 ..... .....  @rrxr_3 esz=0
 FMLS_zzxz       01100100 0. 1 ..... 000001 ..... .....  @rrxr_3 esz=1
 FMLS_zzxz       01100100 10 1 ..... 000001 ..... .....  @rrxr_2 esz=2
 FMLS_zzxz       01100100 11 1 ..... 000001 ..... .....  @rrxr_1 esz=3
diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
index 37ecbc2b7c0..fc76624b5a1 100644
--- a/target/arm/tcg/translate-sve.c
+++ b/target/arm/tcg/translate-sve.c
@@ -3883,24 +3883,31 @@ DO_SVE2_RRXR_ROT(CDOT_zzxw_d, gen_helper_sve2_cdot_idx_d)
  *** SVE Floating Point Multiply-Add Indexed Group
  */
 
+static bool do_fmla_zzxz(DisasContext *s, arg_rrxr_esz *a,
+                         gen_helper_gvec_4_ptr *fn)
+{
+    /* These insns use MO_8 to encode BFloat16 */
+    if (a->esz == MO_8 && !dc_isar_feature(aa64_sve_b16b16, s)) {
+        return false;
+    }
+    return gen_gvec_fpst_zzzz(s, fn, a->rd, a->rn, a->rm, a->ra, a->index,
+                              a->esz == MO_16 ? FPST_A64_F16 : FPST_A64);
+}
+
 static gen_helper_gvec_4_ptr * const fmla_idx_fns[4] = {
-    NULL,                       gen_helper_gvec_fmla_idx_h,
+    gen_helper_gvec_bfmla_idx, gen_helper_gvec_fmla_idx_h,
     gen_helper_gvec_fmla_idx_s, gen_helper_gvec_fmla_idx_d
 };
-TRANS_FEAT(FMLA_zzxz, aa64_sve, gen_gvec_fpst_zzzz,
-           fmla_idx_fns[a->esz], a->rd, a->rn, a->rm, a->ra, a->index,
-           a->esz == MO_16 ? FPST_A64_F16 : FPST_A64)
+TRANS_FEAT(FMLA_zzxz, aa64_sve, do_fmla_zzxz, a, fmla_idx_fns[a->esz])
 
 static gen_helper_gvec_4_ptr * const fmls_idx_fns[4][2] = {
-    { NULL, NULL },
+    { gen_helper_gvec_bfmls_idx, gen_helper_gvec_ah_bfmls_idx },
     { gen_helper_gvec_fmls_idx_h, gen_helper_gvec_ah_fmls_idx_h },
     { gen_helper_gvec_fmls_idx_s, gen_helper_gvec_ah_fmls_idx_s },
     { gen_helper_gvec_fmls_idx_d, gen_helper_gvec_ah_fmls_idx_d },
 };
-TRANS_FEAT(FMLS_zzxz, aa64_sve, gen_gvec_fpst_zzzz,
-           fmls_idx_fns[a->esz][s->fpcr_ah],
-           a->rd, a->rn, a->rm, a->ra, a->index,
-           a->esz == MO_16 ? FPST_A64_F16 : FPST_A64)
+TRANS_FEAT(FMLS_zzxz, aa64_sve, do_fmla_zzxz, a,
+           fmls_idx_fns[a->esz][s->fpcr_ah])
 
 /*
  *** SVE Floating Point Multiply Indexed Group
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH for-10.1 07/10] target/arm: Correct sense of FPCR.AH test for FMAXQV and FMINQV
  2025-07-18 17:30 [PATCH for-10.1 00/10] target/arm: Some SVE2p1 fixes Peter Maydell
                   ` (5 preceding siblings ...)
  2025-07-18 17:30 ` [PATCH for-10.1 06/10] target/arm: Add BFMLA, BFMLS (indexed) Peter Maydell
@ 2025-07-18 17:30 ` Peter Maydell
  2025-07-18 21:43   ` Richard Henderson
  2025-07-21  8:21   ` Philippe Mathieu-Daudé
  2025-07-18 17:30 ` [PATCH for-10.1 08/10] target/arm: Don't nest H() macro calls in SVE DO_REDUCE Peter Maydell
                   ` (2 subsequent siblings)
  9 siblings, 2 replies; 22+ messages in thread
From: Peter Maydell @ 2025-07-18 17:30 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

When we implemented the FMAXQV and FMINQV insns we accidentally
inverted the sense of the FPCR.AH test, so we gave the AH=1 behaviour
when FPCR.AH was zero, and vice-versa.  (The difference is limited to
hadling of negative zero and NaN inputs.)

Fixes: 1de7ecfc12d05 ("target/arm: Implement FADDQV, F{MIN, MAX}{NM}QV for SVE2p1")
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/translate-sve.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
index fc76624b5a1..2ed440aff15 100644
--- a/target/arm/tcg/translate-sve.c
+++ b/target/arm/tcg/translate-sve.c
@@ -4020,7 +4020,7 @@ static gen_helper_gvec_3_ptr * const fmaxqv_ah_fns[4] = {
     gen_helper_sve2p1_ah_fmaxqv_s, gen_helper_sve2p1_ah_fmaxqv_d,
 };
 TRANS_FEAT(FMAXQV, aa64_sme2p1_or_sve2p1, gen_gvec_fpst_arg_zpz,
-           (s->fpcr_ah ? fmaxqv_fns : fmaxqv_ah_fns)[a->esz], a, 0,
+           (s->fpcr_ah ? fmaxqv_ah_fns : fmaxqv_fns)[a->esz], a, 0,
            a->esz == MO_16 ? FPST_A64_F16 : FPST_A64)
 
 static gen_helper_gvec_3_ptr * const fminqv_fns[4] = {
@@ -4032,7 +4032,7 @@ static gen_helper_gvec_3_ptr * const fminqv_ah_fns[4] = {
     gen_helper_sve2p1_ah_fminqv_s, gen_helper_sve2p1_ah_fminqv_d,
 };
 TRANS_FEAT(FMINQV, aa64_sme2p1_or_sve2p1, gen_gvec_fpst_arg_zpz,
-           (s->fpcr_ah ? fminqv_fns : fminqv_ah_fns)[a->esz], a, 0,
+           (s->fpcr_ah ? fminqv_ah_fns : fminqv_fns)[a->esz], a, 0,
            a->esz == MO_16 ? FPST_A64_F16 : FPST_A64)
 
 /*
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH for-10.1 08/10] target/arm: Don't nest H() macro calls in SVE DO_REDUCE
  2025-07-18 17:30 [PATCH for-10.1 00/10] target/arm: Some SVE2p1 fixes Peter Maydell
                   ` (6 preceding siblings ...)
  2025-07-18 17:30 ` [PATCH for-10.1 07/10] target/arm: Correct sense of FPCR.AH test for FMAXQV and FMINQV Peter Maydell
@ 2025-07-18 17:30 ` Peter Maydell
  2025-07-18 21:44   ` Richard Henderson
  2025-07-18 17:30 ` [PATCH for-10.1 09/10] target/arm: Honour FPCR.AH=1 default NaN value in FMAXNMQV, FMINNMQV Peter Maydell
  2025-07-18 17:30 ` [PATCH for-10.1 10/10] target/arm: Make LD1Q decode and trans fn agree about a->u Peter Maydell
  9 siblings, 1 reply; 22+ messages in thread
From: Peter Maydell @ 2025-07-18 17:30 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

In the part of the SVE DO_REDUCE macro used by the SVE2p1 FMAXQV,
FMINQV, etc insns, we incorrectly applied the H() macro twice when
calculating an offset to add to the vn pointer.  This has no effect
on little-endian hosts but on big-endian hosts the two invocations
will cancel each other out and we will access the wrong part of the
array.

The "s * 16" part of the expression is already aligned, so we only
need to use the H macro on the "e". Correct the macro usage.

Fixes: 1de7ecfc12d05 ("target/arm: Implement FADDQV, F{MIN, MAX}{NM}QV for SVE2p1")
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/sve_helper.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c
index 105cc5dd122..bf894f0bf13 100644
--- a/target/arm/tcg/sve_helper.c
+++ b/target/arm/tcg/sve_helper.c
@@ -4509,7 +4509,7 @@ void helper_sve2p1_##NAME##qv_##SUF(void *vd, void *vn, void *vg,     \
         TYPE data[ARM_MAX_VQ];                                        \
         for (unsigned s = 0; s < segments; s++) {                     \
             uint16_t pg = *(uint16_t *)(vg + H1_2(s * 2));            \
-            TYPE nn = *(TYPE *)(vn + H(s * 16 + H(e)));               \
+            TYPE nn = *(TYPE *)(vn + (s * 16 + H(e)));                \
             data[s] = (pg >> e) & 1 ? nn : IDENT;                     \
         }                                                             \
         *(TYPE *)(vd + H(e)) = FUNC##_reduce(data, status, segments); \
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH for-10.1 09/10] target/arm: Honour FPCR.AH=1 default NaN value in FMAXNMQV, FMINNMQV
  2025-07-18 17:30 [PATCH for-10.1 00/10] target/arm: Some SVE2p1 fixes Peter Maydell
                   ` (7 preceding siblings ...)
  2025-07-18 17:30 ` [PATCH for-10.1 08/10] target/arm: Don't nest H() macro calls in SVE DO_REDUCE Peter Maydell
@ 2025-07-18 17:30 ` Peter Maydell
  2025-07-18 21:45   ` Richard Henderson
  2025-07-18 17:30 ` [PATCH for-10.1 10/10] target/arm: Make LD1Q decode and trans fn agree about a->u Peter Maydell
  9 siblings, 1 reply; 22+ messages in thread
From: Peter Maydell @ 2025-07-18 17:30 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

The FMAXNMQV and FMINNMQV insns use the default NaN as their identity
value for inactive source vector elements. We open-coded this in
sve_helper.c, hoping to avoid a function call. However, this fails
to account for FPCR.AH=1 changing the default NaN value to set the
sign bit. Use a call to floatN_default_nan() to obtain this value.

Fixes: 1de7ecfc12d05 ("target/arm: Implement FADDQV, F{MIN, MAX}{NM}QV for SVE2p1")
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/sve_helper.c | 29 +++++++++++++++++------------
 1 file changed, 17 insertions(+), 12 deletions(-)

diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c
index bf894f0bf13..803f0a094dc 100644
--- a/target/arm/tcg/sve_helper.c
+++ b/target/arm/tcg/sve_helper.c
@@ -4484,33 +4484,35 @@ static TYPE FUNC##_reduce(TYPE *data, float_status *status, uintptr_t n) \
     }                                                                 \
 }                                                                     \
 uint64_t helper_sve_##NAME##v_##SUF(void *vn, void *vg,               \
-                                    float_status *s, uint32_t desc)   \
+                                    float_status *status, uint32_t desc) \
 {                                                                     \
     uintptr_t i, oprsz = simd_oprsz(desc), maxsz = simd_data(desc);   \
     TYPE data[sizeof(ARMVectorReg) / sizeof(TYPE)];                   \
+    TYPE ident = IDENT;                                               \
     for (i = 0; i < oprsz; ) {                                        \
         uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));               \
         do {                                                          \
             TYPE nn = *(TYPE *)(vn + H(i));                           \
-            *(TYPE *)((void *)data + i) = (pg & 1 ? nn : IDENT);      \
+            *(TYPE *)((void *)data + i) = (pg & 1 ? nn : ident);      \
             i += sizeof(TYPE), pg >>= sizeof(TYPE);                   \
         } while (i & 15);                                             \
     }                                                                 \
     for (; i < maxsz; i += sizeof(TYPE)) {                            \
-        *(TYPE *)((void *)data + i) = IDENT;                          \
+        *(TYPE *)((void *)data + i) = ident;                          \
     }                                                                 \
-    return FUNC##_reduce(data, s, maxsz / sizeof(TYPE));              \
+    return FUNC##_reduce(data, status, maxsz / sizeof(TYPE));         \
 }                                                                     \
 void helper_sve2p1_##NAME##qv_##SUF(void *vd, void *vn, void *vg,     \
                                     float_status *status, uint32_t desc) \
 {                                                                     \
     unsigned oprsz = simd_oprsz(desc), segments = oprsz / 16;         \
+    TYPE ident = IDENT;                                               \
     for (unsigned e = 0; e < 16; e += sizeof(TYPE)) {                 \
         TYPE data[ARM_MAX_VQ];                                        \
         for (unsigned s = 0; s < segments; s++) {                     \
             uint16_t pg = *(uint16_t *)(vg + H1_2(s * 2));            \
             TYPE nn = *(TYPE *)(vn + (s * 16 + H(e)));                \
-            data[s] = (pg >> e) & 1 ? nn : IDENT;                     \
+            data[s] = (pg >> e) & 1 ? nn : ident;                     \
         }                                                             \
         *(TYPE *)(vd + H(e)) = FUNC##_reduce(data, status, segments); \
     }                                                                 \
@@ -4521,14 +4523,17 @@ DO_REDUCE(fadd,h, float16, H1_2, float16_add, float16_zero)
 DO_REDUCE(fadd,s, float32, H1_4, float32_add, float32_zero)
 DO_REDUCE(fadd,d, float64, H1_8, float64_add, float64_zero)
 
-/* Identity is floatN_default_nan, without the function call.  */
-DO_REDUCE(fminnm,h, float16, H1_2, float16_minnum, 0x7E00)
-DO_REDUCE(fminnm,s, float32, H1_4, float32_minnum, 0x7FC00000)
-DO_REDUCE(fminnm,d, float64, H1_8, float64_minnum, 0x7FF8000000000000ULL)
+/*
+ * We can't avoid the function call for the default NaN value, because
+ * it changes when FPCR.AH is set.
+ */
+DO_REDUCE(fminnm,h, float16, H1_2, float16_minnum, float16_default_nan(status))
+DO_REDUCE(fminnm,s, float32, H1_4, float32_minnum, float32_default_nan(status))
+DO_REDUCE(fminnm,d, float64, H1_8, float64_minnum, float64_default_nan(status))
 
-DO_REDUCE(fmaxnm,h, float16, H1_2, float16_maxnum, 0x7E00)
-DO_REDUCE(fmaxnm,s, float32, H1_4, float32_maxnum, 0x7FC00000)
-DO_REDUCE(fmaxnm,d, float64, H1_8, float64_maxnum, 0x7FF8000000000000ULL)
+DO_REDUCE(fmaxnm,h, float16, H1_2, float16_maxnum, float16_default_nan(status))
+DO_REDUCE(fmaxnm,s, float32, H1_4, float32_maxnum, float32_default_nan(status))
+DO_REDUCE(fmaxnm,d, float64, H1_8, float64_maxnum, float64_default_nan(status))
 
 DO_REDUCE(fmin,h, float16, H1_2, float16_min, float16_infinity)
 DO_REDUCE(fmin,s, float32, H1_4, float32_min, float32_infinity)
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH for-10.1 10/10] target/arm: Make LD1Q decode and trans fn agree about a->u
  2025-07-18 17:30 [PATCH for-10.1 00/10] target/arm: Some SVE2p1 fixes Peter Maydell
                   ` (8 preceding siblings ...)
  2025-07-18 17:30 ` [PATCH for-10.1 09/10] target/arm: Honour FPCR.AH=1 default NaN value in FMAXNMQV, FMINNMQV Peter Maydell
@ 2025-07-18 17:30 ` Peter Maydell
  2025-07-18 21:46   ` Richard Henderson
  9 siblings, 1 reply; 22+ messages in thread
From: Peter Maydell @ 2025-07-18 17:30 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

For the LD1Q instruction (gather load of quadwords) we use the
LD1_zprz pattern with MO_128 elements.  At this element size there is
no signed vs unsigned distinction, and we only set the 'u' bit in the
arg_LD1_zprz struct because we share the code and decode struct with
smaller element sizes.

However, we set u=0 in the decode pattern line but then accidentally
asserted that it was 1 in the trans function.  Since our usual convention
is that the "default" is unsigned and we only mark operations as signed
when they really do need to extend, change the decode pattern line to
set u=1 to match the assert.

Fixes: d2aa9a804ee6 ("target/arm: Implement LD1Q, ST1Q for SVE2p1")
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/sve.decode | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode
index a77b725c876..aea7f519730 100644
--- a/target/arm/tcg/sve.decode
+++ b/target/arm/tcg/sve.decode
@@ -1345,7 +1345,7 @@ LD1_zprz        1100010 11 1. ..... 11. ... ..... ..... \
 
 # LD1Q
 LD1_zprz        1100 0100 000 rm:5 101 pg:3 rn:5 rd:5 \
-                &rprr_gather_load u=0 ff=0 xs=2 esz=4 msz=4 scale=0
+                &rprr_gather_load u=1 ff=0 xs=2 esz=4 msz=4 scale=0
 
 # SVE 64-bit gather load (vector plus immediate)
 LD1_zpiz        1100010 .. 01 ..... 1.. ... ..... ..... \
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH for-10.1 01/10] target/arm: Add BFADD, BFSUB, BFMUL (unpredicated)
  2025-07-18 17:30 ` [PATCH for-10.1 01/10] target/arm: Add BFADD, BFSUB, BFMUL (unpredicated) Peter Maydell
@ 2025-07-18 21:29   ` Richard Henderson
  0 siblings, 0 replies; 22+ messages in thread
From: Richard Henderson @ 2025-07-18 21:29 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 7/18/25 10:30, Peter Maydell wrote:
> FEAT_SVE_B16B16 adds bfloat16 versions of the SVE floating point
> (unpredicated) instructions, which are encoded via sz==0b00.
> 
> Fixes: 7b1613a1020d2942 ("target/arm: Enable FEAT_SME2p1 on -cpu max")
> Signed-off-by: Peter Maydell<peter.maydell@linaro.org>
> ---
>   target/arm/tcg/helper.h        | 3 +++
>   target/arm/tcg/translate-sve.c | 6 +++++-
>   target/arm/tcg/vec_helper.c    | 3 +++
>   3 files changed, 11 insertions(+), 1 deletion(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH for-10.1 02/10] target/arm: Add BFADD, BFSUB, BFMUL, BFMAXNM, BFMINNM (predicated)
  2025-07-18 17:30 ` [PATCH for-10.1 02/10] target/arm: Add BFADD, BFSUB, BFMUL, BFMAXNM, BFMINNM (predicated) Peter Maydell
@ 2025-07-18 21:32   ` Richard Henderson
  0 siblings, 0 replies; 22+ messages in thread
From: Richard Henderson @ 2025-07-18 21:32 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 7/18/25 10:30, Peter Maydell wrote:
> FEAT_SVE_B16B16 adds bfloat16 versions of the SVE floating point
> (predicated) instructions, which are encoded via sz=0b00.
> Add BFADD, BFSUB, BFMUL, BFMAXNM, BFMINNM; these are all the insns
> in this group which do not change behaviour for AH=1.
> 
> We will deal with BFMAX/BFMIN (which do have different AH=1
> behaviour) in a following commit.
> 
> Fixes: 7b1613a1020d2942 ("target/arm: Enable FEAT_SME2p1 on -cpu max")
> Signed-off-by: Peter Maydell<peter.maydell@linaro.org>
> ---
>   target/arm/tcg/helper-sve.h    | 10 ++++++++++
>   target/arm/tcg/sve_helper.c    |  5 +++++
>   target/arm/tcg/translate-sve.c | 22 +++++++++++++++++-----
>   3 files changed, 32 insertions(+), 5 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH for-10.1 03/10] target/arm: Add BFMIN, BFMAX (predicated)
  2025-07-18 17:30 ` [PATCH for-10.1 03/10] target/arm: Add BFMIN, BFMAX (predicated) Peter Maydell
@ 2025-07-18 21:34   ` Richard Henderson
  0 siblings, 0 replies; 22+ messages in thread
From: Richard Henderson @ 2025-07-18 21:34 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 7/18/25 10:30, Peter Maydell wrote:
> FEAT_SVE_B16B16 adds bfloat16 versions of the SVE floating point
> (predicated) instructions, which are encoded via sz=0b00.  Add the
> BFMAX and BFMIN insns.  These have separate behaviour for AH=1 and
> AH=0; we have already implemented the AH=1 helper for the SME2
> versions of these insns.
> 
> Fixes: 7b1613a1020d2942 ("target/arm: Enable FEAT_SME2p1 on -cpu max")
> Signed-off-by: Peter Maydell<peter.maydell@linaro.org>
> ---
>   target/arm/tcg/helper-sve.h    |  8 ++++++++
>   target/arm/tcg/sve_helper.c    |  4 ++++
>   target/arm/tcg/translate-sve.c | 17 +++++++++++++++--
>   3 files changed, 27 insertions(+), 2 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH for-10.1 04/10] target/arm: Add BFMUL (indexed)
  2025-07-18 17:30 ` [PATCH for-10.1 04/10] target/arm: Add BFMUL (indexed) Peter Maydell
@ 2025-07-18 21:38   ` Richard Henderson
  0 siblings, 0 replies; 22+ messages in thread
From: Richard Henderson @ 2025-07-18 21:38 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 7/18/25 10:30, Peter Maydell wrote:
> FEAT_SVE_B16B16 adds a bfloat16 version of the FMUL insn in the
> floating-point multiply (indexed) instruction group. The encoding
> is slightly bespoke; in our implementation we use MO_8 to indicate
> bfloat16, as with the other B16B16 insns.
> 
> Fixes: 7b1613a1020d2942 ("target/arm: Enable FEAT_SME2p1 on -cpu max")
> Signed-off-by: Peter Maydell<peter.maydell@linaro.org>
> ---
>   target/arm/tcg/helper.h        | 2 ++
>   target/arm/tcg/sve.decode      | 1 +
>   target/arm/tcg/translate-sve.c | 2 +-
>   target/arm/tcg/vec_helper.c    | 1 +
>   4 files changed, 5 insertions(+), 1 deletion(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH for-10.1 05/10] target/arm: Add BFMLA, BFMLS (vectors)
  2025-07-18 17:30 ` [PATCH for-10.1 05/10] target/arm: Add BFMLA, BFMLS (vectors) Peter Maydell
@ 2025-07-18 21:40   ` Richard Henderson
  0 siblings, 0 replies; 22+ messages in thread
From: Richard Henderson @ 2025-07-18 21:40 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 7/18/25 10:30, Peter Maydell wrote:
> FEAT_SVE_B16B16 adds bfloat16 versions of the FMLA and FMLS insns in
> the "SVE floating-point multiply-accumulate writing addend" group,
> encoded as sz=0b00.
> 
> Fixes: 7b1613a1020d2942 ("target/arm: Enable FEAT_SME2p1 on -cpu max")
> Signed-off-by: Peter Maydell<peter.maydell@linaro.org>
> ---
>   target/arm/tcg/helper-sve.h    | 14 +++++++
>   target/arm/tcg/sve_helper.c    | 69 ++++++++++++++++++++++++++++++++++
>   target/arm/tcg/translate-sve.c | 21 ++++++++---
>   3 files changed, 98 insertions(+), 6 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH for-10.1 06/10] target/arm: Add BFMLA, BFMLS (indexed)
  2025-07-18 17:30 ` [PATCH for-10.1 06/10] target/arm: Add BFMLA, BFMLS (indexed) Peter Maydell
@ 2025-07-18 21:42   ` Richard Henderson
  0 siblings, 0 replies; 22+ messages in thread
From: Richard Henderson @ 2025-07-18 21:42 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 7/18/25 10:30, Peter Maydell wrote:
> FEAT_SVE_B16B16 adds bfloat16 versions of the FMLA and FMLS insns in
> the SVE floating-point multiply-add (indexed) insn group.  Implement
> these.
> 
> Fixes: 7b1613a1020d2942 ("target/arm: Enable FEAT_SME2p1 on -cpu max")
> Signed-off-by: Peter Maydell<peter.maydell@linaro.org>
> ---
>   target/arm/tcg/sve.decode      |  2 ++
>   target/arm/tcg/translate-sve.c | 25 ++++++++++++++++---------
>   2 files changed, 18 insertions(+), 9 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH for-10.1 07/10] target/arm: Correct sense of FPCR.AH test for FMAXQV and FMINQV
  2025-07-18 17:30 ` [PATCH for-10.1 07/10] target/arm: Correct sense of FPCR.AH test for FMAXQV and FMINQV Peter Maydell
@ 2025-07-18 21:43   ` Richard Henderson
  2025-07-21  8:21   ` Philippe Mathieu-Daudé
  1 sibling, 0 replies; 22+ messages in thread
From: Richard Henderson @ 2025-07-18 21:43 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 7/18/25 10:30, Peter Maydell wrote:
> When we implemented the FMAXQV and FMINQV insns we accidentally
> inverted the sense of the FPCR.AH test, so we gave the AH=1 behaviour
> when FPCR.AH was zero, and vice-versa.  (The difference is limited to
> hadling of negative zero and NaN inputs.)
> 
> Fixes: 1de7ecfc12d05 ("target/arm: Implement FADDQV, F{MIN, MAX}{NM}QV for SVE2p1")
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   target/arm/tcg/translate-sve.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
> index fc76624b5a1..2ed440aff15 100644
> --- a/target/arm/tcg/translate-sve.c
> +++ b/target/arm/tcg/translate-sve.c
> @@ -4020,7 +4020,7 @@ static gen_helper_gvec_3_ptr * const fmaxqv_ah_fns[4] = {
>       gen_helper_sve2p1_ah_fmaxqv_s, gen_helper_sve2p1_ah_fmaxqv_d,
>   };
>   TRANS_FEAT(FMAXQV, aa64_sme2p1_or_sve2p1, gen_gvec_fpst_arg_zpz,
> -           (s->fpcr_ah ? fmaxqv_fns : fmaxqv_ah_fns)[a->esz], a, 0,
> +           (s->fpcr_ah ? fmaxqv_ah_fns : fmaxqv_fns)[a->esz], a, 0,
>              a->esz == MO_16 ? FPST_A64_F16 : FPST_A64)
>   
>   static gen_helper_gvec_3_ptr * const fminqv_fns[4] = {
> @@ -4032,7 +4032,7 @@ static gen_helper_gvec_3_ptr * const fminqv_ah_fns[4] = {
>       gen_helper_sve2p1_ah_fminqv_s, gen_helper_sve2p1_ah_fminqv_d,
>   };
>   TRANS_FEAT(FMINQV, aa64_sme2p1_or_sve2p1, gen_gvec_fpst_arg_zpz,
> -           (s->fpcr_ah ? fminqv_fns : fminqv_ah_fns)[a->esz], a, 0,
> +           (s->fpcr_ah ? fminqv_ah_fns : fminqv_fns)[a->esz], a, 0,
>              a->esz == MO_16 ? FPST_A64_F16 : FPST_A64)
>   
>   /*

Whoopsie.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH for-10.1 08/10] target/arm: Don't nest H() macro calls in SVE DO_REDUCE
  2025-07-18 17:30 ` [PATCH for-10.1 08/10] target/arm: Don't nest H() macro calls in SVE DO_REDUCE Peter Maydell
@ 2025-07-18 21:44   ` Richard Henderson
  0 siblings, 0 replies; 22+ messages in thread
From: Richard Henderson @ 2025-07-18 21:44 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 7/18/25 10:30, Peter Maydell wrote:
> In the part of the SVE DO_REDUCE macro used by the SVE2p1 FMAXQV,
> FMINQV, etc insns, we incorrectly applied the H() macro twice when
> calculating an offset to add to the vn pointer.  This has no effect
> on little-endian hosts but on big-endian hosts the two invocations
> will cancel each other out and we will access the wrong part of the
> array.
> 
> The "s * 16" part of the expression is already aligned, so we only
> need to use the H macro on the "e". Correct the macro usage.
> 
> Fixes: 1de7ecfc12d05 ("target/arm: Implement FADDQV, F{MIN, MAX}{NM}QV for SVE2p1")
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   target/arm/tcg/sve_helper.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c
> index 105cc5dd122..bf894f0bf13 100644
> --- a/target/arm/tcg/sve_helper.c
> +++ b/target/arm/tcg/sve_helper.c
> @@ -4509,7 +4509,7 @@ void helper_sve2p1_##NAME##qv_##SUF(void *vd, void *vn, void *vg,     \
>           TYPE data[ARM_MAX_VQ];                                        \
>           for (unsigned s = 0; s < segments; s++) {                     \
>               uint16_t pg = *(uint16_t *)(vg + H1_2(s * 2));            \
> -            TYPE nn = *(TYPE *)(vn + H(s * 16 + H(e)));               \
> +            TYPE nn = *(TYPE *)(vn + (s * 16 + H(e)));                \
>               data[s] = (pg >> e) & 1 ? nn : IDENT;                     \
>           }                                                             \
>           *(TYPE *)(vd + H(e)) = FUNC##_reduce(data, status, segments); \

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH for-10.1 09/10] target/arm: Honour FPCR.AH=1 default NaN value in FMAXNMQV, FMINNMQV
  2025-07-18 17:30 ` [PATCH for-10.1 09/10] target/arm: Honour FPCR.AH=1 default NaN value in FMAXNMQV, FMINNMQV Peter Maydell
@ 2025-07-18 21:45   ` Richard Henderson
  0 siblings, 0 replies; 22+ messages in thread
From: Richard Henderson @ 2025-07-18 21:45 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 7/18/25 10:30, Peter Maydell wrote:
> The FMAXNMQV and FMINNMQV insns use the default NaN as their identity
> value for inactive source vector elements. We open-coded this in
> sve_helper.c, hoping to avoid a function call. However, this fails
> to account for FPCR.AH=1 changing the default NaN value to set the
> sign bit. Use a call to floatN_default_nan() to obtain this value.
> 
> Fixes: 1de7ecfc12d05 ("target/arm: Implement FADDQV, F{MIN, MAX}{NM}QV for SVE2p1")
> Signed-off-by: Peter Maydell<peter.maydell@linaro.org>
> ---
>   target/arm/tcg/sve_helper.c | 29 +++++++++++++++++------------
>   1 file changed, 17 insertions(+), 12 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH for-10.1 10/10] target/arm: Make LD1Q decode and trans fn agree about a->u
  2025-07-18 17:30 ` [PATCH for-10.1 10/10] target/arm: Make LD1Q decode and trans fn agree about a->u Peter Maydell
@ 2025-07-18 21:46   ` Richard Henderson
  0 siblings, 0 replies; 22+ messages in thread
From: Richard Henderson @ 2025-07-18 21:46 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 7/18/25 10:30, Peter Maydell wrote:
> For the LD1Q instruction (gather load of quadwords) we use the
> LD1_zprz pattern with MO_128 elements.  At this element size there is
> no signed vs unsigned distinction, and we only set the 'u' bit in the
> arg_LD1_zprz struct because we share the code and decode struct with
> smaller element sizes.
> 
> However, we set u=0 in the decode pattern line but then accidentally
> asserted that it was 1 in the trans function.  Since our usual convention
> is that the "default" is unsigned and we only mark operations as signed
> when they really do need to extend, change the decode pattern line to
> set u=1 to match the assert.
> 
> Fixes: d2aa9a804ee6 ("target/arm: Implement LD1Q, ST1Q for SVE2p1")
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   target/arm/tcg/sve.decode | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode
> index a77b725c876..aea7f519730 100644
> --- a/target/arm/tcg/sve.decode
> +++ b/target/arm/tcg/sve.decode
> @@ -1345,7 +1345,7 @@ LD1_zprz        1100010 11 1. ..... 11. ... ..... ..... \
>   
>   # LD1Q
>   LD1_zprz        1100 0100 000 rm:5 101 pg:3 rn:5 rd:5 \
> -                &rprr_gather_load u=0 ff=0 xs=2 esz=4 msz=4 scale=0
> +                &rprr_gather_load u=1 ff=0 xs=2 esz=4 msz=4 scale=0
>   
>   # SVE 64-bit gather load (vector plus immediate)
>   LD1_zpiz        1100010 .. 01 ..... 1.. ... ..... ..... \

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH for-10.1 07/10] target/arm: Correct sense of FPCR.AH test for FMAXQV and FMINQV
  2025-07-18 17:30 ` [PATCH for-10.1 07/10] target/arm: Correct sense of FPCR.AH test for FMAXQV and FMINQV Peter Maydell
  2025-07-18 21:43   ` Richard Henderson
@ 2025-07-21  8:21   ` Philippe Mathieu-Daudé
  1 sibling, 0 replies; 22+ messages in thread
From: Philippe Mathieu-Daudé @ 2025-07-21  8:21 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel; +Cc: Richard Henderson

On 18/7/25 19:30, Peter Maydell wrote:
> When we implemented the FMAXQV and FMINQV insns we accidentally
> inverted the sense of the FPCR.AH test, so we gave the AH=1 behaviour
> when FPCR.AH was zero, and vice-versa.  (The difference is limited to
> hadling of negative zero and NaN inputs.)

Typo "handling"

> 
> Fixes: 1de7ecfc12d05 ("target/arm: Implement FADDQV, F{MIN, MAX}{NM}QV for SVE2p1")
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   target/arm/tcg/translate-sve.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>



^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2025-07-21  8:22 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-18 17:30 [PATCH for-10.1 00/10] target/arm: Some SVE2p1 fixes Peter Maydell
2025-07-18 17:30 ` [PATCH for-10.1 01/10] target/arm: Add BFADD, BFSUB, BFMUL (unpredicated) Peter Maydell
2025-07-18 21:29   ` Richard Henderson
2025-07-18 17:30 ` [PATCH for-10.1 02/10] target/arm: Add BFADD, BFSUB, BFMUL, BFMAXNM, BFMINNM (predicated) Peter Maydell
2025-07-18 21:32   ` Richard Henderson
2025-07-18 17:30 ` [PATCH for-10.1 03/10] target/arm: Add BFMIN, BFMAX (predicated) Peter Maydell
2025-07-18 21:34   ` Richard Henderson
2025-07-18 17:30 ` [PATCH for-10.1 04/10] target/arm: Add BFMUL (indexed) Peter Maydell
2025-07-18 21:38   ` Richard Henderson
2025-07-18 17:30 ` [PATCH for-10.1 05/10] target/arm: Add BFMLA, BFMLS (vectors) Peter Maydell
2025-07-18 21:40   ` Richard Henderson
2025-07-18 17:30 ` [PATCH for-10.1 06/10] target/arm: Add BFMLA, BFMLS (indexed) Peter Maydell
2025-07-18 21:42   ` Richard Henderson
2025-07-18 17:30 ` [PATCH for-10.1 07/10] target/arm: Correct sense of FPCR.AH test for FMAXQV and FMINQV Peter Maydell
2025-07-18 21:43   ` Richard Henderson
2025-07-21  8:21   ` Philippe Mathieu-Daudé
2025-07-18 17:30 ` [PATCH for-10.1 08/10] target/arm: Don't nest H() macro calls in SVE DO_REDUCE Peter Maydell
2025-07-18 21:44   ` Richard Henderson
2025-07-18 17:30 ` [PATCH for-10.1 09/10] target/arm: Honour FPCR.AH=1 default NaN value in FMAXNMQV, FMINNMQV Peter Maydell
2025-07-18 21:45   ` Richard Henderson
2025-07-18 17:30 ` [PATCH for-10.1 10/10] target/arm: Make LD1Q decode and trans fn agree about a->u Peter Maydell
2025-07-18 21:46   ` Richard Henderson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).