* [PATCH 00/10] target/arm: Implement FEAT_SME_F8F16
@ 2026-06-25 1:51 Richard Henderson
2026-06-25 1:51 ` [PATCH 01/10] target/arm: Enable FADD/FSUB (half-precision) with FEAT_SME_F8F16 Richard Henderson
` (11 more replies)
0 siblings, 12 replies; 25+ messages in thread
From: Richard Henderson @ 2026-06-25 1:51 UTC
To: qemu-devel; +Cc: qemu-arm
Another minor feature working toward SME2.2.
r~
Richard Henderson (10):
target/arm: Enable FADD/FSUB (half-precision) with FEAT_SME_F8F16
target/arm: Rename SME FMLAL/FMLSL patterns
target/arm: Implement FMLAL (multiple, multiple and single, FP8 to
FP16)
target/arm: Implement FMLAL (multiple and indexed, FP8 to FP16)
target/arm: Implement FDOT (multiple, multiple and single, FP8 to
FP16)
target/arm: Implement DOT (multiple and indexed, FP8 to FP16)
target/arm: Implement FMOPA (widening, 2-way, FP8 to FP16)
target/arm: Rename FVDOT pattern
target/arm: Implement FVDOT (FP8 to FP16)
target/arm: Enable FEAT_SME_F8F16 for -cpu max
target/arm/cpu-features.h | 11 +++++
target/arm/tcg/helper-fp8-defs.h | 2 +
linux-user/aarch64/elfload.c | 1 +
target/arm/tcg/cpu64.c | 1 +
target/arm/tcg/fp8_helper.c | 58 +++++++++++++++++++++++++
target/arm/tcg/translate-sme.c | 72 +++++++++++++++++++++++---------
docs/system/arm/emulation.rst | 1 +
target/arm/tcg/sme.decode | 66 +++++++++++++++++++++--------
8 files changed, 176 insertions(+), 36 deletions(-)
--
2.43.0
* [PATCH 01/10] target/arm: Enable FADD/FSUB (half-precision) with FEAT_SME_F8F16
2026-06-25 1:51 [PATCH 00/10] target/arm: Implement FEAT_SME_F8F16 Richard Henderson
@ 2026-06-25 1:51 ` Richard Henderson
2026-06-26 9:03 ` Peter Maydell
2026-06-25 1:51 ` [PATCH 02/10] target/arm: Rename SME FMLAL/FMLSL patterns Richard Henderson
` (10 subsequent siblings)
11 siblings, 1 reply; 25+ messages in thread
From: Richard Henderson @ 2026-06-25 1:51 UTC
To: qemu-devel; +Cc: qemu-arm
These two instructions can be enabled with either
FEAT_SME_F8F16 or FEAT_SME_F16F16.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
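Note: the "either ... or" gating described in the commit message can be
sketched as a standalone C predicate. This is a minimal illustration, not
QEMU code: the helper names and the bit positions used for the F8F16 and
F16F16 fields of ID_AA64SMFR0_EL1 are assumptions made for the example and
should be checked against the Arm ARM. QEMU itself goes through
FIELD_EX64_IDREG.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed single-bit field positions in ID_AA64SMFR0_EL1 (illustrative). */
#define SMFR0_F8F16_BIT   41
#define SMFR0_F16F16_BIT  42

/* Hypothetical helper: test one single-bit ID register field. */
static bool smfr0_bit(uint64_t smfr0, unsigned bit)
{
    return (smfr0 >> bit) & 1;
}

/*
 * FADD/FSUB (half-precision) are available when at least one of the
 * two features is implemented, so the predicate is a logical OR.
 */
static bool sme_f16f16_or_f8f16(uint64_t smfr0)
{
    return smfr0_bit(smfr0, SMFR0_F16F16_BIT) ||
           smfr0_bit(smfr0, SMFR0_F8F16_BIT);
}
```

With this shape, a CPU that implements only FEAT_SME_F8F16 still decodes
the two instructions, which a logical AND of the two tests would not allow.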
target/arm/cpu-features.h | 11 +++++++++++
target/arm/tcg/translate-sme.c | 4 ++--
2 files changed, 13 insertions(+), 2 deletions(-)
diff --git a/target/arm/cpu-features.h b/target/arm/cpu-features.h
index 45213f71ab..cd78353609 100644
--- a/target/arm/cpu-features.h
+++ b/target/arm/cpu-features.h
@@ -1605,6 +1605,11 @@ static inline bool isar_feature_aa64_sme_f8f32(const ARMISARegisters *id)
return FIELD_EX64_IDREG(id, ID_AA64SMFR0, F8F32);
}
+static inline bool isar_feature_aa64_sme_f8f16(const ARMISARegisters *id)
+{
+ return FIELD_EX64_IDREG(id, ID_AA64SMFR0, F8F16);
+}
+
static inline bool isar_feature_aa64_sme_f16f16(const ARMISARegisters *id)
{
return FIELD_EX64_IDREG(id, ID_AA64SMFR0, F16F16);
@@ -1776,6 +1781,12 @@ isar_feature_aa64_sme2_or_sve2_lut(const ARMISARegisters *id)
return isar_feature_aa64_sme2_or_sve2(id) && isar_feature_aa64_lut(id);
}
+static inline bool
+isar_feature_aa64_sme_f16f16_or_f8f16(const ARMISARegisters *id)
+{
+ return isar_feature_aa64_sme_f16f16(id) || isar_feature_aa64_sme_f8f16(id);
+}
+
/*
* Feature tests for "does this exist in either 32-bit or 64-bit?"
*/
diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c
index a79b0a9b80..b8dde80c20 100644
--- a/target/arm/tcg/translate-sme.c
+++ b/target/arm/tcg/translate-sme.c
@@ -1335,9 +1335,9 @@ static bool do_faddsub(DisasContext *s, arg_az_n *a, ARMFPStatusFlavour fpst,
return true;
}
-TRANS_FEAT(FADD_nn_h, aa64_sme_f16f16, do_faddsub, a,
+TRANS_FEAT(FADD_nn_h, aa64_sme_f16f16_or_f8f16, do_faddsub, a,
FPST_ZA_F16, gen_helper_gvec_fadd_h)
-TRANS_FEAT(FSUB_nn_h, aa64_sme_f16f16, do_faddsub, a,
+TRANS_FEAT(FSUB_nn_h, aa64_sme_f16f16_or_f8f16, do_faddsub, a,
FPST_ZA_F16, gen_helper_gvec_fsub_h)
TRANS_FEAT(FADD_nn_s, aa64_sme2, do_faddsub, a,
--
2.43.0
* [PATCH 02/10] target/arm: Rename SME FMLAL/FMLSL patterns
2026-06-25 1:51 [PATCH 00/10] target/arm: Implement FEAT_SME_F8F16 Richard Henderson
2026-06-25 1:51 ` [PATCH 01/10] target/arm: Enable FADD/FSUB (half-precision) with FEAT_SME_F8F16 Richard Henderson
@ 2026-06-25 1:51 ` Richard Henderson
2026-06-25 10:17 ` Peter Maydell
2026-06-25 1:51 ` [PATCH 03/10] target/arm: Implement FMLAL (multiple, multiple and single, FP8 to FP16) Richard Henderson
` (9 subsequent siblings)
11 siblings, 1 reply; 25+ messages in thread
From: Richard Henderson @ 2026-06-25 1:51 UTC
To: qemu-devel; +Cc: qemu-arm
Rename patterns to include _sh suffix, so that we can
distinguish insns of the same name from FEAT_SME_F8F16.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/tcg/translate-sme.c | 12 ++++++------
target/arm/tcg/sme.decode | 32 ++++++++++++++++----------------
2 files changed, 22 insertions(+), 22 deletions(-)
diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c
index b8dde80c20..7aa270d3ec 100644
--- a/target/arm/tcg/translate-sme.c
+++ b/target/arm/tcg/translate-sme.c
@@ -1107,10 +1107,10 @@ static bool do_fmlal(DisasContext *s, arg_azz_n *a, bool sub, bool multi)
multi, FPST_ENV, gen_helper_sve2_fmlal_zzzw_s);
}
-TRANS_FEAT(FMLAL_n1, aa64_sme2, do_fmlal, a, false, false)
-TRANS_FEAT(FMLSL_n1, aa64_sme2, do_fmlal, a, true, false)
-TRANS_FEAT(FMLAL_nn, aa64_sme2, do_fmlal, a, false, true)
-TRANS_FEAT(FMLSL_nn, aa64_sme2, do_fmlal, a, true, true)
+TRANS_FEAT(FMLAL_n1_sh, aa64_sme2, do_fmlal, a, false, false)
+TRANS_FEAT(FMLSL_n1_sh, aa64_sme2, do_fmlal, a, true, false)
+TRANS_FEAT(FMLAL_nn_sh, aa64_sme2, do_fmlal, a, false, true)
+TRANS_FEAT(FMLSL_nn_sh, aa64_sme2, do_fmlal, a, true, true)
static bool do_fmlall_fp8(DisasContext *s, arg_azz_n *a, bool multi)
{
@@ -1128,8 +1128,8 @@ static bool do_fmlal_nx(DisasContext *s, arg_azx_n *a, bool sub)
false, FPST_ENV, gen_helper_sve2_fmlal_zzxw_s);
}
-TRANS_FEAT(FMLAL_nx, aa64_sme2, do_fmlal_nx, a, false)
-TRANS_FEAT(FMLSL_nx, aa64_sme2, do_fmlal_nx, a, true)
+TRANS_FEAT(FMLAL_nx_sh, aa64_sme2, do_fmlal_nx, a, false)
+TRANS_FEAT(FMLSL_nx_sh, aa64_sme2, do_fmlal_nx, a, true)
static bool do_bfmlal(DisasContext *s, arg_azz_n *a, bool sub, bool multi)
{
diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode
index 1de5f341ef..90ee161461 100644
--- a/target/arm/tcg/sme.decode
+++ b/target/arm/tcg/sme.decode
@@ -324,13 +324,13 @@ SUB_azz_n1_d 11000001 0111 .... 0 .. 110 ..... 11 ... @azz_nx1_o3 n=4
@azz_nx1_o2x2 ........ ... . zm:4 . .. ... zn:5 ... .. \
&azz_n off=%off2_x2 rv=%mova_rv
-FMLAL_n1 11000001 001 0 .... 0 .. 011 ..... 00 ... @azz_nx1_o3x2 n=1
-FMLAL_n1 11000001 001 0 .... 0 .. 010 ..... 000 .. @azz_nx1_o2x2 n=2
-FMLAL_n1 11000001 001 1 .... 0 .. 010 ..... 000 .. @azz_nx1_o2x2 n=4
+FMLAL_n1_sh 11000001 001 0 .... 0 .. 011 ..... 00 ... @azz_nx1_o3x2 n=1
+FMLAL_n1_sh 11000001 001 0 .... 0 .. 010 ..... 000 .. @azz_nx1_o2x2 n=2
+FMLAL_n1_sh 11000001 001 1 .... 0 .. 010 ..... 000 .. @azz_nx1_o2x2 n=4
-FMLSL_n1 11000001 001 0 .... 0 .. 011 ..... 01 ... @azz_nx1_o3x2 n=1
-FMLSL_n1 11000001 001 0 .... 0 .. 010 ..... 010 .. @azz_nx1_o2x2 n=2
-FMLSL_n1 11000001 001 1 .... 0 .. 010 ..... 010 .. @azz_nx1_o2x2 n=4
+FMLSL_n1_sh 11000001 001 0 .... 0 .. 011 ..... 01 ... @azz_nx1_o3x2 n=1
+FMLSL_n1_sh 11000001 001 0 .... 0 .. 010 ..... 010 .. @azz_nx1_o2x2 n=2
+FMLSL_n1_sh 11000001 001 1 .... 0 .. 010 ..... 010 .. @azz_nx1_o2x2 n=4
BFMLAL_n1 11000001 001 0 .... 0 .. 011 ..... 10 ... @azz_nx1_o3x2 n=1
BFMLAL_n1 11000001 001 0 .... 0 .. 010 ..... 100 .. @azz_nx1_o2x2 n=2
@@ -477,11 +477,11 @@ SUB_azz_nn_d 11000001 111 ...01 0 .. 110 ...00 11 ... @azz_4x4_o3
@azz_4x4_o2x2 ........ ... ..... . .. ... ..... ... .. \
&azz_n n=4 rv=%mova_rv zn=%zn_ax4 zm=%zm_ax4 off=%off2_x2
-FMLAL_nn 11000001 101 ....0 0 .. 010 ....0 000 .. @azz_2x2_o2x2
-FMLAL_nn 11000001 101 ...01 0 .. 010 ...00 000 .. @azz_4x4_o2x2
+FMLAL_nn_sh 11000001 101 ....0 0 .. 010 ....0 000 .. @azz_2x2_o2x2
+FMLAL_nn_sh 11000001 101 ...01 0 .. 010 ...00 000 .. @azz_4x4_o2x2
-FMLSL_nn 11000001 101 ....0 0 .. 010 ....0 010 .. @azz_2x2_o2x2
-FMLSL_nn 11000001 101 ...01 0 .. 010 ...00 010 .. @azz_4x4_o2x2
+FMLSL_nn_sh 11000001 101 ....0 0 .. 010 ....0 010 .. @azz_2x2_o2x2
+FMLSL_nn_sh 11000001 101 ...01 0 .. 010 ...00 010 .. @azz_4x4_o2x2
BFMLAL_nn 11000001 101 ....0 0 .. 010 ....0 100 .. @azz_2x2_o2x2
BFMLAL_nn 11000001 101 ...01 0 .. 010 ...00 100 .. @azz_4x4_o2x2
@@ -617,13 +617,13 @@ BFSUB_nn 11000001 111 00101 0 .. 111 ...00 01 ... @az_4x4_o3
@azx_4x1_o2x2 ........ .... zm:4 . .. . .. ..... .. ... \
&azx_n n=4 rv=%mova_rv off=%off2_x2 zn=%zn_ax4 idx=%idx2_10_2
-FMLAL_nx 11000001 1000 .... . .. 1 .. ..... 00 ... @azx_1x1_o3x2
-FMLAL_nx 11000001 1001 .... 0 .. 1 .. ....0 00 ... @azx_2x1_o2x2
-FMLAL_nx 11000001 1001 .... 1 .. 1 .. ...00 00 ... @azx_4x1_o2x2
+FMLAL_nx_sh 11000001 1000 .... . .. 1 .. ..... 00 ... @azx_1x1_o3x2
+FMLAL_nx_sh 11000001 1001 .... 0 .. 1 .. ....0 00 ... @azx_2x1_o2x2
+FMLAL_nx_sh 11000001 1001 .... 1 .. 1 .. ...00 00 ... @azx_4x1_o2x2
-FMLSL_nx 11000001 1000 .... . .. 1 .. ..... 01 ... @azx_1x1_o3x2
-FMLSL_nx 11000001 1001 .... 0 .. 1 .. ....0 01 ... @azx_2x1_o2x2
-FMLSL_nx 11000001 1001 .... 1 .. 1 .. ...00 01 ... @azx_4x1_o2x2
+FMLSL_nx_sh 11000001 1000 .... . .. 1 .. ..... 01 ... @azx_1x1_o3x2
+FMLSL_nx_sh 11000001 1001 .... 0 .. 1 .. ....0 01 ... @azx_2x1_o2x2
+FMLSL_nx_sh 11000001 1001 .... 1 .. 1 .. ...00 01 ... @azx_4x1_o2x2
BFMLAL_nx 11000001 1000 .... . .. 1 .. ..... 10 ... @azx_1x1_o3x2
BFMLAL_nx 11000001 1001 .... 0 .. 1 .. ....0 10 ... @azx_2x1_o2x2
--
2.43.0
* [PATCH 03/10] target/arm: Implement FMLAL (multiple, multiple and single, FP8 to FP16)
2026-06-25 1:51 [PATCH 00/10] target/arm: Implement FEAT_SME_F8F16 Richard Henderson
2026-06-25 1:51 ` [PATCH 01/10] target/arm: Enable FADD/FSUB (half-precision) with FEAT_SME_F8F16 Richard Henderson
2026-06-25 1:51 ` [PATCH 02/10] target/arm: Rename SME FMLAL/FMLSL patterns Richard Henderson
@ 2026-06-25 1:51 ` Richard Henderson
2026-06-26 9:11 ` Peter Maydell
2026-06-25 1:51 ` [PATCH 04/10] target/arm: Implement FMLAL (multiple and indexed, " Richard Henderson
` (8 subsequent siblings)
11 siblings, 1 reply; 25+ messages in thread
From: Richard Henderson @ 2026-06-25 1:51 UTC
To: qemu-devel; +Cc: qemu-arm
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/tcg/translate-sme.c | 9 +++++++++
target/arm/tcg/sme.decode | 7 +++++++
2 files changed, 16 insertions(+)
diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c
index 7aa270d3ec..7cb7b71e74 100644
--- a/target/arm/tcg/translate-sme.c
+++ b/target/arm/tcg/translate-sme.c
@@ -1121,6 +1121,15 @@ static bool do_fmlall_fp8(DisasContext *s, arg_azz_n *a, bool multi)
TRANS_FEAT(FMLALL_n1_b, aa64_sme_f8f32, do_fmlall_fp8, a, false)
TRANS_FEAT(FMLALL_nn_b, aa64_sme_f8f32, do_fmlall_fp8, a, true)
+static bool do_fmlal_fp8(DisasContext *s, arg_azz_n *a, bool multi)
+{
+ return do_azz_acc_fp8(s, a->n, 2, a->rv, a->off, a->zn, a->zm,
+ 0, 0, multi, gen_helper_gvec_fmla_hb);
+}
+
+TRANS_FEAT(FMLAL_n1_hb, aa64_sme_f8f16, do_fmlal_fp8, a, false)
+TRANS_FEAT(FMLAL_nn_hb, aa64_sme_f8f16, do_fmlal_fp8, a, true)
+
static bool do_fmlal_nx(DisasContext *s, arg_azx_n *a, bool sub)
{
return do_azz_acc_fp(s, a->n, 2, a->rv, a->off, a->zn, a->zm,
diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode
index 90ee161461..b735f3de82 100644
--- a/target/arm/tcg/sme.decode
+++ b/target/arm/tcg/sme.decode
@@ -452,6 +452,10 @@ FMLALL_n1_b 11000001 001 1 .... 0 .. 000 ..... 0001 . @azz_nx1_o1x4 n=4
FDOT_n1_sb 11000001 001 0 .... 0 .. 100 ..... 11 ... @azz_nx1_o3 n=2
FDOT_n1_sb 11000001 001 1 .... 0 .. 100 ..... 11 ... @azz_nx1_o3 n=4
+FMLAL_n1_hb 11000001 001 1 .... 0 .. 011 ..... 00 ... @azz_nx1_o3x2 n=1
+FMLAL_n1_hb 11000001 001 0 .... 0 .. 010 ..... 001 .. @azz_nx1_o2x2 n=2
+FMLAL_n1_hb 11000001 001 1 .... 0 .. 010 ..... 001 .. @azz_nx1_o2x2 n=4
+
### SME2 Multi-vector Multiple Array Vectors
%zn_ax2 6:4 !function=times_2
@@ -578,6 +582,9 @@ FMLALL_nn_b 11000001 101 ...01 0 .. 000 ...01 0000 . @azz_4x4_o1x4
FDOT_nn_sb 11000001 101 ....0 0 .. 100 ....1 10 ... @azz_2x2_o3
FDOT_nn_sb 11000001 101 ...01 0 .. 100 ...01 10 ... @azz_4x4_o3
+FMLAL_nn_hb 11000001 101 ....0 0 .. 010 ....1 000 .. @azz_2x2_o2x2
+FMLAL_nn_hb 11000001 101 ...01 0 .. 010 ...01 000 .. @azz_4x4_o2x2
+
&az_n n off rv zm
@az_2x2_o3 ........ ... ..... . .. ... ..... .. off:3 \
&az_n n=2 rv=%mova_rv zm=%zn_ax2
--
2.43.0
* [PATCH 04/10] target/arm: Implement FMLAL (multiple and indexed, FP8 to FP16)
2026-06-25 1:51 [PATCH 00/10] target/arm: Implement FEAT_SME_F8F16 Richard Henderson
` (2 preceding siblings ...)
2026-06-25 1:51 ` [PATCH 03/10] target/arm: Implement FMLAL (multiple, multiple and single, FP8 to FP16) Richard Henderson
@ 2026-06-25 1:51 ` Richard Henderson
2026-06-26 9:12 ` Peter Maydell
2026-06-25 1:51 ` [PATCH 05/10] target/arm: Implement FDOT (multiple, multiple and single, " Richard Henderson
` (7 subsequent siblings)
11 siblings, 1 reply; 25+ messages in thread
From: Richard Henderson @ 2026-06-25 1:51 UTC
To: qemu-devel; +Cc: qemu-arm
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/tcg/translate-sme.c | 4 ++++
target/arm/tcg/sme.decode | 13 +++++++++++++
2 files changed, 17 insertions(+)
diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c
index 7cb7b71e74..98f6eb0b70 100644
--- a/target/arm/tcg/translate-sme.c
+++ b/target/arm/tcg/translate-sme.c
@@ -1140,6 +1140,10 @@ static bool do_fmlal_nx(DisasContext *s, arg_azx_n *a, bool sub)
TRANS_FEAT(FMLAL_nx_sh, aa64_sme2, do_fmlal_nx, a, false)
TRANS_FEAT(FMLSL_nx_sh, aa64_sme2, do_fmlal_nx, a, true)
+TRANS_FEAT(FMLAL_nx_hb, aa64_sme_f8f16, do_azz_acc_fp8,
+ a->n, 2, a->rv, a->off, a->zn, a->zm,
+ a->idx << 2, 0, false, gen_helper_gvec_fmla_idx_hb)
+
static bool do_bfmlal(DisasContext *s, arg_azz_n *a, bool sub, bool multi)
{
return do_azz_acc_fp(s, a->n, 2, a->rv, a->off, a->zn, a->zm,
diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode
index b735f3de82..c6e22c4999 100644
--- a/target/arm/tcg/sme.decode
+++ b/target/arm/tcg/sme.decode
@@ -791,6 +791,19 @@ FMLALL_nx_b 11000001 0001 .... 1 .. 0.. ...10 00.. . @azx_4x1_i4_o1
FDOT_nx_b 11000001 0101 .... 0 .. 0.. ....1 11 ... @azx_2x1_i2_o3
FDOT_nx_b 11000001 0101 .... 1 .. 0.. ...00 01 ... @azx_4x1_i2_o3
+%idx4_15_10_3 15:1 10:2 3:1
+%idx4_10_2 10:2 2:2
+@azx_1x1_i4_o3x2 ........ .... zm:4 . .. . .. zn:5 .. ... \
+ &azx_n n=1 rv=%mova_rv off=%off3_x2 idx=%idx4_15_10_3
+@azx_2x2_i4_o3x2 ........ .... zm:4 . .. . .. .... .. .. .. \
+ &azx_n n=2 rv=%mova_rv zn=%zn_ax2 off=%off2_x2 idx=%idx4_10_2
+@azx_4x4_i4_o3x2 ........ .... zm:4 . .. . .. ... ... .. .. \
+ &azx_n n=4 rv=%mova_rv zn=%zn_ax4 off=%off2_x2 idx=%idx4_10_2
+
+FMLAL_nx_hb 11000001 1100 .... . .. 0.. ..... 0. ... @azx_1x1_i4_o3x2
+FMLAL_nx_hb 11000001 1001 .... 0 .. 1.. ....1 1.. .. @azx_2x2_i4_o3x2
+FMLAL_nx_hb 11000001 1001 .... 1 .. 1.. ...01 0.. .. @azx_4x4_i4_o3x2
+
%idx2_10_3 10:1 3:1
@azx_4x2_i2_o3 ........ .... zm:4 . .. ... .... ... off:3 \
&azx_n n=4 rv=%mova_rv zn=%zn_ax2 idx=%idx2_10_3
--
2.43.0
* [PATCH 05/10] target/arm: Implement FDOT (multiple, multiple and single, FP8 to FP16)
2026-06-25 1:51 [PATCH 00/10] target/arm: Implement FEAT_SME_F8F16 Richard Henderson
` (3 preceding siblings ...)
2026-06-25 1:51 ` [PATCH 04/10] target/arm: Implement FMLAL (multiple and indexed, " Richard Henderson
@ 2026-06-25 1:51 ` Richard Henderson
2026-06-25 1:51 ` [PATCH 06/10] target/arm: Implement DOT (multiple and indexed, " Richard Henderson
` (6 subsequent siblings)
11 siblings, 0 replies; 25+ messages in thread
From: Richard Henderson @ 2026-06-25 1:51 UTC
To: qemu-devel; +Cc: qemu-arm
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/tcg/translate-sme.c | 9 +++++++++
target/arm/tcg/sme.decode | 6 ++++++
2 files changed, 15 insertions(+)
diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c
index 98f6eb0b70..3174b10c30 100644
--- a/target/arm/tcg/translate-sme.c
+++ b/target/arm/tcg/translate-sme.c
@@ -1196,6 +1196,15 @@ static bool do_fdot_fp8(DisasContext *s, arg_azz_n *a, bool multi)
TRANS_FEAT(FDOT_n1_sb, aa64_sme_f8f32, do_fdot_fp8, a, false)
TRANS_FEAT(FDOT_nn_sb, aa64_sme_f8f32, do_fdot_fp8, a, true)
+static bool do_fdot_hb(DisasContext *s, arg_azz_n *a, bool multi)
+{
+ return do_azz_acc_fp8(s, a->n, 1, a->rv, a->off, a->zn, a->zm,
+ 0, 0, multi, gen_helper_gvec_fdot_hb);
+}
+
+TRANS_FEAT(FDOT_n1_hb, aa64_sme_f8f16, do_fdot_hb, a, false)
+TRANS_FEAT(FDOT_nn_hb, aa64_sme_f8f16, do_fdot_hb, a, true)
+
static bool do_fdot_nx(DisasContext *s, arg_azx_n *a)
{
return do_azz_acc_fp(s, a->n, 1, a->rv, a->off, a->zn, a->zm,
diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode
index c6e22c4999..fbf5f3720d 100644
--- a/target/arm/tcg/sme.decode
+++ b/target/arm/tcg/sme.decode
@@ -456,6 +456,9 @@ FMLAL_n1_hb 11000001 001 1 .... 0 .. 011 ..... 00 ... @azz_nx1_o3x2 n=1
FMLAL_n1_hb 11000001 001 0 .... 0 .. 010 ..... 001 .. @azz_nx1_o2x2 n=2
FMLAL_n1_hb 11000001 001 1 .... 0 .. 010 ..... 001 .. @azz_nx1_o2x2 n=4
+FDOT_n1_hb 11000001 001 0 .... 0 .. 100 ..... 01 ... @azz_nx1_o3 n=2
+FDOT_n1_hb 11000001 001 1 .... 0 .. 100 ..... 01 ... @azz_nx1_o3 n=4
+
### SME2 Multi-vector Multiple Array Vectors
%zn_ax2 6:4 !function=times_2
@@ -585,6 +588,9 @@ FDOT_nn_sb 11000001 101 ...01 0 .. 100 ...01 10 ... @azz_4x4_o3
FMLAL_nn_hb 11000001 101 ....0 0 .. 010 ....1 000 .. @azz_2x2_o2x2
FMLAL_nn_hb 11000001 101 ...01 0 .. 010 ...01 000 .. @azz_4x4_o2x2
+FDOT_nn_hb 11000001 101 ....0 0 .. 100 ....1 00 ... @azz_2x2_o3
+FDOT_nn_hb 11000001 101 ...01 0 .. 100 ...01 00 ... @azz_4x4_o3
+
&az_n n off rv zm
@az_2x2_o3 ........ ... ..... . .. ... ..... .. off:3 \
&az_n n=2 rv=%mova_rv zm=%zn_ax2
--
2.43.0
* [PATCH 06/10] target/arm: Implement DOT (multiple and indexed, FP8 to FP16)
2026-06-25 1:51 [PATCH 00/10] target/arm: Implement FEAT_SME_F8F16 Richard Henderson
` (4 preceding siblings ...)
2026-06-25 1:51 ` [PATCH 05/10] target/arm: Implement FDOT (multiple, multiple and single, " Richard Henderson
@ 2026-06-25 1:51 ` Richard Henderson
2026-06-26 9:16 ` Peter Maydell
2026-06-25 1:51 ` [PATCH 07/10] target/arm: Implement FMOPA (widening, 2-way, " Richard Henderson
` (5 subsequent siblings)
11 siblings, 1 reply; 25+ messages in thread
From: Richard Henderson @ 2026-06-25 1:51 UTC
To: qemu-devel; +Cc: qemu-arm
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/tcg/translate-sme.c | 4 ++++
target/arm/tcg/sme.decode | 3 +++
2 files changed, 7 insertions(+)
diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c
index 3174b10c30..197274d00e 100644
--- a/target/arm/tcg/translate-sme.c
+++ b/target/arm/tcg/translate-sme.c
@@ -1178,6 +1178,10 @@ TRANS_FEAT(FDOT_nx_b, aa64_sme_f8f32, do_azz_acc_fp8,
a->n, 1, a->rv, a->off, a->zn, a->zm,
a->idx, 0, false, gen_helper_gvec_fdot_idx_sb)
+TRANS_FEAT(FDOT_nx_hb, aa64_sme_f8f16, do_azz_acc_fp8,
+ a->n, 1, a->rv, a->off, a->zn, a->zm,
+ a->idx, 0, false, gen_helper_gvec_fdot_idx_hb)
+
static bool do_fdot(DisasContext *s, arg_azz_n *a, bool multi)
{
return do_azz_acc_fp(s, a->n, 1, a->rv, a->off, a->zn, a->zm, 1, 0,
diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode
index fbf5f3720d..1dd3b7c8b2 100644
--- a/target/arm/tcg/sme.decode
+++ b/target/arm/tcg/sme.decode
@@ -810,6 +810,9 @@ FMLAL_nx_hb 11000001 1100 .... . .. 0.. ..... 0. ... @azx_1x1_i4_o3x2
FMLAL_nx_hb 11000001 1001 .... 0 .. 1.. ....1 1.. .. @azx_2x2_i4_o3x2
FMLAL_nx_hb 11000001 1001 .... 1 .. 1.. ...01 0.. .. @azx_4x4_i4_o3x2
+FDOT_nx_hb 11000001 1101 .... 0 .. 0.. ....1 0. ... @azx_2x1_i3_o3
+FDOT_nx_hb 11000001 0001 .... 1 .. 1.. ...10 0. ... @azx_4x1_i3_o3
+
%idx2_10_3 10:1 3:1
@azx_4x2_i2_o3 ........ .... zm:4 . .. ... .... ... off:3 \
&azx_n n=4 rv=%mova_rv zn=%zn_ax2 idx=%idx2_10_3
--
2.43.0
* [PATCH 07/10] target/arm: Implement FMOPA (widening, 2-way, FP8 to FP16)
2026-06-25 1:51 [PATCH 00/10] target/arm: Implement FEAT_SME_F8F16 Richard Henderson
` (5 preceding siblings ...)
2026-06-25 1:51 ` [PATCH 06/10] target/arm: Implement DOT (multiple and indexed, " Richard Henderson
@ 2026-06-25 1:51 ` Richard Henderson
2026-06-26 9:22 ` Peter Maydell
2026-06-25 1:51 ` [PATCH 08/10] target/arm: Rename FVDOT pattern Richard Henderson
` (4 subsequent siblings)
11 siblings, 1 reply; 25+ messages in thread
From: Richard Henderson @ 2026-06-25 1:51 UTC
To: qemu-devel; +Cc: qemu-arm
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/tcg/helper-fp8-defs.h | 1 +
target/arm/tcg/fp8_helper.c | 35 ++++++++++++++++++++++++++++++++
target/arm/tcg/translate-sme.c | 24 +++++++++++++---------
target/arm/tcg/sme.decode | 1 +
4 files changed, 51 insertions(+), 10 deletions(-)
diff --git a/target/arm/tcg/helper-fp8-defs.h b/target/arm/tcg/helper-fp8-defs.h
index ef1375fea7..05bf8dbdc2 100644
--- a/target/arm/tcg/helper-fp8-defs.h
+++ b/target/arm/tcg/helper-fp8-defs.h
@@ -40,5 +40,6 @@ DEF_HELPER_FLAGS_5(gvec_fmmla_sb, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, env, i32
DEF_HELPER_FLAGS_5(gvec_fmmla_hb, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, env, i32)
DEF_HELPER_FLAGS_7(sme_fmopa_sb, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_7(sme_fmopa_hb, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, env, i32)
DEF_HELPER_FLAGS_5(sme_fvdot_idx_sb, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, env, i32)
diff --git a/target/arm/tcg/fp8_helper.c b/target/arm/tcg/fp8_helper.c
index 3c2d959099..5606f0fd8e 100644
--- a/target/arm/tcg/fp8_helper.c
+++ b/target/arm/tcg/fp8_helper.c
@@ -893,6 +893,41 @@ void HELPER(sme_fmopa_sb)(void *vza, void *vzn, void *vzm, void *vpn,
}
}
+void HELPER(sme_fmopa_hb)(void *vza, void *vzn, void *vzm, void *vpn,
+ void *vpm, CPUARMState *env, uint32_t desc)
+{
+ FP8MulContext ctx = fp8_mul_start(env, 0xf);
+ intptr_t oprsz = simd_maxsz(desc);
+ uint16_t *pn = vpn, *pm = vpm;
+
+ for (intptr_t row = 0; row < oprsz; ) {
+ uint16_t prow = pn[H2(row >> 4)];
+ do {
+ void *vza_row = vza + tile_vslice_offset(row);
+ uint16_t n = *(uint16_t *)(vzn + H1_2(row));
+
+ n &= expand_pred_b(prow & 3);
+
+ for (intptr_t col = 0; col < oprsz; ) {
+ uint16_t pcol = pm[H2(col >> 4)];
+ do {
+ if (prow & pcol & 0x3) {
+ uint16_t *a = vza_row + H1_2(col);
+ uint16_t m = *(uint16_t *)(vzm + H1_2(col));
+
+ m &= expand_pred_b(pcol & 3);
+ *a = f8dotadd_h(n, m, 2, *a, &ctx);
+ }
+ col += 2;
+ pcol >>= 2;
+ } while (col & 15);
+ }
+ row += 2;
+ prow >>= 2;
+ } while (row & 15);
+ }
+}
+
void HELPER(sme_fvdot_idx_sb)(void *vd, void *vn, void *vm,
CPUARMState *env, uint32_t desc)
{
diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c
index 197274d00e..7eeac28480 100644
--- a/target/arm/tcg/translate-sme.c
+++ b/target/arm/tcg/translate-sme.c
@@ -616,25 +616,29 @@ TRANS_FEAT(BFMOPA_w, aa64_sme, do_outprod_env, a, MO_32,
: !s->fpcr_ah ? gen_helper_sme_bfmops_w
: gen_helper_sme_ah_bfmops_w)
-static bool trans_FMOPA_sb(DisasContext *s, arg_op *a)
+static bool do_outprod_fp8(DisasContext *s, arg_op *a, MemOp esz,
+ gen_helper_gvec_5_ptr *fn)
{
- if (!dc_isar_feature(aa64_sme_f8f32, s)) {
- return false;
- }
if (fpmr_access_check(s) && sme_smza_enabled_check(s)) {
int svl = streaming_vec_reg_size(s);
uint32_t desc = simd_desc(svl, svl, 0);
- gen_helper_sme_fmopa_sb(get_tile(s, MO_32, a->zad),
- vec_full_reg_ptr(s, a->zn),
- vec_full_reg_ptr(s, a->zm),
- pred_full_reg_ptr(s, a->pn),
- pred_full_reg_ptr(s, a->pm),
- tcg_env, tcg_constant_i32(desc));
+ TCGv_ptr za = get_tile(s, esz, a->zad);
+ TCGv_ptr zn = vec_full_reg_ptr(s, a->zn);
+ TCGv_ptr zm = vec_full_reg_ptr(s, a->zm);
+ TCGv_ptr pn = pred_full_reg_ptr(s, a->pn);
+ TCGv_ptr pm = pred_full_reg_ptr(s, a->pm);
+
+ fn(za, zn, zm, pn, pm, tcg_env, tcg_constant_i32(desc));
}
return true;
}
+TRANS_FEAT(FMOPA_sb, aa64_sme_f8f32, do_outprod_fp8,
+ a, MO_32, gen_helper_sme_fmopa_sb)
+TRANS_FEAT(FMOPA_hb, aa64_sme_f8f16, do_outprod_fp8,
+ a, MO_16, gen_helper_sme_fmopa_hb)
+
TRANS_FEAT(SMOPA_s, aa64_sme, do_outprod, a, MO_32, gen_helper_sme_smopa_s)
TRANS_FEAT(UMOPA_s, aa64_sme, do_outprod, a, MO_32, gen_helper_sme_umopa_s)
TRANS_FEAT(SUMOPA_s, aa64_sme, do_outprod, a, MO_32, gen_helper_sme_sumopa_s)
diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode
index 1dd3b7c8b2..755d5f00d0 100644
--- a/target/arm/tcg/sme.decode
+++ b/target/arm/tcg/sme.decode
@@ -199,6 +199,7 @@ BFMOPA_w 10000001 100 ..... ... ... ..... . 00 .. @op_32
FMOPA_w_h 10000001 101 ..... ... ... ..... . 00 .. @op_32
FMOPA_sb 10000000 101 zm:5 pm:3 pn:3 zn:5 0 00 zad:2 &op sub=0
+FMOPA_hb 10000000 101 zm:5 pm:3 pn:3 zn:5 0100 zad:1 &op sub=0
SMOPA_s 1010000 0 10 0 ..... ... ... ..... . 00 .. @op_32
SUMOPA_s 1010000 0 10 1 ..... ... ... ..... . 00 .. @op_32
--
2.43.0
* [PATCH 08/10] target/arm: Rename FVDOT pattern
2026-06-25 1:51 [PATCH 00/10] target/arm: Implement FEAT_SME_F8F16 Richard Henderson
` (6 preceding siblings ...)
2026-06-25 1:51 ` [PATCH 07/10] target/arm: Implement FMOPA (widening, 2-way, " Richard Henderson
@ 2026-06-25 1:51 ` Richard Henderson
2026-06-25 10:19 ` Peter Maydell
2026-06-25 1:51 ` [PATCH 09/10] target/arm: Implement FVDOT (FP8 to FP16) Richard Henderson
` (3 subsequent siblings)
11 siblings, 1 reply; 25+ messages in thread
From: Richard Henderson @ 2026-06-25 1:51 UTC
To: qemu-devel; +Cc: qemu-arm
Rename to FVDOT_sh so that we can introduce an insn
of the same name from FEAT_SME_F8F16.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/tcg/translate-sme.c | 2 +-
target/arm/tcg/sme.decode | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c
index 7eeac28480..267a6b0d9b 100644
--- a/target/arm/tcg/translate-sme.c
+++ b/target/arm/tcg/translate-sme.c
@@ -1260,7 +1260,7 @@ static bool do_vdot(DisasContext *s, arg_azx_n *a, gen_helper_gvec_4_ptr *fn)
return true;
}
-TRANS_FEAT(FVDOT, aa64_sme, do_vdot, a, gen_helper_sme2_fvdot_idx_h)
+TRANS_FEAT(FVDOT_sh, aa64_sme, do_vdot, a, gen_helper_sme2_fvdot_idx_h)
TRANS_FEAT(BFVDOT, aa64_sme, do_vdot, a, gen_helper_sme2_bfvdot_idx)
static bool do_fvdot_sb(DisasContext *s, arg_azx_n *a, bool top)
diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode
index 755d5f00d0..160cf130d4 100644
--- a/target/arm/tcg/sme.decode
+++ b/target/arm/tcg/sme.decode
@@ -662,7 +662,7 @@ FDOT_nx 11000001 0101 .... 1 .. 1 .. ...00 01 ... @azx_4x1_i2_o3
BFDOT_nx 11000001 0101 .... 0 .. 1 .. ....0 11 ... @azx_2x1_i2_o3
BFDOT_nx 11000001 0101 .... 1 .. 1 .. ...00 11 ... @azx_4x1_i2_o3
-FVDOT 11000001 0101 .... 0 .. 0 .. ....0 01 ... @azx_2x1_i2_o3
+FVDOT_sh 11000001 0101 .... 0 .. 0 .. ....0 01 ... @azx_2x1_i2_o3
BFVDOT 11000001 0101 .... 0 .. 0 .. ....0 11 ... @azx_2x1_i2_o3
SDOT_nx_2h 11000001 0101 .... 0 .. 1 .. ....0 00 ... @azx_2x1_i2_o3
--
2.43.0
* [PATCH 09/10] target/arm: Implement FVDOT (FP8 to FP16)
2026-06-25 1:51 [PATCH 00/10] target/arm: Implement FEAT_SME_F8F16 Richard Henderson
` (7 preceding siblings ...)
2026-06-25 1:51 ` [PATCH 08/10] target/arm: Rename FVDOT pattern Richard Henderson
@ 2026-06-25 1:51 ` Richard Henderson
2026-06-26 9:27 ` Peter Maydell
2026-06-25 1:51 ` [PATCH 10/10] target/arm: Enable FEAT_SME_F8F16 for -cpu max Richard Henderson
` (2 subsequent siblings)
11 siblings, 1 reply; 25+ messages in thread
From: Richard Henderson @ 2026-06-25 1:51 UTC
To: qemu-devel; +Cc: qemu-arm
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/tcg/helper-fp8-defs.h | 1 +
target/arm/tcg/fp8_helper.c | 23 +++++++++++++++++++++++
target/arm/tcg/translate-sme.c | 4 ++++
target/arm/tcg/sme.decode | 2 ++
4 files changed, 30 insertions(+)
diff --git a/target/arm/tcg/helper-fp8-defs.h b/target/arm/tcg/helper-fp8-defs.h
index 05bf8dbdc2..126dcadf77 100644
--- a/target/arm/tcg/helper-fp8-defs.h
+++ b/target/arm/tcg/helper-fp8-defs.h
@@ -43,3 +43,4 @@ DEF_HELPER_FLAGS_7(sme_fmopa_sb, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr,
DEF_HELPER_FLAGS_7(sme_fmopa_hb, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, env, i32)
DEF_HELPER_FLAGS_5(sme_fvdot_idx_sb, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_5(sme_fvdot_idx_hb, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, env, i32)
diff --git a/target/arm/tcg/fp8_helper.c b/target/arm/tcg/fp8_helper.c
index 5606f0fd8e..e1dcc2b70f 100644
--- a/target/arm/tcg/fp8_helper.c
+++ b/target/arm/tcg/fp8_helper.c
@@ -950,3 +950,26 @@ void HELPER(sme_fvdot_idx_sb)(void *vd, void *vn, void *vm,
} while (++i & 3);
} while (i < elements);
}
+
+void HELPER(sme_fvdot_idx_hb)(void *vd, void *vn, void *vm,
+ CPUARMState *env, uint32_t desc)
+{
+ FP8MulContext ctx = fp8_mul_start(env, 0xf);
+ intptr_t oprsz = simd_maxsz(desc);
+ intptr_t elements = oprsz / sizeof(float32);
+ int idx_n = extract32(desc, SIMD_DATA_SHIFT, 1);
+ int idx_m = extract32(desc, SIMD_DATA_SHIFT + 1, 3);
+ float16 *d = vd;
+ uint8_t *n0 = vn;
+ uint8_t *n1 = vn + sizeof(ARMVectorReg);
+ uint16_t *m = vm;
+ intptr_t i = 0;
+
+ do {
+ uint16_t mm = m[H2(2 * i + idx_m)];
+ do {
+ uint16_t nn = n0[H1(4 * i + idx_n)] | (n1[H1(4 * i + idx_n)] << 8);
+ d[H2(i)] = f8dotadd_h(nn, mm, 2, d[H2(i)], &ctx);
+ } while (++i & 7);
+ } while (i < elements);
+}
diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c
index 267a6b0d9b..ff5554eefb 100644
--- a/target/arm/tcg/translate-sme.c
+++ b/target/arm/tcg/translate-sme.c
@@ -1273,6 +1273,10 @@ static bool do_fvdot_sb(DisasContext *s, arg_azx_n *a, bool top)
TRANS_FEAT(FVDOTB_sb, aa64_sme_f8f32, do_fvdot_sb, a, false)
TRANS_FEAT(FVDOTT_sb, aa64_sme_f8f32, do_fvdot_sb, a, true)
+TRANS_FEAT(FVDOT_hb, aa64_sme_f8f16, do_azz_acc_fp8,
+ a->n, 2, a->rv, a->off, a->zn, a->zm,
+ (a->idx << 1), 0, false, gen_helper_sme_fvdot_idx_hb)
+
static bool do_fmla(DisasContext *s, arg_azz_n *a, bool multi,
ARMFPStatusFlavour fpst, gen_helper_gvec_3_ptr *fn)
{
diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode
index 160cf130d4..3a65e1ad4b 100644
--- a/target/arm/tcg/sme.decode
+++ b/target/arm/tcg/sme.decode
@@ -821,6 +821,8 @@ FDOT_nx_hb 11000001 0001 .... 1 .. 1.. ...10 0. ... @azx_4x1_i3_o3
FVDOTB_sb 11000001 1101 .... 0 .. 01. ....0 0. ... @azx_4x2_i2_o3
FVDOTT_sb 11000001 1101 .... 0 .. 01. ....0 1. ... @azx_4x2_i2_o3
+FVDOT_hb 11000001 1101 .... 0 .. 1.. ....1 0. ... @azx_2x1_i3_o3
+
### SME2 Add / Sub array accumulators
ADD_aaz_s 11000001 101 000000 .. 111 ....0 10 ... @az_2x2_o3
--
2.43.0
* [PATCH 10/10] target/arm: Enable FEAT_SME_F8F16 for -cpu max
2026-06-25 1:51 [PATCH 00/10] target/arm: Implement FEAT_SME_F8F16 Richard Henderson
` (8 preceding siblings ...)
2026-06-25 1:51 ` [PATCH 09/10] target/arm: Implement FVDOT (FP8 to FP16) Richard Henderson
@ 2026-06-25 1:51 ` Richard Henderson
2026-06-25 10:18 ` Peter Maydell
2026-06-26 9:33 ` [PATCH 00/10] target/arm: Implement FEAT_SME_F8F16 Peter Maydell
2026-06-26 10:07 ` Alex Bennée
11 siblings, 1 reply; 25+ messages in thread
From: Richard Henderson @ 2026-06-25 1:51 UTC
To: qemu-devel; +Cc: qemu-arm
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
linux-user/aarch64/elfload.c | 1 +
target/arm/tcg/cpu64.c | 1 +
docs/system/arm/emulation.rst | 1 +
3 files changed, 3 insertions(+)
diff --git a/linux-user/aarch64/elfload.c b/linux-user/aarch64/elfload.c
index 20b9838520..42aeb29306 100644
--- a/linux-user/aarch64/elfload.c
+++ b/linux-user/aarch64/elfload.c
@@ -236,6 +236,7 @@ abi_ulong get_elf_hwcap2(CPUState *cs)
GET_FEATURE_ID(aa64_ssve_f8dp4, ARM_HWCAP2_A64_SME_SF8DP4);
GET_FEATURE_ID(aa64_ssve_f8dp2, ARM_HWCAP2_A64_SME_SF8DP2);
GET_FEATURE_ID(aa64_sme_f8f32, ARM_HWCAP2_A64_SME_F8F32);
+ GET_FEATURE_ID(aa64_sme_f8f16, ARM_HWCAP2_A64_SME_F8F16);
return hwcaps;
}
diff --git a/target/arm/tcg/cpu64.c b/target/arm/tcg/cpu64.c
index e9eda8fbf1..42bff58e76 100644
--- a/target/arm/tcg/cpu64.c
+++ b/target/arm/tcg/cpu64.c
@@ -1393,6 +1393,7 @@ void aarch64_max_tcg_initfn(Object *obj)
t = FIELD_DP64(t, ID_AA64SMFR0, F16F32, 1); /* FEAT_SME */
t = FIELD_DP64(t, ID_AA64SMFR0, I8I32, 0xf); /* FEAT_SME */
t = FIELD_DP64(t, ID_AA64SMFR0, F8F32, 1); /* FEAT_SME_F8F32 */
+ t = FIELD_DP64(t, ID_AA64SMFR0, F8F16, 1); /* FEAT_SME_F8F16 */
t = FIELD_DP64(t, ID_AA64SMFR0, F16F16, 1); /* FEAT_SME_F16F16 */
t = FIELD_DP64(t, ID_AA64SMFR0, B16B16, 1); /* FEAT_SME_B16B16 */
t = FIELD_DP64(t, ID_AA64SMFR0, I16I32, 5); /* FEAT_SME2 */
diff --git a/docs/system/arm/emulation.rst b/docs/system/arm/emulation.rst
index a3a1607ff9..7b85ec6146 100644
--- a/docs/system/arm/emulation.rst
+++ b/docs/system/arm/emulation.rst
@@ -169,6 +169,7 @@ the following architecture extensions:
- FEAT_SME_FA64 (Full A64 instruction set in Streaming SVE mode)
- FEAT_SME_F16F16 (Non-widening half-precision FP16 arithmetic for SME2)
- FEAT_SME_F64F64 (Double-precision floating-point outer product instructions)
+- FEAT_SME_F8F16 (SME2 ZA-targeting FP8 multiply-accumulate, dot product, and outer product to half-precision instructions)
- FEAT_SME_F8F32 (SME2 ZA-targeting FP8 multiply-accumulate, dot product, and outer product to single-precision instructions)
- FEAT_SME_I16I64 (16-bit to 64-bit integer widening outer product instructions)
- FEAT_SME_LUTv2 (Lookup table instructions with 4-bit indices and 8-bit elements)
--
2.43.0
* Re: [PATCH 02/10] target/arm: Rename SME FMLAL/FMLSL patterns
2026-06-25 1:51 ` [PATCH 02/10] target/arm: Rename SME FMLAL/FMLSL patterns Richard Henderson
@ 2026-06-25 10:17 ` Peter Maydell
0 siblings, 0 replies; 25+ messages in thread
From: Peter Maydell @ 2026-06-25 10:17 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel, qemu-arm
On Thu, 25 Jun 2026 at 02:53, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Rename patterns to include _sh suffix, so that we can
> distinguish insns of the same name from FEAT_SME_F8F16.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
thanks
-- PMM
* Re: [PATCH 10/10] target/arm: Enable FEAT_SME_F8F16 for -cpu max
2026-06-25 1:51 ` [PATCH 10/10] target/arm: Enable FEAT_SME_F8F16 for -cpu max Richard Henderson
@ 2026-06-25 10:18 ` Peter Maydell
0 siblings, 0 replies; 25+ messages in thread
From: Peter Maydell @ 2026-06-25 10:18 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel, qemu-arm
On Thu, 25 Jun 2026 at 02:53, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> linux-user/aarch64/elfload.c | 1 +
> target/arm/tcg/cpu64.c | 1 +
> docs/system/arm/emulation.rst | 1 +
> 3 files changed, 3 insertions(+)
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
thanks
-- PMM
* Re: [PATCH 08/10] target/arm: Rename FVDOT pattern
2026-06-25 1:51 ` [PATCH 08/10] target/arm: Rename FVDOT pattern Richard Henderson
@ 2026-06-25 10:19 ` Peter Maydell
0 siblings, 0 replies; 25+ messages in thread
From: Peter Maydell @ 2026-06-25 10:19 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel, qemu-arm
On Thu, 25 Jun 2026 at 02:53, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Rename to FVDOT_sh so that we can introduce an insn
> of the same name from FEAT_SME_F8F16.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
thanks
-- PMM
* Re: [PATCH 01/10] target/arm: Enable FADD/FSUB (half-precision) with FEAT_SME_F8F16
2026-06-25 1:51 ` [PATCH 01/10] target/arm: Enable FADD/FSUB (half-precision) with FEAT_SME_F8F16 Richard Henderson
@ 2026-06-26 9:03 ` Peter Maydell
0 siblings, 0 replies; 25+ messages in thread
From: Peter Maydell @ 2026-06-26 9:03 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel, qemu-arm
On Thu, 25 Jun 2026 at 02:53, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> These two instructions can be enabled with either
> FEAT_SME_F8F16 or FEAT_SME_F16F16.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> target/arm/cpu-features.h | 11 +++++++++++
> target/arm/tcg/translate-sme.c | 4 ++--
> 2 files changed, 13 insertions(+), 2 deletions(-)
> +static inline bool
> +isar_feature_aa64_sme_f16f16_or_f8f16(const ARMISARegisters *id)
> +{
> + return isar_feature_aa64_sme_f16f16(id) && isar_feature_aa64_sme_f8f16(id);
Should be ||, not &&...
Otherwise
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
thanks
-- PMM
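
For reference, a minimal sketch of the predicate with the suggested `||` applied. The `ARMISARegisters` stand-in and the two single-feature predicates below are simplified, hypothetical versions written only for illustration; the real ones in target/arm/cpu-features.h read ID register fields, and this is not the patch as committed.

```c
#include <stdbool.h>

/* Hypothetical stand-in for QEMU's ARMISARegisters, for illustration only. */
typedef struct {
    bool sme_f16f16;
    bool sme_f8f16;
} ARMISARegisters;

/* Simplified single-feature tests (assumed shape, not the real bodies). */
static inline bool isar_feature_aa64_sme_f16f16(const ARMISARegisters *id)
{
    return id->sme_f16f16;
}

static inline bool isar_feature_aa64_sme_f8f16(const ARMISARegisters *id)
{
    return id->sme_f8f16;
}

/* True if either feature is present, matching the commit message. */
static inline bool
isar_feature_aa64_sme_f16f16_or_f8f16(const ARMISARegisters *id)
{
    return isar_feature_aa64_sme_f16f16(id) || isar_feature_aa64_sme_f8f16(id);
}
```

With `&&`, a CPU that implements only one of the two features would fail the check, which contradicts the "either FEAT_SME_F8F16 or FEAT_SME_F16F16" wording of the commit message.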
* Re: [PATCH 03/10] target/arm: Implement FMLAL (multiple, multiple and single, FP8 to FP16)
2026-06-25 1:51 ` [PATCH 03/10] target/arm: Implement FMLAL (multiple, multiple and single, FP8 to FP16) Richard Henderson
@ 2026-06-26 9:11 ` Peter Maydell
0 siblings, 0 replies; 25+ messages in thread
From: Peter Maydell @ 2026-06-26 9:11 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel, qemu-arm
On Thu, 25 Jun 2026 at 02:53, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> target/arm/tcg/translate-sme.c | 9 +++++++++
> target/arm/tcg/sme.decode | 7 +++++++
> 2 files changed, 16 insertions(+)
>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
thanks
-- PMM
* Re: [PATCH 04/10] target/arm: Implement FMLAL (multiple and indexed, FP8 to FP16)
2026-06-25 1:51 ` [PATCH 04/10] target/arm: Implement FMLAL (multiple and indexed, " Richard Henderson
@ 2026-06-26 9:12 ` Peter Maydell
0 siblings, 0 replies; 25+ messages in thread
From: Peter Maydell @ 2026-06-26 9:12 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel, qemu-arm
On Thu, 25 Jun 2026 at 02:53, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> target/arm/tcg/translate-sme.c | 4 ++++
> target/arm/tcg/sme.decode | 13 +++++++++++++
> 2 files changed, 17 insertions(+)
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
thanks
-- PMM
* Re: [PATCH 06/10] target/arm: Implement DOT (multiple and indexed, FP8 to FP16)
2026-06-25 1:51 ` [PATCH 06/10] target/arm: Implement DOT (multiple and indexed, " Richard Henderson
@ 2026-06-26 9:16 ` Peter Maydell
0 siblings, 0 replies; 25+ messages in thread
From: Peter Maydell @ 2026-06-26 9:16 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel, qemu-arm
On Thu, 25 Jun 2026 at 02:53, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
Patch subject should be FDOT, not DOT. Otherwise
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
thanks
-- PMM
* Re: [PATCH 07/10] target/arm: Implement FMOPA (widening, 2-way, FP8 to FP16)
2026-06-25 1:51 ` [PATCH 07/10] target/arm: Implement FMOPA (widening, 2-way, " Richard Henderson
@ 2026-06-26 9:22 ` Peter Maydell
0 siblings, 0 replies; 25+ messages in thread
From: Peter Maydell @ 2026-06-26 9:22 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel, qemu-arm
On Thu, 25 Jun 2026 at 02:54, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
thanks
-- PMM
* Re: [PATCH 09/10] target/arm: Implement FVDOT (FP8 to FP16)
2026-06-25 1:51 ` [PATCH 09/10] target/arm: Implement FVDOT (FP8 to FP16) Richard Henderson
@ 2026-06-26 9:27 ` Peter Maydell
2026-06-26 15:33 ` Richard Henderson
0 siblings, 1 reply; 25+ messages in thread
From: Peter Maydell @ 2026-06-26 9:27 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel, qemu-arm
On Thu, 25 Jun 2026 at 02:52, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> +void HELPER(sme_fvdot_idx_hb)(void *vd, void *vn, void *vm,
> + CPUARMState *env, uint32_t desc)
> +{
> + FP8MulContext ctx = fp8_mul_start(env, 0xf);
> + intptr_t oprsz = simd_maxsz(desc);
> + intptr_t elements = oprsz / sizeof(float32);
Shouldn't this be sizeof(float16) since the output elements
are halfprec ?
> + int idx_n = extract32(desc, SIMD_DATA_SHIFT, 1);
> + int idx_m = extract32(desc, SIMD_DATA_SHIFT + 1, 3);
> + float16 *d = vd;
> + uint8_t *n0 = vn;
> + uint8_t *n1 = vn + sizeof(ARMVectorReg);
> + uint16_t *m = vm;
> + intptr_t i = 0;
> +
> + do {
> + uint16_t mm = m[H2(2 * i + idx_m)];
> + do {
> + uint16_t nn = n0[H1(4 * i + idx_n)] | (n1[H1(4 * i + idx_n)] << 8);
> + d[H2(i)] = f8dotadd_h(nn, mm, 2, d[H2(i)], &ctx);
> + } while (++i & 7);
> + } while (i < elements);
> +}
Otherwise
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
thanks
-- PMM
* Re: [PATCH 00/10] target/arm: Implement FEAT_SME_F8F16
2026-06-25 1:51 [PATCH 00/10] target/arm: Implement FEAT_SME_F8F16 Richard Henderson
` (9 preceding siblings ...)
2026-06-25 1:51 ` [PATCH 10/10] target/arm: Enable FEAT_SME_F8F16 for -cpu max Richard Henderson
@ 2026-06-26 9:33 ` Peter Maydell
2026-06-26 10:07 ` Alex Bennée
11 siblings, 0 replies; 25+ messages in thread
From: Peter Maydell @ 2026-06-26 9:33 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel, qemu-arm
On Thu, 25 Jun 2026 at 02:52, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Another minor feature working toward SME2.2.
>
> r~
>
> Richard Henderson (10):
> target/arm: Enable FADD/FSUB (half-precision) with FEAT_SME_F8F16
> target/arm: Rename SME FMLAL/FMLSL patterns
> target/arm: Implement FMLAL (multiple, multiple and single, FP8 to
> FP16)
> target/arm: Implement FMLAL (multiple and indexed, FP8 to FP16)
> target/arm: Implement FDOT (multiple, multiple and single, FP8 to
> FP16)
> target/arm: Implement DOT (multiple and indexed, FP8 to FP16)
> target/arm: Implement FMOPA (widening, 2-way, FP8 to FP16)
> target/arm: Rename FVDOT pattern
> target/arm: Implement FVDOT (FP8 to FP16)
> target/arm: Enable FEAT_SME_F8F16 for -cpu max
If you agree with my suggested tweaks for patches 1, 6, 9,
I can take this into target-arm.next and adjust it there.
The only one that isn't totally obvious is the patch 9 one.
-- PMM
* Re: [PATCH 00/10] target/arm: Implement FEAT_SME_F8F16
2026-06-25 1:51 [PATCH 00/10] target/arm: Implement FEAT_SME_F8F16 Richard Henderson
` (10 preceding siblings ...)
2026-06-26 9:33 ` [PATCH 00/10] target/arm: Implement FEAT_SME_F8F16 Peter Maydell
@ 2026-06-26 10:07 ` Alex Bennée
2026-06-26 10:16 ` Peter Maydell
11 siblings, 1 reply; 25+ messages in thread
From: Alex Bennée @ 2026-06-26 10:07 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel, qemu-arm
Richard Henderson <richard.henderson@linaro.org> writes:
> Another minor feature working toward SME2.2.
This conflicts heavily with master - did a bunch of stuff get merged
that broke it?
>
> r~
>
> Richard Henderson (10):
> target/arm: Enable FADD/FSUB (half-precision) with FEAT_SME_F8F16
> target/arm: Rename SME FMLAL/FMLSL patterns
> target/arm: Implement FMLAL (multiple, multiple and single, FP8 to
> FP16)
> target/arm: Implement FMLAL (multiple and indexed, FP8 to FP16)
> target/arm: Implement FDOT (multiple, multiple and single, FP8 to
> FP16)
> target/arm: Implement DOT (multiple and indexed, FP8 to FP16)
> target/arm: Implement FMOPA (widening, 2-way, FP8 to FP16)
> target/arm: Rename FVDOT pattern
> target/arm: Implement FVDOT (FP8 to FP16)
> target/arm: Enable FEAT_SME_F8F16 for -cpu max
>
> target/arm/cpu-features.h | 11 +++++
> target/arm/tcg/helper-fp8-defs.h | 2 +
> linux-user/aarch64/elfload.c | 1 +
> target/arm/tcg/cpu64.c | 1 +
> target/arm/tcg/fp8_helper.c | 58 +++++++++++++++++++++++++
> target/arm/tcg/translate-sme.c | 72 +++++++++++++++++++++++---------
> docs/system/arm/emulation.rst | 1 +
> target/arm/tcg/sme.decode | 66 +++++++++++++++++++++--------
> 8 files changed, 176 insertions(+), 36 deletions(-)
--
Alex Bennée
Virtualisation Tech Lead @ Linaro
* Re: [PATCH 00/10] target/arm: Implement FEAT_SME_F8F16
2026-06-26 10:07 ` Alex Bennée
@ 2026-06-26 10:16 ` Peter Maydell
2026-06-26 11:54 ` Peter Maydell
0 siblings, 1 reply; 25+ messages in thread
From: Peter Maydell @ 2026-06-26 10:16 UTC (permalink / raw)
To: Alex Bennée; +Cc: Richard Henderson, qemu-devel, qemu-arm
On Fri, 26 Jun 2026 at 11:07, Alex Bennée <alex.bennee@linaro.org> wrote:
>
> Richard Henderson <richard.henderson@linaro.org> writes:
>
> > Another minor feature working toward SME2.2.
>
> This conflicts heavily with master - did a bunch of stuff get merged
> that broke it?
It'll be based on the F8F32 series that's not yet upstream
(it's in my target-arm.next queue).
-- PMM
* Re: [PATCH 00/10] target/arm: Implement FEAT_SME_F8F16
2026-06-26 10:16 ` Peter Maydell
@ 2026-06-26 11:54 ` Peter Maydell
0 siblings, 0 replies; 25+ messages in thread
From: Peter Maydell @ 2026-06-26 11:54 UTC (permalink / raw)
To: Alex Bennée; +Cc: Richard Henderson, qemu-devel, qemu-arm
On Fri, 26 Jun 2026 at 11:16, Peter Maydell <peter.maydell@linaro.org> wrote:
>
> On Fri, 26 Jun 2026 at 11:07, Alex Bennée <alex.bennee@linaro.org> wrote:
> >
> > Richard Henderson <richard.henderson@linaro.org> writes:
> >
> > > Another minor feature working toward SME2.2.
> >
> > This conflicts heavily with master - did a bunch of stuff get merged
> > that broke it?
>
> It'll be based on the F8F32 series that's not yet upstream
> (it's in my target-arm.next queue).
https://gitlab.com/pm215/qemu/-/commits/target-arm.next
including this series.
-- PMM
* Re: [PATCH 09/10] target/arm: Implement FVDOT (FP8 to FP16)
2026-06-26 9:27 ` Peter Maydell
@ 2026-06-26 15:33 ` Richard Henderson
0 siblings, 0 replies; 25+ messages in thread
From: Richard Henderson @ 2026-06-26 15:33 UTC (permalink / raw)
To: Peter Maydell; +Cc: qemu-devel, qemu-arm
On 6/26/26 02:27, Peter Maydell wrote:
> On Thu, 25 Jun 2026 at 02:52, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>
>
>> +void HELPER(sme_fvdot_idx_hb)(void *vd, void *vn, void *vm,
>> + CPUARMState *env, uint32_t desc)
>> +{
>> + FP8MulContext ctx = fp8_mul_start(env, 0xf);
>> + intptr_t oprsz = simd_maxsz(desc);
>> + intptr_t elements = oprsz / sizeof(float32);
>
> Shouldn't this be sizeof(float16) since the output elements
> are halfprec ?
Oops, yes. Thanks.
r~
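
To make the effect of the agreed fix concrete, here is a small sketch of the element-count arithmetic only. The typedefs stand in for the softfloat types (only their sizes matter here), and `count_elements` is a hypothetical name introduced for illustration; it is not part of the helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the softfloat types; only their sizes matter here. */
typedef uint16_t float16;
typedef uint32_t float32;

/* Number of output elements of elt_size bytes in an oprsz-byte operation. */
static size_t count_elements(size_t oprsz, size_t elt_size)
{
    return oprsz / elt_size;
}
```

For a 32-byte operation, `count_elements(32, sizeof(float32))` is 8 while `count_elements(32, sizeof(float16))` is 16. Since the inner loop of the quoted helper handles 8 outputs per pass and the outer loop runs `while (i < elements)`, the `float32` divisor would stop after the first 8 of the 16 half-precision outputs in that case.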