[PATCH 0/4] target/arm: Fix SME full tile indexing

* [PATCH 0/4] target/arm: Fix SME full tile indexing
@ 2023-06-22 15:11 Richard Henderson
  2023-06-22 15:11 ` [PATCH 1/4] target/arm: Avoid splitting Zregs across lines in dump Richard Henderson
                   ` (4 more replies)
  0 siblings, 5 replies; 12+ messages in thread
From: Richard Henderson @ 2023-06-22 15:11 UTC
  To: qemu-devel; +Cc: peter.maydell

Fix #1620 and add its test case.
Several cleanups to aid debugging ZA[].  :-)

r~

Richard Henderson (4):
  target/arm: Avoid splitting Zregs across lines in dump
  target/arm: Dump ZA[] when active
  target/arm: Support reading ZA[] from gdbstub
  target/arm: Fix SME full tile indexing

 target/arm/cpu.h                  |  1 +
 target/arm/internals.h            |  3 ++
 target/arm/cpu.c                  | 54 +++++++++++--------
 target/arm/gdbstub.c              |  8 +++
 target/arm/gdbstub64.c            | 88 +++++++++++++++++++++++++++++++
 target/arm/tcg/translate-sme.c    | 24 ++++++---
 tests/tcg/aarch64/sme-outprod1.c  | 83 +++++++++++++++++++++++++++++
 tests/tcg/aarch64/Makefile.target | 10 ++--
 8 files changed, 240 insertions(+), 31 deletions(-)
 create mode 100644 tests/tcg/aarch64/sme-outprod1.c

-- 
2.34.1




* [PATCH 1/4] target/arm: Avoid splitting Zregs across lines in dump
  2023-06-22 15:11 [PATCH 0/4] target/arm: Fix SME full tile indexing Richard Henderson
@ 2023-06-22 15:11 ` Richard Henderson
  2023-06-27 12:51   ` Peter Maydell
  2023-06-22 15:11 ` [PATCH 2/4] target/arm: Dump ZA[] when active Richard Henderson
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 12+ messages in thread
From: Richard Henderson @ 2023-06-22 15:11 UTC
  To: qemu-devel; +Cc: peter.maydell

Allow the line length to extend to 548 columns.  While annoyingly wide,
it's still less confusing than the continuations we print.  Also, the
default VL used by Linux (and max for A64FX) uses only 140 columns.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/cpu.c | 36 ++++++++++++++----------------------
 1 file changed, 14 insertions(+), 22 deletions(-)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 353fc48567..7cb70f9727 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -955,7 +955,7 @@ static void aarch64_cpu_dump_state(CPUState *cs, FILE *f, int flags)
     ARMCPU *cpu = ARM_CPU(cs);
     CPUARMState *env = &cpu->env;
     uint32_t psr = pstate_read(env);
-    int i;
+    int i, j;
     int el = arm_current_el(env);
     const char *ns_status;
     bool sve;
@@ -1014,7 +1014,7 @@ static void aarch64_cpu_dump_state(CPUState *cs, FILE *f, int flags)
     }
 
     if (sve) {
-        int j, zcr_len = sve_vqm1_for_el(env, el);
+        int zcr_len = sve_vqm1_for_el(env, el);
 
         for (i = 0; i <= FFR_PRED_NUM; i++) {
             bool eol;
@@ -1054,32 +1054,24 @@ static void aarch64_cpu_dump_state(CPUState *cs, FILE *f, int flags)
             }
         }
 
-        for (i = 0; i < 32; i++) {
-            if (zcr_len == 0) {
+        if (zcr_len == 0) {
+            /*
+             * With vl=16, there are only 37 columns per register,
+             * so output two registers per line.
+             */
+            for (i = 0; i < 32; i++) {
                 qemu_fprintf(f, "Z%02d=%016" PRIx64 ":%016" PRIx64 "%s",
                              i, env->vfp.zregs[i].d[1],
                              env->vfp.zregs[i].d[0], i & 1 ? "\n" : " ");
-            } else if (zcr_len == 1) {
-                qemu_fprintf(f, "Z%02d=%016" PRIx64 ":%016" PRIx64
-                             ":%016" PRIx64 ":%016" PRIx64 "\n",
-                             i, env->vfp.zregs[i].d[3], env->vfp.zregs[i].d[2],
-                             env->vfp.zregs[i].d[1], env->vfp.zregs[i].d[0]);
-            } else {
+            }
+        } else {
+            for (i = 0; i < 32; i++) {
+                qemu_fprintf(f, "Z%02d=", i);
                 for (j = zcr_len; j >= 0; j--) {
-                    bool odd = (zcr_len - j) % 2 != 0;
-                    if (j == zcr_len) {
-                        qemu_fprintf(f, "Z%02d[%x-%x]=", i, j, j - 1);
-                    } else if (!odd) {
-                        if (j > 0) {
-                            qemu_fprintf(f, "   [%x-%x]=", j, j - 1);
-                        } else {
-                            qemu_fprintf(f, "     [%x]=", j);
-                        }
-                    }
                     qemu_fprintf(f, "%016" PRIx64 ":%016" PRIx64 "%s",
                                  env->vfp.zregs[i].d[j * 2 + 1],
-                                 env->vfp.zregs[i].d[j * 2],
-                                 odd || j == 0 ? "\n" : ":");
+                                 env->vfp.zregs[i].d[j * 2 + 0],
+                                 j ? ":" : "\n");
                 }
             }
         }
-- 
2.34.1




* [PATCH 2/4] target/arm: Dump ZA[] when active
  2023-06-22 15:11 [PATCH 0/4] target/arm: Fix SME full tile indexing Richard Henderson
  2023-06-22 15:11 ` [PATCH 1/4] target/arm: Avoid splitting Zregs across lines in dump Richard Henderson
@ 2023-06-22 15:11 ` Richard Henderson
  2023-06-27 12:51   ` Peter Maydell
  2023-06-22 15:12 ` [PATCH 3/4] target/arm: Support reading ZA[] from gdbstub Richard Henderson
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 12+ messages in thread
From: Richard Henderson @ 2023-06-22 15:11 UTC
  To: qemu-devel; +Cc: peter.maydell

Always print each matrix row whole, one per line, so that we
get the entire matrix in the proper shape.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/cpu.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 7cb70f9727..3da811bc5e 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -1082,6 +1082,24 @@ static void aarch64_cpu_dump_state(CPUState *cs, FILE *f, int flags)
                          i, q[1], q[0], (i & 1 ? "\n" : " "));
         }
     }
+
+    if (cpu_isar_feature(aa64_sme, cpu) &&
+        FIELD_EX64(env->svcr, SVCR, ZA) &&
+        sme_exception_el(env, el) == 0) {
+        int zcr_len = sve_vqm1_for_el_sm(env, el, true);
+        int svl = (zcr_len + 1) * 16;
+        int svl_lg10 = svl < 100 ? 2 : 3;
+
+        for (i = 0; i < svl; i++) {
+            qemu_fprintf(f, "ZA[%0*d]=", svl_lg10, i);
+            for (j = zcr_len; j >= 0; --j) {
+                qemu_fprintf(f, "%016" PRIx64 ":%016" PRIx64 "%c",
+                             env->zarray[i].d[2 * j + 1],
+                             env->zarray[i].d[2 * j],
+                             j ? ':' : '\n');
+            }
+        }
+    }
 }
 
 #else
-- 
2.34.1




* [PATCH 3/4] target/arm: Support reading ZA[] from gdbstub
  2023-06-22 15:11 [PATCH 0/4] target/arm: Fix SME full tile indexing Richard Henderson
  2023-06-22 15:11 ` [PATCH 1/4] target/arm: Avoid splitting Zregs across lines in dump Richard Henderson
  2023-06-22 15:11 ` [PATCH 2/4] target/arm: Dump ZA[] when active Richard Henderson
@ 2023-06-22 15:12 ` Richard Henderson
  2023-06-27 13:07   ` Peter Maydell
  2023-06-22 15:12 ` [PATCH 4/4] target/arm: Fix SME full tile indexing Richard Henderson
  2023-06-27 13:36 ` [PATCH 0/4] " Peter Maydell
  4 siblings, 1 reply; 12+ messages in thread
From: Richard Henderson @ 2023-06-22 15:12 UTC
  To: qemu-devel; +Cc: peter.maydell

Mirror the existing support for SVE.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/cpu.h       |  1 +
 target/arm/internals.h |  3 ++
 target/arm/gdbstub.c   |  8 ++++
 target/arm/gdbstub64.c | 88 ++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 100 insertions(+)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index af0119addf..082617cfc6 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -877,6 +877,7 @@ struct ArchCPU {
 
     DynamicGDBXMLInfo dyn_sysreg_xml;
     DynamicGDBXMLInfo dyn_svereg_xml;
+    DynamicGDBXMLInfo dyn_zareg_xml;
     DynamicGDBXMLInfo dyn_m_systemreg_xml;
     DynamicGDBXMLInfo dyn_m_secextreg_xml;
 
diff --git a/target/arm/internals.h b/target/arm/internals.h
index e3029bdc37..54d1f28992 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -1362,12 +1362,15 @@ static inline uint64_t pmu_counter_mask(CPUARMState *env)
 
 #ifdef TARGET_AARCH64
 int arm_gen_dynamic_svereg_xml(CPUState *cpu, int base_reg);
+int arm_gen_dynamic_zareg_xml(CPUState *cpu, int base_reg);
 int aarch64_gdb_get_sve_reg(CPUARMState *env, GByteArray *buf, int reg);
 int aarch64_gdb_set_sve_reg(CPUARMState *env, uint8_t *buf, int reg);
 int aarch64_gdb_get_fpu_reg(CPUARMState *env, GByteArray *buf, int reg);
 int aarch64_gdb_set_fpu_reg(CPUARMState *env, uint8_t *buf, int reg);
 int aarch64_gdb_get_pauth_reg(CPUARMState *env, GByteArray *buf, int reg);
 int aarch64_gdb_set_pauth_reg(CPUARMState *env, uint8_t *buf, int reg);
+int aarch64_gdb_get_za_reg(CPUARMState *env, GByteArray *buf, int reg);
+int aarch64_gdb_set_za_reg(CPUARMState *env, uint8_t *buf, int reg);
 void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp);
 void arm_cpu_sme_finalize(ARMCPU *cpu, Error **errp);
 void arm_cpu_pauth_finalize(ARMCPU *cpu, Error **errp);
diff --git a/target/arm/gdbstub.c b/target/arm/gdbstub.c
index 03b17c814f..1204eb40d7 100644
--- a/target/arm/gdbstub.c
+++ b/target/arm/gdbstub.c
@@ -490,6 +490,8 @@ const char *arm_gdb_get_dynamic_xml(CPUState *cs, const char *xmlname)
         return cpu->dyn_sysreg_xml.desc;
     } else if (strcmp(xmlname, "sve-registers.xml") == 0) {
         return cpu->dyn_svereg_xml.desc;
+    } else if (strcmp(xmlname, "za-registers.xml") == 0) {
+        return cpu->dyn_zareg_xml.desc;
     } else if (strcmp(xmlname, "arm-m-system.xml") == 0) {
         return cpu->dyn_m_systemreg_xml.desc;
 #ifndef CONFIG_USER_ONLY
@@ -532,6 +534,12 @@ void arm_cpu_register_gdb_regs_for_features(ARMCPU *cpu)
                                      aarch64_gdb_set_pauth_reg,
                                      4, "aarch64-pauth.xml", 0);
         }
+        if (cpu_isar_feature(aa64_sme, cpu)) {
+            int nreg = arm_gen_dynamic_zareg_xml(cs, cs->gdb_num_regs);
+            gdb_register_coprocessor(cs, aarch64_gdb_get_za_reg,
+                                     aarch64_gdb_set_za_reg, nreg,
+                                     "za-registers.xml", 0);
+        }
 #endif
     } else {
         if (arm_feature(env, ARM_FEATURE_NEON)) {
diff --git a/target/arm/gdbstub64.c b/target/arm/gdbstub64.c
index d7b79a6589..b76fac9bd0 100644
--- a/target/arm/gdbstub64.c
+++ b/target/arm/gdbstub64.c
@@ -247,6 +247,61 @@ int aarch64_gdb_set_pauth_reg(CPUARMState *env, uint8_t *buf, int reg)
     return 0;
 }
 
+static int max_svq(ARMCPU *cpu)
+{
+    return 32 - clz32(cpu->sme_vq.map);
+}
+
+int aarch64_gdb_get_za_reg(CPUARMState *env, GByteArray *buf, int reg)
+{
+    ARMCPU *cpu = env_archcpu(env);
+    int max_vq = max_svq(cpu);
+    int cur_vq = EX_TBFLAG_A64(env->hflags, SVL) + 1;
+    int i;
+
+    if (reg >= max_vq * 16) {
+        return 0;
+    }
+
+    /* If ZA is unset, or reg out of range, the contents are zero. */
+    if (FIELD_EX64(env->svcr, SVCR, ZA) && reg < cur_vq * 16) {
+        for (i = 0; i < cur_vq; i++) {
+            gdb_get_reg128(buf, env->zarray[reg].d[i * 2 + 1],
+                           env->zarray[reg].d[i * 2]);
+        }
+    } else {
+        cur_vq = 0;
+    }
+
+    for (i = cur_vq; i < max_vq; i++) {
+        gdb_get_reg128(buf, 0, 0);
+    }
+
+    return max_vq * 16;
+}
+
+int aarch64_gdb_set_za_reg(CPUARMState *env, uint8_t *buf, int reg)
+{
+    ARMCPU *cpu = env_archcpu(env);
+    uint64_t *p = (uint64_t *) buf;
+    int max_vq = max_svq(cpu);
+    int cur_vq = EX_TBFLAG_A64(env->hflags, SVL) + 1;
+    int i;
+
+    if (reg >= max_vq * 16) {
+        return 0;
+    }
+
+    /* If ZA is unset, or reg out of range, the contents are zero. */
+    if (FIELD_EX64(env->svcr, SVCR, ZA) && reg < cur_vq * 16) {
+        for (i = 0; i < cur_vq; i++) {
+            env->zarray[reg].d[i * 2 + 1] = *p++;
+            env->zarray[reg].d[i * 2 + 0] = *p++;
+        }
+    }
+    return max_vq * 16;
+}
+
 static void output_vector_union_type(GString *s, int reg_width,
                                      const char *name)
 {
@@ -379,3 +434,36 @@ int arm_gen_dynamic_svereg_xml(CPUState *cs, int orig_base_reg)
     info->num = base_reg - orig_base_reg;
     return info->num;
 }
+
+/*
+ * Generate the xml for SME, with matrix size set to the maximum
+ * for the cpu.  Returns the number of registers generated.
+ */
+int arm_gen_dynamic_zareg_xml(CPUState *cs, int base_reg)
+{
+    ARMCPU *cpu = ARM_CPU(cs);
+    GString *s = g_string_new(NULL);
+    int vq = max_svq(cpu);
+    int row_count = vq * 16;
+    int row_width = vq * 128;
+    int i;
+
+    g_string_printf(s, "<?xml version=\"1.0\"?>");
+    g_string_append_printf(s, "<!DOCTYPE target SYSTEM \"gdb-target.dtd\">");
+    g_string_append_printf(s, "<feature name=\"org.qemu.gdb.aarch64.za\">");
+
+    output_vector_union_type(s, row_width, "zav");
+
+    for (i = 0; i < row_count; i++) {
+        g_string_append_printf(s,
+                               "<reg name=\"za%d\" bitsize=\"%d\""
+                               " regnum=\"%d\" type=\"zav\"/>",
+                               i, row_width, base_reg + i);
+    }
+
+    g_string_append_printf(s, "</feature>");
+
+    cpu->dyn_zareg_xml.num = row_count;
+    cpu->dyn_zareg_xml.desc = g_string_free(s, false);
+    return row_count;
+}
-- 
2.34.1




* [PATCH 4/4] target/arm: Fix SME full tile indexing
  2023-06-22 15:11 [PATCH 0/4] target/arm: Fix SME full tile indexing Richard Henderson
                   ` (2 preceding siblings ...)
  2023-06-22 15:12 ` [PATCH 3/4] target/arm: Support reading ZA[] from gdbstub Richard Henderson
@ 2023-06-22 15:12 ` Richard Henderson
  2023-06-27 13:24   ` Peter Maydell
  2023-06-27 13:36 ` [PATCH 0/4] " Peter Maydell
  4 siblings, 1 reply; 12+ messages in thread
From: Richard Henderson @ 2023-06-22 15:12 UTC
  To: qemu-devel; +Cc: peter.maydell

For the outer product set of insns, which take an entire matrix
tile as output, the argument is not a combined tile+column.
Therefore using get_tile_rowcol was incorrect, as we extracted
the tile number from itself.

The test case relies only on assembler support for SME, since
no release of GCC recognizes -march=armv9-a+sme yet.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1620
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/translate-sme.c    | 24 ++++++---
 tests/tcg/aarch64/sme-outprod1.c  | 83 +++++++++++++++++++++++++++++++
 tests/tcg/aarch64/Makefile.target | 10 ++--
 3 files changed, 108 insertions(+), 9 deletions(-)
 create mode 100644 tests/tcg/aarch64/sme-outprod1.c

diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c
index d0054e3f77..6038b0a06f 100644
--- a/target/arm/tcg/translate-sme.c
+++ b/target/arm/tcg/translate-sme.c
@@ -95,6 +95,21 @@ static TCGv_ptr get_tile_rowcol(DisasContext *s, int esz, int rs,
     return addr;
 }
 
+/*
+ * Resolve tile.size[0] to a host pointer.
+ * Used by e.g. outer product insns where we require the entire tile.
+ */
+static TCGv_ptr get_tile(DisasContext *s, int esz, int tile)
+{
+    TCGv_ptr addr = tcg_temp_new_ptr();
+    int offset;
+
+    offset = tile * sizeof(ARMVectorReg) + offsetof(CPUARMState, zarray);
+
+    tcg_gen_addi_ptr(addr, cpu_env, offset);
+    return addr;
+}
+
 static bool trans_ZERO(DisasContext *s, arg_ZERO *a)
 {
     if (!dc_isar_feature(aa64_sme, s)) {
@@ -260,8 +275,7 @@ static bool do_adda(DisasContext *s, arg_adda *a, MemOp esz,
         return true;
     }
 
-    /* Sum XZR+zad to find ZAd. */
-    za = get_tile_rowcol(s, esz, 31, a->zad, false);
+    za = get_tile(s, esz, a->zad);
     zn = vec_full_reg_ptr(s, a->zn);
     pn = pred_full_reg_ptr(s, a->pn);
     pm = pred_full_reg_ptr(s, a->pm);
@@ -286,8 +300,7 @@ static bool do_outprod(DisasContext *s, arg_op *a, MemOp esz,
         return true;
     }
 
-    /* Sum XZR+zad to find ZAd. */
-    za = get_tile_rowcol(s, esz, 31, a->zad, false);
+    za = get_tile(s, esz, a->zad);
     zn = vec_full_reg_ptr(s, a->zn);
     zm = vec_full_reg_ptr(s, a->zm);
     pn = pred_full_reg_ptr(s, a->pn);
@@ -308,8 +321,7 @@ static bool do_outprod_fpst(DisasContext *s, arg_op *a, MemOp esz,
         return true;
     }
 
-    /* Sum XZR+zad to find ZAd. */
-    za = get_tile_rowcol(s, esz, 31, a->zad, false);
+    za = get_tile(s, esz, a->zad);
     zn = vec_full_reg_ptr(s, a->zn);
     zm = vec_full_reg_ptr(s, a->zm);
     pn = pred_full_reg_ptr(s, a->pn);
diff --git a/tests/tcg/aarch64/sme-outprod1.c b/tests/tcg/aarch64/sme-outprod1.c
new file mode 100644
index 0000000000..6e5972d75e
--- /dev/null
+++ b/tests/tcg/aarch64/sme-outprod1.c
@@ -0,0 +1,83 @@
+/*
+ * SME outer product, 1 x 1.
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include <stdio.h>
+
+extern void foo(float *dst);
+
+asm(
+"	.arch_extension sme\n"
+"	.type foo, @function\n"
+"foo:\n"
+"	stp x29, x30, [sp, -80]!\n"
+"	mov x29, sp\n"
+"	stp d8, d9, [sp, 16]\n"
+"	stp d10, d11, [sp, 32]\n"
+"	stp d12, d13, [sp, 48]\n"
+"	stp d14, d15, [sp, 64]\n"
+"	smstart\n"
+"	ptrue p0.s, vl4\n"
+"	fmov z0.s, #1.0\n"
+/*
+ * An outer product of a vector of 1.0 by itself should be a matrix of 1.0.
+ * Note that we are using tile 1 here (za1.s) rather than tile 0.
+ */
+"	zero {za}\n"
+"	fmopa za1.s, p0/m, p0/m, z0.s, z0.s\n"
+/*
+ * Read the first 4x4 sub-matrix of elements from tile 1:
+ * Note that za1h should be interchangeable here.
+ */
+"	mov w12, #0\n"
+"	mova z0.s, p0/m, za1v.s[w12, #0]\n"
+"	mova z1.s, p0/m, za1v.s[w12, #1]\n"
+"	mova z2.s, p0/m, za1v.s[w12, #2]\n"
+"	mova z3.s, p0/m, za1v.s[w12, #3]\n"
+/*
+ * And store them to the input pointer (dst in the C code):
+ */
+"	st1w {z0.s}, p0, [x0]\n"
+"	add x0, x0, #16\n"
+"	st1w {z1.s}, p0, [x0]\n"
+"	add x0, x0, #16\n"
+"	st1w {z2.s}, p0, [x0]\n"
+"	add x0, x0, #16\n"
+"	st1w {z3.s}, p0, [x0]\n"
+"	smstop\n"
+"	ldp d8, d9, [sp, 16]\n"
+"	ldp d10, d11, [sp, 32]\n"
+"	ldp d12, d13, [sp, 48]\n"
+"	ldp d14, d15, [sp, 64]\n"
+"	ldp x29, x30, [sp], 80\n"
+"	ret\n"
+"	.size foo, . - foo"
+);
+
+int main()
+{
+    float dst[16];
+    int i, j;
+
+    foo(dst);
+
+    for (i = 0; i < 16; i++) {
+        if (dst[i] != 1.0f) {
+            break;
+        }
+    }
+
+    if (i == 16) {
+        return 0; /* success */
+    }
+
+    /* failure */
+    for (i = 0; i < 4; ++i) {
+        for (j = 0; j < 4; ++j) {
+            printf("%f ", (double)dst[i * 4 + j]);
+        }
+        printf("\n");
+    }
+    return 1;
+}
diff --git a/tests/tcg/aarch64/Makefile.target b/tests/tcg/aarch64/Makefile.target
index 3430fd3cd8..253ea9c481 100644
--- a/tests/tcg/aarch64/Makefile.target
+++ b/tests/tcg/aarch64/Makefile.target
@@ -26,7 +26,7 @@ config-cc.mak: Makefile
 	    $(call cc-option,-march=armv8.5-a,              CROSS_CC_HAS_ARMV8_5); \
 	    $(call cc-option,-mbranch-protection=standard,  CROSS_CC_HAS_ARMV8_BTI); \
 	    $(call cc-option,-march=armv8.5-a+memtag,       CROSS_CC_HAS_ARMV8_MTE); \
-	    $(call cc-option,-march=armv9-a+sme,            CROSS_CC_HAS_ARMV9_SME)) 3> config-cc.mak
+	    $(call cc-option,-Wa$(COMMA)-march=armv9-a+sme, CROSS_AS_HAS_ARMV9_SME)) 3> config-cc.mak
 -include config-cc.mak
 
 ifneq ($(CROSS_CC_HAS_ARMV8_2),)
@@ -61,11 +61,15 @@ AARCH64_TESTS += mte-1 mte-2 mte-3 mte-4 mte-5 mte-6 mte-7
 mte-%: CFLAGS += -march=armv8.5-a+memtag
 endif
 
+ifneq ($(CROSS_AS_HAS_ARMV9_SME),)
+AARCH64_TESTS += sme-outprod1
+endif
+
 ifneq ($(CROSS_CC_HAS_SVE),)
 # System Registers Tests
 AARCH64_TESTS += sysregs
-ifneq ($(CROSS_CC_HAS_ARMV9_SME),)
-sysregs: CFLAGS+=-march=armv9-a+sme -DHAS_ARMV9_SME
+ifneq ($(CROSS_AS_HAS_ARMV9_SME),)
+sysregs: CFLAGS+=-Wa,-march=armv9-a+sme -DHAS_ARMV9_SME
 else
 sysregs: CFLAGS+=-march=armv8.1-a+sve
 endif
-- 
2.34.1




* Re: [PATCH 1/4] target/arm: Avoid splitting Zregs across lines in dump
  2023-06-22 15:11 ` [PATCH 1/4] target/arm: Avoid splitting Zregs across lines in dump Richard Henderson
@ 2023-06-27 12:51   ` Peter Maydell
  0 siblings, 0 replies; 12+ messages in thread
From: Peter Maydell @ 2023-06-27 12:51 UTC
  To: Richard Henderson; +Cc: qemu-devel

On Thu, 22 Jun 2023 at 16:12, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Allow the line length to extend to 548 columns.  While annoyingly wide,
> it's still less confusing than the continuations we print.  Also, the
> default VL used by Linux (and max for A64FX) uses only 140 columns.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM



* Re: [PATCH 2/4] target/arm: Dump ZA[] when active
  2023-06-22 15:11 ` [PATCH 2/4] target/arm: Dump ZA[] when active Richard Henderson
@ 2023-06-27 12:51   ` Peter Maydell
  0 siblings, 0 replies; 12+ messages in thread
From: Peter Maydell @ 2023-06-27 12:51 UTC
  To: Richard Henderson; +Cc: qemu-devel

On Thu, 22 Jun 2023 at 16:12, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Always print each matrix row whole, one per line, so that we
> get the entire matrix in the proper shape.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM



* Re: [PATCH 3/4] target/arm: Support reading ZA[] from gdbstub
  2023-06-22 15:12 ` [PATCH 3/4] target/arm: Support reading ZA[] from gdbstub Richard Henderson
@ 2023-06-27 13:07   ` Peter Maydell
  2023-06-27 13:29     ` Luis Machado
  0 siblings, 1 reply; 12+ messages in thread
From: Peter Maydell @ 2023-06-27 13:07 UTC
  To: Richard Henderson; +Cc: qemu-devel, Luis Machado

On Thu, 22 Jun 2023 at 16:12, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Mirror the existing support for SVE.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>


> @@ -247,6 +247,61 @@ int aarch64_gdb_set_pauth_reg(CPUARMState *env, uint8_t *buf, int reg)
>      return 0;
>  }
>
> +static int max_svq(ARMCPU *cpu)
> +{
> +    return 32 - clz32(cpu->sme_vq.map);
> +}
> +
> +int aarch64_gdb_get_za_reg(CPUARMState *env, GByteArray *buf, int reg)
> +{
> +    ARMCPU *cpu = env_archcpu(env);
> +    int max_vq = max_svq(cpu);
> +    int cur_vq = EX_TBFLAG_A64(env->hflags, SVL) + 1;
> +    int i;
> +
> +    if (reg >= max_vq * 16) {
> +        return 0;
> +    }
> +
> +    /* If ZA is unset, or reg out of range, the contents are zero. */
> +    if (FIELD_EX64(env->svcr, SVCR, ZA) && reg < cur_vq * 16) {
> +        for (i = 0; i < cur_vq; i++) {
> +            gdb_get_reg128(buf, env->zarray[reg].d[i * 2 + 1],
> +                           env->zarray[reg].d[i * 2]);
> +        }
> +    } else {
> +        cur_vq = 0;
> +    }
> +
> +    for (i = cur_vq; i < max_vq; i++) {
> +        gdb_get_reg128(buf, 0, 0);
> +    }
> +
> +    return max_vq * 16;
> +}
> +
> +int aarch64_gdb_set_za_reg(CPUARMState *env, uint8_t *buf, int reg)
> +{
> +    ARMCPU *cpu = env_archcpu(env);
> +    uint64_t *p = (uint64_t *) buf;
> +    int max_vq = max_svq(cpu);
> +    int cur_vq = EX_TBFLAG_A64(env->hflags, SVL) + 1;
> +    int i;
> +
> +    if (reg >= max_vq * 16) {
> +        return 0;
> +    }
> +
> +    /* If ZA is unset, or reg out of range, the contents are zero. */
> +    if (FIELD_EX64(env->svcr, SVCR, ZA) && reg < cur_vq * 16) {
> +        for (i = 0; i < cur_vq; i++) {
> +            env->zarray[reg].d[i * 2 + 1] = *p++;
> +            env->zarray[reg].d[i * 2 + 0] = *p++;

This looks like it won't do the right thing on a big-endian
system. (And the existing SVE code also looks wrong.)
The gdb_get_reg*() functions handle endianness conversion
from the gdb data buffer; there are no equivalent gdb_set_reg*()
functions so you have to do the byte-swapping yourself.
(This is pretty bug-prone so maybe we should design a better
API here :-))

Compare aarch64_gdb_get/set_fpu_reg() where a gdb_get_reg128()
is matched with a pair of ldq_le_p() and so on.

> +        }
> +    }
> +    return max_vq * 16;
> +}
> +
>  static void output_vector_union_type(GString *s, int reg_width,
>                                       const char *name)
>  {
> @@ -379,3 +434,36 @@ int arm_gen_dynamic_svereg_xml(CPUState *cs, int orig_base_reg)
>      info->num = base_reg - orig_base_reg;
>      return info->num;
>  }
> +
> +/*
> + * Generate the xml for SME, with matrix size set to the maximum
> + * for the cpu.  Returns the number of registers generated.
> + */
> +int arm_gen_dynamic_zareg_xml(CPUState *cs, int base_reg)
> +{
> +    ARMCPU *cpu = ARM_CPU(cs);
> +    GString *s = g_string_new(NULL);
> +    int vq = max_svq(cpu);
> +    int row_count = vq * 16;
> +    int row_width = vq * 128;
> +    int i;
> +
> +    g_string_printf(s, "<?xml version=\"1.0\"?>");
> +    g_string_append_printf(s, "<!DOCTYPE target SYSTEM \"gdb-target.dtd\">");
> +    g_string_append_printf(s, "<feature name=\"org.qemu.gdb.aarch64.za\">");

The patches on the GDB end are still under review, but they
use the feature name org.gnu.gdb.aarch64.sme:

https://inbox.sourceware.org/gdb-patches/20230519102508.14020-18-luis.machado@arm.com/
We should follow that (and only commit our end when the GDB
spec for the XML layout is finalized).

Luis kindly gave me a dump of some example XML to save us
from trying to parse it out of the patch:

  <feature name="org.gnu.gdb.aarch64.sme">
    <flags id="svcr_flags" size="8">
      <field name="SM" start="0" end="0" type="bool"/>
      <field name="ZA" start="1" end="1" type="bool"/>
    </flags>
    <vector id="sme_bv" type="uint8" count="32"/>
    <vector id="sme_bvv" type="sme_bv" count="32"/>
    <reg name="svg" bitsize="64" type="int" regnum="91"/>
    <reg name="svcr" bitsize="64" type="svcr_flags" regnum="92"/>
    <reg name="za" bitsize="8192" type="sme_bvv" regnum="93"/>
  </feature>

> +
> +    output_vector_union_type(s, row_width, "zav");
> +
> +    for (i = 0; i < row_count; i++) {
> +        g_string_append_printf(s,
> +                               "<reg name=\"za%d\" bitsize=\"%d\""
> +                               " regnum=\"%d\" type=\"zav\"/>",
> +                               i, row_width, base_reg + i);
> +    }
> +
> +    g_string_append_printf(s, "</feature>");
> +
> +    cpu->dyn_zareg_xml.num = row_count;
> +    cpu->dyn_zareg_xml.desc = g_string_free(s, false);
> +    return row_count;
> +}

thanks
-- PMM



* Re: [PATCH 4/4] target/arm: Fix SME full tile indexing
  2023-06-22 15:12 ` [PATCH 4/4] target/arm: Fix SME full tile indexing Richard Henderson
@ 2023-06-27 13:24   ` Peter Maydell
  0 siblings, 0 replies; 12+ messages in thread
From: Peter Maydell @ 2023-06-27 13:24 UTC
  To: Richard Henderson; +Cc: qemu-devel

On Thu, 22 Jun 2023 at 16:12, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> For the outer product set of insns, which take an entire matrix
> tile as output, the argument is not a combined tile+column.
> Therefore using get_tile_rowcol was incorrect, as we extracted
> the tile number from itself.
>
> The test case relies only on assembler support for SME, since
> no release of GCC recognizes -march=armv9-a+sme yet.
>
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1620
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

Should we cc: stable ?

thanks
-- PMM



* Re: [PATCH 3/4] target/arm: Support reading ZA[] from gdbstub
  2023-06-27 13:07   ` Peter Maydell
@ 2023-06-27 13:29     ` Luis Machado
  0 siblings, 0 replies; 12+ messages in thread
From: Luis Machado @ 2023-06-27 13:29 UTC
  To: Peter Maydell, Richard Henderson; +Cc: qemu-devel

On 6/27/23 14:07, Peter Maydell wrote:
> On Thu, 22 Jun 2023 at 16:12, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>>
>> Mirror the existing support for SVE.
>>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>
>
>> @@ -247,6 +247,61 @@ int aarch64_gdb_set_pauth_reg(CPUARMState *env, uint8_t *buf, int reg)
>>      return 0;
>>  }
>>
>> +static int max_svq(ARMCPU *cpu)
>> +{
>> +    return 32 - clz32(cpu->sme_vq.map);
>> +}
>> +
>> +int aarch64_gdb_get_za_reg(CPUARMState *env, GByteArray *buf, int reg)
>> +{
>> +    ARMCPU *cpu = env_archcpu(env);
>> +    int max_vq = max_svq(cpu);
>> +    int cur_vq = EX_TBFLAG_A64(env->hflags, SVL) + 1;
>> +    int i;
>> +
>> +    if (reg >= max_vq * 16) {
>> +        return 0;
>> +    }
>> +
>> +    /* If ZA is unset, or reg out of range, the contents are zero. */
>> +    if (FIELD_EX64(env->svcr, SVCR, ZA) && reg < cur_vq * 16) {
>> +        for (i = 0; i < cur_vq; i++) {
>> +            gdb_get_reg128(buf, env->zarray[reg].d[i * 2 + 1],
>> +                           env->zarray[reg].d[i * 2]);
>> +        }
>> +    } else {
>> +        cur_vq = 0;
>> +    }
>> +
>> +    for (i = cur_vq; i < max_vq; i++) {
>> +        gdb_get_reg128(buf, 0, 0);
>> +    }
>> +
>> +    return max_vq * 16;
>> +}
>> +
>> +int aarch64_gdb_set_za_reg(CPUARMState *env, uint8_t *buf, int reg)
>> +{
>> +    ARMCPU *cpu = env_archcpu(env);
>> +    uint64_t *p = (uint64_t *) buf;
>> +    int max_vq = max_svq(cpu);
>> +    int cur_vq = EX_TBFLAG_A64(env->hflags, SVL) + 1;
>> +    int i;
>> +
>> +    if (reg >= max_vq * 16) {
>> +        return 0;
>> +    }
>> +
>> +    /* If ZA is unset, or reg out of range, the contents are zero. */
>> +    if (FIELD_EX64(env->svcr, SVCR, ZA) && reg < cur_vq * 16) {
>> +        for (i = 0; i < cur_vq; i++) {
>> +            env->zarray[reg].d[i * 2 + 1] = *p++;
>> +            env->zarray[reg].d[i * 2 + 0] = *p++;
>
> This looks like it won't do the right thing on a big-endian
> system. (And the existing SVE code also looks wrong.)
> The gdb_get_reg*() functions handle endianness conversion
> from the gdb data buffer; there are no equivalent gdb_set_reg*()
> functions so you have to do the byte-swapping yourself.
> (This is pretty bug-prone so maybe we should design a better
> API here :-))
>
> Compare aarch64_gdb_get/set_fpu_reg() where a gdb_get_reg128()
> is matched with a pair of ldq_le_p() and so on.
>
>> +        }
>> +    }
>> +    return max_vq * 16;
>> +}
>> +
>>  static void output_vector_union_type(GString *s, int reg_width,
>>                                       const char *name)
>>  {
>> @@ -379,3 +434,36 @@ int arm_gen_dynamic_svereg_xml(CPUState *cs, int orig_base_reg)
>>      info->num = base_reg - orig_base_reg;
>>      return info->num;
>>  }
>> +
>> +/*
>> + * Generate the xml for SME, with matrix size set to the maximum
>> + * for the cpu.  Returns the number of registers generated.
>> + */
>> +int arm_gen_dynamic_zareg_xml(CPUState *cs, int base_reg)
>> +{
>> +    ARMCPU *cpu = ARM_CPU(cs);
>> +    GString *s = g_string_new(NULL);
>> +    int vq = max_svq(cpu);
>> +    int row_count = vq * 16;
>> +    int row_width = vq * 128;
>> +    int i;
>> +
>> +    g_string_printf(s, "<?xml version=\"1.0\"?>");
>> +    g_string_append_printf(s, "<!DOCTYPE target SYSTEM \"gdb-target.dtd\">");
>> +    g_string_append_printf(s, "<feature name=\"org.qemu.gdb.aarch64.za\">");

Thanks for cc-ing me on the thread, Peter.

>
> The patches on the GDB end are still under review, but they
> use the feature name org.gnu.gdb.aarch64.sme:
>
> https://inbox.sourceware.org/gdb-patches/20230519102508.14020-18-luis.machado@arm.com/
> We should follow that (and only commit our end when the GDB
> spec for the XML layout is finalized).
>
> Luis kindly gave me a dump of some example XML to save us
> from trying to parse it out of the patch:
>
>   <feature name="org.gnu.gdb.aarch64.sme">
>     <flags id="svcr_flags" size="8">
>       <field name="SM" start="0" end="0" type="bool"/>
>       <field name="ZA" start="1" end="1" type="bool"/>
>     </flags>
>     <vector id="sme_bv" type="uint8" count="32"/>
>     <vector id="sme_bvv" type="sme_bv" count="32"/>

Just to clarify, for convenience I've defined ZA as a 2-dimensional array of bytes. That way gdb can do things like:

(gdb) p $za
$1 = {{0 <repeats 32 times>} <repeats 32 times>}

Or you can access a particular row or column as needed.

Here SVL is 32 bytes, so the final size of ZA is 32 x 32 = 1024 bytes (8192 bits).

GDB will also take care of providing the numerous pseudo-registers that read/write to portions of ZA.

>     <reg name="svg" bitsize="64" type="int" regnum="91"/>

SVG is just like VG in SVE, but for SME. It is SVL (in bytes) / 8.

>     <reg name="svcr" bitsize="64" type="svcr_flags" regnum="92"/>

SVCR tracks the SM and ZA bits, which QEMU must provide. I haven't decided if we want to make that read-only or read/write. I'm tempted to make it read-only.

I haven't done any testing of bare metal ZA support yet. Please let me know what you see.

>     <reg name="za" bitsize="8192" type="sme_bvv" regnum="93"/>
>   </feature>
>
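
The numbers in the example XML above can be checked with a small sketch. It reads SVL in bytes, as the reply does, and takes the SM and ZA bit positions from the svcr_flags definition in the quoted XML. All function names are hypothetical and exist only for this illustration; they are not part of GDB or QEMU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ZA is square: SVL bytes per row and SVL rows. */
static uint32_t za_size_bytes(uint32_t svl_bytes)
{
    /* SVL is a multiple of 128 bits, i.e. of 16 bytes. */
    assert(svl_bytes > 0 && svl_bytes % 16 == 0);
    return svl_bytes * svl_bytes;
}

/* Same size in bits, matching the bitsize attribute of the za register. */
static uint32_t za_size_bits(uint32_t svl_bytes)
{
    return za_size_bytes(svl_bytes) * 8;
}

/* SVG as described in the reply: SVL in bytes divided by 8. */
static uint32_t svg_from_svl(uint32_t svl_bytes)
{
    assert(svl_bytes > 0 && svl_bytes % 16 == 0);
    return svl_bytes / 8;
}

/* SVCR flag bits, per the svcr_flags fields: SM is bit 0, ZA is bit 1. */
static bool svcr_sm(uint64_t svcr)
{
    return (svcr >> 0) & 1;
}

static bool svcr_za(uint64_t svcr)
{
    return (svcr >> 1) & 1;
}
```

With SVL = 32 bytes this gives 1024 bytes (8192 bits) for ZA and SVG = 4, which agrees with count="32" on both vector types and bitsize="8192" on the za register.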
>> +
>> +    output_vector_union_type(s, row_width, "zav");
>> +
>> +    for (i = 0; i < row_count; i++) {
>> +        g_string_append_printf(s,
>> +                               "<reg name=\"za%d\" bitsize=\"%d\""
>> +                               " regnum=\"%d\" type=\"zav\"/>",
>> +                               i, row_width, base_reg + i);
>> +    }
>> +
>> +    g_string_append_printf(s, "</feature>");
>> +
>> +    cpu->dyn_zareg_xml.num = row_count;
>> +    cpu->dyn_zareg_xml.desc = g_string_free(s, false);
>> +    return row_count;
>> +}
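
For reference, the per-row layout this function emits can be sketched without glib. The sketch below mirrors the patch's arithmetic (vq * 16 rows of vq * 128 bits each) and its <reg> format string; the helper names and the fixed-size buffer are assumptions made for the illustration. Note that the review asks for the GDB-side org.gnu.gdb.aarch64.sme layout instead, so this shows what the patch as posted generates, not the final format.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Number of ZA rows for a given vector quadword count. */
static int za_row_count(int vq)
{
    return vq * 16;
}

/* Width of one ZA row in bits. */
static int za_row_bits(int vq)
{
    return vq * 128;
}

/*
 * Append one <reg> element for the given row to a NUL-terminated buffer.
 * Returns 0 on success, -1 if the element does not fit.
 */
static int append_za_reg(char *buf, size_t cap, int row, int vq, int base_reg)
{
    size_t used = strlen(buf);

    assert(used < cap);
    assert(row >= 0 && row < za_row_count(vq));

    int n = snprintf(buf + used, cap - used,
                     "<reg name=\"za%d\" bitsize=\"%d\""
                     " regnum=\"%d\" type=\"zav\"/>",
                     row, za_row_bits(vq), base_reg + row);

    return (n >= 0 && (size_t)n < cap - used) ? 0 : -1;
}
```

Calling append_za_reg() once per row from 0 to za_row_count(vq) - 1 produces the same sequence of elements as the loop in the patch.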
>
> thanks
> -- PMM


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/4] target/arm: Fix SME full tile indexing
  2023-06-22 15:11 [PATCH 0/4] target/arm: Fix SME full tile indexing Richard Henderson
                   ` (3 preceding siblings ...)
  2023-06-22 15:12 ` [PATCH 4/4] target/arm: Fix SME full tile indexing Richard Henderson
@ 2023-06-27 13:36 ` Peter Maydell
  2023-06-28  6:54   ` Richard Henderson
  4 siblings, 1 reply; 12+ messages in thread
From: Peter Maydell @ 2023-06-27 13:36 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Thu, 22 Jun 2023 at 16:12, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Fix #1620 and add its test case.
> Several cleanups to aid debugging ZA[].  :-)

I'm going to apply patches 1, 2 and 4 to target-arm.next.
I've tagged 4 as cc: qemu-stable (but will remove that if
you disagree).

thanks
-- PMM


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/4] target/arm: Fix SME full tile indexing
  2023-06-27 13:36 ` [PATCH 0/4] " Peter Maydell
@ 2023-06-28  6:54   ` Richard Henderson
  0 siblings, 0 replies; 12+ messages in thread
From: Richard Henderson @ 2023-06-28  6:54 UTC (permalink / raw)
  To: Peter Maydell; +Cc: qemu-devel

On 6/27/23 15:36, Peter Maydell wrote:
> On Thu, 22 Jun 2023 at 16:12, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>>
>> Fix #1620 and add its test case.
>> Several cleanups to aid debugging ZA[].  :-)
> 
> I'm going to apply patches 1, 2 and 4 to target-arm.next.
> I've tagged 4 as cc: qemu-stable (but will remove that if
> you disagree).

No, it should go back to 8.0.


r~



^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2023-06-28  6:54 UTC | newest]

Thread overview: 12+ messages
2023-06-22 15:11 [PATCH 0/4] target/arm: Fix SME full tile indexing Richard Henderson
2023-06-22 15:11 ` [PATCH 1/4] target/arm: Avoid splitting Zregs across lines in dump Richard Henderson
2023-06-27 12:51   ` Peter Maydell
2023-06-22 15:11 ` [PATCH 2/4] target/arm: Dump ZA[] when active Richard Henderson
2023-06-27 12:51   ` Peter Maydell
2023-06-22 15:12 ` [PATCH 3/4] target/arm: Support reading ZA[] from gdbstub Richard Henderson
2023-06-27 13:07   ` Peter Maydell
2023-06-27 13:29     ` Luis Machado
2023-06-22 15:12 ` [PATCH 4/4] target/arm: Fix SME full tile indexing Richard Henderson
2023-06-27 13:24   ` Peter Maydell
2023-06-27 13:36 ` [PATCH 0/4] " Peter Maydell
2023-06-28  6:54   ` Richard Henderson
