qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH for-8.2 0/5] target/arm: Implement cortex-a710
@ 2023-08-10  2:35 Richard Henderson
  2023-08-10  2:35 ` [PATCH 1/5] target/arm: Disable FEAT_TRF in neoverse-v1 Richard Henderson
                   ` (4 more replies)
  0 siblings, 5 replies; 17+ messages in thread
From: Richard Henderson @ 2023-08-10  2:35 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

This is one of the first generation Armv9 cores, and gives us something
concrete to test in that area.  Notably, it supports MTE.

The first patch is a bug fix of sorts for neoverse-v1, as we don't,
and won't, support FEAT_TRF.

The only thing missing for the a710 is FEAT_MPAM.  I haven't looked
at that properly, and I believe that there may already be some work
done on that within Linaro -- even if a stub implementation.


r~


Richard Henderson (5):
  target/arm: Disable FEAT_TRF in neoverse-v1
  target/arm: Reduce dcz_blocksize to uint8_t
  target/arm: Allow cpu to configure GM blocksize
  target/arm: Support more GM blocksizes
  target/arm: Implement cortex-a710

 docs/system/arm/virt.rst       |   1 +
 target/arm/cpu.h               |   5 +-
 target/arm/internals.h         |   6 --
 target/arm/tcg/translate.h     |   2 +
 hw/arm/virt.c                  |   1 +
 target/arm/helper.c            |  11 ++-
 target/arm/tcg/cpu64.c         | 172 ++++++++++++++++++++++++++++++++-
 target/arm/tcg/mte_helper.c    |  91 ++++++++++++++---
 target/arm/tcg/translate-a64.c |   5 +-
 9 files changed, 263 insertions(+), 31 deletions(-)

-- 
2.34.1



^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/5] target/arm: Disable FEAT_TRF in neoverse-v1
  2023-08-10  2:35 [PATCH for-8.2 0/5] target/arm: Implement cortex-a710 Richard Henderson
@ 2023-08-10  2:35 ` Richard Henderson
  2023-08-10  9:16   ` Peter Maydell
  2023-08-10  2:35 ` [PATCH 2/5] target/arm: Reduce dcz_blocksize to uint8_t Richard Henderson
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 17+ messages in thread
From: Richard Henderson @ 2023-08-10  2:35 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm, qemu-stable

Self-hosted trace is out of scope for QEMU.

Cc: qemu-stable@nongnu.org
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/cpu64.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/arm/tcg/cpu64.c b/target/arm/tcg/cpu64.c
index 8019f00bc3..60e5f034d9 100644
--- a/target/arm/tcg/cpu64.c
+++ b/target/arm/tcg/cpu64.c
@@ -618,7 +618,7 @@ static void aarch64_neoverse_v1_initfn(Object *obj)
     cpu->dcz_blocksize = 4;
     cpu->id_aa64afr0 = 0x00000000;
     cpu->id_aa64afr1 = 0x00000000;
-    cpu->isar.id_aa64dfr0  = 0x000001f210305519ull;
+    cpu->isar.id_aa64dfr0  = 0x000000f210305519ull; /* w/o FEAT_TRF */
     cpu->isar.id_aa64dfr1 = 0x00000000;
     cpu->isar.id_aa64isar0 = 0x1011111110212120ull; /* with FEAT_RNG */
     cpu->isar.id_aa64isar1 = 0x0111000001211032ull;
@@ -628,7 +628,7 @@ static void aarch64_neoverse_v1_initfn(Object *obj)
     cpu->isar.id_aa64pfr0  = 0x1101110120111112ull; /* GIC filled in later */
     cpu->isar.id_aa64pfr1  = 0x0000000000000020ull;
     cpu->id_afr0       = 0x00000000;
-    cpu->isar.id_dfr0  = 0x15011099;
+    cpu->isar.id_dfr0  = 0x05011099; /* w/o FEAT_TRF */
     cpu->isar.id_isar0 = 0x02101110;
     cpu->isar.id_isar1 = 0x13112111;
     cpu->isar.id_isar2 = 0x21232042;
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 2/5] target/arm: Reduce dcz_blocksize to uint8_t
  2023-08-10  2:35 [PATCH for-8.2 0/5] target/arm: Implement cortex-a710 Richard Henderson
  2023-08-10  2:35 ` [PATCH 1/5] target/arm: Disable FEAT_TRF in neoverse-v1 Richard Henderson
@ 2023-08-10  2:35 ` Richard Henderson
  2023-08-10 14:09   ` Peter Maydell
  2023-08-10  2:35 ` [PATCH 3/5] target/arm: Allow cpu to configure GM blocksize Richard Henderson
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 17+ messages in thread
From: Richard Henderson @ 2023-08-10  2:35 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

This value is only 4 bits wide.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/cpu.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 88e5accda6..7fedbb34ba 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -1074,7 +1074,8 @@ struct ArchCPU {
     bool prop_lpa2;
 
     /* DCZ blocksize, in log_2(words), ie low 4 bits of DCZID_EL0 */
-    uint32_t dcz_blocksize;
+    uint8_t dcz_blocksize;
+
     uint64_t rvbar_prop; /* Property/input signals.  */
 
     /* Configurable aspects of GIC cpu interface (which is part of the CPU) */
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 3/5] target/arm: Allow cpu to configure GM blocksize
  2023-08-10  2:35 [PATCH for-8.2 0/5] target/arm: Implement cortex-a710 Richard Henderson
  2023-08-10  2:35 ` [PATCH 1/5] target/arm: Disable FEAT_TRF in neoverse-v1 Richard Henderson
  2023-08-10  2:35 ` [PATCH 2/5] target/arm: Reduce dcz_blocksize to uint8_t Richard Henderson
@ 2023-08-10  2:35 ` Richard Henderson
  2023-08-10 14:13   ` Peter Maydell
  2023-08-10  2:35 ` [PATCH 4/5] target/arm: Support more GM blocksizes Richard Henderson
  2023-08-10  2:35 ` [PATCH 5/5] target/arm: Implement cortex-a710 Richard Henderson
  4 siblings, 1 reply; 17+ messages in thread
From: Richard Henderson @ 2023-08-10  2:35 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

Previously we hard-coded the blocksize with GMID_EL1_BS.
But the value we choose for -cpu max does not match the
value that cortex-a710 uses.

Mirror the way we handle dcz_blocksize.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/cpu.h               |  2 ++
 target/arm/internals.h         |  6 -----
 target/arm/tcg/translate.h     |  2 ++
 target/arm/helper.c            | 11 +++++---
 target/arm/tcg/cpu64.c         |  1 +
 target/arm/tcg/mte_helper.c    | 46 ++++++++++++++++++++++------------
 target/arm/tcg/translate-a64.c |  5 ++--
 7 files changed, 45 insertions(+), 28 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 7fedbb34ba..dfa02eb4dc 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -1075,6 +1075,8 @@ struct ArchCPU {
 
     /* DCZ blocksize, in log_2(words), ie low 4 bits of DCZID_EL0 */
     uint8_t dcz_blocksize;
+    /* GM blocksize, in log_2(words), ie low 4 bits of GMID_EL0 */
+    uint8_t gm_blocksize;
 
     uint64_t rvbar_prop; /* Property/input signals.  */
 
diff --git a/target/arm/internals.h b/target/arm/internals.h
index 0f01bc32a8..6fcf12c178 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -1243,12 +1243,6 @@ void arm_log_exception(CPUState *cs);
 
 #endif /* !CONFIG_USER_ONLY */
 
-/*
- * The log2 of the words in the tag block, for GMID_EL1.BS.
- * The is the maximum, 256 bytes, which manipulates 64-bits of tags.
- */
-#define GMID_EL1_BS  6
-
 /*
  * SVE predicates are 1/8 the size of SVE vectors, and cannot use
  * the same simd_desc() encoding due to restrictions on size.
diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h
index d1cacff0b2..f748ba6f39 100644
--- a/target/arm/tcg/translate.h
+++ b/target/arm/tcg/translate.h
@@ -151,6 +151,8 @@ typedef struct DisasContext {
     int8_t btype;
     /* A copy of cpu->dcz_blocksize. */
     uint8_t dcz_blocksize;
+    /* A copy of cpu->gm_blocksize. */
+    uint8_t gm_blocksize;
     /* True if this page is guarded.  */
     bool guarded_page;
     /* Bottom two bits of XScale c15_cpar coprocessor access control reg */
diff --git a/target/arm/helper.c b/target/arm/helper.c
index 50f61e42ca..f5effa30f7 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -7643,10 +7643,6 @@ static const ARMCPRegInfo mte_reginfo[] = {
       .opc0 = 3, .opc1 = 0, .crn = 1, .crm = 0, .opc2 = 6,
       .access = PL1_RW, .accessfn = access_mte,
       .fieldoffset = offsetof(CPUARMState, cp15.gcr_el1) },
-    { .name = "GMID_EL1", .state = ARM_CP_STATE_AA64,
-      .opc0 = 3, .opc1 = 1, .crn = 0, .crm = 0, .opc2 = 4,
-      .access = PL1_R, .accessfn = access_aa64_tid5,
-      .type = ARM_CP_CONST, .resetvalue = GMID_EL1_BS },
     { .name = "TCO", .state = ARM_CP_STATE_AA64,
       .opc0 = 3, .opc1 = 3, .crn = 4, .crm = 2, .opc2 = 7,
       .type = ARM_CP_NO_RAW,
@@ -9237,6 +9233,13 @@ void register_cp_regs_for_features(ARMCPU *cpu)
      * then define only a RAZ/WI version of PSTATE.TCO.
      */
     if (cpu_isar_feature(aa64_mte, cpu)) {
+        ARMCPRegInfo gmid_reginfo = {
+            .name = "GMID_EL1", .state = ARM_CP_STATE_AA64,
+            .opc0 = 3, .opc1 = 1, .crn = 0, .crm = 0, .opc2 = 4,
+            .access = PL1_R, .accessfn = access_aa64_tid5,
+            .type = ARM_CP_CONST, .resetvalue = cpu->gm_blocksize,
+        };
+        define_one_arm_cp_reg(cpu, &gmid_reginfo);
         define_arm_cp_regs(cpu, mte_reginfo);
         define_arm_cp_regs(cpu, mte_el0_cacheop_reginfo);
     } else if (cpu_isar_feature(aa64_mte_insn_reg, cpu)) {
diff --git a/target/arm/tcg/cpu64.c b/target/arm/tcg/cpu64.c
index 60e5f034d9..5ca9070c14 100644
--- a/target/arm/tcg/cpu64.c
+++ b/target/arm/tcg/cpu64.c
@@ -868,6 +868,7 @@ void aarch64_max_tcg_initfn(Object *obj)
     cpu->ctr = 0x80038003; /* 32 byte I and D cacheline size, VIPT icache */
     cpu->dcz_blocksize = 7; /*  512 bytes */
 #endif
+    cpu->gm_blocksize = 6;  /*  256 bytes */
 
     cpu->sve_vq.supported = MAKE_64BIT_MASK(0, ARM_MAX_VQ);
     cpu->sme_vq.supported = SVE_VQ_POW2_MAP;
diff --git a/target/arm/tcg/mte_helper.c b/target/arm/tcg/mte_helper.c
index 9c64def081..3640c6e57f 100644
--- a/target/arm/tcg/mte_helper.c
+++ b/target/arm/tcg/mte_helper.c
@@ -421,46 +421,54 @@ void HELPER(st2g_stub)(CPUARMState *env, uint64_t ptr)
     }
 }
 
-#define LDGM_STGM_SIZE  (4 << GMID_EL1_BS)
-
 uint64_t HELPER(ldgm)(CPUARMState *env, uint64_t ptr)
 {
     int mmu_idx = cpu_mmu_index(env, false);
     uintptr_t ra = GETPC();
+    int gm_bs = env_archcpu(env)->gm_blocksize;
+    int gm_bs_bytes = 4 << gm_bs;
     void *tag_mem;
 
-    ptr = QEMU_ALIGN_DOWN(ptr, LDGM_STGM_SIZE);
+    ptr = QEMU_ALIGN_DOWN(ptr, gm_bs_bytes);
 
     /* Trap if accessing an invalid page.  */
     tag_mem = allocation_tag_mem(env, mmu_idx, ptr, MMU_DATA_LOAD,
-                                 LDGM_STGM_SIZE, MMU_DATA_LOAD,
-                                 LDGM_STGM_SIZE / (2 * TAG_GRANULE), ra);
+                                 gm_bs_bytes, MMU_DATA_LOAD,
+                                 gm_bs_bytes / (2 * TAG_GRANULE), ra);
 
     /* The tag is squashed to zero if the page does not support tags.  */
     if (!tag_mem) {
         return 0;
     }
 
-    QEMU_BUILD_BUG_ON(GMID_EL1_BS != 6);
     /*
-     * We are loading 64-bits worth of tags.  The ordering of elements
-     * within the word corresponds to a 64-bit little-endian operation.
+     * The ordering of elements within the word corresponds to
+     * a little-endian operation.
      */
-    return ldq_le_p(tag_mem);
+    switch (gm_bs) {
+    case 6:
+        /* 256 bytes -> 16 tags -> 64 result bits */
+        return ldq_le_p(tag_mem);
+    default:
+        /* cpu configured with unsupported gm blocksize. */
+        g_assert_not_reached();
+    }
 }
 
 void HELPER(stgm)(CPUARMState *env, uint64_t ptr, uint64_t val)
 {
     int mmu_idx = cpu_mmu_index(env, false);
     uintptr_t ra = GETPC();
+    int gm_bs = env_archcpu(env)->gm_blocksize;
+    int gm_bs_bytes = 4 << gm_bs;
     void *tag_mem;
 
-    ptr = QEMU_ALIGN_DOWN(ptr, LDGM_STGM_SIZE);
+    ptr = QEMU_ALIGN_DOWN(ptr, gm_bs_bytes);
 
     /* Trap if accessing an invalid page.  */
     tag_mem = allocation_tag_mem(env, mmu_idx, ptr, MMU_DATA_STORE,
-                                 LDGM_STGM_SIZE, MMU_DATA_LOAD,
-                                 LDGM_STGM_SIZE / (2 * TAG_GRANULE), ra);
+                                 gm_bs_bytes, MMU_DATA_LOAD,
+                                 gm_bs_bytes / (2 * TAG_GRANULE), ra);
 
     /*
      * Tag store only happens if the page support tags,
@@ -470,12 +478,18 @@ void HELPER(stgm)(CPUARMState *env, uint64_t ptr, uint64_t val)
         return;
     }
 
-    QEMU_BUILD_BUG_ON(GMID_EL1_BS != 6);
     /*
-     * We are storing 64-bits worth of tags.  The ordering of elements
-     * within the word corresponds to a 64-bit little-endian operation.
+     * The ordering of elements within the word corresponds to
+     * a little-endian operation.
      */
-    stq_le_p(tag_mem, val);
+    switch (gm_bs) {
+    case 6:
+        stq_le_p(tag_mem, val);
+        break;
+    default:
+        /* cpu configured with unsupported gm blocksize. */
+        g_assert_not_reached();
+    }
 }
 
 void HELPER(stzgm_tags)(CPUARMState *env, uint64_t ptr, uint64_t val)
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 5fa1257d32..d822d9a9af 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -3786,7 +3786,7 @@ static bool trans_STGM(DisasContext *s, arg_ldst_tag *a)
         gen_helper_stgm(cpu_env, addr, tcg_rt);
     } else {
         MMUAccessType acc = MMU_DATA_STORE;
-        int size = 4 << GMID_EL1_BS;
+        int size = 4 << s->gm_blocksize;
 
         clean_addr = clean_data_tbi(s, addr);
         tcg_gen_andi_i64(clean_addr, clean_addr, -size);
@@ -3818,7 +3818,7 @@ static bool trans_LDGM(DisasContext *s, arg_ldst_tag *a)
         gen_helper_ldgm(tcg_rt, cpu_env, addr);
     } else {
         MMUAccessType acc = MMU_DATA_LOAD;
-        int size = 4 << GMID_EL1_BS;
+        int size = 4 << s->gm_blocksize;
 
         clean_addr = clean_data_tbi(s, addr);
         tcg_gen_andi_i64(clean_addr, clean_addr, -size);
@@ -13900,6 +13900,7 @@ static void aarch64_tr_init_disas_context(DisasContextBase *dcbase,
     dc->cp_regs = arm_cpu->cp_regs;
     dc->features = env->features;
     dc->dcz_blocksize = arm_cpu->dcz_blocksize;
+    dc->gm_blocksize = arm_cpu->gm_blocksize;
 
 #ifdef CONFIG_USER_ONLY
     /* In sve_probe_page, we assume TBI is enabled. */
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 4/5] target/arm: Support more GM blocksizes
  2023-08-10  2:35 [PATCH for-8.2 0/5] target/arm: Implement cortex-a710 Richard Henderson
                   ` (2 preceding siblings ...)
  2023-08-10  2:35 ` [PATCH 3/5] target/arm: Allow cpu to configure GM blocksize Richard Henderson
@ 2023-08-10  2:35 ` Richard Henderson
  2023-08-10 14:23   ` Peter Maydell
  2023-08-10  2:35 ` [PATCH 5/5] target/arm: Implement cortex-a710 Richard Henderson
  4 siblings, 1 reply; 17+ messages in thread
From: Richard Henderson @ 2023-08-10  2:35 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

Support all of the easy GM block sizes.
Use direct memory operations, since the pointers are aligned.

While BS=2 (16 bytes, 1 tag) is a legal setting, that requires
an atomic store of one nibble.  This is not difficult, but there
is also no point in supporting it until required.

Note that cortex-a710 sets GM blocksize to match its cacheline
size of 64 bytes.  I expect many implementations will also
match the cacheline, which makes 16 bytes very unlikely.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/mte_helper.c | 61 ++++++++++++++++++++++++++++++++-----
 1 file changed, 53 insertions(+), 8 deletions(-)

diff --git a/target/arm/tcg/mte_helper.c b/target/arm/tcg/mte_helper.c
index 3640c6e57f..6faf4e42d5 100644
--- a/target/arm/tcg/mte_helper.c
+++ b/target/arm/tcg/mte_helper.c
@@ -428,6 +428,8 @@ uint64_t HELPER(ldgm)(CPUARMState *env, uint64_t ptr)
     int gm_bs = env_archcpu(env)->gm_blocksize;
     int gm_bs_bytes = 4 << gm_bs;
     void *tag_mem;
+    uint64_t ret;
+    int shift;
 
     ptr = QEMU_ALIGN_DOWN(ptr, gm_bs_bytes);
 
@@ -443,16 +445,39 @@ uint64_t HELPER(ldgm)(CPUARMState *env, uint64_t ptr)
 
     /*
      * The ordering of elements within the word corresponds to
-     * a little-endian operation.
+     * a little-endian operation.  Computation of shift comes from
+     *
+     *     index = address<LOG2_TAG_GRANULE+3:LOG2_TAG_GRANULE>
+     *     data<index*4+3:index*4> = tag
+     *
+     * Because of the alignment of ptr above, BS=6 has shift=0.
+     * All memory operations are aligned.
      */
     switch (gm_bs) {
-    case 6:
-        /* 256 bytes -> 16 tags -> 64 result bits */
-        return ldq_le_p(tag_mem);
-    default:
+    case 2:
         /* cpu configured with unsupported gm blocksize. */
         g_assert_not_reached();
+    case 3:
+        /* 32 bytes -> 2 tags -> 8 result bits */
+        ret = *(uint8_t *)tag_mem;
+        break;
+    case 4:
+        /* 64 bytes -> 4 tags -> 16 result bits */
+        ret = cpu_to_le16(*(uint16_t *)tag_mem);
+        break;
+    case 5:
+        /* 128 bytes -> 8 tags -> 32 result bits */
+        ret = cpu_to_le32(*(uint32_t *)tag_mem);
+        break;
+    case 6:
+        /* 256 bytes -> 16 tags -> 64 result bits */
+        return cpu_to_le64(*(uint64_t *)tag_mem);
+    default:
+        /* cpu configured with invalid gm blocksize. */
+        g_assert_not_reached();
     }
+    shift = extract64(ptr, LOG2_TAG_GRANULE, 4) * 4;
+    return ret << shift;
 }
 
 void HELPER(stgm)(CPUARMState *env, uint64_t ptr, uint64_t val)
@@ -462,6 +487,7 @@ void HELPER(stgm)(CPUARMState *env, uint64_t ptr, uint64_t val)
     int gm_bs = env_archcpu(env)->gm_blocksize;
     int gm_bs_bytes = 4 << gm_bs;
     void *tag_mem;
+    int shift;
 
     ptr = QEMU_ALIGN_DOWN(ptr, gm_bs_bytes);
 
@@ -480,14 +506,33 @@ void HELPER(stgm)(CPUARMState *env, uint64_t ptr, uint64_t val)
 
     /*
      * The ordering of elements within the word corresponds to
-     * a little-endian operation.
+     * a little-endian operation.  See LDGM for comments on shift.
+     * All memory operations are aligned.
      */
+    shift = extract64(ptr, LOG2_TAG_GRANULE, 4) * 4;
+    val >>= shift;
     switch (gm_bs) {
+    case 2:
+        /* cpu configured with unsupported gm blocksize. */
+        g_assert_not_reached();
+    case 3:
+        /* 32 bytes -> 2 tags -> 8 result bits */
+        *(uint8_t *)tag_mem = val;
+        break;
+    case 4:
+        /* 64 bytes -> 4 tags -> 16 result bits */
+        *(uint16_t *)tag_mem = cpu_to_le16(val);
+        break;
+    case 5:
+        /* 128 bytes -> 8 tags -> 32 result bits */
+        *(uint32_t *)tag_mem = cpu_to_le32(val);
+        break;
     case 6:
-        stq_le_p(tag_mem, val);
+        /* 256 bytes -> 16 tags -> 64 result bits */
+        *(uint64_t *)tag_mem = cpu_to_le64(val);
         break;
     default:
-        /* cpu configured with unsupported gm blocksize. */
+        /* cpu configured with invalid gm blocksize. */
         g_assert_not_reached();
     }
 }
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 5/5] target/arm: Implement cortex-a710
  2023-08-10  2:35 [PATCH for-8.2 0/5] target/arm: Implement cortex-a710 Richard Henderson
                   ` (3 preceding siblings ...)
  2023-08-10  2:35 ` [PATCH 4/5] target/arm: Support more GM blocksizes Richard Henderson
@ 2023-08-10  2:35 ` Richard Henderson
  2023-08-10  4:11   ` Richard Henderson
  2023-08-10 15:49   ` Peter Maydell
  4 siblings, 2 replies; 17+ messages in thread
From: Richard Henderson @ 2023-08-10  2:35 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

The cortex-a710 is a first generation ARMv9.0-A processor.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 docs/system/arm/virt.rst |   1 +
 hw/arm/virt.c            |   1 +
 target/arm/tcg/cpu64.c   | 167 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 169 insertions(+)

diff --git a/docs/system/arm/virt.rst b/docs/system/arm/virt.rst
index 51cdac6841..e1697ac8f4 100644
--- a/docs/system/arm/virt.rst
+++ b/docs/system/arm/virt.rst
@@ -58,6 +58,7 @@ Supported guest CPU types:
 - ``cortex-a57`` (64-bit)
 - ``cortex-a72`` (64-bit)
 - ``cortex-a76`` (64-bit)
+- ``cortex-a710`` (64-bit)
 - ``a64fx`` (64-bit)
 - ``host`` (with KVM only)
 - ``neoverse-n1`` (64-bit)
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 7d9dbc2663..d1522c305d 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -211,6 +211,7 @@ static const char *valid_cpus[] = {
     ARM_CPU_TYPE_NAME("cortex-a55"),
     ARM_CPU_TYPE_NAME("cortex-a72"),
     ARM_CPU_TYPE_NAME("cortex-a76"),
+    ARM_CPU_TYPE_NAME("cortex-a710"),
     ARM_CPU_TYPE_NAME("a64fx"),
     ARM_CPU_TYPE_NAME("neoverse-n1"),
     ARM_CPU_TYPE_NAME("neoverse-v1"),
diff --git a/target/arm/tcg/cpu64.c b/target/arm/tcg/cpu64.c
index 5ca9070c14..6f555a39ce 100644
--- a/target/arm/tcg/cpu64.c
+++ b/target/arm/tcg/cpu64.c
@@ -700,6 +700,172 @@ static void aarch64_neoverse_v1_initfn(Object *obj)
     aarch64_add_sve_properties(obj);
 }
 
+static const ARMCPRegInfo cortex_a710_cp_reginfo[] = {
+    /* TODO: trapped by HCR_EL2.TIDCP */
+    { .name = "CPUACTLR4_EL1", .state = ARM_CP_STATE_AA64,
+      .opc0 = 3, .opc1 = 0, .crn = 15, .crm = 1, .opc2 = 3,
+      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
+    { .name = "CPUECTLR2_EL1", .state = ARM_CP_STATE_AA64,
+      .opc0 = 3, .opc1 = 0, .crn = 15, .crm = 1, .opc2 = 5,
+      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
+    { .name = "CPUACTLR5_EL1", .state = ARM_CP_STATE_AA64,
+      .opc0 = 3, .opc1 = 0, .crn = 15, .crm = 8, .opc2 = 0,
+      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
+    { .name = "CPUACTLR6_EL1", .state = ARM_CP_STATE_AA64,
+      .opc0 = 3, .opc1 = 0, .crn = 15, .crm = 8, .opc2 = 1,
+      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
+    { .name = "CPUACTLR7_EL1", .state = ARM_CP_STATE_AA64,
+      .opc0 = 3, .opc1 = 0, .crn = 15, .crm = 8, .opc2 = 2,
+      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
+    { .name = "CPUPPMCR4_EL3", .state = ARM_CP_STATE_AA64,
+      .opc0 = 3, .opc1 = 6, .crn = 15, .crm = 2, .opc2 = 4,
+      .access = PL3_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
+    { .name = "CPUPPMCR5_EL3", .state = ARM_CP_STATE_AA64,
+      .opc0 = 3, .opc1 = 6, .crn = 15, .crm = 2, .opc2 = 5,
+      .access = PL3_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
+    { .name = "CPUPPMCR6_EL3", .state = ARM_CP_STATE_AA64,
+      .opc0 = 3, .opc1 = 6, .crn = 15, .crm = 2, .opc2 = 6,
+      .access = PL3_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
+    { .name = "CPUACTLR_EL3", .state = ARM_CP_STATE_AA64,
+      .opc0 = 3, .opc1 = 6, .crn = 15, .crm = 4, .opc2 = 0,
+      .access = PL3_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
+    { .name = "CPUPOR2_EL3", .state = ARM_CP_STATE_AA64,
+      .opc0 = 3, .opc1 = 6, .crn = 15, .crm = 8, .opc2 = 4,
+      .access = PL3_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
+    { .name = "CPUPMR2_EL3", .state = ARM_CP_STATE_AA64,
+      .opc0 = 3, .opc1 = 6, .crn = 15, .crm = 8, .opc2 = 5,
+      .access = PL3_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
+
+    /*
+     * Stub RAMINDEX, as we don't actually implement caches,
+     * BTB, or anything else with cpu internal memory.
+     * "Read" zeros into the IDATA* and DDATA* output registers.
+     */
+    { .name = "RAMINDEX_EL3", .state = ARM_CP_STATE_AA64,
+      .opc0 = 1, .opc1 = 6, .crn = 15, .crm = 0, .opc2 = 0,
+      .access = PL3_W, .type = ARM_CP_CONST, .resetvalue = 0 },
+    { .name = "IDATA0_EL3", .state = ARM_CP_STATE_AA64,
+      .opc0 = 3, .opc1 = 6, .crn = 15, .crm = 0, .opc2 = 0,
+      .access = PL3_R, .type = ARM_CP_CONST, .resetvalue = 0 },
+    { .name = "IDATA1_EL3", .state = ARM_CP_STATE_AA64,
+      .opc0 = 3, .opc1 = 6, .crn = 15, .crm = 0, .opc2 = 1,
+      .access = PL3_R, .type = ARM_CP_CONST, .resetvalue = 0 },
+    { .name = "IDATA2_EL3", .state = ARM_CP_STATE_AA64,
+      .opc0 = 3, .opc1 = 6, .crn = 15, .crm = 0, .opc2 = 2,
+      .access = PL3_R, .type = ARM_CP_CONST, .resetvalue = 0 },
+    { .name = "DDATA0_EL3", .state = ARM_CP_STATE_AA64,
+      .opc0 = 3, .opc1 = 6, .crn = 15, .crm = 1, .opc2 = 0,
+      .access = PL3_R, .type = ARM_CP_CONST, .resetvalue = 0 },
+    { .name = "DDATA1_EL3", .state = ARM_CP_STATE_AA64,
+      .opc0 = 3, .opc1 = 6, .crn = 15, .crm = 1, .opc2 = 1,
+      .access = PL3_R, .type = ARM_CP_CONST, .resetvalue = 0 },
+    { .name = "DDATA2_EL3", .state = ARM_CP_STATE_AA64,
+      .opc0 = 3, .opc1 = 6, .crn = 15, .crm = 1, .opc2 = 2,
+      .access = PL3_R, .type = ARM_CP_CONST, .resetvalue = 0 },
+};
+
+static void define_cortex_a710_cp_reginfo(ARMCPU *cpu)
+{
+    /*
+     * The Cortex A710 has all of the Neoverse V1's IMPDEF
+     * registers and a few more of its own.
+     */
+    define_arm_cp_regs(cpu, neoverse_n1_cp_reginfo);
+    define_arm_cp_regs(cpu, neoverse_v1_cp_reginfo);
+    define_arm_cp_regs(cpu, cortex_a710_cp_reginfo);
+}
+
+static void aarch64_a710_initfn(Object *obj)
+{
+    ARMCPU *cpu = ARM_CPU(obj);
+
+    cpu->dtb_compatible = "arm,cortex-a710";
+    set_feature(&cpu->env, ARM_FEATURE_V8);
+    set_feature(&cpu->env, ARM_FEATURE_NEON);
+    set_feature(&cpu->env, ARM_FEATURE_GENERIC_TIMER);
+    set_feature(&cpu->env, ARM_FEATURE_AARCH64);
+    set_feature(&cpu->env, ARM_FEATURE_CBAR_RO);
+    set_feature(&cpu->env, ARM_FEATURE_EL2);
+    set_feature(&cpu->env, ARM_FEATURE_EL3);
+    set_feature(&cpu->env, ARM_FEATURE_PMU);
+
+    /* Ordered by Section B.4: AArch64 registers */
+    cpu->midr = 0x412FD471;          /* r2p1 */
+    cpu->revidr = cpu->midr;         /* mirror midr: "no significance" */
+    cpu->isar.id_pfr0  = 0x21110131;
+    cpu->isar.id_pfr1  = 0x00010000; /* GIC filled in later */
+    cpu->isar.id_dfr0  = 0x06011099; /* w/o FEAT_TRF */
+    cpu->id_afr0       = 0;
+    cpu->isar.id_mmfr0 = 0x10201105;
+    cpu->isar.id_mmfr1 = 0x40000000;
+    cpu->isar.id_mmfr2 = 0x01260000;
+    cpu->isar.id_mmfr3 = 0x02122211;
+    cpu->isar.id_isar0 = 0x02101110;
+    cpu->isar.id_isar1 = 0x13112111;
+    cpu->isar.id_isar2 = 0x21232042;
+    cpu->isar.id_isar3 = 0x01112131;
+    cpu->isar.id_isar4 = 0x00010142;
+    cpu->isar.id_isar5 = 0x11011121;
+    cpu->isar.id_mmfr4 = 0x21021110;
+    cpu->isar.id_isar6 = 0x01111111;
+    cpu->isar.mvfr0    = 0x10110222;
+    cpu->isar.mvfr1    = 0x13211111;
+    cpu->isar.mvfr2    = 0x00000043;
+    cpu->isar.id_pfr2  = 0x00000011;
+    /* GIC filled in later; w/o FEAT_MPAM */
+    cpu->isar.id_aa64pfr0  = 0x1201101120111112ull;
+    cpu->isar.id_aa64pfr1  = 0x0000000000000221ull;
+    cpu->isar.id_aa64zfr0  = 0x0000110100110021ull;
+    cpu->isar.id_aa64dfr0  = 0x000000f210305619ull; /* w/o FEAT_{TRF,TRBE} */
+    cpu->isar.id_aa64dfr1  = 0;
+    cpu->id_aa64afr0       = 0;
+    cpu->id_aa64afr1       = 0;
+    cpu->isar.id_aa64isar0 = 0x0221111110212120ull;
+    cpu->isar.id_aa64isar1 = 0x0010111101211032ull;
+    cpu->isar.id_aa64mmfr0 = 0x0000022200101122ull;
+    cpu->isar.id_aa64mmfr1 = 0x0000000010212122ull;
+    cpu->isar.id_aa64mmfr2 = 0x1221011110101011ull;
+    cpu->clidr             = 0x0000002282000023ull;
+    cpu->gm_blocksize      = 4;
+    cpu->ctr               = 0x00000004b444c004ull; /* with DIC set */
+    cpu->dcz_blocksize     = 4;
+    /* TODO FEAT_MPAM: mpamidr_el1 = 0x0000_0001_0006_003f */
+
+    /* Section B.5.2: PMCR_EL0 */
+    cpu->isar.reset_pmcr_el0 = 0xa000;  /* with 20 counters */
+
+    /* Section B.6.7: ICH_VTR_EL2 */
+    cpu->gic_num_lrs = 4;
+    cpu->gic_vpribits = 5;
+    cpu->gic_vprebits = 5;
+    cpu->gic_pribits = 5;
+
+    /* Section 14: Scalable Vector Extensions support */
+    cpu->sve_vq.supported = 1 << 0;  /* 128bit */
+
+    /*
+     * The cortex-a710 TRM does not list CCSIDR values.
+     * The layout of the cache is in text in Table 7-1 (L1-I),
+     * Table 8-1 (L1-D), and Table 9-1 (L2).
+     *
+     * L1: 4-way set associative 64-byte line size, total either 32K or 64K.
+     * We pick 64K, so this has 256 sets.
+     *
+     * L2: 8-way set associative 64 byte line size, total either 256K or 512K.
+     * We pick 512K, so this has 1024 sets.
+     */
+    cpu->ccsidr[0] = 0x000000ff0000001aull; /* 64KB L1 dcache */
+    cpu->ccsidr[1] = 0x000000ff0000001aull; /* 64KB L1 icache */
+    cpu->ccsidr[2] = 0x000003ff0000003aull; /* 512KB L2 cache */
+
+    /* ??? Not documented -- copied from neoverse-v1 */
+    cpu->reset_sctlr = 0x30c50838;
+
+    define_cortex_a710_cp_reginfo(cpu);
+    aarch64_add_pauth_properties(obj);
+    aarch64_add_sve_properties(obj);
+}
+
 /*
  * -cpu max: a CPU with as many features enabled as our emulation supports.
  * The version of '-cpu max' for qemu-system-arm is defined in cpu32.c;
@@ -889,6 +1055,7 @@ static const ARMCPUInfo aarch64_cpus[] = {
     { .name = "cortex-a55",         .initfn = aarch64_a55_initfn },
     { .name = "cortex-a72",         .initfn = aarch64_a72_initfn },
     { .name = "cortex-a76",         .initfn = aarch64_a76_initfn },
+    { .name = "cortex-a710",        .initfn = aarch64_a710_initfn },
     { .name = "a64fx",              .initfn = aarch64_a64fx_initfn },
     { .name = "neoverse-n1",        .initfn = aarch64_neoverse_n1_initfn },
     { .name = "neoverse-v1",        .initfn = aarch64_neoverse_v1_initfn },
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 5/5] target/arm: Implement cortex-a710
  2023-08-10  2:35 ` [PATCH 5/5] target/arm: Implement cortex-a710 Richard Henderson
@ 2023-08-10  4:11   ` Richard Henderson
  2023-08-10 15:49   ` Peter Maydell
  1 sibling, 0 replies; 17+ messages in thread
From: Richard Henderson @ 2023-08-10  4:11 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

On 8/9/23 19:35, Richard Henderson wrote:
> +static const ARMCPRegInfo cortex_a710_cp_reginfo[] = {
> +    /* TODO: trapped by HCR_EL2.TIDCP */
> +    { .name = "CPUACTLR4_EL1", .state = ARM_CP_STATE_AA64,
> +      .opc0 = 3, .opc1 = 0, .crn = 15, .crm = 1, .opc2 = 3,
> +      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
> +    { .name = "CPUECTLR2_EL1", .state = ARM_CP_STATE_AA64,
> +      .opc0 = 3, .opc1 = 0, .crn = 15, .crm = 1, .opc2 = 5,
> +      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },

Oops, duplicate CPUECTLR2_EL1.


r~


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/5] target/arm: Disable FEAT_TRF in neoverse-v1
  2023-08-10  2:35 ` [PATCH 1/5] target/arm: Disable FEAT_TRF in neoverse-v1 Richard Henderson
@ 2023-08-10  9:16   ` Peter Maydell
  0 siblings, 0 replies; 17+ messages in thread
From: Peter Maydell @ 2023-08-10  9:16 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, qemu-arm, qemu-stable

On Thu, 10 Aug 2023 at 03:36, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Self-hosted trace is out of scope for QEMU.

True, but we already disable this in arm_cpu_realizefn()
along with FEAT_SPE, FEAT_AMU and a bunch of other "out
of scope" or not-yet-implemented features. I thought
it more reliable to do the disabling globally rather
than rely on everybody adding a new CPU to remember
to adjust the ID register values. (Plus if we ever do
implement some approximation to one of these features
we only need to change one place in the code, not
re-look-up all the CPU ID register values.) See
commit 7d8c283e10dd81.

thanks
-- PMM


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/5] target/arm: Reduce dcz_blocksize to uint8_t
  2023-08-10  2:35 ` [PATCH 2/5] target/arm: Reduce dcz_blocksize to uint8_t Richard Henderson
@ 2023-08-10 14:09   ` Peter Maydell
  2023-08-10 19:02     ` Richard Henderson
  0 siblings, 1 reply; 17+ messages in thread
From: Peter Maydell @ 2023-08-10 14:09 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, qemu-arm

On Thu, 10 Aug 2023 at 03:37, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> This value is only 4 bits wide.

True. Any particular reason to change the type, though?

> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/cpu.h | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/5] target/arm: Allow cpu to configure GM blocksize
  2023-08-10  2:35 ` [PATCH 3/5] target/arm: Allow cpu to configure GM blocksize Richard Henderson
@ 2023-08-10 14:13   ` Peter Maydell
  0 siblings, 0 replies; 17+ messages in thread
From: Peter Maydell @ 2023-08-10 14:13 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, qemu-arm

On Thu, 10 Aug 2023 at 03:37, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Previously we hard-coded the blocksize with GMID_EL1_BS.
> But the value we choose for -cpu max does not match the
> value that cortex-a710 uses.
>
> Mirror the way we handle dcz_blocksize.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 4/5] target/arm: Support more GM blocksizes
  2023-08-10  2:35 ` [PATCH 4/5] target/arm: Support more GM blocksizes Richard Henderson
@ 2023-08-10 14:23   ` Peter Maydell
  2023-08-10 19:10     ` Richard Henderson
  0 siblings, 1 reply; 17+ messages in thread
From: Peter Maydell @ 2023-08-10 14:23 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, qemu-arm

On Thu, 10 Aug 2023 at 03:36, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Support all of the easy GM block sizes.
> Use direct memory operations, since the pointers are aligned.
>
> While BS=2 (16 bytes, 1 tag) is a legal setting, that requires
> an atomic store of one nibble.  This is not difficult, but there
> is also no point in supporting it until required.
>
> Note that cortex-a710 sets GM blocksize to match its cacheline
> size of 64 bytes.  I expect many implementations will also
> match the cacheline, which makes 16 bytes very unlikely.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/tcg/mte_helper.c | 61 ++++++++++++++++++++++++++++++++-----
>  1 file changed, 53 insertions(+), 8 deletions(-)
>
> diff --git a/target/arm/tcg/mte_helper.c b/target/arm/tcg/mte_helper.c
> index 3640c6e57f..6faf4e42d5 100644
> --- a/target/arm/tcg/mte_helper.c
> +++ b/target/arm/tcg/mte_helper.c
> @@ -428,6 +428,8 @@ uint64_t HELPER(ldgm)(CPUARMState *env, uint64_t ptr)
>      int gm_bs = env_archcpu(env)->gm_blocksize;
>      int gm_bs_bytes = 4 << gm_bs;
>      void *tag_mem;
> +    uint64_t ret;
> +    int shift;
>
>      ptr = QEMU_ALIGN_DOWN(ptr, gm_bs_bytes);
>
> @@ -443,16 +445,39 @@ uint64_t HELPER(ldgm)(CPUARMState *env, uint64_t ptr)
>
>      /*
>       * The ordering of elements within the word corresponds to
> -     * a little-endian operation.
> +     * a little-endian operation.  Computation of shift comes from
> +     *
> +     *     index = address<LOG2_TAG_GRANULE+3:LOG2_TAG_GRANULE>
> +     *     data<index*4+3:index*4> = tag
> +     *
> +     * Because of the alignment of ptr above, BS=6 has shift=0.
> +     * All memory operations are aligned.
>       */
>      switch (gm_bs) {
> -    case 6:
> -        /* 256 bytes -> 16 tags -> 64 result bits */
> -        return ldq_le_p(tag_mem);
> -    default:
> +    case 2:
>          /* cpu configured with unsupported gm blocksize. */
>          g_assert_not_reached();
> +    case 3:
> +        /* 32 bytes -> 2 tags -> 8 result bits */
> +        ret = *(uint8_t *)tag_mem;
> +        break;
> +    case 4:
> +        /* 64 bytes -> 4 tags -> 16 result bits */
> +        ret = cpu_to_le16(*(uint16_t *)tag_mem);

Does this really make a difference compared to ldw_le_p() ?

> +        break;
> +    case 5:
> +        /* 128 bytes -> 8 tags -> 32 result bits */
> +        ret = cpu_to_le32(*(uint32_t *)tag_mem);
> +        break;
> +    case 6:
> +        /* 256 bytes -> 16 tags -> 64 result bits */
> +        return cpu_to_le64(*(uint64_t *)tag_mem);
> +    default:
> +        /* cpu configured with invalid gm blocksize. */
> +        g_assert_not_reached();

Is it worth having an assert in CPU realize for an invalid
blocksize, so that we can catch duff ID register values
without having to rely on there being a test run that
uses ldgm/stgm ?

>      }
> +    shift = extract64(ptr, LOG2_TAG_GRANULE, 4) * 4;
> +    return ret << shift;
>  }
>
>  void HELPER(stgm)(CPUARMState *env, uint64_t ptr, uint64_t val)
> @@ -462,6 +487,7 @@ void HELPER(stgm)(CPUARMState *env, uint64_t ptr, uint64_t val)
>      int gm_bs = env_archcpu(env)->gm_blocksize;
>      int gm_bs_bytes = 4 << gm_bs;
>      void *tag_mem;
> +    int shift;
>
>      ptr = QEMU_ALIGN_DOWN(ptr, gm_bs_bytes);
>
> @@ -480,14 +506,33 @@ void HELPER(stgm)(CPUARMState *env, uint64_t ptr, uint64_t val)
>
>      /*
>       * The ordering of elements within the word corresponds to
> -     * a little-endian operation.
> +     * a little-endian operation.  See LDGM for comments on shift.
> +     * All memory operations are aligned.
>       */
> +    shift = extract64(ptr, LOG2_TAG_GRANULE, 4) * 4;
> +    val >>= shift;
>      switch (gm_bs) {
> +    case 2:
> +        /* cpu configured with unsupported gm blocksize. */
> +        g_assert_not_reached();
> +    case 3:
> +        /* 32 bytes -> 2 tags -> 8 result bits */
> +        *(uint8_t *)tag_mem = val;
> +        break;
> +    case 4:
> +        /* 64 bytes -> 4 tags -> 16 result bits */
> +        *(uint16_t *)tag_mem = cpu_to_le16(val);
> +        break;
> +    case 5:
> +        /* 128 bytes -> 8 tags -> 32 result bits */
> +        *(uint32_t *)tag_mem = cpu_to_le32(val);
> +        break;
>      case 6:
> -        stq_le_p(tag_mem, val);
> +        /* 256 bytes -> 16 tags -> 64 result bits */
> +        *(uint64_t *)tag_mem = cpu_to_le64(val);
>          break;
>      default:
> -        /* cpu configured with unsupported gm blocksize. */
> +        /* cpu configured with invalid gm blocksize. */
>          g_assert_not_reached();
>      }
>  }
> --
> 2.34.1

thanks
-- PMM


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 5/5] target/arm: Implement cortex-a710
  2023-08-10  2:35 ` [PATCH 5/5] target/arm: Implement cortex-a710 Richard Henderson
  2023-08-10  4:11   ` Richard Henderson
@ 2023-08-10 15:49   ` Peter Maydell
  2023-08-10 17:05     ` Richard Henderson
  1 sibling, 1 reply; 17+ messages in thread
From: Peter Maydell @ 2023-08-10 15:49 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, qemu-arm

On Thu, 10 Aug 2023 at 03:36, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> The cortex-a710 is a first generation ARMv9.0-A processor.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  docs/system/arm/virt.rst |   1 +
>  hw/arm/virt.c            |   1 +
>  target/arm/tcg/cpu64.c   | 167 +++++++++++++++++++++++++++++++++++++++
>  3 files changed, 169 insertions(+)
>
> diff --git a/docs/system/arm/virt.rst b/docs/system/arm/virt.rst
> index 51cdac6841..e1697ac8f4 100644
> --- a/docs/system/arm/virt.rst
> +++ b/docs/system/arm/virt.rst
> @@ -58,6 +58,7 @@ Supported guest CPU types:
>  - ``cortex-a57`` (64-bit)
>  - ``cortex-a72`` (64-bit)
>  - ``cortex-a76`` (64-bit)
> +- ``cortex-a710`` (64-bit)
>  - ``a64fx`` (64-bit)
>  - ``host`` (with KVM only)
>  - ``neoverse-n1`` (64-bit)
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index 7d9dbc2663..d1522c305d 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -211,6 +211,7 @@ static const char *valid_cpus[] = {
>      ARM_CPU_TYPE_NAME("cortex-a55"),
>      ARM_CPU_TYPE_NAME("cortex-a72"),
>      ARM_CPU_TYPE_NAME("cortex-a76"),
> +    ARM_CPU_TYPE_NAME("cortex-a710"),
>      ARM_CPU_TYPE_NAME("a64fx"),
>      ARM_CPU_TYPE_NAME("neoverse-n1"),
>      ARM_CPU_TYPE_NAME("neoverse-v1"),

Will sbsa-ref want this core ?

> diff --git a/target/arm/tcg/cpu64.c b/target/arm/tcg/cpu64.c
> index 5ca9070c14..6f555a39ce 100644
> --- a/target/arm/tcg/cpu64.c
> +++ b/target/arm/tcg/cpu64.c
> @@ -700,6 +700,172 @@ static void aarch64_neoverse_v1_initfn(Object *obj)
>      aarch64_add_sve_properties(obj);
>  }
>
> +static const ARMCPRegInfo cortex_a710_cp_reginfo[] = {
> +    /* TODO: trapped by HCR_EL2.TIDCP */
> +    { .name = "CPUACTLR4_EL1", .state = ARM_CP_STATE_AA64,
> +      .opc0 = 3, .opc1 = 0, .crn = 15, .crm = 1, .opc2 = 3,
> +      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
> +    { .name = "CPUECTLR2_EL1", .state = ARM_CP_STATE_AA64,
> +      .opc0 = 3, .opc1 = 0, .crn = 15, .crm = 1, .opc2 = 5,
> +      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
> +    { .name = "CPUACTLR5_EL1", .state = ARM_CP_STATE_AA64,
> +      .opc0 = 3, .opc1 = 0, .crn = 15, .crm = 8, .opc2 = 0,
> +      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
> +    { .name = "CPUACTLR6_EL1", .state = ARM_CP_STATE_AA64,
> +      .opc0 = 3, .opc1 = 0, .crn = 15, .crm = 8, .opc2 = 1,
> +      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
> +    { .name = "CPUACTLR7_EL1", .state = ARM_CP_STATE_AA64,
> +      .opc0 = 3, .opc1 = 0, .crn = 15, .crm = 8, .opc2 = 2,
> +      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
> +    { .name = "CPUPPMCR4_EL3", .state = ARM_CP_STATE_AA64,
> +      .opc0 = 3, .opc1 = 6, .crn = 15, .crm = 2, .opc2 = 4,
> +      .access = PL3_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
> +    { .name = "CPUPPMCR5_EL3", .state = ARM_CP_STATE_AA64,
> +      .opc0 = 3, .opc1 = 6, .crn = 15, .crm = 2, .opc2 = 5,
> +      .access = PL3_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
> +    { .name = "CPUPPMCR6_EL3", .state = ARM_CP_STATE_AA64,
> +      .opc0 = 3, .opc1 = 6, .crn = 15, .crm = 2, .opc2 = 6,
> +      .access = PL3_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
> +    { .name = "CPUACTLR_EL3", .state = ARM_CP_STATE_AA64,
> +      .opc0 = 3, .opc1 = 6, .crn = 15, .crm = 4, .opc2 = 0,
> +      .access = PL3_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
> +    { .name = "CPUPOR2_EL3", .state = ARM_CP_STATE_AA64,
> +      .opc0 = 3, .opc1 = 6, .crn = 15, .crm = 8, .opc2 = 4,
> +      .access = PL3_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
> +    { .name = "CPUPMR2_EL3", .state = ARM_CP_STATE_AA64,
> +      .opc0 = 3, .opc1 = 6, .crn = 15, .crm = 8, .opc2 = 5,
> +      .access = PL3_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
> +
> +    /*
> +     * Stub RAMINDEX, as we don't actually implement caches,
> +     * BTB, or anything else with cpu internal memory.
> +     * "Read" zeros into the IDATA* and DDATA* output registers.
> +     */
> +    { .name = "RAMINDEX_EL3", .state = ARM_CP_STATE_AA64,
> +      .opc0 = 1, .opc1 = 6, .crn = 15, .crm = 0, .opc2 = 0,
> +      .access = PL3_W, .type = ARM_CP_CONST, .resetvalue = 0 },
> +    { .name = "IDATA0_EL3", .state = ARM_CP_STATE_AA64,
> +      .opc0 = 3, .opc1 = 6, .crn = 15, .crm = 0, .opc2 = 0,
> +      .access = PL3_R, .type = ARM_CP_CONST, .resetvalue = 0 },
> +    { .name = "IDATA1_EL3", .state = ARM_CP_STATE_AA64,
> +      .opc0 = 3, .opc1 = 6, .crn = 15, .crm = 0, .opc2 = 1,
> +      .access = PL3_R, .type = ARM_CP_CONST, .resetvalue = 0 },
> +    { .name = "IDATA2_EL3", .state = ARM_CP_STATE_AA64,
> +      .opc0 = 3, .opc1 = 6, .crn = 15, .crm = 0, .opc2 = 2,
> +      .access = PL3_R, .type = ARM_CP_CONST, .resetvalue = 0 },
> +    { .name = "DDATA0_EL3", .state = ARM_CP_STATE_AA64,
> +      .opc0 = 3, .opc1 = 6, .crn = 15, .crm = 1, .opc2 = 0,
> +      .access = PL3_R, .type = ARM_CP_CONST, .resetvalue = 0 },
> +    { .name = "DDATA1_EL3", .state = ARM_CP_STATE_AA64,
> +      .opc0 = 3, .opc1 = 6, .crn = 15, .crm = 1, .opc2 = 1,
> +      .access = PL3_R, .type = ARM_CP_CONST, .resetvalue = 0 },
> +    { .name = "DDATA2_EL3", .state = ARM_CP_STATE_AA64,
> +      .opc0 = 3, .opc1 = 6, .crn = 15, .crm = 1, .opc2 = 2,
> +      .access = PL3_R, .type = ARM_CP_CONST, .resetvalue = 0 },
> +};
> +
> +static void define_cortex_a710_cp_reginfo(ARMCPU *cpu)
> +{
> +    /*
> +     * The Cortex A710 has all of the Neoverse V1's IMPDEF
> +     * registers and a few more of its own.
> +     */
> +    define_arm_cp_regs(cpu, neoverse_n1_cp_reginfo);
> +    define_arm_cp_regs(cpu, neoverse_v1_cp_reginfo);
> +    define_arm_cp_regs(cpu, cortex_a710_cp_reginfo);

The TRM doesn't document the existence of these regs
from the n1 reginfo:

    { .name = "ERXPFGCDN_EL1", .state = ARM_CP_STATE_AA64,
      .opc0 = 3, .opc1 = 0, .crn = 15, .crm = 2, .opc2 = 2,
      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
    { .name = "ERXPFGCTL_EL1", .state = ARM_CP_STATE_AA64,
      .opc0 = 3, .opc1 = 0, .crn = 15, .crm = 2, .opc2 = 1,
      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
    { .name = "ERXPFGF_EL1", .state = ARM_CP_STATE_AA64,
      .opc0 = 3, .opc1 = 0, .crn = 15, .crm = 2, .opc2 = 0,
      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },

This one in the v1 reginfo:

    { .name = "CPUPPMCR3_EL3", .state = ARM_CP_STATE_AA64,
      .opc0 = 3, .opc1 = 6, .crn = 15, .crm = 2, .opc2 = 6,
      .access = PL3_RW, .type = ARM_CP_CONST, .resetvalue = 0 },

exists but has been renamed CPUPPMCR6_EL3, which means it's
a duplicate of an entry in your new array. Meanwhile the
A710's actual CPUPPMCR3_EL3 at 3, 0, c15, c2, 4 isn't in
your new array.

(I thought we had an assert to detect duplicate regdefs,
so I'm surprised this didn't fall over.)

> +}
> +
> +static void aarch64_a710_initfn(Object *obj)
> +{
> +    ARMCPU *cpu = ARM_CPU(obj);
> +
> +    cpu->dtb_compatible = "arm,cortex-a710";
> +    set_feature(&cpu->env, ARM_FEATURE_V8);
> +    set_feature(&cpu->env, ARM_FEATURE_NEON);
> +    set_feature(&cpu->env, ARM_FEATURE_GENERIC_TIMER);
> +    set_feature(&cpu->env, ARM_FEATURE_AARCH64);
> +    set_feature(&cpu->env, ARM_FEATURE_CBAR_RO);
> +    set_feature(&cpu->env, ARM_FEATURE_EL2);
> +    set_feature(&cpu->env, ARM_FEATURE_EL3);
> +    set_feature(&cpu->env, ARM_FEATURE_PMU);
> +
> +    /* Ordered by Section B.4: AArch64 registers */
> +    cpu->midr = 0x412FD471;          /* r2p1 */
> +    cpu->revidr = cpu->midr;         /* mirror midr: "no significance" */

The bit about "no significance" is just the boilerplate text from
the architecture manual. I think we should continue our usual
practice of setting the revidr to 0.

For this particular core you can see in the Cortex-A710
Software Developer Errata Notice that the REVIDR appears
to be a collection of bits which are 1 when some erratum
has been fixed in this particular part without having
bumped the rNpN value:
https://developer.arm.com/documentation/SDEN1775101/latest

This is for an A15 erratum, but it does suggest that
guests probably aren't doing that "is revidr == midr"
check before looking at revidr bits:
 https://elixir.bootlin.com/linux/latest/source/arch/arm/kernel/smp_tlb.c#L94

> +    cpu->isar.id_pfr0  = 0x21110131;
> +    cpu->isar.id_pfr1  = 0x00010000; /* GIC filled in later */
> +    cpu->isar.id_dfr0  = 0x06011099; /* w/o FEAT_TRF */

You don't have to suppress FEAT_TRF manually, we do
it in cpu.c.

> +    cpu->id_afr0       = 0;
> +    cpu->isar.id_mmfr0 = 0x10201105;
> +    cpu->isar.id_mmfr1 = 0x40000000;
> +    cpu->isar.id_mmfr2 = 0x01260000;
> +    cpu->isar.id_mmfr3 = 0x02122211;
> +    cpu->isar.id_isar0 = 0x02101110;
> +    cpu->isar.id_isar1 = 0x13112111;
> +    cpu->isar.id_isar2 = 0x21232042;
> +    cpu->isar.id_isar3 = 0x01112131;
> +    cpu->isar.id_isar4 = 0x00010142;
> +    cpu->isar.id_isar5 = 0x11011121;

For isar5 we could say /* with Crypto */

> +    cpu->isar.id_mmfr4 = 0x21021110;

I don't think we implement HPDS == 2 (that's FEAT_HPDS2).
I guess we should push it down to HPDS 1 only in cpu.c
for now. (Or implement it, it's probably simple.)

> +    cpu->isar.id_isar6 = 0x01111111;
> +    cpu->isar.mvfr0    = 0x10110222;
> +    cpu->isar.mvfr1    = 0x13211111;
> +    cpu->isar.mvfr2    = 0x00000043;
> +    cpu->isar.id_pfr2  = 0x00000011;
> +    /* GIC filled in later; w/o FEAT_MPAM */
> +    cpu->isar.id_aa64pfr0  = 0x1201101120111112ull;

We clear out MPAM in cpu.c so you don't need to do it manually.

> +    cpu->isar.id_aa64pfr1  = 0x0000000000000221ull;
> +    cpu->isar.id_aa64zfr0  = 0x0000110100110021ull;

/* with Crypto */

> +    cpu->isar.id_aa64dfr0  = 0x000000f210305619ull; /* w/o FEAT_{TRF,TRBE} */

We already clear TRACEFILT in cpu.c. We should also
clear out TRACEBUFFER I guess.

Unless I'm miscounting bits, you have a '2' in the
RES0 [35:32] bit field.

> +    cpu->isar.id_aa64dfr1  = 0;
> +    cpu->id_aa64afr0       = 0;
> +    cpu->id_aa64afr1       = 0;
> +    cpu->isar.id_aa64isar0 = 0x0221111110212120ull;

/* with crypto */ again I guess

> +    cpu->isar.id_aa64isar1 = 0x0010111101211032ull;

should be 5 in bits [7:4] ?

> +    cpu->isar.id_aa64mmfr0 = 0x0000022200101122ull;
> +    cpu->isar.id_aa64mmfr1 = 0x0000000010212122ull;

This one is specifying FEAT_HPDS2 as well.

> +    cpu->isar.id_aa64mmfr2 = 0x1221011110101011ull;
> +    cpu->clidr             = 0x0000002282000023ull;

I think you've miscounted with the Ttype1 and Ttype2 fields:
these are at [36:35] and [34:33], so you want 14 instead
of 22 as the first non-zero nibbles.

> +    cpu->gm_blocksize      = 4;
> +    cpu->ctr               = 0x00000004b444c004ull; /* with DIC set */

Why set DIC? The h/w doesn't.

> +    cpu->dcz_blocksize     = 4;
> +    /* TODO FEAT_MPAM: mpamidr_el1 = 0x0000_0001_0006_003f */
> +
> +    /* Section B.5.2: PMCR_EL0 */
> +    cpu->isar.reset_pmcr_el0 = 0xa000;  /* with 20 counters */
> +
> +    /* Section B.6.7: ICH_VTR_EL2 */
> +    cpu->gic_num_lrs = 4;
> +    cpu->gic_vpribits = 5;
> +    cpu->gic_vprebits = 5;
> +    cpu->gic_pribits = 5;
> +
> +    /* Section 14: Scalable Vector Extensions support */
> +    cpu->sve_vq.supported = 1 << 0;  /* 128bit */
> +
> +    /*
> +     * The cortex-a710 TRM does not list CCSIDR values.
> +     * The layout of the cache is in text in Table 7-1 (L1-I),
> +     * Table 8-1 (L1-D), and Table 9-1 (L2).
> +     *
> +     * L1: 4-way set associative 64-byte line size, total either 32K or 64K.
> +     * We pick 64K, so this has 256 sets.
> +     *
> +     * L2: 8-way set associative 64 byte line size, total either 256K or 512K.
> +     * We pick 512K, so this has 1024 sets.
> +     */
> +    cpu->ccsidr[0] = 0x000000ff0000001aull; /* 64KB L1 dcache */
> +    cpu->ccsidr[1] = 0x000000ff0000001aull; /* 64KB L1 icache */
> +    cpu->ccsidr[2] = 0x000003ff0000003aull; /* 512KB L2 cache */

I was too lazy to do this for neoverse-v1, so I don't insist
on it here, but if we're going to find ourselves calculating
new-format ccsidr values by hand for each new CPU, I wonder if we
should define a macro that takes numsets, assoc, linesize,
subtracts 1 where relevant, and shifts them into the right bit
fields? (Shame the preprocessor can't do a log2() operation ;-))

> +
> +    /* ??? Not documented -- copied from neoverse-v1 */
> +    cpu->reset_sctlr = 0x30c50838;
> +
> +    define_cortex_a710_cp_reginfo(cpu);
> +    aarch64_add_pauth_properties(obj);
> +    aarch64_add_sve_properties(obj);
> +}
> +
>  /*
>   * -cpu max: a CPU with as many features enabled as our emulation supports.
>   * The version of '-cpu max' for qemu-system-arm is defined in cpu32.c;
> @@ -889,6 +1055,7 @@ static const ARMCPUInfo aarch64_cpus[] = {
>      { .name = "cortex-a55",         .initfn = aarch64_a55_initfn },
>      { .name = "cortex-a72",         .initfn = aarch64_a72_initfn },
>      { .name = "cortex-a76",         .initfn = aarch64_a76_initfn },
> +    { .name = "cortex-a710",        .initfn = aarch64_a710_initfn },
>      { .name = "a64fx",              .initfn = aarch64_a64fx_initfn },
>      { .name = "neoverse-n1",        .initfn = aarch64_neoverse_n1_initfn },
>      { .name = "neoverse-v1",        .initfn = aarch64_neoverse_v1_initfn },
> --
> 2.34.1

thanks
-- PMM


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 5/5] target/arm: Implement cortex-a710
  2023-08-10 15:49   ` Peter Maydell
@ 2023-08-10 17:05     ` Richard Henderson
  2023-08-10 17:12       ` Peter Maydell
  0 siblings, 1 reply; 17+ messages in thread
From: Richard Henderson @ 2023-08-10 17:05 UTC (permalink / raw)
  To: Peter Maydell; +Cc: qemu-devel, qemu-arm

On 8/10/23 08:49, Peter Maydell wrote:
> On Thu, 10 Aug 2023 at 03:36, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>>
>> The cortex-a710 is a first generation ARMv9.0-A processor.
>>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>>   docs/system/arm/virt.rst |   1 +
>>   hw/arm/virt.c            |   1 +
>>   target/arm/tcg/cpu64.c   | 167 +++++++++++++++++++++++++++++++++++++++
>>   3 files changed, 169 insertions(+)
>>
>> diff --git a/docs/system/arm/virt.rst b/docs/system/arm/virt.rst
>> index 51cdac6841..e1697ac8f4 100644
>> --- a/docs/system/arm/virt.rst
>> +++ b/docs/system/arm/virt.rst
>> @@ -58,6 +58,7 @@ Supported guest CPU types:
>>   - ``cortex-a57`` (64-bit)
>>   - ``cortex-a72`` (64-bit)
>>   - ``cortex-a76`` (64-bit)
>> +- ``cortex-a710`` (64-bit)
>>   - ``a64fx`` (64-bit)
>>   - ``host`` (with KVM only)
>>   - ``neoverse-n1`` (64-bit)
>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
>> index 7d9dbc2663..d1522c305d 100644
>> --- a/hw/arm/virt.c
>> +++ b/hw/arm/virt.c
>> @@ -211,6 +211,7 @@ static const char *valid_cpus[] = {
>>       ARM_CPU_TYPE_NAME("cortex-a55"),
>>       ARM_CPU_TYPE_NAME("cortex-a72"),
>>       ARM_CPU_TYPE_NAME("cortex-a76"),
>> +    ARM_CPU_TYPE_NAME("cortex-a710"),
>>       ARM_CPU_TYPE_NAME("a64fx"),
>>       ARM_CPU_TYPE_NAME("neoverse-n1"),
>>       ARM_CPU_TYPE_NAME("neoverse-v1"),
> 
> Will sbsa-ref want this core ?

It only has 40 PA bits, and I think sbsa-ref requires 48.

>> +static void define_cortex_a710_cp_reginfo(ARMCPU *cpu)
>> +{
>> +    /*
>> +     * The Cortex A710 has all of the Neoverse V1's IMPDEF
>> +     * registers and a few more of its own.
>> +     */
>> +    define_arm_cp_regs(cpu, neoverse_n1_cp_reginfo);
>> +    define_arm_cp_regs(cpu, neoverse_v1_cp_reginfo);
>> +    define_arm_cp_regs(cpu, cortex_a710_cp_reginfo);
> 
> The TRM doesn't document the existence of these regs
> from the n1 reginfo:
> 
>      { .name = "ERXPFGCDN_EL1", .state = ARM_CP_STATE_AA64,
>        .opc0 = 3, .opc1 = 0, .crn = 15, .crm = 2, .opc2 = 2,
>        .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
>      { .name = "ERXPFGCTL_EL1", .state = ARM_CP_STATE_AA64,
>        .opc0 = 3, .opc1 = 0, .crn = 15, .crm = 2, .opc2 = 1,
>        .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
>      { .name = "ERXPFGF_EL1", .state = ARM_CP_STATE_AA64,
>        .opc0 = 3, .opc1 = 0, .crn = 15, .crm = 2, .opc2 = 0,
>        .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
> 
> This one in the v1 reginfo:
> 
>      { .name = "CPUPPMCR3_EL3", .state = ARM_CP_STATE_AA64,
>        .opc0 = 3, .opc1 = 6, .crn = 15, .crm = 2, .opc2 = 6,
>        .access = PL3_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
> 
> exists but has been renamed CPUPPMCR6_EL3, which means it's
> a duplicate of an entry in your new array. Meanwhile the
> A710's actual CPUPPMCR3_EL3 at 3, 0, c15, c2, 4 isn't in
> your new array.
> 
> (I thought we had an assert to detect duplicate regdefs,
> so I'm surprised this didn't fall over.)

It did fall over.  Pre-send-email testing mistake, which I found immediately after (of 
course).

>> +    cpu->revidr = cpu->midr;         /* mirror midr: "no significance" */
> 
> The bit about "no significance" is just the boilerplate text from
> the architecture manual. I think we should continue our usual
> practice of setting the revidr to 0.

Ok.

>> +    cpu->isar.id_dfr0  = 0x06011099; /* w/o FEAT_TRF */
> 
> You don't have to suppress FEAT_TRF manually, we do
> it in cpu.c.

Ok.

>> +    cpu->isar.id_isar5 = 0x11011121;
> 
> For isar5 we could say /* with Crypto */

Ok.

>> +    cpu->isar.id_mmfr4 = 0x21021110;
> 
> I don't think we implement HPDS == 2 (that's FEAT_HPDS2).
> I guess we should push it down to HPDS 1 only in cpu.c
> for now. (Or implement it, it's probably simple.)

Feh.  I thought I'd double-checked all of the features.
I'll have a look at implementing that.

>> +    cpu->ctr               = 0x00000004b444c004ull; /* with DIC set */
> 
> Why set DIC? The h/w doesn't.

Heh.  From the comment in neoverse-v1, I thought you had force enabled it there.  But it 
must simply be a h/w option?

>> +    cpu->ccsidr[0] = 0x000000ff0000001aull; /* 64KB L1 dcache */
>> +    cpu->ccsidr[1] = 0x000000ff0000001aull; /* 64KB L1 icache */
>> +    cpu->ccsidr[2] = 0x000003ff0000003aull; /* 512KB L2 cache */
> 
> I was too lazy to do this for neoverse-v1, so I don't insist
> on it here, but if we're going to find ourselves calculating
> new-format ccsidr values by hand for each new CPU, I wonder if we
> should define a macro that takes numsets, assoc, linesize,
> subtracts 1 where relevant, and shifts them into the right bit
> fields? (Shame the preprocessor can't do a log2() operation ;-))

I'll create something for this.
It doesn't need to be in the preprocessor.  :-)

Thanks for the careful review.


r~


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 5/5] target/arm: Implement cortex-a710
  2023-08-10 17:05     ` Richard Henderson
@ 2023-08-10 17:12       ` Peter Maydell
  2023-08-14  8:49         ` Marcin Juszkiewicz
  0 siblings, 1 reply; 17+ messages in thread
From: Peter Maydell @ 2023-08-10 17:12 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, qemu-arm

On Thu, 10 Aug 2023 at 18:05, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On 8/10/23 08:49, Peter Maydell wrote:
> > On Thu, 10 Aug 2023 at 03:36, Richard Henderson
> > <richard.henderson@linaro.org> wrote:
> >>
> >> The cortex-a710 is a first generation ARMv9.0-A processor.
> >>
> >> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> >> ---
> >>   docs/system/arm/virt.rst |   1 +
> >>   hw/arm/virt.c            |   1 +
> >>   target/arm/tcg/cpu64.c   | 167 +++++++++++++++++++++++++++++++++++++++
> >>   3 files changed, 169 insertions(+)
> >>
> >> diff --git a/docs/system/arm/virt.rst b/docs/system/arm/virt.rst
> >> index 51cdac6841..e1697ac8f4 100644
> >> --- a/docs/system/arm/virt.rst
> >> +++ b/docs/system/arm/virt.rst
> >> @@ -58,6 +58,7 @@ Supported guest CPU types:
> >>   - ``cortex-a57`` (64-bit)
> >>   - ``cortex-a72`` (64-bit)
> >>   - ``cortex-a76`` (64-bit)
> >> +- ``cortex-a710`` (64-bit)
> >>   - ``a64fx`` (64-bit)
> >>   - ``host`` (with KVM only)
> >>   - ``neoverse-n1`` (64-bit)
> >> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> >> index 7d9dbc2663..d1522c305d 100644
> >> --- a/hw/arm/virt.c
> >> +++ b/hw/arm/virt.c
> >> @@ -211,6 +211,7 @@ static const char *valid_cpus[] = {
> >>       ARM_CPU_TYPE_NAME("cortex-a55"),
> >>       ARM_CPU_TYPE_NAME("cortex-a72"),
> >>       ARM_CPU_TYPE_NAME("cortex-a76"),
> >> +    ARM_CPU_TYPE_NAME("cortex-a710"),
> >>       ARM_CPU_TYPE_NAME("a64fx"),
> >>       ARM_CPU_TYPE_NAME("neoverse-n1"),
> >>       ARM_CPU_TYPE_NAME("neoverse-v1"),
> >
> > Will sbsa-ref want this core ?
>
> It only has 40 PA bits, and I think sbsa-ref requires 48.

Yes, it does want 48 (we ran into that with some other core).

> >> +    cpu->isar.id_mmfr4 = 0x21021110;
> >
> > I don't think we implement HPDS == 2 (that's FEAT_HPDS2).
> > I guess we should push it down to HPDS 1 only in cpu.c
> > for now. (Or implement it, it's probably simple.)
>
> Feh.  I thought I'd double-checked all of the features.
> I'll have a look at implementing that.

I think we (meaning Linaro) kind of noted a lot of features
as architecturally optional and then didn't think through
that we might need them anyway for specific implementations.
(I got surprised by FEAT_NV that way for the Neoverse-V1.)

> >> +    cpu->ctr               = 0x00000004b444c004ull; /* with DIC set */
> >
> > Why set DIC? The h/w doesn't.
>
> Heh.  From the comment in neoverse-v1, I thought you had force enabled it there.  But it
> must simply be a h/w option?

Yes, the Neoverse-V1 TRM documents a config option of
"instruction cache hardware coherency" (which sets DIC),
and that the IDC pin "reflects the inverse value of the
BROADCASTCACHEMAINTPOU pin". So I opted for the config
choices that happen to be faster for QEMU.

thanks
-- PMM


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/5] target/arm: Reduce dcz_blocksize to uint8_t
  2023-08-10 14:09   ` Peter Maydell
@ 2023-08-10 19:02     ` Richard Henderson
  0 siblings, 0 replies; 17+ messages in thread
From: Richard Henderson @ 2023-08-10 19:02 UTC (permalink / raw)
  To: Peter Maydell; +Cc: qemu-devel, qemu-arm

On 8/10/23 07:09, Peter Maydell wrote:
> On Thu, 10 Aug 2023 at 03:37, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>>
>> This value is only 4 bits wide.
> 
> True. Any particular reason to change the type, though?

To save space.


r~

> 
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>>   target/arm/cpu.h | 3 ++-
>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>
> 
> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
> 
> thanks
> -- PMM



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 4/5] target/arm: Support more GM blocksizes
  2023-08-10 14:23   ` Peter Maydell
@ 2023-08-10 19:10     ` Richard Henderson
  0 siblings, 0 replies; 17+ messages in thread
From: Richard Henderson @ 2023-08-10 19:10 UTC (permalink / raw)
  To: Peter Maydell; +Cc: qemu-devel, qemu-arm

On 8/10/23 07:23, Peter Maydell wrote:
>> +    case 4:
>> +        /* 64 bytes -> 4 tags -> 16 result bits */
>> +        ret = cpu_to_le16(*(uint16_t *)tag_mem);
> 
> Does this really make a difference compared to ldw_le_p() ?

ldw_le_p uses memcpy, though only mips and sparc hosts do not have unaligned reads, so 
perhaps it doesn't make much difference.

I had originally been thinking about atomicity, but then noticed that the pseudocode uses 
a loop and so the instruction is therefore non-atomic.


> Is it worth having an assert in CPU realize for an invalid
> blocksize, so that we can catch duff ID register values
> without having to rely on there being a test run that
> uses ldgm/stgm ?

Yes, that's a good idea.


r~


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 5/5] target/arm: Implement cortex-a710
  2023-08-10 17:12       ` Peter Maydell
@ 2023-08-14  8:49         ` Marcin Juszkiewicz
  0 siblings, 0 replies; 17+ messages in thread
From: Marcin Juszkiewicz @ 2023-08-14  8:49 UTC (permalink / raw)
  To: Peter Maydell, Richard Henderson; +Cc: qemu-devel, qemu-arm

W dniu 10.08.2023 o 19:12, Peter Maydell pisze:
> On Thu, 10 Aug 2023 at 18:05, Richard Henderson
>> On 8/10/23 08:49, Peter Maydell wrote:
>>> On Thu, 10 Aug 2023 at 03:36, Richard Henderson

>>> Will sbsa-ref want this core ?

>> It only has 40 PA bits, and I think sbsa-ref requires 48.

> Yes, it does want 48 (we ran into that with some other core).

sbsa-ref needs PA > 40 bit as memory starts at 2^40.

Cortex A57/A72 have only 44 bits for PA space and are fine.

 From v9 cores I look forward for Neoverse-N2/V2 cores.

My "AArch64 cpu cores info table" [1] lists all/most of Arm designed 
cores with some basic information (arch level, 32bit, PA/VA, granules, SVE).

1. https://marcin.juszkiewicz.com.pl/download/tables/arm-cpu-cores.html



^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2023-08-14  8:50 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-08-10  2:35 [PATCH for-8.2 0/5] target/arm: Implement cortex-a710 Richard Henderson
2023-08-10  2:35 ` [PATCH 1/5] target/arm: Disable FEAT_TRF in neoverse-v1 Richard Henderson
2023-08-10  9:16   ` Peter Maydell
2023-08-10  2:35 ` [PATCH 2/5] target/arm: Reduce dcz_blocksize to uint8_t Richard Henderson
2023-08-10 14:09   ` Peter Maydell
2023-08-10 19:02     ` Richard Henderson
2023-08-10  2:35 ` [PATCH 3/5] target/arm: Allow cpu to configure GM blocksize Richard Henderson
2023-08-10 14:13   ` Peter Maydell
2023-08-10  2:35 ` [PATCH 4/5] target/arm: Support more GM blocksizes Richard Henderson
2023-08-10 14:23   ` Peter Maydell
2023-08-10 19:10     ` Richard Henderson
2023-08-10  2:35 ` [PATCH 5/5] target/arm: Implement cortex-a710 Richard Henderson
2023-08-10  4:11   ` Richard Henderson
2023-08-10 15:49   ` Peter Maydell
2023-08-10 17:05     ` Richard Henderson
2023-08-10 17:12       ` Peter Maydell
2023-08-14  8:49         ` Marcin Juszkiewicz

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).