* [RFC PATCH 0/2] Specifying cache topology on ARM
@ 2024-08-23 12:54 Alireza Sanaee via
2024-08-23 12:54 ` [PATCH 1/2] target/arm/tcg: increase cache level for cpu=max Alireza Sanaee via
` (3 more replies)
0 siblings, 4 replies; 8+ messages in thread
From: Alireza Sanaee via @ 2024-08-23 12:54 UTC (permalink / raw)
To: qemu-devel, qemu-arm
Cc: zhao1.liu, zhenyu.z.wang, dapeng1.mi, yongwei.ma, armbru, farman,
peter.maydell, mst, anisinha, shannon.zhaosl, imammedo, mtosatti,
berrange, richard.henderson, linuxarm, shameerali.kolothum.thodi,
Jonathan.Cameron, jiangkunkun
Specifying the cache layout in virtual machines is useful: it lets
applications and operating systems fetch accurate information about
the cache structure and make appropriate adjustments. Enforcing correct
sharing information can lead to better optimizations. This patch set
enables specification of the cache layout through a command-line
parameter, building on a patch set by Intel [1] as its foundation.
The ACPI PPTT table is populated based on the user-provided information
and the CPU topology.
Example:
+----------------+                      +----------------+
|    Socket 0    |                      |    Socket 1    |
|   (L3 Cache)   |                      |   (L3 Cache)   |
+--------+-------+                      +--------+-------+
         |                                       |
+--------+-------+                      +--------+-------+
|   Cluster 0    |                      |   Cluster 0    |
|   (L2 Cache)   |                      |   (L2 Cache)   |
+--------+-------+                      +--------+-------+
         |                                       |
+----------------+  +----------------+  +----------------+  +----------------+
|     Core 0     |  |     Core 1     |  |     Core 0     |  |     Core 1     |
|   (L1i, L1d)   |  |   (L1i, L1d)   |  |   (L1i, L1d)   |  |   (L1i, L1d)   |
+----------------+  +----------------+  +----------------+  +----------------+
        |                   |                   |                   |
   +--------+          +--------+          +--------+          +--------+
   |Thread 0|          |Thread 0|          |Thread 0|          |Thread 0|
   +--------+          +--------+          +--------+          +--------+
   |Thread 1|          |Thread 1|          |Thread 1|          |Thread 1|
   +--------+          +--------+          +--------+          +--------+
The following command describes the system above:
./qemu-system-aarch64 \
-machine virt,**smp-cache=cache0** \
-cpu max \
-m 2048 \
-smp sockets=2,clusters=1,cores=2,threads=2 \
-kernel ./Image.gz \
-append "console=ttyAMA0 root=/dev/ram rdinit=/init acpi=force" \
-initrd rootfs.cpio.gz \
-bios ./edk2-aarch64-code.fd \
**-object '{"qom-type":"smp-cache","id":"cache0","caches":[{"name":"l1d","topo":"core"},{"name":"l1i","topo":"core"},{"name":"l2","topo":"cluster"},{"name":"l3","topo":"socket"}]}'** \
-nographic
Failure cases:
1) The user specifies caches to be shared at the cluster level, but no
clusters are configured in the -smp option. In this situation, QEMU
returns an error.
2) Caches exist in the system's registers but are left unspecified by
the user. In this case, QEMU also returns an error.
Currently only three cache levels can be specified from the command
line. However, supporting more levels would not require significant
changes. Further, this patch assumes unified l2 and l3 caches and does
not allow split l(2/3)(i/d) caches. The level terminology is
thread/core/cluster/socket right now.
Here is the hierarchy assumed in this patch:
Socket level = Cluster level + 1 = Core level + 2 = Thread level + 3;
[1] https://lore.kernel.org/qemu-devel/20240704031603.1744546-1-zhao1.liu@intel.com/#r
TODO:
1) Make the code work with an arbitrary number of cache levels.
2) Support separate data and instruction caches at L2 and L3.
3) Allow data-only or instruction-only caches at a particular level.
4) Additional cache controls, e.g. the size of L3 may not want to simply
match the underlying system, because only some of the associated host
CPUs may be bound to this VM.
5) Add device tree code to generate cache information.
Alireza Sanaee (2):
target/arm/tcg: increase cache level for cpu=max
hw/acpi: add cache hierarchy node to pptt table
hw/acpi/aml-build.c | 307 +++++++++++++++++++++++++++++++++++-
hw/arm/virt-acpi-build.c | 137 +++++++++++++++-
hw/arm/virt.c | 5 +
hw/core/machine-smp.c | 6 +-
hw/loongarch/acpi-build.c | 3 +-
include/hw/acpi/aml-build.h | 20 ++-
target/arm/tcg/cpu64.c | 35 ++++
7 files changed, 503 insertions(+), 10 deletions(-)
--
2.34.1
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 1/2] target/arm/tcg: increase cache level for cpu=max
2024-08-23 12:54 [RFC PATCH 0/2] Specifying cache topology on ARM Alireza Sanaee via
@ 2024-08-23 12:54 ` Alireza Sanaee via
2024-08-23 12:54 ` [PATCH 2/2] hw/acpi: add cache hierarchy node to pptt table Alireza Sanaee via
` (2 subsequent siblings)
3 siblings, 0 replies; 8+ messages in thread
From: Alireza Sanaee via @ 2024-08-23 12:54 UTC (permalink / raw)
To: qemu-devel, qemu-arm
Cc: zhao1.liu, zhenyu.z.wang, dapeng1.mi, yongwei.ma, armbru, farman,
peter.maydell, mst, anisinha, shannon.zhaosl, imammedo, mtosatti,
berrange, richard.henderson, linuxarm, shameerali.kolothum.thodi,
Jonathan.Cameron, jiangkunkun
This patch addresses the cache description in the `aarch64_max_tcg_initfn`
function. It introduces three levels of caches and modifies the cache
description registers accordingly. Additionally, a new function is added
to handle the cache description when CCIDX is disabled; CCIDX remains
disabled for the cpu=max configuration.
TODO: I am planning to send a separate patch that uses this cache
description function for the rest of the CPU types. This is a
starting point for testing L3 caches with cpu=max.
Signed-off-by: Alireza Sanaee <alireza.sanaee@huawei.com>
---
target/arm/tcg/cpu64.c | 35 +++++++++++++++++++++++++++++++++++
1 file changed, 35 insertions(+)
diff --git a/target/arm/tcg/cpu64.c b/target/arm/tcg/cpu64.c
index fe232eb306..f2b6fb6d84 100644
--- a/target/arm/tcg/cpu64.c
+++ b/target/arm/tcg/cpu64.c
@@ -55,6 +55,32 @@ static uint64_t make_ccsidr64(unsigned assoc, unsigned linesize,
| (lg_linesize - 4);
}
+static uint64_t make_ccsidr32(unsigned assoc, unsigned linesize,
+ unsigned cachesize)
+{
+ unsigned lg_linesize = ctz32(linesize);
+ unsigned sets;
+
+ /*
+ * The 32-bit CCSIDR_EL1 format is:
+ * [27:13] number of sets - 1
+ * [12:3] associativity - 1
+ * [2:0] log2(linesize) - 4
+ * so 0 == 16 bytes, 1 == 32 bytes, 2 == 64 bytes, etc
+ */
+ assert(assoc != 0);
+ assert(is_power_of_2(linesize));
+ assert(lg_linesize >= 4 && lg_linesize <= 7 + 4);
+
+ /* sets * associativity * linesize == cachesize. */
+ sets = cachesize / (assoc * linesize);
+ assert(cachesize % (assoc * linesize) == 0);
+
+ return ((uint64_t)(sets - 1) << 13)
+ | ((assoc - 1) << 3)
+ | (lg_linesize - 4);
+}
+
static void aarch64_a35_initfn(Object *obj)
{
ARMCPU *cpu = ARM_CPU(obj);
@@ -1086,6 +1112,15 @@ void aarch64_max_tcg_initfn(Object *obj)
uint64_t t;
uint32_t u;
+ /*
+ * Expanded cache set
+ */
+ cpu->clidr = 0x8200123; /* 4 4 3 in 3 bit fields */
+ cpu->ccsidr[0] = make_ccsidr32(4, 64, 64 * KiB); /* 64KB L1 dcache */
+ cpu->ccsidr[1] = cpu->ccsidr[0]; /* 64KB L1 icache */
+ cpu->ccsidr[2] = make_ccsidr32(8, 64, 1 * MiB); /* 1MB L2 unified cache */
+ cpu->ccsidr[4] = make_ccsidr32(8, 64, 2 * MiB); /* 2MB L3 unified cache */
+
/*
* Unset ARM_FEATURE_BACKCOMPAT_CNTFRQ, which we would otherwise default
* to because we started with aarch64_a57_initfn(). A 'max' CPU might
--
2.34.1
* [PATCH 2/2] hw/acpi: add cache hierarchy node to pptt table
2024-08-23 12:54 [RFC PATCH 0/2] Specifying cache topology on ARM Alireza Sanaee via
2024-08-23 12:54 ` [PATCH 1/2] target/arm/tcg: increase cache level for cpu=max Alireza Sanaee via
@ 2024-08-23 12:54 ` Alireza Sanaee via
2024-08-31 11:47 ` Zhao Liu
2024-08-31 11:25 ` [RFC PATCH 0/2] Specifying cache topology on ARM Zhao Liu
2024-09-02 11:49 ` Marcin Juszkiewicz
3 siblings, 1 reply; 8+ messages in thread
From: Alireza Sanaee via @ 2024-08-23 12:54 UTC (permalink / raw)
To: qemu-devel, qemu-arm
Cc: zhao1.liu, zhenyu.z.wang, dapeng1.mi, yongwei.ma, armbru, farman,
peter.maydell, mst, anisinha, shannon.zhaosl, imammedo, mtosatti,
berrange, richard.henderson, linuxarm, shameerali.kolothum.thodi,
Jonathan.Cameron, jiangkunkun
Specify the topology level (core/cluster/socket) at which each cache
sits in the CPU topology.
Example:
Here, 2 sockets (packages), 2 clusters, 4 cores and 2 threads are
created, for an aggregate of 2*2*4*2 logical CPUs. In the smp-cache
object, each core gets its own l1d and l1i caches (threads share these
caches by default, though extending this is not difficult). The
clusters share a unified l2 cache, and the sockets share l3. In this
patch, threads share the l1 caches by default, but this can be adjusted
if required.
Currently only three levels of caches are supported. Also, the PPTT
table revision has been raised to 3, even in scenarios where no caches
are described. The patch does not allow partial declaration of caches:
in other words, either all caches must be defined or all must be
skipped.
./qemu-system-aarch64 \
-machine virt,smp-cache=cache0 \
-cpu max \
-m 2048 \
-smp sockets=2,clusters=2,cores=4,threads=2 \
-kernel ./Image.gz \
-append "console=ttyAMA0 root=/dev/ram rdinit=/init acpi=force" \
-initrd rootfs.cpio.gz \
-bios ./edk2-aarch64-code.fd \
-object '{"qom-type":"smp-cache","id":"cache0","caches":[{"name":"l1d","topo":"core"},{"name":"l1i","topo":"core"},{"name":"l2","topo":"cluster"},{"name":"l3","topo":"socket"}]}' \
-nographic
Signed-off-by: Alireza Sanaee <alireza.sanaee@huawei.com>
Co-developed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Jonathan Cameron <jonathan.cameron@huawei.com>
---
hw/acpi/aml-build.c | 307 +++++++++++++++++++++++++++++++++++-
hw/arm/virt-acpi-build.c | 137 +++++++++++++++-
hw/arm/virt.c | 5 +
hw/core/machine-smp.c | 6 +-
hw/loongarch/acpi-build.c | 3 +-
include/hw/acpi/aml-build.h | 20 ++-
6 files changed, 468 insertions(+), 10 deletions(-)
diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index 6d4517cfbe..3b3ee6f01d 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -1964,6 +1964,200 @@ void build_slit(GArray *table_data, BIOSLinker *linker, MachineState *ms,
acpi_table_end(linker, &table);
}
+static bool cache_described_at(MachineState *ms, CpuTopologyLevel level)
+{
+ if (machine_get_cache_topo_level(ms, SMP_CACHE_L3) == level ||
+ machine_get_cache_topo_level(ms, SMP_CACHE_L2) == level ||
+ machine_get_cache_topo_level(ms, SMP_CACHE_L1I) == level ||
+ machine_get_cache_topo_level(ms, SMP_CACHE_L1D) == level) {
+ return true;
+ }
+
+ return false;
+}
+
+static int partial_cache_description(MachineState *ms, ACPIPPTTCache* caches,
+ int num_caches)
+{
+ int level, c;
+
+ for (level = 1; level < num_caches; level++) {
+ for (c = 0; c < num_caches; c++) {
+ if (caches[c].level != level) {
+ continue;
+ }
+
+ switch (level) {
+ case 1:
+ /*
+ * L1 cache is assumed to have both L1I and L1D available.
+ * Technically both need to be checked.
+ */
+ if (machine_get_cache_topo_level(ms, SMP_CACHE_L1I) ==
+ CPU_TOPO_LEVEL_DEFAULT) {
+ assert(machine_get_cache_topo_level(ms, SMP_CACHE_L1D) !=
+ CPU_TOPO_LEVEL_DEFAULT);
+ return level;
+ }
+ break;
+ case 2:
+ if (machine_get_cache_topo_level(ms, SMP_CACHE_L2) ==
+ CPU_TOPO_LEVEL_DEFAULT) {
+ return level;
+ }
+ break;
+ case 3:
+ if (machine_get_cache_topo_level(ms, SMP_CACHE_L3) ==
+ CPU_TOPO_LEVEL_DEFAULT) {
+ return level;
+ }
+ break;
+ }
+ }
+ }
+
+ return 0;
+}
+
+/*
+ * This function assumes l3 and l2 have unified cache and l1 is split l1d
+ * and l1i, and further prepares the lowest cache level for a topology
+ * level. The info will be fed to build_caches to create caches at the
+ * right level.
+ */
+static int find_the_lowest_level_cache_defined_at_level(MachineState *ms,
+ int *level_found,
+ CpuTopologyLevel topo_level) {
+
+ CpuTopologyLevel level;
+
+ level = machine_get_cache_topo_level(ms, SMP_CACHE_L1I);
+ if (level == topo_level) {
+ *level_found = 1;
+ return 1;
+ }
+
+ level = machine_get_cache_topo_level(ms, SMP_CACHE_L1D);
+ if (level == topo_level) {
+ *level_found = 1;
+ return 1;
+ }
+
+ level = machine_get_cache_topo_level(ms, SMP_CACHE_L2);
+ if (level == topo_level) {
+ *level_found = 2;
+ return 2;
+ }
+
+ level = machine_get_cache_topo_level(ms, SMP_CACHE_L3);
+ if (level == topo_level) {
+ *level_found = 3;
+ return 3;
+ }
+
+ return 0;
+}
+
+static void build_cache_nodes(GArray *tbl, ACPIPPTTCache *cache,
+ uint32_t next_offset, unsigned int id)
+{
+ int val;
+
+ /* Type 1 - cache */
+ build_append_byte(tbl, 1);
+ /* Length */
+ build_append_byte(tbl, 28);
+ /* Reserved */
+ build_append_int_noprefix(tbl, 0, 2);
+ /* Flags - everything except possibly the ID */
+ build_append_int_noprefix(tbl, 0xff, 4);
+ /* Offset of next cache up */
+ build_append_int_noprefix(tbl, next_offset, 4);
+ build_append_int_noprefix(tbl, cache->size, 4);
+ build_append_int_noprefix(tbl, cache->sets, 4);
+ build_append_byte(tbl, cache->associativity);
+ val = 0x3;
+ switch (cache->type) {
+ case INSTRUCTION:
+ val |= (1 << 2);
+ break;
+ case DATA:
+ val |= (0 << 2); /* Data */
+ break;
+ case UNIFIED:
+ val |= (3 << 2); /* Unified */
+ break;
+ }
+ build_append_byte(tbl, val);
+ build_append_int_noprefix(tbl, cache->linesize, 2);
+ build_append_int_noprefix(tbl,
+ (cache->type << 24) | (cache->level << 16) | id,
+ 4);
+}
+
+/*
+ * builds caches from the top level (`level_high` parameter) to the bottom
+ * level (`level_low` parameter). It searches for caches found in
+ * systems' registers, and fills up the table. Then it updates the
+ * `data_offset` and `instr_offset` parameters with the offset of the data
+ * and instruction caches of the lowest level, respectively.
+ */
+static bool build_caches(GArray *table_data, uint32_t pptt_start,
+ int num_caches, ACPIPPTTCache *caches,
+ int base_id,
+ uint8_t level_high, /* Inclusive */
+ uint8_t level_low, /* Inclusive */
+ uint32_t *data_offset,
+ uint32_t *instr_offset)
+{
+ uint32_t next_level_offset_data = 0, next_level_offset_instruction = 0;
+ uint32_t this_offset, next_offset = 0;
+ int c, level;
+ bool found_cache = false;
+
+ /* Walk caches from top to bottom */
+ for (level = level_high; level >= level_low; level--) {
+ for (c = 0; c < num_caches; c++) {
+ if (caches[c].level != level) {
+ continue;
+ }
+
+ /* Assume only unified above l1 for now */
+ this_offset = table_data->len - pptt_start;
+ switch (caches[c].type) {
+ case INSTRUCTION:
+ next_offset = next_level_offset_instruction;
+ break;
+ case DATA:
+ next_offset = next_level_offset_data;
+ break;
+ case UNIFIED:
+ /* Either is fine here - hopefully */
+ next_offset = next_level_offset_instruction;
+ break;
+ }
+ build_cache_nodes(table_data, &caches[c], next_offset, base_id);
+ switch (caches[c].type) {
+ case INSTRUCTION:
+ next_level_offset_instruction = this_offset;
+ break;
+ case DATA:
+ next_level_offset_data = this_offset;
+ break;
+ case UNIFIED:
+ next_level_offset_instruction = this_offset;
+ next_level_offset_data = this_offset;
+ break;
+ }
+ *data_offset = next_level_offset_data;
+ *instr_offset = next_level_offset_instruction;
+
+ found_cache = true;
+ }
+ }
+
+ return found_cache;
+}
/*
* ACPI spec, Revision 6.3
* 5.2.29.1 Processor hierarchy node structure (Type 0)
@@ -2047,24 +2241,40 @@ void build_spcr(GArray *table_data, BIOSLinker *linker,
acpi_table_end(linker, &table);
}
+
/*
* ACPI spec, Revision 6.3
* 5.2.29 Processor Properties Topology Table (PPTT)
*/
void build_pptt(GArray *table_data, BIOSLinker *linker, MachineState *ms,
- const char *oem_id, const char *oem_table_id)
+ const char *oem_id, const char *oem_table_id,
+ int num_caches, ACPIPPTTCache *caches)
{
MachineClass *mc = MACHINE_GET_CLASS(ms);
CPUArchIdList *cpus = ms->possible_cpus;
+ uint32_t l1_data_offset = 0, l1_instr_offset = 0, cluster_data_offset = 0;
+ uint32_t cluster_instr_offset = 0, node_data_offset = 0;
+ uint32_t node_instr_offset = 0;
+ int top_node = 3, top_cluster = 3, top_core = 3;
+ int bottom_node = 3, bottom_cluster = 3, bottom_core = 3;
int64_t socket_id = -1, cluster_id = -1, core_id = -1;
uint32_t socket_offset = 0, cluster_offset = 0, core_offset = 0;
uint32_t pptt_start = table_data->len;
int n;
- AcpiTable table = { .sig = "PPTT", .rev = 2,
+ uint32_t priv_rsrc[2];
+ uint32_t num_priv = 0;
+ bool cache_created;
+
+ AcpiTable table = { .sig = "PPTT", .rev = 3,
.oem_id = oem_id, .oem_table_id = oem_table_id };
acpi_table_begin(&table, table_data);
+ n = partial_cache_description(ms, caches, num_caches);
+ if (n) {
+ error_setg(&error_fatal, "Missing cache description at #L%d", n);
+ }
+
/*
* This works with the assumption that cpus[n].props.*_id has been
* sorted from top to down levels in mc->possible_cpu_arch_ids().
@@ -2077,10 +2287,37 @@ void build_pptt(GArray *table_data, BIOSLinker *linker, MachineState *ms,
socket_id = cpus->cpus[n].props.socket_id;
cluster_id = -1;
core_id = -1;
+ bottom_node = top_node;
+ num_priv = 0;
+
+ if (cache_described_at(ms, CPU_TOPO_LEVEL_SOCKET) &&
+ find_the_lowest_level_cache_defined_at_level(ms,
+ &bottom_node,
+ CPU_TOPO_LEVEL_SOCKET)) {
+ cache_created = build_caches(table_data, pptt_start,
+ num_caches, caches,
+ n, top_node, bottom_node,
+ &node_data_offset, &node_instr_offset);
+
+ if (!cache_created) {
+ error_setg(&error_fatal, "No caches at levels %d-%d",
+ top_node, bottom_node);
+ }
+
+ priv_rsrc[0] = node_instr_offset;
+ priv_rsrc[1] = node_data_offset;
+
+ if (node_instr_offset || node_data_offset) {
+ num_priv = node_instr_offset == node_data_offset ? 1 : 2;
+ }
+
+ top_cluster = bottom_node - 1;
+ }
+
socket_offset = table_data->len - pptt_start;
build_processor_hierarchy_node(table_data,
(1 << 0), /* Physical package */
- 0, socket_id, NULL, 0);
+ 0, socket_id, priv_rsrc, num_priv);
}
if (mc->smp_props.clusters_supported && mc->smp_props.has_clusters) {
@@ -2088,20 +2325,78 @@ void build_pptt(GArray *table_data, BIOSLinker *linker, MachineState *ms,
assert(cpus->cpus[n].props.cluster_id > cluster_id);
cluster_id = cpus->cpus[n].props.cluster_id;
core_id = -1;
+ bottom_cluster = top_cluster;
+ num_priv = 0;
+
+ if (cache_described_at(ms, CPU_TOPO_LEVEL_CLUSTER) &&
+ find_the_lowest_level_cache_defined_at_level(ms,
+ &bottom_cluster,
+ CPU_TOPO_LEVEL_CLUSTER)) {
+
+ cache_created = build_caches(table_data, pptt_start,
+ num_caches, caches, n, top_cluster, bottom_cluster,
+ &cluster_data_offset, &cluster_instr_offset);
+
+ if (!cache_created) {
+ error_setg(&error_fatal, "No caches at levels %d-%d",
+ top_cluster, bottom_cluster);
+ }
+
+ priv_rsrc[0] = cluster_instr_offset;
+ priv_rsrc[1] = cluster_data_offset;
+
+ if (cluster_instr_offset || cluster_data_offset) {
+ num_priv = cluster_instr_offset == cluster_data_offset ?
+ 1 : 2;
+ }
+
+ top_core = bottom_cluster - 1;
+ }
+
cluster_offset = table_data->len - pptt_start;
build_processor_hierarchy_node(table_data,
(0 << 0), /* Not a physical package */
- socket_offset, cluster_id, NULL, 0);
+ socket_offset, cluster_id, priv_rsrc, num_priv);
}
} else {
+ if (cache_described_at(ms, CPU_TOPO_LEVEL_CLUSTER)) {
+ error_setg(&error_fatal, "Not clusters but defined caches for");
+ }
+
cluster_offset = socket_offset;
+ top_core = bottom_node - 1; /* there is no cluster */
+ }
+
+ if (cpus->cpus[n].props.core_id != core_id) {
+ bottom_core = top_core;
+ num_priv = 0;
+
+ if (cache_described_at(ms, CPU_TOPO_LEVEL_CORE) &&
+ find_the_lowest_level_cache_defined_at_level(ms,
+ &bottom_core,
+ CPU_TOPO_LEVEL_CORE)) {
+ cache_created = build_caches(table_data, pptt_start,
+ num_caches, caches,
+ n, top_core , bottom_core,
+ &l1_data_offset, &l1_instr_offset);
+
+ if (!cache_created) {
+ error_setg(&error_fatal, "No cache defined at levels %d-%d",
+ top_core, bottom_core);
+ }
+
+ priv_rsrc[0] = l1_instr_offset;
+ priv_rsrc[1] = l1_data_offset;
+
+ num_priv = l1_instr_offset == l1_data_offset ? 1 : 2;
+ }
}
if (ms->smp.threads == 1) {
build_processor_hierarchy_node(table_data,
(1 << 1) | /* ACPI Processor ID valid */
(1 << 3), /* Node is a Leaf */
- cluster_offset, n, NULL, 0);
+ cluster_offset, n, priv_rsrc, num_priv);
} else {
if (cpus->cpus[n].props.core_id != core_id) {
assert(cpus->cpus[n].props.core_id > core_id);
@@ -2109,7 +2404,7 @@ void build_pptt(GArray *table_data, BIOSLinker *linker, MachineState *ms,
core_offset = table_data->len - pptt_start;
build_processor_hierarchy_node(table_data,
(0 << 0), /* Not a physical package */
- cluster_offset, core_id, NULL, 0);
+ cluster_offset, core_id, priv_rsrc, num_priv);
}
build_processor_hierarchy_node(table_data,
diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index e10cad86dd..397ff939eb 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -60,11 +60,14 @@
#include "hw/acpi/acpi_generic_initiator.h"
#include "hw/virtio/virtio-acpi.h"
#include "target/arm/multiprocessing.h"
+#include "cpu-features.h"
#define ARM_SPI_BASE 32
#define ACPI_BUILD_TABLE_SIZE 0x20000
+#define ACPI_PPTT_MAX_CACHES 16
+
static void acpi_dsdt_add_cpus(Aml *scope, VirtMachineState *vms)
{
MachineState *ms = MACHINE(vms);
@@ -890,6 +893,132 @@ static void acpi_align_size(GArray *blob, unsigned align)
g_array_set_size(blob, ROUND_UP(acpi_data_len(blob), align));
}
+static unsigned int virt_get_caches(VirtMachineState *vms,
+ ACPIPPTTCache *caches)
+{
+ ARMCPU *armcpu = ARM_CPU(qemu_get_cpu(0)); /* assume homogeneous CPUs */
+ bool ccidx = cpu_isar_feature(any_ccidx, armcpu);
+ unsigned int num_cache, i;
+ int level_instr = 1, level_data = 1;
+
+ for (i = 0, num_cache = 0; i < ACPI_PPTT_MAX_CACHES; i++, num_cache++) {
+ int type = (armcpu->clidr >> (3 * i)) & 7;
+ int bank_index;
+ int level;
+ ACPIPPTTCacheType cache_type;
+
+ if (type == 0) {
+ break;
+ }
+
+ switch (type) {
+ case 1:
+ cache_type = INSTRUCTION;
+ level = level_instr;
+ break;
+ case 2:
+ cache_type = DATA;
+ level = level_data;
+ break;
+ case 4:
+ cache_type = UNIFIED;
+ level = level_instr > level_data ? level_instr : level_data;
+ break;
+ case 3: /* Split - Do data first */
+ cache_type = DATA;
+ level = level_data;
+ break;
+ default:
+ error_setg(&error_abort, "Unrecognized cache type");
+ return 0;
+ }
+ /*
+ * ccsidr is indexed using both the level and whether it is
+ * an instruction cache. Unified caches use the same storage
+ * as data caches.
+ */
+ bank_index = (i * 2) | ((type == 1) ? 1 : 0);
+ if (ccidx) {
+ caches[num_cache] = (ACPIPPTTCache) {
+ .type = cache_type,
+ .level = level,
+ .linesize = 1 << (FIELD_EX64(armcpu->ccsidr[bank_index],
+ CCSIDR_EL1,
+ CCIDX_LINESIZE) + 4),
+ .associativity = FIELD_EX64(armcpu->ccsidr[bank_index],
+ CCSIDR_EL1,
+ CCIDX_ASSOCIATIVITY) + 1,
+ .sets = FIELD_EX64(armcpu->ccsidr[bank_index], CCSIDR_EL1,
+ CCIDX_NUMSETS) + 1,
+ };
+ } else {
+ caches[num_cache] = (ACPIPPTTCache) {
+ .type = cache_type,
+ .level = level,
+ .linesize = 1 << (FIELD_EX64(armcpu->ccsidr[bank_index],
+ CCSIDR_EL1, LINESIZE) + 4),
+ .associativity = FIELD_EX64(armcpu->ccsidr[bank_index],
+ CCSIDR_EL1,
+ ASSOCIATIVITY) + 1,
+ .sets = FIELD_EX64(armcpu->ccsidr[bank_index], CCSIDR_EL1,
+ NUMSETS) + 1,
+ };
+ }
+ caches[num_cache].size = caches[num_cache].associativity *
+ caches[num_cache].sets * caches[num_cache].linesize;
+
+ /* Break one 'split' entry up into two records */
+ if (type == 3) {
+ num_cache++;
+ bank_index = (i * 2) | 1;
+ if (ccidx) {
+ /* Instruction cache: bottom bit set when reading banked reg */
+ caches[num_cache] = (ACPIPPTTCache) {
+ .type = INSTRUCTION,
+ .level = level_instr,
+ .linesize = 1 << (FIELD_EX64(armcpu->ccsidr[bank_index],
+ CCSIDR_EL1,
+ CCIDX_LINESIZE) + 4),
+ .associativity = FIELD_EX64(armcpu->ccsidr[bank_index],
+ CCSIDR_EL1,
+ CCIDX_ASSOCIATIVITY) + 1,
+ .sets = FIELD_EX64(armcpu->ccsidr[bank_index], CCSIDR_EL1,
+ CCIDX_NUMSETS) + 1,
+ };
+ } else {
+ caches[num_cache] = (ACPIPPTTCache) {
+ .type = INSTRUCTION,
+ .level = level_instr,
+ .linesize = 1 << (FIELD_EX64(armcpu->ccsidr[bank_index],
+ CCSIDR_EL1, LINESIZE) + 4),
+ .associativity = FIELD_EX64(armcpu->ccsidr[bank_index],
+ CCSIDR_EL1,
+ ASSOCIATIVITY) + 1,
+ .sets = FIELD_EX64(armcpu->ccsidr[bank_index], CCSIDR_EL1,
+ NUMSETS) + 1,
+ };
+ }
+ caches[num_cache].size = caches[num_cache].associativity *
+ caches[num_cache].sets * caches[num_cache].linesize;
+ }
+ switch (type) {
+ case 1:
+ level_instr++;
+ break;
+ case 2:
+ level_data++;
+ break;
+ case 3:
+ case 4:
+ level_instr++;
+ level_data++;
+ break;
+ }
+ }
+
+ return num_cache;
+}
+
static
void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
{
@@ -899,6 +1028,11 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
GArray *tables_blob = tables->table_data;
MachineState *ms = MACHINE(vms);
+ ACPIPPTTCache caches[ACPI_PPTT_MAX_CACHES]; /* Can select up to 16 */
+ unsigned int num_cache;
+
+ num_cache = virt_get_caches(vms, caches);
+
table_offsets = g_array_new(false, true /* clear */,
sizeof(uint32_t));
@@ -920,7 +1054,8 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
if (!vmc->no_cpu_topology) {
acpi_add_table(table_offsets, tables_blob);
build_pptt(tables_blob, tables->linker, ms,
- vms->oem_id, vms->oem_table_id);
+ vms->oem_id, vms->oem_table_id,
+ num_cache, caches);
}
acpi_add_table(table_offsets, tables_blob);
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index b0c68d66a3..b723248ecf 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -3093,6 +3093,11 @@ static void virt_machine_class_init(ObjectClass *oc, void *data)
hc->unplug = virt_machine_device_unplug_cb;
mc->nvdimm_supported = true;
mc->smp_props.clusters_supported = true;
+ /* Supported cached */
+ mc->smp_props.cache_supported[SMP_CACHE_L1D] = true;
+ mc->smp_props.cache_supported[SMP_CACHE_L1I] = true;
+ mc->smp_props.cache_supported[SMP_CACHE_L2] = true;
+ mc->smp_props.cache_supported[SMP_CACHE_L3] = true;
mc->auto_enable_numa_with_memhp = true;
mc->auto_enable_numa_with_memdev = true;
/* platform instead of architectural choice */
diff --git a/hw/core/machine-smp.c b/hw/core/machine-smp.c
index bf6f2f9107..de95ec9c0f 100644
--- a/hw/core/machine-smp.c
+++ b/hw/core/machine-smp.c
@@ -274,7 +274,11 @@ unsigned int machine_topo_get_threads_per_socket(const MachineState *ms)
CpuTopologyLevel machine_get_cache_topo_level(const MachineState *ms,
SMPCacheName cache)
{
- return ms->smp_cache->props[cache].topo;
+ if (ms->smp_cache) {
+ return ms->smp_cache->props[cache].topo;
+ }
+
+ return CPU_TOPO_LEVEL_DEFAULT;
}
static bool machine_check_topo_support(MachineState *ms,
diff --git a/hw/loongarch/acpi-build.c b/hw/loongarch/acpi-build.c
index af45ce526d..154f2c9ddd 100644
--- a/hw/loongarch/acpi-build.c
+++ b/hw/loongarch/acpi-build.c
@@ -473,7 +473,8 @@ static void acpi_build(AcpiBuildTables *tables, MachineState *machine)
acpi_add_table(table_offsets, tables_blob);
build_pptt(tables_blob, tables->linker, machine,
- lvms->oem_id, lvms->oem_table_id);
+ lvms->oem_id, lvms->oem_table_id,
+ 0, NULL);
acpi_add_table(table_offsets, tables_blob);
build_srat(tables_blob, tables->linker, machine);
diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index a3784155cb..9077b81ba2 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -31,6 +31,23 @@ struct Aml {
AmlBlockFlags block_flags;
};
+typedef enum ACPIPPTTCacheType {
+ DATA,
+ INSTRUCTION,
+ UNIFIED,
+} ACPIPPTTCacheType;
+
+typedef struct ACPIPPTTCache {
+ ACPIPPTTCacheType type;
+ uint32_t pptt_id;
+ uint32_t sets;
+ uint32_t size;
+ uint32_t level;
+ uint16_t linesize;
+ uint8_t attributes; /* write policy: 0x0 write back, 0x1 write through */
+ uint8_t associativity;
+} ACPIPPTTCache;
+
typedef enum {
AML_COMPATIBILITY = 0,
AML_TYPEA = 1,
@@ -490,7 +507,8 @@ void build_slit(GArray *table_data, BIOSLinker *linker, MachineState *ms,
const char *oem_id, const char *oem_table_id);
void build_pptt(GArray *table_data, BIOSLinker *linker, MachineState *ms,
- const char *oem_id, const char *oem_table_id);
+ const char *oem_id, const char *oem_table_id,
+ int num_caches, ACPIPPTTCache *caches);
void build_fadt(GArray *tbl, BIOSLinker *linker, const AcpiFadtData *f,
const char *oem_id, const char *oem_table_id);
--
2.34.1
* Re: [RFC PATCH 0/2] Specifying cache topology on ARM
2024-08-23 12:54 [RFC PATCH 0/2] Specifying cache topology on ARM Alireza Sanaee via
2024-08-23 12:54 ` [PATCH 1/2] target/arm/tcg: increase cache level for cpu=max Alireza Sanaee via
2024-08-23 12:54 ` [PATCH 2/2] hw/acpi: add cache hierarchy node to pptt table Alireza Sanaee via
@ 2024-08-31 11:25 ` Zhao Liu
2024-09-02 10:25 ` Alireza Sanaee via
2024-09-02 11:49 ` Marcin Juszkiewicz
3 siblings, 1 reply; 8+ messages in thread
From: Zhao Liu @ 2024-08-31 11:25 UTC (permalink / raw)
To: Alireza Sanaee
Cc: qemu-devel, qemu-arm, zhenyu.z.wang, dapeng1.mi, yongwei.ma,
armbru, farman, peter.maydell, mst, anisinha, shannon.zhaosl,
imammedo, mtosatti, berrange, richard.henderson, linuxarm,
shameerali.kolothum.thodi, Jonathan.Cameron, jiangkunkun,
Zhao Liu
Hi Alireza,
Great to see your Arm side implementation!
On Fri, Aug 23, 2024 at 01:54:44PM +0100, Alireza Sanaee wrote:
> Date: Fri, 23 Aug 2024 13:54:44 +0100
> From: Alireza Sanaee <alireza.sanaee@huawei.com>
> Subject: [RFC PATCH 0/2] Specifying cache topology on ARM
> X-Mailer: git-send-email 2.34.1
>
[snip]
>
> The following command will represent the system.
>
> ./qemu-system-aarch64 \
> -machine virt,**smp-cache=cache0** \
> -cpu max \
> -m 2048 \
> -smp sockets=2,clusters=1,cores=2,threads=2 \
> -kernel ./Image.gz \
> -append "console=ttyAMA0 root=/dev/ram rdinit=/init acpi=force" \
> -initrd rootfs.cpio.gz \
> -bios ./edk2-aarch64-code.fd \
> **-object '{"qom-type":"smp-cache","id":"cache0","caches":[{"name":"l1d","topo":"core"},{"name":"l1i","topo":"core"},{"name":"l2","topo":"cluster"},{"name":"l3","topo":"socket"}]}'** \
> -nographic
I plan to refresh a new version soon, in which the smp-cache array will
be fully integrated into -machine. I'll cc you then.
Regards,
Zhao
* Re: [PATCH 2/2] hw/acpi: add cache hierarchy node to pptt table
2024-08-23 12:54 ` [PATCH 2/2] hw/acpi: add cache hierarchy node to pptt table Alireza Sanaee via
@ 2024-08-31 11:47 ` Zhao Liu
0 siblings, 0 replies; 8+ messages in thread
From: Zhao Liu @ 2024-08-31 11:47 UTC (permalink / raw)
To: Alireza Sanaee
Cc: qemu-devel, qemu-arm, zhenyu.z.wang, dapeng1.mi, yongwei.ma,
armbru, farman, peter.maydell, mst, anisinha, shannon.zhaosl,
imammedo, mtosatti, berrange, richard.henderson, linuxarm,
shameerali.kolothum.thodi, Jonathan.Cameron, jiangkunkun,
zhao1.liu
Hi Alireza,
On Fri, Aug 23, 2024 at 01:54:46PM +0100, Alireza Sanaee wrote:
[snip]
> +static int partial_cache_description(MachineState *ms, ACPIPPTTCache* caches,
> + int num_caches)
> +{
> + int level, c;
> +
> + for (level = 1; level < num_caches; level++) {
> + for (c = 0; c < num_caches; c++) {
> + if (caches[c].level != level) {
> + continue;
> + }
> +
> + switch (level) {
> + case 1:
> + /*
> + * L1 cache is assumed to have both L1I and L1D available.
> + * Technically both need to be checked.
> + */
> + if (machine_get_cache_topo_level(ms, SMP_CACHE_L1I) ==
> + CPU_TOPO_LEVEL_DEFAULT) {
This check only concerns L1i and does not appear to cover L1d; is L1d
being missed?
> + assert(machine_get_cache_topo_level(ms, SMP_CACHE_L1D) !=
> + CPU_TOPO_LEVEL_DEFAULT);
I understand you don't want the user to configure different levels for
L1d in this case... If so, it's better to return an error (error_setg,
error_report, or some other error-reporting mechanism) to tell the user
that their cache configuration is invalid.
> + return level;
> + }
> + break;
> + case 2:
> + if (machine_get_cache_topo_level(ms, SMP_CACHE_L2) ==
> + CPU_TOPO_LEVEL_DEFAULT) {
> + return level;
> + }
> + break;
> + case 3:
> + if (machine_get_cache_topo_level(ms, SMP_CACHE_L3) ==
> + CPU_TOPO_LEVEL_DEFAULT) {
> + return level;
> + }
> + break;
> + }
> + }
> + }
> +
> + return 0;
> +}
> +
[snip]
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index b0c68d66a3..b723248ecf 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -3093,6 +3093,11 @@ static void virt_machine_class_init(ObjectClass *oc, void *data)
> hc->unplug = virt_machine_device_unplug_cb;
> mc->nvdimm_supported = true;
> mc->smp_props.clusters_supported = true;
> + /* Supported caches */
> + mc->smp_props.cache_supported[SMP_CACHE_L1D] = true;
> + mc->smp_props.cache_supported[SMP_CACHE_L1I] = true;
> + mc->smp_props.cache_supported[SMP_CACHE_L2] = true;
> + mc->smp_props.cache_supported[SMP_CACHE_L3] = true;
> mc->auto_enable_numa_with_memhp = true;
> mc->auto_enable_numa_with_memdev = true;
> /* platform instead of architectural choice */
> diff --git a/hw/core/machine-smp.c b/hw/core/machine-smp.c
> index bf6f2f9107..de95ec9c0f 100644
> --- a/hw/core/machine-smp.c
> +++ b/hw/core/machine-smp.c
> @@ -274,7 +274,11 @@ unsigned int machine_topo_get_threads_per_socket(const MachineState *ms)
> CpuTopologyLevel machine_get_cache_topo_level(const MachineState *ms,
> SMPCacheName cache)
> {
> - return ms->smp_cache->props[cache].topo;
> + if (ms->smp_cache) {
> + return ms->smp_cache->props[cache].topo;
> + }
> +
> + return CPU_TOPO_LEVEL_DEFAULT;
> }
>
> static bool machine_check_topo_support(MachineState *ms,
Maybe it's better to split the smp-cache support/check on Arm into a
separate patch.
Regards,
Zhao
* Re: [RFC PATCH 0/2] Specifying cache topology on ARM
2024-08-31 11:25 ` [RFC PATCH 0/2] Specifying cache topology on ARM Zhao Liu
@ 2024-09-02 10:25 ` Alireza Sanaee via
2024-09-02 12:23 ` Zhao Liu
0 siblings, 1 reply; 8+ messages in thread
From: Alireza Sanaee via @ 2024-09-02 10:25 UTC (permalink / raw)
To: Zhao Liu
Cc: qemu-devel, qemu-arm, zhenyu.z.wang, dapeng1.mi, yongwei.ma,
armbru, farman, peter.maydell, mst, anisinha, shannon.zhaosl,
imammedo, mtosatti, berrange, richard.henderson, linuxarm,
shameerali.kolothum.thodi, Jonathan.Cameron, jiangkunkun
On Sat, 31 Aug 2024 19:25:47 +0800
Zhao Liu <zhao1.liu@intel.com> wrote:
> Hi Alireza,
>
> Great to see your Arm side implementation!
>
> On Fri, Aug 23, 2024 at 01:54:44PM +0100, Alireza Sanaee wrote:
> > Date: Fri, 23 Aug 2024 13:54:44 +0100
> > From: Alireza Sanaee <alireza.sanaee@huawei.com>
> > Subject: [RFC PATCH 0/2] Specifying cache topology on ARM
> > X-Mailer: git-send-email 2.34.1
> >
>
> [snip]
>
> >
> > The following command will represent the system.
> >
> > ./qemu-system-aarch64 \
> > -machine virt,**smp-cache=cache0** \
> > -cpu max \
> > -m 2048 \
> > -smp sockets=2,clusters=1,cores=2,threads=2 \
> > -kernel ./Image.gz \
> > -append "console=ttyAMA0 root=/dev/ram rdinit=/init acpi=force" \
> > -initrd rootfs.cpio.gz \
> > -bios ./edk2-aarch64-code.fd \
> > **-object
> > '{"qom-type":"smp-cache","id":"cache0","caches":[{"name":"l1d","topo":"core"},{"name":"l1i","topo":"core"},{"name":"l2","topo":"cluster"},{"name":"l3","topo":"socket"}]}'**
> > \ -nographic
>
> I plan to refresh a new version soon, in which the smp-cache array
> will be fully integrated into -machine. And I'll CC you then.
>
> Regards,
> Zhao
>
>
Hi Zhao,
Yes, please keep me CCed.
One thing I noticed: since you were going down the Intel path, some
variables could never be NULL there. But when I went down the Arm path,
I hit some scenarios where I ended up with uninitialized variables,
which is still OK but could have been avoided.
Looking forward to the next revision.
Alireza
* Re: [RFC PATCH 0/2] Specifying cache topology on ARM
2024-08-23 12:54 [RFC PATCH 0/2] Specifying cache topology on ARM Alireza Sanaee via
` (2 preceding siblings ...)
2024-08-31 11:25 ` [RFC PATCH 0/2] Specifying cache topology on ARM Zhao Liu
@ 2024-09-02 11:49 ` Marcin Juszkiewicz
3 siblings, 0 replies; 8+ messages in thread
From: Marcin Juszkiewicz @ 2024-09-02 11:49 UTC (permalink / raw)
To: Alireza Sanaee, qemu-devel, qemu-arm
On 23.08.2024 14:54, Alireza Sanaee via wrote:
> Failure cases:
> 1) There are cases where QEMU might not have any clusters selected in the
> -smp option, while the user specifies caches to be shared at cluster level.
> In this situation, QEMU returns an error.
>
> 2) There are other scenarios where caches exist in the system's registers
> but are left unspecified by the user. In this case QEMU returns failure.
Sockets, clusters, cores, threads. And then caches. Sounds like even
more fun than it already is.
IIRC Arm hardware can have up to 16 cores per cluster (virt uses 16,
sbsa-ref uses 8), as this is a GIC limitation.
I have a script to visualize Arm topology:
https://github.com/hrw/sbsa-ref-status/blob/main/parse-pptt-log.py
It uses 'EFIShell> acpiview -s PPTT' output and gives something like this:
-smp 24,sockets=1,clusters=2,cores=3,threads=4
socket: offset: 0x24 parent: 0x0
cluster: offset: 0x38 parent: 0x24
core: offset: 0x4C parent: 0x38 cpuId: 0x0 L1i: 0x68 L1d: 0x84
cache: offset: 0x68 cacheId: 1 size: 0x10000 next: 0xA0
cache: offset: 0x84 cacheId: 2 size: 0x10000 next: 0xA0
cache: offset: 0xA0 cacheId: 3 size: 0x80000
thread: offset: 0xBC parent: 0x4C cpuId: 0x0
thread: offset: 0xD0 parent: 0x4C cpuId: 0x1
thread: offset: 0xE4 parent: 0x4C cpuId: 0x2
thread: offset: 0xF8 parent: 0x4C cpuId: 0x3
core: offset: 0x10C parent: 0x38 cpuId: 0x0 L1i: 0x128 L1d: 0x144
cache: offset: 0x128 cacheId: 4 size: 0x10000 next: 0x160
cache: offset: 0x144 cacheId: 5 size: 0x10000 next: 0x160
cache: offset: 0x160 cacheId: 6 size: 0x80000
thread: offset: 0x17C parent: 0x10C cpuId: 0x4
thread: offset: 0x190 parent: 0x10C cpuId: 0x5
thread: offset: 0x1A4 parent: 0x10C cpuId: 0x6
thread: offset: 0x1B8 parent: 0x10C cpuId: 0x7
core: offset: 0x1CC parent: 0x38 cpuId: 0x0 L1i: 0x1E8 L1d: 0x204
cache: offset: 0x1E8 cacheId: 7 size: 0x10000 next: 0x220
cache: offset: 0x204 cacheId: 8 size: 0x10000 next: 0x220
cache: offset: 0x220 cacheId: 9 size: 0x80000
thread: offset: 0x23C parent: 0x1CC cpuId: 0x8
thread: offset: 0x250 parent: 0x1CC cpuId: 0x9
thread: offset: 0x264 parent: 0x1CC cpuId: 0xA
thread: offset: 0x278 parent: 0x1CC cpuId: 0xB
cluster: offset: 0x28C parent: 0x24
core: offset: 0x2A0 parent: 0x28C cpuId: 0x0 L1i: 0x2BC L1d: 0x2D8
cache: offset: 0x2BC cacheId: 10 size: 0x10000 next: 0x2F4
cache: offset: 0x2D8 cacheId: 11 size: 0x10000 next: 0x2F4
cache: offset: 0x2F4 cacheId: 12 size: 0x80000
thread: offset: 0x310 parent: 0x2A0 cpuId: 0xC
thread: offset: 0x324 parent: 0x2A0 cpuId: 0xD
thread: offset: 0x338 parent: 0x2A0 cpuId: 0xE
thread: offset: 0x34C parent: 0x2A0 cpuId: 0xF
core: offset: 0x360 parent: 0x28C cpuId: 0x0 L1i: 0x37C L1d: 0x398
cache: offset: 0x37C cacheId: 13 size: 0x10000 next: 0x3B4
cache: offset: 0x398 cacheId: 14 size: 0x10000 next: 0x3B4
cache: offset: 0x3B4 cacheId: 15 size: 0x80000
thread: offset: 0x3D0 parent: 0x360 cpuId: 0x10
thread: offset: 0x3E4 parent: 0x360 cpuId: 0x11
thread: offset: 0x3F8 parent: 0x360 cpuId: 0x12
thread: offset: 0x40C parent: 0x360 cpuId: 0x13
core: offset: 0x420 parent: 0x28C cpuId: 0x0 L1i: 0x43C L1d: 0x458
cache: offset: 0x43C cacheId: 16 size: 0x10000 next: 0x474
cache: offset: 0x458 cacheId: 17 size: 0x10000 next: 0x474
cache: offset: 0x474 cacheId: 18 size: 0x80000
thread: offset: 0x490 parent: 0x420 cpuId: 0x14
thread: offset: 0x4A4 parent: 0x420 cpuId: 0x15
thread: offset: 0x4B8 parent: 0x420 cpuId: 0x16
thread: offset: 0x4CC parent: 0x420 cpuId: 0x17
You may find it useful. I tested it only with caches at either core or
cluster level.
* Re: [RFC PATCH 0/2] Specifying cache topology on ARM
2024-09-02 10:25 ` Alireza Sanaee via
@ 2024-09-02 12:23 ` Zhao Liu
0 siblings, 0 replies; 8+ messages in thread
From: Zhao Liu @ 2024-09-02 12:23 UTC (permalink / raw)
To: Alireza Sanaee
Cc: qemu-devel, qemu-arm, zhenyu.z.wang, dapeng1.mi, yongwei.ma,
armbru, farman, peter.maydell, mst, anisinha, shannon.zhaosl,
imammedo, mtosatti, berrange, richard.henderson, linuxarm,
shameerali.kolothum.thodi, Jonathan.Cameron, jiangkunkun,
zhao1.liu
On Mon, Sep 02, 2024 at 11:25:19AM +0100, Alireza Sanaee wrote:
>
> Hi Zhao,
>
> Yes, please keep me CCed.
>
> One thing I noticed: since you were going down the Intel path, some
> variables could never be NULL there. But when I went down the Arm path,
> I hit some scenarios where I ended up with uninitialized variables,
> which is still OK but could have been avoided.
Ah, I didn't get your point very clearly. Could you please point out
those places in my patches? Then I can fix them in my next version. :)
Thanks,
Zhao
> Looking forward to the next revision.
>
> Alireza