[PATCH v16 0/8] Specifying cache topology on ARM

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v16 0/8] Specifying cache topology on ARM
@ 2025-08-27 14:21 Alireza Sanaee via
  2025-08-27 14:21 ` [PATCH v16 1/8] target/arm/tcg: increase cache level for cpu=max Alireza Sanaee via
                   ` (8 more replies)
  0 siblings, 9 replies; 14+ messages in thread
From: Alireza Sanaee via @ 2025-08-27 14:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: anisinha, armbru, berrange, dapeng1.mi, eric.auger, farman,
	gustavo.romero, imammedo, jiangkunkun, jonathan.cameron, linuxarm,
	maobibo, mst, mtosatti, peter.maydell, philmd, qemu-arm,
	richard.henderson, shannon.zhaosl, yangyicong, zhao1.liu

Specifying the cache layout in virtual machines is useful for
applications and operating systems to fetch accurate information about
the cache structure and make appropriate adjustments. Enforcing correct
sharing information can lead to better optimizations. Patches that allow
for an interface to express caches was landed in the prior cycles. This
patchset uses the interface as a foundation.  Thus, the device tree and
ACPI/PPTT table, and device tree are populated based on
user-provided information and CPU topology.

Example:

+----------------+                         +----------------+
|    Socket 0    |                         |    Socket 1    |
|    (L3 Cache)  |                         |    (L3 Cache)  |
+--------+-------+                         +--------+-------+
         |                                          |
+--------+--------+                        +--------+--------+
|   Cluster 0     |                        |   Cluster 0     |
|   (L2 Cache)    |                        |   (L2 Cache)    |
+--------+--------+                        +--------+--------+
         |                                          |
+--------+--------+  +--------+--------+   +--------+--------+  +--------+----+
|   Core 0         | |   Core 1        |   |   Core 0        |  |   Core 1    |
|   (L1i, L1d)     | |   (L1i, L1d)    |   |   (L1i, L1d)    |  |   (L1i, L1d)|
+--------+--------+  +--------+--------+   +--------+--------+  +--------+----+
         |                    |                     |                    |
+--------+              +--------+             +--------+           +--------+
|Thread 0|              |Thread 1|             |Thread 1|           |Thread 0|
+--------+              +--------+             +--------+           +--------+
|Thread 1|              |Thread 0|             |Thread 0|           |Thread 1|
+--------+              +--------+             +--------+           +--------+


The following command will represent the system relying on **ACPI PPTT tables**.

./qemu-system-aarch64 \
 -machine virt,smp-cache.0.cache=l1i,smp-cache.0.topology=core,smp-cache.1.cache=l1d,smp-cache.1.topology=core,smp-cache.2.cache=l2,smp-cache.2.topology=cluster,smp-cache.3.cache=l3,smp-cache.3.topology=socket \
 -cpu max \
 -m 2048 \
 -smp sockets=2,clusters=1,cores=2,threads=2 \
 -kernel ./Image.gz \
 -append "console=ttyAMA0 root=/dev/ram rdinit=/init acpi=force" \
 -initrd rootfs.cpio.gz \
 -bios ./edk2-aarch64-code.fd \
 -nographic

The following command will represent the system relying on **the device tree**.

./qemu-system-aarch64 \
 -machine virt,acpi=off,smp-cache.0.cache=l1i,smp-cache.0.topology=core,smp-cache.1.cache=l1d,smp-cache.1.topology=core,smp-cache.2.cache=l2,smp-cache.2.topology=cluster,smp-cache.3.cache=l3,smp-cache.3.topology=socket \
 -cpu max \
 -m 2048 \
 -smp sockets=2,clusters=1,cores=2,threads=2 \
 -kernel ./Image.gz \
 -append "console=ttyAMA0 root=/dev/ram rdinit=/init acpi=off" \
 -initrd rootfs.cpio.gz \
 -nographic

Failure cases:
    1) There are scenarios where caches exist in systems' registers but
    left unspecified by users. In this case qemu returns failure.

    2) SMT threads cannot share caches which is not very common. More
    discussions here [1].

Currently only three levels of caches are supported to be specified from
the command line. However, increasing the value does not require
significant changes. Further, this patch assumes l2 and l3 unified
caches and does not allow l(2/3)(i/d). The level terminology is
thread/core/cluster/socket. Hierarchy assumed in this patch:
Socket level = Cluster level + 1 = Core level + 2 = Thread level + 3;

Possible future enhancements:
  1) Separated data and instruction cache at L2 and L3.
  2) Additional cache controls.  e.g. size of L3 may not want to just
  match the underlying system, because only some of the associated host
  CPUs may be bound to this VM.

[1] https://lore.kernel.org/devicetree-spec/20250203120527.3534-1-alireza.sanaee@huawei.com/

Change Log:
  v15->v16:
    * Rebase to e771ba98de25c9f43959f79fc7099cf7fbba44cc (Open 10.2 development tree)
    * v15: https://lore.kernel.org/qemu-devel/20250812122829.204-1-alireza.sanaee@huawei.com/

  v14->v15:
   * Introduced a separate patch for loongarch64 build_pptt function.
   * Made sure loongarch64 tests pass.
   * Downgraded to V2 for ACPI PPTT. Removed PPTT IDs as was not necessary.
   * Removed dependency as it's been merged in the recent cycle.
     -- 20250604115233.1234-1-alireza.sanaee@huawei.com
   * Fixed styling issues and removed irrelevant changes.
   * Moved cache headers to core/cpu.h to be used in both acpi and virt.
   * v14: https://lore.kernel.org/qemu-devel/20250707121908.155-1-alireza.sanaee@huawei.com/
   # Thanks to Jonathan and Zhao for their comments.

  v13->v14:
   * Rebased on latest staging.
   * Made some naming changes to machine-smp.c, addd docs added to the
        same file.

  v12->v13:
   * Applied comments from Zhao.
   * Introduced a new patch for machine specific cache topology functions.
   * Base: bc98ffdc7577e55ab8373c579c28fe24d600c40f.

  v11->v12:
   * Patch #4 couldn't not merge properly as the main file diverged. Now it is fixed (hopefully).
   * Loonarch build_pptt function updated.
   * Rebased on 09be8a511a2e278b45729d7b065d30c68dd699d0.

  v10->v11:
   * Fix some coding style issues.
   * Rename some variables.

  v9->v10:
   * PPTT rev down to 2.

  v8->v9:
   * rebase to 10
   * Fixed a bug in device-tree generation related to a scenario when
        caches are shared at core in higher levels than 1.
  v7->v8:
   * rebase: Merge tag 'pull-nbd-2024-08-26' of https://repo.or.cz/qemu/ericb into staging
   * I mis-included a file in patch #4 and I removed it in this one.

  v6->v7:
   * Intel stuff got pulled up, so rebase.
   * added some discussions on device tree.

  v5->v6:
   * Minor bug fix.
   * rebase based on new Intel patchset.
     - https://lore.kernel.org/qemu-devel/20250110145115.1574345-1-zhao1.liu@intel.com/

  v4->v5:
    * Added Reviewed-by tags.
    * Applied some comments.

  v3->v4:
    * Device tree added.

Alireza Sanaee (8):
  target/arm/tcg: increase cache level for cpu=max
  hw/core/machine: topology functions capabilities added
  hw/arm/virt: add cache hierarchy to device tree
  bios-tables-test: prepare to change ARM ACPI virt PPTT
  acpi: add caches to ACPI build_pptt table function
  hw/acpi: add cache hierarchy to pptt table
  tests/qtest/bios-table-test: testing new ARM ACPI PPTT topology
  Update the ACPI tables based on new aml-build.c

 hw/acpi/aml-build.c                        | 203 +++++++++-
 hw/arm/virt-acpi-build.c                   |   8 +-
 hw/arm/virt.c                              | 412 ++++++++++++++++++++-
 hw/core/machine-smp.c                      |  56 +++
 hw/loongarch/virt-acpi-build.c             |   4 +-
 include/hw/acpi/aml-build.h                |   4 +-
 include/hw/acpi/cpu.h                      |  10 +
 include/hw/arm/virt.h                      |   7 +-
 include/hw/boards.h                        |   5 +
 include/hw/core/cpu.h                      |  12 +
 target/arm/tcg/cpu64.c                     |  13 +
 tests/data/acpi/aarch64/virt/PPTT.topology | Bin 356 -> 516 bytes
 tests/qtest/bios-tables-test.c             |   4 +
 13 files changed, 723 insertions(+), 15 deletions(-)

-- 
2.43.0



^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v16 1/8] target/arm/tcg: increase cache level for cpu=max
  2025-08-27 14:21 [PATCH v16 0/8] Specifying cache topology on ARM Alireza Sanaee via
@ 2025-08-27 14:21 ` Alireza Sanaee via
  2025-10-20 13:51   ` Gustavo Romero
  2025-08-27 14:21 ` [PATCH v16 2/8] hw/core/machine: topology functions capabilities added Alireza Sanaee via
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 14+ messages in thread
From: Alireza Sanaee via @ 2025-08-27 14:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: anisinha, armbru, berrange, dapeng1.mi, eric.auger, farman,
	gustavo.romero, imammedo, jiangkunkun, jonathan.cameron, linuxarm,
	maobibo, mst, mtosatti, peter.maydell, philmd, qemu-arm,
	richard.henderson, shannon.zhaosl, yangyicong, zhao1.liu

This patch addresses cache description in the `aarch64_max_tcg_initfn`
function for cpu=max. It introduces three layers of caches and modifies
the cache description registers accordingly.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Alireza Sanaee <alireza.sanaee@huawei.com>
---
 target/arm/tcg/cpu64.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/target/arm/tcg/cpu64.c b/target/arm/tcg/cpu64.c
index 35cddbafa4..bf1372ecdf 100644
--- a/target/arm/tcg/cpu64.c
+++ b/target/arm/tcg/cpu64.c
@@ -1093,6 +1093,19 @@ void aarch64_max_tcg_initfn(Object *obj)
     uint64_t t;
     uint32_t u;
 
+    /*
+     * Expanded cache set
+     */
+    SET_IDREG(isar, CLIDR, 0x8200123); /* 4 4 3 in 3 bit fields */
+    /* 64KB L1 dcache */
+    cpu->ccsidr[0] = make_ccsidr(CCSIDR_FORMAT_LEGACY, 4, 64, 64 * KiB, 7);
+    /* 64KB L1 icache */
+    cpu->ccsidr[1] = make_ccsidr(CCSIDR_FORMAT_LEGACY, 4, 64, 64 * KiB, 2);
+    /* 1MB L2 unified cache */
+    cpu->ccsidr[2] = make_ccsidr(CCSIDR_FORMAT_LEGACY, 8, 64, 1 * MiB, 7);
+    /* 2MB L3 unified cache */
+    cpu->ccsidr[4] = make_ccsidr(CCSIDR_FORMAT_LEGACY, 8, 64, 2 * MiB, 7);
+
     /*
      * Unset ARM_FEATURE_BACKCOMPAT_CNTFRQ, which we would otherwise default
      * to because we started with aarch64_a57_initfn(). A 'max' CPU might
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v16 2/8] hw/core/machine: topology functions capabilities added
  2025-08-27 14:21 [PATCH v16 0/8] Specifying cache topology on ARM Alireza Sanaee via
  2025-08-27 14:21 ` [PATCH v16 1/8] target/arm/tcg: increase cache level for cpu=max Alireza Sanaee via
@ 2025-08-27 14:21 ` Alireza Sanaee via
  2025-10-20 13:52   ` Gustavo Romero
  2025-08-27 14:21 ` [PATCH v16 3/8] hw/arm/virt: add cache hierarchy to device tree Alireza Sanaee via
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 14+ messages in thread
From: Alireza Sanaee via @ 2025-08-27 14:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: anisinha, armbru, berrange, dapeng1.mi, eric.auger, farman,
	gustavo.romero, imammedo, jiangkunkun, jonathan.cameron, linuxarm,
	maobibo, mst, mtosatti, peter.maydell, philmd, qemu-arm,
	richard.henderson, shannon.zhaosl, yangyicong, zhao1.liu

Add two functions one of which finds the lowest level cache defined in
the cache description input, and the other checks if caches are defined
at a particular level.

Signed-off-by: Alireza Sanaee <alireza.sanaee@huawei.com>
---
 hw/core/machine-smp.c | 56 +++++++++++++++++++++++++++++++++++++++++++
 include/hw/boards.h   |  5 ++++
 2 files changed, 61 insertions(+)

diff --git a/hw/core/machine-smp.c b/hw/core/machine-smp.c
index 0be0ac044c..32f3e7d6c9 100644
--- a/hw/core/machine-smp.c
+++ b/hw/core/machine-smp.c
@@ -406,3 +406,59 @@ bool machine_check_smp_cache(const MachineState *ms, Error **errp)
 
     return true;
 }
+
+/*
+ * This function assumes l3 and l2 have unified cache and l1 is split l1d and
+ * l1i.
+ */
+bool machine_find_lowest_level_cache_at_topo_level(const MachineState *ms,
+                                                   int *level_found,
+                                                   CpuTopologyLevel topo_level)
+{
+
+    CpuTopologyLevel level;
+
+    level = machine_get_cache_topo_level(ms, CACHE_LEVEL_AND_TYPE_L1I);
+    if (level == topo_level) {
+        *level_found = 1;
+        return true;
+    }
+
+    level = machine_get_cache_topo_level(ms, CACHE_LEVEL_AND_TYPE_L1D);
+    if (level == topo_level) {
+        *level_found = 1;
+        return true;
+    }
+
+    level = machine_get_cache_topo_level(ms, CACHE_LEVEL_AND_TYPE_L2);
+    if (level == topo_level) {
+        *level_found = 2;
+        return true;
+    }
+
+    level = machine_get_cache_topo_level(ms, CACHE_LEVEL_AND_TYPE_L3);
+    if (level == topo_level) {
+        *level_found = 3;
+        return true;
+    }
+
+    return false;
+}
+
+/*
+ * Check if there are caches defined at a particular level. It supports only
+ * L1, L2 and L3 caches, but this can be extended to more levels as needed.
+ *
+ * Return True on success, False otherwise.
+ */
+bool machine_defines_cache_at_topo_level(const MachineState *ms,
+                                       CpuTopologyLevel level)
+{
+    if (machine_get_cache_topo_level(ms, CACHE_LEVEL_AND_TYPE_L3) == level ||
+        machine_get_cache_topo_level(ms, CACHE_LEVEL_AND_TYPE_L2) == level ||
+        machine_get_cache_topo_level(ms, CACHE_LEVEL_AND_TYPE_L1I) == level ||
+        machine_get_cache_topo_level(ms, CACHE_LEVEL_AND_TYPE_L1D) == level) {
+        return true;
+    }
+    return false;
+}
diff --git a/include/hw/boards.h b/include/hw/boards.h
index f94713e6e2..3c1a999791 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -55,6 +55,11 @@ void machine_set_cache_topo_level(MachineState *ms, CacheLevelAndType cache,
                                   CpuTopologyLevel level);
 bool machine_check_smp_cache(const MachineState *ms, Error **errp);
 void machine_memory_devices_init(MachineState *ms, hwaddr base, uint64_t size);
+bool machine_defines_cache_at_topo_level(const MachineState *ms,
+                                       CpuTopologyLevel level);
+bool machine_find_lowest_level_cache_at_topo_level(const MachineState *ms,
+                                                   int *level_found,
+                                                   CpuTopologyLevel topo_level);
 
 /**
  * machine_class_allow_dynamic_sysbus_dev: Add type to list of valid devices
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v16 3/8] hw/arm/virt: add cache hierarchy to device tree
  2025-08-27 14:21 [PATCH v16 0/8] Specifying cache topology on ARM Alireza Sanaee via
  2025-08-27 14:21 ` [PATCH v16 1/8] target/arm/tcg: increase cache level for cpu=max Alireza Sanaee via
  2025-08-27 14:21 ` [PATCH v16 2/8] hw/core/machine: topology functions capabilities added Alireza Sanaee via
@ 2025-08-27 14:21 ` Alireza Sanaee via
  2025-10-20 14:33   ` Gustavo Romero
  2025-08-27 14:21 ` [PATCH v16 4/8] bios-tables-test: prepare to change ARM ACPI virt PPTT Alireza Sanaee via
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 14+ messages in thread
From: Alireza Sanaee via @ 2025-08-27 14:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: anisinha, armbru, berrange, dapeng1.mi, eric.auger, farman,
	gustavo.romero, imammedo, jiangkunkun, jonathan.cameron, linuxarm,
	maobibo, mst, mtosatti, peter.maydell, philmd, qemu-arm,
	richard.henderson, shannon.zhaosl, yangyicong, zhao1.liu

Specify which layer (core/cluster/socket) caches found at in the CPU
topology. Updating cache topology to device tree (spec v0.4).
Example:

For example, 2 sockets (packages), and 2 clusters, 4 cores and 2 threads
created, in aggregate 2*2*4*2 logical cores. In the smp-cache object,
cores will have l1d and l1i.  However, extending this is not difficult.
The clusters will share a unified l2 level cache, and finally sockets
will share l3. In this patch, threads will share l1 caches by default,
but this can be adjusted if case required.

Only three levels of caches are supported.  The patch does not
allow partial declaration of caches. In other words, the topology level
of every cache must be specified if that of any level is.

./qemu-system-aarch64 \
    -machine virt,\
         smp-cache.0.cache=l1i,smp-cache.0.topology=core,\
         smp-cache.1.cache=l1d,smp-cache.1.topology=core,\
         smp-cache.2.cache=l2,smp-cache.2.topology=cluster,\
         smp-cache.3.cache=l3,smp-cache.3.topology=socket\
    -cpu max \
    -m 2048 \
    -smp sockets=2,clusters=2,cores=4,threads=1 \
    -kernel ./Image.gz \
    -append "console=ttyAMA0 root=/dev/ram rdinit=/init acpi=force" \
    -initrd rootfs.cpio.gz \
    -bios ./edk2-aarch64-code.fd \
    -nographic

For instance, following device tree will be generated for a scenario
where we have 2 sockets, 2 clusters, 2 cores and 2 threads, in total 16
PEs. L1i and L1d are private to each thread, and l2 and l3 are shared at
socket level as an example.

Limitation: SMT cores cannot share L1 cache for now. This
problem does not exist in PPTT tables.

Co-developed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Alireza Sanaee <alireza.sanaee@huawei.com>
---
 hw/arm/virt.c         | 412 +++++++++++++++++++++++++++++++++++++++++-
 include/hw/arm/virt.h |   7 +-
 include/hw/core/cpu.h |  12 ++
 3 files changed, 429 insertions(+), 2 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index ef6be3660f..9094d8bef8 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -88,6 +88,7 @@
 #include "hw/virtio/virtio-md-pci.h"
 #include "hw/virtio/virtio-iommu.h"
 #include "hw/char/pl011.h"
+#include "hw/core/cpu.h"
 #include "hw/cxl/cxl.h"
 #include "hw/cxl/cxl_host.h"
 #include "qemu/guest-random.h"
@@ -273,6 +274,134 @@ static bool ns_el2_virt_timer_present(void)
         arm_feature(env, ARM_FEATURE_EL2) && cpu_isar_feature(aa64_vh, cpu);
 }
 
+unsigned int virt_get_caches(const VirtMachineState *vms,
+                             CPUCoreCaches *caches)
+{
+    ARMCPU *armcpu = ARM_CPU(qemu_get_cpu(0)); /* assume homogeneous CPUs */
+    bool ccidx = cpu_isar_feature(any_ccidx, armcpu);
+    ARMISARegisters *isar = &armcpu->isar;
+    unsigned int num_cache, i;
+    int level_instr = 1, level_data = 1;
+
+    for (i = 0, num_cache = 0; i < CPU_MAX_CACHES; i++, num_cache++) {
+        uint32_t clidr = GET_IDREG(isar, CLIDR);
+        int type = (clidr >> (3 * i)) & 7;
+        int bank_index;
+        int level;
+        enum CacheType cache_type;
+
+        if (type == 0) {
+            break;
+        }
+
+        switch (type) {
+        case 1:
+            cache_type = INSTRUCTION_CACHE;
+            level = level_instr;
+            break;
+        case 2:
+            cache_type = DATA_CACHE;
+            level = level_data;
+            break;
+        case 4:
+            cache_type = UNIFIED_CACHE;
+            level = level_instr > level_data ? level_instr : level_data;
+            break;
+        case 3: /* Split - Do data first */
+            cache_type = DATA_CACHE;
+            level = level_data;
+            break;
+        default:
+            error_setg(&error_abort, "Unrecognized cache type");
+            return 0;
+        }
+        /*
+         * ccsidr is indexed using both the level and whether it is
+         * an instruction cache. Unified caches use the same storage
+         * as data caches.
+         */
+        bank_index = (i * 2) | ((type == 1) ? 1 : 0);
+        if (ccidx) {
+            caches[num_cache] = (CPUCoreCaches) {
+                .type =  cache_type,
+                .level = level,
+                .linesize = 1 << (FIELD_EX64(armcpu->ccsidr[bank_index],
+                                             CCSIDR_EL1,
+                                             CCIDX_LINESIZE) + 4),
+                .associativity = FIELD_EX64(armcpu->ccsidr[bank_index],
+                                            CCSIDR_EL1,
+                                            CCIDX_ASSOCIATIVITY) + 1,
+                .sets = FIELD_EX64(armcpu->ccsidr[bank_index], CCSIDR_EL1,
+                                   CCIDX_NUMSETS) + 1,
+            };
+        } else {
+            caches[num_cache] = (CPUCoreCaches) {
+                .type =  cache_type,
+                .level = level,
+                .linesize = 1 << (FIELD_EX64(armcpu->ccsidr[bank_index],
+                                             CCSIDR_EL1, LINESIZE) + 4),
+                .associativity = FIELD_EX64(armcpu->ccsidr[bank_index],
+                                            CCSIDR_EL1,
+                                            ASSOCIATIVITY) + 1,
+                .sets = FIELD_EX64(armcpu->ccsidr[bank_index], CCSIDR_EL1,
+                                   NUMSETS) + 1,
+            };
+        }
+        caches[num_cache].size = caches[num_cache].associativity *
+            caches[num_cache].sets * caches[num_cache].linesize;
+
+        /* Break one 'split' entry up into two records */
+        if (type == 3) {
+            num_cache++;
+            bank_index = (i * 2) | 1;
+            if (ccidx) {
+                /* Instruction cache: bottom bit set when reading banked reg */
+                caches[num_cache] = (CPUCoreCaches) {
+                    .type = INSTRUCTION_CACHE,
+                    .level = level_instr,
+                    .linesize = 1 << (FIELD_EX64(armcpu->ccsidr[bank_index],
+                                                 CCSIDR_EL1,
+                                                 CCIDX_LINESIZE) + 4),
+                    .associativity = FIELD_EX64(armcpu->ccsidr[bank_index],
+                                                CCSIDR_EL1,
+                                                CCIDX_ASSOCIATIVITY) + 1,
+                    .sets = FIELD_EX64(armcpu->ccsidr[bank_index], CCSIDR_EL1,
+                                       CCIDX_NUMSETS) + 1,
+                };
+            } else {
+                caches[num_cache] = (CPUCoreCaches) {
+                    .type = INSTRUCTION_CACHE,
+                    .level = level_instr,
+                    .linesize = 1 << (FIELD_EX64(armcpu->ccsidr[bank_index],
+                                                 CCSIDR_EL1, LINESIZE) + 4),
+                    .associativity = FIELD_EX64(armcpu->ccsidr[bank_index],
+                                                CCSIDR_EL1,
+                                                ASSOCIATIVITY) + 1,
+                    .sets = FIELD_EX64(armcpu->ccsidr[bank_index], CCSIDR_EL1,
+                                       NUMSETS) + 1,
+                };
+            }
+            caches[num_cache].size = caches[num_cache].associativity *
+                caches[num_cache].sets * caches[num_cache].linesize;
+        }
+        switch (type) {
+        case 1:
+            level_instr++;
+            break;
+        case 2:
+            level_data++;
+            break;
+        case 3:
+        case 4:
+            level_instr++;
+            level_data++;
+            break;
+        }
+    }
+
+    return num_cache;
+}
+
 static void create_fdt(VirtMachineState *vms)
 {
     MachineState *ms = MACHINE(vms);
@@ -423,13 +552,150 @@ static void fdt_add_timer_nodes(const VirtMachineState *vms)
     }
 }
 
+static void add_cache_node(void *fdt, char *nodepath, CPUCoreCaches cache,
+                           uint32_t *next_level)
+{
+    /* Assume L2/3 are unified caches. */
+
+    uint32_t phandle;
+
+    qemu_fdt_add_path(fdt, nodepath);
+    phandle = qemu_fdt_alloc_phandle(fdt);
+    qemu_fdt_setprop_cell(fdt, nodepath, "phandle", phandle);
+    qemu_fdt_setprop_cell(fdt, nodepath, "cache-level", cache.level);
+    qemu_fdt_setprop_cell(fdt, nodepath, "cache-size", cache.size);
+    qemu_fdt_setprop_cell(fdt, nodepath, "cache-block-size", cache.linesize);
+    qemu_fdt_setprop_cell(fdt, nodepath, "cache-sets", cache.sets);
+    qemu_fdt_setprop(fdt, nodepath, "cache-unified", NULL, 0);
+    qemu_fdt_setprop_string(fdt, nodepath, "compatible", "cache");
+    if (cache.level != 3) {
+        /* top level cache doesn't have next-level-cache property */
+        qemu_fdt_setprop_cell(fdt, nodepath, "next-level-cache", *next_level);
+    }
+
+    *next_level = phandle;
+}
+
+static bool add_cpu_cache_hierarchy(void *fdt, CPUCoreCaches* cache,
+                                    uint32_t cache_cnt,
+                                    uint32_t top_level,
+                                    uint32_t bottom_level,
+                                    uint32_t cpu_id,
+                                    uint32_t *next_level) {
+    bool found_cache = false;
+
+    for (int level = top_level; level >= bottom_level; level--) {
+        for (int i = 0; i < cache_cnt; i++) {
+            char *nodepath;
+
+            if (i != level) {
+                continue;
+            }
+
+            nodepath = g_strdup_printf("/cpus/cpu@%d/l%d-cache",
+                                       cpu_id, level);
+            add_cache_node(fdt, nodepath, cache[i], next_level);
+            found_cache = true;
+            g_free(nodepath);
+
+        }
+    }
+
+    return found_cache;
+}
+
+static void set_cache_properties(void *fdt, const char *nodename,
+                                 const char *prefix, CPUCoreCaches cache)
+{
+    char prop_name[64];
+
+    snprintf(prop_name, sizeof(prop_name), "%s-block-size", prefix);
+    qemu_fdt_setprop_cell(fdt, nodename, prop_name, cache.linesize);
+
+    snprintf(prop_name, sizeof(prop_name), "%s-size", prefix);
+    qemu_fdt_setprop_cell(fdt, nodename, prop_name, cache.size);
+
+    snprintf(prop_name, sizeof(prop_name), "%s-sets", prefix);
+    qemu_fdt_setprop_cell(fdt, nodename, prop_name, cache.sets);
+}
+
+static int partial_cache_description(const MachineState *ms,
+                                     CPUCoreCaches *caches, int num_caches)
+{
+    int level, c;
+
+    for (level = 1; level < num_caches; level++) {
+        for (c = 0; c < num_caches; c++) {
+            if (caches[c].level != level) {
+                continue;
+            }
+
+            switch (level) {
+            case 1:
+                /*
+                 * L1 cache is assumed to have both L1I and L1D available.
+                 * Technically both need to be checked.
+                 */
+                if (machine_get_cache_topo_level(ms,
+                                                 CACHE_LEVEL_AND_TYPE_L1I) ==
+                    CPU_TOPOLOGY_LEVEL_DEFAULT) {
+                    return level;
+                }
+                break;
+            case 2:
+                if (machine_get_cache_topo_level(ms, CACHE_LEVEL_AND_TYPE_L2) ==
+                    CPU_TOPOLOGY_LEVEL_DEFAULT) {
+                    return level;
+                }
+                break;
+            case 3:
+                if (machine_get_cache_topo_level(ms, CACHE_LEVEL_AND_TYPE_L3) ==
+                    CPU_TOPOLOGY_LEVEL_DEFAULT) {
+                    return level;
+                }
+                break;
+            }
+        }
+    }
+
+    return 0;
+}
+
 static void fdt_add_cpu_nodes(const VirtMachineState *vms)
 {
     int cpu;
     int addr_cells = 1;
     const MachineState *ms = MACHINE(vms);
+    const MachineClass *mc = MACHINE_GET_CLASS(ms);
     const VirtMachineClass *vmc = VIRT_MACHINE_GET_CLASS(vms);
     int smp_cpus = ms->smp.cpus;
+    int socket_id, cluster_id, core_id;
+    uint32_t next_level = 0;
+    uint32_t socket_offset = 0;
+    uint32_t cluster_offset = 0;
+    uint32_t core_offset = 0;
+    int last_socket = -1;
+    int last_cluster = -1;
+    int last_core = -1;
+    int top_node = 3;
+    int top_cluster = 3;
+    int top_core = 3;
+    int bottom_node = 3;
+    int bottom_cluster = 3;
+    int bottom_core = 3;
+    unsigned int num_cache;
+    CPUCoreCaches caches[16];
+    bool cache_created = false;
+    bool cache_available;
+    bool llevel;
+
+    num_cache = virt_get_caches(vms, caches);
+
+    if (mc->smp_props.has_caches &&
+        partial_cache_description(ms, caches, num_cache)) {
+        error_setg(&error_fatal, "Missing cache description");
+        return;
+    }
 
     /*
      * See Linux Documentation/devicetree/bindings/arm/cpus.yaml
@@ -458,9 +724,14 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms)
     qemu_fdt_setprop_cell(ms->fdt, "/cpus", "#size-cells", 0x0);
 
     for (cpu = smp_cpus - 1; cpu >= 0; cpu--) {
+        socket_id = cpu / (ms->smp.clusters * ms->smp.cores * ms->smp.threads);
+        cluster_id = cpu / (ms->smp.cores * ms->smp.threads) % ms->smp.clusters;
+        core_id = cpu / ms->smp.threads % ms->smp.cores;
+
         char *nodename = g_strdup_printf("/cpus/cpu@%d", cpu);
         ARMCPU *armcpu = ARM_CPU(qemu_get_cpu(cpu));
         CPUState *cs = CPU(armcpu);
+        const char *prefix = NULL;
 
         qemu_fdt_add_subnode(ms->fdt, nodename);
         qemu_fdt_setprop_string(ms->fdt, nodename, "device_type", "cpu");
@@ -490,6 +761,139 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms)
                                   qemu_fdt_alloc_phandle(ms->fdt));
         }
 
+        if (!vmc->no_cpu_topology && num_cache) {
+            for (uint8_t i = 0; i < num_cache; i++) {
+                /* only level 1 in the CPU entry */
+                if (caches[i].level > 1) {
+                    continue;
+                }
+
+                if (caches[i].type == INSTRUCTION_CACHE) {
+                    prefix = "i-cache";
+                } else if (caches[i].type == DATA_CACHE) {
+                    prefix = "d-cache";
+                } else if (caches[i].type == UNIFIED_CACHE) {
+                    error_setg(&error_fatal,
+                               "Unified type is not implemented at level %d",
+                               caches[i].level);
+                    return;
+                } else {
+                    error_setg(&error_fatal, "Undefined cache type");
+                    return;
+                }
+
+                set_cache_properties(ms->fdt, nodename, prefix, caches[i]);
+            }
+        }
+
+        if (socket_id != last_socket) {
+            bottom_node = top_node;
+            /* this assumes socket as the highest topological level */
+            socket_offset = 0;
+            cluster_offset = 0;
+            cache_available = machine_defines_cache_at_topo_level(ms,
+                                  CPU_TOPOLOGY_LEVEL_SOCKET);
+            llevel = machine_find_lowest_level_cache_at_topo_level(ms,
+                        &bottom_node,
+                        CPU_TOPOLOGY_LEVEL_SOCKET);
+            if (cache_available && llevel) {
+                if (bottom_node == 1 && !virt_is_acpi_enabled(vms))
+                    error_setg(
+                        &error_fatal,
+                        "Cannot share L1 at socket_id %d."
+                        "DT limiation on sharing at cache level = 1",
+                        socket_id);
+
+                cache_created = add_cpu_cache_hierarchy(ms->fdt, caches,
+                                                        num_cache,
+                                                        top_node,
+                                                        bottom_node, cpu,
+                                                        &socket_offset);
+
+                if (!cache_created) {
+                    error_setg(&error_fatal,
+                               "Socket: No caches at levels %d-%d",
+                               top_node, bottom_node);
+                    return;
+                }
+
+                top_cluster = bottom_node - 1;
+            }
+
+            last_socket = socket_id;
+        }
+
+        if (cluster_id != last_cluster) {
+            bottom_cluster = top_cluster;
+            cluster_offset = socket_offset;
+            core_offset = 0;
+            cache_available = machine_defines_cache_at_topo_level(ms,
+                                 CPU_TOPOLOGY_LEVEL_CLUSTER);
+            llevel = machine_find_lowest_level_cache_at_topo_level(ms,
+                        &bottom_cluster,
+                        CPU_TOPOLOGY_LEVEL_CLUSTER);
+            if (cache_available && llevel) {
+                cache_created = add_cpu_cache_hierarchy(ms->fdt, caches,
+                                                        num_cache,
+                                                        top_cluster,
+                                                        bottom_cluster, cpu,
+                                                        &cluster_offset);
+                if (bottom_cluster == 1 && !virt_is_acpi_enabled(vms)) {
+                    error_setg(&error_fatal,
+                        "Cannot share L1 at socket_id %d, cluster_id %d. "
+                        "DT limitation on sharing at cache level = 1.",
+                        socket_id, cluster_id);
+                }
+
+                if (!cache_created) {
+                    error_setg(&error_fatal,
+                               "Cluster: No caches at levels %d-%d.",
+                               top_cluster, bottom_cluster);
+                    return;
+                }
+
+                top_core = bottom_cluster - 1;
+            } else if (top_cluster == bottom_node - 1) {
+                top_core = bottom_node - 1;
+            }
+
+            last_cluster = cluster_id;
+        }
+
+        if (core_id != last_core) {
+            bottom_core = top_core;
+            core_offset = cluster_offset;
+            cache_available = machine_defines_cache_at_topo_level(ms,
+                                 CPU_TOPOLOGY_LEVEL_CORE);
+            llevel = machine_find_lowest_level_cache_at_topo_level(ms,
+                        &bottom_core,
+                        CPU_TOPOLOGY_LEVEL_CORE);
+            if (cache_available && llevel) {
+                if (bottom_core == 1 && top_core > 1) {
+                    bottom_core++;
+                    cache_created = add_cpu_cache_hierarchy(ms->fdt,
+                                                            caches,
+                                                            num_cache,
+                                                            top_core,
+                                                            bottom_core, cpu,
+                                                            &core_offset);
+
+                    if (!cache_created) {
+                        error_setg(&error_fatal,
+                                   "Core: No caches at levels %d-%d",
+                                   top_core, bottom_core);
+                        return;
+                    }
+                }
+            }
+
+            last_core = core_id;
+        }
+
+        next_level = core_offset;
+        qemu_fdt_setprop_cell(ms->fdt, nodename, "next-level-cache",
+                              next_level);
+
         g_free(nodename);
     }
 
@@ -2721,7 +3125,7 @@ static void virt_set_oem_table_id(Object *obj, const char *value,
 }
 
 
-bool virt_is_acpi_enabled(VirtMachineState *vms)
+bool virt_is_acpi_enabled(const VirtMachineState *vms)
 {
     if (vms->acpi == ON_OFF_AUTO_OFF) {
         return false;
@@ -3247,6 +3651,12 @@ static void virt_machine_class_init(ObjectClass *oc, const void *data)
     hc->unplug = virt_machine_device_unplug_cb;
     mc->nvdimm_supported = true;
     mc->smp_props.clusters_supported = true;
+
+    /* Supported caches */
+    mc->smp_props.cache_supported[CACHE_LEVEL_AND_TYPE_L1D] = true;
+    mc->smp_props.cache_supported[CACHE_LEVEL_AND_TYPE_L1I] = true;
+    mc->smp_props.cache_supported[CACHE_LEVEL_AND_TYPE_L2] = true;
+    mc->smp_props.cache_supported[CACHE_LEVEL_AND_TYPE_L3] = true;
     mc->auto_enable_numa_with_memhp = true;
     mc->auto_enable_numa_with_memdev = true;
     /* platform instead of architectural choice */
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index 365a28b082..0099ea7fa1 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -40,6 +40,7 @@
 #include "system/kvm.h"
 #include "hw/intc/arm_gicv3_common.h"
 #include "qom/object.h"
+#include "hw/core/cpu.h"
 
 #define NUM_GICV2M_SPIS       64
 #define NUM_VIRTIO_TRANSPORTS 32
@@ -51,6 +52,8 @@
 /* GPIO pins */
 #define GPIO_PIN_POWER_BUTTON  3
 
+#define CPU_MAX_CACHES 16
+
 enum {
     VIRT_FLASH,
     VIRT_MEM,
@@ -187,7 +190,9 @@ struct VirtMachineState {
 OBJECT_DECLARE_TYPE(VirtMachineState, VirtMachineClass, VIRT_MACHINE)
 
 void virt_acpi_setup(VirtMachineState *vms);
-bool virt_is_acpi_enabled(VirtMachineState *vms);
+bool virt_is_acpi_enabled(const VirtMachineState *vms);
+unsigned int virt_get_caches(const VirtMachineState *vms,
+                             CPUCoreCaches *caches);
 
 /* Return number of redistributors that fit in the specified region */
 static uint32_t virt_redist_capacity(VirtMachineState *vms, int region)
diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index 5eaf41a566..045219a68b 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -1134,4 +1134,16 @@ enum CacheType {
     UNIFIED_CACHE
 };
 
+struct CPUCoreCaches {
+    enum CacheType type;
+    uint32_t sets;
+    uint32_t size;
+    uint32_t level;
+    uint16_t linesize;
+    uint8_t attributes; /* write policy: 0x0 write back, 0x1 write through */
+    uint8_t associativity;
+};
+
+typedef struct CPUCoreCaches CPUCoreCaches;
+
 #endif
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v16 4/8] bios-tables-test: prepare to change ARM ACPI virt PPTT
  2025-08-27 14:21 [PATCH v16 0/8] Specifying cache topology on ARM Alireza Sanaee via
                   ` (2 preceding siblings ...)
  2025-08-27 14:21 ` [PATCH v16 3/8] hw/arm/virt: add cache hierarchy to device tree Alireza Sanaee via
@ 2025-08-27 14:21 ` Alireza Sanaee via
  2025-08-27 14:21 ` [PATCH v16 5/8] acpi: add caches to ACPI build_pptt table function Alireza Sanaee via
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 14+ messages in thread
From: Alireza Sanaee via @ 2025-08-27 14:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: anisinha, armbru, berrange, dapeng1.mi, eric.auger, farman,
	gustavo.romero, imammedo, jiangkunkun, jonathan.cameron, linuxarm,
	maobibo, mst, mtosatti, peter.maydell, philmd, qemu-arm,
	richard.henderson, shannon.zhaosl, yangyicong, zhao1.liu

Prepare to update `build_pptt` function to add cache description
functionalities, thus add binaries in this patch.

Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Signed-off-by: Alireza Sanaee <alireza.sanaee@huawei.com>
---
 tests/qtest/bios-tables-test-allowed-diff.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tests/qtest/bios-tables-test-allowed-diff.h b/tests/qtest/bios-tables-test-allowed-diff.h
index dfb8523c8b..e84d6c6955 100644
--- a/tests/qtest/bios-tables-test-allowed-diff.h
+++ b/tests/qtest/bios-tables-test-allowed-diff.h
@@ -1 +1,4 @@
 /* List of comma-separated changed AML files to ignore */
+"tests/data/acpi/aarch64/virt/PPTT",
+"tests/data/acpi/aarch64/virt/PPTT.acpihmatvirt",
+"tests/data/acpi/aarch64/virt/PPTT.topology",
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v16 5/8] acpi: add caches to ACPI build_pptt table function
  2025-08-27 14:21 [PATCH v16 0/8] Specifying cache topology on ARM Alireza Sanaee via
                   ` (3 preceding siblings ...)
  2025-08-27 14:21 ` [PATCH v16 4/8] bios-tables-test: prepare to change ARM ACPI virt PPTT Alireza Sanaee via
@ 2025-08-27 14:21 ` Alireza Sanaee via
  2025-08-27 14:21 ` [PATCH v16 6/8] hw/acpi: add cache hierarchy to pptt table Alireza Sanaee via
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 14+ messages in thread
From: Alireza Sanaee via @ 2025-08-27 14:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: anisinha, armbru, berrange, dapeng1.mi, eric.auger, farman,
	gustavo.romero, imammedo, jiangkunkun, jonathan.cameron, linuxarm,
	maobibo, mst, mtosatti, peter.maydell, philmd, qemu-arm,
	richard.henderson, shannon.zhaosl, yangyicong, zhao1.liu

Add caches to build_pptt table function in ACPI for both ARM and
Loongarch.

Signed-off-by: Alireza Sanaee <alireza.sanaee@huawei.com>
---
 hw/acpi/aml-build.c            | 3 ++-
 hw/arm/virt-acpi-build.c       | 2 +-
 hw/loongarch/virt-acpi-build.c | 4 ++--
 include/hw/acpi/aml-build.h    | 4 +++-
 4 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index 1e685f982f..e854f14565 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -2145,7 +2145,8 @@ void build_spcr(GArray *table_data, BIOSLinker *linker,
  * 5.2.29 Processor Properties Topology Table (PPTT)
  */
 void build_pptt(GArray *table_data, BIOSLinker *linker, MachineState *ms,
-                const char *oem_id, const char *oem_table_id)
+                const char *oem_id, const char *oem_table_id,
+                int num_caches, CPUCoreCaches *caches)
 {
     MachineClass *mc = MACHINE_GET_CLASS(ms);
     CPUArchIdList *cpus = ms->possible_cpus;
diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index b01fc4f8ef..a6115f2f80 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -1044,7 +1044,7 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
     if (!vmc->no_cpu_topology) {
         acpi_add_table(table_offsets, tables_blob);
         build_pptt(tables_blob, tables->linker, ms,
-                   vms->oem_id, vms->oem_table_id);
+                   vms->oem_id, vms->oem_table_id, 0, NULL);
     }
 
     acpi_add_table(table_offsets, tables_blob);
diff --git a/hw/loongarch/virt-acpi-build.c b/hw/loongarch/virt-acpi-build.c
index 8c2228a772..e7bbc40e27 100644
--- a/hw/loongarch/virt-acpi-build.c
+++ b/hw/loongarch/virt-acpi-build.c
@@ -551,8 +551,8 @@ static void acpi_build(AcpiBuildTables *tables, MachineState *machine)
     build_madt(tables_blob, tables->linker, lvms);
 
     acpi_add_table(table_offsets, tables_blob);
-    build_pptt(tables_blob, tables->linker, machine,
-               lvms->oem_id, lvms->oem_table_id);
+    build_pptt(tables_blob, tables->linker, machine, lvms->oem_id,
+               lvms->oem_table_id, 0, NULL);
 
     acpi_add_table(table_offsets, tables_blob);
     build_srat(tables_blob, tables->linker, machine);
diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index c18f681342..01e11c093e 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -3,6 +3,7 @@
 
 #include "hw/acpi/acpi-defs.h"
 #include "hw/acpi/bios-linker-loader.h"
+#include "hw/core/cpu.h"
 
 #define ACPI_BUILD_APPNAME6 "BOCHS "
 #define ACPI_BUILD_APPNAME8 "BXPC    "
@@ -497,7 +498,8 @@ void build_slit(GArray *table_data, BIOSLinker *linker, MachineState *ms,
                 const char *oem_id, const char *oem_table_id);
 
 void build_pptt(GArray *table_data, BIOSLinker *linker, MachineState *ms,
-                const char *oem_id, const char *oem_table_id);
+                const char *oem_id, const char *oem_table_id,
+                int num_caches, CPUCoreCaches *caches);
 
 void build_fadt(GArray *tbl, BIOSLinker *linker, const AcpiFadtData *f,
                 const char *oem_id, const char *oem_table_id);
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v16 6/8] hw/acpi: add cache hierarchy to pptt table
  2025-08-27 14:21 [PATCH v16 0/8] Specifying cache topology on ARM Alireza Sanaee via
                   ` (4 preceding siblings ...)
  2025-08-27 14:21 ` [PATCH v16 5/8] acpi: add caches to ACPI build_pptt table function Alireza Sanaee via
@ 2025-08-27 14:21 ` Alireza Sanaee via
  2025-10-20 14:33   ` Gustavo Romero
  2025-08-27 14:21 ` [PATCH v16 7/8] tests/qtest/bios-table-test: testing new ARM ACPI PPTT topology Alireza Sanaee via
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 14+ messages in thread
From: Alireza Sanaee via @ 2025-08-27 14:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: anisinha, armbru, berrange, dapeng1.mi, eric.auger, farman,
	gustavo.romero, imammedo, jiangkunkun, jonathan.cameron, linuxarm,
	maobibo, mst, mtosatti, peter.maydell, philmd, qemu-arm,
	richard.henderson, shannon.zhaosl, yangyicong, zhao1.liu

Add cache topology to PPTT table. With this patch, both ACPI PPTT table
and device tree will represent the same cache topology given users
input.

Signed-off-by: Alireza Sanaee <alireza.sanaee@huawei.com>
---
 hw/acpi/aml-build.c      | 200 +++++++++++++++++++++++++++++++++++++--
 hw/arm/virt-acpi-build.c |   8 +-
 include/hw/acpi/cpu.h    |  10 ++
 3 files changed, 209 insertions(+), 9 deletions(-)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index e854f14565..72b6bfdbe9 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -31,6 +31,7 @@
 #include "hw/pci/pci_bus.h"
 #include "hw/pci/pci_bridge.h"
 #include "qemu/cutils.h"
+#include "hw/core/cpu.h"
 
 static GArray *build_alloc_array(void)
 {
@@ -2140,6 +2141,104 @@ void build_spcr(GArray *table_data, BIOSLinker *linker,
     }
     acpi_table_end(linker, &table);
 }
+
+static void build_cache_nodes(GArray *tbl, CPUCoreCaches *cache,
+                              uint32_t next_offset)
+{
+    int val;
+
+    /* Type 1 - cache */
+    build_append_byte(tbl, 1);
+    /* Length */
+    build_append_byte(tbl, 24);
+    /* Reserved */
+    build_append_int_noprefix(tbl, 0, 2);
+    /* Flags */
+    build_append_int_noprefix(tbl, 0x7f, 4);
+    /* Offset of next cache up */
+    build_append_int_noprefix(tbl, next_offset, 4);
+    build_append_int_noprefix(tbl, cache->size, 4);
+    build_append_int_noprefix(tbl, cache->sets, 4);
+    build_append_byte(tbl, cache->associativity);
+    val = 0x3;
+    switch (cache->type) {
+    case INSTRUCTION_CACHE:
+        val |= (1 << 2);
+        break;
+    case DATA_CACHE:
+        val |= (0 << 2); /* Data */
+        break;
+    case UNIFIED_CACHE:
+        val |= (3 << 2); /* Unified */
+        break;
+    }
+    build_append_byte(tbl, val);
+    build_append_int_noprefix(tbl, cache->linesize, 2);
+}
+
+/*
+ * builds caches from the top level (`level_high` parameter) to the bottom
+ * level (`level_low` parameter).  It searches for caches found in
+ * systems' registers, and fills up the table. Then it updates the
+ * `data_offset` and `instr_offset` parameters with the offset of the data
+ * and instruction caches of the lowest level, respectively.
+ */
+static bool build_caches(GArray *table_data, uint32_t pptt_start,
+                         int num_caches, CPUCoreCaches *caches,
+                         uint8_t level_high, /* Inclusive */
+                         uint8_t level_low,  /* Inclusive */
+                         uint32_t *data_offset,
+                         uint32_t *instr_offset)
+{
+    uint32_t next_level_offset_data = 0, next_level_offset_instruction = 0;
+    uint32_t this_offset, next_offset = 0;
+    int c, level;
+    bool found_cache = false;
+
+    /* Walk caches from top to bottom */
+    for (level = level_high; level >= level_low; level--) {
+        for (c = 0; c < num_caches; c++) {
+            if (caches[c].level != level) {
+                continue;
+            }
+
+            /* Assume only unified above l1 for now */
+            this_offset = table_data->len - pptt_start;
+            switch (caches[c].type) {
+            case INSTRUCTION_CACHE:
+                next_offset = next_level_offset_instruction;
+                break;
+            case DATA_CACHE:
+                next_offset = next_level_offset_data;
+                break;
+            case UNIFIED_CACHE:
+                /* Either is fine here */
+                next_offset = next_level_offset_instruction;
+                break;
+            }
+            build_cache_nodes(table_data, &caches[c], next_offset);
+            switch (caches[c].type) {
+            case INSTRUCTION_CACHE:
+                next_level_offset_instruction = this_offset;
+                break;
+            case DATA_CACHE:
+                next_level_offset_data = this_offset;
+                break;
+            case UNIFIED_CACHE:
+                next_level_offset_instruction = this_offset;
+                next_level_offset_data = this_offset;
+                break;
+            }
+            *data_offset = next_level_offset_data;
+            *instr_offset = next_level_offset_instruction;
+
+            found_cache = true;
+        }
+    }
+
+    return found_cache;
+}
+
 /*
  * ACPI spec, Revision 6.3
  * 5.2.29 Processor Properties Topology Table (PPTT)
@@ -2150,11 +2249,32 @@ void build_pptt(GArray *table_data, BIOSLinker *linker, MachineState *ms,
 {
     MachineClass *mc = MACHINE_GET_CLASS(ms);
     CPUArchIdList *cpus = ms->possible_cpus;
-    int64_t socket_id = -1, cluster_id = -1, core_id = -1;
-    uint32_t socket_offset = 0, cluster_offset = 0, core_offset = 0;
+    uint32_t core_data_offset = 0;
+    uint32_t core_instr_offset = 0;
+    uint32_t cluster_instr_offset = 0;
+    uint32_t cluster_data_offset = 0;
+    uint32_t node_data_offset = 0;
+    uint32_t node_instr_offset = 0;
+    int top_node = 3;
+    int top_cluster = 3;
+    int top_core = 3;
+    int bottom_node = 3;
+    int bottom_cluster = 3;
+    int bottom_core = 3;
+    int64_t socket_id = -1;
+    int64_t cluster_id = -1;
+    int64_t core_id = -1;
+    uint32_t socket_offset = 0;
+    uint32_t cluster_offset = 0;
+    uint32_t core_offset = 0;
     uint32_t pptt_start = table_data->len;
     uint32_t root_offset;
     int n;
+    uint32_t priv_rsrc[2];
+    uint32_t num_priv = 0;
+    bool cache_available;
+    bool llevel;
+
     AcpiTable table = { .sig = "PPTT", .rev = 2,
                         .oem_id = oem_id, .oem_table_id = oem_table_id };
 
@@ -2184,11 +2304,30 @@ void build_pptt(GArray *table_data, BIOSLinker *linker, MachineState *ms,
             socket_id = cpus->cpus[n].props.socket_id;
             cluster_id = -1;
             core_id = -1;
+            bottom_node = top_node;
+            num_priv = 0;
+            cache_available = machine_defines_cache_at_topo_level(
+                ms, CPU_TOPOLOGY_LEVEL_SOCKET);
+            llevel = machine_find_lowest_level_cache_at_topo_level(
+                ms, &bottom_node, CPU_TOPOLOGY_LEVEL_SOCKET);
+            if (cache_available && llevel) {
+                build_caches(table_data, pptt_start, num_caches, caches,
+                             top_node, bottom_node, &node_data_offset,
+                             &node_instr_offset);
+                priv_rsrc[0] = node_instr_offset;
+                priv_rsrc[1] = node_data_offset;
+                if (node_instr_offset || node_data_offset) {
+                    num_priv = node_instr_offset == node_data_offset ? 1 : 2;
+                }
+
+                top_cluster = bottom_node - 1;
+            }
+
             socket_offset = table_data->len - pptt_start;
             build_processor_hierarchy_node(table_data,
                 (1 << 0) | /* Physical package */
                 (1 << 4), /* Identical Implementation */
-                root_offset, socket_id, NULL, 0);
+                root_offset, socket_id, priv_rsrc, num_priv);
         }
 
         if (mc->smp_props.clusters_supported && mc->smp_props.has_clusters) {
@@ -2196,21 +2335,68 @@ void build_pptt(GArray *table_data, BIOSLinker *linker, MachineState *ms,
                 assert(cpus->cpus[n].props.cluster_id > cluster_id);
                 cluster_id = cpus->cpus[n].props.cluster_id;
                 core_id = -1;
+                bottom_cluster = top_cluster;
+                num_priv = 0;
+                cache_available = machine_defines_cache_at_topo_level(
+                    ms, CPU_TOPOLOGY_LEVEL_CLUSTER);
+                llevel = machine_find_lowest_level_cache_at_topo_level(
+                    ms, &bottom_cluster, CPU_TOPOLOGY_LEVEL_CLUSTER);
+
+                if (cache_available && llevel) {
+                    build_caches(table_data, pptt_start, num_caches, caches,
+                                 top_cluster, bottom_cluster,
+                                 &cluster_data_offset, &cluster_instr_offset);
+                    priv_rsrc[0] = cluster_instr_offset;
+                    priv_rsrc[1] = cluster_data_offset;
+                    if (cluster_instr_offset || cluster_data_offset) {
+                        num_priv =
+                            cluster_instr_offset == cluster_data_offset ? 1 : 2;
+                    }
+                    top_core = bottom_cluster - 1;
+                } else if (top_cluster == bottom_node - 1) {
+                    /* socket cache but no cluster cache */
+                    top_core = bottom_node - 1;
+                }
+
                 cluster_offset = table_data->len - pptt_start;
                 build_processor_hierarchy_node(table_data,
                     (0 << 0) | /* Not a physical package */
                     (1 << 4), /* Identical Implementation */
-                    socket_offset, cluster_id, NULL, 0);
+                    socket_offset, cluster_id, priv_rsrc, num_priv);
             }
         } else {
+            if (machine_defines_cache_at_topo_level(
+                    ms, CPU_TOPOLOGY_LEVEL_CLUSTER)) {
+                error_setg(&error_fatal, "Not clusters found for the cache");
+                return;
+            }
+
             cluster_offset = socket_offset;
+            top_core = bottom_node - 1; /* there is no cluster */
+        }
+
+        if (cpus->cpus[n].props.core_id != core_id) {
+            bottom_core = top_core;
+            num_priv = 0;
+            cache_available = machine_defines_cache_at_topo_level(
+                ms, CPU_TOPOLOGY_LEVEL_CORE);
+            llevel = machine_find_lowest_level_cache_at_topo_level(
+                ms, &bottom_core, CPU_TOPOLOGY_LEVEL_CORE);
+            if (cache_available && llevel) {
+                build_caches(table_data, pptt_start, num_caches, caches,
+                             top_core, bottom_core, &core_data_offset,
+                             &core_instr_offset);
+                priv_rsrc[0] = core_instr_offset;
+                priv_rsrc[1] = core_data_offset;
+                num_priv = core_instr_offset == core_data_offset ? 1 : 2;
+            }
         }
 
         if (ms->smp.threads == 1) {
             build_processor_hierarchy_node(table_data,
                 (1 << 1) | /* ACPI Processor ID valid */
-                (1 << 3),  /* Node is a Leaf */
-                cluster_offset, n, NULL, 0);
+                (1 << 3), /* Node is a Leaf */
+                cluster_offset, n, priv_rsrc, num_priv);
         } else {
             if (cpus->cpus[n].props.core_id != core_id) {
                 assert(cpus->cpus[n].props.core_id > core_id);
@@ -2219,7 +2405,7 @@ void build_pptt(GArray *table_data, BIOSLinker *linker, MachineState *ms,
                 build_processor_hierarchy_node(table_data,
                     (0 << 0) | /* Not a physical package */
                     (1 << 4), /* Identical Implementation */
-                    cluster_offset, core_id, NULL, 0);
+                    cluster_offset, core_id, priv_rsrc, num_priv);
             }
 
             build_processor_hierarchy_node(table_data,
diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index a6115f2f80..5fca69fcb2 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -1022,6 +1022,10 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
     unsigned dsdt, xsdt;
     GArray *tables_blob = tables->table_data;
     MachineState *ms = MACHINE(vms);
+    CPUCoreCaches caches[CPU_MAX_CACHES];
+    unsigned int num_caches;
+
+    num_caches = virt_get_caches(vms, caches);
 
     table_offsets = g_array_new(false, true /* clear */,
                                         sizeof(uint32_t));
@@ -1043,8 +1047,8 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
 
     if (!vmc->no_cpu_topology) {
         acpi_add_table(table_offsets, tables_blob);
-        build_pptt(tables_blob, tables->linker, ms,
-                   vms->oem_id, vms->oem_table_id, 0, NULL);
+        build_pptt(tables_blob, tables->linker, ms, vms->oem_id,
+                   vms->oem_table_id, num_caches, caches);
     }
 
     acpi_add_table(table_offsets, tables_blob);
diff --git a/include/hw/acpi/cpu.h b/include/hw/acpi/cpu.h
index 32654dc274..a4027a2a76 100644
--- a/include/hw/acpi/cpu.h
+++ b/include/hw/acpi/cpu.h
@@ -70,6 +70,16 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
 
 void acpi_cpu_ospm_status(CPUHotplugState *cpu_st, ACPIOSTInfoList ***list);
 
+struct CPUPPTTCaches {
+    enum CacheType type;
+    uint32_t sets;
+    uint32_t size;
+    uint32_t level;
+    uint16_t linesize;
+    uint8_t attributes; /* write policy: 0x0 write back, 0x1 write through */
+    uint8_t associativity;
+};
+
 extern const VMStateDescription vmstate_cpu_hotplug;
 #define VMSTATE_CPU_HOTPLUG(cpuhp, state) \
     VMSTATE_STRUCT(cpuhp, state, 1, \
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v16 7/8] tests/qtest/bios-table-test: testing new ARM ACPI PPTT topology
  2025-08-27 14:21 [PATCH v16 0/8] Specifying cache topology on ARM Alireza Sanaee via
                   ` (5 preceding siblings ...)
  2025-08-27 14:21 ` [PATCH v16 6/8] hw/acpi: add cache hierarchy to pptt table Alireza Sanaee via
@ 2025-08-27 14:21 ` Alireza Sanaee via
  2025-08-27 14:21 ` [PATCH v16 8/8] Update the ACPI tables based on new aml-build.c Alireza Sanaee via
  2025-10-20 13:51 ` [PATCH v16 0/8] Specifying cache topology on ARM Gustavo Romero
  8 siblings, 0 replies; 14+ messages in thread
From: Alireza Sanaee via @ 2025-08-27 14:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: anisinha, armbru, berrange, dapeng1.mi, eric.auger, farman,
	gustavo.romero, imammedo, jiangkunkun, jonathan.cameron, linuxarm,
	maobibo, mst, mtosatti, peter.maydell, philmd, qemu-arm,
	richard.henderson, shannon.zhaosl, yangyicong, zhao1.liu

Test new PPTT topolopy with cache representation.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Signed-off-by: Alireza Sanaee <alireza.sanaee@huawei.com>
---
 tests/qtest/bios-tables-test.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/tests/qtest/bios-tables-test.c b/tests/qtest/bios-tables-test.c
index e7e6926c81..f3f870d34c 100644
--- a/tests/qtest/bios-tables-test.c
+++ b/tests/qtest/bios-tables-test.c
@@ -2228,6 +2228,10 @@ static void test_acpi_aarch64_virt_tcg_topology(void)
     };
 
     test_acpi_one("-cpu cortex-a57 "
+                  "-M virt,smp-cache.0.cache=l1i,smp-cache.0.topology=cluster,"
+                  "smp-cache.1.cache=l1d,smp-cache.1.topology=cluster,"
+                  "smp-cache.2.cache=l2,smp-cache.2.topology=cluster,"
+                  "smp-cache.3.cache=l3,smp-cache.3.topology=cluster "
                   "-smp sockets=1,clusters=2,cores=2,threads=2", &data);
     free_test_data(&data);
 }
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v16 8/8] Update the ACPI tables based on new aml-build.c
  2025-08-27 14:21 [PATCH v16 0/8] Specifying cache topology on ARM Alireza Sanaee via
                   ` (6 preceding siblings ...)
  2025-08-27 14:21 ` [PATCH v16 7/8] tests/qtest/bios-table-test: testing new ARM ACPI PPTT topology Alireza Sanaee via
@ 2025-08-27 14:21 ` Alireza Sanaee via
  2025-10-20 13:51 ` [PATCH v16 0/8] Specifying cache topology on ARM Gustavo Romero
  8 siblings, 0 replies; 14+ messages in thread
From: Alireza Sanaee via @ 2025-08-27 14:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: anisinha, armbru, berrange, dapeng1.mi, eric.auger, farman,
	gustavo.romero, imammedo, jiangkunkun, jonathan.cameron, linuxarm,
	maobibo, mst, mtosatti, peter.maydell, philmd, qemu-arm,
	richard.henderson, shannon.zhaosl, yangyicong, zhao1.liu

The disassembled differences between actual and expected PPTT based on
the following cache topology representation:

- l1d and l1i shared at cluster level
- l2 shared at cluster level
- l3 shared at cluster level

 /*
  * Intel ACPI Component Architecture
  * AML/ASL+ Disassembler version 20230628 (64-bit version)
  * Copyright (c) 2000 - 2023 Intel Corporation
  *
- * Disassembly of tests/data/acpi/aarch64/virt/PPTT.topology, Fri Aug  8 16:50:38 2025
+ * Disassembly of /tmp/aml-JGBZA3, Fri Aug  8 16:50:38 2025
  *
  * ACPI Data Table [PPTT]
  *
  * Format: [HexOffset DecimalOffset ByteLength]  FieldName : FieldValue (in hex)
  */

 [000h 0000 004h]                   Signature : "PPTT"    [Processor Properties Topology Table]
-[004h 0004 004h]                Table Length : 00000164
+[004h 0004 004h]                Table Length : 00000204
 [008h 0008 001h]                    Revision : 02
-[009h 0009 001h]                    Checksum : 97
+[009h 0009 001h]                    Checksum : B8
 [00Ah 0010 006h]                      Oem ID : "BOCHS "
 [010h 0016 008h]                Oem Table ID : "BXPC    "
 [018h 0024 004h]                Oem Revision : 00000001
 [01Ch 0028 004h]             Asl Compiler ID : "BXPC"
 [020h 0032 004h]       Asl Compiler Revision : 00000001

 [024h 0036 001h]               Subtable Type : 00 [Processor Hierarchy Node]
 [025h 0037 001h]                      Length : 14
 [026h 0038 002h]                    Reserved : 0000
 [028h 0040 004h]       Flags (decoded below) : 00000011
                             Physical package : 1
                      ACPI Processor ID valid : 0
                        Processor is a thread : 0
                               Node is a leaf : 0
                     Identical Implementation : 1
@@ -34,223 +34,369 @@
 [030h 0048 004h]           ACPI Processor ID : 00000000
 [034h 0052 004h]     Private Resource Number : 00000000

 [038h 0056 001h]               Subtable Type : 00 [Processor Hierarchy Node]
 [039h 0057 001h]                      Length : 14
 [03Ah 0058 002h]                    Reserved : 0000
 [03Ch 0060 004h]       Flags (decoded below) : 00000011
                             Physical package : 1
                      ACPI Processor ID valid : 0
                        Processor is a thread : 0
                               Node is a leaf : 0
                     Identical Implementation : 1
 [040h 0064 004h]                      Parent : 00000024
 [044h 0068 004h]           ACPI Processor ID : 00000000
 [048h 0072 004h]     Private Resource Number : 00000000

-[04Ch 0076 001h]               Subtable Type : 00 [Processor Hierarchy Node]
-[04Dh 0077 001h]                      Length : 14
+[04Ch 0076 001h]               Subtable Type : 01 [Cache Type]
+[04Dh 0077 001h]                      Length : 18
 [04Eh 0078 002h]                    Reserved : 0000
-[050h 0080 004h]       Flags (decoded below) : 00000010
+[050h 0080 004h]       Flags (decoded below) : 0000007F
+                                  Size valid : 1
+                        Number of Sets valid : 1
+                         Associativity valid : 1
+                       Allocation Type valid : 1
+                            Cache Type valid : 1
+                          Write Policy valid : 1
+                             Line Size valid : 1
+                              Cache ID valid : 0
+[054h 0084 004h]         Next Level of Cache : 00000000
+[058h 0088 004h]                        Size : 00200000
+[05Ch 0092 004h]              Number of Sets : 00000800
+[060h 0096 001h]               Associativity : 10
+[061h 0097 001h]                  Attributes : 0F
+                             Allocation Type : 3
+                                  Cache Type : 3
+                                Write Policy : 0
+[062h 0098 002h]                   Line Size : 0040
+
+[064h 0100 001h]               Subtable Type : 01 [Cache Type]
+[065h 0101 001h]                      Length : 18
+[066h 0102 002h]                    Reserved : 0000
+[068h 0104 004h]       Flags (decoded below) : 0000007F
+                                  Size valid : 1
+                        Number of Sets valid : 1
+                         Associativity valid : 1
+                       Allocation Type valid : 1
+                            Cache Type valid : 1
+                          Write Policy valid : 1
+                             Line Size valid : 1
+                              Cache ID valid : 0
+[06Ch 0108 004h]         Next Level of Cache : 0000004C
+[070h 0112 004h]                        Size : 00008000
+[074h 0116 004h]              Number of Sets : 00000080
+[078h 0120 001h]               Associativity : 04
+[079h 0121 001h]                  Attributes : 03
+                             Allocation Type : 3
+                                  Cache Type : 0
+                                Write Policy : 0
+[07Ah 0122 002h]                   Line Size : 0040
+
+[07Ch 0124 001h]               Subtable Type : 01 [Cache Type]
+[07Dh 0125 001h]                      Length : 18
+[07Eh 0126 002h]                    Reserved : 0000
+[080h 0128 004h]       Flags (decoded below) : 0000007F
+                                  Size valid : 1
+                        Number of Sets valid : 1
+                         Associativity valid : 1
+                       Allocation Type valid : 1
+                            Cache Type valid : 1
+                          Write Policy valid : 1
+                             Line Size valid : 1
+                              Cache ID valid : 0
+[084h 0132 004h]         Next Level of Cache : 0000004C
+[088h 0136 004h]                        Size : 0000C000
+[08Ch 0140 004h]              Number of Sets : 00000100
+[090h 0144 001h]               Associativity : 03
+[091h 0145 001h]                  Attributes : 07
+                             Allocation Type : 3
+                                  Cache Type : 1
+                                Write Policy : 0
+[092h 0146 002h]                   Line Size : 0040
+
+[094h 0148 001h]               Subtable Type : 00 [Processor Hierarchy Node]
+[095h 0149 001h]                      Length : 1C
+[096h 0150 002h]                    Reserved : 0000
+[098h 0152 004h]       Flags (decoded below) : 00000010
                             Physical package : 0
                      ACPI Processor ID valid : 0
                        Processor is a thread : 0
                               Node is a leaf : 0
                     Identical Implementation : 1
-[054h 0084 004h]                      Parent : 00000038
-[058h 0088 004h]           ACPI Processor ID : 00000000
-[05Ch 0092 004h]     Private Resource Number : 00000000
-
-[060h 0096 001h]               Subtable Type : 00 [Processor Hierarchy Node]
-[061h 0097 001h]                      Length : 14
-[062h 0098 002h]                    Reserved : 0000
-[064h 0100 004h]       Flags (decoded below) : 00000010
-                            Physical package : 0
-                     ACPI Processor ID valid : 0
-                       Processor is a thread : 0
-                              Node is a leaf : 0
-                    Identical Implementation : 1
-[068h 0104 004h]                      Parent : 0000004C
-[06Ch 0108 004h]           ACPI Processor ID : 00000000
-[070h 0112 004h]     Private Resource Number : 00000000
-
-[074h 0116 001h]               Subtable Type : 00 [Processor Hierarchy Node]
-[075h 0117 001h]                      Length : 14
-[076h 0118 002h]                    Reserved : 0000
-[078h 0120 004h]       Flags (decoded below) : 0000000E
-                            Physical package : 0
-                     ACPI Processor ID valid : 1
-                       Processor is a thread : 1
-                              Node is a leaf : 1
-                    Identical Implementation : 0
-[07Ch 0124 004h]                      Parent : 00000060
-[080h 0128 004h]           ACPI Processor ID : 00000000
-[084h 0132 004h]     Private Resource Number : 00000000
-
-[088h 0136 001h]               Subtable Type : 00 [Processor Hierarchy Node]
-[089h 0137 001h]                      Length : 14
-[08Ah 0138 002h]                    Reserved : 0000
-[08Ch 0140 004h]       Flags (decoded below) : 0000000E
-                            Physical package : 0
-                     ACPI Processor ID valid : 1
-                       Processor is a thread : 1
-                              Node is a leaf : 1
-                    Identical Implementation : 0
-[090h 0144 004h]                      Parent : 00000060
-[094h 0148 004h]           ACPI Processor ID : 00000001
-[098h 0152 004h]     Private Resource Number : 00000000
-
-[09Ch 0156 001h]               Subtable Type : 00 [Processor Hierarchy Node]
-[09Dh 0157 001h]                      Length : 14
-[09Eh 0158 002h]                    Reserved : 0000
-[0A0h 0160 004h]       Flags (decoded below) : 00000010
-                            Physical package : 0
-                     ACPI Processor ID valid : 0
-                       Processor is a thread : 0
-                              Node is a leaf : 0
-                    Identical Implementation : 1
-[0A4h 0164 004h]                      Parent : 0000004C
-[0A8h 0168 004h]           ACPI Processor ID : 00000001
-[0ACh 0172 004h]     Private Resource Number : 00000000
+[09Ch 0156 004h]                      Parent : 00000038
+[0A0h 0160 004h]           ACPI Processor ID : 00000000
+[0A4h 0164 004h]     Private Resource Number : 00000002
+[0A8h 0168 004h]            Private Resource : 0000007C
+[0ACh 0172 004h]            Private Resource : 00000064

 [0B0h 0176 001h]               Subtable Type : 00 [Processor Hierarchy Node]
 [0B1h 0177 001h]                      Length : 14
 [0B2h 0178 002h]                    Reserved : 0000
-[0B4h 0180 004h]       Flags (decoded below) : 0000000E
+[0B4h 0180 004h]       Flags (decoded below) : 00000010
                             Physical package : 0
-                     ACPI Processor ID valid : 1
-                       Processor is a thread : 1
-                              Node is a leaf : 1
-                    Identical Implementation : 0
-[0B8h 0184 004h]                      Parent : 0000009C
-[0BCh 0188 004h]           ACPI Processor ID : 00000002
+                     ACPI Processor ID valid : 0
+                       Processor is a thread : 0
+                              Node is a leaf : 0
+                    Identical Implementation : 1
+[0B8h 0184 004h]                      Parent : 00000094
+[0BCh 0188 004h]           ACPI Processor ID : 00000000
 [0C0h 0192 004h]     Private Resource Number : 00000000

 [0C4h 0196 001h]               Subtable Type : 00 [Processor Hierarchy Node]
 [0C5h 0197 001h]                      Length : 14
 [0C6h 0198 002h]                    Reserved : 0000
 [0C8h 0200 004h]       Flags (decoded below) : 0000000E
                             Physical package : 0
                      ACPI Processor ID valid : 1
                        Processor is a thread : 1
                               Node is a leaf : 1
                     Identical Implementation : 0
-[0CCh 0204 004h]                      Parent : 0000009C
-[0D0h 0208 004h]           ACPI Processor ID : 00000003
+[0CCh 0204 004h]                      Parent : 000000B0
+[0D0h 0208 004h]           ACPI Processor ID : 00000000
 [0D4h 0212 004h]     Private Resource Number : 00000000

 [0D8h 0216 001h]               Subtable Type : 00 [Processor Hierarchy Node]
 [0D9h 0217 001h]                      Length : 14
 [0DAh 0218 002h]                    Reserved : 0000
-[0DCh 0220 004h]       Flags (decoded below) : 00000010
+[0DCh 0220 004h]       Flags (decoded below) : 0000000E
                             Physical package : 0
-                     ACPI Processor ID valid : 0
-                       Processor is a thread : 0
-                              Node is a leaf : 0
-                    Identical Implementation : 1
-[0E0h 0224 004h]                      Parent : 00000038
+                     ACPI Processor ID valid : 1
+                       Processor is a thread : 1
+                              Node is a leaf : 1
+                    Identical Implementation : 0
+[0E0h 0224 004h]                      Parent : 000000B0
 [0E4h 0228 004h]           ACPI Processor ID : 00000001
 [0E8h 0232 004h]     Private Resource Number : 00000000

 [0ECh 0236 001h]               Subtable Type : 00 [Processor Hierarchy Node]
 [0EDh 0237 001h]                      Length : 14
 [0EEh 0238 002h]                    Reserved : 0000
 [0F0h 0240 004h]       Flags (decoded below) : 00000010
                             Physical package : 0
                      ACPI Processor ID valid : 0
                        Processor is a thread : 0
                               Node is a leaf : 0
                     Identical Implementation : 1
-[0F4h 0244 004h]                      Parent : 000000D8
-[0F8h 0248 004h]           ACPI Processor ID : 00000000
+[0F4h 0244 004h]                      Parent : 00000094
+[0F8h 0248 004h]           ACPI Processor ID : 00000001
 [0FCh 0252 004h]     Private Resource Number : 00000000

 [100h 0256 001h]               Subtable Type : 00 [Processor Hierarchy Node]
 [101h 0257 001h]                      Length : 14
 [102h 0258 002h]                    Reserved : 0000
 [104h 0260 004h]       Flags (decoded below) : 0000000E
                             Physical package : 0
                      ACPI Processor ID valid : 1
                        Processor is a thread : 1
                               Node is a leaf : 1
                     Identical Implementation : 0
 [108h 0264 004h]                      Parent : 000000EC
-[10Ch 0268 004h]           ACPI Processor ID : 00000004
+[10Ch 0268 004h]           ACPI Processor ID : 00000002
 [110h 0272 004h]     Private Resource Number : 00000000

 [114h 0276 001h]               Subtable Type : 00 [Processor Hierarchy Node]
 [115h 0277 001h]                      Length : 14
 [116h 0278 002h]                    Reserved : 0000
 [118h 0280 004h]       Flags (decoded below) : 0000000E
                             Physical package : 0
                      ACPI Processor ID valid : 1
                        Processor is a thread : 1
                               Node is a leaf : 1
                     Identical Implementation : 0
 [11Ch 0284 004h]                      Parent : 000000EC
-[120h 0288 004h]           ACPI Processor ID : 00000005
+[120h 0288 004h]           ACPI Processor ID : 00000003
 [124h 0292 004h]     Private Resource Number : 00000000

-[128h 0296 001h]               Subtable Type : 00 [Processor Hierarchy Node]
-[129h 0297 001h]                      Length : 14
+[128h 0296 001h]               Subtable Type : 01 [Cache Type]
+[129h 0297 001h]                      Length : 18
 [12Ah 0298 002h]                    Reserved : 0000
-[12Ch 0300 004h]       Flags (decoded below) : 00000010
+[12Ch 0300 004h]       Flags (decoded below) : 0000007F
+                                  Size valid : 1
+                        Number of Sets valid : 1
+                         Associativity valid : 1
+                       Allocation Type valid : 1
+                            Cache Type valid : 1
+                          Write Policy valid : 1
+                             Line Size valid : 1
+                              Cache ID valid : 0
+[130h 0304 004h]         Next Level of Cache : 00000000
+[134h 0308 004h]                        Size : 00200000
+[138h 0312 004h]              Number of Sets : 00000800
+[13Ch 0316 001h]               Associativity : 10
+[13Dh 0317 001h]                  Attributes : 0F
+                             Allocation Type : 3
+                                  Cache Type : 3
+                                Write Policy : 0
+[13Eh 0318 002h]                   Line Size : 0040
+
+[140h 0320 001h]               Subtable Type : 01 [Cache Type]
+[141h 0321 001h]                      Length : 18
+[142h 0322 002h]                    Reserved : 0000
+[144h 0324 004h]       Flags (decoded below) : 0000007F
+                                  Size valid : 1
+                        Number of Sets valid : 1
+                         Associativity valid : 1
+                       Allocation Type valid : 1
+                            Cache Type valid : 1
+                          Write Policy valid : 1
+                             Line Size valid : 1
+                              Cache ID valid : 0
+[148h 0328 004h]         Next Level of Cache : 00000128
+[14Ch 0332 004h]                        Size : 00008000
+[150h 0336 004h]              Number of Sets : 00000080
+[154h 0340 001h]               Associativity : 04
+[155h 0341 001h]                  Attributes : 03
+                             Allocation Type : 3
+                                  Cache Type : 0
+                                Write Policy : 0
+[156h 0342 002h]                   Line Size : 0040
+
+[158h 0344 001h]               Subtable Type : 01 [Cache Type]
+[159h 0345 001h]                      Length : 18
+[15Ah 0346 002h]                    Reserved : 0000
+[15Ch 0348 004h]       Flags (decoded below) : 0000007F
+                                  Size valid : 1
+                        Number of Sets valid : 1
+                         Associativity valid : 1
+                       Allocation Type valid : 1
+                            Cache Type valid : 1
+                          Write Policy valid : 1
+                             Line Size valid : 1
+                              Cache ID valid : 0
+[160h 0352 004h]         Next Level of Cache : 00000128
+[164h 0356 004h]                        Size : 0000C000
+[168h 0360 004h]              Number of Sets : 00000100
+[16Ch 0364 001h]               Associativity : 03
+[16Dh 0365 001h]                  Attributes : 07
+                             Allocation Type : 3
+                                  Cache Type : 1
+                                Write Policy : 0
+[16Eh 0366 002h]                   Line Size : 0040
+
+[170h 0368 001h]               Subtable Type : 00 [Processor Hierarchy Node]
+[171h 0369 001h]                      Length : 1C
+[172h 0370 002h]                    Reserved : 0000
+[174h 0372 004h]       Flags (decoded below) : 00000010
+                            Physical package : 0
+                     ACPI Processor ID valid : 0
+                       Processor is a thread : 0
+                              Node is a leaf : 0
+                    Identical Implementation : 1
+[178h 0376 004h]                      Parent : 00000038
+[17Ch 0380 004h]           ACPI Processor ID : 00000001
+[180h 0384 004h]     Private Resource Number : 00000002
+[184h 0388 004h]            Private Resource : 00000158
+[188h 0392 004h]            Private Resource : 00000140
+
+[18Ch 0396 001h]               Subtable Type : 00 [Processor Hierarchy Node]
+[18Dh 0397 001h]                      Length : 14
+[18Eh 0398 002h]                    Reserved : 0000
+[190h 0400 004h]       Flags (decoded below) : 00000010
+                            Physical package : 0
+                     ACPI Processor ID valid : 0
+                       Processor is a thread : 0
+                              Node is a leaf : 0
+                    Identical Implementation : 1
+[194h 0404 004h]                      Parent : 00000170
+[198h 0408 004h]           ACPI Processor ID : 00000000
+[19Ch 0412 004h]     Private Resource Number : 00000000
+
+[1A0h 0416 001h]               Subtable Type : 00 [Processor Hierarchy Node]
+[1A1h 0417 001h]                      Length : 14
+[1A2h 0418 002h]                    Reserved : 0000
+[1A4h 0420 004h]       Flags (decoded below) : 0000000E
+                            Physical package : 0
+                     ACPI Processor ID valid : 1
+                       Processor is a thread : 1
+                              Node is a leaf : 1
+                    Identical Implementation : 0
+[1A8h 0424 004h]                      Parent : 0000018C
+[1ACh 0428 004h]           ACPI Processor ID : 00000004
+[1B0h 0432 004h]     Private Resource Number : 00000000
+
+[1B4h 0436 001h]               Subtable Type : 00 [Processor Hierarchy Node]
+[1B5h 0437 001h]                      Length : 14
+[1B6h 0438 002h]                    Reserved : 0000
+[1B8h 0440 004h]       Flags (decoded below) : 0000000E
+                            Physical package : 0
+                     ACPI Processor ID valid : 1
+                       Processor is a thread : 1
+                              Node is a leaf : 1
+                    Identical Implementation : 0
+[1BCh 0444 004h]                      Parent : 0000018C
+[1C0h 0448 004h]           ACPI Processor ID : 00000005
+[1C4h 0452 004h]     Private Resource Number : 00000000
+
+[1C8h 0456 001h]               Subtable Type : 00 [Processor Hierarchy Node]
+[1C9h 0457 001h]                      Length : 14
+[1CAh 0458 002h]                    Reserved : 0000
+[1CCh 0460 004h]       Flags (decoded below) : 00000010
                             Physical package : 0
                      ACPI Processor ID valid : 0
                        Processor is a thread : 0
                               Node is a leaf : 0
                     Identical Implementation : 1
-[130h 0304 004h]                      Parent : 000000D8
-[134h 0308 004h]           ACPI Processor ID : 00000001
-[138h 0312 004h]     Private Resource Number : 00000000
-
-[13Ch 0316 001h]               Subtable Type : 00 [Processor Hierarchy Node]
-[13Dh 0317 001h]                      Length : 14
-[13Eh 0318 002h]                    Reserved : 0000
-[140h 0320 004h]       Flags (decoded below) : 0000000E
+[1D0h 0464 004h]                      Parent : 00000170
+[1D4h 0468 004h]           ACPI Processor ID : 00000001
+[1D8h 0472 004h]     Private Resource Number : 00000000
+
+[1DCh 0476 001h]               Subtable Type : 00 [Processor Hierarchy Node]
+[1DDh 0477 001h]                      Length : 14
+[1DEh 0478 002h]                    Reserved : 0000
+[1E0h 0480 004h]       Flags (decoded below) : 0000000E
                             Physical package : 0
                      ACPI Processor ID valid : 1
                        Processor is a thread : 1
                               Node is a leaf : 1
                     Identical Implementation : 0
-[144h 0324 004h]                      Parent : 00000128
-[148h 0328 004h]           ACPI Processor ID : 00000006
-[14Ch 0332 004h]     Private Resource Number : 00000000
-
-[150h 0336 001h]               Subtable Type : 00 [Processor Hierarchy Node]
-[151h 0337 001h]                      Length : 14
-[152h 0338 002h]                    Reserved : 0000
-[154h 0340 004h]       Flags (decoded below) : 0000000E
+[1E4h 0484 004h]                      Parent : 000001C8
+[1E8h 0488 004h]           ACPI Processor ID : 00000006
+[1ECh 0492 004h]     Private Resource Number : 00000000
+
+[1F0h 0496 001h]               Subtable Type : 00 [Processor Hierarchy Node]
+[1F1h 0497 001h]                      Length : 14
+[1F2h 0498 002h]                    Reserved : 0000
+[1F4h 0500 004h]       Flags (decoded below) : 0000000E
                             Physical package : 0
                      ACPI Processor ID valid : 1
                        Processor is a thread : 1
                               Node is a leaf : 1
                     Identical Implementation : 0
-[158h 0344 004h]                      Parent : 00000128
-[15Ch 0348 004h]           ACPI Processor ID : 00000007
-[160h 0352 004h]     Private Resource Number : 00000000
+[1F8h 0504 004h]                      Parent : 000001C8
+[1FCh 0508 004h]           ACPI Processor ID : 00000007
+[200h 0512 004h]     Private Resource Number : 00000000

-Raw Table Data: Length 356 (0x164)
+Raw Table Data: Length 516 (0x204)

-    0000: 50 50 54 54 64 01 00 00 02 97 42 4F 43 48 53 20  // PPTTd.....BOCHS
+    0000: 50 50 54 54 04 02 00 00 02 B8 42 4F 43 48 53 20  // PPTT......BOCHS
     0010: 42 58 50 43 20 20 20 20 01 00 00 00 42 58 50 43  // BXPC    ....BXPC
     0020: 01 00 00 00 00 14 00 00 11 00 00 00 00 00 00 00  // ................
     0030: 00 00 00 00 00 00 00 00 00 14 00 00 11 00 00 00  // ................
-    0040: 24 00 00 00 00 00 00 00 00 00 00 00 00 14 00 00  // $...............
-    0050: 10 00 00 00 38 00 00 00 00 00 00 00 00 00 00 00  // ....8...........
-    0060: 00 14 00 00 10 00 00 00 4C 00 00 00 00 00 00 00  // ........L.......
-    0070: 00 00 00 00 00 14 00 00 0E 00 00 00 60 00 00 00  // ............`...
-    0080: 00 00 00 00 00 00 00 00 00 14 00 00 0E 00 00 00  // ................
-    0090: 60 00 00 00 01 00 00 00 00 00 00 00 00 14 00 00  // `...............
-    00A0: 10 00 00 00 4C 00 00 00 01 00 00 00 00 00 00 00  // ....L...........
-    00B0: 00 14 00 00 0E 00 00 00 9C 00 00 00 02 00 00 00  // ................
-    00C0: 00 00 00 00 00 14 00 00 0E 00 00 00 9C 00 00 00  // ................
-    00D0: 03 00 00 00 00 00 00 00 00 14 00 00 10 00 00 00  // ................
-    00E0: 38 00 00 00 01 00 00 00 00 00 00 00 00 14 00 00  // 8...............
-    00F0: 10 00 00 00 D8 00 00 00 00 00 00 00 00 00 00 00  // ................
-    0100: 00 14 00 00 0E 00 00 00 EC 00 00 00 04 00 00 00  // ................
+    0040: 24 00 00 00 00 00 00 00 00 00 00 00 01 18 00 00  // $...............
+    0050: 7F 00 00 00 00 00 00 00 00 00 20 00 00 08 00 00  // .......... .....
+    0060: 10 0F 40 00 01 18 00 00 7F 00 00 00 4C 00 00 00  // ..@.........L...
+    0070: 00 80 00 00 80 00 00 00 04 03 40 00 01 18 00 00  // ..........@.....
+    0080: 7F 00 00 00 4C 00 00 00 00 C0 00 00 00 01 00 00  // ....L...........
+    0090: 03 07 40 00 00 1C 00 00 10 00 00 00 38 00 00 00  // ..@.........8...
+    00A0: 00 00 00 00 02 00 00 00 7C 00 00 00 64 00 00 00  // ........|...d...
+    00B0: 00 14 00 00 10 00 00 00 94 00 00 00 00 00 00 00  // ................
+    00C0: 00 00 00 00 00 14 00 00 0E 00 00 00 B0 00 00 00  // ................
+    00D0: 00 00 00 00 00 00 00 00 00 14 00 00 0E 00 00 00  // ................
+    00E0: B0 00 00 00 01 00 00 00 00 00 00 00 00 14 00 00  // ................
+    00F0: 10 00 00 00 94 00 00 00 01 00 00 00 00 00 00 00  // ................
+    0100: 00 14 00 00 0E 00 00 00 EC 00 00 00 02 00 00 00  // ................
     0110: 00 00 00 00 00 14 00 00 0E 00 00 00 EC 00 00 00  // ................
-    0120: 05 00 00 00 00 00 00 00 00 14 00 00 10 00 00 00  // ................
-    0130: D8 00 00 00 01 00 00 00 00 00 00 00 00 14 00 00  // ................
-    0140: 0E 00 00 00 28 01 00 00 06 00 00 00 00 00 00 00  // ....(...........
-    0150: 00 14 00 00 0E 00 00 00 28 01 00 00 07 00 00 00  // ........(.......
-    0160: 00 00 00 00                                      // ....
+    0120: 03 00 00 00 00 00 00 00 01 18 00 00 7F 00 00 00  // ................
+    0130: 00 00 00 00 00 00 20 00 00 08 00 00 10 0F 40 00  // ...... .......@.
+    0140: 01 18 00 00 7F 00 00 00 28 01 00 00 00 80 00 00  // ........(.......
+    0150: 80 00 00 00 04 03 40 00 01 18 00 00 7F 00 00 00  // ......@.........
+    0160: 28 01 00 00 00 C0 00 00 00 01 00 00 03 07 40 00  // (.............@.
+    0170: 00 1C 00 00 10 00 00 00 38 00 00 00 01 00 00 00  // ........8.......
+    0180: 02 00 00 00 58 01 00 00 40 01 00 00 00 14 00 00  // ....X...@.......
+    0190: 10 00 00 00 70 01 00 00 00 00 00 00 00 00 00 00  // ....p...........
+    01A0: 00 14 00 00 0E 00 00 00 8C 01 00 00 04 00 00 00  // ................
+    01B0: 00 00 00 00 00 14 00 00 0E 00 00 00 8C 01 00 00  // ................
+    01C0: 05 00 00 00 00 00 00 00 00 14 00 00 10 00 00 00  // ................
+    01D0: 70 01 00 00 01 00 00 00 00 00 00 00 00 14 00 00  // p...............
+    01E0: 0E 00 00 00 C8 01 00 00 06 00 00 00 00 00 00 00  // ................
+    01F0: 00 14 00 00 0E 00 00 00 C8 01 00 00 07 00 00 00  // ................
+    0200: 00 00 00 00                                      // ....

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Signed-off-by: Alireza Sanaee <alireza.sanaee@huawei.com>
---
 tests/data/acpi/aarch64/virt/PPTT.topology  | Bin 356 -> 516 bytes
 tests/qtest/bios-tables-test-allowed-diff.h |   3 ---
 2 files changed, 3 deletions(-)

diff --git a/tests/data/acpi/aarch64/virt/PPTT.topology b/tests/data/acpi/aarch64/virt/PPTT.topology
index 6b864f035c9f48845e9a3beb482c5171074864a5..4f9472c5f728f3068d1054d5042b85190bdb88da 100644
GIT binary patch
literal 516
zcmZvXy$!-Z4255QAXNNF6ciL!P%r{zlr$7bL?T57U;qX{A_Gt|2qk4ohG7Wa3wP0p
z#END6^S#(Ein5GDAbe%Ve19@oRpf>i08p-oC9qKR&9aThf)#M<Y6DDw`7DLw2leXq
zLmd6_hCL38k`!1|$8txPaXnn=XBC{Q-b1-FvMKYYs}()g-e8&2`b^pnU2|HqTCvC?
zcf+qVz1z0>Vcoy2<qdlSw@IRz6_Zp2=W4%;a%XmzJ6SxyMjmt8PHwetg0c5b_lhN!
FeE|+Z9RUCU

literal 356
zcmWFt2nk7HWME*L?&R<65v<@85#X!<1VAAM5F11@h%hh+f@ov_6;nYI69Dopu!#Af
ziSYsX2{^>Sc7o)9c7V(S=|vU;>74__Oh60<Ky@%NW+X9~TafjF#BRXUfM}@RH$Wx}
cOdLs!6-f-H7uh_Jy&6CPHY9a0F?OgJ00?&w0RR91

diff --git a/tests/qtest/bios-tables-test-allowed-diff.h b/tests/qtest/bios-tables-test-allowed-diff.h
index e84d6c6955..dfb8523c8b 100644
--- a/tests/qtest/bios-tables-test-allowed-diff.h
+++ b/tests/qtest/bios-tables-test-allowed-diff.h
@@ -1,4 +1 @@
 /* List of comma-separated changed AML files to ignore */
-"tests/data/acpi/aarch64/virt/PPTT",
-"tests/data/acpi/aarch64/virt/PPTT.acpihmatvirt",
-"tests/data/acpi/aarch64/virt/PPTT.topology",
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v16 0/8] Specifying cache topology on ARM
  2025-08-27 14:21 [PATCH v16 0/8] Specifying cache topology on ARM Alireza Sanaee via
                   ` (7 preceding siblings ...)
  2025-08-27 14:21 ` [PATCH v16 8/8] Update the ACPI tables based on new aml-build.c Alireza Sanaee via
@ 2025-10-20 13:51 ` Gustavo Romero
  8 siblings, 0 replies; 14+ messages in thread
From: Gustavo Romero @ 2025-10-20 13:51 UTC (permalink / raw)
  To: Alireza Sanaee, qemu-devel
  Cc: anisinha, armbru, berrange, dapeng1.mi, eric.auger, farman,
	imammedo, jiangkunkun, jonathan.cameron, linuxarm, maobibo, mst,
	mtosatti, peter.maydell, philmd, qemu-arm, richard.henderson,
	shannon.zhaosl, yangyicong, zhao1.liu

Hi Alireza,

I've started to review your series and have some questions/comments on it,
please see them inline in the patches.


On 8/27/25 11:21, Alireza Sanaee wrote:
> Specifying the cache layout in virtual machines is useful for
> applications and operating systems to fetch accurate information about
> the cache structure and make appropriate adjustments. Enforcing correct

Since this series applies only to TCG and all cache management
instruction are modeled as NOPs I'm wondering what would be a use case
for being able to specify the cache layout. I'm not saying I'm against it,
specially because it's for the virt machine and we have sort of freedom to
"customize" it in general.


Cheers,
Gustavo

> sharing information can lead to better optimizations. Patches that allow
> for an interface to express caches was landed in the prior cycles. This
> patchset uses the interface as a foundation.  Thus, the device tree and
> ACPI/PPTT table, and device tree are populated based on
> user-provided information and CPU topology.
> 
> Example:
> 
> +----------------+                         +----------------+
> |    Socket 0    |                         |    Socket 1    |
> |    (L3 Cache)  |                         |    (L3 Cache)  |
> +--------+-------+                         +--------+-------+
>           |                                          |
> +--------+--------+                        +--------+--------+
> |   Cluster 0     |                        |   Cluster 0     |
> |   (L2 Cache)    |                        |   (L2 Cache)    |
> +--------+--------+                        +--------+--------+
>           |                                          |
> +--------+--------+  +--------+--------+   +--------+--------+  +--------+----+
> |   Core 0         | |   Core 1        |   |   Core 0        |  |   Core 1    |
> |   (L1i, L1d)     | |   (L1i, L1d)    |   |   (L1i, L1d)    |  |   (L1i, L1d)|
> +--------+--------+  +--------+--------+   +--------+--------+  +--------+----+
>           |                    |                     |                    |
> +--------+              +--------+             +--------+           +--------+
> |Thread 0|              |Thread 1|             |Thread 1|           |Thread 0|
> +--------+              +--------+             +--------+           +--------+
> |Thread 1|              |Thread 0|             |Thread 0|           |Thread 1|
> +--------+              +--------+             +--------+           +--------+
> 
> 
> The following command will represent the system relying on **ACPI PPTT tables**.
> 
> ./qemu-system-aarch64 \
>   -machine virt,smp-cache.0.cache=l1i,smp-cache.0.topology=core,smp-cache.1.cache=l1d,smp-cache.1.topology=core,smp-cache.2.cache=l2,smp-cache.2.topology=cluster,smp-cache.3.cache=l3,smp-cache.3.topology=socket \
>   -cpu max \
>   -m 2048 \
>   -smp sockets=2,clusters=1,cores=2,threads=2 \
>   -kernel ./Image.gz \
>   -append "console=ttyAMA0 root=/dev/ram rdinit=/init acpi=force" \
>   -initrd rootfs.cpio.gz \
>   -bios ./edk2-aarch64-code.fd \
>   -nographic
> 
> The following command will represent the system relying on **the device tree**.
> 
> ./qemu-system-aarch64 \
>   -machine virt,acpi=off,smp-cache.0.cache=l1i,smp-cache.0.topology=core,smp-cache.1.cache=l1d,smp-cache.1.topology=core,smp-cache.2.cache=l2,smp-cache.2.topology=cluster,smp-cache.3.cache=l3,smp-cache.3.topology=socket \
>   -cpu max \
>   -m 2048 \
>   -smp sockets=2,clusters=1,cores=2,threads=2 \
>   -kernel ./Image.gz \
>   -append "console=ttyAMA0 root=/dev/ram rdinit=/init acpi=off" \
>   -initrd rootfs.cpio.gz \
>   -nographic
> 
> Failure cases:
>      1) There are scenarios where caches exist in systems' registers but
>      left unspecified by users. In this case qemu returns failure.
> 
>      2) SMT threads cannot share caches which is not very common. More
>      discussions here [1].
> 
> Currently only three levels of caches are supported to be specified from
> the command line. However, increasing the value does not require
> significant changes. Further, this patch assumes l2 and l3 unified
> caches and does not allow l(2/3)(i/d). The level terminology is
> thread/core/cluster/socket. Hierarchy assumed in this patch:
> Socket level = Cluster level + 1 = Core level + 2 = Thread level + 3;
> 
> Possible future enhancements:
>    1) Separated data and instruction cache at L2 and L3.
>    2) Additional cache controls.  e.g. size of L3 may not want to just
>    match the underlying system, because only some of the associated host
>    CPUs may be bound to this VM.
> 
> [1] https://lore.kernel.org/devicetree-spec/20250203120527.3534-1-alireza.sanaee@huawei.com/
> 
> Change Log:
>    v15->v16:
>      * Rebase to e771ba98de25c9f43959f79fc7099cf7fbba44cc (Open 10.2 development tree)
>      * v15: https://lore.kernel.org/qemu-devel/20250812122829.204-1-alireza.sanaee@huawei.com/
> 
>    v14->v15:
>     * Introduced a separate patch for loongarch64 build_pptt function.
>     * Made sure loongarch64 tests pass.
>     * Downgraded to V2 for ACPI PPTT. Removed PPTT IDs as was not necessary.
>     * Removed dependency as it's been merged in the recent cycle.
>       -- 20250604115233.1234-1-alireza.sanaee@huawei.com
>     * Fixed styling issues and removed irrelevant changes.
>     * Moved cache headers to core/cpu.h to be used in both acpi and virt.
>     * v14: https://lore.kernel.org/qemu-devel/20250707121908.155-1-alireza.sanaee@huawei.com/
>     # Thanks to Jonathan and Zhao for their comments.
> 
>    v13->v14:
>     * Rebased on latest staging.
>     * Made some naming changes to machine-smp.c, addd docs added to the
>          same file.
> 
>    v12->v13:
>     * Applied comments from Zhao.
>     * Introduced a new patch for machine specific cache topology functions.
>     * Base: bc98ffdc7577e55ab8373c579c28fe24d600c40f.
> 
>    v11->v12:
>     * Patch #4 couldn't not merge properly as the main file diverged. Now it is fixed (hopefully).
>     * Loonarch build_pptt function updated.
>     * Rebased on 09be8a511a2e278b45729d7b065d30c68dd699d0.
> 
>    v10->v11:
>     * Fix some coding style issues.
>     * Rename some variables.
> 
>    v9->v10:
>     * PPTT rev down to 2.
> 
>    v8->v9:
>     * rebase to 10
>     * Fixed a bug in device-tree generation related to a scenario when
>          caches are shared at core in higher levels than 1.
>    v7->v8:
>     * rebase: Merge tag 'pull-nbd-2024-08-26' of https://repo.or.cz/qemu/ericb into staging
>     * I mis-included a file in patch #4 and I removed it in this one.
> 
>    v6->v7:
>     * Intel stuff got pulled up, so rebase.
>     * added some discussions on device tree.
> 
>    v5->v6:
>     * Minor bug fix.
>     * rebase based on new Intel patchset.
>       - https://lore.kernel.org/qemu-devel/20250110145115.1574345-1-zhao1.liu@intel.com/
> 
>    v4->v5:
>      * Added Reviewed-by tags.
>      * Applied some comments.
> 
>    v3->v4:
>      * Device tree added.
> 
> Alireza Sanaee (8):
>    target/arm/tcg: increase cache level for cpu=max
>    hw/core/machine: topology functions capabilities added
>    hw/arm/virt: add cache hierarchy to device tree
>    bios-tables-test: prepare to change ARM ACPI virt PPTT
>    acpi: add caches to ACPI build_pptt table function
>    hw/acpi: add cache hierarchy to pptt table
>    tests/qtest/bios-table-test: testing new ARM ACPI PPTT topology
>    Update the ACPI tables based on new aml-build.c
> 
>   hw/acpi/aml-build.c                        | 203 +++++++++-
>   hw/arm/virt-acpi-build.c                   |   8 +-
>   hw/arm/virt.c                              | 412 ++++++++++++++++++++-
>   hw/core/machine-smp.c                      |  56 +++
>   hw/loongarch/virt-acpi-build.c             |   4 +-
>   include/hw/acpi/aml-build.h                |   4 +-
>   include/hw/acpi/cpu.h                      |  10 +
>   include/hw/arm/virt.h                      |   7 +-
>   include/hw/boards.h                        |   5 +
>   include/hw/core/cpu.h                      |  12 +
>   target/arm/tcg/cpu64.c                     |  13 +
>   tests/data/acpi/aarch64/virt/PPTT.topology | Bin 356 -> 516 bytes
>   tests/qtest/bios-tables-test.c             |   4 +
>   13 files changed, 723 insertions(+), 15 deletions(-)
> 



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v16 1/8] target/arm/tcg: increase cache level for cpu=max
  2025-08-27 14:21 ` [PATCH v16 1/8] target/arm/tcg: increase cache level for cpu=max Alireza Sanaee via
@ 2025-10-20 13:51   ` Gustavo Romero
  0 siblings, 0 replies; 14+ messages in thread
From: Gustavo Romero @ 2025-10-20 13:51 UTC (permalink / raw)
  To: Alireza Sanaee, qemu-devel
  Cc: anisinha, armbru, berrange, dapeng1.mi, eric.auger, farman,
	imammedo, jiangkunkun, jonathan.cameron, linuxarm, maobibo, mst,
	mtosatti, peter.maydell, philmd, qemu-arm, richard.henderson,
	shannon.zhaosl, yangyicong, zhao1.liu

Hi Alireza,

On 8/27/25 11:21, Alireza Sanaee wrote:
> This patch addresses cache description in the `aarch64_max_tcg_initfn`
> function for cpu=max. It introduces three layers of caches and modifies
> the cache description registers accordingly.
> 
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Alireza Sanaee <alireza.sanaee@huawei.com>
> ---
>   target/arm/tcg/cpu64.c | 13 +++++++++++++
>   1 file changed, 13 insertions(+)
> 
> diff --git a/target/arm/tcg/cpu64.c b/target/arm/tcg/cpu64.c
> index 35cddbafa4..bf1372ecdf 100644
> --- a/target/arm/tcg/cpu64.c
> +++ b/target/arm/tcg/cpu64.c
> @@ -1093,6 +1093,19 @@ void aarch64_max_tcg_initfn(Object *obj)
>       uint64_t t;
>       uint32_t u;
>   
> +    /*
> +     * Expanded cache set
> +     */

Can't make sense of this comment. I think it can be confused with anything
related to "Expanded cache index" (FEAT_CCIDX), which is a format not being
used to set the caches below, so maybe remove it?


> +    SET_IDREG(isar, CLIDR, 0x8200123); /* 4 4 3 in 3 bit fields */

Please improve the comment on CLIDR fields here if you want to keep it, like you
did below, i.e., stating what is selected for LoUU, LoC, LoUIS, and the type
of caches at L1, L2, and L3, like "Separate", "Unified", "Unified" etc.

Just to confirm, the ICB field is set to "Not disclosed by this mechanism" because
we don't want to bother setting it as we customize/tweak the topology?


> +    /* 64KB L1 dcache */
> +    cpu->ccsidr[0] = make_ccsidr(CCSIDR_FORMAT_LEGACY, 4, 64, 64 * KiB, 7);
> +    /* 64KB L1 icache */
> +    cpu->ccsidr[1] = make_ccsidr(CCSIDR_FORMAT_LEGACY, 4, 64, 64 * KiB, 2);
> +    /* 1MB L2 unified cache */
> +    cpu->ccsidr[2] = make_ccsidr(CCSIDR_FORMAT_LEGACY, 8, 64, 1 * MiB, 7);
> +    /* 2MB L3 unified cache */
> +    cpu->ccsidr[4] = make_ccsidr(CCSIDR_FORMAT_LEGACY, 8, 64, 2 * MiB, 7);
> +
>       /*
>        * Unset ARM_FEATURE_BACKCOMPAT_CNTFRQ, which we would otherwise default
>        * to because we started with aarch64_a57_initfn(). A 'max' CPU might


Cheers,
Gustavo


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v16 2/8] hw/core/machine: topology functions capabilities added
  2025-08-27 14:21 ` [PATCH v16 2/8] hw/core/machine: topology functions capabilities added Alireza Sanaee via
@ 2025-10-20 13:52   ` Gustavo Romero
  0 siblings, 0 replies; 14+ messages in thread
From: Gustavo Romero @ 2025-10-20 13:52 UTC (permalink / raw)
  To: Alireza Sanaee, qemu-devel
  Cc: anisinha, armbru, berrange, dapeng1.mi, eric.auger, farman,
	imammedo, jiangkunkun, jonathan.cameron, linuxarm, maobibo, mst,
	mtosatti, peter.maydell, philmd, qemu-arm, richard.henderson,
	shannon.zhaosl, yangyicong, zhao1.liu

Hi Alireza,

On 8/27/25 11:21, Alireza Sanaee wrote:
> Add two functions one of which finds the lowest level cache defined in

Maybe s/lowest level cache/lowest cache level/?


> the cache description input, and the other checks if caches are defined
> at a particular level.

Maybe improve the comment message with smtg like "if a given cache topology is defined
at a particular cache level"? For reviewing this series I'm sticking with
the term "cache level" to mean the levels as in L1, L2, etc., and with the term
"topology level" to mean "thread", "core", "module", etc.


> Signed-off-by: Alireza Sanaee <alireza.sanaee@huawei.com>
> ---
>   hw/core/machine-smp.c | 56 +++++++++++++++++++++++++++++++++++++++++++
>   include/hw/boards.h   |  5 ++++
>   2 files changed, 61 insertions(+)
> 
> diff --git a/hw/core/machine-smp.c b/hw/core/machine-smp.c
> index 0be0ac044c..32f3e7d6c9 100644
> --- a/hw/core/machine-smp.c
> +++ b/hw/core/machine-smp.c
> @@ -406,3 +406,59 @@ bool machine_check_smp_cache(const MachineState *ms, Error **errp)
>   
>       return true;
>   }
> +
> +/*
> + * This function assumes l3 and l2 have unified cache and l1 is split l1d and
> + * l1i.
> + */

Please, let's pick a form to write Ln cache and try to be consistent using it.
I prefer L1, L1d, etc. Here you're using l3, l2 form but in the comment on
machine_defines_cache_at_topo_level just below you use L2, L3. Also to be observed
in the other patches in this series.


> +bool machine_find_lowest_level_cache_at_topo_level(const MachineState *ms,
> +                                                   int *level_found,
> +                                                   CpuTopologyLevel topo_level)

"level_found" sounds a bit like a bool? how about "lowest_cache_level" instead?


> +{
> +
> +    CpuTopologyLevel level;
> +
> +    level = machine_get_cache_topo_level(ms, CACHE_LEVEL_AND_TYPE_L1I);
> +    if (level == topo_level) {
> +        *level_found = 1;
> +        return true;
> +    }
> +
> +    level = machine_get_cache_topo_level(ms, CACHE_LEVEL_AND_TYPE_L1D);
> +    if (level == topo_level) {
> +        *level_found = 1;
> +        return true;
> +    }
> +
> +    level = machine_get_cache_topo_level(ms, CACHE_LEVEL_AND_TYPE_L2);
> +    if (level == topo_level) {
> +        *level_found = 2;
> +        return true;
> +    }
> +
> +    level = machine_get_cache_topo_level(ms, CACHE_LEVEL_AND_TYPE_L3);
> +    if (level == topo_level) {
> +        *level_found = 3;
> +        return true;
> +    }
> +
> +    return false;
> +}

hmm it's a bit unfortunate that the cache level structure we use in QEMU
(actually it's a simple enum) doesn't allow us to obtain promptly the cache
level itself. Maybe, although just a slight enhancement, would be better to
iterate over the cache levels defined in QEMU and just set the cache level
for the separate cache levels, like:

enum CacheLevelAndType cache_level;
enum CpuTopologyLevel t;

for (cache_level = CACHE_LEVEL_AND_TYPE_L1D;
      cache_level < CACHE_LEVEL_AND_TYPE__MAX;
      cache_level++) {
     t = machine_get_cache_topo_level(ms, cache_level);
     if (t == topo_level) {
         /* Assume L1 is split into L1d and L1i caches. */
         if (cache_level == CACHE_LEVEL_AND_TYPE_L1D ||
             cache_level == CACHE_LEVEL_AND_TYPE_L1I) {
             *lowest_cache_level = 1; /* L1 */
         } else {
             /* Assume the other caches are unified. */
             *lowest_cache_level = cache_level;
         }

         return true;
     }
}

return false;

That won't avoid adding the separate caches in this function if new ones
are added to QEMU but will avoid adding the unified ones. Wdyt?


> +/*
> + * Check if there are caches defined at a particular level. It supports only
> + * L1, L2 and L3 caches, but this can be extended to more levels as needed.
> + *
> + * Return True on success, False otherwise.
> + */
> +bool machine_defines_cache_at_topo_level(const MachineState *ms,
> +                                       CpuTopologyLevel level)

How about using "topology" for CpuTopologyLevel var. instead of "level". We use
level in "cache level" and in "topology level" currently, so I think it's better
being more specific here.


> +{
> +    if (machine_get_cache_topo_level(ms, CACHE_LEVEL_AND_TYPE_L3) == level ||
> +        machine_get_cache_topo_level(ms, CACHE_LEVEL_AND_TYPE_L2) == level ||
> +        machine_get_cache_topo_level(ms, CACHE_LEVEL_AND_TYPE_L1I) == level ||
> +        machine_get_cache_topo_level(ms, CACHE_LEVEL_AND_TYPE_L1D) == level) {
> +        return true;
> +    }
> +    return false;
> +}

How about checking for all the levels defined in QEMU now and in future? That looks
possible, like:

enum CacheLevelAndType cache_level;

for (cache_level = CACHE_LEVEL_AND_TYPE_L1D;
      cache_level < CACHE_LEVEL_AND_TYPE__MAX;
      cache_level++) {
     if (machine_get_cache_topo_level(ms, cache_level) == topology) {
         return true;
     }
}

return false;

?


Cheers,
Gustavo

> diff --git a/include/hw/boards.h b/include/hw/boards.h
> index f94713e6e2..3c1a999791 100644
> --- a/include/hw/boards.h
> +++ b/include/hw/boards.h
> @@ -55,6 +55,11 @@ void machine_set_cache_topo_level(MachineState *ms, CacheLevelAndType cache,
>                                     CpuTopologyLevel level);
>   bool machine_check_smp_cache(const MachineState *ms, Error **errp);
>   void machine_memory_devices_init(MachineState *ms, hwaddr base, uint64_t size);
> +bool machine_defines_cache_at_topo_level(const MachineState *ms,
> +                                       CpuTopologyLevel level);
> +bool machine_find_lowest_level_cache_at_topo_level(const MachineState *ms,
> +                                                   int *level_found,
> +                                                   CpuTopologyLevel topo_level);
>   
>   /**
>    * machine_class_allow_dynamic_sysbus_dev: Add type to list of valid devices



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v16 3/8] hw/arm/virt: add cache hierarchy to device tree
  2025-08-27 14:21 ` [PATCH v16 3/8] hw/arm/virt: add cache hierarchy to device tree Alireza Sanaee via
@ 2025-10-20 14:33   ` Gustavo Romero
  0 siblings, 0 replies; 14+ messages in thread
From: Gustavo Romero @ 2025-10-20 14:33 UTC (permalink / raw)
  To: Alireza Sanaee, qemu-devel
  Cc: anisinha, armbru, berrange, dapeng1.mi, eric.auger, farman,
	imammedo, jiangkunkun, jonathan.cameron, linuxarm, maobibo, mst,
	mtosatti, peter.maydell, philmd, qemu-arm, richard.henderson,
	shannon.zhaosl, yangyicong, zhao1.liu

Hi Alireza,

On 8/27/25 11:21, Alireza Sanaee wrote:
> Specify which layer (core/cluster/socket) caches found at in the CPU
> topology. Updating cache topology to device tree (spec v0.4).
> Example:

Could we stick with the terms "cache topology" and "cache level", which
are already used in the context in QEMU, instead of "layer caches" here?

So it would smth like "Specify the cache topology for the cache levels".


> For example, 2 sockets (packages), and 2 clusters, 4 cores and 2 threads
> created, in aggregate 2*2*4*2 logical cores. In the smp-cache object,
> cores will have l1d and l1i.  However, extending this is not difficult.
> The clusters will share a unified l2 level cache, and finally sockets
> will share l3. In this patch, threads will share l1 caches by default,
> but this can be adjusted if case required.

As I mentioned before, let's trick to stick to no notation form for the
caches, preferable the levels always in upper case. Here you use l1i
and right below L1i and the same happens in other code comments.

> Only three levels of caches are supported.  The patch does not
> allow partial declaration of caches. In other words, the topology level
> of every cache must be specified if that of any level is.
> 
> ./qemu-system-aarch64 \
>      -machine virt,\
>           smp-cache.0.cache=l1i,smp-cache.0.topology=core,\
>           smp-cache.1.cache=l1d,smp-cache.1.topology=core,\
>           smp-cache.2.cache=l2,smp-cache.2.topology=cluster,\
>           smp-cache.3.cache=l3,smp-cache.3.topology=socket\
>      -cpu max \
>      -m 2048 \
>      -smp sockets=2,clusters=2,cores=4,threads=1 \
>      -kernel ./Image.gz \
>      -append "console=ttyAMA0 root=/dev/ram rdinit=/init acpi=force" \
>      -initrd rootfs.cpio.gz \
>      -bios ./edk2-aarch64-code.fd \
>      -nographic
> 
> For instance, following device tree will be generated for a scenario
> where we have 2 sockets, 2 clusters, 2 cores and 2 threads, in total 16
> PEs. L1i and L1d are private to each thread, and l2 and l3 are shared at
> socket level as an example.
> 
> Limitation: SMT cores cannot share L1 cache for now. This
> problem does not exist in PPTT tables.
> 
> Co-developed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Signed-off-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Signed-off-by: Alireza Sanaee <alireza.sanaee@huawei.com>
> ---
>   hw/arm/virt.c         | 412 +++++++++++++++++++++++++++++++++++++++++-
>   include/hw/arm/virt.h |   7 +-
>   include/hw/core/cpu.h |  12 ++
>   3 files changed, 429 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index ef6be3660f..9094d8bef8 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -88,6 +88,7 @@
>   #include "hw/virtio/virtio-md-pci.h"
>   #include "hw/virtio/virtio-iommu.h"
>   #include "hw/char/pl011.h"
> +#include "hw/core/cpu.h"
>   #include "hw/cxl/cxl.h"
>   #include "hw/cxl/cxl_host.h"
>   #include "qemu/guest-random.h"
> @@ -273,6 +274,134 @@ static bool ns_el2_virt_timer_present(void)
>           arm_feature(env, ARM_FEATURE_EL2) && cpu_isar_feature(aa64_vh, cpu);
>   }
>   
> +unsigned int virt_get_caches(const VirtMachineState *vms,
> +                             CPUCoreCaches *caches)
> +{
> +    ARMCPU *armcpu = ARM_CPU(qemu_get_cpu(0)); /* assume homogeneous CPUs */
> +    bool ccidx = cpu_isar_feature(any_ccidx, armcpu);
> +    ARMISARegisters *isar = &armcpu->isar;
> +    unsigned int num_cache, i;
> +    int level_instr = 1, level_data = 1;
> +
> +    for (i = 0, num_cache = 0; i < CPU_MAX_CACHES; i++, num_cache++) {
> +        uint32_t clidr = GET_IDREG(isar, CLIDR);

Probably the compiler will do it for us, but you can hoist getting the CLIDR out
of the loop since it's an invariant, so for code readability.


> +        int type = (clidr >> (3 * i)) & 7;
> +        int bank_index;
> +        int level;
> +        enum CacheType cache_type;
> +
> +        if (type == 0) {
> +            break;
> +        }

It should be iterating over the 7 cache levels in CLIDR, not over the CPU_MAX_CACHES (16)
defined in QEMU, for two reasons: it might be the case that this break condition is never meet,
like when there is a description for all the 7 cache levels, i.e, no Ctype<n> == 0; and second
because it would avoid the necessity of vars like "level_instr" and "level_data" because we
know the cache level directly (since would be pretty much iterating over all the cache levels
available in the machine).


> +
> +        switch (type) {
> +        case 1:
> +            cache_type = INSTRUCTION_CACHE;
> +            level = level_instr;
> +            break;
> +        case 2:
> +            cache_type = DATA_CACHE;
> +            level = level_data;
> +            break;
> +        case 4:
> +            cache_type = UNIFIED_CACHE;
> +            level = level_instr > level_data ? level_instr : level_data;
> +            break;
> +        case 3: /* Split - Do data first */
> +            cache_type = DATA_CACHE;
> +            level = level_data;
> +            break;
> +        default:
> +            error_setg(&error_abort, "Unrecognized cache type");
> +            return 0;
> +        }
> +        /*
> +         * ccsidr is indexed using both the level and whether it is
> +         * an instruction cache. Unified caches use the same storage
> +         * as data caches.
> +         */
> +        bank_index = (i * 2) | ((type == 1) ? 1 : 0);
> +        if (ccidx) {
> +            caches[num_cache] = (CPUCoreCaches) {
> +                .type =  cache_type,
> +                .level = level,
> +                .linesize = 1 << (FIELD_EX64(armcpu->ccsidr[bank_index],
> +                                             CCSIDR_EL1,
> +                                             CCIDX_LINESIZE) + 4),
> +                .associativity = FIELD_EX64(armcpu->ccsidr[bank_index],
> +                                            CCSIDR_EL1,
> +                                            CCIDX_ASSOCIATIVITY) + 1,
> +                .sets = FIELD_EX64(armcpu->ccsidr[bank_index], CCSIDR_EL1,
> +                                   CCIDX_NUMSETS) + 1,
> +            };
> +        } else {
> +            caches[num_cache] = (CPUCoreCaches) {
> +                .type =  cache_type,
> +                .level = level,
> +                .linesize = 1 << (FIELD_EX64(armcpu->ccsidr[bank_index],
> +                                             CCSIDR_EL1, LINESIZE) + 4),
> +                .associativity = FIELD_EX64(armcpu->ccsidr[bank_index],
> +                                            CCSIDR_EL1,
> +                                            ASSOCIATIVITY) + 1,
> +                .sets = FIELD_EX64(armcpu->ccsidr[bank_index], CCSIDR_EL1,
> +                                   NUMSETS) + 1,
> +            };
> +        }
> +        caches[num_cache].size = caches[num_cache].associativity *
> +            caches[num_cache].sets * caches[num_cache].linesize;
> +
> +        /* Break one 'split' entry up into two records */
> +        if (type == 3) {
> +            num_cache++;
> +            bank_index = (i * 2) | 1;
> +            if (ccidx) {
> +                /* Instruction cache: bottom bit set when reading banked reg */
> +                caches[num_cache] = (CPUCoreCaches) {
> +                    .type = INSTRUCTION_CACHE,
> +                    .level = level_instr,
> +                    .linesize = 1 << (FIELD_EX64(armcpu->ccsidr[bank_index],
> +                                                 CCSIDR_EL1,
> +                                                 CCIDX_LINESIZE) + 4),
> +                    .associativity = FIELD_EX64(armcpu->ccsidr[bank_index],
> +                                                CCSIDR_EL1,
> +                                                CCIDX_ASSOCIATIVITY) + 1,
> +                    .sets = FIELD_EX64(armcpu->ccsidr[bank_index], CCSIDR_EL1,
> +                                       CCIDX_NUMSETS) + 1,
> +                };
> +            } else {
> +                caches[num_cache] = (CPUCoreCaches) {
> +                    .type = INSTRUCTION_CACHE,
> +                    .level = level_instr,
> +                    .linesize = 1 << (FIELD_EX64(armcpu->ccsidr[bank_index],
> +                                                 CCSIDR_EL1, LINESIZE) + 4),
> +                    .associativity = FIELD_EX64(armcpu->ccsidr[bank_index],
> +                                                CCSIDR_EL1,
> +                                                ASSOCIATIVITY) + 1,
> +                    .sets = FIELD_EX64(armcpu->ccsidr[bank_index], CCSIDR_EL1,
> +                                       NUMSETS) + 1,
> +                };
> +            }
> +            caches[num_cache].size = caches[num_cache].associativity *
> +                caches[num_cache].sets * caches[num_cache].linesize;
> +        }

Could you please move the field extractions to a new function to make this
function less cluttered? I think having a function with the following
signature would make it (it's a suggestion, few free to adjust it):

void set_cpu_cache(CPUCoreCaches *cpu_cache, enum CacheType cache_type, int cache_level, bool is_i_cache)

where you can call again qemu_get_cpu(0) (ARMCPU *armcpu = ARM_CPU(qemu_get_cpu(0))) to
check for FEAT_CCIDX and acccess the ccidr bank.


> +        switch (type) {
> +        case 1:
> +            level_instr++;
> +            break;
> +        case 2:
> +            level_data++;
> +            break;
> +        case 3:
> +        case 4:
> +            level_instr++;
> +            level_data++;
> +            break;
> +        }
> +    }
> +
> +    return num_cache;
> +}
> +

In general this function is clunky and it's necessary to improve it.

Besides the iteration issue mentioned above and the factoring out of
the field extractions:

- Defines should be used in the cases, instead of magics constants, the could be:

#define CLIDR_CTYPE_NO_CACHE 0x00
#define CLIDR_CTYPE_I_CACHE 0x01
#define CLIDR_CTYPE_D_CACHE 0x02
#define CLIDR_CTYPE_SEPARATE_I_D_CACHES 0x03
#define CLIDR_CTYPE_UNIFIED_CACHE 0x04

also, it would be convenint to have:

#define CLIDR_CTYPE_MAX_CACHE_LEVEL 7 for the iteration over CLIDR cache type fields.


- Get rid of the level_instr and level_data variables, since the cache level is already available if we iterate over CLIDR.
   For example:

   int num_cache = 0;
   for (int cache_level = 1; cache_level <= CLIDR_CTYPE_MAX_CACHE_LEVEL; cache_level++) {
       uint8_t ctype = (clidr >> (3 * (cache_level - 1))) & 7;

       if (ctype == CLIDR_CTYPE_NO_CACHE) {
           /*
           * If a "No cache" cache type is found it means no menageable caches
           * exist at further-out levels of hierarchy, so ignore them.
           */
          break;
       } else if (ctype == CLIDR_CTYPE_SEPARATE_I_D_CACHES) {
          /*
          * Create separate D and I caches. D-cache is stored first.
          */
         enum CacheType cache_type;
         for (cache_type = DATA_CACHE;
              cache_type <= INSTRUCTION_CACHE;
              cache_type++) {
             set_cpu_cache(&caches[num_cache++], cache_type, cache_level,
                           cache_type == INSTRUCTION_CACHE ? true : false);
          }
      } else if (ctype == CLIDR_CTYPE_UNIFIED_CACHE) {
          set_cpu_cache(&caches[num_cache++], UNIFIED_CACHE, cache_level, false);
      } else if (ctype == CLIDR_CTYPE_D_CACHE) {
          set_cpu_cache(&caches[num_cache++], DATA_CACHE, cache_level, false);
      } else if (ctype == CLIDR_CTYPE_I_CACHE) {
          set_cpu_cache(&caches[num_cache++], INSTRUCTION_CACHE, cache_level, true);
      } else {
          error_setg(&error_abort, "Unrecognized cache type");
          return 0;
      }
   }

   return num_cache;


   This an example so you can see what I mean here precisely. Feel free to adjust it or keep it.


Cheers,
Gustavo

>   static void create_fdt(VirtMachineState *vms)
>   {
>       MachineState *ms = MACHINE(vms);
> @@ -423,13 +552,150 @@ static void fdt_add_timer_nodes(const VirtMachineState *vms)
>       }
>   }
>   
> +static void add_cache_node(void *fdt, char *nodepath, CPUCoreCaches cache,
> +                           uint32_t *next_level)
> +{
> +    /* Assume L2/3 are unified caches. */
> +
> +    uint32_t phandle;
> +
> +    qemu_fdt_add_path(fdt, nodepath);
> +    phandle = qemu_fdt_alloc_phandle(fdt);
> +    qemu_fdt_setprop_cell(fdt, nodepath, "phandle", phandle);
> +    qemu_fdt_setprop_cell(fdt, nodepath, "cache-level", cache.level);
> +    qemu_fdt_setprop_cell(fdt, nodepath, "cache-size", cache.size);
> +    qemu_fdt_setprop_cell(fdt, nodepath, "cache-block-size", cache.linesize);
> +    qemu_fdt_setprop_cell(fdt, nodepath, "cache-sets", cache.sets);
> +    qemu_fdt_setprop(fdt, nodepath, "cache-unified", NULL, 0);
> +    qemu_fdt_setprop_string(fdt, nodepath, "compatible", "cache");
> +    if (cache.level != 3) {
> +        /* top level cache doesn't have next-level-cache property */
> +        qemu_fdt_setprop_cell(fdt, nodepath, "next-level-cache", *next_level);
> +    }
> +
> +    *next_level = phandle;
> +}
> +
> +static bool add_cpu_cache_hierarchy(void *fdt, CPUCoreCaches* cache,
> +                                    uint32_t cache_cnt,
> +                                    uint32_t top_level,
> +                                    uint32_t bottom_level,
> +                                    uint32_t cpu_id,
> +                                    uint32_t *next_level) {
> +    bool found_cache = false;
> +
> +    for (int level = top_level; level >= bottom_level; level--) {
> +        for (int i = 0; i < cache_cnt; i++) {
> +            char *nodepath;
> +
> +            if (i != level) {
> +                continue;
> +            }
> +
> +            nodepath = g_strdup_printf("/cpus/cpu@%d/l%d-cache",
> +                                       cpu_id, level);
> +            add_cache_node(fdt, nodepath, cache[i], next_level);
> +            found_cache = true;
> +            g_free(nodepath);
> +
> +        }
> +    }
> +
> +    return found_cache;
> +}
> +
> +static void set_cache_properties(void *fdt, const char *nodename,
> +                                 const char *prefix, CPUCoreCaches cache)
> +{
> +    char prop_name[64];
> +
> +    snprintf(prop_name, sizeof(prop_name), "%s-block-size", prefix);
> +    qemu_fdt_setprop_cell(fdt, nodename, prop_name, cache.linesize);
> +
> +    snprintf(prop_name, sizeof(prop_name), "%s-size", prefix);
> +    qemu_fdt_setprop_cell(fdt, nodename, prop_name, cache.size);
> +
> +    snprintf(prop_name, sizeof(prop_name), "%s-sets", prefix);
> +    qemu_fdt_setprop_cell(fdt, nodename, prop_name, cache.sets);
> +}
> +
> +static int partial_cache_description(const MachineState *ms,
> +                                     CPUCoreCaches *caches, int num_caches)
> +{
> +    int level, c;
> +
> +    for (level = 1; level < num_caches; level++) {
> +        for (c = 0; c < num_caches; c++) {
> +            if (caches[c].level != level) {
> +                continue;
> +            }
> +
> +            switch (level) {
> +            case 1:
> +                /*
> +                 * L1 cache is assumed to have both L1I and L1D available.
> +                 * Technically both need to be checked.
> +                 */
> +                if (machine_get_cache_topo_level(ms,
> +                                                 CACHE_LEVEL_AND_TYPE_L1I) ==
> +                    CPU_TOPOLOGY_LEVEL_DEFAULT) {
> +                    return level;
> +                }
> +                break;
> +            case 2:
> +                if (machine_get_cache_topo_level(ms, CACHE_LEVEL_AND_TYPE_L2) ==
> +                    CPU_TOPOLOGY_LEVEL_DEFAULT) {
> +                    return level;
> +                }
> +                break;
> +            case 3:
> +                if (machine_get_cache_topo_level(ms, CACHE_LEVEL_AND_TYPE_L3) ==
> +                    CPU_TOPOLOGY_LEVEL_DEFAULT) {
> +                    return level;
> +                }
> +                break;
> +            }
> +        }
> +    }
> +
> +    return 0;
> +}
> +
>   static void fdt_add_cpu_nodes(const VirtMachineState *vms)
>   {
>       int cpu;
>       int addr_cells = 1;
>       const MachineState *ms = MACHINE(vms);
> +    const MachineClass *mc = MACHINE_GET_CLASS(ms);
>       const VirtMachineClass *vmc = VIRT_MACHINE_GET_CLASS(vms);
>       int smp_cpus = ms->smp.cpus;
> +    int socket_id, cluster_id, core_id;
> +    uint32_t next_level = 0;
> +    uint32_t socket_offset = 0;
> +    uint32_t cluster_offset = 0;
> +    uint32_t core_offset = 0;
> +    int last_socket = -1;
> +    int last_cluster = -1;
> +    int last_core = -1;
> +    int top_node = 3;
> +    int top_cluster = 3;
> +    int top_core = 3;
> +    int bottom_node = 3;
> +    int bottom_cluster = 3;
> +    int bottom_core = 3;
> +    unsigned int num_cache;
> +    CPUCoreCaches caches[16];
> +    bool cache_created = false;
> +    bool cache_available;
> +    bool llevel;
> +
> +    num_cache = virt_get_caches(vms, caches);
> +
> +    if (mc->smp_props.has_caches &&
> +        partial_cache_description(ms, caches, num_cache)) {
> +        error_setg(&error_fatal, "Missing cache description");
> +        return;
> +    }
>   
>       /*
>        * See Linux Documentation/devicetree/bindings/arm/cpus.yaml
> @@ -458,9 +724,14 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms)
>       qemu_fdt_setprop_cell(ms->fdt, "/cpus", "#size-cells", 0x0);
>   
>       for (cpu = smp_cpus - 1; cpu >= 0; cpu--) {
> +        socket_id = cpu / (ms->smp.clusters * ms->smp.cores * ms->smp.threads);
> +        cluster_id = cpu / (ms->smp.cores * ms->smp.threads) % ms->smp.clusters;
> +        core_id = cpu / ms->smp.threads % ms->smp.cores;
> +
>           char *nodename = g_strdup_printf("/cpus/cpu@%d", cpu);
>           ARMCPU *armcpu = ARM_CPU(qemu_get_cpu(cpu));
>           CPUState *cs = CPU(armcpu);
> +        const char *prefix = NULL;
>   
>           qemu_fdt_add_subnode(ms->fdt, nodename);
>           qemu_fdt_setprop_string(ms->fdt, nodename, "device_type", "cpu");
> @@ -490,6 +761,139 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms)
>                                     qemu_fdt_alloc_phandle(ms->fdt));
>           }
>   
> +        if (!vmc->no_cpu_topology && num_cache) {
> +            for (uint8_t i = 0; i < num_cache; i++) {
> +                /* only level 1 in the CPU entry */
> +                if (caches[i].level > 1) {
> +                    continue;
> +                }
> +
> +                if (caches[i].type == INSTRUCTION_CACHE) {
> +                    prefix = "i-cache";
> +                } else if (caches[i].type == DATA_CACHE) {
> +                    prefix = "d-cache";
> +                } else if (caches[i].type == UNIFIED_CACHE) {
> +                    error_setg(&error_fatal,
> +                               "Unified type is not implemented at level %d",
> +                               caches[i].level);
> +                    return;
> +                } else {
> +                    error_setg(&error_fatal, "Undefined cache type");
> +                    return;
> +                }
> +
> +                set_cache_properties(ms->fdt, nodename, prefix, caches[i]);
> +            }
> +        }
> +
> +        if (socket_id != last_socket) {
> +            bottom_node = top_node;
> +            /* this assumes socket as the highest topological level */
> +            socket_offset = 0;
> +            cluster_offset = 0;
> +            cache_available = machine_defines_cache_at_topo_level(ms,
> +                                  CPU_TOPOLOGY_LEVEL_SOCKET);
> +            llevel = machine_find_lowest_level_cache_at_topo_level(ms,
> +                        &bottom_node,
> +                        CPU_TOPOLOGY_LEVEL_SOCKET);
> +            if (cache_available && llevel) {
> +                if (bottom_node == 1 && !virt_is_acpi_enabled(vms))
> +                    error_setg(
> +                        &error_fatal,
> +                        "Cannot share L1 at socket_id %d."
> +                        "DT limiation on sharing at cache level = 1",
> +                        socket_id);
> +
> +                cache_created = add_cpu_cache_hierarchy(ms->fdt, caches,
> +                                                        num_cache,
> +                                                        top_node,
> +                                                        bottom_node, cpu,
> +                                                        &socket_offset);
> +
> +                if (!cache_created) {
> +                    error_setg(&error_fatal,
> +                               "Socket: No caches at levels %d-%d",
> +                               top_node, bottom_node);
> +                    return;
> +                }
> +
> +                top_cluster = bottom_node - 1;
> +            }
> +
> +            last_socket = socket_id;
> +        }
> +
> +        if (cluster_id != last_cluster) {
> +            bottom_cluster = top_cluster;
> +            cluster_offset = socket_offset;
> +            core_offset = 0;
> +            cache_available = machine_defines_cache_at_topo_level(ms,
> +                                 CPU_TOPOLOGY_LEVEL_CLUSTER);
> +            llevel = machine_find_lowest_level_cache_at_topo_level(ms,
> +                        &bottom_cluster,
> +                        CPU_TOPOLOGY_LEVEL_CLUSTER);
> +            if (cache_available && llevel) {
> +                cache_created = add_cpu_cache_hierarchy(ms->fdt, caches,
> +                                                        num_cache,
> +                                                        top_cluster,
> +                                                        bottom_cluster, cpu,
> +                                                        &cluster_offset);
> +                if (bottom_cluster == 1 && !virt_is_acpi_enabled(vms)) {
> +                    error_setg(&error_fatal,
> +                        "Cannot share L1 at socket_id %d, cluster_id %d. "
> +                        "DT limitation on sharing at cache level = 1.",
> +                        socket_id, cluster_id);
> +                }
> +
> +                if (!cache_created) {
> +                    error_setg(&error_fatal,
> +                               "Cluster: No caches at levels %d-%d.",
> +                               top_cluster, bottom_cluster);
> +                    return;
> +                }
> +
> +                top_core = bottom_cluster - 1;
> +            } else if (top_cluster == bottom_node - 1) {
> +                top_core = bottom_node - 1;
> +            }
> +
> +            last_cluster = cluster_id;
> +        }
> +
> +        if (core_id != last_core) {
> +            bottom_core = top_core;
> +            core_offset = cluster_offset;
> +            cache_available = machine_defines_cache_at_topo_level(ms,
> +                                 CPU_TOPOLOGY_LEVEL_CORE);
> +            llevel = machine_find_lowest_level_cache_at_topo_level(ms,
> +                        &bottom_core,
> +                        CPU_TOPOLOGY_LEVEL_CORE);
> +            if (cache_available && llevel) {
> +                if (bottom_core == 1 && top_core > 1) {
> +                    bottom_core++;
> +                    cache_created = add_cpu_cache_hierarchy(ms->fdt,
> +                                                            caches,
> +                                                            num_cache,
> +                                                            top_core,
> +                                                            bottom_core, cpu,
> +                                                            &core_offset);
> +
> +                    if (!cache_created) {
> +                        error_setg(&error_fatal,
> +                                   "Core: No caches at levels %d-%d",
> +                                   top_core, bottom_core);
> +                        return;
> +                    }
> +                }
> +            }
> +
> +            last_core = core_id;
> +        }
> +
> +        next_level = core_offset;
> +        qemu_fdt_setprop_cell(ms->fdt, nodename, "next-level-cache",
> +                              next_level);
> +
>           g_free(nodename);
>       }
>   
> @@ -2721,7 +3125,7 @@ static void virt_set_oem_table_id(Object *obj, const char *value,
>   }
>   
>   
> -bool virt_is_acpi_enabled(VirtMachineState *vms)
> +bool virt_is_acpi_enabled(const VirtMachineState *vms)
>   {
>       if (vms->acpi == ON_OFF_AUTO_OFF) {
>           return false;
> @@ -3247,6 +3651,12 @@ static void virt_machine_class_init(ObjectClass *oc, const void *data)
>       hc->unplug = virt_machine_device_unplug_cb;
>       mc->nvdimm_supported = true;
>       mc->smp_props.clusters_supported = true;
> +
> +    /* Supported caches */
> +    mc->smp_props.cache_supported[CACHE_LEVEL_AND_TYPE_L1D] = true;
> +    mc->smp_props.cache_supported[CACHE_LEVEL_AND_TYPE_L1I] = true;
> +    mc->smp_props.cache_supported[CACHE_LEVEL_AND_TYPE_L2] = true;
> +    mc->smp_props.cache_supported[CACHE_LEVEL_AND_TYPE_L3] = true;
>       mc->auto_enable_numa_with_memhp = true;
>       mc->auto_enable_numa_with_memdev = true;
>       /* platform instead of architectural choice */
> diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
> index 365a28b082..0099ea7fa1 100644
> --- a/include/hw/arm/virt.h
> +++ b/include/hw/arm/virt.h
> @@ -40,6 +40,7 @@
>   #include "system/kvm.h"
>   #include "hw/intc/arm_gicv3_common.h"
>   #include "qom/object.h"
> +#include "hw/core/cpu.h"
>   
>   #define NUM_GICV2M_SPIS       64
>   #define NUM_VIRTIO_TRANSPORTS 32
> @@ -51,6 +52,8 @@
>   /* GPIO pins */
>   #define GPIO_PIN_POWER_BUTTON  3
>   
> +#define CPU_MAX_CACHES 16
> +
>   enum {
>       VIRT_FLASH,
>       VIRT_MEM,
> @@ -187,7 +190,9 @@ struct VirtMachineState {
>   OBJECT_DECLARE_TYPE(VirtMachineState, VirtMachineClass, VIRT_MACHINE)
>   
>   void virt_acpi_setup(VirtMachineState *vms);
> -bool virt_is_acpi_enabled(VirtMachineState *vms);
> +bool virt_is_acpi_enabled(const VirtMachineState *vms);
> +unsigned int virt_get_caches(const VirtMachineState *vms,
> +                             CPUCoreCaches *caches);
>   
>   /* Return number of redistributors that fit in the specified region */
>   static uint32_t virt_redist_capacity(VirtMachineState *vms, int region)
> diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
> index 5eaf41a566..045219a68b 100644
> --- a/include/hw/core/cpu.h
> +++ b/include/hw/core/cpu.h
> @@ -1134,4 +1134,16 @@ enum CacheType {
>       UNIFIED_CACHE
>   };
>   
> +struct CPUCoreCaches {
> +    enum CacheType type;
> +    uint32_t sets;
> +    uint32_t size;
> +    uint32_t level;
> +    uint16_t linesize;
> +    uint8_t attributes; /* write policy: 0x0 write back, 0x1 write through */
> +    uint8_t associativity;
> +};
> +
> +typedef struct CPUCoreCaches CPUCoreCaches;
> +
>   #endif


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v16 6/8] hw/acpi: add cache hierarchy to pptt table
  2025-08-27 14:21 ` [PATCH v16 6/8] hw/acpi: add cache hierarchy to pptt table Alireza Sanaee via
@ 2025-10-20 14:33   ` Gustavo Romero
  0 siblings, 0 replies; 14+ messages in thread
From: Gustavo Romero @ 2025-10-20 14:33 UTC (permalink / raw)
  To: Alireza Sanaee, qemu-devel
  Cc: anisinha, armbru, berrange, dapeng1.mi, eric.auger, farman,
	imammedo, jiangkunkun, jonathan.cameron, linuxarm, maobibo, mst,
	mtosatti, peter.maydell, philmd, qemu-arm, richard.henderson,
	shannon.zhaosl, yangyicong, zhao1.liu

Hi Alireza,

On 8/27/25 11:21, Alireza Sanaee wrote:
> Add cache topology to PPTT table. With this patch, both ACPI PPTT table
> and device tree will represent the same cache topology given users
> input.

This patch touches APCI only, so please remove "and device tree" from the
comment message. I understand you're saying it considering the series, but
in this case the commit message should describe the changes in its commit only.


Cheers,
Gustavo

> Signed-off-by: Alireza Sanaee <alireza.sanaee@huawei.com>
> ---
>   hw/acpi/aml-build.c      | 200 +++++++++++++++++++++++++++++++++++++--
>   hw/arm/virt-acpi-build.c |   8 +-
>   include/hw/acpi/cpu.h    |  10 ++
>   3 files changed, 209 insertions(+), 9 deletions(-)
> 
> diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
> index e854f14565..72b6bfdbe9 100644
> --- a/hw/acpi/aml-build.c
> +++ b/hw/acpi/aml-build.c
> @@ -31,6 +31,7 @@
>   #include "hw/pci/pci_bus.h"
>   #include "hw/pci/pci_bridge.h"
>   #include "qemu/cutils.h"
> +#include "hw/core/cpu.h"
>   
>   static GArray *build_alloc_array(void)
>   {
> @@ -2140,6 +2141,104 @@ void build_spcr(GArray *table_data, BIOSLinker *linker,
>       }
>       acpi_table_end(linker, &table);
>   }
> +
> +static void build_cache_nodes(GArray *tbl, CPUCoreCaches *cache,
> +                              uint32_t next_offset)
> +{
> +    int val;
> +
> +    /* Type 1 - cache */
> +    build_append_byte(tbl, 1);
> +    /* Length */
> +    build_append_byte(tbl, 24);
> +    /* Reserved */
> +    build_append_int_noprefix(tbl, 0, 2);
> +    /* Flags */
> +    build_append_int_noprefix(tbl, 0x7f, 4);
> +    /* Offset of next cache up */
> +    build_append_int_noprefix(tbl, next_offset, 4);
> +    build_append_int_noprefix(tbl, cache->size, 4);
> +    build_append_int_noprefix(tbl, cache->sets, 4);
> +    build_append_byte(tbl, cache->associativity);
> +    val = 0x3;
> +    switch (cache->type) {
> +    case INSTRUCTION_CACHE:
> +        val |= (1 << 2);
> +        break;
> +    case DATA_CACHE:
> +        val |= (0 << 2); /* Data */
> +        break;
> +    case UNIFIED_CACHE:
> +        val |= (3 << 2); /* Unified */
> +        break;
> +    }
> +    build_append_byte(tbl, val);
> +    build_append_int_noprefix(tbl, cache->linesize, 2);
> +}
> +
> +/*
> + * builds caches from the top level (`level_high` parameter) to the bottom
> + * level (`level_low` parameter).  It searches for caches found in
> + * systems' registers, and fills up the table. Then it updates the
> + * `data_offset` and `instr_offset` parameters with the offset of the data
> + * and instruction caches of the lowest level, respectively.
> + */
> +static bool build_caches(GArray *table_data, uint32_t pptt_start,
> +                         int num_caches, CPUCoreCaches *caches,
> +                         uint8_t level_high, /* Inclusive */
> +                         uint8_t level_low,  /* Inclusive */
> +                         uint32_t *data_offset,
> +                         uint32_t *instr_offset)
> +{
> +    uint32_t next_level_offset_data = 0, next_level_offset_instruction = 0;
> +    uint32_t this_offset, next_offset = 0;
> +    int c, level;
> +    bool found_cache = false;
> +
> +    /* Walk caches from top to bottom */
> +    for (level = level_high; level >= level_low; level--) {
> +        for (c = 0; c < num_caches; c++) {
> +            if (caches[c].level != level) {
> +                continue;
> +            }
> +
> +            /* Assume only unified above l1 for now */
> +            this_offset = table_data->len - pptt_start;
> +            switch (caches[c].type) {
> +            case INSTRUCTION_CACHE:
> +                next_offset = next_level_offset_instruction;
> +                break;
> +            case DATA_CACHE:
> +                next_offset = next_level_offset_data;
> +                break;
> +            case UNIFIED_CACHE:
> +                /* Either is fine here */
> +                next_offset = next_level_offset_instruction;
> +                break;
> +            }
> +            build_cache_nodes(table_data, &caches[c], next_offset);
> +            switch (caches[c].type) {
> +            case INSTRUCTION_CACHE:
> +                next_level_offset_instruction = this_offset;
> +                break;
> +            case DATA_CACHE:
> +                next_level_offset_data = this_offset;
> +                break;
> +            case UNIFIED_CACHE:
> +                next_level_offset_instruction = this_offset;
> +                next_level_offset_data = this_offset;
> +                break;
> +            }
> +            *data_offset = next_level_offset_data;
> +            *instr_offset = next_level_offset_instruction;
> +
> +            found_cache = true;
> +        }
> +    }
> +
> +    return found_cache;
> +}
> +
>   /*
>    * ACPI spec, Revision 6.3
>    * 5.2.29 Processor Properties Topology Table (PPTT)
> @@ -2150,11 +2249,32 @@ void build_pptt(GArray *table_data, BIOSLinker *linker, MachineState *ms,
>   {
>       MachineClass *mc = MACHINE_GET_CLASS(ms);
>       CPUArchIdList *cpus = ms->possible_cpus;
> -    int64_t socket_id = -1, cluster_id = -1, core_id = -1;
> -    uint32_t socket_offset = 0, cluster_offset = 0, core_offset = 0;
> +    uint32_t core_data_offset = 0;
> +    uint32_t core_instr_offset = 0;
> +    uint32_t cluster_instr_offset = 0;
> +    uint32_t cluster_data_offset = 0;
> +    uint32_t node_data_offset = 0;
> +    uint32_t node_instr_offset = 0;
> +    int top_node = 3;
> +    int top_cluster = 3;
> +    int top_core = 3;
> +    int bottom_node = 3;
> +    int bottom_cluster = 3;
> +    int bottom_core = 3;
> +    int64_t socket_id = -1;
> +    int64_t cluster_id = -1;
> +    int64_t core_id = -1;
> +    uint32_t socket_offset = 0;
> +    uint32_t cluster_offset = 0;
> +    uint32_t core_offset = 0;
>       uint32_t pptt_start = table_data->len;
>       uint32_t root_offset;
>       int n;
> +    uint32_t priv_rsrc[2];
> +    uint32_t num_priv = 0;
> +    bool cache_available;
> +    bool llevel;
> +
>       AcpiTable table = { .sig = "PPTT", .rev = 2,
>                           .oem_id = oem_id, .oem_table_id = oem_table_id };
>   
> @@ -2184,11 +2304,30 @@ void build_pptt(GArray *table_data, BIOSLinker *linker, MachineState *ms,
>               socket_id = cpus->cpus[n].props.socket_id;
>               cluster_id = -1;
>               core_id = -1;
> +            bottom_node = top_node;
> +            num_priv = 0;
> +            cache_available = machine_defines_cache_at_topo_level(
> +                ms, CPU_TOPOLOGY_LEVEL_SOCKET);
> +            llevel = machine_find_lowest_level_cache_at_topo_level(
> +                ms, &bottom_node, CPU_TOPOLOGY_LEVEL_SOCKET);
> +            if (cache_available && llevel) {
> +                build_caches(table_data, pptt_start, num_caches, caches,
> +                             top_node, bottom_node, &node_data_offset,
> +                             &node_instr_offset);
> +                priv_rsrc[0] = node_instr_offset;
> +                priv_rsrc[1] = node_data_offset;
> +                if (node_instr_offset || node_data_offset) {
> +                    num_priv = node_instr_offset == node_data_offset ? 1 : 2;
> +                }
> +
> +                top_cluster = bottom_node - 1;
> +            }
> +
>               socket_offset = table_data->len - pptt_start;
>               build_processor_hierarchy_node(table_data,
>                   (1 << 0) | /* Physical package */
>                   (1 << 4), /* Identical Implementation */
> -                root_offset, socket_id, NULL, 0);
> +                root_offset, socket_id, priv_rsrc, num_priv);
>           }
>   
>           if (mc->smp_props.clusters_supported && mc->smp_props.has_clusters) {
> @@ -2196,21 +2335,68 @@ void build_pptt(GArray *table_data, BIOSLinker *linker, MachineState *ms,
>                   assert(cpus->cpus[n].props.cluster_id > cluster_id);
>                   cluster_id = cpus->cpus[n].props.cluster_id;
>                   core_id = -1;
> +                bottom_cluster = top_cluster;
> +                num_priv = 0;
> +                cache_available = machine_defines_cache_at_topo_level(
> +                    ms, CPU_TOPOLOGY_LEVEL_CLUSTER);
> +                llevel = machine_find_lowest_level_cache_at_topo_level(
> +                    ms, &bottom_cluster, CPU_TOPOLOGY_LEVEL_CLUSTER);
> +
> +                if (cache_available && llevel) {
> +                    build_caches(table_data, pptt_start, num_caches, caches,
> +                                 top_cluster, bottom_cluster,
> +                                 &cluster_data_offset, &cluster_instr_offset);
> +                    priv_rsrc[0] = cluster_instr_offset;
> +                    priv_rsrc[1] = cluster_data_offset;
> +                    if (cluster_instr_offset || cluster_data_offset) {
> +                        num_priv =
> +                            cluster_instr_offset == cluster_data_offset ? 1 : 2;
> +                    }
> +                    top_core = bottom_cluster - 1;
> +                } else if (top_cluster == bottom_node - 1) {
> +                    /* socket cache but no cluster cache */
> +                    top_core = bottom_node - 1;
> +                }
> +
>                   cluster_offset = table_data->len - pptt_start;
>                   build_processor_hierarchy_node(table_data,
>                       (0 << 0) | /* Not a physical package */
>                       (1 << 4), /* Identical Implementation */
> -                    socket_offset, cluster_id, NULL, 0);
> +                    socket_offset, cluster_id, priv_rsrc, num_priv);
>               }
>           } else {
> +            if (machine_defines_cache_at_topo_level(
> +                    ms, CPU_TOPOLOGY_LEVEL_CLUSTER)) {
> +                error_setg(&error_fatal, "Not clusters found for the cache");
> +                return;
> +            }
> +
>               cluster_offset = socket_offset;
> +            top_core = bottom_node - 1; /* there is no cluster */
> +        }
> +
> +        if (cpus->cpus[n].props.core_id != core_id) {
> +            bottom_core = top_core;
> +            num_priv = 0;
> +            cache_available = machine_defines_cache_at_topo_level(
> +                ms, CPU_TOPOLOGY_LEVEL_CORE);
> +            llevel = machine_find_lowest_level_cache_at_topo_level(
> +                ms, &bottom_core, CPU_TOPOLOGY_LEVEL_CORE);
> +            if (cache_available && llevel) {
> +                build_caches(table_data, pptt_start, num_caches, caches,
> +                             top_core, bottom_core, &core_data_offset,
> +                             &core_instr_offset);
> +                priv_rsrc[0] = core_instr_offset;
> +                priv_rsrc[1] = core_data_offset;
> +                num_priv = core_instr_offset == core_data_offset ? 1 : 2;
> +            }
>           }
>   
>           if (ms->smp.threads == 1) {
>               build_processor_hierarchy_node(table_data,
>                   (1 << 1) | /* ACPI Processor ID valid */
> -                (1 << 3),  /* Node is a Leaf */
> -                cluster_offset, n, NULL, 0);
> +                (1 << 3), /* Node is a Leaf */
> +                cluster_offset, n, priv_rsrc, num_priv);
>           } else {
>               if (cpus->cpus[n].props.core_id != core_id) {
>                   assert(cpus->cpus[n].props.core_id > core_id);
> @@ -2219,7 +2405,7 @@ void build_pptt(GArray *table_data, BIOSLinker *linker, MachineState *ms,
>                   build_processor_hierarchy_node(table_data,
>                       (0 << 0) | /* Not a physical package */
>                       (1 << 4), /* Identical Implementation */
> -                    cluster_offset, core_id, NULL, 0);
> +                    cluster_offset, core_id, priv_rsrc, num_priv);
>               }
>   
>               build_processor_hierarchy_node(table_data,
> diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> index a6115f2f80..5fca69fcb2 100644
> --- a/hw/arm/virt-acpi-build.c
> +++ b/hw/arm/virt-acpi-build.c
> @@ -1022,6 +1022,10 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
>       unsigned dsdt, xsdt;
>       GArray *tables_blob = tables->table_data;
>       MachineState *ms = MACHINE(vms);
> +    CPUCoreCaches caches[CPU_MAX_CACHES];
> +    unsigned int num_caches;
> +
> +    num_caches = virt_get_caches(vms, caches);
>   
>       table_offsets = g_array_new(false, true /* clear */,
>                                           sizeof(uint32_t));
> @@ -1043,8 +1047,8 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
>   
>       if (!vmc->no_cpu_topology) {
>           acpi_add_table(table_offsets, tables_blob);
> -        build_pptt(tables_blob, tables->linker, ms,
> -                   vms->oem_id, vms->oem_table_id, 0, NULL);
> +        build_pptt(tables_blob, tables->linker, ms, vms->oem_id,
> +                   vms->oem_table_id, num_caches, caches);
>       }
>   
>       acpi_add_table(table_offsets, tables_blob);
> diff --git a/include/hw/acpi/cpu.h b/include/hw/acpi/cpu.h
> index 32654dc274..a4027a2a76 100644
> --- a/include/hw/acpi/cpu.h
> +++ b/include/hw/acpi/cpu.h
> @@ -70,6 +70,16 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
>   
>   void acpi_cpu_ospm_status(CPUHotplugState *cpu_st, ACPIOSTInfoList ***list);
>   
> +struct CPUPPTTCaches {
> +    enum CacheType type;
> +    uint32_t sets;
> +    uint32_t size;
> +    uint32_t level;
> +    uint16_t linesize;
> +    uint8_t attributes; /* write policy: 0x0 write back, 0x1 write through */
> +    uint8_t associativity;
> +};
> +
>   extern const VMStateDescription vmstate_cpu_hotplug;
>   #define VMSTATE_CPU_HOTPLUG(cpuhp, state) \
>       VMSTATE_STRUCT(cpuhp, state, 1, \



^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2025-10-20 14:34 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-27 14:21 [PATCH v16 0/8] Specifying cache topology on ARM Alireza Sanaee via
2025-08-27 14:21 ` [PATCH v16 1/8] target/arm/tcg: increase cache level for cpu=max Alireza Sanaee via
2025-10-20 13:51   ` Gustavo Romero
2025-08-27 14:21 ` [PATCH v16 2/8] hw/core/machine: topology functions capabilities added Alireza Sanaee via
2025-10-20 13:52   ` Gustavo Romero
2025-08-27 14:21 ` [PATCH v16 3/8] hw/arm/virt: add cache hierarchy to device tree Alireza Sanaee via
2025-10-20 14:33   ` Gustavo Romero
2025-08-27 14:21 ` [PATCH v16 4/8] bios-tables-test: prepare to change ARM ACPI virt PPTT Alireza Sanaee via
2025-08-27 14:21 ` [PATCH v16 5/8] acpi: add caches to ACPI build_pptt table function Alireza Sanaee via
2025-08-27 14:21 ` [PATCH v16 6/8] hw/acpi: add cache hierarchy to pptt table Alireza Sanaee via
2025-10-20 14:33   ` Gustavo Romero
2025-08-27 14:21 ` [PATCH v16 7/8] tests/qtest/bios-table-test: testing new ARM ACPI PPTT topology Alireza Sanaee via
2025-08-27 14:21 ` [PATCH v16 8/8] Update the ACPI tables based on new aml-build.c Alireza Sanaee via
2025-10-20 13:51 ` [PATCH v16 0/8] Specifying cache topology on ARM Gustavo Romero

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).