[PATCH v6 0/4] i386: Support SMP Cache Topology

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v6 0/4] i386: Support SMP Cache Topology
@ 2024-12-19  8:32 Zhao Liu
  2024-12-19  8:32 ` [PATCH v6 1/4] i386/cpu: Support thread and module level cache topology Zhao Liu
                   ` (4 more replies)
  0 siblings, 5 replies; 13+ messages in thread
From: Zhao Liu @ 2024-12-19  8:32 UTC (permalink / raw)
  To: Paolo Bonzini, Daniel P . Berrangé, Igor Mammedov,
	Eduardo Habkost, Marcel Apfelbaum, Philippe Mathieu-Daudé,
	Yanan Wang, Michael S . Tsirkin, Richard Henderson,
	Jonathan Cameron, Alireza Sanaee, Sia Jee Heng
  Cc: qemu-devel, kvm, Zhao Liu

Hi folks,

This is my v6. since Phili has already merged the general smp cache
part, v6 just includes the remaining i386-specific changes to support
SMP cache topology for PC machine (currently all patches have got
Reviewed-by from previous review).

Compared with v5 [1], there's no change and just series just picks
the unmerged patches and rebases on the master branch (based on the
commit 8032c78e556c "Merge tag 'firmware-20241216-pull-request' of
https://gitlab.com/kraxel/qemu into staging").

The patch 4 ("i386/cpu: add has_caches flag to check smp_cache"), which
introduced a has_caches flag, is also ARM side wanted.

Though now this series targets to i386, to help review, I still include
the previous introduction about smp cache topology feature.

Background
==========

The x86 and ARM (RISCV) need to allow user to configure cache properties
(current only topology):
 * For x86, the default cache topology model (of max/host CPU) does not
   always match the Host's real physical cache topology. Performance can
   increase when the configured virtual topology is closer to the
   physical topology than a default topology would be.
 * For ARM, QEMU can't get the cache topology information from the CPU
   registers, then user configuration is necessary. Additionally, the
   cache information is also needed for MPAM emulation (for TCG) to
   build the right PPTT. (Originally from Jonathan)

About smp-cache
===============

The API design has been discussed heavily in [3].

Now, smp-cache is implemented as a array integrated in -machine. Though
-machine currently can't support JSON format, this is the one of the
directions of future.

An example is as follows:

smp_cache=smp-cache.0.cache=l1i,smp-cache.0.topology=core,smp-cache.1.cache=l1d,smp-cache.1.topology=core,smp-cache.2.cache=l2,smp-cache.2.topology=module,smp-cache.3.cache=l3,smp-cache.3.topology=die

"cache" specifies the cache that the properties will be applied on. This
field is the combination of cache level and cache type. Now it supports
"l1d" (L1 data cache), "l1i" (L1 instruction cache), "l2" (L2 unified
cache) and "l3" (L3 unified cache).

"topology" field accepts CPU topology levels including "thread", "core",
"module", "cluster", "die", "socket", "book", "drawer" and a special
value "default".

The "default" is introduced to make it easier for libvirt to set a
default parameter value without having to care about the specific
machine (because currently there is no proper way for machine to
expose supported topology levels and caches).

If "default" is set, then the cache topology will follow the
architecture's default cache topology model. If other CPU topology level
is set, the cache will be shared at corresponding CPU topology level.

[1]: Patch v5: https://lore.kernel.org/qemu-devel/20241101083331.340178-1-zhao1.liu@intel.com/
[2]: ARM smp-cache: https://lore.kernel.org/qemu-devel/20241010111822.345-1-alireza.sanaee@huawei.com/
[3]: API disscussion: https://lore.kernel.org/qemu-devel/8734ndj33j.fsf@pond.sub.org/

Thanks and Best Regards,
Zhao
---
Alireza Sanaee (1):
  i386/cpu: add has_caches flag to check smp_cache configuration

Zhao Liu (3):
  i386/cpu: Support thread and module level cache topology
  i386/cpu: Update cache topology with machine's configuration
  i386/pc: Support cache topology in -machine for PC machine

 hw/core/machine-smp.c |  2 ++
 hw/i386/pc.c          |  4 +++
 include/hw/boards.h   |  3 ++
 qemu-options.hx       | 31 +++++++++++++++++-
 target/i386/cpu.c     | 76 ++++++++++++++++++++++++++++++++++++++++---
 5 files changed, 111 insertions(+), 5 deletions(-)

-- 
2.34.1

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v6 1/4] i386/cpu: Support thread and module level cache topology
  2024-12-19  8:32 [PATCH v6 0/4] i386: Support SMP Cache Topology Zhao Liu
@ 2024-12-19  8:32 ` Zhao Liu
  2024-12-19  8:32 ` [PATCH v6 2/4] i386/cpu: Update cache topology with machine's configuration Zhao Liu
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 13+ messages in thread
From: Zhao Liu @ 2024-12-19  8:32 UTC (permalink / raw)
  To: Paolo Bonzini, Daniel P . Berrangé, Igor Mammedov,
	Eduardo Habkost, Marcel Apfelbaum, Philippe Mathieu-Daudé,
	Yanan Wang, Michael S . Tsirkin, Richard Henderson,
	Jonathan Cameron, Alireza Sanaee, Sia Jee Heng
  Cc: qemu-devel, kvm, Zhao Liu, Yongwei Ma

Allow cache to be defined at the thread and module level. This
increases flexibility for x86 users to customize their cache topology.

Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
 target/i386/cpu.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 525339945920..87ffb9840cc1 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -243,9 +243,15 @@ static uint32_t max_thread_ids_for_cache(X86CPUTopoInfo *topo_info,
     uint32_t num_ids = 0;
 
     switch (share_level) {
+    case CPU_TOPOLOGY_LEVEL_THREAD:
+        num_ids = 1;
+        break;
     case CPU_TOPOLOGY_LEVEL_CORE:
         num_ids = 1 << apicid_core_offset(topo_info);
         break;
+    case CPU_TOPOLOGY_LEVEL_MODULE:
+        num_ids = 1 << apicid_module_offset(topo_info);
+        break;
     case CPU_TOPOLOGY_LEVEL_DIE:
         num_ids = 1 << apicid_die_offset(topo_info);
         break;
@@ -253,10 +259,6 @@ static uint32_t max_thread_ids_for_cache(X86CPUTopoInfo *topo_info,
         num_ids = 1 << apicid_pkg_offset(topo_info);
         break;
     default:
-        /*
-         * Currently there is no use case for THREAD and MODULE, so use
-         * assert directly to facilitate debugging.
-         */
         g_assert_not_reached();
     }
 
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v6 2/4] i386/cpu: Update cache topology with machine's configuration
  2024-12-19  8:32 [PATCH v6 0/4] i386: Support SMP Cache Topology Zhao Liu
  2024-12-19  8:32 ` [PATCH v6 1/4] i386/cpu: Support thread and module level cache topology Zhao Liu
@ 2024-12-19  8:32 ` Zhao Liu
  2024-12-19  8:32 ` [PATCH v6 3/4] i386/pc: Support cache topology in -machine for PC machine Zhao Liu
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 13+ messages in thread
From: Zhao Liu @ 2024-12-19  8:32 UTC (permalink / raw)
  To: Paolo Bonzini, Daniel P . Berrangé, Igor Mammedov,
	Eduardo Habkost, Marcel Apfelbaum, Philippe Mathieu-Daudé,
	Yanan Wang, Michael S . Tsirkin, Richard Henderson,
	Jonathan Cameron, Alireza Sanaee, Sia Jee Heng
  Cc: qemu-devel, kvm, Zhao Liu, Yongwei Ma

User will configure smp cache topology via -machine smp-cache.

For this case, update the x86 CPUs' cache topology with user's
configuration in MachineState.

Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
Changes since Patch v3:
 * Updated MachineState.smp_cache to consume "default" level and did a
   check to ensure topological hierarchical relationships are correct.
---
 target/i386/cpu.c | 67 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 67 insertions(+)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 87ffb9840cc1..bd5620dcc086 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -7757,6 +7757,64 @@ static void x86_cpu_hyperv_realize(X86CPU *cpu)
     cpu->hyperv_limits[2] = 0;
 }
 
+#ifndef CONFIG_USER_ONLY
+static bool x86_cpu_update_smp_cache_topo(MachineState *ms, X86CPU *cpu,
+                                          Error **errp)
+{
+    CPUX86State *env = &cpu->env;
+    CpuTopologyLevel level;
+
+    level = machine_get_cache_topo_level(ms, CACHE_LEVEL_AND_TYPE_L1D);
+    if (level != CPU_TOPOLOGY_LEVEL_DEFAULT) {
+        env->cache_info_cpuid4.l1d_cache->share_level = level;
+        env->cache_info_amd.l1d_cache->share_level = level;
+    } else {
+        machine_set_cache_topo_level(ms, CACHE_LEVEL_AND_TYPE_L1D,
+            env->cache_info_cpuid4.l1d_cache->share_level);
+        machine_set_cache_topo_level(ms, CACHE_LEVEL_AND_TYPE_L1D,
+            env->cache_info_amd.l1d_cache->share_level);
+    }
+
+    level = machine_get_cache_topo_level(ms, CACHE_LEVEL_AND_TYPE_L1I);
+    if (level != CPU_TOPOLOGY_LEVEL_DEFAULT) {
+        env->cache_info_cpuid4.l1i_cache->share_level = level;
+        env->cache_info_amd.l1i_cache->share_level = level;
+    } else {
+        machine_set_cache_topo_level(ms, CACHE_LEVEL_AND_TYPE_L1I,
+            env->cache_info_cpuid4.l1i_cache->share_level);
+        machine_set_cache_topo_level(ms, CACHE_LEVEL_AND_TYPE_L1I,
+            env->cache_info_amd.l1i_cache->share_level);
+    }
+
+    level = machine_get_cache_topo_level(ms, CACHE_LEVEL_AND_TYPE_L2);
+    if (level != CPU_TOPOLOGY_LEVEL_DEFAULT) {
+        env->cache_info_cpuid4.l2_cache->share_level = level;
+        env->cache_info_amd.l2_cache->share_level = level;
+    } else {
+        machine_set_cache_topo_level(ms, CACHE_LEVEL_AND_TYPE_L2,
+            env->cache_info_cpuid4.l2_cache->share_level);
+        machine_set_cache_topo_level(ms, CACHE_LEVEL_AND_TYPE_L2,
+            env->cache_info_amd.l2_cache->share_level);
+    }
+
+    level = machine_get_cache_topo_level(ms, CACHE_LEVEL_AND_TYPE_L3);
+    if (level != CPU_TOPOLOGY_LEVEL_DEFAULT) {
+        env->cache_info_cpuid4.l3_cache->share_level = level;
+        env->cache_info_amd.l3_cache->share_level = level;
+    } else {
+        machine_set_cache_topo_level(ms, CACHE_LEVEL_AND_TYPE_L3,
+            env->cache_info_cpuid4.l3_cache->share_level);
+        machine_set_cache_topo_level(ms, CACHE_LEVEL_AND_TYPE_L3,
+            env->cache_info_amd.l3_cache->share_level);
+    }
+
+    if (!machine_check_smp_cache(ms, errp)) {
+        return false;
+    }
+    return true;
+}
+#endif
+
 static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
 {
     CPUState *cs = CPU(dev);
@@ -7981,6 +8039,15 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
 
 #ifndef CONFIG_USER_ONLY
     MachineState *ms = MACHINE(qdev_get_machine());
+
+    /*
+     * TODO: Add a SMPCompatProps.has_caches flag to avoid useless updates
+     * if user didn't set smp_cache.
+     */
+    if (!x86_cpu_update_smp_cache_topo(ms, cpu, errp)) {
+        return;
+    }
+
     qemu_register_reset(x86_cpu_machine_reset_cb, cpu);
 
     if (cpu->env.features[FEAT_1_EDX] & CPUID_APIC || ms->smp.cpus > 1) {
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v6 3/4] i386/pc: Support cache topology in -machine for PC machine
  2024-12-19  8:32 [PATCH v6 0/4] i386: Support SMP Cache Topology Zhao Liu
  2024-12-19  8:32 ` [PATCH v6 1/4] i386/cpu: Support thread and module level cache topology Zhao Liu
  2024-12-19  8:32 ` [PATCH v6 2/4] i386/cpu: Update cache topology with machine's configuration Zhao Liu
@ 2024-12-19  8:32 ` Zhao Liu
  2024-12-19  8:32 ` [PATCH v6 4/4] i386/cpu: add has_caches flag to check smp_cache configuration Zhao Liu
  2024-12-24 16:04 ` [PATCH v6 0/4] i386: Support SMP Cache Topology Paolo Bonzini
  4 siblings, 0 replies; 13+ messages in thread
From: Zhao Liu @ 2024-12-19  8:32 UTC (permalink / raw)
  To: Paolo Bonzini, Daniel P . Berrangé, Igor Mammedov,
	Eduardo Habkost, Marcel Apfelbaum, Philippe Mathieu-Daudé,
	Yanan Wang, Michael S . Tsirkin, Richard Henderson,
	Jonathan Cameron, Alireza Sanaee, Sia Jee Heng
  Cc: qemu-devel, kvm, Zhao Liu, Yongwei Ma

Allow user to configure l1d, l1i, l2 and l3 cache topologies for PC
machine.

Additionally, add the document of "-machine smp-cache" in
qemu-options.hx.

Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
Changes since Patch v3:
 * Described the omitting cache will use "default" level and described
   the default cache topology model of i386 PC machine. (Daniel)
---
 hw/i386/pc.c    |  4 ++++
 qemu-options.hx | 31 ++++++++++++++++++++++++++++++-
 2 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 92047ce8c9df..7804991229f1 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1797,6 +1797,10 @@ static void pc_machine_class_init(ObjectClass *oc, void *data)
     mc->nvdimm_supported = true;
     mc->smp_props.dies_supported = true;
     mc->smp_props.modules_supported = true;
+    mc->smp_props.cache_supported[CACHE_LEVEL_AND_TYPE_L1D] = true;
+    mc->smp_props.cache_supported[CACHE_LEVEL_AND_TYPE_L1I] = true;
+    mc->smp_props.cache_supported[CACHE_LEVEL_AND_TYPE_L2] = true;
+    mc->smp_props.cache_supported[CACHE_LEVEL_AND_TYPE_L3] = true;
     mc->default_ram_id = "pc.ram";
     pcmc->default_smbios_ep_type = SMBIOS_ENTRY_POINT_TYPE_AUTO;
 
diff --git a/qemu-options.hx b/qemu-options.hx
index cc694d3b890c..257563437c05 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -39,7 +39,8 @@ DEF("machine", HAS_ARG, QEMU_OPTION_machine, \
     "                memory-encryption=@var{} memory encryption object to use (default=none)\n"
     "                hmat=on|off controls ACPI HMAT support (default=off)\n"
     "                memory-backend='backend-id' specifies explicitly provided backend for main RAM (default=none)\n"
-    "                cxl-fmw.0.targets.0=firsttarget,cxl-fmw.0.targets.1=secondtarget,cxl-fmw.0.size=size[,cxl-fmw.0.interleave-granularity=granularity]\n",
+    "                cxl-fmw.0.targets.0=firsttarget,cxl-fmw.0.targets.1=secondtarget,cxl-fmw.0.size=size[,cxl-fmw.0.interleave-granularity=granularity]\n"
+    "                smp-cache.0.cache=cachename,smp-cache.0.topology=topologylevel\n",
     QEMU_ARCH_ALL)
 SRST
 ``-machine [type=]name[,prop=value[,...]]``
@@ -159,6 +160,34 @@ SRST
         ::
 
             -machine cxl-fmw.0.targets.0=cxl.0,cxl-fmw.0.targets.1=cxl.1,cxl-fmw.0.size=128G,cxl-fmw.0.interleave-granularity=512
+
+    ``smp-cache.0.cache=cachename,smp-cache.0.topology=topologylevel``
+        Define cache properties for SMP system.
+
+        ``cache=cachename`` specifies the cache that the properties will be
+        applied on. This field is the combination of cache level and cache
+        type. It supports ``l1d`` (L1 data cache), ``l1i`` (L1 instruction
+        cache), ``l2`` (L2 unified cache) and ``l3`` (L3 unified cache).
+
+        ``topology=topologylevel`` sets the cache topology level. It accepts
+        CPU topology levels including ``thread``, ``core``, ``module``,
+        ``cluster``, ``die``, ``socket``, ``book``, ``drawer`` and a special
+        value ``default``. If ``default`` is set, then the cache topology will
+        follow the architecture's default cache topology model. If another
+        topology level is set, the cache will be shared at corresponding CPU
+        topology level. For example, ``topology=core`` makes the cache shared
+        by all threads within a core. The omitting cache will default to using
+        the ``default`` level.
+
+        The default cache topology model for an i386 PC machine is as follows:
+        ``l1d``, ``l1i``, and ``l2`` caches are per ``core``, while the ``l3``
+        cache is per ``die``.
+
+        Example:
+
+        ::
+
+            -machine smp-cache.0.cache=l1d,smp-cache.0.topology=core,smp-cache.1.cache=l1i,smp-cache.1.topology=core
 ERST
 
 DEF("M", HAS_ARG, QEMU_OPTION_M,
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v6 4/4] i386/cpu: add has_caches flag to check smp_cache configuration
  2024-12-19  8:32 [PATCH v6 0/4] i386: Support SMP Cache Topology Zhao Liu
                   ` (2 preceding siblings ...)
  2024-12-19  8:32 ` [PATCH v6 3/4] i386/pc: Support cache topology in -machine for PC machine Zhao Liu
@ 2024-12-19  8:32 ` Zhao Liu
  2024-12-24 16:04 ` [PATCH v6 0/4] i386: Support SMP Cache Topology Paolo Bonzini
  4 siblings, 0 replies; 13+ messages in thread
From: Zhao Liu @ 2024-12-19  8:32 UTC (permalink / raw)
  To: Paolo Bonzini, Daniel P . Berrangé, Igor Mammedov,
	Eduardo Habkost, Marcel Apfelbaum, Philippe Mathieu-Daudé,
	Yanan Wang, Michael S . Tsirkin, Richard Henderson,
	Jonathan Cameron, Alireza Sanaee, Sia Jee Heng
  Cc: qemu-devel, kvm, Zhao Liu

From: Alireza Sanaee <alireza.sanaee@huawei.com>

Add has_caches flag to SMPCompatProps, which helps in avoiding
extra checks for every single layer of caches in x86 (and ARM in
future).

Signed-off-by: Alireza Sanaee <alireza.sanaee@huawei.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
Note: Picked from Alireza's series with the changes:
 * Moved the flag to SMPCompatProps with a new name "has_caches".
   This way, it remains consistent with the function and style of
   "has_clusters" in SMPCompatProps.
 * Dropped my previous TODO with the new flag.
---
Changes since Patch v2:
 * Picked a new patch frome Alireza's ARM smp-cache series.
---
 hw/core/machine-smp.c |  2 ++
 include/hw/boards.h   |  3 +++
 target/i386/cpu.c     | 11 +++++------
 3 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/hw/core/machine-smp.c b/hw/core/machine-smp.c
index b954eb849027..fe66961341fe 100644
--- a/hw/core/machine-smp.c
+++ b/hw/core/machine-smp.c
@@ -325,6 +325,8 @@ bool machine_parse_smp_cache(MachineState *ms,
             return false;
         }
     }
+
+    mc->smp_props.has_caches = true;
     return true;
 }
 
diff --git a/include/hw/boards.h b/include/hw/boards.h
index 5723ee76bdea..c647e507d1a9 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -156,6 +156,8 @@ typedef struct {
  * @modules_supported - whether modules are supported by the machine
  * @cache_supported - whether cache (l1d, l1i, l2 and l3) configuration are
  *                    supported by the machine
+ * @has_caches - whether cache properties are explicitly specified in the
+ *               user provided smp-cache configuration
  */
 typedef struct {
     bool prefer_sockets;
@@ -166,6 +168,7 @@ typedef struct {
     bool drawers_supported;
     bool modules_supported;
     bool cache_supported[CACHE_LEVEL_AND_TYPE__MAX];
+    bool has_caches;
 } SMPCompatProps;
 
 /**
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index bd5620dcc086..a9700fba991f 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -8039,13 +8039,12 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
 
 #ifndef CONFIG_USER_ONLY
     MachineState *ms = MACHINE(qdev_get_machine());
+    MachineClass *mc = MACHINE_GET_CLASS(ms);
 
-    /*
-     * TODO: Add a SMPCompatProps.has_caches flag to avoid useless updates
-     * if user didn't set smp_cache.
-     */
-    if (!x86_cpu_update_smp_cache_topo(ms, cpu, errp)) {
-        return;
+    if (mc->smp_props.has_caches) {
+        if (!x86_cpu_update_smp_cache_topo(ms, cpu, errp)) {
+            return;
+        }
     }
 
     qemu_register_reset(x86_cpu_machine_reset_cb, cpu);
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v6 0/4] i386: Support SMP Cache Topology
  2024-12-19  8:32 [PATCH v6 0/4] i386: Support SMP Cache Topology Zhao Liu
                   ` (3 preceding siblings ...)
  2024-12-19  8:32 ` [PATCH v6 4/4] i386/cpu: add has_caches flag to check smp_cache configuration Zhao Liu
@ 2024-12-24 16:04 ` Paolo Bonzini
  2024-12-25  3:03   ` Zhao Liu
  4 siblings, 1 reply; 13+ messages in thread
From: Paolo Bonzini @ 2024-12-24 16:04 UTC (permalink / raw)
  To: Zhao Liu, Daniel P . Berrangé, Igor Mammedov,
	Eduardo Habkost, Marcel Apfelbaum, Philippe Mathieu-Daudé,
	Yanan Wang, Michael S . Tsirkin, Richard Henderson,
	Jonathan Cameron, Alireza Sanaee, Sia Jee Heng
  Cc: qemu-devel, kvm

On 12/19/24 09:32, Zhao Liu wrote:
> Hi folks,
> 
> This is my v6. since Phili has already merged the general smp cache
> part, v6 just includes the remaining i386-specific changes to support
> SMP cache topology for PC machine (currently all patches have got
> Reviewed-by from previous review).
> 
> Compared with v5 [1], there's no change and just series just picks
> the unmerged patches and rebases on the master branch (based on the
> commit 8032c78e556c "Merge tag 'firmware-20241216-pull-request' of
> https://gitlab.com/kraxel/qemu into staging").
> 
> The patch 4 ("i386/cpu: add has_caches flag to check smp_cache"), which
> introduced a has_caches flag, is also ARM side wanted.
> 
> Though now this series targets to i386, to help review, I still include
> the previous introduction about smp cache topology feature.
> 
> 
> Background
> ==========
> 
> The x86 and ARM (RISCV) need to allow user to configure cache properties
> (current only topology):
>   * For x86, the default cache topology model (of max/host CPU) does not
>     always match the Host's real physical cache topology. Performance can
>     increase when the configured virtual topology is closer to the
>     physical topology than a default topology would be.
>   * For ARM, QEMU can't get the cache topology information from the CPU
>     registers, then user configuration is necessary. Additionally, the
>     cache information is also needed for MPAM emulation (for TCG) to
>     build the right PPTT. (Originally from Jonathan)
> 
> 
> About smp-cache
> ===============
> 
> The API design has been discussed heavily in [3].
> 
> Now, smp-cache is implemented as a array integrated in -machine. Though
> -machine currently can't support JSON format, this is the one of the
> directions of future.
> 
> An example is as follows:
> 
> smp_cache=smp-cache.0.cache=l1i,smp-cache.0.topology=core,smp-cache.1.cache=l1d,smp-cache.1.topology=core,smp-cache.2.cache=l2,smp-cache.2.topology=module,smp-cache.3.cache=l3,smp-cache.3.topology=die
> 
> "cache" specifies the cache that the properties will be applied on. This
> field is the combination of cache level and cache type. Now it supports
> "l1d" (L1 data cache), "l1i" (L1 instruction cache), "l2" (L2 unified
> cache) and "l3" (L3 unified cache).
> 
> "topology" field accepts CPU topology levels including "thread", "core",
> "module", "cluster", "die", "socket", "book", "drawer" and a special
> value "default".

Looks good; just one thing, does "thread" make sense?  I think that it's 
almost by definition that threads within a core share all caches, but 
maybe I'm missing some hardware configurations.

Paolo

> The "default" is introduced to make it easier for libvirt to set a
> default parameter value without having to care about the specific
> machine (because currently there is no proper way for machine to
> expose supported topology levels and caches).
> 
> If "default" is set, then the cache topology will follow the
> architecture's default cache topology model. If other CPU topology level
> is set, the cache will be shared at corresponding CPU topology level.
> 
> 
> [1]: Patch v5: https://lore.kernel.org/qemu-devel/20241101083331.340178-1-zhao1.liu@intel.com/
> [2]: ARM smp-cache: https://lore.kernel.org/qemu-devel/20241010111822.345-1-alireza.sanaee@huawei.com/
> [3]: API disscussion: https://lore.kernel.org/qemu-devel/8734ndj33j.fsf@pond.sub.org/
> 
> Thanks and Best Regards,
> Zhao
> ---
> Alireza Sanaee (1):
>    i386/cpu: add has_caches flag to check smp_cache configuration
> 
> Zhao Liu (3):
>    i386/cpu: Support thread and module level cache topology
>    i386/cpu: Update cache topology with machine's configuration
>    i386/pc: Support cache topology in -machine for PC machine
> 
>   hw/core/machine-smp.c |  2 ++
>   hw/i386/pc.c          |  4 +++
>   include/hw/boards.h   |  3 ++
>   qemu-options.hx       | 31 +++++++++++++++++-
>   target/i386/cpu.c     | 76 ++++++++++++++++++++++++++++++++++++++++---
>   5 files changed, 111 insertions(+), 5 deletions(-)
> 



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v6 0/4] i386: Support SMP Cache Topology
  2024-12-24 16:04 ` [PATCH v6 0/4] i386: Support SMP Cache Topology Paolo Bonzini
@ 2024-12-25  3:03   ` Zhao Liu
  2025-01-02 14:57     ` Alireza Sanaee via
  0 siblings, 1 reply; 13+ messages in thread
From: Zhao Liu @ 2024-12-25  3:03 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Daniel P . Berrangé, Igor Mammedov, Eduardo Habkost,
	Marcel Apfelbaum, Philippe Mathieu-Daudé, Yanan Wang,
	Michael S . Tsirkin, Richard Henderson, Jonathan Cameron,
	Alireza Sanaee, Sia Jee Heng, qemu-devel, kvm

> > About smp-cache
> > ===============
> > 
> > The API design has been discussed heavily in [3].
> > 
> > Now, smp-cache is implemented as a array integrated in -machine. Though
> > -machine currently can't support JSON format, this is the one of the
> > directions of future.
> > 
> > An example is as follows:
> > 
> > smp_cache=smp-cache.0.cache=l1i,smp-cache.0.topology=core,smp-cache.1.cache=l1d,smp-cache.1.topology=core,smp-cache.2.cache=l2,smp-cache.2.topology=module,smp-cache.3.cache=l3,smp-cache.3.topology=die
> > 
> > "cache" specifies the cache that the properties will be applied on. This
> > field is the combination of cache level and cache type. Now it supports
> > "l1d" (L1 data cache), "l1i" (L1 instruction cache), "l2" (L2 unified
> > cache) and "l3" (L3 unified cache).
> > 
> > "topology" field accepts CPU topology levels including "thread", "core",
> > "module", "cluster", "die", "socket", "book", "drawer" and a special
> > value "default".
> 
> Looks good; just one thing, does "thread" make sense?  I think that it's
> almost by definition that threads within a core share all caches, but maybe
> I'm missing some hardware configurations.

Hi Paolo, merry Christmas. Yes, AFAIK, there's no hardware has thread
level cache.

I considered the thread case is that it could be used for vCPU
scheduling optimization (although I haven't rigorously tested the actual
impact). Without CPU affinity, tasks in Linux are generally distributed
evenly across different cores (for example, vCPU0 on Core 0, vCPU1 on
Core 1). In this case, the thread-level cache settings are closer to the
actual situation, with vCPU0 occupying the L1/L2 of Host core0 and vCPU1
occupying the L1/L2 of Host core1.


  ┌───┐        ┌───┐
  vCPU0        vCPU1
  │   │        │   │
  └───┘        └───┘
 ┌┌───┐┌───┐┐ ┌┌───┐┌───┐┐
 ││T0 ││T1 ││ ││T2 ││T3 ││
 │└───┘└───┘│ │└───┘└───┘│
 └────C0────┘ └────C1────┘


The L2 cache topology affects performance, and cluster-aware scheduling
feature in the Linux kernel will try to schedule tasks on the same L2
cache. So, in cases like the figure above, setting the L2 cache to be
per thread should, in principle, be better.

Thanks,
Zhao



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v6 0/4] i386: Support SMP Cache Topology
  2024-12-25  3:03   ` Zhao Liu
@ 2025-01-02 14:57     ` Alireza Sanaee via
  2025-01-02 17:09       ` Rob Herring
  0 siblings, 1 reply; 13+ messages in thread
From: Alireza Sanaee via @ 2025-01-02 14:57 UTC (permalink / raw)
  To: Zhao Liu
  Cc: Paolo Bonzini, Daniel P . Berrangé, Igor Mammedov,
	Eduardo Habkost, Marcel Apfelbaum, Philippe Mathieu-Daudé,
	Yanan Wang, Michael S . Tsirkin, Richard Henderson,
	Jonathan Cameron, Sia Jee Heng, qemu-devel, kvm, robh

On Wed, 25 Dec 2024 11:03:42 +0800
Zhao Liu <zhao1.liu@intel.com> wrote:

> > > About smp-cache
> > > ===============
> > > 
> > > The API design has been discussed heavily in [3].
> > > 
> > > Now, smp-cache is implemented as a array integrated in -machine.
> > > Though -machine currently can't support JSON format, this is the
> > > one of the directions of future.
> > > 
> > > An example is as follows:
> > > 
> > > smp_cache=smp-cache.0.cache=l1i,smp-cache.0.topology=core,smp-cache.1.cache=l1d,smp-cache.1.topology=core,smp-cache.2.cache=l2,smp-cache.2.topology=module,smp-cache.3.cache=l3,smp-cache.3.topology=die
> > > 
> > > "cache" specifies the cache that the properties will be applied
> > > on. This field is the combination of cache level and cache type.
> > > Now it supports "l1d" (L1 data cache), "l1i" (L1 instruction
> > > cache), "l2" (L2 unified cache) and "l3" (L3 unified cache).
> > > 
> > > "topology" field accepts CPU topology levels including "thread",
> > > "core", "module", "cluster", "die", "socket", "book", "drawer"
> > > and a special value "default".  
> > 
> > Looks good; just one thing, does "thread" make sense?  I think that
> > it's almost by definition that threads within a core share all
> > caches, but maybe I'm missing some hardware configurations.  
> 
> Hi Paolo, merry Christmas. Yes, AFAIK, there's no hardware has thread
> level cache.

Hi Zhao and Paolo,

While the example looks OK to me, and makes sense. But would be curious
to know more scenarios where I can legitimately see benefit there.

I am wrestling with this point on ARM too. If I were to
have device trees describing caches in a way that threads get their own
private caches then this would not be possible to be
described via device tree due to spec limitations (+CCed Rob) if I
understood correctly.

Thanks,
Alireza

> 
> I considered the thread case is that it could be used for vCPU
> scheduling optimization (although I haven't rigorously tested the
> actual impact). Without CPU affinity, tasks in Linux are generally
> distributed evenly across different cores (for example, vCPU0 on Core
> 0, vCPU1 on Core 1). In this case, the thread-level cache settings
> are closer to the actual situation, with vCPU0 occupying the L1/L2 of
> Host core0 and vCPU1 occupying the L1/L2 of Host core1.
> 
> 
>   ┌───┐        ┌───┐
>   vCPU0        vCPU1
>   │   │        │   │
>   └───┘        └───┘
>  ┌┌───┐┌───┐┐ ┌┌───┐┌───┐┐
>  ││T0 ││T1 ││ ││T2 ││T3 ││
>  │└───┘└───┘│ │└───┘└───┘│
>  └────C0────┘ └────C1────┘
> 
> 
> The L2 cache topology affects performance, and cluster-aware
> scheduling feature in the Linux kernel will try to schedule tasks on
> the same L2 cache. So, in cases like the figure above, setting the L2
> cache to be per thread should, in principle, be better.
> 
> Thanks,
> Zhao
> 
> 



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v6 0/4] i386: Support SMP Cache Topology
  2025-01-02 14:57     ` Alireza Sanaee via
@ 2025-01-02 17:09       ` Rob Herring
  2025-01-02 18:01         ` Alireza Sanaee via
  0 siblings, 1 reply; 13+ messages in thread
From: Rob Herring @ 2025-01-02 17:09 UTC (permalink / raw)
  To: Alireza Sanaee
  Cc: Zhao Liu, Paolo Bonzini, Daniel P . Berrangé, Igor Mammedov,
	Eduardo Habkost, Marcel Apfelbaum, Philippe Mathieu-Daudé,
	Yanan Wang, Michael S . Tsirkin, Richard Henderson,
	Jonathan Cameron, Sia Jee Heng, qemu-devel, kvm

On Thu, Jan 2, 2025 at 8:57 AM Alireza Sanaee <alireza.sanaee@huawei.com> wrote:
>
> On Wed, 25 Dec 2024 11:03:42 +0800
> Zhao Liu <zhao1.liu@intel.com> wrote:
>
> > > > About smp-cache
> > > > ===============
> > > >
> > > > The API design has been discussed heavily in [3].
> > > >
> > > > Now, smp-cache is implemented as a array integrated in -machine.
> > > > Though -machine currently can't support JSON format, this is the
> > > > one of the directions of future.
> > > >
> > > > An example is as follows:
> > > >
> > > > smp_cache=smp-cache.0.cache=l1i,smp-cache.0.topology=core,smp-cache.1.cache=l1d,smp-cache.1.topology=core,smp-cache.2.cache=l2,smp-cache.2.topology=module,smp-cache.3.cache=l3,smp-cache.3.topology=die
> > > >
> > > > "cache" specifies the cache that the properties will be applied
> > > > on. This field is the combination of cache level and cache type.
> > > > Now it supports "l1d" (L1 data cache), "l1i" (L1 instruction
> > > > cache), "l2" (L2 unified cache) and "l3" (L3 unified cache).
> > > >
> > > > "topology" field accepts CPU topology levels including "thread",
> > > > "core", "module", "cluster", "die", "socket", "book", "drawer"
> > > > and a special value "default".
> > >
> > > Looks good; just one thing, does "thread" make sense?  I think that
> > > it's almost by definition that threads within a core share all
> > > caches, but maybe I'm missing some hardware configurations.
> >
> > Hi Paolo, merry Christmas. Yes, AFAIK, there's no hardware has thread
> > level cache.
>
> Hi Zhao and Paolo,
>
> While the example looks OK to me, and makes sense. But would be curious
> to know more scenarios where I can legitimately see benefit there.
>
> I am wrestling with this point on ARM too. If I were to
> have device trees describing caches in a way that threads get their own
> private caches then this would not be possible to be
> described via device tree due to spec limitations (+CCed Rob) if I
> understood correctly.

You asked me for the opposite though, and I described how you can
share the cache. If you want a cache per thread, then you probably
want a node per thread.

Rob


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v6 0/4] i386: Support SMP Cache Topology
  2025-01-02 17:09       ` Rob Herring
@ 2025-01-02 18:01         ` Alireza Sanaee via
  2025-01-03  8:25           ` Zhao Liu
  0 siblings, 1 reply; 13+ messages in thread
From: Alireza Sanaee via @ 2025-01-02 18:01 UTC (permalink / raw)
  To: Rob Herring
  Cc: Zhao Liu, Paolo Bonzini, Daniel P . Berrangé, Igor Mammedov,
	Eduardo Habkost, Marcel Apfelbaum, Philippe Mathieu-Daudé,
	Yanan Wang, Michael S . Tsirkin, Richard Henderson,
	Jonathan Cameron, Sia Jee Heng, qemu-devel, kvm

On Thu, 2 Jan 2025 11:09:51 -0600
Rob Herring <robh@kernel.org> wrote:

> On Thu, Jan 2, 2025 at 8:57 AM Alireza Sanaee
> <alireza.sanaee@huawei.com> wrote:
> >
> > On Wed, 25 Dec 2024 11:03:42 +0800
> > Zhao Liu <zhao1.liu@intel.com> wrote:
> >  
> > > > > About smp-cache
> > > > > ===============
> > > > >
> > > > > The API design has been discussed heavily in [3].
> > > > >
> > > > > Now, smp-cache is implemented as a array integrated in
> > > > > -machine. Though -machine currently can't support JSON
> > > > > format, this is the one of the directions of future.
> > > > >
> > > > > An example is as follows:
> > > > >
> > > > > smp_cache=smp-cache.0.cache=l1i,smp-cache.0.topology=core,smp-cache.1.cache=l1d,smp-cache.1.topology=core,smp-cache.2.cache=l2,smp-cache.2.topology=module,smp-cache.3.cache=l3,smp-cache.3.topology=die
> > > > >
> > > > > "cache" specifies the cache that the properties will be
> > > > > applied on. This field is the combination of cache level and
> > > > > cache type. Now it supports "l1d" (L1 data cache), "l1i" (L1
> > > > > instruction cache), "l2" (L2 unified cache) and "l3" (L3
> > > > > unified cache).
> > > > >
> > > > > "topology" field accepts CPU topology levels including
> > > > > "thread", "core", "module", "cluster", "die", "socket",
> > > > > "book", "drawer" and a special value "default".  
> > > >
> > > > Looks good; just one thing, does "thread" make sense?  I think
> > > > that it's almost by definition that threads within a core share
> > > > all caches, but maybe I'm missing some hardware configurations.
> > > >  
> > >
> > > Hi Paolo, merry Christmas. Yes, AFAIK, there's no hardware has
> > > thread level cache.  
> >
> > Hi Zhao and Paolo,
> >
> > While the example looks OK to me, and makes sense. But would be
> > curious to know more scenarios where I can legitimately see benefit
> > there.
> >
> > I am wrestling with this point on ARM too. If I were to
> > have device trees describing caches in a way that threads get their
> > own private caches then this would not be possible to be
> > described via device tree due to spec limitations (+CCed Rob) if I
> > understood correctly.  
> 
> You asked me for the opposite though, and I described how you can
> share the cache. If you want a cache per thread, then you probably
> want a node per thread.
> 
> Rob
> 

Hi Rob,

That's right, I made the mistake in my prior message, and you recalled
correctly. I wanted shared caches between two threads, though I have
missed your answer before, just found it.

Thanks,
Alireza


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v6 0/4] i386: Support SMP Cache Topology
  2025-01-02 18:01         ` Alireza Sanaee via
@ 2025-01-03  8:25           ` Zhao Liu
  2025-01-03 14:04             ` Alireza Sanaee via
  0 siblings, 1 reply; 13+ messages in thread
From: Zhao Liu @ 2025-01-03  8:25 UTC (permalink / raw)
  To: Alireza Sanaee
  Cc: Rob Herring, Paolo Bonzini, Daniel P . Berrang�,
	Igor Mammedov, Eduardo Habkost, Marcel Apfelbaum,
	Philippe Mathieu-Daud�, Yanan Wang, Michael S . Tsirkin,
	Richard Henderson, Jonathan Cameron, Sia Jee Heng, qemu-devel,
	kvm

On Thu, Jan 02, 2025 at 06:01:41PM +0000, Alireza Sanaee wrote:
> Date: Thu, 2 Jan 2025 18:01:41 +0000
> From: Alireza Sanaee <alireza.sanaee@huawei.com>
> Subject: Re: [PATCH v6 0/4] i386: Support SMP Cache Topology
> X-Mailer: Claws Mail 4.3.0 (GTK 3.24.42; x86_64-w64-mingw32)
> 
> On Thu, 2 Jan 2025 11:09:51 -0600
> Rob Herring <robh@kernel.org> wrote:
> 
> > On Thu, Jan 2, 2025 at 8:57 AM Alireza Sanaee
> > <alireza.sanaee@huawei.com> wrote:
> > >
> > > On Wed, 25 Dec 2024 11:03:42 +0800
> > > Zhao Liu <zhao1.liu@intel.com> wrote:
> > >  
> > > > > > About smp-cache
> > > > > > ===============
> > > > > >
> > > > > > The API design has been discussed heavily in [3].
> > > > > >
> > > > > > Now, smp-cache is implemented as a array integrated in
> > > > > > -machine. Though -machine currently can't support JSON
> > > > > > format, this is the one of the directions of future.
> > > > > >
> > > > > > An example is as follows:
> > > > > >
> > > > > > smp_cache=smp-cache.0.cache=l1i,smp-cache.0.topology=core,smp-cache.1.cache=l1d,smp-cache.1.topology=core,smp-cache.2.cache=l2,smp-cache.2.topology=module,smp-cache.3.cache=l3,smp-cache.3.topology=die
> > > > > >
> > > > > > "cache" specifies the cache that the properties will be
> > > > > > applied on. This field is the combination of cache level and
> > > > > > cache type. Now it supports "l1d" (L1 data cache), "l1i" (L1
> > > > > > instruction cache), "l2" (L2 unified cache) and "l3" (L3
> > > > > > unified cache).
> > > > > >
> > > > > > "topology" field accepts CPU topology levels including
> > > > > > "thread", "core", "module", "cluster", "die", "socket",
> > > > > > "book", "drawer" and a special value "default".  
> > > > >
> > > > > Looks good; just one thing, does "thread" make sense?  I think
> > > > > that it's almost by definition that threads within a core share
> > > > > all caches, but maybe I'm missing some hardware configurations.
> > > > >  
> > > >
> > > > Hi Paolo, merry Christmas. Yes, AFAIK, there's no hardware has
> > > > thread level cache.  
> > >
> > > Hi Zhao and Paolo,
> > >
> > > While the example looks OK to me, and makes sense. But would be
> > > curious to know more scenarios where I can legitimately see benefit
> > > there.
> > >
> > > I am wrestling with this point on ARM too. If I were to
> > > have device trees describing caches in a way that threads get their
> > > own private caches then this would not be possible to be
> > > described via device tree due to spec limitations (+CCed Rob) if I
> > > understood correctly.  
> > 
> > You asked me for the opposite though, and I described how you can
> > share the cache. If you want a cache per thread, then you probably
> > want a node per thread.
> > 
> > Rob
> > 
> 
> Hi Rob,
> 
> That's right, I made the mistake in my prior message, and you recalled
> correctly. I wanted shared caches between two threads, though I have
> missed your answer before, just found it.
> 

Thank you all!

Alireza, do you know how to configure arm node through QEMU options?

However, IIUC, arm needs more effort to configure cache per thread (by
configuring node topology)...In that case, since no one has explicitly
requested the need for cache per thread, I can disable cache per thread
for now. I can return an error for this scenario during the general
smp-cache option parsing (in the future, if there is a real need, it can
be easily re-enabled).

Will drop cache per thread in the next version.

Thanks,
Zhao



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v6 0/4] i386: Support SMP Cache Topology
  2025-01-03  8:25           ` Zhao Liu
@ 2025-01-03 14:04             ` Alireza Sanaee via
  2025-01-03 15:50               ` Zhao Liu
  0 siblings, 1 reply; 13+ messages in thread
From: Alireza Sanaee via @ 2025-01-03 14:04 UTC (permalink / raw)
  To: Zhao Liu
  Cc: Rob Herring, Paolo Bonzini, Daniel P . Berrang, Igor Mammedov,
	Eduardo Habkost, Marcel Apfelbaum, Philippe Mathieu-Daud,
	Yanan Wang, Michael S . Tsirkin, Richard Henderson,
	Jonathan Cameron, Sia Jee Heng, qemu-devel, kvm

On Fri, 3 Jan 2025 16:25:58 +0800
Zhao Liu <zhao1.liu@intel.com> wrote:

> On Thu, Jan 02, 2025 at 06:01:41PM +0000, Alireza Sanaee wrote:
> > Date: Thu, 2 Jan 2025 18:01:41 +0000
> > From: Alireza Sanaee <alireza.sanaee@huawei.com>
> > Subject: Re: [PATCH v6 0/4] i386: Support SMP Cache Topology
> > X-Mailer: Claws Mail 4.3.0 (GTK 3.24.42; x86_64-w64-mingw32)
> > 
> > On Thu, 2 Jan 2025 11:09:51 -0600
> > Rob Herring <robh@kernel.org> wrote:
> >   
> > > On Thu, Jan 2, 2025 at 8:57 AM Alireza Sanaee
> > > <alireza.sanaee@huawei.com> wrote:  
> > > >
> > > > On Wed, 25 Dec 2024 11:03:42 +0800
> > > > Zhao Liu <zhao1.liu@intel.com> wrote:
> > > >    
> > > > > > > About smp-cache
> > > > > > > ===============
> > > > > > >
> > > > > > > The API design has been discussed heavily in [3].
> > > > > > >
> > > > > > > Now, smp-cache is implemented as a array integrated in
> > > > > > > -machine. Though -machine currently can't support JSON
> > > > > > > format, this is the one of the directions of future.
> > > > > > >
> > > > > > > An example is as follows:
> > > > > > >
> > > > > > > smp_cache=smp-cache.0.cache=l1i,smp-cache.0.topology=core,smp-cache.1.cache=l1d,smp-cache.1.topology=core,smp-cache.2.cache=l2,smp-cache.2.topology=module,smp-cache.3.cache=l3,smp-cache.3.topology=die
> > > > > > >
> > > > > > > "cache" specifies the cache that the properties will be
> > > > > > > applied on. This field is the combination of cache level
> > > > > > > and cache type. Now it supports "l1d" (L1 data cache),
> > > > > > > "l1i" (L1 instruction cache), "l2" (L2 unified cache) and
> > > > > > > "l3" (L3 unified cache).
> > > > > > >
> > > > > > > "topology" field accepts CPU topology levels including
> > > > > > > "thread", "core", "module", "cluster", "die", "socket",
> > > > > > > "book", "drawer" and a special value "default".    
> > > > > >
> > > > > > Looks good; just one thing, does "thread" make sense?  I
> > > > > > think that it's almost by definition that threads within a
> > > > > > core share all caches, but maybe I'm missing some hardware
> > > > > > configurations. 
> > > > >
> > > > > Hi Paolo, merry Christmas. Yes, AFAIK, there's no hardware has
> > > > > thread level cache.    
> > > >
> > > > Hi Zhao and Paolo,
> > > >
> > > > While the example looks OK to me, and makes sense. But would be
> > > > curious to know more scenarios where I can legitimately see
> > > > benefit there.
> > > >
> > > > I am wrestling with this point on ARM too. If I were to
> > > > have device trees describing caches in a way that threads get
> > > > their own private caches then this would not be possible to be
> > > > described via device tree due to spec limitations (+CCed Rob)
> > > > if I understood correctly.    
> > > 
> > > You asked me for the opposite though, and I described how you can
> > > share the cache. If you want a cache per thread, then you probably
> > > want a node per thread.
> > > 
> > > Rob
> > >   
> > 
> > Hi Rob,
> > 
> > That's right, I made the mistake in my prior message, and you
> > recalled correctly. I wanted shared caches between two threads,
> > though I have missed your answer before, just found it.
> >   
> 
> Thank you all!
> 
> Alireza, do you know how to configure arm node through QEMU options?

Hi Zhao, do you mean the -smp param?
> 
> However, IIUC, arm needs more effort to configure cache per thread (by
> configuring node topology)...In that case, since no one has explicitly
> requested the need for cache per thread, I can disable cache per
> thread for now. I can return an error for this scenario during the
> general smp-cache option parsing (in the future, if there is a real
> need, it can be easily re-enabled).
> 
> Will drop cache per thread in the next version.
> 
> Thanks,
> Zhao
> 
> 



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v6 0/4] i386: Support SMP Cache Topology
  2025-01-03 14:04             ` Alireza Sanaee via
@ 2025-01-03 15:50               ` Zhao Liu
  0 siblings, 0 replies; 13+ messages in thread
From: Zhao Liu @ 2025-01-03 15:50 UTC (permalink / raw)
  To: Alireza Sanaee
  Cc: Rob Herring, Paolo Bonzini, Daniel P . Berrang, Igor Mammedov,
	Eduardo Habkost, Marcel Apfelbaum, Philippe Mathieu-Daud,
	Yanan Wang, Michael S . Tsirkin, Richard Henderson,
	Jonathan Cameron, Sia Jee Heng, qemu-devel, kvm

> > > > You asked me for the opposite though, and I described how you can
> > > > share the cache. If you want a cache per thread, then you probably
> > > > want a node per thread.
> > > > 
> > > > Rob
> > > >   
> > > 
> > > Hi Rob,
> > > 
> > > That's right, I made the mistake in my prior message, and you
> > > recalled correctly. I wanted shared caches between two threads,
> > > though I have missed your answer before, just found it.
> > >   
> > 
> > Thank you all!
> > 
> > Alireza, do you know how to configure arm node through QEMU options?
> 
> Hi Zhao, do you mean the -smp param?
>

I mean do you know how to configure something like "a node per thread"
by QEMU option? :-) I'm curious about the relationship between "node"
and the SMP topology on the ARM side in the current QEMU. I'm not sure
if this "node" refers to the NUMA node.

Thanks,
Zhao



^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2025-01-03 15:32 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-12-19  8:32 [PATCH v6 0/4] i386: Support SMP Cache Topology Zhao Liu
2024-12-19  8:32 ` [PATCH v6 1/4] i386/cpu: Support thread and module level cache topology Zhao Liu
2024-12-19  8:32 ` [PATCH v6 2/4] i386/cpu: Update cache topology with machine's configuration Zhao Liu
2024-12-19  8:32 ` [PATCH v6 3/4] i386/pc: Support cache topology in -machine for PC machine Zhao Liu
2024-12-19  8:32 ` [PATCH v6 4/4] i386/cpu: add has_caches flag to check smp_cache configuration Zhao Liu
2024-12-24 16:04 ` [PATCH v6 0/4] i386: Support SMP Cache Topology Paolo Bonzini
2024-12-25  3:03   ` Zhao Liu
2025-01-02 14:57     ` Alireza Sanaee via
2025-01-02 17:09       ` Rob Herring
2025-01-02 18:01         ` Alireza Sanaee via
2025-01-03  8:25           ` Zhao Liu
2025-01-03 14:04             ` Alireza Sanaee via
2025-01-03 15:50               ` Zhao Liu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).