* [PATCH v7 00/16] Support smp.clusters for x86 in QEMU
@ 2024-01-08  8:27 Zhao Liu
  2024-01-08  8:27 ` [PATCH v7 01/16] i386/cpu: Fix i/d-cache topology to core level for Intel CPU Zhao Liu
                   ` (16 more replies)
  0 siblings, 17 replies; 68+ messages in thread
From: Zhao Liu @ 2024-01-08  8:27 UTC (permalink / raw)
  To: Eduardo Habkost, Marcel Apfelbaum, Michael S . Tsirkin,
	Richard Henderson, Paolo Bonzini, Marcelo Tosatti
  Cc: qemu-devel, kvm, Zhenyu Wang, Zhuocheng Ding, Zhao Liu

From: Zhao Liu <zhao1.liu@intel.com>

Hi list,

This is our v7 patch series, rebased on the master branch at commit
d328fef93ae7 ("Merge tag 'pull-20231230' of
https://gitlab.com/rth7680/qemu into staging").

There are no changes since v6 [1] except for a comment nit update.

Comments are welcome!


PS: Since v5, we have dropped the "x-l2-cache-topo" option and are now
working on porting the original x-l2-cache-topo functionality to smp
[2], like:

-smp cpus=4,sockets=2,cores=2,threads=1, \
     l3-cache=socket,l2-cache=core,l1-i-cache=core,l1-d-cache=core

The cache topology enhancement in this patch set is preparation for
supporting user-configurable cache topology in the future (via a
generic CLI interface).


---
# Introduction

This series adds cluster support for the x86 PC machine, allowing x86
to use smp.clusters to configure the module-level CPU topology.

This series also prepares x86 to define a more flexible cache topology,
such as having multiple cores share the same L2 cache at the cluster
level. (That is what x-l2-cache-topo did, and we will explore a generic
way.)

For why we don't share the L2 cache at the cluster level by default
and need a way to configure it instead, please see the section "## Why
not share L2 cache in cluster directly".


# Background

The "clusters" parameter in "smp" is introduced by ARM [3], but x86
hasn't supported it.

At present, x86 defaults to the L2 cache being shared within one core,
but this is not enough. There are platforms where multiple cores share
the same L2 cache; e.g., Alder Lake-P shares one L2 cache per module of
Atom cores [4], that is, every four Atom cores share one L2 cache.
Therefore, we need the new CPU topology level (cluster/module).

Another reason is hybrid architectures. Cluster support not only
provides another level of topology definition for x86, but also
provides the code changes required for our future hybrid topology
support.


# Overview

## Introduction of module level for x86

"cluster" in smp is the CPU topology level which is between "core" and
die.

For x86, the "cluster" in smp is corresponding to the module level [4],
which is above the core level. So use the "module" other than "cluster"
in x86 code.

And please note that x86 already has a CPU topology level also named
"cluster" [5]; that level sits above the package. The cluster in the
x86 CPU topology is completely different from the "clusters" smp
parameter. After the module level is introduced, the cluster smp
parameter will actually refer to the module level of x86.


## Why not share L2 cache in cluster directly

Though "clusters" was introduced to help define L2 cache topology
[3], using cluster to define x86's L2 cache topology will cause the
compatibility problem:

Currently, x86 defaults to the L2 cache being shared within one core,
which actually implies a default setting of "1 core per L2 cache" and
therefore implicitly defaults to having as many L2 caches as cores.

For example (i386 PC machine):
-smp 16,sockets=2,dies=2,cores=2,threads=2,maxcpus=16 (*)

Considering the topology of the L2 cache, this (*) implicitly means "1
core per L2 cache" and "2 L2 caches per die".

If we used cluster to configure the L2 cache topology with a new
default setting of "1 cluster per L2 cache", the above semantics would
change to "2 cores per cluster" and "1 cluster per L2 cache", that is,
"2 cores per L2 cache".

So the same command (*) would change the L2 cache topology, further
affecting the performance of the virtual machine.

Therefore, x86 should only treat cluster as a CPU topology level and,
for compatibility, avoid using it to change the L2 cache topology by
default.
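
To make the implicit change concrete (an illustrative sketch; the
second command spells out the cluster count -smp would infer, and the
L2 counts assume L2 were defined by the cluster level):

  # Today: L2 is per-core; (*) yields 8 L2 caches (2 per die x 4 dies).
  -smp 16,sockets=2,dies=2,cores=2,threads=2,maxcpus=16

  # If "1 cluster per L2 cache" were the default, (*) would read as:
  -smp 16,sockets=2,dies=2,clusters=1,cores=2,threads=2,maxcpus=16
  # i.e. 2 cores per cluster -> 4 L2 caches (1 per die x 4 dies).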


## module level in CPUID

The Linux kernel (from v6.4, with commit edc0a2b595765 ("x86/topology:
Fix erroneous smp_num_siblings on Intel Hybrid platforms")) is able to
handle platforms with the module level enumerated via CPUID.1F.

Since v3, we expose the module level in CPUID[0x1F] (for Intel CPUs) if
the machine has more than 1 module.
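
One way to check this from inside a Linux guest, assuming the cpuid(1)
utility is installed there, is to dump the leaf's subleaves:

  cpuid -1 -l 0x1f -s 0    # subleaf 0 (SMT level)
  cpuid -1 -l 0x1f -s 2    # subleaf 2 (module level, if enumerated)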


## New cache topology info in CPUCacheInfo

(This is in preparation for users being able to configure the cache
topology from the CLI later on.)

Currently, by default, the cache topology is encoded as:
1. i/d cache is shared in one core.
2. L2 cache is shared in one core.
3. L3 cache is shared in one die.

This general default has caused a misunderstanding: the cache topology
is completely equated with a specific CPU topology, such as the
connection between the L2 cache and the core level, or between the L3
cache and the die level.

In fact, these topology settings depend on the specific platform and
are not static. For example, on Alder Lake-P, every four Atom cores
share the same L2 cache [4].

Thus, in this patch set, we explicitly define the corresponding cache
topology for the different CPU models, which has two benefits:
1. It is easy to extend to new CPU models with different cache
   topologies in the future.
2. It can easily support custom cache topologies configured by some
   command (a sketch of the new field follows this list).
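
As an illustrative fragment of the idea (the field name comes from the
patch titles in this series; the surrounding struct layout here is a
sketch, not the full QEMU definition):

  typedef struct CPUCacheInfo {
      /* ... existing fields: level, size, associativity, ... */

      /*
       * Topology level at which this cache is shared, e.g.
       * CPU_TOPO_LEVEL_CORE for L1/L2 or CPU_TOPO_LEVEL_DIE for L3.
       */
      enum CPUTopoLevel share_level;
  } CPUCacheInfo;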


# Patch description

patch 1 Fix the i/d-cache topology to the core level for Intel CPUs.

patch 2-3 Cleanups of topology-related CPUID encoding and QEMU
          topology variables.

patch 4-5 Refactor CPUID[0x1F] encoding to prepare for introducing the
          module level.

patch 6-12 Add module as a new CPU topology level in x86,
           corresponding to the cluster level in generic code.

patch 13,14,16 Add cache topology information to the cache models.

patch 15 Update the cache topology encoding for AMD CPUs.


[1]: https://lore.kernel.org/qemu-devel/20231117075106.432499-1-zhao1.liu@linux.intel.com/
[2]: https://lists.gnu.org/archive/html/qemu-devel/2023-10/msg01954.html
[3]: https://patchew.org/QEMU/20211228092221.21068-1-wangyanan55@huawei.com/
[4]: https://www.intel.com/content/www/us/en/products/platforms/details/alder-lake-p.html
[5]: SDM, vol.3, ch.9, 9.9.1 Hierarchical Mapping of Shared Resources.

Best Regards,
Zhao
---
Changelog:

Changes since v6:
 * Update the comment for the cluster-id check: since there's no
   longer a v8.2 cycle, the cluster-id support should start from v9.0
   at the earliest.
 * Rebase on commit d328fef93ae7 ("Merge tag 'pull-20231230' of
   https://gitlab.com/rth7680/qemu into staging").

Changes since v5:
 * The first four patches of v5 [1] have been merged, v6 contains
   the remaining patches.
 * Rebase on the latest master.
 * Update the comment for the cluster-id check: since the current QEMU
   is v8.2, the cluster-id support should start from v8.3 at the
   earliest.

Changes since v4:
 * Drop the "x-l2-cache-topo" option. (Michael)
 * Add A/R/T tags.

Changes since v3 (main changes):
 * Expose module level in CPUID[0x1F].
 * Fix compile warnings. (Babu)
 * Fix uninitialized cache topology for some AMD CPUs. (Babu)

Changes since v2:
 * Add "Tested-by", "Reviewed-by" and "ACKed-by" tags.
 * Use newly added wrapped helper to get cores per socket in
   qemu_init_vcpu().

Changes since v1:
 * Reordered patches. (Yanan)
 * Deprecated the patch to fix comment of machine_parse_smp_config().
   (Yanan)
 * Rename test-x86-cpuid.c to test-x86-topo.c. (Yanan)
 * Split the intel's l1 cache topology fix into a new separate patch.
   (Yanan)
 * Combined module_id and APIC ID for module level support into one
   patch. (Yanan)
 * Make the cache_info_passthrough case of the CPUID 0x04 leaf in
   cpu_x86_cpuid() use max_processor_ids_for_cache() and
   max_core_ids_in_package() to encode CPUID[4]. (Yanan)
 * Add the prefix "CPU_TOPO_LEVEL_*" for CPU topology level names.
   (Yanan)

---
Zhao Liu (10):
  i386/cpu: Fix i/d-cache topology to core level for Intel CPU
  i386/cpu: Use APIC ID offset to encode cache topo in CPUID[4]
  i386/cpu: Consolidate the use of topo_info in cpu_x86_cpuid()
  i386: Split topology types of CPUID[0x1F] from the definitions of
    CPUID[0xB]
  i386: Decouple CPUID[0x1F] subleaf with specific topology level
  i386: Expose module level in CPUID[0x1F]
  i386: Add cache topology info in CPUCacheInfo
  i386: Use CPUCacheInfo.share_level to encode CPUID[4]
  i386: Use offsets to get NumSharingCache for
    CPUID[0x8000001D].EAX[bits 25:14]
  i386: Use CPUCacheInfo.share_level to encode
    CPUID[0x8000001D].EAX[bits 25:14]

Zhuocheng Ding (6):
  i386: Introduce module-level cpu topology to CPUX86State
  i386: Support modules_per_die in X86CPUTopoInfo
  i386: Support module_id in X86CPUTopoIDs
  i386/cpu: Introduce cluster-id to X86CPU
  tests: Add test case of APIC ID for module level parsing
  hw/i386/pc: Support smp.clusters for x86 PC machine

 hw/i386/pc.c               |   1 +
 hw/i386/x86.c              |  49 ++++++-
 include/hw/i386/topology.h |  35 ++++-
 qemu-options.hx            |  10 +-
 target/i386/cpu.c          | 289 +++++++++++++++++++++++++++++--------
 target/i386/cpu.h          |  43 +++++-
 target/i386/kvm/kvm.c      |   2 +-
 tests/unit/test-x86-topo.c |  56 ++++---
 8 files changed, 379 insertions(+), 106 deletions(-)

-- 
2.34.1




* [PATCH v7 01/16] i386/cpu: Fix i/d-cache topology to core level for Intel CPU
  2024-01-08  8:27 [PATCH v7 00/16] Support smp.clusters for x86 in QEMU Zhao Liu
@ 2024-01-08  8:27 ` Zhao Liu
  2024-01-08  8:27 ` [PATCH v7 02/16] i386/cpu: Use APIC ID offset to encode cache topo in CPUID[4] Zhao Liu
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 68+ messages in thread
From: Zhao Liu @ 2024-01-08  8:27 UTC (permalink / raw)
  To: Eduardo Habkost, Marcel Apfelbaum, Michael S . Tsirkin,
	Richard Henderson, Paolo Bonzini, Marcelo Tosatti
  Cc: qemu-devel, kvm, Zhenyu Wang, Zhuocheng Ding, Zhao Liu,
	Robert Hoo, Xiaoyao Li, Babu Moger, Yongwei Ma

From: Zhao Liu <zhao1.liu@intel.com>

For the i-cache and d-cache, current QEMU hardcodes the maximum IDs
for CPUs sharing the cache (CPUID.04H.00H:EAX[bits 25:14] and
CPUID.04H.01H:EAX[bits 25:14]) to 0, which means the i-cache and
d-cache are shared at the SMT level.

This is correct if there is a single thread per core, but is wrong for
the hyper-threading case (one core contains multiple threads), since
the i-cache and d-cache are shared at the core level rather than the
SMT level.

For AMD CPUs, commit 8f4202fb1080 ("i386: Populate AMD Processor Cache
Information for cpuid 0x8000001D") has already introduced i/d cache
topology at the core level by default.

Therefore, in order to be compatible with both multi-threaded and
single-threaded situations, we should set the i-cache and d-cache to be
shared at the core level by default.

This fix changes the default i/d cache topology from per-thread to
per-core. Potentially, this change in L1 cache topology may affect the
performance of the VM if the user does not explicitly specify the
topology or bind the vCPUs. However, the way to achieve optimal
performance is to create a reasonable topology and set the appropriate
vCPU affinity, rather than relying on QEMU's default topology.
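
For example (illustrative numbers, not taken from the patch): with
-smp 4,sockets=1,cores=2,threads=2, the guest previously read

  CPUID.04H.00H:EAX[25:14] = 1 - 1 = 0    (L1d private to each thread)

and with this fix reads

  CPUID.04H.00H:EAX[25:14] = nr_threads - 1 = 1    (L1d shared by both
  hyper-threads of a core)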

Fixes: 7e3482f82480 ("i386: Helpers to encode cache information consistently")
Suggested-by: Robert Hoo <robert.hu@linux.intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
---
Changes since v3:
 * Change the description of the current i/d cache encoding status to
   avoid misleading implications about "architectural rules". (Xiaoyao)

Changes since v1:
 * Split this fix from the patch named "i386/cpu: Fix number of
   addressable IDs in CPUID.04H".
 * Add the explanation of the impact on performance. (Xiaoyao)
---
 target/i386/cpu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 95d5f16cd5eb..5a3678a789cf 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -6113,12 +6113,12 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
             switch (count) {
             case 0: /* L1 dcache info */
                 encode_cache_cpuid4(env->cache_info_cpuid4.l1d_cache,
-                                    1, cs->nr_cores,
+                                    cs->nr_threads, cs->nr_cores,
                                     eax, ebx, ecx, edx);
                 break;
             case 1: /* L1 icache info */
                 encode_cache_cpuid4(env->cache_info_cpuid4.l1i_cache,
-                                    1, cs->nr_cores,
+                                    cs->nr_threads, cs->nr_cores,
                                     eax, ebx, ecx, edx);
                 break;
             case 2: /* L2 cache info */
-- 
2.34.1




* [PATCH v7 02/16] i386/cpu: Use APIC ID offset to encode cache topo in CPUID[4]
  2024-01-08  8:27 [PATCH v7 00/16] Support smp.clusters for x86 in QEMU Zhao Liu
  2024-01-08  8:27 ` [PATCH v7 01/16] i386/cpu: Fix i/d-cache topology to core level for Intel CPU Zhao Liu
@ 2024-01-08  8:27 ` Zhao Liu
  2024-01-10  9:31   ` Xiaoyao Li
  2024-01-08  8:27 ` [PATCH v7 03/16] i386/cpu: Consolidate the use of topo_info in cpu_x86_cpuid() Zhao Liu
                   ` (14 subsequent siblings)
  16 siblings, 1 reply; 68+ messages in thread
From: Zhao Liu @ 2024-01-08  8:27 UTC (permalink / raw)
  To: Eduardo Habkost, Marcel Apfelbaum, Michael S . Tsirkin,
	Richard Henderson, Paolo Bonzini, Marcelo Tosatti
  Cc: qemu-devel, kvm, Zhenyu Wang, Zhuocheng Ding, Zhao Liu,
	Robert Hoo, Babu Moger, Yongwei Ma

From: Zhao Liu <zhao1.liu@intel.com>

Referring to the fixes for cache_info_passthrough ([1], [2]) and the
SDM, CPUID.04H:EAX[bits 25:14] and CPUID.04H:EAX[bits 31:26] should use
the nearest power-of-2 integer.

The nearest power-of-2 integer can be calculated by pow2ceil() or from
the APIC ID offset (like the L3 topology using 1 << die_offset [3]).

But in fact, CPUID.04H:EAX[bits 25:14] and CPUID.04H:EAX[bits 31:26]
are associated with the APIC ID. For example, in the Linux kernel, the
field "num_threads_sharing" (bits 25 - 14) is parsed together with the
APIC ID. As another example, on Alder Lake-P, CPUID.04H:EAX[bits 31:26]
does not match the actual core count and is instead calculated as
"(1 << (pkg_offset - core_offset)) - 1".

Therefore, the APIC ID offsets should be preferred for calculating the
nearest power-of-2 integer for CPUID.04H:EAX[bits 25:14] and
CPUID.04H:EAX[bits 31:26] (see the worked sketch below):
1. d/i cache is shared in a core, so 1 << core_offset should be used
   instead of "cs->nr_threads" in encode_cache_cpuid4() for
   CPUID.04H.00H:EAX[bits 25:14] and CPUID.04H.01H:EAX[bits 25:14].
2. L2 cache is supposed to be shared in a core for now, so
   1 << core_offset should also be used instead of "cs->nr_threads" in
   encode_cache_cpuid4() for CPUID.04H.02H:EAX[bits 25:14].
3. Similarly, the value for CPUID.04H:EAX[bits 31:26] should also be
   calculated from the bit width between the package and SMT levels in
   the APIC ID ((1 << (pkg_offset - core_offset)) - 1).

In addition, use APIC ID offset to replace "pow2ceil()" for
cache_info_passthrough case.
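
As a worked illustration of the offset math (a standalone C sketch,
not QEMU code; the -smp values are hypothetical):

  #include <stdio.h>

  /* ceil(log2(count)): what apicid_bitwidth_for_count() computes in
   * include/hw/i386/topology.h. */
  static unsigned bitwidth_for_count(unsigned count)
  {
      unsigned width = 0;
      for (count -= 1; count; count >>= 1) {
          width++;
      }
      return width;
  }

  int main(void)
  {
      /* Hypothetical guest: -smp 8,sockets=1,cores=4,threads=2 */
      unsigned core_offset = bitwidth_for_count(2);              /* 1 */
      unsigned pkg_offset = core_offset + bitwidth_for_count(4); /* 3 */

      /* CPUID.04H:EAX[31:26], max addressable core IDs - 1: 3 */
      printf("EAX[31:26] = %u\n", (1u << (pkg_offset - core_offset)) - 1);
      /* CPUID.04H.00H:EAX[25:14], max IDs sharing the L1 - 1: 1 */
      printf("EAX[25:14] = %u\n", (1u << core_offset) - 1);
      return 0;
  }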

[1]: efb3934adf9e ("x86: cpu: make sure number of addressable IDs for processor cores meets the spec")
[2]: d7caf13b5fcf ("x86: cpu: fixup number of addressable IDs for logical processors sharing cache")
[3]: d65af288a84d ("i386: Update new x86_apicid parsing rules with die_offset support")

Fixes: 7e3482f82480 ("i386: Helpers to encode cache information consistently")
Suggested-by: Robert Hoo <robert.hu@linux.intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
---
Changes since v3:
 * Fix compile warnings. (Babu)
 * Fix spelling typo.

Changes since v1:
 * Use APIC ID offset to replace "pow2ceil()" for cache_info_passthrough
   case. (Yanan)
 * Split the L1 cache fix into a separate patch.
 * Rename the title of this patch (the original is "i386/cpu: Fix number
   of addressable IDs in CPUID.04H").
---
 target/i386/cpu.c | 30 +++++++++++++++++++++++-------
 1 file changed, 23 insertions(+), 7 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 5a3678a789cf..c8d2a585723a 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -6014,7 +6014,6 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
 {
     X86CPU *cpu = env_archcpu(env);
     CPUState *cs = env_cpu(env);
-    uint32_t die_offset;
     uint32_t limit;
     uint32_t signature[3];
     X86CPUTopoInfo topo_info;
@@ -6098,39 +6097,56 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
                 int host_vcpus_per_cache = 1 + ((*eax & 0x3FFC000) >> 14);
                 int vcpus_per_socket = cs->nr_cores * cs->nr_threads;
                 if (cs->nr_cores > 1) {
+                    int addressable_cores_offset =
+                                                apicid_pkg_offset(&topo_info) -
+                                                apicid_core_offset(&topo_info);
+
                     *eax &= ~0xFC000000;
-                    *eax |= (pow2ceil(cs->nr_cores) - 1) << 26;
+                    *eax |= ((1 << addressable_cores_offset) - 1) << 26;
                 }
                 if (host_vcpus_per_cache > vcpus_per_socket) {
+                    int pkg_offset = apicid_pkg_offset(&topo_info);
+
                     *eax &= ~0x3FFC000;
-                    *eax |= (pow2ceil(vcpus_per_socket) - 1) << 14;
+                    *eax |= ((1 << pkg_offset) - 1) << 14;
                 }
             }
         } else if (cpu->vendor_cpuid_only && IS_AMD_CPU(env)) {
             *eax = *ebx = *ecx = *edx = 0;
         } else {
             *eax = 0;
+            int addressable_cores_offset = apicid_pkg_offset(&topo_info) -
+                                           apicid_core_offset(&topo_info);
+            int core_offset, die_offset;
+
             switch (count) {
             case 0: /* L1 dcache info */
+                core_offset = apicid_core_offset(&topo_info);
                 encode_cache_cpuid4(env->cache_info_cpuid4.l1d_cache,
-                                    cs->nr_threads, cs->nr_cores,
+                                    (1 << core_offset),
+                                    (1 << addressable_cores_offset),
                                     eax, ebx, ecx, edx);
                 break;
             case 1: /* L1 icache info */
+                core_offset = apicid_core_offset(&topo_info);
                 encode_cache_cpuid4(env->cache_info_cpuid4.l1i_cache,
-                                    cs->nr_threads, cs->nr_cores,
+                                    (1 << core_offset),
+                                    (1 << addressable_cores_offset),
                                     eax, ebx, ecx, edx);
                 break;
             case 2: /* L2 cache info */
+                core_offset = apicid_core_offset(&topo_info);
                 encode_cache_cpuid4(env->cache_info_cpuid4.l2_cache,
-                                    cs->nr_threads, cs->nr_cores,
+                                    (1 << core_offset),
+                                    (1 << addressable_cores_offset),
                                     eax, ebx, ecx, edx);
                 break;
             case 3: /* L3 cache info */
                 die_offset = apicid_die_offset(&topo_info);
                 if (cpu->enable_l3_cache) {
                     encode_cache_cpuid4(env->cache_info_cpuid4.l3_cache,
-                                        (1 << die_offset), cs->nr_cores,
+                                        (1 << die_offset),
+                                        (1 << addressable_cores_offset),
                                         eax, ebx, ecx, edx);
                     break;
                 }
-- 
2.34.1




* [PATCH v7 03/16] i386/cpu: Consolidate the use of topo_info in cpu_x86_cpuid()
  2024-01-08  8:27 [PATCH v7 00/16] Support smp.clusters for x86 in QEMU Zhao Liu
  2024-01-08  8:27 ` [PATCH v7 01/16] i386/cpu: Fix i/d-cache topology to core level for Intel CPU Zhao Liu
  2024-01-08  8:27 ` [PATCH v7 02/16] i386/cpu: Use APIC ID offset to encode cache topo in CPUID[4] Zhao Liu
@ 2024-01-08  8:27 ` Zhao Liu
  2024-01-10 11:52   ` Xiaoyao Li
  2024-01-08  8:27 ` [PATCH v7 04/16] i386: Split topology types of CPUID[0x1F] from the definitions of CPUID[0xB] Zhao Liu
                   ` (13 subsequent siblings)
  16 siblings, 1 reply; 68+ messages in thread
From: Zhao Liu @ 2024-01-08  8:27 UTC (permalink / raw)
  To: Eduardo Habkost, Marcel Apfelbaum, Michael S . Tsirkin,
	Richard Henderson, Paolo Bonzini, Marcelo Tosatti
  Cc: qemu-devel, kvm, Zhenyu Wang, Zhuocheng Ding, Zhao Liu,
	Robert Hoo, Babu Moger, Yongwei Ma

From: Zhao Liu <zhao1.liu@intel.com>

In cpu_x86_cpuid(), there are many variables representing the CPU
topology, e.g., topo_info and cs->nr_cores/cs->nr_threads.

Since the names of cs->nr_cores/cs->nr_threads do not accurately
represent their meaning, using cs->nr_cores/cs->nr_threads is prone to
confusion and mistakes.

The structure X86CPUTopoInfo names its members clearly, so the variable
"topo_info" should be preferred.

In addition, to use the topology variables uniformly in
cpu_x86_cpuid(), replace env->dies with topo_info.dies_per_pkg as well.
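
For reference, the relationship between the two sets of variables, as
initialized in the hunk below:

  cs->nr_cores               /* cores per package, summed over dies */
  cs->nr_threads             /* threads per core */
  topo_info.dies_per_pkg     = env->nr_dies;
  topo_info.cores_per_die    = cs->nr_cores / env->nr_dies;
  topo_info.threads_per_core = cs->nr_threads;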

Suggested-by: Robert Hoo <robert.hu@linux.intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
---
Changes since v3:
 * Fix typo. (Babu)

Changes since v1:
 * Extract cores_per_socket from the code block and use it as a local
   variable for cpu_x86_cpuid(). (Yanan)
 * Remove vcpus_per_socket variable and use cpus_per_pkg directly.
   (Yanan)
 * Replace env->dies with topo_info.dies_per_pkg in cpu_x86_cpuid().
---
 target/i386/cpu.c | 31 ++++++++++++++++++-------------
 1 file changed, 18 insertions(+), 13 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index c8d2a585723a..6f8fa772ecf8 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -6017,11 +6017,16 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
     uint32_t limit;
     uint32_t signature[3];
     X86CPUTopoInfo topo_info;
+    uint32_t cores_per_pkg;
+    uint32_t cpus_per_pkg;
 
     topo_info.dies_per_pkg = env->nr_dies;
     topo_info.cores_per_die = cs->nr_cores / env->nr_dies;
     topo_info.threads_per_core = cs->nr_threads;
 
+    cores_per_pkg = topo_info.cores_per_die * topo_info.dies_per_pkg;
+    cpus_per_pkg = cores_per_pkg * topo_info.threads_per_core;
+
     /* Calculate & apply limits for different index ranges */
     if (index >= 0xC0000000) {
         limit = env->cpuid_xlevel2;
@@ -6057,8 +6062,8 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
             *ecx |= CPUID_EXT_OSXSAVE;
         }
         *edx = env->features[FEAT_1_EDX];
-        if (cs->nr_cores * cs->nr_threads > 1) {
-            *ebx |= (cs->nr_cores * cs->nr_threads) << 16;
+        if (cpus_per_pkg > 1) {
+            *ebx |= cpus_per_pkg << 16;
             *edx |= CPUID_HT;
         }
         if (!cpu->enable_pmu) {
@@ -6095,8 +6100,8 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
              */
             if (*eax & 31) {
                 int host_vcpus_per_cache = 1 + ((*eax & 0x3FFC000) >> 14);
-                int vcpus_per_socket = cs->nr_cores * cs->nr_threads;
-                if (cs->nr_cores > 1) {
+
+                if (cores_per_pkg > 1) {
                     int addressable_cores_offset =
                                                 apicid_pkg_offset(&topo_info) -
                                                 apicid_core_offset(&topo_info);
@@ -6104,7 +6109,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
                     *eax &= ~0xFC000000;
                     *eax |= ((1 << addressable_cores_offset) - 1) << 26;
                 }
-                if (host_vcpus_per_cache > vcpus_per_socket) {
+                if (host_vcpus_per_cache > cpus_per_pkg) {
                     int pkg_offset = apicid_pkg_offset(&topo_info);
 
                     *eax &= ~0x3FFC000;
@@ -6249,12 +6254,12 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
         switch (count) {
         case 0:
             *eax = apicid_core_offset(&topo_info);
-            *ebx = cs->nr_threads;
+            *ebx = topo_info.threads_per_core;
             *ecx |= CPUID_TOPOLOGY_LEVEL_SMT;
             break;
         case 1:
             *eax = apicid_pkg_offset(&topo_info);
-            *ebx = cs->nr_cores * cs->nr_threads;
+            *ebx = cpus_per_pkg;
             *ecx |= CPUID_TOPOLOGY_LEVEL_CORE;
             break;
         default:
@@ -6274,7 +6279,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
         break;
     case 0x1F:
         /* V2 Extended Topology Enumeration Leaf */
-        if (env->nr_dies < 2) {
+        if (topo_info.dies_per_pkg < 2) {
             *eax = *ebx = *ecx = *edx = 0;
             break;
         }
@@ -6284,7 +6289,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
         switch (count) {
         case 0:
             *eax = apicid_core_offset(&topo_info);
-            *ebx = cs->nr_threads;
+            *ebx = topo_info.threads_per_core;
             *ecx |= CPUID_TOPOLOGY_LEVEL_SMT;
             break;
         case 1:
@@ -6294,7 +6299,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
             break;
         case 2:
             *eax = apicid_pkg_offset(&topo_info);
-            *ebx = cs->nr_cores * cs->nr_threads;
+            *ebx = cpus_per_pkg;
             *ecx |= CPUID_TOPOLOGY_LEVEL_DIE;
             break;
         default:
@@ -6518,7 +6523,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
          * discards multiple thread information if it is set.
          * So don't set it here for Intel to make Linux guests happy.
          */
-        if (cs->nr_cores * cs->nr_threads > 1) {
+        if (cpus_per_pkg > 1) {
             if (env->cpuid_vendor1 != CPUID_VENDOR_INTEL_1 ||
                 env->cpuid_vendor2 != CPUID_VENDOR_INTEL_2 ||
                 env->cpuid_vendor3 != CPUID_VENDOR_INTEL_3) {
@@ -6584,7 +6589,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
              *eax |= (cpu_x86_virtual_addr_width(env) << 8);
         }
         *ebx = env->features[FEAT_8000_0008_EBX];
-        if (cs->nr_cores * cs->nr_threads > 1) {
+        if (cpus_per_pkg > 1) {
             /*
              * Bits 15:12 is "The number of bits in the initial
              * Core::X86::Apic::ApicId[ApicId] value that indicate
@@ -6592,7 +6597,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
              * Bits 7:0 is "The number of threads in the package is NC+1"
              */
             *ecx = (apicid_pkg_offset(&topo_info) << 12) |
-                   ((cs->nr_cores * cs->nr_threads) - 1);
+                   (cpus_per_pkg - 1);
         } else {
             *ecx = 0;
         }
-- 
2.34.1




* [PATCH v7 04/16] i386: Split topology types of CPUID[0x1F] from the definitions of CPUID[0xB]
  2024-01-08  8:27 [PATCH v7 00/16] Support smp.clusters for x86 in QEMU Zhao Liu
                   ` (2 preceding siblings ...)
  2024-01-08  8:27 ` [PATCH v7 03/16] i386/cpu: Consolidate the use of topo_info in cpu_x86_cpuid() Zhao Liu
@ 2024-01-08  8:27 ` Zhao Liu
  2024-01-08  8:27 ` [PATCH v7 05/16] i386: Decouple CPUID[0x1F] subleaf with specific topology level Zhao Liu
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 68+ messages in thread
From: Zhao Liu @ 2024-01-08  8:27 UTC (permalink / raw)
  To: Eduardo Habkost, Marcel Apfelbaum, Michael S . Tsirkin,
	Richard Henderson, Paolo Bonzini, Marcelo Tosatti
  Cc: qemu-devel, kvm, Zhenyu Wang, Zhuocheng Ding, Zhao Liu,
	Babu Moger, Yongwei Ma

From: Zhao Liu <zhao1.liu@intel.com>

CPUID[0xB] defines the SMT, Core and Invalid types, and this leaf is
shared by Intel and AMD CPUs.

But for the extended topology levels, Intel CPUs (in CPUID[0x1F]) and
AMD CPUs (in CPUID[0x80000026]) have different definitions with
different enumeration values.

Though CPUID[0x80000026] hasn't been implemented in QEMU yet, to avoid
possible misunderstanding, split the topology types of CPUID[0x1F] from
the definitions of CPUID[0xB] and introduce CPUID[0x1F]-specific
topology types.
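
(For illustration, per Intel's SDM and AMD's APM: CPUID[0x1F].ECX[15:8]
uses Invalid=0, SMT=1, Core=2, Module=3, Tile=4, Die=5, while AMD's
CPUID[0x80000026].ECX[15:8] uses Core=1, Complex=2, CCD/Die=3,
Socket=4; the same numeric value can therefore name different levels on
the two vendors, which is why shared defines would be misleading.)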

Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
---
Changes since v3:
 * New commit to prepare to refactor CPUID[0x1F] encoding.
---
 target/i386/cpu.c | 14 +++++++-------
 target/i386/cpu.h | 13 +++++++++----
 2 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 6f8fa772ecf8..bc440477d13d 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -6255,17 +6255,17 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
         case 0:
             *eax = apicid_core_offset(&topo_info);
             *ebx = topo_info.threads_per_core;
-            *ecx |= CPUID_TOPOLOGY_LEVEL_SMT;
+            *ecx |= CPUID_B_ECX_TOPO_LEVEL_SMT << 8;
             break;
         case 1:
             *eax = apicid_pkg_offset(&topo_info);
             *ebx = cpus_per_pkg;
-            *ecx |= CPUID_TOPOLOGY_LEVEL_CORE;
+            *ecx |= CPUID_B_ECX_TOPO_LEVEL_CORE << 8;
             break;
         default:
             *eax = 0;
             *ebx = 0;
-            *ecx |= CPUID_TOPOLOGY_LEVEL_INVALID;
+            *ecx |= CPUID_B_ECX_TOPO_LEVEL_INVALID << 8;
         }
 
         assert(!(*eax & ~0x1f));
@@ -6290,22 +6290,22 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
         case 0:
             *eax = apicid_core_offset(&topo_info);
             *ebx = topo_info.threads_per_core;
-            *ecx |= CPUID_TOPOLOGY_LEVEL_SMT;
+            *ecx |= CPUID_1F_ECX_TOPO_LEVEL_SMT << 8;
             break;
         case 1:
             *eax = apicid_die_offset(&topo_info);
             *ebx = topo_info.cores_per_die * topo_info.threads_per_core;
-            *ecx |= CPUID_TOPOLOGY_LEVEL_CORE;
+            *ecx |= CPUID_1F_ECX_TOPO_LEVEL_CORE << 8;
             break;
         case 2:
             *eax = apicid_pkg_offset(&topo_info);
             *ebx = cpus_per_pkg;
-            *ecx |= CPUID_TOPOLOGY_LEVEL_DIE;
+            *ecx |= CPUID_1F_ECX_TOPO_LEVEL_DIE << 8;
             break;
         default:
             *eax = 0;
             *ebx = 0;
-            *ecx |= CPUID_TOPOLOGY_LEVEL_INVALID;
+            *ecx |= CPUID_1F_ECX_TOPO_LEVEL_INVALID << 8;
         }
         assert(!(*eax & ~0x1f));
         *ebx &= 0xffff; /* The count doesn't need to be reliable. */
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index ef987f344cff..f47bad46db5e 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1009,10 +1009,15 @@ uint64_t x86_cpu_get_supported_feature_word(FeatureWord w,
 #define CPUID_MWAIT_EMX     (1U << 0) /* enumeration supported */
 
 /* CPUID[0xB].ECX level types */
-#define CPUID_TOPOLOGY_LEVEL_INVALID  (0U << 8)
-#define CPUID_TOPOLOGY_LEVEL_SMT      (1U << 8)
-#define CPUID_TOPOLOGY_LEVEL_CORE     (2U << 8)
-#define CPUID_TOPOLOGY_LEVEL_DIE      (5U << 8)
+#define CPUID_B_ECX_TOPO_LEVEL_INVALID  0
+#define CPUID_B_ECX_TOPO_LEVEL_SMT      1
+#define CPUID_B_ECX_TOPO_LEVEL_CORE     2
+
+/* CPUID[0x1F].ECX level types */
+#define CPUID_1F_ECX_TOPO_LEVEL_INVALID  CPUID_B_ECX_TOPO_LEVEL_INVALID
+#define CPUID_1F_ECX_TOPO_LEVEL_SMT      CPUID_B_ECX_TOPO_LEVEL_SMT
+#define CPUID_1F_ECX_TOPO_LEVEL_CORE     CPUID_B_ECX_TOPO_LEVEL_CORE
+#define CPUID_1F_ECX_TOPO_LEVEL_DIE      5
 
 /* MSR Feature Bits */
 #define MSR_ARCH_CAP_RDCL_NO            (1U << 0)
-- 
2.34.1




* [PATCH v7 05/16] i386: Decouple CPUID[0x1F] subleaf with specific topology level
  2024-01-08  8:27 [PATCH v7 00/16] Support smp.clusters for x86 in QEMU Zhao Liu
                   ` (3 preceding siblings ...)
  2024-01-08  8:27 ` [PATCH v7 04/16] i386: Split topology types of CPUID[0x1F] from the definitions of CPUID[0xB] Zhao Liu
@ 2024-01-08  8:27 ` Zhao Liu
  2024-01-11  3:19   ` Xiaoyao Li
  2024-01-08  8:27 ` [PATCH v7 06/16] i386: Introduce module-level cpu topology to CPUX86State Zhao Liu
                   ` (11 subsequent siblings)
  16 siblings, 1 reply; 68+ messages in thread
From: Zhao Liu @ 2024-01-08  8:27 UTC (permalink / raw)
  To: Eduardo Habkost, Marcel Apfelbaum, Michael S . Tsirkin,
	Richard Henderson, Paolo Bonzini, Marcelo Tosatti
  Cc: qemu-devel, kvm, Zhenyu Wang, Zhuocheng Ding, Zhao Liu,
	Babu Moger, Yongwei Ma

From: Zhao Liu <zhao1.liu@intel.com>

At present, subleaf 0x02 of CPUID[0x1F] is bound to the "die" level.

In fact, the specific topology levels exposed in 0x1F depend on the
platform's support for the extension levels (module, tile and die).

To help expose the "module" level in 0x1F, decouple the CPUID[0x1F]
subleaves from specific topology levels.
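
As an illustrative walk of the new encoding (for a guest with dies > 1
and no module level, derived from the function added below):

  subleaf 0: ECX[15:8] = SMT,  EAX = core_offset, EBX = threads per core
  subleaf 1: ECX[15:8] = CORE, EAX = die_offset,  EBX = CPUs per die
  subleaf 2: ECX[15:8] = DIE,  EAX = pkg_offset,  EBX = CPUs per package
  subleaf 3: ECX[15:8] = INVALID, EAX = EBX = 0

Each subleaf reports its own level type plus the APIC ID offset and CPU
count of the next level up; the package itself is never enumerated as a
subleaf.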

Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
---
Changes since v3:
 * New patch to prepare to expose the module level in 0x1F.
 * Move the CPUTopoLevel enumeration definition from "i386: Add cache
   topology info in CPUCacheInfo" to this patch. Note: to align with
   the topology types in the SDM, the name CPU_TOPO_LEVEL_UNKNOW is
   reverted to CPU_TOPO_LEVEL_INVALID.
---
 target/i386/cpu.c | 136 +++++++++++++++++++++++++++++++++++++---------
 target/i386/cpu.h |  15 +++++
 2 files changed, 126 insertions(+), 25 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index bc440477d13d..5c295c9a9e2d 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -269,6 +269,116 @@ static void encode_cache_cpuid4(CPUCacheInfo *cache,
            (cache->complex_indexing ? CACHE_COMPLEX_IDX : 0);
 }
 
+static uint32_t num_cpus_by_topo_level(X86CPUTopoInfo *topo_info,
+                                       enum CPUTopoLevel topo_level)
+{
+    switch (topo_level) {
+    case CPU_TOPO_LEVEL_SMT:
+        return 1;
+    case CPU_TOPO_LEVEL_CORE:
+        return topo_info->threads_per_core;
+    case CPU_TOPO_LEVEL_DIE:
+        return topo_info->threads_per_core * topo_info->cores_per_die;
+    case CPU_TOPO_LEVEL_PACKAGE:
+        return topo_info->threads_per_core * topo_info->cores_per_die *
+               topo_info->dies_per_pkg;
+    default:
+        g_assert_not_reached();
+    }
+    return 0;
+}
+
+static uint32_t apicid_offset_by_topo_level(X86CPUTopoInfo *topo_info,
+                                            enum CPUTopoLevel topo_level)
+{
+    switch (topo_level) {
+    case CPU_TOPO_LEVEL_SMT:
+        return 0;
+    case CPU_TOPO_LEVEL_CORE:
+        return apicid_core_offset(topo_info);
+    case CPU_TOPO_LEVEL_DIE:
+        return apicid_die_offset(topo_info);
+    case CPU_TOPO_LEVEL_PACKAGE:
+        return apicid_pkg_offset(topo_info);
+    default:
+        g_assert_not_reached();
+    }
+    return 0;
+}
+
+static uint32_t cpuid1f_topo_type(enum CPUTopoLevel topo_level)
+{
+    switch (topo_level) {
+    case CPU_TOPO_LEVEL_INVALID:
+        return CPUID_1F_ECX_TOPO_LEVEL_INVALID;
+    case CPU_TOPO_LEVEL_SMT:
+        return CPUID_1F_ECX_TOPO_LEVEL_SMT;
+    case CPU_TOPO_LEVEL_CORE:
+        return CPUID_1F_ECX_TOPO_LEVEL_CORE;
+    case CPU_TOPO_LEVEL_DIE:
+        return CPUID_1F_ECX_TOPO_LEVEL_DIE;
+    default:
+        /* Other types are not supported in QEMU. */
+        g_assert_not_reached();
+    }
+    return 0;
+}
+
+static void encode_topo_cpuid1f(CPUX86State *env, uint32_t count,
+                                X86CPUTopoInfo *topo_info,
+                                uint32_t *eax, uint32_t *ebx,
+                                uint32_t *ecx, uint32_t *edx)
+{
+    static DECLARE_BITMAP(topo_bitmap, CPU_TOPO_LEVEL_MAX);
+    X86CPU *cpu = env_archcpu(env);
+    unsigned long level, next_level;
+    uint32_t num_cpus_next_level, offset_next_level;
+
+    /*
+     * Initialize the bitmap to decide which levels should be
+     * encoded in 0x1f.
+     */
+    if (!count) {
+        /* SMT and core levels are exposed in 0x1f leaf by default. */
+        set_bit(CPU_TOPO_LEVEL_SMT, topo_bitmap);
+        set_bit(CPU_TOPO_LEVEL_CORE, topo_bitmap);
+
+        if (env->nr_dies > 1) {
+            set_bit(CPU_TOPO_LEVEL_DIE, topo_bitmap);
+        }
+    }
+
+    *ecx = count & 0xff;
+    *edx = cpu->apic_id;
+
+    level = find_first_bit(topo_bitmap, CPU_TOPO_LEVEL_MAX);
+    if (level == CPU_TOPO_LEVEL_MAX) {
+        num_cpus_next_level = 0;
+        offset_next_level = 0;
+
+        /* Encode CPU_TOPO_LEVEL_INVALID into the last subleaf of 0x1f. */
+        level = CPU_TOPO_LEVEL_INVALID;
+    } else {
+        next_level = find_next_bit(topo_bitmap, CPU_TOPO_LEVEL_MAX, level + 1);
+        if (next_level == CPU_TOPO_LEVEL_MAX) {
+            next_level = CPU_TOPO_LEVEL_PACKAGE;
+        }
+
+        num_cpus_next_level = num_cpus_by_topo_level(topo_info, next_level);
+        offset_next_level = apicid_offset_by_topo_level(topo_info, next_level);
+    }
+
+    *eax = offset_next_level;
+    *ebx = num_cpus_next_level;
+    *ecx |= cpuid1f_topo_type(level) << 8;
+
+    assert(!(*eax & ~0x1f));
+    *ebx &= 0xffff; /* The count doesn't need to be reliable. */
+    if (level != CPU_TOPO_LEVEL_MAX) {
+        clear_bit(level, topo_bitmap);
+    }
+}
+
 /* Encode cache info for CPUID[0x80000005].ECX or CPUID[0x80000005].EDX */
 static uint32_t encode_cache_cpuid80000005(CPUCacheInfo *cache)
 {
@@ -6284,31 +6394,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
             break;
         }
 
-        *ecx = count & 0xff;
-        *edx = cpu->apic_id;
-        switch (count) {
-        case 0:
-            *eax = apicid_core_offset(&topo_info);
-            *ebx = topo_info.threads_per_core;
-            *ecx |= CPUID_1F_ECX_TOPO_LEVEL_SMT << 8;
-            break;
-        case 1:
-            *eax = apicid_die_offset(&topo_info);
-            *ebx = topo_info.cores_per_die * topo_info.threads_per_core;
-            *ecx |= CPUID_1F_ECX_TOPO_LEVEL_CORE << 8;
-            break;
-        case 2:
-            *eax = apicid_pkg_offset(&topo_info);
-            *ebx = cpus_per_pkg;
-            *ecx |= CPUID_1F_ECX_TOPO_LEVEL_DIE << 8;
-            break;
-        default:
-            *eax = 0;
-            *ebx = 0;
-            *ecx |= CPUID_1F_ECX_TOPO_LEVEL_INVALID << 8;
-        }
-        assert(!(*eax & ~0x1f));
-        *ebx &= 0xffff; /* The count doesn't need to be reliable. */
+        encode_topo_cpuid1f(env, count, &topo_info, eax, ebx, ecx, edx);
         break;
     case 0xD: {
         /* Processor Extended State */
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index f47bad46db5e..9c78cfc3f322 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1008,6 +1008,21 @@ uint64_t x86_cpu_get_supported_feature_word(FeatureWord w,
 #define CPUID_MWAIT_IBE     (1U << 1) /* Interrupts can exit capability */
 #define CPUID_MWAIT_EMX     (1U << 0) /* enumeration supported */
 
+/*
+ * CPUTopoLevel is the general i386 topology hierarchical representation,
+ * ordered by increasing hierarchical relationship.
+ * Its enumeration value is not bound to the type value of Intel (CPUID[0x1F])
+ * or AMD (CPUID[0x80000026]).
+ */
+enum CPUTopoLevel {
+    CPU_TOPO_LEVEL_INVALID,
+    CPU_TOPO_LEVEL_SMT,
+    CPU_TOPO_LEVEL_CORE,
+    CPU_TOPO_LEVEL_DIE,
+    CPU_TOPO_LEVEL_PACKAGE,
+    CPU_TOPO_LEVEL_MAX,
+};
+
 /* CPUID[0xB].ECX level types */
 #define CPUID_B_ECX_TOPO_LEVEL_INVALID  0
 #define CPUID_B_ECX_TOPO_LEVEL_SMT      1
-- 
2.34.1




* [PATCH v7 06/16] i386: Introduce module-level cpu topology to CPUX86State
  2024-01-08  8:27 [PATCH v7 00/16] Support smp.clusters for x86 in QEMU Zhao Liu
                   ` (4 preceding siblings ...)
  2024-01-08  8:27 ` [PATCH v7 05/16] i386: Decouple CPUID[0x1F] subleaf with specific topology level Zhao Liu
@ 2024-01-08  8:27 ` Zhao Liu
  2024-01-08  8:27 ` [PATCH v7 07/16] i386: Support modules_per_die in X86CPUTopoInfo Zhao Liu
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 68+ messages in thread
From: Zhao Liu @ 2024-01-08  8:27 UTC (permalink / raw)
  To: Eduardo Habkost, Marcel Apfelbaum, Michael S . Tsirkin,
	Richard Henderson, Paolo Bonzini, Marcelo Tosatti
  Cc: qemu-devel, kvm, Zhenyu Wang, Zhuocheng Ding, Zhao Liu,
	Babu Moger, Yongwei Ma

From: Zhuocheng Ding <zhuocheng.ding@intel.com>

The smp command has the "clusters" parameter, but x86 hasn't supported
that level. "cluster" is a CPU topology level concept above cores, in
which the cores may share some resources (the L2 cache or some others
like L3 cache tags, depending on the arch) [1][2]. For x86, the
resource shared by cores at the cluster level is mainly the L2 cache.

However, using cluster to define x86's L2 cache topology would cause a
compatibility problem:

Currently, x86 defaults to the L2 cache being shared within one core,
which actually implies a default setting of "1 core per L2 cache" and
therefore implicitly defaults to having as many L2 caches as cores.

For example (i386 PC machine):
-smp 16,sockets=2,dies=2,cores=2,threads=2,maxcpus=16 (*)

Considering the topology of the L2 cache, this (*) implicitly means "1
core per L2 cache" and "2 L2 caches per die".

If we used cluster to configure the L2 cache topology with a new
default setting of "1 cluster per L2 cache", the above semantics would
change to "2 cores per cluster" and "1 cluster per L2 cache", that is,
"2 cores per L2 cache".

So the same command (*) would change the L2 cache topology, further
affecting the performance of the virtual machine.

Therefore, x86 should only treat cluster as a CPU topology level and,
for compatibility, avoid using it to change the L2 cache topology by
default.

"cluster" in smp is the CPU topology level which is between "core" and
die.

For x86, the "cluster" in smp is corresponding to the module level [2],
which is above the core level. So use the "module" other than "cluster"
in i386 code.

And please note that x86 already has a cpu topology level also named
"cluster" [3], this level is at the upper level of the package. Here,
the cluster in x86 cpu topology is completely different from the
"clusters" as the smp parameter. After the module level is introduced,
the cluster as the smp parameter will actually refer to the module level
of x86.

[1]: 864c3b5c32f0 ("hw/core/machine: Introduce CPU cluster topology support")
[2]: Yanan's comment about "cluster",
     https://lists.gnu.org/archive/html/qemu-devel/2023-02/msg04051.html
[3]: SDM, vol.3, ch.9, 9.9.1 Hierarchical Mapping of Shared Resources.

Signed-off-by: Zhuocheng Ding <zhuocheng.ding@intel.com>
Co-developed-by: Zhao Liu <zhao1.liu@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
---
Changes since v1:
 * The background of the introduction of the "cluster" parameter and its
   exact meaning were revised according to Yanan's explanation. (Yanan)
---
 hw/i386/x86.c     | 1 +
 target/i386/cpu.c | 1 +
 target/i386/cpu.h | 5 +++++
 3 files changed, 7 insertions(+)

diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index 2b6291ad8d5f..1d19a8c609b1 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -310,6 +310,7 @@ void x86_cpu_pre_plug(HotplugHandler *hotplug_dev,
     init_topo_info(&topo_info, x86ms);
 
     env->nr_dies = ms->smp.dies;
+    env->nr_modules = ms->smp.clusters;
 
     /*
      * If APIC ID is not set,
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 5c295c9a9e2d..0a2ce9b92b1f 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -7699,6 +7699,7 @@ static void x86_cpu_initfn(Object *obj)
     CPUX86State *env = &cpu->env;
 
     env->nr_dies = 1;
+    env->nr_modules = 1;
 
     object_property_add(obj, "feature-words", "X86CPUFeatureWordInfo",
                         x86_cpu_get_feature_words,
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 9c78cfc3f322..eecd30bde92b 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1904,6 +1904,11 @@ typedef struct CPUArchState {
 
     /* Number of dies within this CPU package. */
     unsigned nr_dies;
+    /*
+     * Number of modules within this CPU package.
+     * Module level in x86 cpu topology corresponds to smp.clusters.
+     */
+    unsigned nr_modules;
 } CPUX86State;
 
 struct kvm_msrs;
-- 
2.34.1




* [PATCH v7 07/16] i386: Support modules_per_die in X86CPUTopoInfo
  2024-01-08  8:27 [PATCH v7 00/16] Support smp.clusters for x86 in QEMU Zhao Liu
                   ` (5 preceding siblings ...)
  2024-01-08  8:27 ` [PATCH v7 06/16] i386: Introduce module-level cpu topology to CPUX86State Zhao Liu
@ 2024-01-08  8:27 ` Zhao Liu
  2024-01-11  5:53   ` Xiaoyao Li
  2024-01-08  8:27 ` [PATCH v7 08/16] i386: Expose module level in CPUID[0x1F] Zhao Liu
                   ` (9 subsequent siblings)
  16 siblings, 1 reply; 68+ messages in thread
From: Zhao Liu @ 2024-01-08  8:27 UTC (permalink / raw)
  To: Eduardo Habkost, Marcel Apfelbaum, Michael S . Tsirkin,
	Richard Henderson, Paolo Bonzini, Marcelo Tosatti
  Cc: qemu-devel, kvm, Zhenyu Wang, Zhuocheng Ding, Zhao Liu,
	Babu Moger, Yongwei Ma

From: Zhuocheng Ding <zhuocheng.ding@intel.com>

Support the module level in the i386 CPU topology structure
"X86CPUTopoInfo".

Since x86 does not yet support the "clusters" parameter in "-smp",
X86CPUTopoInfo.modules_per_die is currently always 1. Therefore, the
module level width in the APIC ID, which can be calculated by
"apicid_bitwidth_for_count(topo_info->modules_per_die)", is always 0
for now, so we can directly add the APIC ID related helpers to support
module level parsing (see the illustration below).

In addition, update the topology structure in test-x86-topo.c.
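
To see why the APIC ID layout is unchanged (an illustrative expansion
of the helpers added below):

  modules_per_die = 1   (no "clusters" support in -smp yet)
  apicid_module_width() = apicid_bitwidth_for_count(1) = 0
  apicid_die_offset()   = apicid_module_offset() + apicid_module_width()
                        = apicid_core_offset() + apicid_core_width()

which is exactly the old die offset.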

Signed-off-by: Zhuocheng Ding <zhuocheng.ding@intel.com>
Co-developed-by: Zhao Liu <zhao1.liu@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
---
Changes since v3:
 * Drop the description about not exposing module level in commit
   message.
 * Update topology related calculation in newly added helpers:
   num_cpus_by_topo_level() and apicid_offset_by_topo_level().

Changes since v1:
 * Include module level related helpers (apicid_module_width() and
   apicid_module_offset()) in this patch. (Yanan)
---
 hw/i386/x86.c              |  3 ++-
 include/hw/i386/topology.h | 22 +++++++++++++++----
 target/i386/cpu.c          | 17 +++++++++-----
 tests/unit/test-x86-topo.c | 45 ++++++++++++++++++++------------------
 4 files changed, 55 insertions(+), 32 deletions(-)

diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index 1d19a8c609b1..85b847ac7914 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -72,7 +72,8 @@ static void init_topo_info(X86CPUTopoInfo *topo_info,
     MachineState *ms = MACHINE(x86ms);
 
     topo_info->dies_per_pkg = ms->smp.dies;
-    topo_info->cores_per_die = ms->smp.cores;
+    topo_info->modules_per_die = ms->smp.clusters;
+    topo_info->cores_per_module = ms->smp.cores;
     topo_info->threads_per_core = ms->smp.threads;
 }
 
diff --git a/include/hw/i386/topology.h b/include/hw/i386/topology.h
index d4eeb7ab8290..517e51768c13 100644
--- a/include/hw/i386/topology.h
+++ b/include/hw/i386/topology.h
@@ -56,7 +56,8 @@ typedef struct X86CPUTopoIDs {
 
 typedef struct X86CPUTopoInfo {
     unsigned dies_per_pkg;
-    unsigned cores_per_die;
+    unsigned modules_per_die;
+    unsigned cores_per_module;
     unsigned threads_per_core;
 } X86CPUTopoInfo;
 
@@ -77,7 +78,13 @@ static inline unsigned apicid_smt_width(X86CPUTopoInfo *topo_info)
 /* Bit width of the Core_ID field */
 static inline unsigned apicid_core_width(X86CPUTopoInfo *topo_info)
 {
-    return apicid_bitwidth_for_count(topo_info->cores_per_die);
+    return apicid_bitwidth_for_count(topo_info->cores_per_module);
+}
+
+/* Bit width of the Module_ID (cluster ID) field */
+static inline unsigned apicid_module_width(X86CPUTopoInfo *topo_info)
+{
+    return apicid_bitwidth_for_count(topo_info->modules_per_die);
 }
 
 /* Bit width of the Die_ID field */
@@ -92,10 +99,16 @@ static inline unsigned apicid_core_offset(X86CPUTopoInfo *topo_info)
     return apicid_smt_width(topo_info);
 }
 
+/* Bit offset of the Module_ID (cluster ID) field */
+static inline unsigned apicid_module_offset(X86CPUTopoInfo *topo_info)
+{
+    return apicid_core_offset(topo_info) + apicid_core_width(topo_info);
+}
+
 /* Bit offset of the Die_ID field */
 static inline unsigned apicid_die_offset(X86CPUTopoInfo *topo_info)
 {
-    return apicid_core_offset(topo_info) + apicid_core_width(topo_info);
+    return apicid_module_offset(topo_info) + apicid_module_width(topo_info);
 }
 
 /* Bit offset of the Pkg_ID (socket ID) field */
@@ -127,7 +140,8 @@ static inline void x86_topo_ids_from_idx(X86CPUTopoInfo *topo_info,
                                          X86CPUTopoIDs *topo_ids)
 {
     unsigned nr_dies = topo_info->dies_per_pkg;
-    unsigned nr_cores = topo_info->cores_per_die;
+    unsigned nr_cores = topo_info->cores_per_module *
+                        topo_info->modules_per_die;
     unsigned nr_threads = topo_info->threads_per_core;
 
     topo_ids->pkg_id = cpu_index / (nr_dies * nr_cores * nr_threads);
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 0a2ce9b92b1f..294ca6b8947a 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -278,10 +278,11 @@ static uint32_t num_cpus_by_topo_level(X86CPUTopoInfo *topo_info,
     case CPU_TOPO_LEVEL_CORE:
         return topo_info->threads_per_core;
     case CPU_TOPO_LEVEL_DIE:
-        return topo_info->threads_per_core * topo_info->cores_per_die;
+        return topo_info->threads_per_core * topo_info->cores_per_module *
+               topo_info->modules_per_die;
     case CPU_TOPO_LEVEL_PACKAGE:
-        return topo_info->threads_per_core * topo_info->cores_per_die *
-               topo_info->dies_per_pkg;
+        return topo_info->threads_per_core * topo_info->cores_per_module *
+               topo_info->modules_per_die * topo_info->dies_per_pkg;
     default:
         g_assert_not_reached();
     }
@@ -450,7 +451,9 @@ static void encode_cache_cpuid8000001d(CPUCacheInfo *cache,
 
     /* L3 is shared among multiple cores */
     if (cache->level == 3) {
-        l3_threads = topo_info->cores_per_die * topo_info->threads_per_core;
+        l3_threads = topo_info->modules_per_die *
+                     topo_info->cores_per_module *
+                     topo_info->threads_per_core;
         *eax |= (l3_threads - 1) << 14;
     } else {
         *eax |= ((topo_info->threads_per_core - 1) << 14);
@@ -6131,10 +6134,12 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
     uint32_t cpus_per_pkg;
 
     topo_info.dies_per_pkg = env->nr_dies;
-    topo_info.cores_per_die = cs->nr_cores / env->nr_dies;
+    topo_info.modules_per_die = env->nr_modules;
+    topo_info.cores_per_module = cs->nr_cores / env->nr_dies / env->nr_modules;
     topo_info.threads_per_core = cs->nr_threads;
 
-    cores_per_pkg = topo_info.cores_per_die * topo_info.dies_per_pkg;
+    cores_per_pkg = topo_info.cores_per_module * topo_info.modules_per_die *
+                    topo_info.dies_per_pkg;
     cpus_per_pkg = cores_per_pkg * topo_info.threads_per_core;
 
     /* Calculate & apply limits for different index ranges */
diff --git a/tests/unit/test-x86-topo.c b/tests/unit/test-x86-topo.c
index 2b104f86d7c2..f21b8a5d95c2 100644
--- a/tests/unit/test-x86-topo.c
+++ b/tests/unit/test-x86-topo.c
@@ -30,13 +30,16 @@ static void test_topo_bits(void)
 {
     X86CPUTopoInfo topo_info = {0};
 
-    /* simple tests for 1 thread per core, 1 core per die, 1 die per package */
-    topo_info = (X86CPUTopoInfo) {1, 1, 1};
+    /*
+     * simple tests for 1 thread per core, 1 core per module,
+     *                  1 module per die, 1 die per package
+     */
+    topo_info = (X86CPUTopoInfo) {1, 1, 1, 1};
     g_assert_cmpuint(apicid_smt_width(&topo_info), ==, 0);
     g_assert_cmpuint(apicid_core_width(&topo_info), ==, 0);
     g_assert_cmpuint(apicid_die_width(&topo_info), ==, 0);
 
-    topo_info = (X86CPUTopoInfo) {1, 1, 1};
+    topo_info = (X86CPUTopoInfo) {1, 1, 1, 1};
     g_assert_cmpuint(x86_apicid_from_cpu_idx(&topo_info, 0), ==, 0);
     g_assert_cmpuint(x86_apicid_from_cpu_idx(&topo_info, 1), ==, 1);
     g_assert_cmpuint(x86_apicid_from_cpu_idx(&topo_info, 2), ==, 2);
@@ -45,39 +48,39 @@ static void test_topo_bits(void)
 
     /* Test field width calculation for multiple values
      */
-    topo_info = (X86CPUTopoInfo) {1, 1, 2};
+    topo_info = (X86CPUTopoInfo) {1, 1, 1, 2};
     g_assert_cmpuint(apicid_smt_width(&topo_info), ==, 1);
-    topo_info = (X86CPUTopoInfo) {1, 1, 3};
+    topo_info = (X86CPUTopoInfo) {1, 1, 1, 3};
     g_assert_cmpuint(apicid_smt_width(&topo_info), ==, 2);
-    topo_info = (X86CPUTopoInfo) {1, 1, 4};
+    topo_info = (X86CPUTopoInfo) {1, 1, 1, 4};
     g_assert_cmpuint(apicid_smt_width(&topo_info), ==, 2);
 
-    topo_info = (X86CPUTopoInfo) {1, 1, 14};
+    topo_info = (X86CPUTopoInfo) {1, 1, 1, 14};
     g_assert_cmpuint(apicid_smt_width(&topo_info), ==, 4);
-    topo_info = (X86CPUTopoInfo) {1, 1, 15};
+    topo_info = (X86CPUTopoInfo) {1, 1, 1, 15};
     g_assert_cmpuint(apicid_smt_width(&topo_info), ==, 4);
-    topo_info = (X86CPUTopoInfo) {1, 1, 16};
+    topo_info = (X86CPUTopoInfo) {1, 1, 1, 16};
     g_assert_cmpuint(apicid_smt_width(&topo_info), ==, 4);
-    topo_info = (X86CPUTopoInfo) {1, 1, 17};
+    topo_info = (X86CPUTopoInfo) {1, 1, 1, 17};
     g_assert_cmpuint(apicid_smt_width(&topo_info), ==, 5);
 
 
-    topo_info = (X86CPUTopoInfo) {1, 30, 2};
+    topo_info = (X86CPUTopoInfo) {1, 1, 30, 2};
     g_assert_cmpuint(apicid_core_width(&topo_info), ==, 5);
-    topo_info = (X86CPUTopoInfo) {1, 31, 2};
+    topo_info = (X86CPUTopoInfo) {1, 1, 31, 2};
     g_assert_cmpuint(apicid_core_width(&topo_info), ==, 5);
-    topo_info = (X86CPUTopoInfo) {1, 32, 2};
+    topo_info = (X86CPUTopoInfo) {1, 1, 32, 2};
     g_assert_cmpuint(apicid_core_width(&topo_info), ==, 5);
-    topo_info = (X86CPUTopoInfo) {1, 33, 2};
+    topo_info = (X86CPUTopoInfo) {1, 1, 33, 2};
     g_assert_cmpuint(apicid_core_width(&topo_info), ==, 6);
 
-    topo_info = (X86CPUTopoInfo) {1, 30, 2};
+    topo_info = (X86CPUTopoInfo) {1, 1, 30, 2};
     g_assert_cmpuint(apicid_die_width(&topo_info), ==, 0);
-    topo_info = (X86CPUTopoInfo) {2, 30, 2};
+    topo_info = (X86CPUTopoInfo) {2, 1, 30, 2};
     g_assert_cmpuint(apicid_die_width(&topo_info), ==, 1);
-    topo_info = (X86CPUTopoInfo) {3, 30, 2};
+    topo_info = (X86CPUTopoInfo) {3, 1, 30, 2};
     g_assert_cmpuint(apicid_die_width(&topo_info), ==, 2);
-    topo_info = (X86CPUTopoInfo) {4, 30, 2};
+    topo_info = (X86CPUTopoInfo) {4, 1, 30, 2};
     g_assert_cmpuint(apicid_die_width(&topo_info), ==, 2);
 
     /* build a weird topology and see if IDs are calculated correctly
@@ -85,18 +88,18 @@ static void test_topo_bits(void)
 
     /* This will use 2 bits for thread ID and 3 bits for core ID
      */
-    topo_info = (X86CPUTopoInfo) {1, 6, 3};
+    topo_info = (X86CPUTopoInfo) {1, 1, 6, 3};
     g_assert_cmpuint(apicid_smt_width(&topo_info), ==, 2);
     g_assert_cmpuint(apicid_core_offset(&topo_info), ==, 2);
     g_assert_cmpuint(apicid_die_offset(&topo_info), ==, 5);
     g_assert_cmpuint(apicid_pkg_offset(&topo_info), ==, 5);
 
-    topo_info = (X86CPUTopoInfo) {1, 6, 3};
+    topo_info = (X86CPUTopoInfo) {1, 1, 6, 3};
     g_assert_cmpuint(x86_apicid_from_cpu_idx(&topo_info, 0), ==, 0);
     g_assert_cmpuint(x86_apicid_from_cpu_idx(&topo_info, 1), ==, 1);
     g_assert_cmpuint(x86_apicid_from_cpu_idx(&topo_info, 2), ==, 2);
 
-    topo_info = (X86CPUTopoInfo) {1, 6, 3};
+    topo_info = (X86CPUTopoInfo) {1, 1, 6, 3};
     g_assert_cmpuint(x86_apicid_from_cpu_idx(&topo_info, 1 * 3 + 0), ==,
                      (1 << 2) | 0);
     g_assert_cmpuint(x86_apicid_from_cpu_idx(&topo_info, 1 * 3 + 1), ==,
-- 
2.34.1
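
The width helpers exercised by these assertions behave like
ceil(log2(count)): the number of APIC ID bits needed to enumerate
count IDs at a level. A minimal standalone sketch of that rule
(illustrative only, not the QEMU implementation):

    #include <assert.h>

    /* Bits needed to encode IDs 0..count-1; matches the expectations
     * in the test: 1 -> 0, 2 -> 1, 3 -> 2, 16 -> 4, 17 -> 5.
     */
    static unsigned bitwidth_for_count(unsigned count)
    {
        unsigned w = 0;

        for (count -= 1; count; count >>= 1) {
            w++;
        }
        return w;
    }

    int main(void)
    {
        assert(bitwidth_for_count(1) == 0);
        assert(bitwidth_for_count(3) == 2);
        assert(bitwidth_for_count(16) == 4);
        assert(bitwidth_for_count(17) == 5);
        return 0;
    }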



^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v7 08/16] i386: Expose module level in CPUID[0x1F]
  2024-01-08  8:27 [PATCH v7 00/16] Support smp.clusters for x86 in QEMU Zhao Liu
                   ` (6 preceding siblings ...)
  2024-01-08  8:27 ` [PATCH v7 07/16] i386: Support modules_per_die in X86CPUTopoInfo Zhao Liu
@ 2024-01-08  8:27 ` Zhao Liu
  2024-01-11  6:04   ` Xiaoyao Li
  2024-01-15  3:25   ` Yuan Yao
  2024-01-08  8:27 ` [PATCH v7 09/16] i386: Support module_id in X86CPUTopoIDs Zhao Liu
                   ` (8 subsequent siblings)
  16 siblings, 2 replies; 68+ messages in thread
From: Zhao Liu @ 2024-01-08  8:27 UTC (permalink / raw)
  To: Eduardo Habkost, Marcel Apfelbaum, Michael S . Tsirkin,
	Richard Henderson, Paolo Bonzini, Marcelo Tosatti
  Cc: qemu-devel, kvm, Zhenyu Wang, Zhuocheng Ding, Zhao Liu,
	Babu Moger, Yongwei Ma

From: Zhao Liu <zhao1.liu@intel.com>

The Linux kernel (from v6.4, with commit edc0a2b595765 ("x86/topology: Fix
erroneous smp_num_siblings on Intel Hybrid platforms")) is able to
handle platforms with the module level enumerated via CPUID.1F.

Expose the module level in CPUID[0x1F] if the machine has more than one
module.

(Tested CPU topology in CPUID[0x1F] leaf with various die/cluster
configurations in "-smp".)

Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
---
Changes since v3:
 * New patch to expose module level in 0x1F.
 * Add Tested-by tag from Yongwei.
---
 target/i386/cpu.c     | 12 +++++++++++-
 target/i386/cpu.h     |  2 ++
 target/i386/kvm/kvm.c |  2 +-
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 294ca6b8947a..a2d39d2198b6 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -277,6 +277,8 @@ static uint32_t num_cpus_by_topo_level(X86CPUTopoInfo *topo_info,
         return 1;
     case CPU_TOPO_LEVEL_CORE:
         return topo_info->threads_per_core;
+    case CPU_TOPO_LEVEL_MODULE:
+        return topo_info->threads_per_core * topo_info->cores_per_module;
     case CPU_TOPO_LEVEL_DIE:
         return topo_info->threads_per_core * topo_info->cores_per_module *
                topo_info->modules_per_die;
@@ -297,6 +299,8 @@ static uint32_t apicid_offset_by_topo_level(X86CPUTopoInfo *topo_info,
         return 0;
     case CPU_TOPO_LEVEL_CORE:
         return apicid_core_offset(topo_info);
+    case CPU_TOPO_LEVEL_MODULE:
+        return apicid_module_offset(topo_info);
     case CPU_TOPO_LEVEL_DIE:
         return apicid_die_offset(topo_info);
     case CPU_TOPO_LEVEL_PACKAGE:
@@ -316,6 +320,8 @@ static uint32_t cpuid1f_topo_type(enum CPUTopoLevel topo_level)
         return CPUID_1F_ECX_TOPO_LEVEL_SMT;
     case CPU_TOPO_LEVEL_CORE:
         return CPUID_1F_ECX_TOPO_LEVEL_CORE;
+    case CPU_TOPO_LEVEL_MODULE:
+        return CPUID_1F_ECX_TOPO_LEVEL_MODULE;
     case CPU_TOPO_LEVEL_DIE:
         return CPUID_1F_ECX_TOPO_LEVEL_DIE;
     default:
@@ -347,6 +353,10 @@ static void encode_topo_cpuid1f(CPUX86State *env, uint32_t count,
         if (env->nr_dies > 1) {
             set_bit(CPU_TOPO_LEVEL_DIE, topo_bitmap);
         }
+
+        if (env->nr_modules > 1) {
+            set_bit(CPU_TOPO_LEVEL_MODULE, topo_bitmap);
+        }
     }
 
     *ecx = count & 0xff;
@@ -6394,7 +6404,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
         break;
     case 0x1F:
         /* V2 Extended Topology Enumeration Leaf */
-        if (topo_info.dies_per_pkg < 2) {
+        if (topo_info.modules_per_die < 2 && topo_info.dies_per_pkg < 2) {
             *eax = *ebx = *ecx = *edx = 0;
             break;
         }
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index eecd30bde92b..97b290e10576 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1018,6 +1018,7 @@ enum CPUTopoLevel {
     CPU_TOPO_LEVEL_INVALID,
     CPU_TOPO_LEVEL_SMT,
     CPU_TOPO_LEVEL_CORE,
+    CPU_TOPO_LEVEL_MODULE,
     CPU_TOPO_LEVEL_DIE,
     CPU_TOPO_LEVEL_PACKAGE,
     CPU_TOPO_LEVEL_MAX,
@@ -1032,6 +1033,7 @@ enum CPUTopoLevel {
 #define CPUID_1F_ECX_TOPO_LEVEL_INVALID  CPUID_B_ECX_TOPO_LEVEL_INVALID
 #define CPUID_1F_ECX_TOPO_LEVEL_SMT      CPUID_B_ECX_TOPO_LEVEL_SMT
 #define CPUID_1F_ECX_TOPO_LEVEL_CORE     CPUID_B_ECX_TOPO_LEVEL_CORE
+#define CPUID_1F_ECX_TOPO_LEVEL_MODULE   3
 #define CPUID_1F_ECX_TOPO_LEVEL_DIE      5
 
 /* MSR Feature Bits */
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 4ce80555b45c..e5ddb214cb36 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -1913,7 +1913,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
             break;
         }
         case 0x1f:
-            if (env->nr_dies < 2) {
+            if (env->nr_modules < 2 && env->nr_dies < 2) {
                 break;
             }
             /* fallthrough */
-- 
2.34.1
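
For reference, each CPUID[0x1F] subleaf reports its level type in
ECX[15:8] (SMT=1, Core=2, Module=3, Die=5, matching the
CPUID_1F_ECX_TOPO_LEVEL_* values above) and echoes the subleaf index in
ECX[7:0]. A toy encoder for just that ECX layout (an illustrative
sketch, not the QEMU code):

    #include <stdio.h>

    /* Level type values reported in CPUID[0x1F].ECX[15:8]. */
    enum { TOPO_SMT = 1, TOPO_CORE = 2, TOPO_MODULE = 3, TOPO_DIE = 5 };

    /* Subleaf index in ECX[7:0], level type in ECX[15:8]. */
    static unsigned encode_1f_ecx(unsigned subleaf, unsigned type)
    {
        return (subleaf & 0xff) | (type << 8);
    }

    int main(void)
    {
        /* A subleaf reporting the module level, as this patch enables. */
        printf("ecx = 0x%x\n", encode_1f_ecx(2, TOPO_MODULE)); /* 0x302 */
        return 0;
    }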



^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v7 09/16] i386: Support module_id in X86CPUTopoIDs
  2024-01-08  8:27 [PATCH v7 00/16] Support smp.clusters for x86 in QEMU Zhao Liu
                   ` (7 preceding siblings ...)
  2024-01-08  8:27 ` [PATCH v7 08/16] i386: Expose module level in CPUID[0x1F] Zhao Liu
@ 2024-01-08  8:27 ` Zhao Liu
  2024-01-14 12:42   ` Xiaoyao Li
  2024-01-08  8:27 ` [PATCH v7 10/16] i386/cpu: Introduce cluster-id to X86CPU Zhao Liu
                   ` (7 subsequent siblings)
  16 siblings, 1 reply; 68+ messages in thread
From: Zhao Liu @ 2024-01-08  8:27 UTC (permalink / raw)
  To: Eduardo Habkost, Marcel Apfelbaum, Michael S . Tsirkin,
	Richard Henderson, Paolo Bonzini, Marcelo Tosatti
  Cc: qemu-devel, kvm, Zhenyu Wang, Zhuocheng Ding, Zhao Liu,
	Babu Moger, Yongwei Ma

From: Zhuocheng Ding <zhuocheng.ding@intel.com>

Add the module_id member to X86CPUTopoIDs.

module_id can be parsed from the APIC ID, so also update the APIC ID
parsing rules to support the module level. With this support, the
module-level conversions between X86CPUTopoIDs, X86CPUTopoInfo and the
APIC ID are complete.

module_id can also be generated from the CPU topology. Before i386
supports "clusters" in smp, the default "clusters per die" is only 1,
so the module_id generated this way is always 0 and does not conflict
with a module_id parsed from an APIC ID.

Signed-off-by: Zhuocheng Ding <zhuocheng.ding@intel.com>
Co-developed-by: Zhao Liu <zhao1.liu@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
---
Changes since v1:
 * Merge the patch "i386: Update APIC ID parsing rule to support module
   level" into this one. (Yanan)
 * Move the apicid_module_width() and apicid_module_offset() support
   into the previous modules_per_die related patch. (Yanan)
---
 hw/i386/x86.c              | 28 +++++++++++++++++++++-------
 include/hw/i386/topology.h | 17 +++++++++++++----
 2 files changed, 34 insertions(+), 11 deletions(-)

diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index 85b847ac7914..5269aae3a5c2 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -315,11 +315,11 @@ void x86_cpu_pre_plug(HotplugHandler *hotplug_dev,
 
     /*
      * If APIC ID is not set,
-     * set it based on socket/die/core/thread properties.
+     * set it based on socket/die/cluster/core/thread properties.
      */
     if (cpu->apic_id == UNASSIGNED_APIC_ID) {
-        int max_socket = (ms->smp.max_cpus - 1) /
-                                smp_threads / smp_cores / ms->smp.dies;
+        int max_socket = (ms->smp.max_cpus - 1) / smp_threads / smp_cores /
+                                ms->smp.clusters / ms->smp.dies;
 
         /*
          * die-id was optional in QEMU 4.0 and older, so keep it optional
@@ -366,17 +366,27 @@ void x86_cpu_pre_plug(HotplugHandler *hotplug_dev,
         topo_ids.die_id = cpu->die_id;
         topo_ids.core_id = cpu->core_id;
         topo_ids.smt_id = cpu->thread_id;
+
+        /*
+         * TODO: This is the temporary initialization for topo_ids.module_id to
+         * avoid "maybe-uninitialized" compilation errors. Will remove when
+         * X86CPU supports cluster_id.
+         */
+        topo_ids.module_id = 0;
+
         cpu->apic_id = x86_apicid_from_topo_ids(&topo_info, &topo_ids);
     }
 
     cpu_slot = x86_find_cpu_slot(MACHINE(x86ms), cpu->apic_id, &idx);
     if (!cpu_slot) {
         x86_topo_ids_from_apicid(cpu->apic_id, &topo_info, &topo_ids);
+
         error_setg(errp,
-            "Invalid CPU [socket: %u, die: %u, core: %u, thread: %u] with"
-            " APIC ID %" PRIu32 ", valid index range 0:%d",
-            topo_ids.pkg_id, topo_ids.die_id, topo_ids.core_id, topo_ids.smt_id,
-            cpu->apic_id, ms->possible_cpus->len - 1);
+            "Invalid CPU [socket: %u, die: %u, module: %u, core: %u, thread: %u]"
+            " with APIC ID %" PRIu32 ", valid index range 0:%d",
+            topo_ids.pkg_id, topo_ids.die_id, topo_ids.module_id,
+            topo_ids.core_id, topo_ids.smt_id, cpu->apic_id,
+            ms->possible_cpus->len - 1);
         return;
     }
 
@@ -502,6 +512,10 @@ const CPUArchIdList *x86_possible_cpu_arch_ids(MachineState *ms)
             ms->possible_cpus->cpus[i].props.has_die_id = true;
             ms->possible_cpus->cpus[i].props.die_id = topo_ids.die_id;
         }
+        if (ms->smp.clusters > 1) {
+            ms->possible_cpus->cpus[i].props.has_cluster_id = true;
+            ms->possible_cpus->cpus[i].props.cluster_id = topo_ids.module_id;
+        }
         ms->possible_cpus->cpus[i].props.has_core_id = true;
         ms->possible_cpus->cpus[i].props.core_id = topo_ids.core_id;
         ms->possible_cpus->cpus[i].props.has_thread_id = true;
diff --git a/include/hw/i386/topology.h b/include/hw/i386/topology.h
index 517e51768c13..ed1f3d6c1d5e 100644
--- a/include/hw/i386/topology.h
+++ b/include/hw/i386/topology.h
@@ -50,6 +50,7 @@ typedef uint32_t apic_id_t;
 typedef struct X86CPUTopoIDs {
     unsigned pkg_id;
     unsigned die_id;
+    unsigned module_id;
     unsigned core_id;
     unsigned smt_id;
 } X86CPUTopoIDs;
@@ -127,6 +128,7 @@ static inline apic_id_t x86_apicid_from_topo_ids(X86CPUTopoInfo *topo_info,
 {
     return (topo_ids->pkg_id  << apicid_pkg_offset(topo_info)) |
            (topo_ids->die_id  << apicid_die_offset(topo_info)) |
+           (topo_ids->module_id << apicid_module_offset(topo_info)) |
            (topo_ids->core_id << apicid_core_offset(topo_info)) |
            topo_ids->smt_id;
 }
@@ -140,12 +142,16 @@ static inline void x86_topo_ids_from_idx(X86CPUTopoInfo *topo_info,
                                          X86CPUTopoIDs *topo_ids)
 {
     unsigned nr_dies = topo_info->dies_per_pkg;
-    unsigned nr_cores = topo_info->cores_per_module *
-                        topo_info->modules_per_die;
+    unsigned nr_modules = topo_info->modules_per_die;
+    unsigned nr_cores = topo_info->cores_per_module;
     unsigned nr_threads = topo_info->threads_per_core;
 
-    topo_ids->pkg_id = cpu_index / (nr_dies * nr_cores * nr_threads);
-    topo_ids->die_id = cpu_index / (nr_cores * nr_threads) % nr_dies;
+    topo_ids->pkg_id = cpu_index / (nr_dies * nr_modules *
+                       nr_cores * nr_threads);
+    topo_ids->die_id = cpu_index / (nr_modules * nr_cores *
+                       nr_threads) % nr_dies;
+    topo_ids->module_id = cpu_index / (nr_cores * nr_threads) %
+                          nr_modules;
     topo_ids->core_id = cpu_index / nr_threads % nr_cores;
     topo_ids->smt_id = cpu_index % nr_threads;
 }
@@ -163,6 +169,9 @@ static inline void x86_topo_ids_from_apicid(apic_id_t apicid,
     topo_ids->core_id =
             (apicid >> apicid_core_offset(topo_info)) &
             ~(0xFFFFFFFFUL << apicid_core_width(topo_info));
+    topo_ids->module_id =
+            (apicid >> apicid_module_offset(topo_info)) &
+            ~(0xFFFFFFFFUL << apicid_module_width(topo_info));
     topo_ids->die_id =
             (apicid >> apicid_die_offset(topo_info)) &
             ~(0xFFFFFFFFUL << apicid_die_width(topo_info));
-- 
2.34.1
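
The new decomposition divides cpu_index from the innermost level
outwards. A standalone sketch of the same arithmetic with an assumed
(dies, modules, cores, threads) = (2, 2, 2, 2) topology:

    #include <stdio.h>

    /* Mirrors the x86_topo_ids_from_idx() math above (illustrative). */
    int main(void)
    {
        unsigned nr_dies = 2, nr_modules = 2, nr_cores = 2, nr_threads = 2;
        unsigned cpu_index = 13;

        unsigned pkg_id = cpu_index / (nr_dies * nr_modules *
                                       nr_cores * nr_threads);
        unsigned die_id = cpu_index / (nr_modules * nr_cores *
                                       nr_threads) % nr_dies;
        unsigned module_id = cpu_index / (nr_cores * nr_threads) % nr_modules;
        unsigned core_id = cpu_index / nr_threads % nr_cores;
        unsigned smt_id = cpu_index % nr_threads;

        /* cpu_index 13 -> pkg 0, die 1, module 1, core 0, thread 1 */
        printf("%u %u %u %u %u\n", pkg_id, die_id, module_id,
               core_id, smt_id);
        return 0;
    }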



^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v7 10/16] i386/cpu: Introduce cluster-id to X86CPU
  2024-01-08  8:27 [PATCH v7 00/16] Support smp.clusters for x86 in QEMU Zhao Liu
                   ` (8 preceding siblings ...)
  2024-01-08  8:27 ` [PATCH v7 09/16] i386: Support module_id in X86CPUTopoIDs Zhao Liu
@ 2024-01-08  8:27 ` Zhao Liu
  2024-01-14 13:49   ` Xiaoyao Li
  2024-01-08  8:27 ` [PATCH v7 11/16] tests: Add test case of APIC ID for module level parsing Zhao Liu
                   ` (6 subsequent siblings)
  16 siblings, 1 reply; 68+ messages in thread
From: Zhao Liu @ 2024-01-08  8:27 UTC (permalink / raw)
  To: Eduardo Habkost, Marcel Apfelbaum, Michael S . Tsirkin,
	Richard Henderson, Paolo Bonzini, Marcelo Tosatti
  Cc: qemu-devel, kvm, Zhenyu Wang, Zhuocheng Ding, Zhao Liu,
	Babu Moger, Yongwei Ma

From: Zhuocheng Ding <zhuocheng.ding@intel.com>

Introduce cluster-id rather than module-id to be consistent with
CpuInstanceProperties.cluster-id; this avoids confusion over parameter
names when hotplugging.

Following the legacy smp check rules, also add cluster_id validity
checks to x86_cpu_pre_plug().

Signed-off-by: Zhuocheng Ding <zhuocheng.ding@intel.com>
Co-developed-by: Zhao Liu <zhao1.liu@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
---
Changes since v6:
 * Update the comment when checking cluster-id. Since there's no
   v8.3, the cluster-id support should start from v9.0 at the earliest.

Changes since v5:
 * Update the comment when checking cluster-id. Since the current QEMU
   is v8.2, the cluster-id support should start from v8.3 at the
   earliest.

Changes since v3:
 * Use the imperative in the commit message. (Babu)
---
 hw/i386/x86.c     | 33 +++++++++++++++++++++++++--------
 target/i386/cpu.c |  2 ++
 target/i386/cpu.h |  1 +
 3 files changed, 28 insertions(+), 8 deletions(-)

diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index 5269aae3a5c2..1c1d368614ee 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -329,6 +329,14 @@ void x86_cpu_pre_plug(HotplugHandler *hotplug_dev,
             cpu->die_id = 0;
         }
 
+        /*
+         * cluster-id was optional in QEMU 9.0 and older, so keep it optional
+         * if there's only one cluster per die.
+         */
+        if (cpu->cluster_id < 0 && ms->smp.clusters == 1) {
+            cpu->cluster_id = 0;
+        }
+
         if (cpu->socket_id < 0) {
             error_setg(errp, "CPU socket-id is not set");
             return;
@@ -345,6 +353,14 @@ void x86_cpu_pre_plug(HotplugHandler *hotplug_dev,
                        cpu->die_id, ms->smp.dies - 1);
             return;
         }
+        if (cpu->cluster_id < 0) {
+            error_setg(errp, "CPU cluster-id is not set");
+            return;
+        } else if (cpu->cluster_id > ms->smp.clusters - 1) {
+            error_setg(errp, "Invalid CPU cluster-id: %u must be in range 0:%u",
+                       cpu->cluster_id, ms->smp.clusters - 1);
+            return;
+        }
         if (cpu->core_id < 0) {
             error_setg(errp, "CPU core-id is not set");
             return;
@@ -364,16 +380,9 @@ void x86_cpu_pre_plug(HotplugHandler *hotplug_dev,
 
         topo_ids.pkg_id = cpu->socket_id;
         topo_ids.die_id = cpu->die_id;
+        topo_ids.module_id = cpu->cluster_id;
         topo_ids.core_id = cpu->core_id;
         topo_ids.smt_id = cpu->thread_id;
-
-        /*
-         * TODO: This is the temporary initialization for topo_ids.module_id to
-         * avoid "maybe-uninitialized" compilation errors. Will remove when
-         * X86CPU supports cluster_id.
-         */
-        topo_ids.module_id = 0;
-
         cpu->apic_id = x86_apicid_from_topo_ids(&topo_info, &topo_ids);
     }
 
@@ -418,6 +427,14 @@ void x86_cpu_pre_plug(HotplugHandler *hotplug_dev,
     }
     cpu->die_id = topo_ids.die_id;
 
+    if (cpu->cluster_id != -1 && cpu->cluster_id != topo_ids.module_id) {
+        error_setg(errp, "property cluster-id: %u doesn't match set apic-id:"
+            " 0x%x (cluster-id: %u)", cpu->cluster_id, cpu->apic_id,
+            topo_ids.module_id);
+        return;
+    }
+    cpu->cluster_id = topo_ids.module_id;
+
     if (cpu->core_id != -1 && cpu->core_id != topo_ids.core_id) {
         error_setg(errp, "property core-id: %u doesn't match set apic-id:"
             " 0x%x (core-id: %u)", cpu->core_id, cpu->apic_id,
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index a2d39d2198b6..498a4be62b40 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -7909,12 +7909,14 @@ static Property x86_cpu_properties[] = {
     DEFINE_PROP_UINT32("apic-id", X86CPU, apic_id, 0),
     DEFINE_PROP_INT32("thread-id", X86CPU, thread_id, 0),
     DEFINE_PROP_INT32("core-id", X86CPU, core_id, 0),
+    DEFINE_PROP_INT32("cluster-id", X86CPU, cluster_id, 0),
     DEFINE_PROP_INT32("die-id", X86CPU, die_id, 0),
     DEFINE_PROP_INT32("socket-id", X86CPU, socket_id, 0),
 #else
     DEFINE_PROP_UINT32("apic-id", X86CPU, apic_id, UNASSIGNED_APIC_ID),
     DEFINE_PROP_INT32("thread-id", X86CPU, thread_id, -1),
     DEFINE_PROP_INT32("core-id", X86CPU, core_id, -1),
+    DEFINE_PROP_INT32("cluster-id", X86CPU, cluster_id, -1),
     DEFINE_PROP_INT32("die-id", X86CPU, die_id, -1),
     DEFINE_PROP_INT32("socket-id", X86CPU, socket_id, -1),
 #endif
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 97b290e10576..009950b87203 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -2057,6 +2057,7 @@ struct ArchCPU {
     int32_t node_id; /* NUMA node this CPU belongs to */
     int32_t socket_id;
     int32_t die_id;
+    int32_t cluster_id;
     int32_t core_id;
     int32_t thread_id;
 
-- 
2.34.1
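
For illustration, a hotplug flow exercising the new property might look
like the following (a hypothetical invocation; the IDs must fall inside
the -smp grid, and with clusters > 1 the cluster-id property becomes
mandatory per the checks above):

    -smp 1,maxcpus=8,sockets=2,dies=2,clusters=2,cores=1,threads=1

    (qemu) device_add qemu64-x86_64-cpu,socket-id=1,die-id=0,cluster-id=1,core-id=0,thread-id=0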



^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v7 11/16] tests: Add test case of APIC ID for module level parsing
  2024-01-08  8:27 [PATCH v7 00/16] Support smp.clusters for x86 in QEMU Zhao Liu
                   ` (9 preceding siblings ...)
  2024-01-08  8:27 ` [PATCH v7 10/16] i386/cpu: Introduce cluster-id to X86CPU Zhao Liu
@ 2024-01-08  8:27 ` Zhao Liu
  2024-01-08  8:27 ` [PATCH v7 12/16] hw/i386/pc: Support smp.clusters for x86 PC machine Zhao Liu
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 68+ messages in thread
From: Zhao Liu @ 2024-01-08  8:27 UTC (permalink / raw)
  To: Eduardo Habkost, Marcel Apfelbaum, Michael S . Tsirkin,
	Richard Henderson, Paolo Bonzini, Marcelo Tosatti
  Cc: qemu-devel, kvm, Zhenyu Wang, Zhuocheng Ding, Zhao Liu,
	Yanan Wang, Babu Moger, Yongwei Ma

From: Zhuocheng Ding <zhuocheng.ding@intel.com>

Now that i386 supports the module level, add a test for parsing the
module level in the APIC ID.

Signed-off-by: Zhuocheng Ding <zhuocheng.ding@intel.com>
Co-developed-by: Zhao Liu <zhao1.liu@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Yanan Wang <wangyanan55@huawei.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
---
 tests/unit/test-x86-topo.c | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/tests/unit/test-x86-topo.c b/tests/unit/test-x86-topo.c
index f21b8a5d95c2..55b731ccae55 100644
--- a/tests/unit/test-x86-topo.c
+++ b/tests/unit/test-x86-topo.c
@@ -37,6 +37,7 @@ static void test_topo_bits(void)
     topo_info = (X86CPUTopoInfo) {1, 1, 1, 1};
     g_assert_cmpuint(apicid_smt_width(&topo_info), ==, 0);
     g_assert_cmpuint(apicid_core_width(&topo_info), ==, 0);
+    g_assert_cmpuint(apicid_module_width(&topo_info), ==, 0);
     g_assert_cmpuint(apicid_die_width(&topo_info), ==, 0);
 
     topo_info = (X86CPUTopoInfo) {1, 1, 1, 1};
@@ -74,13 +75,22 @@ static void test_topo_bits(void)
     topo_info = (X86CPUTopoInfo) {1, 1, 33, 2};
     g_assert_cmpuint(apicid_core_width(&topo_info), ==, 6);
 
-    topo_info = (X86CPUTopoInfo) {1, 1, 30, 2};
+    topo_info = (X86CPUTopoInfo) {1, 6, 30, 2};
+    g_assert_cmpuint(apicid_module_width(&topo_info), ==, 3);
+    topo_info = (X86CPUTopoInfo) {1, 7, 30, 2};
+    g_assert_cmpuint(apicid_module_width(&topo_info), ==, 3);
+    topo_info = (X86CPUTopoInfo) {1, 8, 30, 2};
+    g_assert_cmpuint(apicid_module_width(&topo_info), ==, 3);
+    topo_info = (X86CPUTopoInfo) {1, 9, 30, 2};
+    g_assert_cmpuint(apicid_module_width(&topo_info), ==, 4);
+
+    topo_info = (X86CPUTopoInfo) {1, 6, 30, 2};
     g_assert_cmpuint(apicid_die_width(&topo_info), ==, 0);
-    topo_info = (X86CPUTopoInfo) {2, 1, 30, 2};
+    topo_info = (X86CPUTopoInfo) {2, 6, 30, 2};
     g_assert_cmpuint(apicid_die_width(&topo_info), ==, 1);
-    topo_info = (X86CPUTopoInfo) {3, 1, 30, 2};
+    topo_info = (X86CPUTopoInfo) {3, 6, 30, 2};
     g_assert_cmpuint(apicid_die_width(&topo_info), ==, 2);
-    topo_info = (X86CPUTopoInfo) {4, 1, 30, 2};
+    topo_info = (X86CPUTopoInfo) {4, 6, 30, 2};
     g_assert_cmpuint(apicid_die_width(&topo_info), ==, 2);
 
     /* build a weird topology and see if IDs are calculated correctly
@@ -91,6 +101,7 @@ static void test_topo_bits(void)
     topo_info = (X86CPUTopoInfo) {1, 1, 6, 3};
     g_assert_cmpuint(apicid_smt_width(&topo_info), ==, 2);
     g_assert_cmpuint(apicid_core_offset(&topo_info), ==, 2);
+    g_assert_cmpuint(apicid_module_offset(&topo_info), ==, 5);
     g_assert_cmpuint(apicid_die_offset(&topo_info), ==, 5);
     g_assert_cmpuint(apicid_pkg_offset(&topo_info), ==, 5);
 
-- 
2.34.1
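
These expectations follow from how the offsets stack: each level's
offset is the previous level's offset plus that level's width. For the
{1, 1, 6, 3} case above, 3 threads need 2 bits and 6 cores need 3 bits,
so the core offset is 2 and the module offset is 2 + 3 = 5; with only
one module per die, the module width is 0 and the die (and package)
offset stays at 5, matching the assertions.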



^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v7 12/16] hw/i386/pc: Support smp.clusters for x86 PC machine
  2024-01-08  8:27 [PATCH v7 00/16] Support smp.clusters for x86 in QEMU Zhao Liu
                   ` (10 preceding siblings ...)
  2024-01-08  8:27 ` [PATCH v7 11/16] tests: Add test case of APIC ID for module level parsing Zhao Liu
@ 2024-01-08  8:27 ` Zhao Liu
  2024-01-08  8:27 ` [PATCH v7 13/16] i386: Add cache topology info in CPUCacheInfo Zhao Liu
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 68+ messages in thread
From: Zhao Liu @ 2024-01-08  8:27 UTC (permalink / raw)
  To: Eduardo Habkost, Marcel Apfelbaum, Michael S . Tsirkin,
	Richard Henderson, Paolo Bonzini, Marcelo Tosatti
  Cc: qemu-devel, kvm, Zhenyu Wang, Zhuocheng Ding, Zhao Liu,
	Yanan Wang, Babu Moger, Yongwei Ma

From: Zhuocheng Ding <zhuocheng.ding@intel.com>

Now that module-level topology support has been added to X86CPU, we can
enable support for the cluster parameter on PC machines. With this
support, we can define a 5-level x86 CPU topology with "-smp":

-smp cpus=*,maxcpus=*,sockets=*,dies=*,clusters=*,cores=*,threads=*.

Additionally, add the 5-level topology example to the description of "-smp".

Signed-off-by: Zhuocheng Ding <zhuocheng.ding@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Yanan Wang <wangyanan55@huawei.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/i386/pc.c    |  1 +
 qemu-options.hx | 10 +++++-----
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 496498df3a8f..97527b7e0525 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1849,6 +1849,7 @@ static void pc_machine_class_init(ObjectClass *oc, void *data)
     mc->default_cpu_type = TARGET_DEFAULT_CPU_TYPE;
     mc->nvdimm_supported = true;
     mc->smp_props.dies_supported = true;
+    mc->smp_props.clusters_supported = true;
     mc->default_ram_id = "pc.ram";
     pcmc->default_smbios_ep_type = SMBIOS_ENTRY_POINT_TYPE_64;
 
diff --git a/qemu-options.hx b/qemu-options.hx
index b66570ae0067..7be8d1b53644 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -337,14 +337,14 @@ SRST
         -smp 8,sockets=2,cores=2,threads=2,maxcpus=8
 
     The following sub-option defines a CPU topology hierarchy (2 sockets
-    totally on the machine, 2 dies per socket, 2 cores per die, 2 threads
-    per core) for PC machines which support sockets/dies/cores/threads.
-    Some members of the option can be omitted but their values will be
-    automatically computed:
+    totally on the machine, 2 dies per socket, 2 clusters per die, 2 cores per
+    cluster, 2 threads per core) for PC machines which support sockets/dies
+    /clusters/cores/threads. Some members of the option can be omitted but
+    their values will be automatically computed:
 
     ::
 
-        -smp 16,sockets=2,dies=2,cores=2,threads=2,maxcpus=16
+        -smp 32,sockets=2,dies=2,clusters=2,cores=2,threads=2,maxcpus=32
 
     The following sub-option defines a CPU topology hierarchy (2 sockets
     totally on the machine, 2 clusters per socket, 2 cores per cluster,
-- 
2.34.1
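
With all five levels specified, the CPU count must satisfy
sockets x dies x clusters x cores x threads; for the example above,
2 x 2 x 2 x 2 x 2 = 32. As the option text notes, omitted members are
computed automatically, so dropping cores=2 from that 32-CPU line
should still resolve to cores = 2.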



^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v7 13/16] i386: Add cache topology info in CPUCacheInfo
  2024-01-08  8:27 [PATCH v7 00/16] Support smp.clusters for x86 in QEMU Zhao Liu
                   ` (11 preceding siblings ...)
  2024-01-08  8:27 ` [PATCH v7 12/16] hw/i386/pc: Support smp.clusters for x86 PC machine Zhao Liu
@ 2024-01-08  8:27 ` Zhao Liu
  2024-01-08  8:27 ` [PATCH v7 14/16] i386: Use CPUCacheInfo.share_level to encode CPUID[4] Zhao Liu
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 68+ messages in thread
From: Zhao Liu @ 2024-01-08  8:27 UTC (permalink / raw)
  To: Eduardo Habkost, Marcel Apfelbaum, Michael S . Tsirkin,
	Richard Henderson, Paolo Bonzini, Marcelo Tosatti
  Cc: qemu-devel, kvm, Zhenyu Wang, Zhuocheng Ding, Zhao Liu,
	Babu Moger, Yongwei Ma

From: Zhao Liu <zhao1.liu@intel.com>

Currently, by default, the cache topology is encoded as:
1. i/d cache is shared in one core.
2. L2 cache is shared in one core.
3. L3 cache is shared in one die.

This general default has caused a misunderstanding: the cache topology
is treated as fully equivalent to a specific CPU topology, e.g., the L2
cache is tied to the core level and the L3 cache to the die level.

In fact, the settings of these topologies depend on the specific
platform and are not static. For example, on Alder Lake-P, every
four Atom cores share the same L2 cache.

Thus, we should explicitly define the corresponding cache topology for
different cache models to increase scalability.

Except for legacy_l2_cache_cpuid2 (whose default topology level is
CPU_TOPO_LEVEL_INVALID), explicitly set the corresponding topology
level for all other cache models. To stay compatible with the existing
cache topology, set CPU_TOPO_LEVEL_CORE for the i/d caches and the L2
cache, and CPU_TOPO_LEVEL_DIE for the L3 cache.

The field for CPUID[4].EAX[bits 25:14] or CPUID[0x8000001D].EAX[bits
25:14] will be set based on CPUCacheInfo.share_level.

Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
---
Changes since v3:
 * Fix cache topology uninitialization bugs for some AMD CPUs. (Babu)
 * Move the CPUTopoLevel enumeration definition to the previous 0x1f
   rework patch.

Changes since v1:
 * Add the prefix "CPU_TOPO_LEVEL_*" for CPU topology level names.
   (Yanan)
 * (Revert, pls refer "i386: Decouple CPUID[0x1F] subleaf with specific
   topology level") Rename the "INVALID" level to CPU_TOPO_LEVEL_UNKNOW.
   (Yanan)
---
 target/i386/cpu.c | 36 ++++++++++++++++++++++++++++++++++++
 target/i386/cpu.h |  7 +++++++
 2 files changed, 43 insertions(+)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 498a4be62b40..81e07474acef 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -558,6 +558,7 @@ static CPUCacheInfo legacy_l1d_cache = {
     .sets = 64,
     .partitions = 1,
     .no_invd_sharing = true,
+    .share_level = CPU_TOPO_LEVEL_CORE,
 };
 
 /*FIXME: CPUID leaf 0x80000005 is inconsistent with leaves 2 & 4 */
@@ -572,6 +573,7 @@ static CPUCacheInfo legacy_l1d_cache_amd = {
     .partitions = 1,
     .lines_per_tag = 1,
     .no_invd_sharing = true,
+    .share_level = CPU_TOPO_LEVEL_CORE,
 };
 
 /* L1 instruction cache: */
@@ -585,6 +587,7 @@ static CPUCacheInfo legacy_l1i_cache = {
     .sets = 64,
     .partitions = 1,
     .no_invd_sharing = true,
+    .share_level = CPU_TOPO_LEVEL_CORE,
 };
 
 /*FIXME: CPUID leaf 0x80000005 is inconsistent with leaves 2 & 4 */
@@ -599,6 +602,7 @@ static CPUCacheInfo legacy_l1i_cache_amd = {
     .partitions = 1,
     .lines_per_tag = 1,
     .no_invd_sharing = true,
+    .share_level = CPU_TOPO_LEVEL_CORE,
 };
 
 /* Level 2 unified cache: */
@@ -612,6 +616,7 @@ static CPUCacheInfo legacy_l2_cache = {
     .sets = 4096,
     .partitions = 1,
     .no_invd_sharing = true,
+    .share_level = CPU_TOPO_LEVEL_CORE,
 };
 
 /*FIXME: CPUID leaf 2 descriptor is inconsistent with CPUID leaf 4 */
@@ -621,6 +626,7 @@ static CPUCacheInfo legacy_l2_cache_cpuid2 = {
     .size = 2 * MiB,
     .line_size = 64,
     .associativity = 8,
+    .share_level = CPU_TOPO_LEVEL_INVALID,
 };
 
 
@@ -634,6 +640,7 @@ static CPUCacheInfo legacy_l2_cache_amd = {
     .associativity = 16,
     .sets = 512,
     .partitions = 1,
+    .share_level = CPU_TOPO_LEVEL_CORE,
 };
 
 /* Level 3 unified cache: */
@@ -649,6 +656,7 @@ static CPUCacheInfo legacy_l3_cache = {
     .self_init = true,
     .inclusive = true,
     .complex_indexing = true,
+    .share_level = CPU_TOPO_LEVEL_DIE,
 };
 
 /* TLB definitions: */
@@ -1947,6 +1955,7 @@ static const CPUCaches epyc_cache_info = {
         .lines_per_tag = 1,
         .self_init = 1,
         .no_invd_sharing = true,
+        .share_level = CPU_TOPO_LEVEL_CORE,
     },
     .l1i_cache = &(CPUCacheInfo) {
         .type = INSTRUCTION_CACHE,
@@ -1959,6 +1968,7 @@ static const CPUCaches epyc_cache_info = {
         .lines_per_tag = 1,
         .self_init = 1,
         .no_invd_sharing = true,
+        .share_level = CPU_TOPO_LEVEL_CORE,
     },
     .l2_cache = &(CPUCacheInfo) {
         .type = UNIFIED_CACHE,
@@ -1969,6 +1979,7 @@ static const CPUCaches epyc_cache_info = {
         .partitions = 1,
         .sets = 1024,
         .lines_per_tag = 1,
+        .share_level = CPU_TOPO_LEVEL_CORE,
     },
     .l3_cache = &(CPUCacheInfo) {
         .type = UNIFIED_CACHE,
@@ -1982,6 +1993,7 @@ static const CPUCaches epyc_cache_info = {
         .self_init = true,
         .inclusive = true,
         .complex_indexing = true,
+        .share_level = CPU_TOPO_LEVEL_DIE,
     },
 };
 
@@ -1997,6 +2009,7 @@ static CPUCaches epyc_v4_cache_info = {
         .lines_per_tag = 1,
         .self_init = 1,
         .no_invd_sharing = true,
+        .share_level = CPU_TOPO_LEVEL_CORE,
     },
     .l1i_cache = &(CPUCacheInfo) {
         .type = INSTRUCTION_CACHE,
@@ -2009,6 +2022,7 @@ static CPUCaches epyc_v4_cache_info = {
         .lines_per_tag = 1,
         .self_init = 1,
         .no_invd_sharing = true,
+        .share_level = CPU_TOPO_LEVEL_CORE,
     },
     .l2_cache = &(CPUCacheInfo) {
         .type = UNIFIED_CACHE,
@@ -2019,6 +2033,7 @@ static CPUCaches epyc_v4_cache_info = {
         .partitions = 1,
         .sets = 1024,
         .lines_per_tag = 1,
+        .share_level = CPU_TOPO_LEVEL_CORE,
     },
     .l3_cache = &(CPUCacheInfo) {
         .type = UNIFIED_CACHE,
@@ -2032,6 +2047,7 @@ static CPUCaches epyc_v4_cache_info = {
         .self_init = true,
         .inclusive = true,
         .complex_indexing = false,
+        .share_level = CPU_TOPO_LEVEL_DIE,
     },
 };
 
@@ -2047,6 +2063,7 @@ static const CPUCaches epyc_rome_cache_info = {
         .lines_per_tag = 1,
         .self_init = 1,
         .no_invd_sharing = true,
+        .share_level = CPU_TOPO_LEVEL_CORE,
     },
     .l1i_cache = &(CPUCacheInfo) {
         .type = INSTRUCTION_CACHE,
@@ -2059,6 +2076,7 @@ static const CPUCaches epyc_rome_cache_info = {
         .lines_per_tag = 1,
         .self_init = 1,
         .no_invd_sharing = true,
+        .share_level = CPU_TOPO_LEVEL_CORE,
     },
     .l2_cache = &(CPUCacheInfo) {
         .type = UNIFIED_CACHE,
@@ -2069,6 +2087,7 @@ static const CPUCaches epyc_rome_cache_info = {
         .partitions = 1,
         .sets = 1024,
         .lines_per_tag = 1,
+        .share_level = CPU_TOPO_LEVEL_CORE,
     },
     .l3_cache = &(CPUCacheInfo) {
         .type = UNIFIED_CACHE,
@@ -2082,6 +2101,7 @@ static const CPUCaches epyc_rome_cache_info = {
         .self_init = true,
         .inclusive = true,
         .complex_indexing = true,
+        .share_level = CPU_TOPO_LEVEL_DIE,
     },
 };
 
@@ -2097,6 +2117,7 @@ static const CPUCaches epyc_rome_v3_cache_info = {
         .lines_per_tag = 1,
         .self_init = 1,
         .no_invd_sharing = true,
+        .share_level = CPU_TOPO_LEVEL_CORE,
     },
     .l1i_cache = &(CPUCacheInfo) {
         .type = INSTRUCTION_CACHE,
@@ -2109,6 +2130,7 @@ static const CPUCaches epyc_rome_v3_cache_info = {
         .lines_per_tag = 1,
         .self_init = 1,
         .no_invd_sharing = true,
+        .share_level = CPU_TOPO_LEVEL_CORE,
     },
     .l2_cache = &(CPUCacheInfo) {
         .type = UNIFIED_CACHE,
@@ -2119,6 +2141,7 @@ static const CPUCaches epyc_rome_v3_cache_info = {
         .partitions = 1,
         .sets = 1024,
         .lines_per_tag = 1,
+        .share_level = CPU_TOPO_LEVEL_CORE,
     },
     .l3_cache = &(CPUCacheInfo) {
         .type = UNIFIED_CACHE,
@@ -2132,6 +2155,7 @@ static const CPUCaches epyc_rome_v3_cache_info = {
         .self_init = true,
         .inclusive = true,
         .complex_indexing = false,
+        .share_level = CPU_TOPO_LEVEL_DIE,
     },
 };
 
@@ -2147,6 +2171,7 @@ static const CPUCaches epyc_milan_cache_info = {
         .lines_per_tag = 1,
         .self_init = 1,
         .no_invd_sharing = true,
+        .share_level = CPU_TOPO_LEVEL_CORE,
     },
     .l1i_cache = &(CPUCacheInfo) {
         .type = INSTRUCTION_CACHE,
@@ -2159,6 +2184,7 @@ static const CPUCaches epyc_milan_cache_info = {
         .lines_per_tag = 1,
         .self_init = 1,
         .no_invd_sharing = true,
+        .share_level = CPU_TOPO_LEVEL_CORE,
     },
     .l2_cache = &(CPUCacheInfo) {
         .type = UNIFIED_CACHE,
@@ -2169,6 +2195,7 @@ static const CPUCaches epyc_milan_cache_info = {
         .partitions = 1,
         .sets = 1024,
         .lines_per_tag = 1,
+        .share_level = CPU_TOPO_LEVEL_CORE,
     },
     .l3_cache = &(CPUCacheInfo) {
         .type = UNIFIED_CACHE,
@@ -2182,6 +2209,7 @@ static const CPUCaches epyc_milan_cache_info = {
         .self_init = true,
         .inclusive = true,
         .complex_indexing = true,
+        .share_level = CPU_TOPO_LEVEL_DIE,
     },
 };
 
@@ -2197,6 +2225,7 @@ static const CPUCaches epyc_milan_v2_cache_info = {
         .lines_per_tag = 1,
         .self_init = 1,
         .no_invd_sharing = true,
+        .share_level = CPU_TOPO_LEVEL_CORE,
     },
     .l1i_cache = &(CPUCacheInfo) {
         .type = INSTRUCTION_CACHE,
@@ -2209,6 +2238,7 @@ static const CPUCaches epyc_milan_v2_cache_info = {
         .lines_per_tag = 1,
         .self_init = 1,
         .no_invd_sharing = true,
+        .share_level = CPU_TOPO_LEVEL_CORE,
     },
     .l2_cache = &(CPUCacheInfo) {
         .type = UNIFIED_CACHE,
@@ -2219,6 +2249,7 @@ static const CPUCaches epyc_milan_v2_cache_info = {
         .partitions = 1,
         .sets = 1024,
         .lines_per_tag = 1,
+        .share_level = CPU_TOPO_LEVEL_CORE,
     },
     .l3_cache = &(CPUCacheInfo) {
         .type = UNIFIED_CACHE,
@@ -2232,6 +2263,7 @@ static const CPUCaches epyc_milan_v2_cache_info = {
         .self_init = true,
         .inclusive = true,
         .complex_indexing = false,
+        .share_level = CPU_TOPO_LEVEL_DIE,
     },
 };
 
@@ -2247,6 +2279,7 @@ static const CPUCaches epyc_genoa_cache_info = {
         .lines_per_tag = 1,
         .self_init = 1,
         .no_invd_sharing = true,
+        .share_level = CPU_TOPO_LEVEL_CORE,
     },
     .l1i_cache = &(CPUCacheInfo) {
         .type = INSTRUCTION_CACHE,
@@ -2259,6 +2292,7 @@ static const CPUCaches epyc_genoa_cache_info = {
         .lines_per_tag = 1,
         .self_init = 1,
         .no_invd_sharing = true,
+        .share_level = CPU_TOPO_LEVEL_CORE,
     },
     .l2_cache = &(CPUCacheInfo) {
         .type = UNIFIED_CACHE,
@@ -2269,6 +2303,7 @@ static const CPUCaches epyc_genoa_cache_info = {
         .partitions = 1,
         .sets = 2048,
         .lines_per_tag = 1,
+        .share_level = CPU_TOPO_LEVEL_CORE,
     },
     .l3_cache = &(CPUCacheInfo) {
         .type = UNIFIED_CACHE,
@@ -2282,6 +2317,7 @@ static const CPUCaches epyc_genoa_cache_info = {
         .self_init = true,
         .inclusive = true,
         .complex_indexing = false,
+        .share_level = CPU_TOPO_LEVEL_DIE,
     },
 };
 
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 009950b87203..bfd40a9cd254 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1597,6 +1597,13 @@ typedef struct CPUCacheInfo {
      * address bits.  CPUID[4].EDX[bit 2].
      */
     bool complex_indexing;
+
+    /*
+     * Cache Topology. The level that cache is shared in.
+     * Used to encode CPUID[4].EAX[bits 25:14] or
+     * CPUID[0x8000001D].EAX[bits 25:14].
+     */
+    enum CPUTopoLevel share_level;
 } CPUCacheInfo;
 
 
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v7 14/16] i386: Use CPUCacheInfo.share_level to encode CPUID[4]
  2024-01-08  8:27 [PATCH v7 00/16] Support smp.clusters for x86 in QEMU Zhao Liu
                   ` (12 preceding siblings ...)
  2024-01-08  8:27 ` [PATCH v7 13/16] i386: Add cache topology info in CPUCacheInfo Zhao Liu
@ 2024-01-08  8:27 ` Zhao Liu
  2024-01-14 14:31   ` Xiaoyao Li
  2024-01-08  8:27 ` [PATCH v7 15/16] i386: Use offsets to get NumSharingCache for CPUID[0x8000001D].EAX[bits 25:14] Zhao Liu
                   ` (2 subsequent siblings)
  16 siblings, 1 reply; 68+ messages in thread
From: Zhao Liu @ 2024-01-08  8:27 UTC (permalink / raw)
  To: Eduardo Habkost, Marcel Apfelbaum, Michael S . Tsirkin,
	Richard Henderson, Paolo Bonzini, Marcelo Tosatti
  Cc: qemu-devel, kvm, Zhenyu Wang, Zhuocheng Ding, Zhao Liu,
	Babu Moger, Yongwei Ma

From: Zhao Liu <zhao1.liu@intel.com>

CPUID[4].EAX[bits 25:14] is used to represent the cache topology for
Intel CPUs.

Now that cache models carry topology information, we can use
CPUCacheInfo.share_level to decide which topology level is encoded
into CPUID[4].EAX[bits 25:14].

And since maximum_processor_id (originally "num_apic_ids") is derived
from the CPU topology levels, which are already validated when parsing
smp, there is no need to check this value with "assert(num_apic_ids >
0)" again, so remove this assert.

Additionally, wrap the encoding of CPUID[4].EAX[bits 31:26] into a
helper to make the code cleaner.

Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
---
Changes since v1:
 * Use "enum CPUTopoLevel share_level" as the parameter in
   max_processor_ids_for_cache().
 * Make the cache_info_passthrough case also use
   max_processor_ids_for_cache() and max_core_ids_in_package() to
   encode CPUID[4]. (Yanan)
 * Rename the title of this patch (the original is "i386: Use
   CPUCacheInfo.share_level to encode CPUID[4].EAX[bits 25:14]").
---
 target/i386/cpu.c | 70 +++++++++++++++++++++++++++++------------------
 1 file changed, 43 insertions(+), 27 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 81e07474acef..b23e8190dc68 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -235,22 +235,53 @@ static uint8_t cpuid2_cache_descriptor(CPUCacheInfo *cache)
                        ((t) == UNIFIED_CACHE) ? CACHE_TYPE_UNIFIED : \
                        0 /* Invalid value */)
 
+static uint32_t max_processor_ids_for_cache(X86CPUTopoInfo *topo_info,
+                                            enum CPUTopoLevel share_level)
+{
+    uint32_t num_ids = 0;
+
+    switch (share_level) {
+    case CPU_TOPO_LEVEL_CORE:
+        num_ids = 1 << apicid_core_offset(topo_info);
+        break;
+    case CPU_TOPO_LEVEL_DIE:
+        num_ids = 1 << apicid_die_offset(topo_info);
+        break;
+    case CPU_TOPO_LEVEL_PACKAGE:
+        num_ids = 1 << apicid_pkg_offset(topo_info);
+        break;
+    default:
+        /*
+         * Currently there is no use case for SMT and MODULE, so use
+         * assert directly to facilitate debugging.
+         */
+        g_assert_not_reached();
+    }
+
+    return num_ids - 1;
+}
+
+static uint32_t max_core_ids_in_package(X86CPUTopoInfo *topo_info)
+{
+    uint32_t num_cores = 1 << (apicid_pkg_offset(topo_info) -
+                               apicid_core_offset(topo_info));
+    return num_cores - 1;
+}
 
 /* Encode cache info for CPUID[4] */
 static void encode_cache_cpuid4(CPUCacheInfo *cache,
-                                int num_apic_ids, int num_cores,
+                                X86CPUTopoInfo *topo_info,
                                 uint32_t *eax, uint32_t *ebx,
                                 uint32_t *ecx, uint32_t *edx)
 {
     assert(cache->size == cache->line_size * cache->associativity *
                           cache->partitions * cache->sets);
 
-    assert(num_apic_ids > 0);
     *eax = CACHE_TYPE(cache->type) |
            CACHE_LEVEL(cache->level) |
            (cache->self_init ? CACHE_SELF_INIT_LEVEL : 0) |
-           ((num_cores - 1) << 26) |
-           ((num_apic_ids - 1) << 14);
+           (max_core_ids_in_package(topo_info) << 26) |
+           (max_processor_ids_for_cache(topo_info, cache->share_level) << 14);
 
     assert(cache->line_size > 0);
     assert(cache->partitions > 0);
@@ -6263,56 +6294,41 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
                 int host_vcpus_per_cache = 1 + ((*eax & 0x3FFC000) >> 14);
 
                 if (cores_per_pkg > 1) {
-                    int addressable_cores_offset =
-                                                apicid_pkg_offset(&topo_info) -
-                                                apicid_core_offset(&topo_info);
-
                     *eax &= ~0xFC000000;
-                    *eax |= (1 << (addressable_cores_offset - 1)) << 26;
+                    *eax |= max_core_ids_in_package(&topo_info) << 26;
                 }
                 if (host_vcpus_per_cache > cpus_per_pkg) {
-                    int pkg_offset = apicid_pkg_offset(&topo_info);
-
                     *eax &= ~0x3FFC000;
-                    *eax |= (1 << (pkg_offset - 1)) << 14;
+                    *eax |=
+                        max_processor_ids_for_cache(&topo_info,
+                                                CPU_TOPO_LEVEL_PACKAGE) << 14;
                 }
             }
         } else if (cpu->vendor_cpuid_only && IS_AMD_CPU(env)) {
             *eax = *ebx = *ecx = *edx = 0;
         } else {
             *eax = 0;
-            int addressable_cores_offset = apicid_pkg_offset(&topo_info) -
-                                           apicid_core_offset(&topo_info);
-            int core_offset, die_offset;
 
             switch (count) {
             case 0: /* L1 dcache info */
-                core_offset = apicid_core_offset(&topo_info);
                 encode_cache_cpuid4(env->cache_info_cpuid4.l1d_cache,
-                                    (1 << core_offset),
-                                    (1 << addressable_cores_offset),
+                                    &topo_info,
                                     eax, ebx, ecx, edx);
                 break;
             case 1: /* L1 icache info */
-                core_offset = apicid_core_offset(&topo_info);
                 encode_cache_cpuid4(env->cache_info_cpuid4.l1i_cache,
-                                    (1 << core_offset),
-                                    (1 << addressable_cores_offset),
+                                    &topo_info,
                                     eax, ebx, ecx, edx);
                 break;
             case 2: /* L2 cache info */
-                core_offset = apicid_core_offset(&topo_info);
                 encode_cache_cpuid4(env->cache_info_cpuid4.l2_cache,
-                                    (1 << core_offset),
-                                    (1 << addressable_cores_offset),
+                                    &topo_info,
                                     eax, ebx, ecx, edx);
                 break;
             case 3: /* L3 cache info */
-                die_offset = apicid_die_offset(&topo_info);
                 if (cpu->enable_l3_cache) {
                     encode_cache_cpuid4(env->cache_info_cpuid4.l3_cache,
-                                        (1 << die_offset),
-                                        (1 << addressable_cores_offset),
+                                        &topo_info,
                                         eax, ebx, ecx, edx);
                     break;
                 }
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v7 15/16] i386: Use offsets to get NumSharingCache for CPUID[0x8000001D].EAX[bits 25:14]
  2024-01-08  8:27 [PATCH v7 00/16] Support smp.clusters for x86 in QEMU Zhao Liu
                   ` (13 preceding siblings ...)
  2024-01-08  8:27 ` [PATCH v7 14/16] i386: Use CPUCacheInfo.share_level to encode CPUID[4] Zhao Liu
@ 2024-01-08  8:27 ` Zhao Liu
  2024-01-14 14:42   ` Xiaoyao Li
  2024-01-08  8:27 ` [PATCH v7 16/16] i386: Use CPUCacheInfo.share_level to encode " Zhao Liu
  2024-01-08 17:46 ` [PATCH v7 00/16] Support smp.clusters for x86 in QEMU Moger, Babu
  16 siblings, 1 reply; 68+ messages in thread
From: Zhao Liu @ 2024-01-08  8:27 UTC (permalink / raw)
  To: Eduardo Habkost, Marcel Apfelbaum, Michael S . Tsirkin,
	Richard Henderson, Paolo Bonzini, Marcelo Tosatti
  Cc: qemu-devel, kvm, Zhenyu Wang, Zhuocheng Ding, Zhao Liu,
	Babu Moger, Yongwei Ma

From: Zhao Liu <zhao1.liu@intel.com>

Commit 8f4202fb1080 ("i386: Populate AMD Processor Cache Information
for cpuid 0x8000001D") added the cache topology for AMD CPUs by
encoding the number of sharing threads directly.

From AMD's APM, NumSharingCache (CPUID[0x8000001D].EAX[bits 25:14])
means [1]:

The number of logical processors sharing this cache is the value of
this field incremented by 1. To determine which logical processors are
sharing a cache, determine a Share Id for each processor as follows:

ShareId = LocalApicId >> log2(NumSharingCache+1)

Logical processors with the same ShareId then share a cache. If
NumSharingCache+1 is not a power of two, round it up to the next power
of two.

From the description above, the calculation of this field should be the
same as for CPUID[4].EAX[bits 25:14] on Intel CPUs, so also use the
APIC ID offsets to calculate this field.

[1]: APM, vol.3, appendix.E.4.15 Function 8000_001Dh--Cache Topology
     Information

Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Babu Moger <babu.moger@amd.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
---
Changes since v3:
 * Rewrite the subject. (Babu)
 * Delete the original "comment/help" expression, as this behavior is
   confirmed for AMD CPUs. (Babu)
 * Rename "num_apic_ids" (v3) to "num_sharing_cache" to match spec
   definition. (Babu)

Changes since v1:
 * Rename "l3_threads" to "num_apic_ids" in
   encode_cache_cpuid8000001d(). (Yanan)
 * Add the description of the original commit and add Cc.
---
 target/i386/cpu.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index b23e8190dc68..8a4d72f6f760 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -483,7 +483,7 @@ static void encode_cache_cpuid8000001d(CPUCacheInfo *cache,
                                        uint32_t *eax, uint32_t *ebx,
                                        uint32_t *ecx, uint32_t *edx)
 {
-    uint32_t l3_threads;
+    uint32_t num_sharing_cache;
     assert(cache->size == cache->line_size * cache->associativity *
                           cache->partitions * cache->sets);
 
@@ -492,13 +492,11 @@ static void encode_cache_cpuid8000001d(CPUCacheInfo *cache,
 
     /* L3 is shared among multiple cores */
     if (cache->level == 3) {
-        l3_threads = topo_info->modules_per_die *
-                     topo_info->cores_per_module *
-                     topo_info->threads_per_core;
-        *eax |= (l3_threads - 1) << 14;
+        num_sharing_cache = 1 << apicid_die_offset(topo_info);
     } else {
-        *eax |= ((topo_info->threads_per_core - 1) << 14);
+        num_sharing_cache = 1 << apicid_core_offset(topo_info);
     }
+    *eax |= (num_sharing_cache - 1) << 14;
 
     assert(cache->line_size > 0);
     assert(cache->partitions > 0);
-- 
2.34.1
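
The APM's ShareId rule quoted above can be checked with a small sketch
(illustrative values only): with NumSharingCache = 3, log2(3 + 1) = 2,
so APIC IDs 0..3 share one cache and APIC IDs 4..7 share the next.

    #include <assert.h>

    /* ShareId = LocalApicId >> log2(NumSharingCache + 1), per the APM. */
    static unsigned share_id(unsigned apic_id, unsigned num_sharing_cache)
    {
        unsigned shift = 0, n = num_sharing_cache + 1;

        /* Round n up to a power of two, then take log2. */
        while ((1u << shift) < n) {
            shift++;
        }
        return apic_id >> shift;
    }

    int main(void)
    {
        assert(share_id(3, 3) == 0);  /* APIC IDs 0..3 share */
        assert(share_id(4, 3) == 1);  /* APIC IDs 4..7 share the next */
        return 0;
    }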



^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v7 16/16] i386: Use CPUCacheInfo.share_level to encode CPUID[0x8000001D].EAX[bits 25:14]
  2024-01-08  8:27 [PATCH v7 00/16] Support smp.clusters for x86 in QEMU Zhao Liu
                   ` (14 preceding siblings ...)
  2024-01-08  8:27 ` [PATCH v7 15/16] i386: Use offsets to get NumSharingCache for CPUID[0x8000001D].EAX[bits 25:14] Zhao Liu
@ 2024-01-08  8:27 ` Zhao Liu
  2024-01-08 17:46 ` [PATCH v7 00/16] Support smp.clusters for x86 in QEMU Moger, Babu
  16 siblings, 0 replies; 68+ messages in thread
From: Zhao Liu @ 2024-01-08  8:27 UTC (permalink / raw)
  To: Eduardo Habkost, Marcel Apfelbaum, Michael S . Tsirkin,
	Richard Henderson, Paolo Bonzini, Marcelo Tosatti
  Cc: qemu-devel, kvm, Zhenyu Wang, Zhuocheng Ding, Zhao Liu,
	Babu Moger, Yongwei Ma

From: Zhao Liu <zhao1.liu@intel.com>

CPUID[0x8000001D].EAX[bits 25:14] is NumSharingCache: the number of
logical processors sharing this cache is NumSharingCache + 1.

Now that cache models carry topology information, we can use
CPUCacheInfo.share_level to decide which topology level is encoded
into CPUID[0x8000001D].EAX[bits 25:14].

Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Babu Moger <babu.moger@amd.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
---
Changes since v3:
 * Explain what "CPUID[0x8000001D].EAX[bits 25:14]" means in the commit
   message. (Babu)

Changes since v1:
 * Use cache->share_level as the parameter in
   max_processor_ids_for_cache().
---
 target/i386/cpu.c | 10 +---------
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 8a4d72f6f760..4688b5d584bb 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -483,20 +483,12 @@ static void encode_cache_cpuid8000001d(CPUCacheInfo *cache,
                                        uint32_t *eax, uint32_t *ebx,
                                        uint32_t *ecx, uint32_t *edx)
 {
-    uint32_t num_sharing_cache;
     assert(cache->size == cache->line_size * cache->associativity *
                           cache->partitions * cache->sets);
 
     *eax = CACHE_TYPE(cache->type) | CACHE_LEVEL(cache->level) |
                (cache->self_init ? CACHE_SELF_INIT_LEVEL : 0);
-
-    /* L3 is shared among multiple cores */
-    if (cache->level == 3) {
-        num_sharing_cache = 1 << apicid_die_offset(topo_info);
-    } else {
-        num_sharing_cache = 1 << apicid_core_offset(topo_info);
-    }
-    *eax |= (num_sharing_cache - 1) << 14;
+    *eax |= max_processor_ids_for_cache(topo_info, cache->share_level) << 14;
 
     assert(cache->line_size > 0);
     assert(cache->partitions > 0);
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 68+ messages in thread

* Re: [PATCH v7 00/16] Support smp.clusters for x86 in QEMU
  2024-01-08  8:27 [PATCH v7 00/16] Support smp.clusters for x86 in QEMU Zhao Liu
                   ` (15 preceding siblings ...)
  2024-01-08  8:27 ` [PATCH v7 16/16] i386: Use CPUCacheInfo.share_level to encode " Zhao Liu
@ 2024-01-08 17:46 ` Moger, Babu
  2024-01-09  1:48   ` Zhao Liu
  16 siblings, 1 reply; 68+ messages in thread
From: Moger, Babu @ 2024-01-08 17:46 UTC (permalink / raw)
  To: Zhao Liu, Eduardo Habkost, Marcel Apfelbaum, Michael S . Tsirkin,
	Richard Henderson, Paolo Bonzini, Marcelo Tosatti
  Cc: qemu-devel, kvm, Zhenyu Wang, Zhuocheng Ding, Zhao Liu

Hi Zhao,

Ran a few basic tests on AMD systems. The changes look good.

Thanks
Babu


Tested-by: Babu Moger <babu.moger@amd.com>


On 1/8/24 02:27, Zhao Liu wrote:
> [...]

-- 
Thanks
Babu Moger



* Re: [PATCH v7 00/16] Support smp.clusters for x86 in QEMU
  2024-01-08 17:46 ` [PATCH v7 00/16] Support smp.clusters for x86 in QEMU Moger, Babu
@ 2024-01-09  1:48   ` Zhao Liu
  0 siblings, 0 replies; 68+ messages in thread
From: Zhao Liu @ 2024-01-09  1:48 UTC (permalink / raw)
  To: Moger, Babu
  Cc: Eduardo Habkost, Marcel Apfelbaum, Michael S . Tsirkin,
	Richard Henderson, Paolo Bonzini, Marcelo Tosatti, qemu-devel,
	kvm, Zhenyu Wang, Zhuocheng Ding, Zhao Liu

Hi Babu,

On Mon, Jan 08, 2024 at 11:46:50AM -0600, Moger, Babu wrote:
> Date: Mon, 8 Jan 2024 11:46:50 -0600
> From: "Moger, Babu" <babu.moger@amd.com>
> Subject: Re: [PATCH v7 00/16] Support smp.clusters for x86 in QEMU
> 
> Hi  Zhao,
> 
> Ran a few basic tests on AMD systems. Changes look good.
> 
> Thanks
> Babu
> 
> 
> Tested-by: Babu Moger <babu.moger@amd.com>
> 

Thanks much for your test!

Regards,
Zhao




* Re: [PATCH v7 02/16] i386/cpu: Use APIC ID offset to encode cache topo in CPUID[4]
  2024-01-08  8:27 ` [PATCH v7 02/16] i386/cpu: Use APIC ID offset to encode cache topo in CPUID[4] Zhao Liu
@ 2024-01-10  9:31   ` Xiaoyao Li
  2024-01-11  8:43     ` Zhao Liu
  0 siblings, 1 reply; 68+ messages in thread
From: Xiaoyao Li @ 2024-01-10  9:31 UTC (permalink / raw)
  To: Zhao Liu, Eduardo Habkost, Marcel Apfelbaum, Michael S . Tsirkin,
	Richard Henderson, Paolo Bonzini, Marcelo Tosatti
  Cc: qemu-devel, kvm, Zhenyu Wang, Zhuocheng Ding, Zhao Liu,
	Robert Hoo, Babu Moger, Yongwei Ma

On 1/8/2024 4:27 PM, Zhao Liu wrote:
> From: Zhao Liu <zhao1.liu@intel.com>
> 
> Refer to the fixes of cache_info_passthrough ([1], [2]) and SDM, the
> CPUID.04H:EAX[bits 25:14] and CPUID.04H:EAX[bits 31:26] should use the
> nearest power-of-2 integer.
> 
> The nearest power-of-2 integer can be calculated by pow2ceil() or by
> using APIC ID offset (like L3 topology using 1 << die_offset [3]).
> 
> But in fact, CPUID.04H:EAX[bits 25:14] and CPUID.04H:EAX[bits 31:26]
> are associated with APIC ID. For example, in linux kernel, the field
> "num_threads_sharing" (Bits 25 - 14) is parsed with APIC ID. And for
> another example, on Alder Lake P, the CPUID.04H:EAX[bits 31:26] is not
> matched with actual core numbers and it's calculated by:
> "(1 << (pkg_offset - core_offset)) - 1".

Could you elaborate a bit more? What is the actual number of cores on
Alder Lake P, and what are the pkg_offset and core_offset?

> Therefore the offset of APIC ID should be preferred to calculate nearest
> power-of-2 integer for CPUID.04H:EAX[bits 25:14] and CPUID.04H:EAX[bits
> 31:26]:
> 1. d/i cache is shared in a core, 1 << core_offset should be used
>     instand of "cs->nr_threads" in encode_cache_cpuid4() for

s/instand/instead/

>     CPUID.04H.00H:EAX[bits 25:14] and CPUID.04H.01H:EAX[bits 25:14].
> 2. L2 cache is supposed to be shared in a core as for now, thereby
>     1 << core_offset should also be used instand of "cs->nr_threads" in

ditto

>     encode_cache_cpuid4() for CPUID.04H.02H:EAX[bits 25:14].
> 3. Similarly, the value for CPUID.04H:EAX[bits 31:26] should also be
>     calculated with the bit width between the Package and SMT levels in
>     the APIC ID (1 << (pkg_offset - core_offset) - 1).
> 
> In addition, use APIC ID offset to replace "pow2ceil()" for
> cache_info_passthrough case.
> 
> [1]: efb3934adf9e ("x86: cpu: make sure number of addressable IDs for processor cores meets the spec")
> [2]: d7caf13b5fcf ("x86: cpu: fixup number of addressable IDs for logical processors sharing cache")
> [3]: d65af288a84d ("i386: Update new x86_apicid parsing rules with die_offset support")
> 
> Fixes: 7e3482f82480 ("i386: Helpers to encode cache information consistently")
> Suggested-by: Robert Hoo <robert.hu@linux.intel.com>
> Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
> Tested-by: Babu Moger <babu.moger@amd.com>
> Tested-by: Yongwei Ma <yongwei.ma@intel.com>
> Acked-by: Michael S. Tsirkin <mst@redhat.com>
> ---
> Changes since v3:
>   * Fix compile warnings. (Babu)
>   * Fix spelling typo.
> 
> Changes since v1:
>   * Use APIC ID offset to replace "pow2ceil()" for cache_info_passthrough
>     case. (Yanan)
>   * Split the L1 cache fix into a separate patch.
>   * Rename the title of this patch (the original is "i386/cpu: Fix number
>     of addressable IDs in CPUID.04H").
> ---
>   target/i386/cpu.c | 30 +++++++++++++++++++++++-------
>   1 file changed, 23 insertions(+), 7 deletions(-)
> 
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index 5a3678a789cf..c8d2a585723a 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -6014,7 +6014,6 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
>   {
>       X86CPU *cpu = env_archcpu(env);
>       CPUState *cs = env_cpu(env);
> -    uint32_t die_offset;
>       uint32_t limit;
>       uint32_t signature[3];
>       X86CPUTopoInfo topo_info;
> @@ -6098,39 +6097,56 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
>                   int host_vcpus_per_cache = 1 + ((*eax & 0x3FFC000) >> 14);
>                   int vcpus_per_socket = cs->nr_cores * cs->nr_threads;
>                   if (cs->nr_cores > 1) {
> +                    int addressable_cores_offset =
> +                                                apicid_pkg_offset(&topo_info) -
> +                                                apicid_core_offset(&topo_info);
> +
>                       *eax &= ~0xFC000000;
> -                    *eax |= (pow2ceil(cs->nr_cores) - 1) << 26;
> +                    *eax |= (1 << (addressable_cores_offset - 1)) << 26;

it should be ((1 << addressable_cores_offset) - 1) << 26

I think naming it addressable_cores_width is better than 
addressable_cores_offset. It's not an offset, because an offset means the 
bit position counted from bit 0.

And we can get the width by another algorithm:

int addressable_cores_width = apicid_core_width(&topo_info) +
                              apicid_die_width(&topo_info);
*eax |= ((1 << addressable_cores_width) - 1) << 26;
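FWIW, the two forms should agree with the current helpers, since before
the module level is added, pkg_offset = core_offset + core_width +
die_width. A quick sanity check along those lines (just my sketch, not
part of the patch):

g_assert(apicid_pkg_offset(&topo_info) - apicid_core_offset(&topo_info) ==
         apicid_core_width(&topo_info) + apicid_die_width(&topo_info));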
>                   }
>                   if (host_vcpus_per_cache > vcpus_per_socket) {
> +                    int pkg_offset = apicid_pkg_offset(&topo_info);
> +
>                       *eax &= ~0x3FFC000;
> -                    *eax |= (pow2ceil(vcpus_per_socket) - 1) << 14;
> +                    *eax |= (1 << (pkg_offset - 1)) << 14;

Ditto, ((1 << pkg_offset) - 1) << 14

For this one, I think pow2ceil(vcpus_per_socket) is better, because it's 
intuitive that when host_vcpus_per_cache > vcpus_per_socket, we expose the 
user-configured vcpus_per_socket as vcpus_per_cache to the VM.

>                   }
>               }
>           } else if (cpu->vendor_cpuid_only && IS_AMD_CPU(env)) {
>               *eax = *ebx = *ecx = *edx = 0;
>           } else {
>               *eax = 0;
> +            int addressable_cores_offset = apicid_pkg_offset(&topo_info) -
> +                                           apicid_core_offset(&topo_info);
> +            int core_offset, die_offset;
> +
>               switch (count) {
>               case 0: /* L1 dcache info */
> +                core_offset = apicid_core_offset(&topo_info);
>                   encode_cache_cpuid4(env->cache_info_cpuid4.l1d_cache,
> -                                    cs->nr_threads, cs->nr_cores,
> +                                    (1 << core_offset),
> +                                    (1 << addressable_cores_offset),
>                                       eax, ebx, ecx, edx);
>                   break;
>               case 1: /* L1 icache info */
> +                core_offset = apicid_core_offset(&topo_info);
>                   encode_cache_cpuid4(env->cache_info_cpuid4.l1i_cache,
> -                                    cs->nr_threads, cs->nr_cores,
> +                                    (1 << core_offset),
> +                                    (1 << addressable_cores_offset),
>                                       eax, ebx, ecx, edx);
>                   break;
>               case 2: /* L2 cache info */
> +                core_offset = apicid_core_offset(&topo_info);
>                   encode_cache_cpuid4(env->cache_info_cpuid4.l2_cache,
> -                                    cs->nr_threads, cs->nr_cores,
> +                                    (1 << core_offset),
> +                                    (1 << addressable_cores_offset),
>                                       eax, ebx, ecx, edx);
>                   break;
>               case 3: /* L3 cache info */
>                   die_offset = apicid_die_offset(&topo_info);
>                   if (cpu->enable_l3_cache) {
>                       encode_cache_cpuid4(env->cache_info_cpuid4.l3_cache,
> -                                        (1 << die_offset), cs->nr_cores,
> +                                        (1 << die_offset),
> +                                        (1 << addressable_cores_offset),
>                                           eax, ebx, ecx, edx);
>                       break;
>                   }




* Re: [PATCH v7 03/16] i386/cpu: Consolidate the use of topo_info in cpu_x86_cpuid()
  2024-01-08  8:27 ` [PATCH v7 03/16] i386/cpu: Consolidate the use of topo_info in cpu_x86_cpuid() Zhao Liu
@ 2024-01-10 11:52   ` Xiaoyao Li
  2024-01-11  8:46     ` Zhao Liu
  0 siblings, 1 reply; 68+ messages in thread
From: Xiaoyao Li @ 2024-01-10 11:52 UTC (permalink / raw)
  To: Zhao Liu, Eduardo Habkost, Marcel Apfelbaum, Michael S . Tsirkin,
	Richard Henderson, Paolo Bonzini, Marcelo Tosatti
  Cc: qemu-devel, kvm, Zhenyu Wang, Zhuocheng Ding, Zhao Liu,
	Robert Hoo, Babu Moger, Yongwei Ma

On 1/8/2024 4:27 PM, Zhao Liu wrote:
> From: Zhao Liu <zhao1.liu@intel.com>
> 
> In cpu_x86_cpuid(), there are many variables in representing the cpu
> topology, e.g., topo_info, cs->nr_cores/cs->nr_threads.

Please use a comma instead of a slash; cs->nr_cores/cs->nr_threads looks 
like one variable.

> Since the names of cs->nr_cores/cs->nr_threads does not accurately
> represent its meaning, the use of cs->nr_cores/cs->nr_threads is prone
> to confusion and mistakes.
> 
> And the structure X86CPUTopoInfo names its members clearly, thus the
> variable "topo_info" should be preferred.
> 
> In addition, in cpu_x86_cpuid(), to uniformly use the topology variable,
> replace env->dies with topo_info.dies_per_pkg as well.
> 
> Suggested-by: Robert Hoo <robert.hu@linux.intel.com>
> Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
> Tested-by: Babu Moger <babu.moger@amd.com>
> Tested-by: Yongwei Ma <yongwei.ma@intel.com>
> Acked-by: Michael S. Tsirkin <mst@redhat.com>
> ---
> Changes since v3:
>   * Fix typo. (Babu)
> 
> Changes since v1:
>   * Extract cores_per_socket from the code block and use it as a local
>     variable for cpu_x86_cpuid(). (Yanan)
>   * Remove vcpus_per_socket variable and use cpus_per_pkg directly.
>     (Yanan)
>   * Replace env->dies with topo_info.dies_per_pkg in cpu_x86_cpuid().
> ---
>   target/i386/cpu.c | 31 ++++++++++++++++++-------------
>   1 file changed, 18 insertions(+), 13 deletions(-)
> 
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index c8d2a585723a..6f8fa772ecf8 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -6017,11 +6017,16 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
>       uint32_t limit;
>       uint32_t signature[3];
>       X86CPUTopoInfo topo_info;
> +    uint32_t cores_per_pkg;
> +    uint32_t cpus_per_pkg;

I'd prefer lps_per_pkg or threads_per_pkg.

Other than it,

Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>

>   
>       topo_info.dies_per_pkg = env->nr_dies;
>       topo_info.cores_per_die = cs->nr_cores / env->nr_dies;
>       topo_info.threads_per_core = cs->nr_threads;
>   
> +    cores_per_pkg = topo_info.cores_per_die * topo_info.dies_per_pkg;
> +    cpus_per_pkg = cores_per_pkg * topo_info.threads_per_core;
> +
>       /* Calculate & apply limits for different index ranges */
>       if (index >= 0xC0000000) {
>           limit = env->cpuid_xlevel2;
> @@ -6057,8 +6062,8 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
>               *ecx |= CPUID_EXT_OSXSAVE;
>           }
>           *edx = env->features[FEAT_1_EDX];
> -        if (cs->nr_cores * cs->nr_threads > 1) {
> -            *ebx |= (cs->nr_cores * cs->nr_threads) << 16;
> +        if (cpus_per_pkg > 1) {
> +            *ebx |= cpus_per_pkg << 16;
>               *edx |= CPUID_HT;
>           }
>           if (!cpu->enable_pmu) {
> @@ -6095,8 +6100,8 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
>                */
>               if (*eax & 31) {
>                   int host_vcpus_per_cache = 1 + ((*eax & 0x3FFC000) >> 14);
> -                int vcpus_per_socket = cs->nr_cores * cs->nr_threads;
> -                if (cs->nr_cores > 1) {
> +
> +                if (cores_per_pkg > 1) {
>                       int addressable_cores_offset =
>                                                   apicid_pkg_offset(&topo_info) -
>                                                   apicid_core_offset(&topo_info);
> @@ -6104,7 +6109,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
>                       *eax &= ~0xFC000000;
>                       *eax |= (1 << (addressable_cores_offset - 1)) << 26;
>                   }
> -                if (host_vcpus_per_cache > vcpus_per_socket) {
> +                if (host_vcpus_per_cache > cpus_per_pkg) {
>                       int pkg_offset = apicid_pkg_offset(&topo_info);
>   
>                       *eax &= ~0x3FFC000;
> @@ -6249,12 +6254,12 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
>           switch (count) {
>           case 0:
>               *eax = apicid_core_offset(&topo_info);
> -            *ebx = cs->nr_threads;
> +            *ebx = topo_info.threads_per_core;
>               *ecx |= CPUID_TOPOLOGY_LEVEL_SMT;
>               break;
>           case 1:
>               *eax = apicid_pkg_offset(&topo_info);
> -            *ebx = cs->nr_cores * cs->nr_threads;
> +            *ebx = cpus_per_pkg;
>               *ecx |= CPUID_TOPOLOGY_LEVEL_CORE;
>               break;
>           default:
> @@ -6274,7 +6279,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
>           break;
>       case 0x1F:
>           /* V2 Extended Topology Enumeration Leaf */
> -        if (env->nr_dies < 2) {
> +        if (topo_info.dies_per_pkg < 2) {
>               *eax = *ebx = *ecx = *edx = 0;
>               break;
>           }
> @@ -6284,7 +6289,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
>           switch (count) {
>           case 0:
>               *eax = apicid_core_offset(&topo_info);
> -            *ebx = cs->nr_threads;
> +            *ebx = topo_info.threads_per_core;
>               *ecx |= CPUID_TOPOLOGY_LEVEL_SMT;
>               break;
>           case 1:
> @@ -6294,7 +6299,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
>               break;
>           case 2:
>               *eax = apicid_pkg_offset(&topo_info);
> -            *ebx = cs->nr_cores * cs->nr_threads;
> +            *ebx = cpus_per_pkg;
>               *ecx |= CPUID_TOPOLOGY_LEVEL_DIE;
>               break;
>           default:
> @@ -6518,7 +6523,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
>            * discards multiple thread information if it is set.
>            * So don't set it here for Intel to make Linux guests happy.
>            */
> -        if (cs->nr_cores * cs->nr_threads > 1) {
> +        if (cpus_per_pkg > 1) {
>               if (env->cpuid_vendor1 != CPUID_VENDOR_INTEL_1 ||
>                   env->cpuid_vendor2 != CPUID_VENDOR_INTEL_2 ||
>                   env->cpuid_vendor3 != CPUID_VENDOR_INTEL_3) {
> @@ -6584,7 +6589,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
>                *eax |= (cpu_x86_virtual_addr_width(env) << 8);
>           }
>           *ebx = env->features[FEAT_8000_0008_EBX];
> -        if (cs->nr_cores * cs->nr_threads > 1) {
> +        if (cpus_per_pkg > 1) {
>               /*
>                * Bits 15:12 is "The number of bits in the initial
>                * Core::X86::Apic::ApicId[ApicId] value that indicate
> @@ -6592,7 +6597,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
>                * Bits 7:0 is "The number of threads in the package is NC+1"
>                */
>               *ecx = (apicid_pkg_offset(&topo_info) << 12) |
> -                   ((cs->nr_cores * cs->nr_threads) - 1);
> +                   (cpus_per_pkg - 1);
>           } else {
>               *ecx = 0;
>           }




* Re: [PATCH v7 05/16] i386: Decouple CPUID[0x1F] subleaf with specific topology level
  2024-01-08  8:27 ` [PATCH v7 05/16] i386: Decouple CPUID[0x1F] subleaf with specific topology level Zhao Liu
@ 2024-01-11  3:19   ` Xiaoyao Li
  2024-01-11  9:07     ` Zhao Liu
  2024-01-23  9:56     ` Zhao Liu
  0 siblings, 2 replies; 68+ messages in thread
From: Xiaoyao Li @ 2024-01-11  3:19 UTC (permalink / raw)
  To: Zhao Liu, Eduardo Habkost, Marcel Apfelbaum, Michael S . Tsirkin,
	Richard Henderson, Paolo Bonzini, Marcelo Tosatti
  Cc: qemu-devel, kvm, Zhenyu Wang, Zhuocheng Ding, Zhao Liu,
	Babu Moger, Yongwei Ma

On 1/8/2024 4:27 PM, Zhao Liu wrote:
> From: Zhao Liu <zhao1.liu@intel.com>
> 
> At present, the subleaf 0x02 of CPUID[0x1F] is bound to the "die" level.
> 
> In fact, the specific topology level exposed in 0x1F depends on the
> platform's support for extension levels (module, tile and die).
> 
> To help expose "module" level in 0x1F, decouple CPUID[0x1F] subleaf
> with specific topology level.
> 
> Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
> Tested-by: Babu Moger <babu.moger@amd.com>
> Tested-by: Yongwei Ma <yongwei.ma@intel.com>
> Acked-by: Michael S. Tsirkin <mst@redhat.com>
> ---
> Changes since v3:
>   * New patch to prepare to expose module level in 0x1F.
>   * Move the CPUTopoLevel enumeration definition from "i386: Add cache
>     topology info in CPUCacheInfo" to this patch. Note, to align with
>     topology types in SDM, revert the name of CPU_TOPO_LEVEL_UNKNOW to
>     CPU_TOPO_LEVEL_INVALID.
> ---
>   target/i386/cpu.c | 136 +++++++++++++++++++++++++++++++++++++---------
>   target/i386/cpu.h |  15 +++++
>   2 files changed, 126 insertions(+), 25 deletions(-)
> 
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index bc440477d13d..5c295c9a9e2d 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -269,6 +269,116 @@ static void encode_cache_cpuid4(CPUCacheInfo *cache,
>              (cache->complex_indexing ? CACHE_COMPLEX_IDX : 0);
>   }
>   
> +static uint32_t num_cpus_by_topo_level(X86CPUTopoInfo *topo_info,
> +                                       enum CPUTopoLevel topo_level)
> +{
> +    switch (topo_level) {
> +    case CPU_TOPO_LEVEL_SMT:
> +        return 1;
> +    case CPU_TOPO_LEVEL_CORE:
> +        return topo_info->threads_per_core;
> +    case CPU_TOPO_LEVEL_DIE:
> +        return topo_info->threads_per_core * topo_info->cores_per_die;
> +    case CPU_TOPO_LEVEL_PACKAGE:
> +        return topo_info->threads_per_core * topo_info->cores_per_die *
> +               topo_info->dies_per_pkg;
> +    default:
> +        g_assert_not_reached();
> +    }
> +    return 0;
> +}
> +
> +static uint32_t apicid_offset_by_topo_level(X86CPUTopoInfo *topo_info,
> +                                            enum CPUTopoLevel topo_level)
> +{
> +    switch (topo_level) {
> +    case CPU_TOPO_LEVEL_SMT:
> +        return 0;
> +    case CPU_TOPO_LEVEL_CORE:
> +        return apicid_core_offset(topo_info);
> +    case CPU_TOPO_LEVEL_DIE:
> +        return apicid_die_offset(topo_info);
> +    case CPU_TOPO_LEVEL_PACKAGE:
> +        return apicid_pkg_offset(topo_info);
> +    default:
> +        g_assert_not_reached();
> +    }
> +    return 0;
> +}
> +
> +static uint32_t cpuid1f_topo_type(enum CPUTopoLevel topo_level)
> +{
> +    switch (topo_level) {
> +    case CPU_TOPO_LEVEL_INVALID:
> +        return CPUID_1F_ECX_TOPO_LEVEL_INVALID;
> +    case CPU_TOPO_LEVEL_SMT:
> +        return CPUID_1F_ECX_TOPO_LEVEL_SMT;
> +    case CPU_TOPO_LEVEL_CORE:
> +        return CPUID_1F_ECX_TOPO_LEVEL_CORE;
> +    case CPU_TOPO_LEVEL_DIE:
> +        return CPUID_1F_ECX_TOPO_LEVEL_DIE;
> +    default:
> +        /* Other types are not supported in QEMU. */
> +        g_assert_not_reached();
> +    }
> +    return 0;
> +}
> +
> +static void encode_topo_cpuid1f(CPUX86State *env, uint32_t count,
> +                                X86CPUTopoInfo *topo_info,
> +                                uint32_t *eax, uint32_t *ebx,
> +                                uint32_t *ecx, uint32_t *edx)
> +{
> +    static DECLARE_BITMAP(topo_bitmap, CPU_TOPO_LEVEL_MAX);
> +    X86CPU *cpu = env_archcpu(env);
> +    unsigned long level, next_level;
> +    uint32_t num_cpus_next_level, offset_next_level;

Again, I dislike using "cpus" to represent a logical processor or thread. 
We could call it num_lps_next_level or num_threads_next_level.

> +
> +    /*
> +     * Initialize the bitmap to decide which levels should be
> +     * encoded in 0x1f.
> +     */
> +    if (!count) {

Using a static bitmap and initializing it on (count == 0) looks bad to me. 
It relies heavily on the order in which encode_topo_cpuid1f() is called, 
and is fragile.

Instead, we can maintain an array in CPUX86State, e.g.,

--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1904,6 +1904,8 @@ typedef struct CPUArchState {

      /* Number of dies within this CPU package. */
      unsigned nr_dies;
+
+    uint8_t valid_cpu_topo[CPU_TOPO_LEVEL_MAX];
  } CPUX86State;


and initialize it as below when initializing the env:

env->valid_cpu_topo[0] = CPU_TOPO_LEVEL_SMT;
env->valid_cpu_topo[1] = CPU_TOPO_LEVEL_CORE;
if (env->nr_dies > 1) {
	env->valid_cpu_topo[2] = CPU_TOPO_LEVEL_DIE;
}

then in encode_topo_cpuid1f(), we can get level and next_level as

level = env->valid_cpu_topo[count];
next_level = env->valid_cpu_topo[count + 1];
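
A slightly fuller sketch of that lookup (mine; it assumes the array is
zero-initialized so unset entries read as CPU_TOPO_LEVEL_INVALID, and
details would differ in a real patch):

level = (count < CPU_TOPO_LEVEL_MAX) ? env->valid_cpu_topo[count]
                                     : CPU_TOPO_LEVEL_INVALID;
if (level == CPU_TOPO_LEVEL_INVALID) {
    num_cpus_next_level = 0;
    offset_next_level = 0;
} else {
    next_level = (count + 1 < CPU_TOPO_LEVEL_MAX) ?
                 env->valid_cpu_topo[count + 1] : CPU_TOPO_LEVEL_INVALID;
    if (next_level == CPU_TOPO_LEVEL_INVALID) {
        next_level = CPU_TOPO_LEVEL_PACKAGE;
    }
    num_cpus_next_level = num_cpus_by_topo_level(topo_info, next_level);
    offset_next_level = apicid_offset_by_topo_level(topo_info, next_level);
}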


> +        /* SMT and core levels are exposed in 0x1f leaf by default. */
> +        set_bit(CPU_TOPO_LEVEL_SMT, topo_bitmap);
> +        set_bit(CPU_TOPO_LEVEL_CORE, topo_bitmap);
> +
> +        if (env->nr_dies > 1) {
> +            set_bit(CPU_TOPO_LEVEL_DIE, topo_bitmap);
> +        }
> +    }
> +
> +    *ecx = count & 0xff;
> +    *edx = cpu->apic_id;
> +
> +    level = find_first_bit(topo_bitmap, CPU_TOPO_LEVEL_MAX);
> +    if (level == CPU_TOPO_LEVEL_MAX) {
> +        num_cpus_next_level = 0;
> +        offset_next_level = 0;
> +
> +        /* Encode CPU_TOPO_LEVEL_INVALID into the last subleaf of 0x1f. */
> +        level = CPU_TOPO_LEVEL_INVALID;
> +    } else {
> +        next_level = find_next_bit(topo_bitmap, CPU_TOPO_LEVEL_MAX, level + 1);
> +        if (next_level == CPU_TOPO_LEVEL_MAX) {
> +            next_level = CPU_TOPO_LEVEL_PACKAGE;
> +        }
> +
> +        num_cpus_next_level = num_cpus_by_topo_level(topo_info, next_level);
> +        offset_next_level = apicid_offset_by_topo_level(topo_info, next_level);
> +    }
> +
> +    *eax = offset_next_level;
> +    *ebx = num_cpus_next_level;
> +    *ecx |= cpuid1f_topo_type(level) << 8;
> +
> +    assert(!(*eax & ~0x1f));
> +    *ebx &= 0xffff; /* The count doesn't need to be reliable. */
> +    if (level != CPU_TOPO_LEVEL_MAX) {
> +        clear_bit(level, topo_bitmap);
> +    }
> +}
> +
>   /* Encode cache info for CPUID[0x80000005].ECX or CPUID[0x80000005].EDX */
>   static uint32_t encode_cache_cpuid80000005(CPUCacheInfo *cache)
>   {
> @@ -6284,31 +6394,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
>               break;
>           }
>   
> -        *ecx = count & 0xff;
> -        *edx = cpu->apic_id;
> -        switch (count) {
> -        case 0:
> -            *eax = apicid_core_offset(&topo_info);
> -            *ebx = topo_info.threads_per_core;
> -            *ecx |= CPUID_1F_ECX_TOPO_LEVEL_SMT << 8;
> -            break;
> -        case 1:
> -            *eax = apicid_die_offset(&topo_info);
> -            *ebx = topo_info.cores_per_die * topo_info.threads_per_core;
> -            *ecx |= CPUID_1F_ECX_TOPO_LEVEL_CORE << 8;
> -            break;
> -        case 2:
> -            *eax = apicid_pkg_offset(&topo_info);
> -            *ebx = cpus_per_pkg;
> -            *ecx |= CPUID_1F_ECX_TOPO_LEVEL_DIE << 8;
> -            break;
> -        default:
> -            *eax = 0;
> -            *ebx = 0;
> -            *ecx |= CPUID_1F_ECX_TOPO_LEVEL_INVALID << 8;
> -        }
> -        assert(!(*eax & ~0x1f));
> -        *ebx &= 0xffff; /* The count doesn't need to be reliable. */
> +        encode_topo_cpuid1f(env, count, &topo_info, eax, ebx, ecx, edx);
>           break;
>       case 0xD: {
>           /* Processor Extended State */
> diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> index f47bad46db5e..9c78cfc3f322 100644
> --- a/target/i386/cpu.h
> +++ b/target/i386/cpu.h
> @@ -1008,6 +1008,21 @@ uint64_t x86_cpu_get_supported_feature_word(FeatureWord w,
>   #define CPUID_MWAIT_IBE     (1U << 1) /* Interrupts can exit capability */
>   #define CPUID_MWAIT_EMX     (1U << 0) /* enumeration supported */
>   
> +/*
> + * CPUTopoLevel is the general i386 topology hierarchical representation,
> + * ordered by increasing hierarchical relationship.
> + * Its enumeration value is not bound to the type value of Intel (CPUID[0x1F])
> + * or AMD (CPUID[0x80000026]).
> + */
> +enum CPUTopoLevel {
> +    CPU_TOPO_LEVEL_INVALID,
> +    CPU_TOPO_LEVEL_SMT,
> +    CPU_TOPO_LEVEL_CORE,
> +    CPU_TOPO_LEVEL_DIE,
> +    CPU_TOPO_LEVEL_PACKAGE,
> +    CPU_TOPO_LEVEL_MAX,
> +};
> +
>   /* CPUID[0xB].ECX level types */
>   #define CPUID_B_ECX_TOPO_LEVEL_INVALID  0
>   #define CPUID_B_ECX_TOPO_LEVEL_SMT      1




* Re: [PATCH v7 07/16] i386: Support modules_per_die in X86CPUTopoInfo
  2024-01-08  8:27 ` [PATCH v7 07/16] i386: Support modules_per_die in X86CPUTopoInfo Zhao Liu
@ 2024-01-11  5:53   ` Xiaoyao Li
  2024-01-11  9:18     ` Zhao Liu
  0 siblings, 1 reply; 68+ messages in thread
From: Xiaoyao Li @ 2024-01-11  5:53 UTC (permalink / raw)
  To: Zhao Liu, Eduardo Habkost, Marcel Apfelbaum, Michael S . Tsirkin,
	Richard Henderson, Paolo Bonzini, Marcelo Tosatti
  Cc: qemu-devel, kvm, Zhenyu Wang, Zhuocheng Ding, Zhao Liu,
	Babu Moger, Yongwei Ma

On 1/8/2024 4:27 PM, Zhao Liu wrote:
> From: Zhuocheng Ding <zhuocheng.ding@intel.com>
> 
> Support module level in i386 cpu topology structure "X86CPUTopoInfo".
> 
> Since x86 does not yet support the "clusters" parameter in "-smp",
> X86CPUTopoInfo.modules_per_die is currently always 1. Therefore, the
> module level width in APIC ID, which can be calculated by
> "apicid_bitwidth_for_count(topo_info->modules_per_die)", is always 0
> for now, so we can directly add APIC ID related helpers to support
> module level parsing.
> 
> In addition, update topology structure in test-x86-topo.c.
> 
> Signed-off-by: Zhuocheng Ding <zhuocheng.ding@intel.com>
> Co-developed-by: Zhao Liu <zhao1.liu@intel.com>
> Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
> Tested-by: Babu Moger <babu.moger@amd.com>
> Tested-by: Yongwei Ma <yongwei.ma@intel.com>
> Acked-by: Michael S. Tsirkin <mst@redhat.com>
> ---
> Changes since v3:
>   * Drop the description about not exposing module level in commit
>     message.
>   * Update topology related calculation in newly added helpers:
>     num_cpus_by_topo_level() and apicid_offset_by_topo_level().
> 
> Changes since v1:
>   * Include module level related helpers (apicid_module_width() and
>     apicid_module_offset()) in this patch. (Yanan)
> ---
>   hw/i386/x86.c              |  3 ++-
>   include/hw/i386/topology.h | 22 +++++++++++++++----
>   target/i386/cpu.c          | 17 +++++++++-----
>   tests/unit/test-x86-topo.c | 45 ++++++++++++++++++++------------------
>   4 files changed, 55 insertions(+), 32 deletions(-)
> 
> diff --git a/hw/i386/x86.c b/hw/i386/x86.c
> index 1d19a8c609b1..85b847ac7914 100644
> --- a/hw/i386/x86.c
> +++ b/hw/i386/x86.c
> @@ -72,7 +72,8 @@ static void init_topo_info(X86CPUTopoInfo *topo_info,
>       MachineState *ms = MACHINE(x86ms);
>   
>       topo_info->dies_per_pkg = ms->smp.dies;
> -    topo_info->cores_per_die = ms->smp.cores;
> +    topo_info->modules_per_die = ms->smp.clusters;
> +    topo_info->cores_per_module = ms->smp.cores;
>       topo_info->threads_per_core = ms->smp.threads;
>   }
>   
> diff --git a/include/hw/i386/topology.h b/include/hw/i386/topology.h
> index d4eeb7ab8290..517e51768c13 100644
> --- a/include/hw/i386/topology.h
> +++ b/include/hw/i386/topology.h
> @@ -56,7 +56,8 @@ typedef struct X86CPUTopoIDs {
>   
>   typedef struct X86CPUTopoInfo {
>       unsigned dies_per_pkg;
> -    unsigned cores_per_die;
> +    unsigned modules_per_die;
> +    unsigned cores_per_module;
>       unsigned threads_per_core;
>   } X86CPUTopoInfo;
>   
> @@ -77,7 +78,13 @@ static inline unsigned apicid_smt_width(X86CPUTopoInfo *topo_info)
>   /* Bit width of the Core_ID field */
>   static inline unsigned apicid_core_width(X86CPUTopoInfo *topo_info)
>   {
> -    return apicid_bitwidth_for_count(topo_info->cores_per_die);
> +    return apicid_bitwidth_for_count(topo_info->cores_per_module);
> +}
> +
> +/* Bit width of the Module_ID (cluster ID) field */
> +static inline unsigned apicid_module_width(X86CPUTopoInfo *topo_info)
> +{
> +    return apicid_bitwidth_for_count(topo_info->modules_per_die);
>   }
>   
>   /* Bit width of the Die_ID field */
> @@ -92,10 +99,16 @@ static inline unsigned apicid_core_offset(X86CPUTopoInfo *topo_info)
>       return apicid_smt_width(topo_info);
>   }
>   
> +/* Bit offset of the Module_ID (cluster ID) field */
> +static inline unsigned apicid_module_offset(X86CPUTopoInfo *topo_info)
> +{
> +    return apicid_core_offset(topo_info) + apicid_core_width(topo_info);
> +}
> +
>   /* Bit offset of the Die_ID field */
>   static inline unsigned apicid_die_offset(X86CPUTopoInfo *topo_info)
>   {
> -    return apicid_core_offset(topo_info) + apicid_core_width(topo_info);
> +    return apicid_module_offset(topo_info) + apicid_module_width(topo_info);
>   }
>   
>   /* Bit offset of the Pkg_ID (socket ID) field */
> @@ -127,7 +140,8 @@ static inline void x86_topo_ids_from_idx(X86CPUTopoInfo *topo_info,
>                                            X86CPUTopoIDs *topo_ids)
>   {
>       unsigned nr_dies = topo_info->dies_per_pkg;
> -    unsigned nr_cores = topo_info->cores_per_die;
> +    unsigned nr_cores = topo_info->cores_per_module *
> +                        topo_info->modules_per_die;
>       unsigned nr_threads = topo_info->threads_per_core;
>   
>       topo_ids->pkg_id = cpu_index / (nr_dies * nr_cores * nr_threads);
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index 0a2ce9b92b1f..294ca6b8947a 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -278,10 +278,11 @@ static uint32_t num_cpus_by_topo_level(X86CPUTopoInfo *topo_info,
>       case CPU_TOPO_LEVEL_CORE:
>           return topo_info->threads_per_core;
>       case CPU_TOPO_LEVEL_DIE:
> -        return topo_info->threads_per_core * topo_info->cores_per_die;
> +        return topo_info->threads_per_core * topo_info->cores_per_module *
> +               topo_info->modules_per_die;
>       case CPU_TOPO_LEVEL_PACKAGE:
> -        return topo_info->threads_per_core * topo_info->cores_per_die *
> -               topo_info->dies_per_pkg;
> +        return topo_info->threads_per_core * topo_info->cores_per_module *
> +               topo_info->modules_per_die * topo_info->dies_per_pkg;
>       default:
>           g_assert_not_reached();
>       }
> @@ -450,7 +451,9 @@ static void encode_cache_cpuid8000001d(CPUCacheInfo *cache,
>   
>       /* L3 is shared among multiple cores */
>       if (cache->level == 3) {
> -        l3_threads = topo_info->cores_per_die * topo_info->threads_per_core;
> +        l3_threads = topo_info->modules_per_die *
> +                     topo_info->cores_per_module *
> +                     topo_info->threads_per_core;
>           *eax |= (l3_threads - 1) << 14;
>       } else {
>           *eax |= ((topo_info->threads_per_core - 1) << 14);
> @@ -6131,10 +6134,12 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
>       uint32_t cpus_per_pkg;
>   
>       topo_info.dies_per_pkg = env->nr_dies;
> -    topo_info.cores_per_die = cs->nr_cores / env->nr_dies;
> +    topo_info.modules_per_die = env->nr_modules;
> +    topo_info.cores_per_module = cs->nr_cores / env->nr_dies / env->nr_modules;
>       topo_info.threads_per_core = cs->nr_threads;
>   
> -    cores_per_pkg = topo_info.cores_per_die * topo_info.dies_per_pkg;
> +    cores_per_pkg = topo_info.cores_per_module * topo_info.modules_per_die *
> +                    topo_info.dies_per_pkg;

Nit: maybe we can introduce a helper function like

static inline uint32_t topo_info_cores_per_pkg(X86CPUTopoInfo *topo_info)
{
    return topo_info->cores_per_module * topo_info->modules_per_die *
           topo_info->dies_per_pkg;
}

so callers don't need to care how it is calculated.
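
A caller would then read simply (sketch):

cores_per_pkg = topo_info_cores_per_pkg(&topo_info);
cpus_per_pkg = cores_per_pkg * topo_info.threads_per_core;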

Besides,

Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>

>       cpus_per_pkg = cores_per_pkg * topo_info.threads_per_core;
>   
>       /* Calculate & apply limits for different index ranges */
> diff --git a/tests/unit/test-x86-topo.c b/tests/unit/test-x86-topo.c
> index 2b104f86d7c2..f21b8a5d95c2 100644
> --- a/tests/unit/test-x86-topo.c
> +++ b/tests/unit/test-x86-topo.c
> @@ -30,13 +30,16 @@ static void test_topo_bits(void)
>   {
>       X86CPUTopoInfo topo_info = {0};
>   
> -    /* simple tests for 1 thread per core, 1 core per die, 1 die per package */
> -    topo_info = (X86CPUTopoInfo) {1, 1, 1};
> +    /*
> +     * simple tests for 1 thread per core, 1 core per module,
> +     *                  1 module per die, 1 die per package
> +     */
> +    topo_info = (X86CPUTopoInfo) {1, 1, 1, 1};
>       g_assert_cmpuint(apicid_smt_width(&topo_info), ==, 0);
>       g_assert_cmpuint(apicid_core_width(&topo_info), ==, 0);
>       g_assert_cmpuint(apicid_die_width(&topo_info), ==, 0);
>   
> -    topo_info = (X86CPUTopoInfo) {1, 1, 1};
> +    topo_info = (X86CPUTopoInfo) {1, 1, 1, 1};
>       g_assert_cmpuint(x86_apicid_from_cpu_idx(&topo_info, 0), ==, 0);
>       g_assert_cmpuint(x86_apicid_from_cpu_idx(&topo_info, 1), ==, 1);
>       g_assert_cmpuint(x86_apicid_from_cpu_idx(&topo_info, 2), ==, 2);
> @@ -45,39 +48,39 @@ static void test_topo_bits(void)
>   
>       /* Test field width calculation for multiple values
>        */
> -    topo_info = (X86CPUTopoInfo) {1, 1, 2};
> +    topo_info = (X86CPUTopoInfo) {1, 1, 1, 2};
>       g_assert_cmpuint(apicid_smt_width(&topo_info), ==, 1);
> -    topo_info = (X86CPUTopoInfo) {1, 1, 3};
> +    topo_info = (X86CPUTopoInfo) {1, 1, 1, 3};
>       g_assert_cmpuint(apicid_smt_width(&topo_info), ==, 2);
> -    topo_info = (X86CPUTopoInfo) {1, 1, 4};
> +    topo_info = (X86CPUTopoInfo) {1, 1, 1, 4};
>       g_assert_cmpuint(apicid_smt_width(&topo_info), ==, 2);
>   
> -    topo_info = (X86CPUTopoInfo) {1, 1, 14};
> +    topo_info = (X86CPUTopoInfo) {1, 1, 1, 14};
>       g_assert_cmpuint(apicid_smt_width(&topo_info), ==, 4);
> -    topo_info = (X86CPUTopoInfo) {1, 1, 15};
> +    topo_info = (X86CPUTopoInfo) {1, 1, 1, 15};
>       g_assert_cmpuint(apicid_smt_width(&topo_info), ==, 4);
> -    topo_info = (X86CPUTopoInfo) {1, 1, 16};
> +    topo_info = (X86CPUTopoInfo) {1, 1, 1, 16};
>       g_assert_cmpuint(apicid_smt_width(&topo_info), ==, 4);
> -    topo_info = (X86CPUTopoInfo) {1, 1, 17};
> +    topo_info = (X86CPUTopoInfo) {1, 1, 1, 17};
>       g_assert_cmpuint(apicid_smt_width(&topo_info), ==, 5);
>   
>   
> -    topo_info = (X86CPUTopoInfo) {1, 30, 2};
> +    topo_info = (X86CPUTopoInfo) {1, 1, 30, 2};
>       g_assert_cmpuint(apicid_core_width(&topo_info), ==, 5);
> -    topo_info = (X86CPUTopoInfo) {1, 31, 2};
> +    topo_info = (X86CPUTopoInfo) {1, 1, 31, 2};
>       g_assert_cmpuint(apicid_core_width(&topo_info), ==, 5);
> -    topo_info = (X86CPUTopoInfo) {1, 32, 2};
> +    topo_info = (X86CPUTopoInfo) {1, 1, 32, 2};
>       g_assert_cmpuint(apicid_core_width(&topo_info), ==, 5);
> -    topo_info = (X86CPUTopoInfo) {1, 33, 2};
> +    topo_info = (X86CPUTopoInfo) {1, 1, 33, 2};
>       g_assert_cmpuint(apicid_core_width(&topo_info), ==, 6);
>   
> -    topo_info = (X86CPUTopoInfo) {1, 30, 2};
> +    topo_info = (X86CPUTopoInfo) {1, 1, 30, 2};
>       g_assert_cmpuint(apicid_die_width(&topo_info), ==, 0);
> -    topo_info = (X86CPUTopoInfo) {2, 30, 2};
> +    topo_info = (X86CPUTopoInfo) {2, 1, 30, 2};
>       g_assert_cmpuint(apicid_die_width(&topo_info), ==, 1);
> -    topo_info = (X86CPUTopoInfo) {3, 30, 2};
> +    topo_info = (X86CPUTopoInfo) {3, 1, 30, 2};
>       g_assert_cmpuint(apicid_die_width(&topo_info), ==, 2);
> -    topo_info = (X86CPUTopoInfo) {4, 30, 2};
> +    topo_info = (X86CPUTopoInfo) {4, 1, 30, 2};
>       g_assert_cmpuint(apicid_die_width(&topo_info), ==, 2);
>   
>       /* build a weird topology and see if IDs are calculated correctly
> @@ -85,18 +88,18 @@ static void test_topo_bits(void)
>   
>       /* This will use 2 bits for thread ID and 3 bits for core ID
>        */
> -    topo_info = (X86CPUTopoInfo) {1, 6, 3};
> +    topo_info = (X86CPUTopoInfo) {1, 1, 6, 3};
>       g_assert_cmpuint(apicid_smt_width(&topo_info), ==, 2);
>       g_assert_cmpuint(apicid_core_offset(&topo_info), ==, 2);
>       g_assert_cmpuint(apicid_die_offset(&topo_info), ==, 5);
>       g_assert_cmpuint(apicid_pkg_offset(&topo_info), ==, 5);
>   
> -    topo_info = (X86CPUTopoInfo) {1, 6, 3};
> +    topo_info = (X86CPUTopoInfo) {1, 1, 6, 3};
>       g_assert_cmpuint(x86_apicid_from_cpu_idx(&topo_info, 0), ==, 0);
>       g_assert_cmpuint(x86_apicid_from_cpu_idx(&topo_info, 1), ==, 1);
>       g_assert_cmpuint(x86_apicid_from_cpu_idx(&topo_info, 2), ==, 2);
>   
> -    topo_info = (X86CPUTopoInfo) {1, 6, 3};
> +    topo_info = (X86CPUTopoInfo) {1, 1, 6, 3};
>       g_assert_cmpuint(x86_apicid_from_cpu_idx(&topo_info, 1 * 3 + 0), ==,
>                        (1 << 2) | 0);
>       g_assert_cmpuint(x86_apicid_from_cpu_idx(&topo_info, 1 * 3 + 1), ==,




* Re: [PATCH v7 08/16] i386: Expose module level in CPUID[0x1F]
  2024-01-08  8:27 ` [PATCH v7 08/16] i386: Expose module level in CPUID[0x1F] Zhao Liu
@ 2024-01-11  6:04   ` Xiaoyao Li
  2024-01-11  9:21     ` Zhao Liu
  2024-01-15  3:25   ` Yuan Yao
  1 sibling, 1 reply; 68+ messages in thread
From: Xiaoyao Li @ 2024-01-11  6:04 UTC (permalink / raw)
  To: Zhao Liu, Eduardo Habkost, Marcel Apfelbaum, Michael S . Tsirkin,
	Richard Henderson, Paolo Bonzini, Marcelo Tosatti
  Cc: qemu-devel, kvm, Zhenyu Wang, Zhuocheng Ding, Zhao Liu,
	Babu Moger, Yongwei Ma

On 1/8/2024 4:27 PM, Zhao Liu wrote:
> From: Zhao Liu <zhao1.liu@intel.com>
> 
> Linux kernel (from v6.4, with commit edc0a2b595765 ("x86/topology: Fix
> erroneous smp_num_siblings on Intel Hybrid platforms") is able to
> handle platforms with Module level enumerated via CPUID.1F.
> 
> Expose the module level in CPUID[0x1F] if the machine has more than 1
> modules.
> 
> (Tested CPU topology in CPUID[0x1F] leaf with various die/cluster
> configurations in "-smp".)
> 
> Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
> Tested-by: Babu Moger <babu.moger@amd.com>
> Tested-by: Yongwei Ma <yongwei.ma@intel.com>
> Acked-by: Michael S. Tsirkin <mst@redhat.com>
> ---
> Changes since v3:
>   * New patch to expose module level in 0x1F.
>   * Add Tested-by tag from Yongwei.
> ---
>   target/i386/cpu.c     | 12 +++++++++++-
>   target/i386/cpu.h     |  2 ++
>   target/i386/kvm/kvm.c |  2 +-
>   3 files changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index 294ca6b8947a..a2d39d2198b6 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -277,6 +277,8 @@ static uint32_t num_cpus_by_topo_level(X86CPUTopoInfo *topo_info,
>           return 1;
>       case CPU_TOPO_LEVEL_CORE:
>           return topo_info->threads_per_core;
> +    case CPU_TOPO_LEVEL_MODULE:
> +        return topo_info->threads_per_core * topo_info->cores_per_module;
>       case CPU_TOPO_LEVEL_DIE:
>           return topo_info->threads_per_core * topo_info->cores_per_module *
>                  topo_info->modules_per_die;
> @@ -297,6 +299,8 @@ static uint32_t apicid_offset_by_topo_level(X86CPUTopoInfo *topo_info,
>           return 0;
>       case CPU_TOPO_LEVEL_CORE:
>           return apicid_core_offset(topo_info);
> +    case CPU_TOPO_LEVEL_MODULE:
> +        return apicid_module_offset(topo_info);
>       case CPU_TOPO_LEVEL_DIE:
>           return apicid_die_offset(topo_info);
>       case CPU_TOPO_LEVEL_PACKAGE:
> @@ -316,6 +320,8 @@ static uint32_t cpuid1f_topo_type(enum CPUTopoLevel topo_level)
>           return CPUID_1F_ECX_TOPO_LEVEL_SMT;
>       case CPU_TOPO_LEVEL_CORE:
>           return CPUID_1F_ECX_TOPO_LEVEL_CORE;
> +    case CPU_TOPO_LEVEL_MODULE:
> +        return CPUID_1F_ECX_TOPO_LEVEL_MODULE;
>       case CPU_TOPO_LEVEL_DIE:
>           return CPUID_1F_ECX_TOPO_LEVEL_DIE;
>       default:
> @@ -347,6 +353,10 @@ static void encode_topo_cpuid1f(CPUX86State *env, uint32_t count,
>           if (env->nr_dies > 1) {
>               set_bit(CPU_TOPO_LEVEL_DIE, topo_bitmap);
>           }
> +
> +        if (env->nr_modules > 1) {
> +            set_bit(CPU_TOPO_LEVEL_MODULE, topo_bitmap);
> +        }
>       }
>   
>       *ecx = count & 0xff;
> @@ -6394,7 +6404,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
>           break;
>       case 0x1F:
>           /* V2 Extended Topology Enumeration Leaf */
> -        if (topo_info.dies_per_pkg < 2) {
> +        if (topo_info.modules_per_die < 2 && topo_info.dies_per_pkg < 2) {

Maybe we can come up with the function below, if we have 
env->valid_cpu_topo[] as I suggested in patch 5.

bool cpu_x86_has_valid_cpuid1f(CPUX86State *env)
{
	return env->valid_cpu_topo[2] != CPU_TOPO_LEVEL_INVALID;
}

...

>               *eax = *ebx = *ecx = *edx = 0;
>               break;
>           }
> diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> index eecd30bde92b..97b290e10576 100644
> --- a/target/i386/cpu.h
> +++ b/target/i386/cpu.h
> @@ -1018,6 +1018,7 @@ enum CPUTopoLevel {
>       CPU_TOPO_LEVEL_INVALID,
>       CPU_TOPO_LEVEL_SMT,
>       CPU_TOPO_LEVEL_CORE,
> +    CPU_TOPO_LEVEL_MODULE,
>       CPU_TOPO_LEVEL_DIE,
>       CPU_TOPO_LEVEL_PACKAGE,
>       CPU_TOPO_LEVEL_MAX,
> @@ -1032,6 +1033,7 @@ enum CPUTopoLevel {
>   #define CPUID_1F_ECX_TOPO_LEVEL_INVALID  CPUID_B_ECX_TOPO_LEVEL_INVALID
>   #define CPUID_1F_ECX_TOPO_LEVEL_SMT      CPUID_B_ECX_TOPO_LEVEL_SMT
>   #define CPUID_1F_ECX_TOPO_LEVEL_CORE     CPUID_B_ECX_TOPO_LEVEL_CORE
> +#define CPUID_1F_ECX_TOPO_LEVEL_MODULE   3
>   #define CPUID_1F_ECX_TOPO_LEVEL_DIE      5
>   
>   /* MSR Feature Bits */
> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> index 4ce80555b45c..e5ddb214cb36 100644
> --- a/target/i386/kvm/kvm.c
> +++ b/target/i386/kvm/kvm.c
> @@ -1913,7 +1913,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
>               break;
>           }
>           case 0x1f:
> -            if (env->nr_dies < 2) {
> +            if (env->nr_modules < 2 && env->nr_dies < 2) {

then cpu_x86_has_valid_cpuid1f() can be used here.

>                   break;
>               }
>               /* fallthrough */
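
i.e., something like (my sketch):

        case 0x1f:
            if (!cpu_x86_has_valid_cpuid1f(env)) {
                break;
            }
            /* fallthrough */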




* Re: [PATCH v7 02/16] i386/cpu: Use APIC ID offset to encode cache topo in CPUID[4]
  2024-01-10  9:31   ` Xiaoyao Li
@ 2024-01-11  8:43     ` Zhao Liu
  2024-01-14 14:11       ` Xiaoyao Li
  2024-01-15  3:51       ` Xiaoyao Li
  0 siblings, 2 replies; 68+ messages in thread
From: Zhao Liu @ 2024-01-11  8:43 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Eduardo Habkost, Marcel Apfelbaum, Michael S . Tsirkin,
	Richard Henderson, Paolo Bonzini, Marcelo Tosatti, qemu-devel,
	kvm, Zhenyu Wang, Zhuocheng Ding, Zhao Liu, Robert Hoo,
	Babu Moger, Yongwei Ma

Hi Xiaoyao,

On Wed, Jan 10, 2024 at 05:31:28PM +0800, Xiaoyao Li wrote:
> Date: Wed, 10 Jan 2024 17:31:28 +0800
> From: Xiaoyao Li <xiaoyao.li@intel.com>
> Subject: Re: [PATCH v7 02/16] i386/cpu: Use APIC ID offset to encode cache
>  topo in CPUID[4]
> 
> On 1/8/2024 4:27 PM, Zhao Liu wrote:
> > From: Zhao Liu <zhao1.liu@intel.com>
> > 
> > Refer to the fixes of cache_info_passthrough ([1], [2]) and SDM, the
> > CPUID.04H:EAX[bits 25:14] and CPUID.04H:EAX[bits 31:26] should use the
> > nearest power-of-2 integer.
> > 
> > The nearest power-of-2 integer can be calculated by pow2ceil() or by
> > using APIC ID offset (like L3 topology using 1 << die_offset [3]).
> > 
> > But in fact, CPUID.04H:EAX[bits 25:14] and CPUID.04H:EAX[bits 31:26]
> > are associated with APIC ID. For example, in linux kernel, the field
> > "num_threads_sharing" (Bits 25 - 14) is parsed with APIC ID. And for
> > another example, on Alder Lake P, the CPUID.04H:EAX[bits 31:26] is not
> > matched with actual core numbers and it's calculated by:
> > "(1 << (pkg_offset - core_offset)) - 1".
> 
> Could you elaborate a bit more? What is the actual number of cores on
> Alder Lake P, and what are the pkg_offset and core_offset?

For example, the following is the CPUID dump of an ADL-S machine:

CPUID.04H:

0x00000004 0x00: eax=0xfc004121 ebx=0x01c0003f ecx=0x0000003f edx=0x00000000
0x00000004 0x01: eax=0xfc004122 ebx=0x01c0003f ecx=0x0000007f edx=0x00000000
0x00000004 0x02: eax=0xfc01c143 ebx=0x03c0003f ecx=0x000007ff edx=0x00000000
0x00000004 0x03: eax=0xfc1fc163 ebx=0x0240003f ecx=0x00009fff edx=0x00000004
0x00000004 0x04: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000


CPUID.1FH:

0x0000001f 0x00: eax=0x00000001 ebx=0x00000001 ecx=0x00000100 edx=0x0000004c
0x0000001f 0x01: eax=0x00000007 ebx=0x00000014 ecx=0x00000201 edx=0x0000004c
0x0000001f 0x02: eax=0x00000000 ebx=0x00000000 ecx=0x00000002 edx=0x0000004c

The CPUID.04H:EAX[bits 31:26] is 63.
From CPUID.1FH.00H:EAX[bits 04:00], the core_offset is 1, and from
CPUID.1FH.01H:EAX[bits 04:00], the pkg_offset is 7.

Thus we can verify the above equation:

(1 << (0x7 - 0x1)) - 1 = 63.

"Maximum number of addressable IDs" refers to the maximum number of IDs
that can be enumerated in the APIC ID's topology layout, which does not
necessarily correspond to the actual number of topology domains.
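
To make the arithmetic concrete, here is a small standalone sketch
decoding the dump above (the constants come from the dump; this is
illustration only, not QEMU code):

#include <stdio.h>

int main(void)
{
    unsigned int eax = 0xfc004121;  /* CPUID.04H.00H:EAX from the dump */
    unsigned int core_offset = 0x1; /* CPUID.1FH.00H:EAX[bits 04:00] */
    unsigned int pkg_offset = 0x7;  /* CPUID.1FH.01H:EAX[bits 04:00] */

    /* "Maximum number of addressable IDs for processor cores" */
    unsigned int max_core_ids = (eax >> 26) & 0x3f;

    printf("dumped: %u, computed: %u\n", max_core_ids,
           (1u << (pkg_offset - core_offset)) - 1); /* both print 63 */
    return 0;
}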

> 
> > Therefore the offset of APIC ID should be preferred to calculate nearest
> > power-of-2 integer for CPUID.04H:EAX[bits 25:14] and CPUID.04H:EAX[bits
> > 31:26]:
> > 1. d/i cache is shared in a core, 1 << core_offset should be used
> >     instand of "cs->nr_threads" in encode_cache_cpuid4() for
> 
> s/instand/instead/

Thanks!

> 
> >     CPUID.04H.00H:EAX[bits 25:14] and CPUID.04H.01H:EAX[bits 25:14].
> > 2. L2 cache is supposed to be shared in a core as for now, thereby
> >     1 << core_offset should also be used instand of "cs->nr_threads" in
> 
> ditto

Okay.

> 
> >     encode_cache_cpuid4() for CPUID.04H.02H:EAX[bits 25:14].
> > 3. Similarly, the value for CPUID.04H:EAX[bits 31:26] should also be
> >     calculated with the bit width between the Package and SMT levels in
> >     the APIC ID (1 << (pkg_offset - core_offset) - 1).
> > 
> > In addition, use APIC ID offset to replace "pow2ceil()" for
> > cache_info_passthrough case.
> > 
> > [1]: efb3934adf9e ("x86: cpu: make sure number of addressable IDs for processor cores meets the spec")
> > [2]: d7caf13b5fcf ("x86: cpu: fixup number of addressable IDs for logical processors sharing cache")
> > [3]: d65af288a84d ("i386: Update new x86_apicid parsing rules with die_offset support")
> > 
> > Fixes: 7e3482f82480 ("i386: Helpers to encode cache information consistently")
> > Suggested-by: Robert Hoo <robert.hu@linux.intel.com>
> > Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
> > Tested-by: Babu Moger <babu.moger@amd.com>
> > Tested-by: Yongwei Ma <yongwei.ma@intel.com>
> > Acked-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> > Changes since v3:
> >   * Fix compile warnings. (Babu)
> >   * Fix spelling typo.
> > 
> > Changes since v1:
> >   * Use APIC ID offset to replace "pow2ceil()" for cache_info_passthrough
> >     case. (Yanan)
> >   * Split the L1 cache fix into a separate patch.
> >   * Rename the title of this patch (the original is "i386/cpu: Fix number
> >     of addressable IDs in CPUID.04H").
> > ---
> >   target/i386/cpu.c | 30 +++++++++++++++++++++++-------
> >   1 file changed, 23 insertions(+), 7 deletions(-)
> > 
> > diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> > index 5a3678a789cf..c8d2a585723a 100644
> > --- a/target/i386/cpu.c
> > +++ b/target/i386/cpu.c
> > @@ -6014,7 +6014,6 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
> >   {
> >       X86CPU *cpu = env_archcpu(env);
> >       CPUState *cs = env_cpu(env);
> > -    uint32_t die_offset;
> >       uint32_t limit;
> >       uint32_t signature[3];
> >       X86CPUTopoInfo topo_info;
> > @@ -6098,39 +6097,56 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
> >                   int host_vcpus_per_cache = 1 + ((*eax & 0x3FFC000) >> 14);
> >                   int vcpus_per_socket = cs->nr_cores * cs->nr_threads;
> >                   if (cs->nr_cores > 1) {
> > +                    int addressable_cores_offset =
> > +                                                apicid_pkg_offset(&topo_info) -
> > +                                                apicid_core_offset(&topo_info);
> > +
> >                       *eax &= ~0xFC000000;
> > -                    *eax |= (pow2ceil(cs->nr_cores) - 1) << 26;
> > +                    *eax |= (1 << (addressable_cores_offset - 1)) << 26;
> 
> it should be ((1 << addressable_cores_offset) - 1) << 26

Good catch! Wrapping this calculation into a helper in a subsequent patch
masks the error here.

> 
> I think naming it addressable_cores_width is better than
> addressable_cores_offset. It's not an offset, because an offset means the
> bit position counted from bit 0.

I agree, "width" is better.

> 
> And we can get the width by another algorithm:
> 
> int addressable_cores_width = apicid_core_width(&topo_info) +
>                               apicid_die_width(&topo_info);
> *eax |= ((1 << addressable_cores_width) - 1) << 26;

This algorithm lacks flexibility because there will be more topology
levels between package and core, such as the cluster being introduced...

Using "addressable_cores_width" is clear enough.

> 		
> >                   }
> >                   if (host_vcpus_per_cache > vcpus_per_socket) {
> > +                    int pkg_offset = apicid_pkg_offset(&topo_info);
> > +
> >                       *eax &= ~0x3FFC000;
> > -                    *eax |= (pow2ceil(vcpus_per_socket) - 1) << 14;
> > +                    *eax |= (1 << (pkg_offset - 1)) << 14;
> 
> Ditto, ((1 << pkg_offset) - 1) << 14

Thanks!

> 
> For this one, I think pow2ceil(vcpus_per_socket) is better. Because it's
> intuitive that when host_vcpus_per_cache > vcpus_per_socket, we expose
> vcpus_per_cache (configured by users) to VM.

I tend to use a uniform calculation that is less confusing and easier to
maintain. Since this field encodes the "Maximum number of addressable IDs",
the OS can't get the exact number of CPUs/vCPUs sharing L3 from here; it
can only know that L3 is shared at the package level.
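
So the uniform variant would become something like this (sketch, with
the shift fixed as you noted above):

if (host_vcpus_per_cache > vcpus_per_socket) {
    int pkg_offset = apicid_pkg_offset(&topo_info);

    *eax &= ~0x3FFC000;
    /* Bits 25:14 hold "maximum addressable LP IDs" as 2^offset - 1. */
    *eax |= ((1 << pkg_offset) - 1) << 14;
}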

Thanks,
Zhao



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v7 03/16] i386/cpu: Consolidate the use of topo_info in cpu_x86_cpuid()
  2024-01-10 11:52   ` Xiaoyao Li
@ 2024-01-11  8:46     ` Zhao Liu
  0 siblings, 0 replies; 68+ messages in thread
From: Zhao Liu @ 2024-01-11  8:46 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Eduardo Habkost, Marcel Apfelbaum, Michael S . Tsirkin,
	Richard Henderson, Paolo Bonzini, Marcelo Tosatti, qemu-devel,
	kvm, Zhenyu Wang, Zhuocheng Ding, Zhao Liu, Robert Hoo,
	Babu Moger, Yongwei Ma

Hi Xiaoyao,

On Wed, Jan 10, 2024 at 07:52:38PM +0800, Xiaoyao Li wrote:
> Date: Wed, 10 Jan 2024 19:52:38 +0800
> From: Xiaoyao Li <xiaoyao.li@intel.com>
> Subject: Re: [PATCH v7 03/16] i386/cpu: Consolidate the use of topo_info in
>  cpu_x86_cpuid()
> 
> On 1/8/2024 4:27 PM, Zhao Liu wrote:
> > From: Zhao Liu <zhao1.liu@intel.com>
> > 
> > In cpu_x86_cpuid(), there are many variables in representing the cpu
> > topology, e.g., topo_info, cs->nr_cores/cs->nr_threads.
> 
> Please use comma instead of slash. cs->nr_cores/cs->nr_threads looks like
> one variable.

Okay.

> 
> > Since the names of cs->nr_cores/cs->nr_threads does not accurately
> > represent its meaning, the use of cs->nr_cores/cs->nr_threads is prone
> > to confusion and mistakes.
> > 
> > And the structure X86CPUTopoInfo names its members clearly, thus the
> > variable "topo_info" should be preferred.
> > 
> > In addition, in cpu_x86_cpuid(), to uniformly use the topology variable,
> > replace env->dies with topo_info.dies_per_pkg as well.
> > 
> > Suggested-by: Robert Hoo <robert.hu@linux.intel.com>
> > Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
> > Tested-by: Babu Moger <babu.moger@amd.com>
> > Tested-by: Yongwei Ma <yongwei.ma@intel.com>
> > Acked-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> > Changes since v3:
> >   * Fix typo. (Babu)
> > 
> > Changes since v1:
> >   * Extract cores_per_socket from the code block and use it as a local
> >     variable for cpu_x86_cpuid(). (Yanan)
> >   * Remove vcpus_per_socket variable and use cpus_per_pkg directly.
> >     (Yanan)
> >   * Replace env->dies with topo_info.dies_per_pkg in cpu_x86_cpuid().
> > ---
> >   target/i386/cpu.c | 31 ++++++++++++++++++-------------
> >   1 file changed, 18 insertions(+), 13 deletions(-)
> > 
> > diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> > index c8d2a585723a..6f8fa772ecf8 100644
> > --- a/target/i386/cpu.c
> > +++ b/target/i386/cpu.c
> > @@ -6017,11 +6017,16 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
> >       uint32_t limit;
> >       uint32_t signature[3];
> >       X86CPUTopoInfo topo_info;
> > +    uint32_t cores_per_pkg;
> > +    uint32_t cpus_per_pkg;
> 
> I prefer lps_per_pkg or threads_per_pkg.

Okay, "lp" is not common in QEMU code, so I will change this to
threads_per_pkg.

> 
> Other than it,
> 
> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>

Thanks!

-Zhao



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v7 05/16] i386: Decouple CPUID[0x1F] subleaf with specific topology level
  2024-01-11  3:19   ` Xiaoyao Li
@ 2024-01-11  9:07     ` Zhao Liu
  2024-01-23  9:56     ` Zhao Liu
  1 sibling, 0 replies; 68+ messages in thread
From: Zhao Liu @ 2024-01-11  9:07 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Eduardo Habkost, Marcel Apfelbaum, Michael S . Tsirkin,
	Richard Henderson, Paolo Bonzini, Marcelo Tosatti, qemu-devel,
	kvm, Zhenyu Wang, Zhuocheng Ding, Zhao Liu, Babu Moger,
	Yongwei Ma

Hi Xiaoyao,

On Thu, Jan 11, 2024 at 11:19:34AM +0800, Xiaoyao Li wrote:
> Date: Thu, 11 Jan 2024 11:19:34 +0800
> From: Xiaoyao Li <xiaoyao.li@intel.com>
> Subject: Re: [PATCH v7 05/16] i386: Decouple CPUID[0x1F] subleaf with
>  specific topology level
> 
> On 1/8/2024 4:27 PM, Zhao Liu wrote:
> > From: Zhao Liu <zhao1.liu@intel.com>
> > 
> > At present, the subleaf 0x02 of CPUID[0x1F] is bound to the "die" level.
> > 
> > In fact, the specific topology level exposed in 0x1F depends on the
> > platform's support for extension levels (module, tile and die).
> > 
> > To help expose "module" level in 0x1F, decouple CPUID[0x1F] subleaf
> > with specific topology level.
> > 
> > Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
> > Tested-by: Babu Moger <babu.moger@amd.com>
> > Tested-by: Yongwei Ma <yongwei.ma@intel.com>
> > Acked-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> > Changes since v3:
> >   * New patch to prepare to expose module level in 0x1F.
> >   * Move the CPUTopoLevel enumeration definition from "i386: Add cache
> >     topology info in CPUCacheInfo" to this patch. Note, to align with
> >     topology types in SDM, revert the name of CPU_TOPO_LEVEL_UNKNOW to
> >     CPU_TOPO_LEVEL_INVALID.
> > ---
> >   target/i386/cpu.c | 136 +++++++++++++++++++++++++++++++++++++---------
> >   target/i386/cpu.h |  15 +++++
> >   2 files changed, 126 insertions(+), 25 deletions(-)
> > 
> > diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> > index bc440477d13d..5c295c9a9e2d 100644
> > --- a/target/i386/cpu.c
> > +++ b/target/i386/cpu.c
> > @@ -269,6 +269,116 @@ static void encode_cache_cpuid4(CPUCacheInfo *cache,
> >              (cache->complex_indexing ? CACHE_COMPLEX_IDX : 0);
> >   }
> > +static uint32_t num_cpus_by_topo_level(X86CPUTopoInfo *topo_info,
> > +                                       enum CPUTopoLevel topo_level)
> > +{
> > +    switch (topo_level) {
> > +    case CPU_TOPO_LEVEL_SMT:
> > +        return 1;
> > +    case CPU_TOPO_LEVEL_CORE:
> > +        return topo_info->threads_per_core;
> > +    case CPU_TOPO_LEVEL_DIE:
> > +        return topo_info->threads_per_core * topo_info->cores_per_die;
> > +    case CPU_TOPO_LEVEL_PACKAGE:
> > +        return topo_info->threads_per_core * topo_info->cores_per_die *
> > +               topo_info->dies_per_pkg;
> > +    default:
> > +        g_assert_not_reached();
> > +    }
> > +    return 0;
> > +}
> > +
> > +static uint32_t apicid_offset_by_topo_level(X86CPUTopoInfo *topo_info,
> > +                                            enum CPUTopoLevel topo_level)
> > +{
> > +    switch (topo_level) {
> > +    case CPU_TOPO_LEVEL_SMT:
> > +        return 0;
> > +    case CPU_TOPO_LEVEL_CORE:
> > +        return apicid_core_offset(topo_info);
> > +    case CPU_TOPO_LEVEL_DIE:
> > +        return apicid_die_offset(topo_info);
> > +    case CPU_TOPO_LEVEL_PACKAGE:
> > +        return apicid_pkg_offset(topo_info);
> > +    default:
> > +        g_assert_not_reached();
> > +    }
> > +    return 0;
> > +}
> > +
> > +static uint32_t cpuid1f_topo_type(enum CPUTopoLevel topo_level)
> > +{
> > +    switch (topo_level) {
> > +    case CPU_TOPO_LEVEL_INVALID:
> > +        return CPUID_1F_ECX_TOPO_LEVEL_INVALID;
> > +    case CPU_TOPO_LEVEL_SMT:
> > +        return CPUID_1F_ECX_TOPO_LEVEL_SMT;
> > +    case CPU_TOPO_LEVEL_CORE:
> > +        return CPUID_1F_ECX_TOPO_LEVEL_CORE;
> > +    case CPU_TOPO_LEVEL_DIE:
> > +        return CPUID_1F_ECX_TOPO_LEVEL_DIE;
> > +    default:
> > +        /* Other types are not supported in QEMU. */
> > +        g_assert_not_reached();
> > +    }
> > +    return 0;
> > +}
> > +
> > +static void encode_topo_cpuid1f(CPUX86State *env, uint32_t count,
> > +                                X86CPUTopoInfo *topo_info,
> > +                                uint32_t *eax, uint32_t *ebx,
> > +                                uint32_t *ecx, uint32_t *edx)
> > +{
> > +    static DECLARE_BITMAP(topo_bitmap, CPU_TOPO_LEVEL_MAX);
> > +    X86CPU *cpu = env_archcpu(env);
> > +    unsigned long level, next_level;
> > +    uint32_t num_cpus_next_level, offset_next_level;
> 
> again, I dislike using "cpus" to represent a logical processor or
> thread. We can call it num_lps_next_level or num_threads_next_level;

Okay, will use num_threads_next_level ;-)

> 
> > +
> > +    /*
> > +     * Initialize the bitmap to decide which levels should be
> > +     * encoded in 0x1f.
> > +     */
> > +    if (!count) {
> 
> Using a static bitmap and initializing it on (count == 0) looks bad to
> me. It relies heavily on the order in which encode_topo_cpuid1f() is
> called, and is fragile.
> 
> Instead, we can maintain an array in CPUX86State, e.g.,
> 
> --- a/target/i386/cpu.h
> +++ b/target/i386/cpu.h
> @@ -1904,6 +1904,8 @@ typedef struct CPUArchState {
> 
>      /* Number of dies within this CPU package. */
>      unsigned nr_dies;
> +
> +    uint8_t valid_cpu_topo[CPU_TOPO_LEVEL_MAX];
>  } CPUX86State;
> 
> 
> and initialize it as below, when initializing the env
> 
> env->valid_cpu_topo[0] = CPU_TOPO_LEVEL_SMT;
> env->valid_cpu_topo[1] = CPU_TOPO_LEVEL_CORE;
> if (env->nr_dies > 1) {
> 	env->valid_cpu_topo[2] = CPU_TOPO_LEVEL_DIE;
> }
> 
> then in encode_topo_cpuid1f(), we can get level and next_level as
> 
> level = env->valid_cpu_topo[count];
> next_level = env->valid_cpu_topo[count + 1];
> 

Good idea, let me try this way.
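
Roughly like this, I suppose (untested sketch; the module level from the
later patches can slot in the same way):

int i = 0;

env->valid_cpu_topo[i++] = CPU_TOPO_LEVEL_SMT;
env->valid_cpu_topo[i++] = CPU_TOPO_LEVEL_CORE;
if (env->nr_dies > 1) {
    env->valid_cpu_topo[i++] = CPU_TOPO_LEVEL_DIE;
}
/* Terminate with the package level so the count + 1 lookup stays valid. */
env->valid_cpu_topo[i] = CPU_TOPO_LEVEL_PACKAGE;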

Thanks,
Zhao



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v7 07/16] i386: Support modules_per_die in X86CPUTopoInfo
  2024-01-11  5:53   ` Xiaoyao Li
@ 2024-01-11  9:18     ` Zhao Liu
  0 siblings, 0 replies; 68+ messages in thread
From: Zhao Liu @ 2024-01-11  9:18 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Eduardo Habkost, Marcel Apfelbaum, Michael S . Tsirkin,
	Richard Henderson, Paolo Bonzini, Marcelo Tosatti, qemu-devel,
	kvm, Zhenyu Wang, Zhuocheng Ding, Zhao Liu, Babu Moger,
	Yongwei Ma

Hi Xiaoyao,

On Thu, Jan 11, 2024 at 01:53:53PM +0800, Xiaoyao Li wrote:

> > -    cores_per_pkg = topo_info.cores_per_die * topo_info.dies_per_pkg;
> > +    cores_per_pkg = topo_info.cores_per_module * topo_info.modules_per_die *
> > +                    topo_info.dies_per_pkg;
> 
> Nit. maybe we can introduce some helper function like
> 
> static inline uint32_t topo_info_cores_per_pkg(X86CPUTopoInfo *topo_info) {
> 	return topo_info->cores_per_module * topo_info->modules_per_die *
>                topo_info->dies_per_pkg;
> }
> 
> so we don't need to care how it is calculated.

Yeah, will add this helper, maybe in another patch.
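
The threads counterpart could then build on it, something like (sketch
only, reusing the helper name from your example):

static inline uint32_t topo_info_threads_per_pkg(X86CPUTopoInfo *topo_info)
{
    return topo_info->threads_per_core *
           topo_info_cores_per_pkg(topo_info);
}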

> 
> Besides,
> 
> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>

Thanks!

-Zhao


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v7 08/16] i386: Expose module level in CPUID[0x1F]
  2024-01-11  6:04   ` Xiaoyao Li
@ 2024-01-11  9:21     ` Zhao Liu
  0 siblings, 0 replies; 68+ messages in thread
From: Zhao Liu @ 2024-01-11  9:21 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Eduardo Habkost, Marcel Apfelbaum, Michael S . Tsirkin,
	Richard Henderson, Paolo Bonzini, Marcelo Tosatti, qemu-devel,
	kvm, Zhenyu Wang, Zhuocheng Ding, Zhao Liu, Babu Moger,
	Yongwei Ma

Hi Xiaoyao,

On Thu, Jan 11, 2024 at 02:04:53PM +0800, Xiaoyao Li wrote:
> Date: Thu, 11 Jan 2024 14:04:53 +0800
> From: Xiaoyao Li <xiaoyao.li@intel.com>
> Subject: Re: [PATCH v7 08/16] i386: Expose module level in CPUID[0x1F]
> 
> On 1/8/2024 4:27 PM, Zhao Liu wrote:
> > From: Zhao Liu <zhao1.liu@intel.com>
> > 
> > The Linux kernel (from v6.4, with commit edc0a2b595765 ("x86/topology: Fix
> > erroneous smp_num_siblings on Intel Hybrid platforms")) is able to
> > handle platforms with the Module level enumerated via CPUID.1F.
> > 
> > Expose the module level in CPUID[0x1F] if the machine has more than
> > one module.
> > 
> > (Tested CPU topology in CPUID[0x1F] leaf with various die/cluster
> > configurations in "-smp".)
> > 
> > Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
> > Tested-by: Babu Moger <babu.moger@amd.com>
> > Tested-by: Yongwei Ma <yongwei.ma@intel.com>
> > Acked-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> > Changes since v3:
> >   * New patch to expose module level in 0x1F.
> >   * Add Tested-by tag from Yongwei.
> > ---
> >   target/i386/cpu.c     | 12 +++++++++++-
> >   target/i386/cpu.h     |  2 ++
> >   target/i386/kvm/kvm.c |  2 +-
> >   3 files changed, 14 insertions(+), 2 deletions(-)
> > 
> > diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> > index 294ca6b8947a..a2d39d2198b6 100644
> > --- a/target/i386/cpu.c
> > +++ b/target/i386/cpu.c
> > @@ -277,6 +277,8 @@ static uint32_t num_cpus_by_topo_level(X86CPUTopoInfo *topo_info,
> >           return 1;
> >       case CPU_TOPO_LEVEL_CORE:
> >           return topo_info->threads_per_core;
> > +    case CPU_TOPO_LEVEL_MODULE:
> > +        return topo_info->threads_per_core * topo_info->cores_per_module;
> >       case CPU_TOPO_LEVEL_DIE:
> >           return topo_info->threads_per_core * topo_info->cores_per_module *
> >                  topo_info->modules_per_die;
> > @@ -297,6 +299,8 @@ static uint32_t apicid_offset_by_topo_level(X86CPUTopoInfo *topo_info,
> >           return 0;
> >       case CPU_TOPO_LEVEL_CORE:
> >           return apicid_core_offset(topo_info);
> > +    case CPU_TOPO_LEVEL_MODULE:
> > +        return apicid_module_offset(topo_info);
> >       case CPU_TOPO_LEVEL_DIE:
> >           return apicid_die_offset(topo_info);
> >       case CPU_TOPO_LEVEL_PACKAGE:
> > @@ -316,6 +320,8 @@ static uint32_t cpuid1f_topo_type(enum CPUTopoLevel topo_level)
> >           return CPUID_1F_ECX_TOPO_LEVEL_SMT;
> >       case CPU_TOPO_LEVEL_CORE:
> >           return CPUID_1F_ECX_TOPO_LEVEL_CORE;
> > +    case CPU_TOPO_LEVEL_MODULE:
> > +        return CPUID_1F_ECX_TOPO_LEVEL_MODULE;
> >       case CPU_TOPO_LEVEL_DIE:
> >           return CPUID_1F_ECX_TOPO_LEVEL_DIE;
> >       default:
> > @@ -347,6 +353,10 @@ static void encode_topo_cpuid1f(CPUX86State *env, uint32_t count,
> >           if (env->nr_dies > 1) {
> >               set_bit(CPU_TOPO_LEVEL_DIE, topo_bitmap);
> >           }
> > +
> > +        if (env->nr_modules > 1) {
> > +            set_bit(CPU_TOPO_LEVEL_MODULE, topo_bitmap);
> > +        }
> >       }
> >       *ecx = count & 0xff;
> > @@ -6394,7 +6404,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
> >           break;
> >       case 0x1F:
> >           /* V2 Extended Topology Enumeration Leaf */
> > -        if (topo_info.dies_per_pkg < 2) {
> > +        if (topo_info.modules_per_die < 2 && topo_info.dies_per_pkg < 2) {
> 
> maybe we can come up with below function if we have env->valid_cpu_topo[] as
> I suggested in patch 5.
> 
> bool cpu_x86_has_valid_cpuid1f(CPUX86State *env) {
> 	return env->valid_cpu_topo[2] ? true : false;
> }
> 
> ...

This makes sense.
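
Then the 0x1F leaf could start with something like this (sketch; the
same helper would also cover the kvm.c check below):

case 0x1F:
    /* V2 Extended Topology Enumeration Leaf */
    if (!cpu_x86_has_valid_cpuid1f(env)) {
        *eax = *ebx = *ecx = *edx = 0;
        break;
    }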

> 
> >               *eax = *ebx = *ecx = *edx = 0;
> >               break;
> >           }
> > diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> > index eecd30bde92b..97b290e10576 100644
> > --- a/target/i386/cpu.h
> > +++ b/target/i386/cpu.h
> > @@ -1018,6 +1018,7 @@ enum CPUTopoLevel {
> >       CPU_TOPO_LEVEL_INVALID,
> >       CPU_TOPO_LEVEL_SMT,
> >       CPU_TOPO_LEVEL_CORE,
> > +    CPU_TOPO_LEVEL_MODULE,
> >       CPU_TOPO_LEVEL_DIE,
> >       CPU_TOPO_LEVEL_PACKAGE,
> >       CPU_TOPO_LEVEL_MAX,
> > @@ -1032,6 +1033,7 @@ enum CPUTopoLevel {
> >   #define CPUID_1F_ECX_TOPO_LEVEL_INVALID  CPUID_B_ECX_TOPO_LEVEL_INVALID
> >   #define CPUID_1F_ECX_TOPO_LEVEL_SMT      CPUID_B_ECX_TOPO_LEVEL_SMT
> >   #define CPUID_1F_ECX_TOPO_LEVEL_CORE     CPUID_B_ECX_TOPO_LEVEL_CORE
> > +#define CPUID_1F_ECX_TOPO_LEVEL_MODULE   3
> >   #define CPUID_1F_ECX_TOPO_LEVEL_DIE      5
> >   /* MSR Feature Bits */
> > diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> > index 4ce80555b45c..e5ddb214cb36 100644
> > --- a/target/i386/kvm/kvm.c
> > +++ b/target/i386/kvm/kvm.c
> > @@ -1913,7 +1913,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
> >               break;
> >           }
> >           case 0x1f:
> > -            if (env->nr_dies < 2) {
> > +            if (env->nr_modules < 2 && env->nr_dies < 2) {
> 
> then cpu_x86_has_valid_cpuid1f() can be used here.
>

Good idea, I will also try this.

Thanks,
Zhao


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v7 09/16] i386: Support module_id in X86CPUTopoIDs
  2024-01-08  8:27 ` [PATCH v7 09/16] i386: Support module_id in X86CPUTopoIDs Zhao Liu
@ 2024-01-14 12:42   ` Xiaoyao Li
  2024-01-15  3:52     ` Zhao Liu
  0 siblings, 1 reply; 68+ messages in thread
From: Xiaoyao Li @ 2024-01-14 12:42 UTC (permalink / raw)
  To: Zhao Liu, Eduardo Habkost, Marcel Apfelbaum, Michael S . Tsirkin,
	Richard Henderson, Paolo Bonzini, Marcelo Tosatti
  Cc: qemu-devel, kvm, Zhenyu Wang, Zhuocheng Ding, Zhao Liu,
	Babu Moger, Yongwei Ma

On 1/8/2024 4:27 PM, Zhao Liu wrote:
> From: Zhuocheng Ding <zhuocheng.ding@intel.com>
> 
> Add module_id member in X86CPUTopoIDs.
> 
> module_id can be parsed from APIC ID, so also update APIC ID parsing
> rule to support module level. With this support, the conversions with
> module level between X86CPUTopoIDs, X86CPUTopoInfo and APIC ID are
> completed.
> 
> module_id can also be generated from the CPU topology, and before i386
> supports "clusters" in smp, the default "clusters per die" is only 1,
> thus the module_id generated in this way is 0 and will not conflict
> with the module_id derived from the APIC ID.
> 
> Signed-off-by: Zhuocheng Ding <zhuocheng.ding@intel.com>
> Co-developed-by: Zhao Liu <zhao1.liu@intel.com>
> Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
> Tested-by: Babu Moger <babu.moger@amd.com>
> Tested-by: Yongwei Ma <yongwei.ma@intel.com>
> Acked-by: Michael S. Tsirkin <mst@redhat.com>
> ---
> Changes since v1:
>   * Merge the patch "i386: Update APIC ID parsing rule to support module
>     level" into this one. (Yanan)
>   * Move the apicid_module_width() and apicid_module_offset() support
>     into the previous modules_per_die related patch. (Yanan)
> ---
>   hw/i386/x86.c              | 28 +++++++++++++++++++++-------
>   include/hw/i386/topology.h | 17 +++++++++++++----
>   2 files changed, 34 insertions(+), 11 deletions(-)
> 
> diff --git a/hw/i386/x86.c b/hw/i386/x86.c
> index 85b847ac7914..5269aae3a5c2 100644
> --- a/hw/i386/x86.c
> +++ b/hw/i386/x86.c
> @@ -315,11 +315,11 @@ void x86_cpu_pre_plug(HotplugHandler *hotplug_dev,
>   
>       /*
>        * If APIC ID is not set,
> -     * set it based on socket/die/core/thread properties.
> +     * set it based on socket/die/cluster/core/thread properties.
>        */
>       if (cpu->apic_id == UNASSIGNED_APIC_ID) {
> -        int max_socket = (ms->smp.max_cpus - 1) /
> -                                smp_threads / smp_cores / ms->smp.dies;
> +        int max_socket = (ms->smp.max_cpus - 1) / smp_threads / smp_cores /
> +                                ms->smp.clusters / ms->smp.dies;
>   
>           /*
>            * die-id was optional in QEMU 4.0 and older, so keep it optional
> @@ -366,17 +366,27 @@ void x86_cpu_pre_plug(HotplugHandler *hotplug_dev,
>           topo_ids.die_id = cpu->die_id;
>           topo_ids.core_id = cpu->core_id;
>           topo_ids.smt_id = cpu->thread_id;
> +
> +        /*
> +         * TODO: This is the temporary initialization for topo_ids.module_id to
> +         * avoid "maybe-uninitialized" compilation errors. Will remove when
> +         * X86CPU supports cluster_id.
> +         */
> +        topo_ids.module_id = 0;
> 

if you put patch 10 before this patch, then we don't need this trick.

>           cpu->apic_id = x86_apicid_from_topo_ids(&topo_info, &topo_ids);
>       }
>   
>       cpu_slot = x86_find_cpu_slot(MACHINE(x86ms), cpu->apic_id, &idx);
>       if (!cpu_slot) {
>           x86_topo_ids_from_apicid(cpu->apic_id, &topo_info, &topo_ids);
> +
>           error_setg(errp,
> -            "Invalid CPU [socket: %u, die: %u, core: %u, thread: %u] with"
> -            " APIC ID %" PRIu32 ", valid index range 0:%d",
> -            topo_ids.pkg_id, topo_ids.die_id, topo_ids.core_id, topo_ids.smt_id,
> -            cpu->apic_id, ms->possible_cpus->len - 1);
> +            "Invalid CPU [socket: %u, die: %u, module: %u, core: %u, thread: %u]"
> +            " with APIC ID %" PRIu32 ", valid index range 0:%d",
> +            topo_ids.pkg_id, topo_ids.die_id, topo_ids.module_id,
> +            topo_ids.core_id, topo_ids.smt_id, cpu->apic_id,
> +            ms->possible_cpus->len - 1);
>           return;
>       }
>   
> @@ -502,6 +512,10 @@ const CPUArchIdList *x86_possible_cpu_arch_ids(MachineState *ms)
>               ms->possible_cpus->cpus[i].props.has_die_id = true;
>               ms->possible_cpus->cpus[i].props.die_id = topo_ids.die_id;
>           }
> +        if (ms->smp.clusters > 1) {
> +            ms->possible_cpus->cpus[i].props.has_cluster_id = true;
> +            ms->possible_cpus->cpus[i].props.cluster_id = topo_ids.module_id;
> +        }
>           ms->possible_cpus->cpus[i].props.has_core_id = true;
>           ms->possible_cpus->cpus[i].props.core_id = topo_ids.core_id;
>           ms->possible_cpus->cpus[i].props.has_thread_id = true;
> diff --git a/include/hw/i386/topology.h b/include/hw/i386/topology.h
> index 517e51768c13..ed1f3d6c1d5e 100644
> --- a/include/hw/i386/topology.h
> +++ b/include/hw/i386/topology.h
> @@ -50,6 +50,7 @@ typedef uint32_t apic_id_t;
>   typedef struct X86CPUTopoIDs {
>       unsigned pkg_id;
>       unsigned die_id;
> +    unsigned module_id;
>       unsigned core_id;
>       unsigned smt_id;
>   } X86CPUTopoIDs;
> @@ -127,6 +128,7 @@ static inline apic_id_t x86_apicid_from_topo_ids(X86CPUTopoInfo *topo_info,
>   {
>       return (topo_ids->pkg_id  << apicid_pkg_offset(topo_info)) |
>              (topo_ids->die_id  << apicid_die_offset(topo_info)) |
> +           (topo_ids->module_id << apicid_module_offset(topo_info)) |
>              (topo_ids->core_id << apicid_core_offset(topo_info)) |
>              topo_ids->smt_id;
>   }
> @@ -140,12 +142,16 @@ static inline void x86_topo_ids_from_idx(X86CPUTopoInfo *topo_info,
>                                            X86CPUTopoIDs *topo_ids)
>   {
>       unsigned nr_dies = topo_info->dies_per_pkg;
> -    unsigned nr_cores = topo_info->cores_per_module *
> -                        topo_info->modules_per_die;
> +    unsigned nr_modules = topo_info->modules_per_die;
> +    unsigned nr_cores = topo_info->cores_per_module;
>       unsigned nr_threads = topo_info->threads_per_core;
>   
> -    topo_ids->pkg_id = cpu_index / (nr_dies * nr_cores * nr_threads);
> -    topo_ids->die_id = cpu_index / (nr_cores * nr_threads) % nr_dies;
> +    topo_ids->pkg_id = cpu_index / (nr_dies * nr_modules *
> +                       nr_cores * nr_threads);
> +    topo_ids->die_id = cpu_index / (nr_modules * nr_cores *
> +                       nr_threads) % nr_dies;
> +    topo_ids->module_id = cpu_index / (nr_cores * nr_threads) %
> +                          nr_modules;
>       topo_ids->core_id = cpu_index / nr_threads % nr_cores;
>       topo_ids->smt_id = cpu_index % nr_threads;
>   }
> @@ -163,6 +169,9 @@ static inline void x86_topo_ids_from_apicid(apic_id_t apicid,
>       topo_ids->core_id =
>               (apicid >> apicid_core_offset(topo_info)) &
>               ~(0xFFFFFFFFUL << apicid_core_width(topo_info));
> +    topo_ids->module_id =
> +            (apicid >> apicid_module_offset(topo_info)) &
> +            ~(0xFFFFFFFFUL << apicid_module_width(topo_info));
>       topo_ids->die_id =
>               (apicid >> apicid_die_offset(topo_info)) &
>               ~(0xFFFFFFFFUL << apicid_die_width(topo_info));



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v7 10/16] i386/cpu: Introduce cluster-id to X86CPU
  2024-01-08  8:27 ` [PATCH v7 10/16] i386/cpu: Introduce cluster-id to X86CPU Zhao Liu
@ 2024-01-14 13:49   ` Xiaoyao Li
  2024-01-15  3:27     ` Zhao Liu
  0 siblings, 1 reply; 68+ messages in thread
From: Xiaoyao Li @ 2024-01-14 13:49 UTC (permalink / raw)
  To: Zhao Liu, Eduardo Habkost, Marcel Apfelbaum, Michael S . Tsirkin,
	Richard Henderson, Paolo Bonzini, Marcelo Tosatti
  Cc: qemu-devel, kvm, Zhenyu Wang, Zhuocheng Ding, Zhao Liu,
	Babu Moger, Yongwei Ma

On 1/8/2024 4:27 PM, Zhao Liu wrote:
> From: Zhuocheng Ding <zhuocheng.ding@intel.com>
> 
> Introduce cluster-id other than module-id to be consistent with
> CpuInstanceProperties.cluster-id, and this avoids the confusion
> of parameter names when hotplugging.

I don't think reusing 'cluster' from arm for x86's 'module' is a good 
idea. It introduces confusion around the code.

s390 just added 'drawer' and 'book' in cpu topology[1]. I think we can 
also add a module level for x86 instead of reusing cluster.

(This is also what I want to reply to the cover letter.)

[1] 
https://lore.kernel.org/qemu-devel/20231016183925.2384704-1-nsg@linux.ibm.com/

> Following the legacy smp check rules, also add the cluster_id validity
> into x86_cpu_pre_plug().
> 
> Signed-off-by: Zhuocheng Ding <zhuocheng.ding@intel.com>
> Co-developed-by: Zhao Liu <zhao1.liu@intel.com>
> Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
> Tested-by: Babu Moger <babu.moger@amd.com>
> Tested-by: Yongwei Ma <yongwei.ma@intel.com>
> Acked-by: Michael S. Tsirkin <mst@redhat.com>
> ---
> Changes since v6:
>   * Update the comment when check cluster-id. Since there's no
>     v8.2, the cluster-id support should at least start from v9.0.
> 
> Changes since v5:
>   * Update the comment when check cluster-id. Since current QEMU is
>     v8.2, the cluster-id support should at least start from v8.3.
> 
> Changes since v3:
>   * Use the imperative in the commit message. (Babu)
> ---
>   hw/i386/x86.c     | 33 +++++++++++++++++++++++++--------
>   target/i386/cpu.c |  2 ++
>   target/i386/cpu.h |  1 +
>   3 files changed, 28 insertions(+), 8 deletions(-)
> 
> diff --git a/hw/i386/x86.c b/hw/i386/x86.c
> index 5269aae3a5c2..1c1d368614ee 100644
> --- a/hw/i386/x86.c
> +++ b/hw/i386/x86.c
> @@ -329,6 +329,14 @@ void x86_cpu_pre_plug(HotplugHandler *hotplug_dev,
>               cpu->die_id = 0;
>           }
>   
> +        /*
> +         * cluster-id was optional in QEMU 9.0 and older, so keep it optional
> +         * if there's only one cluster per die.
> +         */
> +        if (cpu->cluster_id < 0 && ms->smp.clusters == 1) {
> +            cpu->cluster_id = 0;
> +        }
> +
>           if (cpu->socket_id < 0) {
>               error_setg(errp, "CPU socket-id is not set");
>               return;
> @@ -345,6 +353,14 @@ void x86_cpu_pre_plug(HotplugHandler *hotplug_dev,
>                          cpu->die_id, ms->smp.dies - 1);
>               return;
>           }
> +        if (cpu->cluster_id < 0) {
> +            error_setg(errp, "CPU cluster-id is not set");
> +            return;
> +        } else if (cpu->cluster_id > ms->smp.clusters - 1) {
> +            error_setg(errp, "Invalid CPU cluster-id: %u must be in range 0:%u",
> +                       cpu->cluster_id, ms->smp.clusters - 1);
> +            return;
> +        }
>           if (cpu->core_id < 0) {
>               error_setg(errp, "CPU core-id is not set");
>               return;
> @@ -364,16 +380,9 @@ void x86_cpu_pre_plug(HotplugHandler *hotplug_dev,
>   
>           topo_ids.pkg_id = cpu->socket_id;
>           topo_ids.die_id = cpu->die_id;
> +        topo_ids.module_id = cpu->cluster_id;
>           topo_ids.core_id = cpu->core_id;
>           topo_ids.smt_id = cpu->thread_id;
> -
> -        /*
> -         * TODO: This is the temporary initialization for topo_ids.module_id to
> -         * avoid "maybe-uninitialized" compilation errors. Will remove when
> -         * X86CPU supports cluster_id.
> -         */
> -        topo_ids.module_id = 0;
> -
>           cpu->apic_id = x86_apicid_from_topo_ids(&topo_info, &topo_ids);
>       }
>   
> @@ -418,6 +427,14 @@ void x86_cpu_pre_plug(HotplugHandler *hotplug_dev,
>       }
>       cpu->die_id = topo_ids.die_id;
>   
> +    if (cpu->cluster_id != -1 && cpu->cluster_id != topo_ids.module_id) {
> +        error_setg(errp, "property cluster-id: %u doesn't match set apic-id:"
> +            " 0x%x (cluster-id: %u)", cpu->cluster_id, cpu->apic_id,
> +            topo_ids.module_id);
> +        return;
> +    }
> +    cpu->cluster_id = topo_ids.module_id;
> +
>       if (cpu->core_id != -1 && cpu->core_id != topo_ids.core_id) {
>           error_setg(errp, "property core-id: %u doesn't match set apic-id:"
>               " 0x%x (core-id: %u)", cpu->core_id, cpu->apic_id,
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index a2d39d2198b6..498a4be62b40 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -7909,12 +7909,14 @@ static Property x86_cpu_properties[] = {
>       DEFINE_PROP_UINT32("apic-id", X86CPU, apic_id, 0),
>       DEFINE_PROP_INT32("thread-id", X86CPU, thread_id, 0),
>       DEFINE_PROP_INT32("core-id", X86CPU, core_id, 0),
> +    DEFINE_PROP_INT32("cluster-id", X86CPU, cluster_id, 0),
>       DEFINE_PROP_INT32("die-id", X86CPU, die_id, 0),
>       DEFINE_PROP_INT32("socket-id", X86CPU, socket_id, 0),
>   #else
>       DEFINE_PROP_UINT32("apic-id", X86CPU, apic_id, UNASSIGNED_APIC_ID),
>       DEFINE_PROP_INT32("thread-id", X86CPU, thread_id, -1),
>       DEFINE_PROP_INT32("core-id", X86CPU, core_id, -1),
> +    DEFINE_PROP_INT32("cluster-id", X86CPU, cluster_id, -1),
>       DEFINE_PROP_INT32("die-id", X86CPU, die_id, -1),
>       DEFINE_PROP_INT32("socket-id", X86CPU, socket_id, -1),
>   #endif
> diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> index 97b290e10576..009950b87203 100644
> --- a/target/i386/cpu.h
> +++ b/target/i386/cpu.h
> @@ -2057,6 +2057,7 @@ struct ArchCPU {
>       int32_t node_id; /* NUMA node this CPU belongs to */
>       int32_t socket_id;
>       int32_t die_id;
> +    int32_t cluster_id;
>       int32_t core_id;
>       int32_t thread_id;
>   



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v7 02/16] i386/cpu: Use APIC ID offset to encode cache topo in CPUID[4]
  2024-01-11  8:43     ` Zhao Liu
@ 2024-01-14 14:11       ` Xiaoyao Li
  2024-01-15  3:04         ` Zhao Liu
  2024-01-15  3:51       ` Xiaoyao Li
  1 sibling, 1 reply; 68+ messages in thread
From: Xiaoyao Li @ 2024-01-14 14:11 UTC (permalink / raw)
  To: Zhao Liu
  Cc: Eduardo Habkost, Marcel Apfelbaum, Michael S . Tsirkin,
	Richard Henderson, Paolo Bonzini, Marcelo Tosatti, qemu-devel,
	kvm, Zhenyu Wang, Zhuocheng Ding, Zhao Liu, Robert Hoo,
	Babu Moger, Yongwei Ma

On 1/11/2024 4:43 PM, Zhao Liu wrote:
> Hi Xiaoyao,
> 
> On Wed, Jan 10, 2024 at 05:31:28PM +0800, Xiaoyao Li wrote:
>> Date: Wed, 10 Jan 2024 17:31:28 +0800
>> From: Xiaoyao Li <xiaoyao.li@intel.com>
>> Subject: Re: [PATCH v7 02/16] i386/cpu: Use APIC ID offset to encode cache
>>   topo in CPUID[4]
>>
>> On 1/8/2024 4:27 PM, Zhao Liu wrote:
>>> From: Zhao Liu <zhao1.liu@intel.com>
>>>
>>> Refer to the fixes of cache_info_passthrough ([1], [2]) and SDM, the
>>> CPUID.04H:EAX[bits 25:14] and CPUID.04H:EAX[bits 31:26] should use the
>>> nearest power-of-2 integer.
>>>
>>> The nearest power-of-2 integer can be calculated by pow2ceil() or by
>>> using APIC ID offset (like L3 topology using 1 << die_offset [3]).
>>>
>>> But in fact, CPUID.04H:EAX[bits 25:14] and CPUID.04H:EAX[bits 31:26]
>>> are associated with APIC ID. For example, in linux kernel, the field
>>> "num_threads_sharing" (Bits 25 - 14) is parsed with APIC ID.
>>
>> And for
>>> another example, on Alder Lake P, the CPUID.04H:EAX[bits 31:26] is not
>>> matched with actual core numbers and it's calculated by:
>>> "(1 << (pkg_offset - core_offset)) - 1".
>>
>> could you elaborate it more? what is the value of actual core numbers on
>> Alder lake P? and what is the pkg_offset and core_offset?
> 
> For example, the following's the CPUID dump of an ADL-S machine:
> 
> CPUID.04H:
> 
> 0x00000004 0x00: eax=0xfc004121 ebx=0x01c0003f ecx=0x0000003f edx=0x00000000
> 0x00000004 0x01: eax=0xfc004122 ebx=0x01c0003f ecx=0x0000007f edx=0x00000000
> 0x00000004 0x02: eax=0xfc01c143 ebx=0x03c0003f ecx=0x000007ff edx=0x00000000
> 0x00000004 0x03: eax=0xfc1fc163 ebx=0x0240003f ecx=0x00009fff edx=0x00000004
> 0x00000004 0x04: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
> 
> 
> CPUID.1FH:
> 
> 0x0000001f 0x00: eax=0x00000001 ebx=0x00000001 ecx=0x00000100 edx=0x0000004c
> 0x0000001f 0x01: eax=0x00000007 ebx=0x00000014 ecx=0x00000201 edx=0x0000004c
> 0x0000001f 0x02: eax=0x00000000 ebx=0x00000000 ecx=0x00000002 edx=0x0000004c
> 
> The CPUID.04H:EAX[bits 31:26] is 63.
> From CPUID.1FH.00H:EAX[bits 04:00], the core_offset is 1, and from
> CPUID.1FH.01H:EAX[bits 04:00], the pkg_offset is 7.
> 
> Thus we can verify the above equation:
> 
> (1 << (0x7 - 0x1)) - 1 = 63.
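> 
> As a quick check, decoding the dumped CPUID.04H.00H leaf directly gives
> the same numbers:
> 
> uint32_t eax = 0xfc004121;                   /* from the dump above */
> uint32_t max_core_ids = (eax >> 26) & 0x3f;  /* bits 31:26 -> 63 */
> uint32_t max_lp_ids   = (eax >> 14) & 0xfff; /* bits 25:14 -> 1 */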
> 
> "Maximum number of addressable IDs" refers to the maximum number of IDs
> that can be enumerated in the APIC ID's topology layout, which does not
> necessarily correspond to the actual number of topology domains.
> 
>>
>>> Therefore the offset of APIC ID should be preferred to calculate nearest
>>> power-of-2 integer for CPUID.04H:EAX[bits 25:14] and CPUID.04H:EAX[bits
>>> 31:26]:
>>> 1. d/i cache is shared in a core, 1 << core_offset should be used
>>>      instand of "cs->nr_threads" in encode_cache_cpuid4() for
>>
>> /s/instand/instead
> 
> Thanks!
> 
>>
>>>      CPUID.04H.00H:EAX[bits 25:14] and CPUID.04H.01H:EAX[bits 25:14].
>>> 2. L2 cache is supposed to be shared in a core as for now, thereby
>>>      1 << core_offset should also be used instand of "cs->nr_threads" in
>>
>> ditto
> 
> Okay.
> 
>>
>>>      encode_cache_cpuid4() for CPUID.04H.02H:EAX[bits 25:14].
>>> 3. Similarly, the value for CPUID.04H:EAX[bits 31:26] should also be
>>>      calculated with the bit width between the Package and SMT levels in
>>>      the APIC ID (1 << (pkg_offset - core_offset) - 1).
>>>
>>> In addition, use APIC ID offset to replace "pow2ceil()" for
>>> cache_info_passthrough case.
>>>
>>> [1]: efb3934adf9e ("x86: cpu: make sure number of addressable IDs for processor cores meets the spec")
>>> [2]: d7caf13b5fcf ("x86: cpu: fixup number of addressable IDs for logical processors sharing cache")
>>> [3]: d65af288a84d ("i386: Update new x86_apicid parsing rules with die_offset support")
>>>
>>> Fixes: 7e3482f82480 ("i386: Helpers to encode cache information consistently")
>>> Suggested-by: Robert Hoo <robert.hu@linux.intel.com>
>>> Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
>>> Tested-by: Babu Moger <babu.moger@amd.com>
>>> Tested-by: Yongwei Ma <yongwei.ma@intel.com>
>>> Acked-by: Michael S. Tsirkin <mst@redhat.com>
>>> ---
>>> Changes since v3:
>>>    * Fix compile warnings. (Babu)
>>>    * Fix spelling typo.
>>>
>>> Changes since v1:
>>>    * Use APIC ID offset to replace "pow2ceil()" for cache_info_passthrough
>>>      case. (Yanan)
>>>    * Split the L1 cache fix into a separate patch.
>>>    * Rename the title of this patch (the original is "i386/cpu: Fix number
>>>      of addressable IDs in CPUID.04H").
>>> ---
>>>    target/i386/cpu.c | 30 +++++++++++++++++++++++-------
>>>    1 file changed, 23 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
>>> index 5a3678a789cf..c8d2a585723a 100644
>>> --- a/target/i386/cpu.c
>>> +++ b/target/i386/cpu.c
>>> @@ -6014,7 +6014,6 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
>>>    {
>>>        X86CPU *cpu = env_archcpu(env);
>>>        CPUState *cs = env_cpu(env);
>>> -    uint32_t die_offset;
>>>        uint32_t limit;
>>>        uint32_t signature[3];
>>>        X86CPUTopoInfo topo_info;
>>> @@ -6098,39 +6097,56 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
>>>                    int host_vcpus_per_cache = 1 + ((*eax & 0x3FFC000) >> 14);
>>>                    int vcpus_per_socket = cs->nr_cores * cs->nr_threads;
>>>                    if (cs->nr_cores > 1) {
>>> +                    int addressable_cores_offset =
>>> +                                                apicid_pkg_offset(&topo_info) -
>>> +                                                apicid_core_offset(&topo_info);
>>> +
>>>                        *eax &= ~0xFC000000;
>>> -                    *eax |= (pow2ceil(cs->nr_cores) - 1) << 26;
>>> +                    *eax |= (1 << (addressable_cores_offset - 1)) << 26;
>>
>> it should be ((1 << addressable_cores_offset) - 1) << 26
> 
> Good catch! The helper wrapped in a subsequent patch masks the error here.
> 
>>
>> I think naming it addressable_cores_width is better than
>> addressable_cores_offset. It's not offset because offset means the bit
>> position from bit 0.
> 
> I agree, "width" is better.
> 
>>
>> And we can get the width by another algorithm:
>>
>> int addressable_cores_width = apicid_core_width(&topo_info) +
>> apicid_die_width(&topo_info);
>> *eax |= ((1 << addressable_cores_width) - 1) << 26;
> 
> This algorithm lacks flexibility because there will be more topology
> levels between package and core, such as the cluster being introduced...
> 
> Using "addressable_cores_width" is clear enough.
> 
>> 		
>>>                    }
>>>                    if (host_vcpus_per_cache > vcpus_per_socket) {
>>> +                    int pkg_offset = apicid_pkg_offset(&topo_info);
>>> +
>>>                        *eax &= ~0x3FFC000;
>>> -                    *eax |= (pow2ceil(vcpus_per_socket) - 1) << 14;
>>> +                    *eax |= (1 << (pkg_offset - 1)) << 14;
>>
>> Ditto, ((1 << pkg_offset) - 1) << 14
> 
> Thanks!
> 
>>
>> For this one, I think pow2ceil(vcpus_per_socket) is better. Because it's
>> intuitive that when host_vcpus_per_cache > vcpus_per_socket, we expose
>> vcpus_per_cache (configured by users) to VM.
> 
> I tend to use a uniform calculation that is less confusing and easier to
> maintain. 

less confusing?

the original code is

	if (host_vcpus_per_cache > vcpus_per_socket) {
		*eax |= (pow2ceil(vcpus_per_socket) - 1) << 14;
	}

and this patch is going to change it to

	if (host_vcpus_per_cache > vcpus_per_socket) {
		int pkg_offset = apicid_pkg_offset(&topo_info);
		*eax |= (1 << (pkg_offset - 1)) << 14;
	}

Apparently, the former is clearer: everyone knows what it wants to do
is "when the guest's total vcpus_per_socket is even smaller than the
host's vcpus_per_cache, use the guest's configuration". The latter is
more confusing.

> Since this field encodes the "Maximum number of addressable IDs",
> the OS can't get the exact number of CPUs/vCPUs sharing L3 from here;
> it can only know that L3 is shared at the package level.

It isn't only about L3. What the code wants to ensure is this:

host_vcpus_per_cache is the actual number of LPs that share this level
of cache on the host, while vcpus_per_socket is the maximum number of
LPs that can share a cache (at any level) in the guest. When the guest's
maximum number is even smaller than the host's, use the guest's value.
For example, if the host shares a cache among 16 LPs but the guest has
only 6 vCPUs per socket, pow2ceil(6) - 1 = 7 is what belongs in
EAX[25:14].

> Thanks,
> Zhao
> 



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v7 14/16] i386: Use CPUCacheInfo.share_level to encode CPUID[4]
  2024-01-08  8:27 ` [PATCH v7 14/16] i386: Use CPUCacheInfo.share_level to encode CPUID[4] Zhao Liu
@ 2024-01-14 14:31   ` Xiaoyao Li
  2024-01-15  3:40     ` Zhao Liu
  0 siblings, 1 reply; 68+ messages in thread
From: Xiaoyao Li @ 2024-01-14 14:31 UTC (permalink / raw)
  To: Zhao Liu, Eduardo Habkost, Marcel Apfelbaum, Michael S . Tsirkin,
	Richard Henderson, Paolo Bonzini, Marcelo Tosatti
  Cc: qemu-devel, kvm, Zhenyu Wang, Zhuocheng Ding, Zhao Liu,
	Babu Moger, Yongwei Ma

On 1/8/2024 4:27 PM, Zhao Liu wrote:
> From: Zhao Liu <zhao1.liu@intel.com>
> 
> CPUID[4].EAX[bits 25:14] is used to represent the cache topology for
> Intel CPUs.
> 
> After cache models have topology information, we can use
> CPUCacheInfo.share_level to decide which topology level to be encoded
> into CPUID[4].EAX[bits 25:14].
> 
> And since maximum_processor_id (originally "num_apic_ids") is parsed
> based on CPU topology levels, which are verified when parsing smp,
> there's no need to check this value with "assert(num_apic_ids > 0)"
> again, so remove this assert.
> 
> Additionally, wrap the encoding of CPUID[4].EAX[bits 31:26] into a
> helper to make the code cleaner.
> 
> Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
> Tested-by: Babu Moger <babu.moger@amd.com>
> Tested-by: Yongwei Ma <yongwei.ma@intel.com>
> Acked-by: Michael S. Tsirkin <mst@redhat.com>
> ---
> Changes since v1:
>   * Use "enum CPUTopoLevel share_level" as the parameter in
>     max_processor_ids_for_cache().
>   * Make cache_into_passthrough case also use
>     max_processor_ids_for_cache() and max_core_ids_in_package() to
>     encode CPUID[4]. (Yanan)
>   * Rename the title of this patch (the original is "i386: Use
>     CPUCacheInfo.share_level to encode CPUID[4].EAX[bits 25:14]").
> ---
>   target/i386/cpu.c | 70 +++++++++++++++++++++++++++++------------------
>   1 file changed, 43 insertions(+), 27 deletions(-)
> 
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index 81e07474acef..b23e8190dc68 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -235,22 +235,53 @@ static uint8_t cpuid2_cache_descriptor(CPUCacheInfo *cache)
>                          ((t) == UNIFIED_CACHE) ? CACHE_TYPE_UNIFIED : \
>                          0 /* Invalid value */)
>   
> +static uint32_t max_processor_ids_for_cache(X86CPUTopoInfo *topo_info,
> +                                            enum CPUTopoLevel share_level)

I prefer the name max_lp_ids_share_the_cache().

> +{
> +    uint32_t num_ids = 0;
> +
> +    switch (share_level) {
> +    case CPU_TOPO_LEVEL_CORE:
> +        num_ids = 1 << apicid_core_offset(topo_info);
> +        break;
> +    case CPU_TOPO_LEVEL_DIE:
> +        num_ids = 1 << apicid_die_offset(topo_info);
> +        break;
> +    case CPU_TOPO_LEVEL_PACKAGE:
> +        num_ids = 1 << apicid_pkg_offset(topo_info);
> +        break;
> +    default:
> +        /*
> +         * Currently there is no use case for SMT and MODULE, so use
> +         * assert directly to facilitate debugging.
> +         */
> +        g_assert_not_reached();
> +    }
> +
> +    return num_ids - 1;

I suggest just returning num_ids, and letting the caller do the -1 work.
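
i.e., at the encode site (sketch):

*eax |= (max_processor_ids_for_cache(topo_info,
                                     cache->share_level) - 1) << 14;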

> +}
> +
> +static uint32_t max_core_ids_in_package(X86CPUTopoInfo *topo_info)
> +{
> +    uint32_t num_cores = 1 << (apicid_pkg_offset(topo_info) -
> +                               apicid_core_offset(topo_info));
> +    return num_cores - 1;

ditto.

> +}
>   
>   /* Encode cache info for CPUID[4] */
>   static void encode_cache_cpuid4(CPUCacheInfo *cache,
> -                                int num_apic_ids, int num_cores,
> +                                X86CPUTopoInfo *topo_info,
>                                   uint32_t *eax, uint32_t *ebx,
>                                   uint32_t *ecx, uint32_t *edx)
>   {
>       assert(cache->size == cache->line_size * cache->associativity *
>                             cache->partitions * cache->sets);
>   
> -    assert(num_apic_ids > 0);
>       *eax = CACHE_TYPE(cache->type) |
>              CACHE_LEVEL(cache->level) |
>              (cache->self_init ? CACHE_SELF_INIT_LEVEL : 0) |
> -           ((num_cores - 1) << 26) |
> -           ((num_apic_ids - 1) << 14);
> +           (max_core_ids_in_package(topo_info) << 26) |
> +           (max_processor_ids_for_cache(topo_info, cache->share_level) << 14);

by the way, we can change the order of the two lines. :)

>   
>       assert(cache->line_size > 0);
>       assert(cache->partitions > 0);
> @@ -6263,56 +6294,41 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
>                   int host_vcpus_per_cache = 1 + ((*eax & 0x3FFC000) >> 14);
>   
>                   if (cores_per_pkg > 1) {
> -                    int addressable_cores_offset =
> -                                                apicid_pkg_offset(&topo_info) -
> -                                                apicid_core_offset(&topo_info);
> -
>                       *eax &= ~0xFC000000;
> -                    *eax |= (1 << (addressable_cores_offset - 1)) << 26;
> +                    *eax |= max_core_ids_in_package(&topo_info) << 26;
>                   }
>                   if (host_vcpus_per_cache > cpus_per_pkg) {
> -                    int pkg_offset = apicid_pkg_offset(&topo_info);
> -
>                       *eax &= ~0x3FFC000;
> -                    *eax |= (1 << (pkg_offset - 1)) << 14;
> +                    *eax |=
> +                        max_processor_ids_for_cache(&topo_info,
> +                                                CPU_TOPO_LEVEL_PACKAGE) << 14;
>                   }
>               }
>           } else if (cpu->vendor_cpuid_only && IS_AMD_CPU(env)) {
>               *eax = *ebx = *ecx = *edx = 0;
>           } else {
>               *eax = 0;
> -            int addressable_cores_offset = apicid_pkg_offset(&topo_info) -
> -                                           apicid_core_offset(&topo_info);
> -            int core_offset, die_offset;
>   
>               switch (count) {
>               case 0: /* L1 dcache info */
> -                core_offset = apicid_core_offset(&topo_info);
>                   encode_cache_cpuid4(env->cache_info_cpuid4.l1d_cache,
> -                                    (1 << core_offset),
> -                                    (1 << addressable_cores_offset),
> +                                    &topo_info,
>                                       eax, ebx, ecx, edx);
>                   break;
>               case 1: /* L1 icache info */
> -                core_offset = apicid_core_offset(&topo_info);
>                   encode_cache_cpuid4(env->cache_info_cpuid4.l1i_cache,
> -                                    (1 << core_offset),
> -                                    (1 << addressable_cores_offset),
> +                                    &topo_info,
>                                       eax, ebx, ecx, edx);
>                   break;
>               case 2: /* L2 cache info */
> -                core_offset = apicid_core_offset(&topo_info);
>                   encode_cache_cpuid4(env->cache_info_cpuid4.l2_cache,
> -                                    (1 << core_offset),
> -                                    (1 << addressable_cores_offset),
> +                                    &topo_info,
>                                       eax, ebx, ecx, edx);
>                   break;
>               case 3: /* L3 cache info */
> -                die_offset = apicid_die_offset(&topo_info);
>                   if (cpu->enable_l3_cache) {
>                       encode_cache_cpuid4(env->cache_info_cpuid4.l3_cache,
> -                                        (1 << die_offset),
> -                                        (1 << addressable_cores_offset),
> +                                        &topo_info,
>                                           eax, ebx, ecx, edx);
>                       break;
>                   }



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v7 15/16] i386: Use offsets get NumSharingCache for CPUID[0x8000001D].EAX[bits 25:14]
  2024-01-08  8:27 ` [PATCH v7 15/16] i386: Use offsets get NumSharingCache for CPUID[0x8000001D].EAX[bits 25:14] Zhao Liu
@ 2024-01-14 14:42   ` Xiaoyao Li
  2024-01-15  3:48     ` Zhao Liu
  0 siblings, 1 reply; 68+ messages in thread
From: Xiaoyao Li @ 2024-01-14 14:42 UTC (permalink / raw)
  To: Zhao Liu, Eduardo Habkost, Marcel Apfelbaum, Michael S . Tsirkin,
	Richard Henderson, Paolo Bonzini, Marcelo Tosatti
  Cc: qemu-devel, kvm, Zhenyu Wang, Zhuocheng Ding, Zhao Liu,
	Babu Moger, Yongwei Ma

On 1/8/2024 4:27 PM, Zhao Liu wrote:
> From: Zhao Liu <zhao1.liu@intel.com>
> 
> The commit 8f4202fb1080 ("i386: Populate AMD Processor Cache Information
> for cpuid 0x8000001D") adds the cache topology for AMD CPU by encoding
> the number of sharing threads directly.
> 
> From AMD's APM, NumSharingCache (CPUID[0x8000001D].EAX[bits 25:14])
> means [1]:
> 
> The number of logical processors sharing this cache is the value of
> this field incremented by 1. To determine which logical processors are
> sharing a cache, determine a Share Id for each processor as follows:
> 
> ShareId = LocalApicId >> log2(NumSharingCache+1)
> 
> Logical processors with the same ShareId then share a cache. If
> NumSharingCache+1 is not a power of two, round it up to the next power
> of two.
> 
> From the description above, the calculation of this field should be the
> same as CPUID[4].EAX[bits 25:14] for Intel CPUs. So also use the offsets
> of the APIC ID to calculate this field.
> 
> [1]: APM, vol.3, appendix.E.4.15 Function 8000_001Dh--Cache Topology
>       Information

this patch can be dropped because we have the next patch.

> Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
> Reviewed-by: Babu Moger <babu.moger@amd.com>
> Tested-by: Babu Moger <babu.moger@amd.com>
> Tested-by: Yongwei Ma <yongwei.ma@intel.com>
> Acked-by: Michael S. Tsirkin <mst@redhat.com>
> ---
> Changes since v3:
>   * Rewrite the subject. (Babu)
>   * Delete the original "comment/help" expression, as this behavior is
>     confirmed for AMD CPUs. (Babu)
>   * Rename "num_apic_ids" (v3) to "num_sharing_cache" to match spec
>     definition. (Babu)
> 
> Changes since v1:
>   * Rename "l3_threads" to "num_apic_ids" in
>     encode_cache_cpuid8000001d(). (Yanan)
>   * Add the description of the original commit and add Cc.
> ---
>   target/i386/cpu.c | 10 ++++------
>   1 file changed, 4 insertions(+), 6 deletions(-)
> 
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index b23e8190dc68..8a4d72f6f760 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -483,7 +483,7 @@ static void encode_cache_cpuid8000001d(CPUCacheInfo *cache,
>                                          uint32_t *eax, uint32_t *ebx,
>                                          uint32_t *ecx, uint32_t *edx)
>   {
> -    uint32_t l3_threads;
> +    uint32_t num_sharing_cache;
>       assert(cache->size == cache->line_size * cache->associativity *
>                             cache->partitions * cache->sets);
>   
> @@ -492,13 +492,11 @@ static void encode_cache_cpuid8000001d(CPUCacheInfo *cache,
>   
>       /* L3 is shared among multiple cores */
>       if (cache->level == 3) {
> -        l3_threads = topo_info->modules_per_die *
> -                     topo_info->cores_per_module *
> -                     topo_info->threads_per_core;
> -        *eax |= (l3_threads - 1) << 14;
> +        num_sharing_cache = 1 << apicid_die_offset(topo_info);
>       } else {
> -        *eax |= ((topo_info->threads_per_core - 1) << 14);
> +        num_sharing_cache = 1 << apicid_core_offset(topo_info);
>       }
> +    *eax |= (num_sharing_cache - 1) << 14;
>   
>       assert(cache->line_size > 0);
>       assert(cache->partitions > 0);



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v7 02/16] i386/cpu: Use APIC ID offset to encode cache topo in CPUID[4]
  2024-01-14 14:11       ` Xiaoyao Li
@ 2024-01-15  3:04         ` Zhao Liu
  0 siblings, 0 replies; 68+ messages in thread
From: Zhao Liu @ 2024-01-15  3:04 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Eduardo Habkost, Marcel Apfelbaum, Michael S . Tsirkin,
	Richard Henderson, Paolo Bonzini, Marcelo Tosatti, qemu-devel,
	kvm, Zhenyu Wang, Zhuocheng Ding, Zhao Liu, Robert Hoo,
	Babu Moger, Yongwei Ma

Hi Xiaoyao,

On Sun, Jan 14, 2024 at 10:11:59PM +0800, Xiaoyao Li wrote:
> Date: Sun, 14 Jan 2024 22:11:59 +0800
> From: Xiaoyao Li <xiaoyao.li@intel.com>
> Subject: Re: [PATCH v7 02/16] i386/cpu: Use APIC ID offset to encode cache
>  topo in CPUID[4]
> 
> On 1/11/2024 4:43 PM, Zhao Liu wrote:
> > Hi Xiaoyao,
> > 
> > On Wed, Jan 10, 2024 at 05:31:28PM +0800, Xiaoyao Li wrote:
> > > Date: Wed, 10 Jan 2024 17:31:28 +0800
> > > From: Xiaoyao Li <xiaoyao.li@intel.com>
> > > Subject: Re: [PATCH v7 02/16] i386/cpu: Use APIC ID offset to encode cache
> > >   topo in CPUID[4]
> > > 
> > > On 1/8/2024 4:27 PM, Zhao Liu wrote:
> > > > From: Zhao Liu <zhao1.liu@intel.com>
> > > > 
> > > > Refer to the fixes of cache_info_passthrough ([1], [2]) and SDM, the
> > > > CPUID.04H:EAX[bits 25:14] and CPUID.04H:EAX[bits 31:26] should use the
> > > > nearest power-of-2 integer.
> > > > 
> > > > The nearest power-of-2 integer can be calculated by pow2ceil() or by
> > > > using APIC ID offset (like L3 topology using 1 << die_offset [3]).
> > > > 
> > > > But in fact, CPUID.04H:EAX[bits 25:14] and CPUID.04H:EAX[bits 31:26]
> > > > are associated with APIC ID. For example, in linux kernel, the field
> > > > "num_threads_sharing" (Bits 25 - 14) is parsed with APIC ID.
> > > 
> > > And for
> > > > another example, on Alder Lake P, the CPUID.04H:EAX[bits 31:26] is not
> > > > matched with actual core numbers and it's calculated by:
> > > > "(1 << (pkg_offset - core_offset)) - 1".
> > > 
> > > could you elaborate it more? what is the value of actual core numbers on
> > > Alder lake P? and what is the pkg_offset and core_offset?
> > 
> > For example, the following's the CPUID dump of an ADL-S machine:
> > 
> > CPUID.04H:
> > 
> > 0x00000004 0x00: eax=0xfc004121 ebx=0x01c0003f ecx=0x0000003f edx=0x00000000
> > 0x00000004 0x01: eax=0xfc004122 ebx=0x01c0003f ecx=0x0000007f edx=0x00000000
> > 0x00000004 0x02: eax=0xfc01c143 ebx=0x03c0003f ecx=0x000007ff edx=0x00000000
> > 0x00000004 0x03: eax=0xfc1fc163 ebx=0x0240003f ecx=0x00009fff edx=0x00000004
> > 0x00000004 0x04: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
> > 
> > 
> > CPUID.1FH:
> > 
> > 0x0000001f 0x00: eax=0x00000001 ebx=0x00000001 ecx=0x00000100 edx=0x0000004c
> > 0x0000001f 0x01: eax=0x00000007 ebx=0x00000014 ecx=0x00000201 edx=0x0000004c
> > 0x0000001f 0x02: eax=0x00000000 ebx=0x00000000 ecx=0x00000002 edx=0x0000004c
> > 
> > The CPUID.04H:EAX[bits 31:26] is 63.
> >  From CPUID.1FH.00H:EAX[bits 04:00], the core_offset is 1, and from
> > CPUID.1FH.01H:EAX[bits 04:00], the pkg_offset is 7.
> > 
> > Thus we can verify the above equation:
> > 
> > (1 << (0x7 - 0x1)) - 1 = 63.
> > 
> > "Maximum number of addressable IDs" refers to the maximum number of IDs
> > that can be enumerated in the APIC ID's topology layout, which does not
> > necessarily correspond to the actual number of topology domains.
> > 
> > > 
> > > > Therefore the offset of APIC ID should be preferred to calculate nearest
> > > > power-of-2 integer for CPUID.04H:EAX[bits 25:14] and CPUID.04H:EAX[bits
> > > > 31:26]:
> > > > 1. d/i cache is shared in a core, 1 << core_offset should be used
> > > >      instand of "cs->nr_threads" in encode_cache_cpuid4() for
> > > 
> > > /s/instand/instead
> > 
> > Thanks!
> > 
> > > 
> > > >      CPUID.04H.00H:EAX[bits 25:14] and CPUID.04H.01H:EAX[bits 25:14].
> > > > 2. L2 cache is supposed to be shared in a core as for now, thereby
> > > >      1 << core_offset should also be used instand of "cs->nr_threads" in
> > > 
> > > ditto
> > 
> > Okay.
> > 
> > > 
> > > >      encode_cache_cpuid4() for CPUID.04H.02H:EAX[bits 25:14].
> > > > 3. Similarly, the value for CPUID.04H:EAX[bits 31:26] should also be
> > > >      calculated with the bit width between the Package and SMT levels in
> > > >      the APIC ID (1 << (pkg_offset - core_offset) - 1).
> > > > 
> > > > In addition, use APIC ID offset to replace "pow2ceil()" for
> > > > cache_info_passthrough case.
> > > > 
> > > > [1]: efb3934adf9e ("x86: cpu: make sure number of addressable IDs for processor cores meets the spec")
> > > > [2]: d7caf13b5fcf ("x86: cpu: fixup number of addressable IDs for logical processors sharing cache")
> > > > [3]: d65af288a84d ("i386: Update new x86_apicid parsing rules with die_offset support")
> > > > 
> > > > Fixes: 7e3482f82480 ("i386: Helpers to encode cache information consistently")
> > > > Suggested-by: Robert Hoo <robert.hu@linux.intel.com>
> > > > Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
> > > > Tested-by: Babu Moger <babu.moger@amd.com>
> > > > Tested-by: Yongwei Ma <yongwei.ma@intel.com>
> > > > Acked-by: Michael S. Tsirkin <mst@redhat.com>
> > > > ---
> > > > Changes since v3:
> > > >    * Fix compile warnings. (Babu)
> > > >    * Fix spelling typo.
> > > > 
> > > > Changes since v1:
> > > >    * Use APIC ID offset to replace "pow2ceil()" for cache_info_passthrough
> > > >      case. (Yanan)
> > > >    * Split the L1 cache fix into a separate patch.
> > > >    * Rename the title of this patch (the original is "i386/cpu: Fix number
> > > >      of addressable IDs in CPUID.04H").
> > > > ---
> > > >    target/i386/cpu.c | 30 +++++++++++++++++++++++-------
> > > >    1 file changed, 23 insertions(+), 7 deletions(-)
> > > > 
> > > > diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> > > > index 5a3678a789cf..c8d2a585723a 100644
> > > > --- a/target/i386/cpu.c
> > > > +++ b/target/i386/cpu.c
> > > > @@ -6014,7 +6014,6 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
> > > >    {
> > > >        X86CPU *cpu = env_archcpu(env);
> > > >        CPUState *cs = env_cpu(env);
> > > > -    uint32_t die_offset;
> > > >        uint32_t limit;
> > > >        uint32_t signature[3];
> > > >        X86CPUTopoInfo topo_info;
> > > > @@ -6098,39 +6097,56 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
> > > >                    int host_vcpus_per_cache = 1 + ((*eax & 0x3FFC000) >> 14);
> > > >                    int vcpus_per_socket = cs->nr_cores * cs->nr_threads;
> > > >                    if (cs->nr_cores > 1) {
> > > > +                    int addressable_cores_offset =
> > > > +                                                apicid_pkg_offset(&topo_info) -
> > > > +                                                apicid_core_offset(&topo_info);
> > > > +
> > > >                        *eax &= ~0xFC000000;
> > > > -                    *eax |= (pow2ceil(cs->nr_cores) - 1) << 26;
> > > > +                    *eax |= (1 << (addressable_cores_offset - 1)) << 26;
> > > 
> > > it should be ((1 << addressable_cores_offset) - 1) << 26
> > 
> > Good catch! The encoding is wrapped into a helper in a subsequent
> > patch, which masks the error here.
> > 
> > > 
> > > I think naming it addressable_cores_width is better than
> > > addressable_cores_offset. It's not offset because offset means the bit
> > > position from bit 0.
> > 
> > I agree, "width" is better.
> > 
> > > 
> > > And we can get the width by another algorithm:
> > > 
> > > int addressable_cores_width = apicid_core_width(&topo_info) +
> > > apicid_die_width(&topo_info);
> > > *eax |= ((1 << addressable_cores_width) - 1) << 26;
> > 
> > This algorithm lacks flexibility because there will be more topology
> > levels between package and core, such as the cluster being introduced...
> > 
> > Using "addressable_cores_width" is clear enough.
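
(To make the off-by-one concrete, a minimal standalone sketch of the
buggy and the corrected expressions; the width value is assumed from the
ADL-S example earlier in the thread:)

#include <stdio.h>

int main(void)
{
    int addressable_cores_width = 6;    /* pkg_offset - core_offset above */

    unsigned buggy = (1u << (addressable_cores_width - 1)) << 26;
    unsigned fixed = ((1u << addressable_cores_width) - 1) << 26;

    printf("buggy: 0x%08x\n", buggy);   /* 0x80000000: field = 32, wrong */
    printf("fixed: 0x%08x\n", fixed);   /* 0xfc000000: field = 63, as dumped */
    return 0;
}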
> > 
> > > 		
> > > >                    }
> > > >                    if (host_vcpus_per_cache > vcpus_per_socket) {
> > > > +                    int pkg_offset = apicid_pkg_offset(&topo_info);
> > > > +
> > > >                        *eax &= ~0x3FFC000;
> > > > -                    *eax |= (pow2ceil(vcpus_per_socket) - 1) << 14;
> > > > +                    *eax |= (1 << (pkg_offset - 1)) << 14;
> > > 
> > > Ditto, ((1 << pkg_offset) - 1) << 14
> > 
> > Thanks!
> > 
> > > 
> > > For this one, I think pow2ceil(vcpus_per_socket) is better. Because it's
> > > intuitive that when host_vcpus_per_cache > vcpus_per_socket, we expose
> > > vcpus_per_socket (configured by users) to the VM.
> > 
> > I tend to use a uniform calculation that is less confusing and easier to
> > maintain.
> 
> less confusing?
> 
> the original code is
> 
> 	if (host_vcpus_per_cache > vcpus_per_socket) {
> 		*eax |= (pow2ceil(vcpus_per_socket) - 1) << 14;
> 	}
> 
> and this patch is going to change it to
> 
> 	if (host_vcpus_per_cache > vcpus_per_socket) {
> 		int pkg_offset = apicid_pkg_offset(&topo_info);
> 		*eax |= (1 << (pkg_offset - 1)) << 14;
> 	}
> 
> Apparently, the former is clearer: everyone knows what it wants to do is
> "when the guest's total vcpus_per_socket is even smaller than the host's
> vcpus_per_cache, use the guest's configuration". The latter is more
> confusing.

IMO, the only differences are the variable naming and the way the
details are encoded; what is actually being expressed is the same -
both set the cache topology at the package level.

There is no reason to use two ways of encoding the same field, and
it'll be a code maintenance disaster.

I can add a comment here to allay your concern.
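
(To illustrate why the pkg_offset form stays consistent with the APIC ID
layout, a minimal sketch assuming a hypothetical 5-core x 5-thread
socket; get_count_order() mimics how a guest derives the share boundary
from this field:)

#include <stdio.h>

static int get_count_order(unsigned x)   /* ceil(log2(x)) */
{
    int order = 0;
    while ((1u << order) < x) {
        order++;
    }
    return order;
}

int main(void)
{
    /* Assumed example: 5 cores x 5 threads, so vcpus_per_socket = 25,
     * but the APIC ID layout rounds each level up: 3 + 3 = 6 bits. */
    int pkg_offset = 6;
    unsigned pow2ceil_form   = 32 - 1;               /* pow2ceil(25) - 1 */
    unsigned pkg_offset_form = (1u << pkg_offset) - 1;

    /* The shift a guest derives decides which APIC ID bits it masks off */
    printf("pow2ceil form   -> shift %d\n", get_count_order(pow2ceil_form + 1));
    printf("pkg_offset form -> shift %d\n", get_count_order(pkg_offset_form + 1));
    /* Only the latter (6) lands on the actual package boundary. */
    return 0;
}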

> 
> > Since this field encodes "Maximum number of addressable IDs",
> > OS can't get the exact number of CPUs/vCPUs sharing L3 from here, it can
> > only know that L3 is shared at the package level.
> 
> It's not specifically about L3. What the code wants to fulfill is that,

Yes, I misremembered here.

> 
> host_vcpus_per_cache is the actual number of LPs that share this level of
> cache, while vcpus_per_socket is the maximum number of LPs that can share a
> cache (at any level) in the guest. When the guest's maximum number is even
> smaller than the host's, use the guest's value.
> 

From the Guest's view, the cache is shared at the package level. Even in
hardware, this field only reflects the topology level and does not
accurately reflect the number of sharing CPUs.

So, we just need to make it clear that in this case the Guest cache
topology level is package.

Thanks,
Zhao



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v7 08/16] i386: Expose module level in CPUID[0x1F]
  2024-01-08  8:27 ` [PATCH v7 08/16] i386: Expose module level in CPUID[0x1F] Zhao Liu
  2024-01-11  6:04   ` Xiaoyao Li
@ 2024-01-15  3:25   ` Yuan Yao
  2024-01-15  4:09     ` Zhao Liu
  1 sibling, 1 reply; 68+ messages in thread
From: Yuan Yao @ 2024-01-15  3:25 UTC (permalink / raw)
  To: Zhao Liu
  Cc: Eduardo Habkost, Marcel Apfelbaum, Michael S . Tsirkin,
	Richard Henderson, Paolo Bonzini, Marcelo Tosatti, qemu-devel,
	kvm, Zhenyu Wang, Zhuocheng Ding, Zhao Liu, Babu Moger,
	Yongwei Ma

On Mon, Jan 08, 2024 at 04:27:19PM +0800, Zhao Liu wrote:
> From: Zhao Liu <zhao1.liu@intel.com>
>
> The Linux kernel (from v6.4, with commit edc0a2b595765 ("x86/topology: Fix
> erroneous smp_num_siblings on Intel Hybrid platforms")) is able to
> handle platforms with the Module level enumerated via CPUID.1F.
>
> Expose the module level in CPUID[0x1F] if the machine has more than 1
> module.
>
> (Tested CPU topology in CPUID[0x1F] leaf with various die/cluster
> configurations in "-smp".)
>
> Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
> Tested-by: Babu Moger <babu.moger@amd.com>
> Tested-by: Yongwei Ma <yongwei.ma@intel.com>
> Acked-by: Michael S. Tsirkin <mst@redhat.com>
> ---
> Changes since v3:
>  * New patch to expose module level in 0x1F.
>  * Add Tested-by tag from Yongwei.
> ---
>  target/i386/cpu.c     | 12 +++++++++++-
>  target/i386/cpu.h     |  2 ++
>  target/i386/kvm/kvm.c |  2 +-
>  3 files changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index 294ca6b8947a..a2d39d2198b6 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -277,6 +277,8 @@ static uint32_t num_cpus_by_topo_level(X86CPUTopoInfo *topo_info,
>          return 1;
>      case CPU_TOPO_LEVEL_CORE:
>          return topo_info->threads_per_core;
> +    case CPU_TOPO_LEVEL_MODULE:
> +        return topo_info->threads_per_core * topo_info->cores_per_module;
>      case CPU_TOPO_LEVEL_DIE:
>          return topo_info->threads_per_core * topo_info->cores_per_module *
>                 topo_info->modules_per_die;
> @@ -297,6 +299,8 @@ static uint32_t apicid_offset_by_topo_level(X86CPUTopoInfo *topo_info,
>          return 0;
>      case CPU_TOPO_LEVEL_CORE:
>          return apicid_core_offset(topo_info);
> +    case CPU_TOPO_LEVEL_MODULE:
> +        return apicid_module_offset(topo_info);
>      case CPU_TOPO_LEVEL_DIE:
>          return apicid_die_offset(topo_info);
>      case CPU_TOPO_LEVEL_PACKAGE:
> @@ -316,6 +320,8 @@ static uint32_t cpuid1f_topo_type(enum CPUTopoLevel topo_level)
>          return CPUID_1F_ECX_TOPO_LEVEL_SMT;
>      case CPU_TOPO_LEVEL_CORE:
>          return CPUID_1F_ECX_TOPO_LEVEL_CORE;
> +    case CPU_TOPO_LEVEL_MODULE:
> +        return CPUID_1F_ECX_TOPO_LEVEL_MODULE;
>      case CPU_TOPO_LEVEL_DIE:
>          return CPUID_1F_ECX_TOPO_LEVEL_DIE;
>      default:
> @@ -347,6 +353,10 @@ static void encode_topo_cpuid1f(CPUX86State *env, uint32_t count,
>          if (env->nr_dies > 1) {
>              set_bit(CPU_TOPO_LEVEL_DIE, topo_bitmap);
>          }
> +
> +        if (env->nr_modules > 1) {
> +            set_bit(CPU_TOPO_LEVEL_MODULE, topo_bitmap);
> +        }
>      }
>
>      *ecx = count & 0xff;
> @@ -6394,7 +6404,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
>          break;
>      case 0x1F:
>          /* V2 Extended Topology Enumeration Leaf */
> -        if (topo_info.dies_per_pkg < 2) {
> +        if (topo_info.modules_per_die < 2 && topo_info.dies_per_pkg < 2) {

A question:
Is the original check necessary?
The 0x1f leaf exists even on CPUs w/o module/die topology on bare metal;
I tried on EMR:

// leaf 0
0x00000000 0x00: eax=0x00000020 ebx=0x756e6547 ecx=0x6c65746e edx=0x49656e69

// leaf 0x1f
0x0000001f 0x00: eax=0x00000001 ebx=0x00000002 ecx=0x00000100 edx=0x00000004
0x0000001f 0x01: eax=0x00000007 ebx=0x00000080 ecx=0x00000201 edx=0x00000004
0x0000001f 0x02: eax=0x00000000 ebx=0x00000000 ecx=0x00000002 edx=0x00000004

// leaf 0xb
0x0000000b 0x00: eax=0x00000001 ebx=0x00000002 ecx=0x00000100 edx=0x00000004
0x0000000b 0x01: eax=0x00000007 ebx=0x00000080 ecx=0x00000201 edx=0x00000004
0x0000000b 0x02: eax=0x00000000 ebx=0x00000000 ecx=0x00000002 edx=0x00000004

So this leads to CPU behavior that differs from bare metal, even in the
case of "-cpu host".

In SDM Vol2, cpuid instruction section:

" CPUID leaf 1FH is a preferred superset to leaf 0BH. Intel
recommends using leaf 1FH when available rather than leaf
0BH and ensuring that any leaf 0BH algorithms are updated to
support leaf 1FH. "

My understanding: if 0x1f exists (leaf 0.EAX >= 0x1f), then it should
have the same values at the LP/core levels as 0xb.
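
(A user-space sketch of that leaf-selection logic, using GCC/clang's
cpuid.h; the EBX == 0 test for an unimplemented 0x1f is my reading of
the SDM, not a quote from it:)

#include <stdio.h>
#include <cpuid.h>

int main(void)
{
    unsigned a, b, c, d;

    if (!__get_cpuid(0, &a, &b, &c, &d)) {   /* leaf 0: max standard leaf */
        return 1;
    }
    unsigned topo_leaf = 0x0b;

    if (a >= 0x1f) {
        __get_cpuid_count(0x1f, 0, &a, &b, &c, &d);
        if (b != 0) {                        /* EBX == 0: leaf not populated */
            topo_leaf = 0x1f;
        }
    }
    printf("topology leaf: 0x%x\n", topo_leaf);
    return 0;
}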

>              *eax = *ebx = *ecx = *edx = 0;
>              break;
>          }
> diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> index eecd30bde92b..97b290e10576 100644
> --- a/target/i386/cpu.h
> +++ b/target/i386/cpu.h
> @@ -1018,6 +1018,7 @@ enum CPUTopoLevel {
>      CPU_TOPO_LEVEL_INVALID,
>      CPU_TOPO_LEVEL_SMT,
>      CPU_TOPO_LEVEL_CORE,
> +    CPU_TOPO_LEVEL_MODULE,
>      CPU_TOPO_LEVEL_DIE,
>      CPU_TOPO_LEVEL_PACKAGE,
>      CPU_TOPO_LEVEL_MAX,
> @@ -1032,6 +1033,7 @@ enum CPUTopoLevel {
>  #define CPUID_1F_ECX_TOPO_LEVEL_INVALID  CPUID_B_ECX_TOPO_LEVEL_INVALID
>  #define CPUID_1F_ECX_TOPO_LEVEL_SMT      CPUID_B_ECX_TOPO_LEVEL_SMT
>  #define CPUID_1F_ECX_TOPO_LEVEL_CORE     CPUID_B_ECX_TOPO_LEVEL_CORE
> +#define CPUID_1F_ECX_TOPO_LEVEL_MODULE   3
>  #define CPUID_1F_ECX_TOPO_LEVEL_DIE      5
>
>  /* MSR Feature Bits */
> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> index 4ce80555b45c..e5ddb214cb36 100644
> --- a/target/i386/kvm/kvm.c
> +++ b/target/i386/kvm/kvm.c
> @@ -1913,7 +1913,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
>              break;
>          }
>          case 0x1f:
> -            if (env->nr_dies < 2) {
> +            if (env->nr_modules < 2 && env->nr_dies < 2) {
>                  break;
>              }
>              /* fallthrough */
> --
> 2.34.1
>
>


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v7 10/16] i386/cpu: Introduce cluster-id to X86CPU
  2024-01-14 13:49   ` Xiaoyao Li
@ 2024-01-15  3:27     ` Zhao Liu
  2024-01-15  4:18       ` Xiaoyao Li
  0 siblings, 1 reply; 68+ messages in thread
From: Zhao Liu @ 2024-01-15  3:27 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Eduardo Habkost, Marcel Apfelbaum, Michael S . Tsirkin,
	Richard Henderson, Paolo Bonzini, Marcelo Tosatti, qemu-devel,
	kvm, Zhenyu Wang, Zhuocheng Ding, Zhao Liu, Babu Moger,
	Yongwei Ma

On Sun, Jan 14, 2024 at 09:49:18PM +0800, Xiaoyao Li wrote:
> Date: Sun, 14 Jan 2024 21:49:18 +0800
> From: Xiaoyao Li <xiaoyao.li@intel.com>
> Subject: Re: [PATCH v7 10/16] i386/cpu: Introduce cluster-id to X86CPU
> 
> On 1/8/2024 4:27 PM, Zhao Liu wrote:
> > From: Zhuocheng Ding <zhuocheng.ding@intel.com>
> > 
> > Introduce cluster-id rather than module-id to be consistent with
> > CpuInstanceProperties.cluster-id; this avoids confusion over
> > parameter names when hotplugging.
> 
> I don't think reusing 'cluster' from arm for x86's 'module' is a good idea.
> It introduces confusion around the code.

There is a precedent: generic "socket" vs. i386 "package".

The de facto definition of a cluster is the level above the "core" that
shares hardware resources, including L2. In this sense, ARM's cluster is
the same as x86's module.

Though different arches have different naming styles, QEMU's generic
code still needs a uniform topology hierarchy.

> 
> s390 just added 'drawer' and 'book' in cpu topology[1]. I think we can also
> add a module level for x86 instead of reusing cluster.
> 
> (This is also what I want to reply to the cover letter.)
> 
> [1] https://lore.kernel.org/qemu-devel/20231016183925.2384704-1-nsg@linux.ibm.com/

These two new levels have a clear place in the topological hierarchy
and don't duplicate existing ones.

"book" or "drawer" may correspond to intel's "cluster".

Maybe, in the future, we could support arch-specific topology aliases
in -smp.

Thanks,
Zhao

> 
> > Following the legacy smp check rules, also add the cluster_id validity
> > into x86_cpu_pre_plug().
> > 
> > Signed-off-by: Zhuocheng Ding <zhuocheng.ding@intel.com>
> > Co-developed-by: Zhao Liu <zhao1.liu@intel.com>
> > Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
> > Tested-by: Babu Moger <babu.moger@amd.com>
> > Tested-by: Yongwei Ma <yongwei.ma@intel.com>
> > Acked-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> > Changes since v6:
> >   * Update the comment when check cluster-id. Since there's no
> >     v8.2, the cluster-id support should at least start from v9.0.
> > 
> > Changes since v5:
> >   * Update the comment when check cluster-id. Since current QEMU is
> >     v8.2, the cluster-id support should at least start from v8.3.
> > 
> > Changes since v3:
> >   * Use the imperative in the commit message. (Babu)
> > ---
> >   hw/i386/x86.c     | 33 +++++++++++++++++++++++++--------
> >   target/i386/cpu.c |  2 ++
> >   target/i386/cpu.h |  1 +
> >   3 files changed, 28 insertions(+), 8 deletions(-)
> > 
> > diff --git a/hw/i386/x86.c b/hw/i386/x86.c
> > index 5269aae3a5c2..1c1d368614ee 100644
> > --- a/hw/i386/x86.c
> > +++ b/hw/i386/x86.c
> > @@ -329,6 +329,14 @@ void x86_cpu_pre_plug(HotplugHandler *hotplug_dev,
> >               cpu->die_id = 0;
> >           }
> > +        /*
> > +         * cluster-id was optional in QEMU 9.0 and older, so keep it optional
> > +         * if there's only one cluster per die.
> > +         */
> > +        if (cpu->cluster_id < 0 && ms->smp.clusters == 1) {
> > +            cpu->cluster_id = 0;
> > +        }
> > +
> >           if (cpu->socket_id < 0) {
> >               error_setg(errp, "CPU socket-id is not set");
> >               return;
> > @@ -345,6 +353,14 @@ void x86_cpu_pre_plug(HotplugHandler *hotplug_dev,
> >                          cpu->die_id, ms->smp.dies - 1);
> >               return;
> >           }
> > +        if (cpu->cluster_id < 0) {
> > +            error_setg(errp, "CPU cluster-id is not set");
> > +            return;
> > +        } else if (cpu->cluster_id > ms->smp.clusters - 1) {
> > +            error_setg(errp, "Invalid CPU cluster-id: %u must be in range 0:%u",
> > +                       cpu->cluster_id, ms->smp.clusters - 1);
> > +            return;
> > +        }
> >           if (cpu->core_id < 0) {
> >               error_setg(errp, "CPU core-id is not set");
> >               return;
> > @@ -364,16 +380,9 @@ void x86_cpu_pre_plug(HotplugHandler *hotplug_dev,
> >           topo_ids.pkg_id = cpu->socket_id;
> >           topo_ids.die_id = cpu->die_id;
> > +        topo_ids.module_id = cpu->cluster_id;
> >           topo_ids.core_id = cpu->core_id;
> >           topo_ids.smt_id = cpu->thread_id;
> > -
> > -        /*
> > -         * TODO: This is the temporary initialization for topo_ids.module_id to
> > -         * avoid "maybe-uninitialized" compilation errors. Will remove when
> > -         * X86CPU supports cluster_id.
> > -         */
> > -        topo_ids.module_id = 0;
> > -
> >           cpu->apic_id = x86_apicid_from_topo_ids(&topo_info, &topo_ids);
> >       }
> > @@ -418,6 +427,14 @@ void x86_cpu_pre_plug(HotplugHandler *hotplug_dev,
> >       }
> >       cpu->die_id = topo_ids.die_id;
> > +    if (cpu->cluster_id != -1 && cpu->cluster_id != topo_ids.module_id) {
> > +        error_setg(errp, "property cluster-id: %u doesn't match set apic-id:"
> > +            " 0x%x (cluster-id: %u)", cpu->cluster_id, cpu->apic_id,
> > +            topo_ids.module_id);
> > +        return;
> > +    }
> > +    cpu->cluster_id = topo_ids.module_id;
> > +
> >       if (cpu->core_id != -1 && cpu->core_id != topo_ids.core_id) {
> >           error_setg(errp, "property core-id: %u doesn't match set apic-id:"
> >               " 0x%x (core-id: %u)", cpu->core_id, cpu->apic_id,
> > diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> > index a2d39d2198b6..498a4be62b40 100644
> > --- a/target/i386/cpu.c
> > +++ b/target/i386/cpu.c
> > @@ -7909,12 +7909,14 @@ static Property x86_cpu_properties[] = {
> >       DEFINE_PROP_UINT32("apic-id", X86CPU, apic_id, 0),
> >       DEFINE_PROP_INT32("thread-id", X86CPU, thread_id, 0),
> >       DEFINE_PROP_INT32("core-id", X86CPU, core_id, 0),
> > +    DEFINE_PROP_INT32("cluster-id", X86CPU, cluster_id, 0),
> >       DEFINE_PROP_INT32("die-id", X86CPU, die_id, 0),
> >       DEFINE_PROP_INT32("socket-id", X86CPU, socket_id, 0),
> >   #else
> >       DEFINE_PROP_UINT32("apic-id", X86CPU, apic_id, UNASSIGNED_APIC_ID),
> >       DEFINE_PROP_INT32("thread-id", X86CPU, thread_id, -1),
> >       DEFINE_PROP_INT32("core-id", X86CPU, core_id, -1),
> > +    DEFINE_PROP_INT32("cluster-id", X86CPU, cluster_id, -1),
> >       DEFINE_PROP_INT32("die-id", X86CPU, die_id, -1),
> >       DEFINE_PROP_INT32("socket-id", X86CPU, socket_id, -1),
> >   #endif
> > diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> > index 97b290e10576..009950b87203 100644
> > --- a/target/i386/cpu.h
> > +++ b/target/i386/cpu.h
> > @@ -2057,6 +2057,7 @@ struct ArchCPU {
> >       int32_t node_id; /* NUMA node this CPU belongs to */
> >       int32_t socket_id;
> >       int32_t die_id;
> > +    int32_t cluster_id;
> >       int32_t core_id;
> >       int32_t thread_id;
> 


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v7 14/16] i386: Use CPUCacheInfo.share_level to encode CPUID[4]
  2024-01-14 14:31   ` Xiaoyao Li
@ 2024-01-15  3:40     ` Zhao Liu
  2024-01-15  4:25       ` Xiaoyao Li
  0 siblings, 1 reply; 68+ messages in thread
From: Zhao Liu @ 2024-01-15  3:40 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Eduardo Habkost, Marcel Apfelbaum, Michael S . Tsirkin,
	Richard Henderson, Paolo Bonzini, Marcelo Tosatti, qemu-devel,
	kvm, Zhenyu Wang, Zhuocheng Ding, Zhao Liu, Babu Moger,
	Yongwei Ma

Hi Xiaoyao,

On Sun, Jan 14, 2024 at 10:31:50PM +0800, Xiaoyao Li wrote:
> Date: Sun, 14 Jan 2024 22:31:50 +0800
> From: Xiaoyao Li <xiaoyao.li@intel.com>
> Subject: Re: [PATCH v7 14/16] i386: Use CPUCacheInfo.share_level to encode
>  CPUID[4]
> 
> On 1/8/2024 4:27 PM, Zhao Liu wrote:
> > From: Zhao Liu <zhao1.liu@intel.com>
> > 
> > CPUID[4].EAX[bits 25:14] is used to represent the cache topology for
> > Intel CPUs.
> > 
> > After cache models have topology information, we can use
> > CPUCacheInfo.share_level to decide which topology level to be encoded
> > into CPUID[4].EAX[bits 25:14].
> > 
> > And since maximum_processor_id (original "num_apic_ids") is parsed
> > based on CPU topology levels, which are verified when parsing smp,
> > there's no need to check this value with "assert(num_apic_ids > 0)"
> > again, so remove the assert.
> > 
> > Additionally, wrap the encoding of CPUID[4].EAX[bits 31:26] into a
> > helper to make the code cleaner.
> > 
> > Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
> > Tested-by: Babu Moger <babu.moger@amd.com>
> > Tested-by: Yongwei Ma <yongwei.ma@intel.com>
> > Acked-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> > Changes since v1:
> >   * Use "enum CPUTopoLevel share_level" as the parameter in
> >     max_processor_ids_for_cache().
> >   * Make cache_into_passthrough case also use
> >     max_processor_ids_for_cache() and max_core_ids_in_package() to
> >     encode CPUID[4]. (Yanan)
> >   * Rename the title of this patch (the original is "i386: Use
> >     CPUCacheInfo.share_level to encode CPUID[4].EAX[bits 25:14]").
> > ---
> >   target/i386/cpu.c | 70 +++++++++++++++++++++++++++++------------------
> >   1 file changed, 43 insertions(+), 27 deletions(-)
> > 
> > diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> > index 81e07474acef..b23e8190dc68 100644
> > --- a/target/i386/cpu.c
> > +++ b/target/i386/cpu.c
> > @@ -235,22 +235,53 @@ static uint8_t cpuid2_cache_descriptor(CPUCacheInfo *cache)
> >                          ((t) == UNIFIED_CACHE) ? CACHE_TYPE_UNIFIED : \
> >                          0 /* Invalid value */)
> > +static uint32_t max_processor_ids_for_cache(X86CPUTopoInfo *topo_info,
> > +                                            enum CPUTopoLevel share_level)
> 
> I prefer the name max_lp_ids_share_the_cache

Yes, lp is more accurate.

> 
> > +{
> > +    uint32_t num_ids = 0;
> > +
> > +    switch (share_level) {
> > +    case CPU_TOPO_LEVEL_CORE:
> > +        num_ids = 1 << apicid_core_offset(topo_info);
> > +        break;
> > +    case CPU_TOPO_LEVEL_DIE:
> > +        num_ids = 1 << apicid_die_offset(topo_info);
> > +        break;
> > +    case CPU_TOPO_LEVEL_PACKAGE:
> > +        num_ids = 1 << apicid_pkg_offset(topo_info);
> > +        break;
> > +    default:
> > +        /*
> > +         * Currently there is no use case for SMT and MODULE, so use
> > +         * assert directly to facilitate debugging.
> > +         */
> > +        g_assert_not_reached();
> > +    }
> > +
> > +    return num_ids - 1;
> 
> suggest to just return num_ids, and let the caller to do the -1 work.

Emm, the SDM calls the whole "num_ids - 1" (CPUID.04H:EAX[bits 25:14])
the "maximum number of addressable IDs for logical processors sharing
this cache"...

So if this helper just returned the raw "num_ids" under the name
max_lp_ids_share_the_cache, I'm afraid that would be ambiguous.
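
(For reference, a sketch of the consumer side, decoding the L3 subleaf
from the ADL-S dump discussed earlier in the thread; the APIC ID is an
assumed example value:)

#include <stdio.h>

static int get_count_order(unsigned x)       /* ceil(log2(x)) */
{
    int order = 0;
    while ((1u << order) < x) {
        order++;
    }
    return order;
}

int main(void)
{
    unsigned eax = 0xfc1fc163;               /* CPUID.04H subleaf 3 (L3) */
    unsigned apic_id = 0x23;                 /* assumed example APIC ID */

    unsigned max_lp_ids = (eax >> 14) & 0xfff;   /* the "- 1" field value */
    unsigned shift = get_count_order(max_lp_ids + 1);
    printf("field=%u shift=%u share_id=%u\n",
           max_lp_ids, shift, apic_id >> shift); /* 127, 7, 0 */
    return 0;
}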

> 
> > +}
> > +
> > +static uint32_t max_core_ids_in_package(X86CPUTopoInfo *topo_info)
> > +{
> > +    uint32_t num_cores = 1 << (apicid_pkg_offset(topo_info) -
> > +                               apicid_core_offset(topo_info));
> > +    return num_cores - 1;
> 
> ditto.
> 
> > +}
> >   /* Encode cache info for CPUID[4] */
> >   static void encode_cache_cpuid4(CPUCacheInfo *cache,
> > -                                int num_apic_ids, int num_cores,
> > +                                X86CPUTopoInfo *topo_info,
> >                                   uint32_t *eax, uint32_t *ebx,
> >                                   uint32_t *ecx, uint32_t *edx)
> >   {
> >       assert(cache->size == cache->line_size * cache->associativity *
> >                             cache->partitions * cache->sets);
> > -    assert(num_apic_ids > 0);
> >       *eax = CACHE_TYPE(cache->type) |
> >              CACHE_LEVEL(cache->level) |
> >              (cache->self_init ? CACHE_SELF_INIT_LEVEL : 0) |
> > -           ((num_cores - 1) << 26) |
> > -           ((num_apic_ids - 1) << 14);
> > +           (max_core_ids_in_package(topo_info) << 26) |
> > +           (max_processor_ids_for_cache(topo_info, cache->share_level) << 14);
> 
> by the way, we can change the order of the two lines. :)

Yes!

Thanks,
Zhao

> 
> >       assert(cache->line_size > 0);
> >       assert(cache->partitions > 0);
> > @@ -6263,56 +6294,41 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
> >                   int host_vcpus_per_cache = 1 + ((*eax & 0x3FFC000) >> 14);
> >                   if (cores_per_pkg > 1) {
> > -                    int addressable_cores_offset =
> > -                                                apicid_pkg_offset(&topo_info) -
> > -                                                apicid_core_offset(&topo_info);
> > -
> >                       *eax &= ~0xFC000000;
> > -                    *eax |= (1 << (addressable_cores_offset - 1)) << 26;
> > +                    *eax |= max_core_ids_in_package(&topo_info) << 26;
> >                   }
> >                   if (host_vcpus_per_cache > cpus_per_pkg) {
> > -                    int pkg_offset = apicid_pkg_offset(&topo_info);
> > -
> >                       *eax &= ~0x3FFC000;
> > -                    *eax |= (1 << (pkg_offset - 1)) << 14;
> > +                    *eax |=
> > +                        max_processor_ids_for_cache(&topo_info,
> > +                                                CPU_TOPO_LEVEL_PACKAGE) << 14;
> >                   }
> >               }
> >           } else if (cpu->vendor_cpuid_only && IS_AMD_CPU(env)) {
> >               *eax = *ebx = *ecx = *edx = 0;
> >           } else {
> >               *eax = 0;
> > -            int addressable_cores_offset = apicid_pkg_offset(&topo_info) -
> > -                                           apicid_core_offset(&topo_info);
> > -            int core_offset, die_offset;
> >               switch (count) {
> >               case 0: /* L1 dcache info */
> > -                core_offset = apicid_core_offset(&topo_info);
> >                   encode_cache_cpuid4(env->cache_info_cpuid4.l1d_cache,
> > -                                    (1 << core_offset),
> > -                                    (1 << addressable_cores_offset),
> > +                                    &topo_info,
> >                                       eax, ebx, ecx, edx);
> >                   break;
> >               case 1: /* L1 icache info */
> > -                core_offset = apicid_core_offset(&topo_info);
> >                   encode_cache_cpuid4(env->cache_info_cpuid4.l1i_cache,
> > -                                    (1 << core_offset),
> > -                                    (1 << addressable_cores_offset),
> > +                                    &topo_info,
> >                                       eax, ebx, ecx, edx);
> >                   break;
> >               case 2: /* L2 cache info */
> > -                core_offset = apicid_core_offset(&topo_info);
> >                   encode_cache_cpuid4(env->cache_info_cpuid4.l2_cache,
> > -                                    (1 << core_offset),
> > -                                    (1 << addressable_cores_offset),
> > +                                    &topo_info,
> >                                       eax, ebx, ecx, edx);
> >                   break;
> >               case 3: /* L3 cache info */
> > -                die_offset = apicid_die_offset(&topo_info);
> >                   if (cpu->enable_l3_cache) {
> >                       encode_cache_cpuid4(env->cache_info_cpuid4.l3_cache,
> > -                                        (1 << die_offset),
> > -                                        (1 << addressable_cores_offset),
> > +                                        &topo_info,
> >                                           eax, ebx, ecx, edx);
> >                       break;
> >                   }
> 


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v7 15/16] i386: Use offsets get NumSharingCache for CPUID[0x8000001D].EAX[bits 25:14]
  2024-01-14 14:42   ` Xiaoyao Li
@ 2024-01-15  3:48     ` Zhao Liu
  2024-01-15  4:27       ` Xiaoyao Li
  0 siblings, 1 reply; 68+ messages in thread
From: Zhao Liu @ 2024-01-15  3:48 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Eduardo Habkost, Marcel Apfelbaum, Michael S . Tsirkin,
	Richard Henderson, Paolo Bonzini, Marcelo Tosatti, qemu-devel,
	kvm, Zhenyu Wang, Zhuocheng Ding, Zhao Liu, Babu Moger,
	Yongwei Ma

Hi Xiaoyao,

On Sun, Jan 14, 2024 at 10:42:41PM +0800, Xiaoyao Li wrote:
> Date: Sun, 14 Jan 2024 22:42:41 +0800
> From: Xiaoyao Li <xiaoyao.li@intel.com>
> Subject: Re: [PATCH v7 15/16] i386: Use offsets get NumSharingCache for
>  CPUID[0x8000001D].EAX[bits 25:14]
> 
> On 1/8/2024 4:27 PM, Zhao Liu wrote:
> > From: Zhao Liu <zhao1.liu@intel.com>
> > 
> > The commit 8f4202fb1080 ("i386: Populate AMD Processor Cache Information
> > for cpuid 0x8000001D") adds the cache topology for AMD CPU by encoding
> > the number of sharing threads directly.
> > 
> > From AMD's APM, NumSharingCache (CPUID[0x8000001D].EAX[bits 25:14])
> > means [1]:
> > 
> > The number of logical processors sharing this cache is the value of
> > this field incremented by 1. To determine which logical processors are
> > sharing a cache, determine a Share Id for each processor as follows:
> > 
> > ShareId = LocalApicId >> log2(NumSharingCache+1)
> > 
> > Logical processors with the same ShareId then share a cache. If
> > NumSharingCache+1 is not a power of two, round it up to the next power
> > of two.
> > 
> > From the description above, the calculation of this field should be the same
> > as CPUID[4].EAX[bits 25:14] for Intel CPUs. So also use the offsets of
> > APIC ID to calculate this field.
> > 
> > [1]: APM, vol.3, appendix.E.4.15 Function 8000_001Dh--Cache Topology
> >       Information
> 
> this patch can be dropped because we have the next patch.

This patch is mainly meant to explicitly emphasize the change in the
encoding and its compliance with the AMD spec... I haven't tested on an
AMD machine, so the more granular patch makes it easier for the
community to review and test.
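
(The APM formula quoted above, as a runnable sketch; the die offset and
APIC ID are assumed example values:)

#include <stdio.h>

static int log2_ceil(unsigned x)
{
    int n = 0;
    while ((1u << n) < x) {
        n++;
    }
    return n;
}

int main(void)
{
    /* NumSharingCache as now encoded: (1 << apicid_die_offset) - 1,
     * with an assumed die offset of 3 (8 LPs share the L3). */
    unsigned num_sharing_cache = (1u << 3) - 1;
    unsigned local_apic_id = 0x0b;           /* assumed example */

    /* APM: ShareId = LocalApicId >> log2(NumSharingCache + 1),
     * rounding NumSharingCache + 1 up to a power of two first. */
    unsigned share_id = local_apic_id >> log2_ceil(num_sharing_cache + 1);
    printf("share_id=%u\n", share_id);       /* 0x0b >> 3 = 1 */
    return 0;
}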

Thanks,
Zhao

> 
> > Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
> > Reviewed-by: Babu Moger <babu.moger@amd.com>
> > Tested-by: Babu Moger <babu.moger@amd.com>
> > Tested-by: Yongwei Ma <yongwei.ma@intel.com>
> > Acked-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> > Changes since v3:
> >   * Rewrite the subject. (Babu)
> >   * Delete the original "comment/help" expression, as this behavior is
> >     confirmed for AMD CPUs. (Babu)
> >   * Rename "num_apic_ids" (v3) to "num_sharing_cache" to match spec
> >     definition. (Babu)
> > 
> > Changes since v1:
> >   * Rename "l3_threads" to "num_apic_ids" in
> >     encode_cache_cpuid8000001d(). (Yanan)
> >   * Add the description of the original commit and add Cc.
> > ---
> >   target/i386/cpu.c | 10 ++++------
> >   1 file changed, 4 insertions(+), 6 deletions(-)
> > 
> > diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> > index b23e8190dc68..8a4d72f6f760 100644
> > --- a/target/i386/cpu.c
> > +++ b/target/i386/cpu.c
> > @@ -483,7 +483,7 @@ static void encode_cache_cpuid8000001d(CPUCacheInfo *cache,
> >                                          uint32_t *eax, uint32_t *ebx,
> >                                          uint32_t *ecx, uint32_t *edx)
> >   {
> > -    uint32_t l3_threads;
> > +    uint32_t num_sharing_cache;
> >       assert(cache->size == cache->line_size * cache->associativity *
> >                             cache->partitions * cache->sets);
> > @@ -492,13 +492,11 @@ static void encode_cache_cpuid8000001d(CPUCacheInfo *cache,
> >       /* L3 is shared among multiple cores */
> >       if (cache->level == 3) {
> > -        l3_threads = topo_info->modules_per_die *
> > -                     topo_info->cores_per_module *
> > -                     topo_info->threads_per_core;
> > -        *eax |= (l3_threads - 1) << 14;
> > +        num_sharing_cache = 1 << apicid_die_offset(topo_info);
> >       } else {
> > -        *eax |= ((topo_info->threads_per_core - 1) << 14);
> > +        num_sharing_cache = 1 << apicid_core_offset(topo_info);
> >       }
> > +    *eax |= (num_sharing_cache - 1) << 14;
> >       assert(cache->line_size > 0);
> >       assert(cache->partitions > 0);
> 


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v7 02/16] i386/cpu: Use APIC ID offset to encode cache topo in CPUID[4]
  2024-01-11  8:43     ` Zhao Liu
  2024-01-14 14:11       ` Xiaoyao Li
@ 2024-01-15  3:51       ` Xiaoyao Li
  2024-01-15  4:16         ` Zhao Liu
  1 sibling, 1 reply; 68+ messages in thread
From: Xiaoyao Li @ 2024-01-15  3:51 UTC (permalink / raw)
  To: Zhao Liu
  Cc: Eduardo Habkost, Marcel Apfelbaum, Michael S . Tsirkin,
	Richard Henderson, Paolo Bonzini, Marcelo Tosatti, qemu-devel,
	kvm, Zhenyu Wang, Zhuocheng Ding, Zhao Liu, Robert Hoo,
	Babu Moger, Yongwei Ma

On 1/11/2024 4:43 PM, Zhao Liu wrote:
> Hi Xiaoyao,
> 
> On Wed, Jan 10, 2024 at 05:31:28PM +0800, Xiaoyao Li wrote:
>> Date: Wed, 10 Jan 2024 17:31:28 +0800
>> From: Xiaoyao Li <xiaoyao.li@intel.com>
>> Subject: Re: [PATCH v7 02/16] i386/cpu: Use APIC ID offset to encode cache
>>   topo in CPUID[4]
>>
>> On 1/8/2024 4:27 PM, Zhao Liu wrote:
>>> From: Zhao Liu <zhao1.liu@intel.com>
>>>
>>> Refer to the fixes of cache_info_passthrough ([1], [2]) and SDM, the
>>> CPUID.04H:EAX[bits 25:14] and CPUID.04H:EAX[bits 31:26] should use the
>>> nearest power-of-2 integer.
>>>
>>> The nearest power-of-2 integer can be calculated by pow2ceil() or by
>>> using APIC ID offset (like L3 topology using 1 << die_offset [3]).
>>>
>>> But in fact, CPUID.04H:EAX[bits 25:14] and CPUID.04H:EAX[bits 31:26]
>>> are associated with APIC ID. For example, in linux kernel, the field
>>> "num_threads_sharing" (Bits 25 - 14) is parsed with APIC ID.
>>
>> And for
>>> another example, on Alder Lake P, the CPUID.04H:EAX[bits 31:26] is not
>>> matched with actual core numbers and it's calculated by:
>>> "(1 << (pkg_offset - core_offset)) - 1".
>>
>> could you elaborate it more? what is the value of actual core numbers on
>> Alder lake P? and what is the pkg_offset and core_offset?
> 
> For example, the following's the CPUID dump of an ADL-S machine:
> 
> CPUID.04H:
> 
> 0x00000004 0x00: eax=0xfc004121 ebx=0x01c0003f ecx=0x0000003f edx=0x00000000
> 0x00000004 0x01: eax=0xfc004122 ebx=0x01c0003f ecx=0x0000007f edx=0x00000000
> 0x00000004 0x02: eax=0xfc01c143 ebx=0x03c0003f ecx=0x000007ff edx=0x00000000
> 0x00000004 0x03: eax=0xfc1fc163 ebx=0x0240003f ecx=0x00009fff edx=0x00000004
> 0x00000004 0x04: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
> 
> 
> CPUID.1FH:
> 
> 0x0000001f 0x00: eax=0x00000001 ebx=0x00000001 ecx=0x00000100 edx=0x0000004c
> 0x0000001f 0x01: eax=0x00000007 ebx=0x00000014 ecx=0x00000201 edx=0x0000004c
> 0x0000001f 0x02: eax=0x00000000 ebx=0x00000000 ecx=0x00000002 edx=0x0000004c
> 
> The CPUID.04H:EAX[bits 31:26] is 63.
> From CPUID.1FH.00H:EAX[bits 04:00], the core_offset is 1, and from
> CPUID.1FH.01H:EAX[bits 04:00], the pkg_offset is 7.
> 
> Thus we can verify the above equation:
> 
> (1 << (0x7 - 0x1)) - 1 = 63.
> 
> "Maximum number of addressable IDs" refers to the maximum number of IDs
> that can be enumerated in the APIC ID's topology layout, which does not
> necessarily correspond to the actual number of topology domains.
> 

You still haven't said how many cores Alder Lake P has.

I guess the number is far smaller than 64, which doesn't match
(63 + 1).



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v7 09/16] i386: Support module_id in X86CPUTopoIDs
  2024-01-14 12:42   ` Xiaoyao Li
@ 2024-01-15  3:52     ` Zhao Liu
  0 siblings, 0 replies; 68+ messages in thread
From: Zhao Liu @ 2024-01-15  3:52 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Eduardo Habkost, Marcel Apfelbaum, Michael S . Tsirkin,
	Richard Henderson, Paolo Bonzini, Marcelo Tosatti, qemu-devel,
	kvm, Zhenyu Wang, Zhuocheng Ding, Zhao Liu, Babu Moger,
	Yongwei Ma

Hi Xiaoyao,

On Sun, Jan 14, 2024 at 08:42:00PM +0800, Xiaoyao Li wrote:
> Date: Sun, 14 Jan 2024 20:42:00 +0800
> From: Xiaoyao Li <xiaoyao.li@intel.com>
> Subject: Re: [PATCH v7 09/16] i386: Support module_id in X86CPUTopoIDs
> 
> On 1/8/2024 4:27 PM, Zhao Liu wrote:
> > From: Zhuocheng Ding <zhuocheng.ding@intel.com>
> > 
> > Add module_id member in X86CPUTopoIDs.
> > 
> > module_id can be parsed from APIC ID, so also update APIC ID parsing
> > rule to support module level. With this support, the conversions with
> > module level between X86CPUTopoIDs, X86CPUTopoInfo and APIC ID are
> > completed.
> > 
> > module_id can also be generated from the CPU topology, and before i386
> > supports "clusters" in smp, the default "clusters per die" is only 1;
> > thus the module_id generated in this way is 0, so that it will not
> > conflict with the module_id parsed from the APIC ID.
> > 
> > Signed-off-by: Zhuocheng Ding <zhuocheng.ding@intel.com>
> > Co-developed-by: Zhao Liu <zhao1.liu@intel.com>
> > Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
> > Tested-by: Babu Moger <babu.moger@amd.com>
> > Tested-by: Yongwei Ma <yongwei.ma@intel.com>
> > Acked-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> > Changes since v1:
> >   * Merge the patch "i386: Update APIC ID parsing rule to support module
> >     level" into this one. (Yanan)
> >   * Move the apicid_module_width() and apicid_module_offset() support
> >     into the previous modules_per_die related patch. (Yanan)
> > ---
> >   hw/i386/x86.c              | 28 +++++++++++++++++++++-------
> >   include/hw/i386/topology.h | 17 +++++++++++++----
> >   2 files changed, 34 insertions(+), 11 deletions(-)
> > 
> > diff --git a/hw/i386/x86.c b/hw/i386/x86.c
> > index 85b847ac7914..5269aae3a5c2 100644
> > --- a/hw/i386/x86.c
> > +++ b/hw/i386/x86.c
> > @@ -315,11 +315,11 @@ void x86_cpu_pre_plug(HotplugHandler *hotplug_dev,
> >       /*
> >        * If APIC ID is not set,
> > -     * set it based on socket/die/core/thread properties.
> > +     * set it based on socket/die/cluster/core/thread properties.
> >        */
> >       if (cpu->apic_id == UNASSIGNED_APIC_ID) {
> > -        int max_socket = (ms->smp.max_cpus - 1) /
> > -                                smp_threads / smp_cores / ms->smp.dies;
> > +        int max_socket = (ms->smp.max_cpus - 1) / smp_threads / smp_cores /
> > +                                ms->smp.clusters / ms->smp.dies;
> >           /*
> >            * die-id was optional in QEMU 4.0 and older, so keep it optional
> > @@ -366,17 +366,27 @@ void x86_cpu_pre_plug(HotplugHandler *hotplug_dev,
> >           topo_ids.die_id = cpu->die_id;
> >           topo_ids.core_id = cpu->core_id;
> >           topo_ids.smt_id = cpu->thread_id;
> > +
> > +        /*
> > +         * TODO: This is the temporary initialization for topo_ids.module_id to
> > +         * avoid "maybe-uninitialized" compilation errors. Will remove when
> > +         * X86CPU supports cluster_id.
> > +         */
> > +        topo_ids.module_id = 0;
> > 
> 
> if you put patch 10 before this patch, then we don't need this trick.

Then, we need another trick to resolve "cpu->cluster_id = topo_ids.module_id;" 
in patch 10. ;-)

Thanks,
Zhao

> 
> >           cpu->apic_id = x86_apicid_from_topo_ids(&topo_info, &topo_ids);
> >       }
> >       cpu_slot = x86_find_cpu_slot(MACHINE(x86ms), cpu->apic_id, &idx);
> >       if (!cpu_slot) {
> >           x86_topo_ids_from_apicid(cpu->apic_id, &topo_info, &topo_ids);
> > +
> >           error_setg(errp,
> > -            "Invalid CPU [socket: %u, die: %u, core: %u, thread: %u] with"
> > -            " APIC ID %" PRIu32 ", valid index range 0:%d",
> > -            topo_ids.pkg_id, topo_ids.die_id, topo_ids.core_id, topo_ids.smt_id,
> > -            cpu->apic_id, ms->possible_cpus->len - 1);
> > +            "Invalid CPU [socket: %u, die: %u, module: %u, core: %u, thread: %u]"
> > +            " with APIC ID %" PRIu32 ", valid index range 0:%d",
> > +            topo_ids.pkg_id, topo_ids.die_id, topo_ids.module_id,
> > +            topo_ids.core_id, topo_ids.smt_id, cpu->apic_id,
> > +            ms->possible_cpus->len - 1);
> >           return;
> >       }
> > @@ -502,6 +512,10 @@ const CPUArchIdList *x86_possible_cpu_arch_ids(MachineState *ms)
> >               ms->possible_cpus->cpus[i].props.has_die_id = true;
> >               ms->possible_cpus->cpus[i].props.die_id = topo_ids.die_id;
> >           }
> > +        if (ms->smp.clusters > 1) {
> > +            ms->possible_cpus->cpus[i].props.has_cluster_id = true;
> > +            ms->possible_cpus->cpus[i].props.cluster_id = topo_ids.module_id;
> > +        }
> >           ms->possible_cpus->cpus[i].props.has_core_id = true;
> >           ms->possible_cpus->cpus[i].props.core_id = topo_ids.core_id;
> >           ms->possible_cpus->cpus[i].props.has_thread_id = true;
> > diff --git a/include/hw/i386/topology.h b/include/hw/i386/topology.h
> > index 517e51768c13..ed1f3d6c1d5e 100644
> > --- a/include/hw/i386/topology.h
> > +++ b/include/hw/i386/topology.h
> > @@ -50,6 +50,7 @@ typedef uint32_t apic_id_t;
> >   typedef struct X86CPUTopoIDs {
> >       unsigned pkg_id;
> >       unsigned die_id;
> > +    unsigned module_id;
> >       unsigned core_id;
> >       unsigned smt_id;
> >   } X86CPUTopoIDs;
> > @@ -127,6 +128,7 @@ static inline apic_id_t x86_apicid_from_topo_ids(X86CPUTopoInfo *topo_info,
> >   {
> >       return (topo_ids->pkg_id  << apicid_pkg_offset(topo_info)) |
> >              (topo_ids->die_id  << apicid_die_offset(topo_info)) |
> > +           (topo_ids->module_id << apicid_module_offset(topo_info)) |
> >              (topo_ids->core_id << apicid_core_offset(topo_info)) |
> >              topo_ids->smt_id;
> >   }
> > @@ -140,12 +142,16 @@ static inline void x86_topo_ids_from_idx(X86CPUTopoInfo *topo_info,
> >                                            X86CPUTopoIDs *topo_ids)
> >   {
> >       unsigned nr_dies = topo_info->dies_per_pkg;
> > -    unsigned nr_cores = topo_info->cores_per_module *
> > -                        topo_info->modules_per_die;
> > +    unsigned nr_modules = topo_info->modules_per_die;
> > +    unsigned nr_cores = topo_info->cores_per_module;
> >       unsigned nr_threads = topo_info->threads_per_core;
> > -    topo_ids->pkg_id = cpu_index / (nr_dies * nr_cores * nr_threads);
> > -    topo_ids->die_id = cpu_index / (nr_cores * nr_threads) % nr_dies;
> > +    topo_ids->pkg_id = cpu_index / (nr_dies * nr_modules *
> > +                       nr_cores * nr_threads);
> > +    topo_ids->die_id = cpu_index / (nr_modules * nr_cores *
> > +                       nr_threads) % nr_dies;
> > +    topo_ids->module_id = cpu_index / (nr_cores * nr_threads) %
> > +                          nr_modules;
> >       topo_ids->core_id = cpu_index / nr_threads % nr_cores;
> >       topo_ids->smt_id = cpu_index % nr_threads;
> >   }
> > @@ -163,6 +169,9 @@ static inline void x86_topo_ids_from_apicid(apic_id_t apicid,
> >       topo_ids->core_id =
> >               (apicid >> apicid_core_offset(topo_info)) &
> >               ~(0xFFFFFFFFUL << apicid_core_width(topo_info));
> > +    topo_ids->module_id =
> > +            (apicid >> apicid_module_offset(topo_info)) &
> > +            ~(0xFFFFFFFFUL << apicid_module_width(topo_info));
> >       topo_ids->die_id =
> >               (apicid >> apicid_die_offset(topo_info)) &
> >               ~(0xFFFFFFFFUL << apicid_die_width(topo_info));
> 


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v7 08/16] i386: Expose module level in CPUID[0x1F]
  2024-01-15  3:25   ` Yuan Yao
@ 2024-01-15  4:09     ` Zhao Liu
  2024-01-15  4:34       ` Xiaoyao Li
  0 siblings, 1 reply; 68+ messages in thread
From: Zhao Liu @ 2024-01-15  4:09 UTC (permalink / raw)
  To: Yuan Yao
  Cc: Eduardo Habkost, Marcel Apfelbaum, Michael S . Tsirkin,
	Richard Henderson, Paolo Bonzini, Marcelo Tosatti, qemu-devel,
	kvm, Zhenyu Wang, Zhuocheng Ding, Zhao Liu, Babu Moger,
	Yongwei Ma

Hi Yuan,

On Mon, Jan 15, 2024 at 11:25:24AM +0800, Yuan Yao wrote:
> Date: Mon, 15 Jan 2024 11:25:24 +0800
> From: Yuan Yao <yuan.yao@linux.intel.com>
> Subject: Re: [PATCH v7 08/16] i386: Expose module level in CPUID[0x1F]
> 
> On Mon, Jan 08, 2024 at 04:27:19PM +0800, Zhao Liu wrote:
> > From: Zhao Liu <zhao1.liu@intel.com>
> >
> > The Linux kernel (from v6.4, with commit edc0a2b595765 ("x86/topology: Fix
> > erroneous smp_num_siblings on Intel Hybrid platforms")) is able to
> > handle platforms with the Module level enumerated via CPUID.1F.
> >
> > Expose the module level in CPUID[0x1F] if the machine has more than 1
> > module.
> >
> > (Tested CPU topology in CPUID[0x1F] leaf with various die/cluster
> > configurations in "-smp".)
> >
> > Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
> > Tested-by: Babu Moger <babu.moger@amd.com>
> > Tested-by: Yongwei Ma <yongwei.ma@intel.com>
> > Acked-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> > Changes since v3:
> >  * New patch to expose module level in 0x1F.
> >  * Add Tested-by tag from Yongwei.
> > ---
> >  target/i386/cpu.c     | 12 +++++++++++-
> >  target/i386/cpu.h     |  2 ++
> >  target/i386/kvm/kvm.c |  2 +-
> >  3 files changed, 14 insertions(+), 2 deletions(-)
> >
> > diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> > index 294ca6b8947a..a2d39d2198b6 100644
> > --- a/target/i386/cpu.c
> > +++ b/target/i386/cpu.c
> > @@ -277,6 +277,8 @@ static uint32_t num_cpus_by_topo_level(X86CPUTopoInfo *topo_info,
> >          return 1;
> >      case CPU_TOPO_LEVEL_CORE:
> >          return topo_info->threads_per_core;
> > +    case CPU_TOPO_LEVEL_MODULE:
> > +        return topo_info->threads_per_core * topo_info->cores_per_module;
> >      case CPU_TOPO_LEVEL_DIE:
> >          return topo_info->threads_per_core * topo_info->cores_per_module *
> >                 topo_info->modules_per_die;
> > @@ -297,6 +299,8 @@ static uint32_t apicid_offset_by_topo_level(X86CPUTopoInfo *topo_info,
> >          return 0;
> >      case CPU_TOPO_LEVEL_CORE:
> >          return apicid_core_offset(topo_info);
> > +    case CPU_TOPO_LEVEL_MODULE:
> > +        return apicid_module_offset(topo_info);
> >      case CPU_TOPO_LEVEL_DIE:
> >          return apicid_die_offset(topo_info);
> >      case CPU_TOPO_LEVEL_PACKAGE:
> > @@ -316,6 +320,8 @@ static uint32_t cpuid1f_topo_type(enum CPUTopoLevel topo_level)
> >          return CPUID_1F_ECX_TOPO_LEVEL_SMT;
> >      case CPU_TOPO_LEVEL_CORE:
> >          return CPUID_1F_ECX_TOPO_LEVEL_CORE;
> > +    case CPU_TOPO_LEVEL_MODULE:
> > +        return CPUID_1F_ECX_TOPO_LEVEL_MODULE;
> >      case CPU_TOPO_LEVEL_DIE:
> >          return CPUID_1F_ECX_TOPO_LEVEL_DIE;
> >      default:
> > @@ -347,6 +353,10 @@ static void encode_topo_cpuid1f(CPUX86State *env, uint32_t count,
> >          if (env->nr_dies > 1) {
> >              set_bit(CPU_TOPO_LEVEL_DIE, topo_bitmap);
> >          }
> > +
> > +        if (env->nr_modules > 1) {
> > +            set_bit(CPU_TOPO_LEVEL_MODULE, topo_bitmap);
> > +        }
> >      }
> >
> >      *ecx = count & 0xff;
> > @@ -6394,7 +6404,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
> >          break;
> >      case 0x1F:
> >          /* V2 Extended Topology Enumeration Leaf */
> > -        if (topo_info.dies_per_pkg < 2) {
> > +        if (topo_info.modules_per_die < 2 && topo_info.dies_per_pkg < 2) {
> 
> A question:
> Is the original checking necessary ?
> The 0x1f exists even on cpu w/o modules/dies topology on bare metal, I tried
> on EMR:
> 
> // leaf 0
> 0x00000000 0x00: eax=0x00000020 ebx=0x756e6547 ecx=0x6c65746e edx=0x49656e69
> 
> // leaf 0x1f
> 0x0000001f 0x00: eax=0x00000001 ebx=0x00000002 ecx=0x00000100 edx=0x00000004
> 0x0000001f 0x01: eax=0x00000007 ebx=0x00000080 ecx=0x00000201 edx=0x00000004
> 0x0000001f 0x02: eax=0x00000000 ebx=0x00000000 ecx=0x00000002 edx=0x00000004
> 
> // leaf 0xb
> 0x0000000b 0x00: eax=0x00000001 ebx=0x00000002 ecx=0x00000100 edx=0x00000004
> 0x0000000b 0x01: eax=0x00000007 ebx=0x00000080 ecx=0x00000201 edx=0x00000004
> 0x0000000b 0x02: eax=0x00000000 ebx=0x00000000 ecx=0x00000002 edx=0x00000004

The 0x1f leaf was introduced for CascadeLake-AP with the die level. And
yes, the newer machines all have this leaf.

> 
> So here leads to different cpu behavior from bare metal, even in case
> of "-cpu host".
> 
> In SDM Vol2, cpudid instruction section:
> 
> " CPUID leaf 1FH is a preferred superset to leaf 0BH. Intel
> recommends using leaf 1FH when available rather than leaf
> 0BH and ensuring that any leaf 0BH algorithms are updated to
> support leaf 1FH. "
> 
> My understanding: if 0x1f is existed (leaf 0.eax >= 0x1f)
> then it should have same values in lp/core level as 0xb.

Yes, I think it's time to move to exposing 0x1f by default.

The compatibility issue can be solved by a cpuid-0x1f option similar to
cpuid-0xb. I'll cook a patch after this patch series.

Thanks,
Zhao

> 
> >              *eax = *ebx = *ecx = *edx = 0;
> >              break;
> >          }
> > diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> > index eecd30bde92b..97b290e10576 100644
> > --- a/target/i386/cpu.h
> > +++ b/target/i386/cpu.h
> > @@ -1018,6 +1018,7 @@ enum CPUTopoLevel {
> >      CPU_TOPO_LEVEL_INVALID,
> >      CPU_TOPO_LEVEL_SMT,
> >      CPU_TOPO_LEVEL_CORE,
> > +    CPU_TOPO_LEVEL_MODULE,
> >      CPU_TOPO_LEVEL_DIE,
> >      CPU_TOPO_LEVEL_PACKAGE,
> >      CPU_TOPO_LEVEL_MAX,
> > @@ -1032,6 +1033,7 @@ enum CPUTopoLevel {
> >  #define CPUID_1F_ECX_TOPO_LEVEL_INVALID  CPUID_B_ECX_TOPO_LEVEL_INVALID
> >  #define CPUID_1F_ECX_TOPO_LEVEL_SMT      CPUID_B_ECX_TOPO_LEVEL_SMT
> >  #define CPUID_1F_ECX_TOPO_LEVEL_CORE     CPUID_B_ECX_TOPO_LEVEL_CORE
> > +#define CPUID_1F_ECX_TOPO_LEVEL_MODULE   3
> >  #define CPUID_1F_ECX_TOPO_LEVEL_DIE      5
> >
> >  /* MSR Feature Bits */
> > diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> > index 4ce80555b45c..e5ddb214cb36 100644
> > --- a/target/i386/kvm/kvm.c
> > +++ b/target/i386/kvm/kvm.c
> > @@ -1913,7 +1913,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
> >              break;
> >          }
> >          case 0x1f:
> > -            if (env->nr_dies < 2) {
> > +            if (env->nr_modules < 2 && env->nr_dies < 2) {
> >                  break;
> >              }
> >              /* fallthrough */
> > --
> > 2.34.1
> >
> >


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v7 02/16] i386/cpu: Use APIC ID offset to encode cache topo in CPUID[4]
  2024-01-15  3:51       ` Xiaoyao Li
@ 2024-01-15  4:16         ` Zhao Liu
  0 siblings, 0 replies; 68+ messages in thread
From: Zhao Liu @ 2024-01-15  4:16 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Eduardo Habkost, Marcel Apfelbaum, Michael S . Tsirkin,
	Richard Henderson, Paolo Bonzini, Marcelo Tosatti, qemu-devel,
	kvm, Zhenyu Wang, Zhuocheng Ding, Zhao Liu, Robert Hoo,
	Babu Moger, Yongwei Ma

Hi Xiaoyao,

On Mon, Jan 15, 2024 at 11:51:05AM +0800, Xiaoyao Li wrote:
> Date: Mon, 15 Jan 2024 11:51:05 +0800
> From: Xiaoyao Li <xiaoyao.li@intel.com>
> Subject: Re: [PATCH v7 02/16] i386/cpu: Use APIC ID offset to encode cache
>  topo in CPUID[4]
> 
> On 1/11/2024 4:43 PM, Zhao Liu wrote:
> > Hi Xiaoyao,
> > 
> > On Wed, Jan 10, 2024 at 05:31:28PM +0800, Xiaoyao Li wrote:
> > > Date: Wed, 10 Jan 2024 17:31:28 +0800
> > > From: Xiaoyao Li <xiaoyao.li@intel.com>
> > > Subject: Re: [PATCH v7 02/16] i386/cpu: Use APIC ID offset to encode cache
> > >   topo in CPUID[4]
> > > 
> > > On 1/8/2024 4:27 PM, Zhao Liu wrote:
> > > > From: Zhao Liu <zhao1.liu@intel.com>
> > > > 
> > > > Refer to the fixes of cache_info_passthrough ([1], [2]) and SDM, the
> > > > CPUID.04H:EAX[bits 25:14] and CPUID.04H:EAX[bits 31:26] should use the
> > > > nearest power-of-2 integer.
> > > > 
> > > > The nearest power-of-2 integer can be calculated by pow2ceil() or by
> > > > using APIC ID offset (like L3 topology using 1 << die_offset [3]).
> > > > 
> > > > But in fact, CPUID.04H:EAX[bits 25:14] and CPUID.04H:EAX[bits 31:26]
> > > > are associated with APIC ID. For example, in linux kernel, the field
> > > > "num_threads_sharing" (Bits 25 - 14) is parsed with APIC ID.
> > > 
> > > And for
> > > > another example, on Alder Lake P, the CPUID.04H:EAX[bits 31:26] is not
> > > > matched with actual core numbers and it's calculated by:
> > > > "(1 << (pkg_offset - core_offset)) - 1".
> > > 
> > > could you elaborate it more? what is the value of actual core numbers on
> > > Alder lake P? and what is the pkg_offset and core_offset?
> > 
> > For example, the following's the CPUID dump of an ADL-S machine:
> > 
> > CPUID.04H:
> > 
> > 0x00000004 0x00: eax=0xfc004121 ebx=0x01c0003f ecx=0x0000003f edx=0x00000000
> > 0x00000004 0x01: eax=0xfc004122 ebx=0x01c0003f ecx=0x0000007f edx=0x00000000
> > 0x00000004 0x02: eax=0xfc01c143 ebx=0x03c0003f ecx=0x000007ff edx=0x00000000
> > 0x00000004 0x03: eax=0xfc1fc163 ebx=0x0240003f ecx=0x00009fff edx=0x00000004
> > 0x00000004 0x04: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
> > 
> > 
> > CPUID.1FH:
> > 
> > 0x0000001f 0x00: eax=0x00000001 ebx=0x00000001 ecx=0x00000100 edx=0x0000004c
> > 0x0000001f 0x01: eax=0x00000007 ebx=0x00000014 ecx=0x00000201 edx=0x0000004c
> > 0x0000001f 0x02: eax=0x00000000 ebx=0x00000000 ecx=0x00000002 edx=0x0000004c
> > 
> > The CPUID.04H:EAX[bits 31:26] is 63.
> > From CPUID.1FH.00H:EAX[bits 04:00], the core_offset is 1, and from
> > CPUID.1FH.01H:EAX[bits 04:00], the pkg_offset is 7.
> > 
> > Thus we can verify the above equation:
> > 
> > (1 << (0x7 - 0x1)) - 1 = 63.
> > 
> > "Maximum number of addressable IDs" refers to the maximum number of IDs
> > that can be enumerated in the APIC ID's topology layout, which does not
> > necessarily correspond to the actual number of topology domains.
> > 
> 
> You still haven't said how many cores Alder Lake P has.
> 
> I guess the number is far smaller than 64, which doesn't match
> (63 + 1).
> 

There are 8 P-cores (with 2 threads per P-core) + 4 E-cores (with 1
thread per E-core) on this machine (ADL-S).

Thus this field only shows the theoretical size of the ID space and does
not reflect the actual core count.
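
(Putting the numbers side by side, a minimal sketch with the values from
that machine's dump:)

#include <stdio.h>

int main(void)
{
    /* The ADL-S machine above: 8 P-cores (2 threads each) + 4 E-cores */
    int actual_cores = 8 + 4;                     /* 12 */
    int core_offset = 0x1, pkg_offset = 0x7;      /* from its CPUID.1FH */

    int addressable_core_ids = 1 << (pkg_offset - core_offset);
    printf("actual cores: %d, addressable core IDs: %d\n",
           actual_cores, addressable_core_ids);   /* 12 vs 64 */
    return 0;
}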

Thanks,
Zhao



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v7 10/16] i386/cpu: Introduce cluster-id to X86CPU
  2024-01-15  3:27     ` Zhao Liu
@ 2024-01-15  4:18       ` Xiaoyao Li
  2024-01-15  5:59         ` Zhao Liu
  0 siblings, 1 reply; 68+ messages in thread
From: Xiaoyao Li @ 2024-01-15  4:18 UTC (permalink / raw)
  To: Zhao Liu
  Cc: Eduardo Habkost, Marcel Apfelbaum, Michael S . Tsirkin,
	Richard Henderson, Paolo Bonzini, Marcelo Tosatti, qemu-devel,
	kvm, Zhenyu Wang, Zhuocheng Ding, Zhao Liu, Babu Moger,
	Yongwei Ma

On 1/15/2024 11:27 AM, Zhao Liu wrote:
> On Sun, Jan 14, 2024 at 09:49:18PM +0800, Xiaoyao Li wrote:
>> Date: Sun, 14 Jan 2024 21:49:18 +0800
>> From: Xiaoyao Li <xiaoyao.li@intel.com>
>> Subject: Re: [PATCH v7 10/16] i386/cpu: Introduce cluster-id to X86CPU
>>
>> On 1/8/2024 4:27 PM, Zhao Liu wrote:
>>> From: Zhuocheng Ding <zhuocheng.ding@intel.com>
>>>
>>> Introduce cluster-id rather than module-id to be consistent with
>>> CpuInstanceProperties.cluster-id; this avoids confusion over
>>> parameter names when hotplugging.
>>
>> I don't think reusing 'cluster' from arm for x86's 'module' is a good idea.
>> It introduces confusion around the code.
> 
> There is a precedent: generic "socket" v.s. i386 "package".

It's not the same thing. "socket" vs "package" is just software people
and hardware people choosing different names for the same thing. It's
merely a naming difference.

However, here it's a name-reuse issue, since 'cluster' is already
defined for x86. It does introduce confusion.

> The direct definition of cluster is the level that is above the "core"
> and shares the hardware resources including L2. In this sense, arm's
> cluster is the same as x86's module.

Then what if Intel implements a tile level in the future? Why should
ARM's 'cluster' be mapped to 'module', but not 'tile'?

reusing 'cluster' for 'module' is just a bad idea.

> Though different arches have different naming styles, but QEMU's generic
> code still need the uniform topology hierarchy.

Generic code can provide as many topology levels as it can; each arch
can choose to use the ones it supports.

e.g.,

in qapi/machine.json, it says,

# The ordering from highest/coarsest to lowest/finest is:
# @drawers, @books, @sockets, @dies, @clusters, @cores, @threads.
#
# Different architectures support different subsets of topology
# containers.
#
# For example, s390x does not have clusters and dies, and the socket
# is the parent container of cores.

we can update it to

# The ordering from highest/coarsest to lowest/finest is:
# @drawers, @books, @sockets, @dies, @clusters, @modules, @cores,
# @threads.
#
# Different architectures support different subsets of topology
# containers.
#
# For example, s390x does not have clusters and dies, and the socket
# is the parent container of cores.
#
# For example, x86 does not have drawers and books, and does not support
# clusters.

Even if x86's own cluster is supported someday in the future, we can
remove the ordering requirement from the above description.

>>
>> s390 just added 'drawer' and 'book' in cpu topology[1]. I think we can also
>> add a module level for x86 instead of reusing cluster.
>>
>> (This is also what I want to reply to the cover letter.)
>>
>> [1] https://lore.kernel.org/qemu-devel/20231016183925.2384704-1-nsg@linux.ibm.com/
> 
> These two new levels have the clear topological hierarchy relationship
> and don't duplicate existing ones.
> 
> "book" or "drawer" may correspond to intel's "cluster".
> 
> Maybe, in the future, we could support arch-specific alias topologies
> in -smp.

I don't think we need an alias; reusing 'cluster' for 'module' doesn't 
gain any benefit except avoiding a new field in SMPConfiguration. All 
the other cluster code is ARM-specific and x86 cannot share it.

I don't think it's a problem to add 'module' to SMPConfiguration.

> Thanks,
> Zhao
> 
>>
>>> Following the legacy smp check rules, also add the cluster_id validity
>>> into x86_cpu_pre_plug().
>>>
>>> Signed-off-by: Zhuocheng Ding <zhuocheng.ding@intel.com>
>>> Co-developed-by: Zhao Liu <zhao1.liu@intel.com>
>>> Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
>>> Tested-by: Babu Moger <babu.moger@amd.com>
>>> Tested-by: Yongwei Ma <yongwei.ma@intel.com>
>>> Acked-by: Michael S. Tsirkin <mst@redhat.com>
>>> ---
>>> Changes since v6:
>>>    * Update the comment when check cluster-id. Since there's no
>>>      v8.2, the cluster-id support should at least start from v9.0.
>>>
>>> Changes since v5:
>>>    * Update the comment when check cluster-id. Since current QEMU is
>>>      v8.2, the cluster-id support should at least start from v8.3.
>>>
>>> Changes since v3:
>>>    * Use the imperative in the commit message. (Babu)
>>> ---
>>>    hw/i386/x86.c     | 33 +++++++++++++++++++++++++--------
>>>    target/i386/cpu.c |  2 ++
>>>    target/i386/cpu.h |  1 +
>>>    3 files changed, 28 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/hw/i386/x86.c b/hw/i386/x86.c
>>> index 5269aae3a5c2..1c1d368614ee 100644
>>> --- a/hw/i386/x86.c
>>> +++ b/hw/i386/x86.c
>>> @@ -329,6 +329,14 @@ void x86_cpu_pre_plug(HotplugHandler *hotplug_dev,
>>>                cpu->die_id = 0;
>>>            }
>>> +        /*
>>> +         * cluster-id was optional in QEMU 9.0 and older, so keep it optional
>>> +         * if there's only one cluster per die.
>>> +         */
>>> +        if (cpu->cluster_id < 0 && ms->smp.clusters == 1) {
>>> +            cpu->cluster_id = 0;
>>> +        }
>>> +
>>>            if (cpu->socket_id < 0) {
>>>                error_setg(errp, "CPU socket-id is not set");
>>>                return;
>>> @@ -345,6 +353,14 @@ void x86_cpu_pre_plug(HotplugHandler *hotplug_dev,
>>>                           cpu->die_id, ms->smp.dies - 1);
>>>                return;
>>>            }
>>> +        if (cpu->cluster_id < 0) {
>>> +            error_setg(errp, "CPU cluster-id is not set");
>>> +            return;
>>> +        } else if (cpu->cluster_id > ms->smp.clusters - 1) {
>>> +            error_setg(errp, "Invalid CPU cluster-id: %u must be in range 0:%u",
>>> +                       cpu->cluster_id, ms->smp.clusters - 1);
>>> +            return;
>>> +        }
>>>            if (cpu->core_id < 0) {
>>>                error_setg(errp, "CPU core-id is not set");
>>>                return;
>>> @@ -364,16 +380,9 @@ void x86_cpu_pre_plug(HotplugHandler *hotplug_dev,
>>>            topo_ids.pkg_id = cpu->socket_id;
>>>            topo_ids.die_id = cpu->die_id;
>>> +        topo_ids.module_id = cpu->cluster_id;
>>>            topo_ids.core_id = cpu->core_id;
>>>            topo_ids.smt_id = cpu->thread_id;
>>> -
>>> -        /*
>>> -         * TODO: This is the temporary initialization for topo_ids.module_id to
>>> -         * avoid "maybe-uninitialized" compilation errors. Will remove when
>>> -         * X86CPU supports cluster_id.
>>> -         */
>>> -        topo_ids.module_id = 0;
>>> -
>>>            cpu->apic_id = x86_apicid_from_topo_ids(&topo_info, &topo_ids);
>>>        }
>>> @@ -418,6 +427,14 @@ void x86_cpu_pre_plug(HotplugHandler *hotplug_dev,
>>>        }
>>>        cpu->die_id = topo_ids.die_id;
>>> +    if (cpu->cluster_id != -1 && cpu->cluster_id != topo_ids.module_id) {
>>> +        error_setg(errp, "property cluster-id: %u doesn't match set apic-id:"
>>> +            " 0x%x (cluster-id: %u)", cpu->cluster_id, cpu->apic_id,
>>> +            topo_ids.module_id);
>>> +        return;
>>> +    }
>>> +    cpu->cluster_id = topo_ids.module_id;
>>> +
>>>        if (cpu->core_id != -1 && cpu->core_id != topo_ids.core_id) {
>>>            error_setg(errp, "property core-id: %u doesn't match set apic-id:"
>>>                " 0x%x (core-id: %u)", cpu->core_id, cpu->apic_id,
>>> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
>>> index a2d39d2198b6..498a4be62b40 100644
>>> --- a/target/i386/cpu.c
>>> +++ b/target/i386/cpu.c
>>> @@ -7909,12 +7909,14 @@ static Property x86_cpu_properties[] = {
>>>        DEFINE_PROP_UINT32("apic-id", X86CPU, apic_id, 0),
>>>        DEFINE_PROP_INT32("thread-id", X86CPU, thread_id, 0),
>>>        DEFINE_PROP_INT32("core-id", X86CPU, core_id, 0),
>>> +    DEFINE_PROP_INT32("cluster-id", X86CPU, cluster_id, 0),
>>>        DEFINE_PROP_INT32("die-id", X86CPU, die_id, 0),
>>>        DEFINE_PROP_INT32("socket-id", X86CPU, socket_id, 0),
>>>    #else
>>>        DEFINE_PROP_UINT32("apic-id", X86CPU, apic_id, UNASSIGNED_APIC_ID),
>>>        DEFINE_PROP_INT32("thread-id", X86CPU, thread_id, -1),
>>>        DEFINE_PROP_INT32("core-id", X86CPU, core_id, -1),
>>> +    DEFINE_PROP_INT32("cluster-id", X86CPU, cluster_id, -1),
>>>        DEFINE_PROP_INT32("die-id", X86CPU, die_id, -1),
>>>        DEFINE_PROP_INT32("socket-id", X86CPU, socket_id, -1),
>>>    #endif
>>> diff --git a/target/i386/cpu.h b/target/i386/cpu.h
>>> index 97b290e10576..009950b87203 100644
>>> --- a/target/i386/cpu.h
>>> +++ b/target/i386/cpu.h
>>> @@ -2057,6 +2057,7 @@ struct ArchCPU {
>>>        int32_t node_id; /* NUMA node this CPU belongs to */
>>>        int32_t socket_id;
>>>        int32_t die_id;
>>> +    int32_t cluster_id;
>>>        int32_t core_id;
>>>        int32_t thread_id;
>>
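
For illustration, with this patch applied, a hotplug flow exercising the 
new property might look as follows (the CPU model and IDs here are 
hypothetical, not taken from the series):

    -smp 2,sockets=1,dies=1,clusters=2,cores=2,threads=1,maxcpus=4
    (qemu) device_add qemu64-x86_64-cpu,socket-id=0,die-id=0,cluster-id=1,core-id=1,thread-id=0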




* Re: [PATCH v7 14/16] i386: Use CPUCacheInfo.share_level to encode CPUID[4]
  2024-01-15  3:40     ` Zhao Liu
@ 2024-01-15  4:25       ` Xiaoyao Li
  2024-01-15  6:25         ` Zhao Liu
  0 siblings, 1 reply; 68+ messages in thread
From: Xiaoyao Li @ 2024-01-15  4:25 UTC (permalink / raw)
  To: Zhao Liu
  Cc: Eduardo Habkost, Marcel Apfelbaum, Michael S . Tsirkin,
	Richard Henderson, Paolo Bonzini, Marcelo Tosatti, qemu-devel,
	kvm, Zhenyu Wang, Zhuocheng Ding, Zhao Liu, Babu Moger,
	Yongwei Ma

On 1/15/2024 11:40 AM, Zhao Liu wrote:
>>> +{
>>> +    uint32_t num_ids = 0;
>>> +
>>> +    switch (share_level) {
>>> +    case CPU_TOPO_LEVEL_CORE:
>>> +        num_ids = 1 << apicid_core_offset(topo_info);
>>> +        break;
>>> +    case CPU_TOPO_LEVEL_DIE:
>>> +        num_ids = 1 << apicid_die_offset(topo_info);
>>> +        break;
>>> +    case CPU_TOPO_LEVEL_PACKAGE:
>>> +        num_ids = 1 << apicid_pkg_offset(topo_info);
>>> +        break;
>>> +    default:
>>> +        /*
>>> +         * Currently there is no use case for SMT and MODULE, so use
>>> +         * assert directly to facilitate debugging.
>>> +         */
>>> +        g_assert_not_reached();
>>> +    }
>>> +
>>> +    return num_ids - 1;
>> suggest to just return num_ids, and let the caller do the -1 work.
> Emm, SDM calls the whole "num_ids - 1" (CPUID.0x4.EAX[bits 14-25]) as
> "maximum number of addressable IDs for logical processors sharing this
> cache"...
> 
> So if this helper's "num_ids" is just named max_lp_ids_share_the_cache,
> I'm not sure there would be any ambiguity here?

I don't think it will.

If this function is going to be used anywhere else, people will need to 
remember to do the +1 to get the actual number.

Leave the -1 trick to where the CPUID value gets encoded; let's make 
this function generic.
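
As a concrete sketch of that suggestion (illustrative only, reusing the 
topology helpers already shown in this series):

    /* Return the raw count of addressable IDs; no "- 1" inside. */
    static uint32_t num_lps_sharing_cache(X86CPUTopoInfo *topo_info,
                                          enum CPUTopoLevel share_level)
    {
        switch (share_level) {
        case CPU_TOPO_LEVEL_CORE:
            return 1 << apicid_core_offset(topo_info);
        case CPU_TOPO_LEVEL_DIE:
            return 1 << apicid_die_offset(topo_info);
        case CPU_TOPO_LEVEL_PACKAGE:
            return 1 << apicid_pkg_offset(topo_info);
        default:
            g_assert_not_reached();
        }
    }

    /* ...and only the CPUID encoder applies the spec's "- 1": */
    *eax |= (num_lps_sharing_cache(topo_info, cache->share_level) - 1) << 14;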



* Re: [PATCH v7 15/16] i386: Use offsets get NumSharingCache for CPUID[0x8000001D].EAX[bits 25:14]
  2024-01-15  3:48     ` Zhao Liu
@ 2024-01-15  4:27       ` Xiaoyao Li
  2024-01-15 14:54         ` Zhao Liu
  0 siblings, 1 reply; 68+ messages in thread
From: Xiaoyao Li @ 2024-01-15  4:27 UTC (permalink / raw)
  To: Zhao Liu
  Cc: Eduardo Habkost, Marcel Apfelbaum, Michael S . Tsirkin,
	Richard Henderson, Paolo Bonzini, Marcelo Tosatti, qemu-devel,
	kvm, Zhenyu Wang, Zhuocheng Ding, Zhao Liu, Babu Moger,
	Yongwei Ma

On 1/15/2024 11:48 AM, Zhao Liu wrote:
> Hi Xiaoyao,
> 
> On Sun, Jan 14, 2024 at 10:42:41PM +0800, Xiaoyao Li wrote:
>> Date: Sun, 14 Jan 2024 22:42:41 +0800
>> From: Xiaoyao Li <xiaoyao.li@intel.com>
>> Subject: Re: [PATCH v7 15/16] i386: Use offsets get NumSharingCache for
>>   CPUID[0x8000001D].EAX[bits 25:14]
>>
>> On 1/8/2024 4:27 PM, Zhao Liu wrote:
>>> From: Zhao Liu <zhao1.liu@intel.com>
>>>
>>> The commit 8f4202fb1080 ("i386: Populate AMD Processor Cache Information
>>> for cpuid 0x8000001D") adds the cache topology for AMD CPU by encoding
>>> the number of sharing threads directly.
>>>
>>>   From AMD's APM, NumSharingCache (CPUID[0x8000001D].EAX[bits 25:14])
>>> means [1]:
>>>
>>> The number of logical processors sharing this cache is the value of
>>> this field incremented by 1. To determine which logical processors are
>>> sharing a cache, determine a Share Id for each processor as follows:
>>>
>>> ShareId = LocalApicId >> log2(NumSharingCache+1)
>>>
>>> Logical processors with the same ShareId then share a cache. If
>>> NumSharingCache+1 is not a power of two, round it up to the next power
>>> of two.
>>>
>>>   From the description above, the calculation of this field should be same
>>> as CPUID[4].EAX[bits 25:14] for Intel CPUs. So also use the offsets of
>>> APIC ID to calculate this field.
>>>
>>> [1]: APM, vol.3, appendix.E.4.15 Function 8000_001Dh--Cache Topology
>>>        Information
>>
>> this patch can be dropped because we have next patch.
> 
> This patch is mainly meant to explicitly emphasize the change in the
> encoding method and its compliance with the AMD spec... I haven't
> tested on an AMD machine, so the more granular patch makes it easier
> for the community to review and test.

then please move this patch ahead, e.g., after patch 2.
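
For reference, the Share Id derivation quoted from the APM above can be 
sketched like this (illustrative user-space C, not QEMU code; assumes a 
GCC-style __builtin_clz):

    #include <stdint.h>

    /* num_sharing is the raw CPUID[0x8000001D].EAX[bits 25:14] value,
     * i.e. the number of logical processors sharing the cache, minus 1. */
    static uint32_t share_id(uint32_t local_apic_id, uint32_t num_sharing)
    {
        /* ceil(log2(num_sharing + 1)): non-powers-of-two round up */
        int shift = num_sharing ? 32 - __builtin_clz(num_sharing) : 0;
        return local_apic_id >> shift;
    }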

> Thanks,
> Zhao
> 
>>
>>> Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
>>> Reviewed-by: Babu Moger <babu.moger@amd.com>
>>> Tested-by: Babu Moger <babu.moger@amd.com>
>>> Tested-by: Yongwei Ma <yongwei.ma@intel.com>
>>> Acked-by: Michael S. Tsirkin <mst@redhat.com>
>>> ---
>>> Changes since v3:
>>>    * Rewrite the subject. (Babu)
>>>    * Delete the original "comment/help" expression, as this behavior is
>>>      confirmed for AMD CPUs. (Babu)
>>>    * Rename "num_apic_ids" (v3) to "num_sharing_cache" to match spec
>>>      definition. (Babu)
>>>
>>> Changes since v1:
>>>    * Rename "l3_threads" to "num_apic_ids" in
>>>      encode_cache_cpuid8000001d(). (Yanan)
>>>    * Add the description of the original commit and add Cc.
>>> ---
>>>    target/i386/cpu.c | 10 ++++------
>>>    1 file changed, 4 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
>>> index b23e8190dc68..8a4d72f6f760 100644
>>> --- a/target/i386/cpu.c
>>> +++ b/target/i386/cpu.c
>>> @@ -483,7 +483,7 @@ static void encode_cache_cpuid8000001d(CPUCacheInfo *cache,
>>>                                           uint32_t *eax, uint32_t *ebx,
>>>                                           uint32_t *ecx, uint32_t *edx)
>>>    {
>>> -    uint32_t l3_threads;
>>> +    uint32_t num_sharing_cache;
>>>        assert(cache->size == cache->line_size * cache->associativity *
>>>                              cache->partitions * cache->sets);
>>> @@ -492,13 +492,11 @@ static void encode_cache_cpuid8000001d(CPUCacheInfo *cache,
>>>        /* L3 is shared among multiple cores */
>>>        if (cache->level == 3) {
>>> -        l3_threads = topo_info->modules_per_die *
>>> -                     topo_info->cores_per_module *
>>> -                     topo_info->threads_per_core;
>>> -        *eax |= (l3_threads - 1) << 14;
>>> +        num_sharing_cache = 1 << apicid_die_offset(topo_info);
>>>        } else {
>>> -        *eax |= ((topo_info->threads_per_core - 1) << 14);
>>> +        num_sharing_cache = 1 << apicid_core_offset(topo_info);
>>>        }
>>> +    *eax |= (num_sharing_cache - 1) << 14;
>>>        assert(cache->line_size > 0);
>>>        assert(cache->partitions > 0);
>>




* Re: [PATCH v7 08/16] i386: Expose module level in CPUID[0x1F]
  2024-01-15  4:09     ` Zhao Liu
@ 2024-01-15  4:34       ` Xiaoyao Li
  2024-01-15  5:20         ` Yuan Yao
  2024-01-15  6:12         ` Zhao Liu
  0 siblings, 2 replies; 68+ messages in thread
From: Xiaoyao Li @ 2024-01-15  4:34 UTC (permalink / raw)
  To: Zhao Liu, Yuan Yao
  Cc: Eduardo Habkost, Marcel Apfelbaum, Michael S . Tsirkin,
	Richard Henderson, Paolo Bonzini, Marcelo Tosatti, qemu-devel,
	kvm, Zhenyu Wang, Zhuocheng Ding, Zhao Liu, Babu Moger,
	Yongwei Ma

On 1/15/2024 12:09 PM, Zhao Liu wrote:
> Hi Yuan,
> 
> On Mon, Jan 15, 2024 at 11:25:24AM +0800, Yuan Yao wrote:
>> Date: Mon, 15 Jan 2024 11:25:24 +0800
>> From: Yuan Yao <yuan.yao@linux.intel.com>
>> Subject: Re: [PATCH v7 08/16] i386: Expose module level in CPUID[0x1F]
>>
>> On Mon, Jan 08, 2024 at 04:27:19PM +0800, Zhao Liu wrote:
>>> From: Zhao Liu <zhao1.liu@intel.com>
>>>
>>> Linux kernel (from v6.4, with commit edc0a2b595765 ("x86/topology: Fix
>>> erroneous smp_num_siblings on Intel Hybrid platforms") is able to
>>> handle platforms with Module level enumerated via CPUID.1F.
>>>
>>> Expose the module level in CPUID[0x1F] if the machine has more than 1
>>> modules.
>>>
>>> (Tested CPU topology in CPUID[0x1F] leaf with various die/cluster
>>> configurations in "-smp".)
>>>
>>> Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
>>> Tested-by: Babu Moger <babu.moger@amd.com>
>>> Tested-by: Yongwei Ma <yongwei.ma@intel.com>
>>> Acked-by: Michael S. Tsirkin <mst@redhat.com>
>>> ---
>>> Changes since v3:
>>>   * New patch to expose module level in 0x1F.
>>>   * Add Tested-by tag from Yongwei.
>>> ---
>>>   target/i386/cpu.c     | 12 +++++++++++-
>>>   target/i386/cpu.h     |  2 ++
>>>   target/i386/kvm/kvm.c |  2 +-
>>>   3 files changed, 14 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
>>> index 294ca6b8947a..a2d39d2198b6 100644
>>> --- a/target/i386/cpu.c
>>> +++ b/target/i386/cpu.c
>>> @@ -277,6 +277,8 @@ static uint32_t num_cpus_by_topo_level(X86CPUTopoInfo *topo_info,
>>>           return 1;
>>>       case CPU_TOPO_LEVEL_CORE:
>>>           return topo_info->threads_per_core;
>>> +    case CPU_TOPO_LEVEL_MODULE:
>>> +        return topo_info->threads_per_core * topo_info->cores_per_module;
>>>       case CPU_TOPO_LEVEL_DIE:
>>>           return topo_info->threads_per_core * topo_info->cores_per_module *
>>>                  topo_info->modules_per_die;
>>> @@ -297,6 +299,8 @@ static uint32_t apicid_offset_by_topo_level(X86CPUTopoInfo *topo_info,
>>>           return 0;
>>>       case CPU_TOPO_LEVEL_CORE:
>>>           return apicid_core_offset(topo_info);
>>> +    case CPU_TOPO_LEVEL_MODULE:
>>> +        return apicid_module_offset(topo_info);
>>>       case CPU_TOPO_LEVEL_DIE:
>>>           return apicid_die_offset(topo_info);
>>>       case CPU_TOPO_LEVEL_PACKAGE:
>>> @@ -316,6 +320,8 @@ static uint32_t cpuid1f_topo_type(enum CPUTopoLevel topo_level)
>>>           return CPUID_1F_ECX_TOPO_LEVEL_SMT;
>>>       case CPU_TOPO_LEVEL_CORE:
>>>           return CPUID_1F_ECX_TOPO_LEVEL_CORE;
>>> +    case CPU_TOPO_LEVEL_MODULE:
>>> +        return CPUID_1F_ECX_TOPO_LEVEL_MODULE;
>>>       case CPU_TOPO_LEVEL_DIE:
>>>           return CPUID_1F_ECX_TOPO_LEVEL_DIE;
>>>       default:
>>> @@ -347,6 +353,10 @@ static void encode_topo_cpuid1f(CPUX86State *env, uint32_t count,
>>>           if (env->nr_dies > 1) {
>>>               set_bit(CPU_TOPO_LEVEL_DIE, topo_bitmap);
>>>           }
>>> +
>>> +        if (env->nr_modules > 1) {
>>> +            set_bit(CPU_TOPO_LEVEL_MODULE, topo_bitmap);
>>> +        }
>>>       }
>>>
>>>       *ecx = count & 0xff;
>>> @@ -6394,7 +6404,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
>>>           break;
>>>       case 0x1F:
>>>           /* V2 Extended Topology Enumeration Leaf */
>>> -        if (topo_info.dies_per_pkg < 2) {
>>> +        if (topo_info.modules_per_die < 2 && topo_info.dies_per_pkg < 2) {
>>
>> A question:
>> Is the original checking necessary ?
>> The 0x1f exists even on cpu w/o modules/dies topology on bare metal, I tried
>> on EMR:
>>
>> // leaf 0
>> 0x00000000 0x00: eax=0x00000020 ebx=0x756e6547 ecx=0x6c65746e edx=0x49656e69
>>
>> // leaf 0x1f
>> 0x0000001f 0x00: eax=0x00000001 ebx=0x00000002 ecx=0x00000100 edx=0x00000004
>> 0x0000001f 0x01: eax=0x00000007 ebx=0x00000080 ecx=0x00000201 edx=0x00000004
>> 0x0000001f 0x02: eax=0x00000000 ebx=0x00000000 ecx=0x00000002 edx=0x00000004
>>
>> // leaf 0xb
>> 0x0000000b 0x00: eax=0x00000001 ebx=0x00000002 ecx=0x00000100 edx=0x00000004
>> 0x0000000b 0x01: eax=0x00000007 ebx=0x00000080 ecx=0x00000201 edx=0x00000004
>> 0x0000000b 0x02: eax=0x00000000 ebx=0x00000000 ecx=0x00000002 edx=0x00000004
> 
> The 0x1f leaf was introduced for CascadeLake-AP, with its die level. And
> yes, the newer machines all have this leaf.
> 
>>
>> So here leads to different cpu behavior from bare metal, even in case
>> of "-cpu host".
>>
>> In SDM Vol2, CPUID instruction section:
>>
>> " CPUID leaf 1FH is a preferred superset to leaf 0BH. Intel
>> recommends using leaf 1FH when available rather than leaf
>> 0BH and ensuring that any leaf 0BH algorithms are updated to
>> support leaf 1FH. "
>>
>> My understanding: if 0x1f exists (leaf 0.eax >= 0x1f), then it
>> should have the same values at the lp/core levels as 0xb.

No. Leaf 0x1f reports the same values at the lp/core levels as leaf 0xb 
only when the machine supports just these two levels. If the machine 
supports more levels, they will be different.

e.g., the data on one Alder lake:

0x0000000b 0x00: eax=0x00000001 ebx=0x00000001 ecx=0x00000100 edx=0x00000006
0x0000000b 0x01: eax=0x00000007 ebx=0x00000004 ecx=0x00000201 edx=0x00000006
0x0000000b 0x02: eax=0x00000000 ebx=0x00000000 ecx=0x00000002 edx=0x00000006

0x0000001f 0x00: eax=0x00000001 ebx=0x00000001 ecx=0x00000100 edx=0x00000006
0x0000001f 0x01: eax=0x00000003 ebx=0x00000004 ecx=0x00000201 edx=0x00000006
0x0000001f 0x02: eax=0x00000007 ebx=0x00000004 ecx=0x00000302 edx=0x00000006
0x0000001f 0x03: eax=0x00000000 ebx=0x00000000 ecx=0x00000003 edx=0x00000006
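
For anyone decoding these by hand, a minimal user-space walk of leaf 
0x1f looks roughly like this (a sketch assuming GCC's <cpuid.h>; level 
types per the SDM: 1=SMT, 2=Core, 3=Module, 4=Tile, 5=Die):

    #include <cpuid.h>
    #include <stdio.h>

    int main(void)
    {
        unsigned eax, ebx, ecx, edx, n;

        for (n = 0; ; n++) {
            __cpuid_count(0x1f, n, eax, ebx, ecx, edx);
            unsigned type = (ecx >> 8) & 0xff;  /* level type of subleaf n */
            if (type == 0) {                    /* type 0 ends the list */
                break;
            }
            printf("subleaf %u: type=%u shift=%u nr_lps=%u\n",
                   n, type,
                   eax & 0x1f,     /* APIC-ID shift to the next level */
                   ebx & 0xffff);  /* logical processors at this level */
        }
        return 0;
    }

In the Alder Lake dump above, subleaf 0x02 reports type 3 (module) in 
leaf 0x1f, which leaf 0xb has no way to express.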


> Yes, I think it's time to move to default 0x1f.

we don't need to do so until it's necessary.

> The compatibility issue can be solved by a cpuid-0x1f option similar to
> cpuid-0xb. I'll cook a patch after this patch series.
> 
> Thanks,
> Zhao
> 
>>
>>>               *eax = *ebx = *ecx = *edx = 0;
>>>               break;
>>>           }
>>> diff --git a/target/i386/cpu.h b/target/i386/cpu.h
>>> index eecd30bde92b..97b290e10576 100644
>>> --- a/target/i386/cpu.h
>>> +++ b/target/i386/cpu.h
>>> @@ -1018,6 +1018,7 @@ enum CPUTopoLevel {
>>>       CPU_TOPO_LEVEL_INVALID,
>>>       CPU_TOPO_LEVEL_SMT,
>>>       CPU_TOPO_LEVEL_CORE,
>>> +    CPU_TOPO_LEVEL_MODULE,
>>>       CPU_TOPO_LEVEL_DIE,
>>>       CPU_TOPO_LEVEL_PACKAGE,
>>>       CPU_TOPO_LEVEL_MAX,
>>> @@ -1032,6 +1033,7 @@ enum CPUTopoLevel {
>>>   #define CPUID_1F_ECX_TOPO_LEVEL_INVALID  CPUID_B_ECX_TOPO_LEVEL_INVALID
>>>   #define CPUID_1F_ECX_TOPO_LEVEL_SMT      CPUID_B_ECX_TOPO_LEVEL_SMT
>>>   #define CPUID_1F_ECX_TOPO_LEVEL_CORE     CPUID_B_ECX_TOPO_LEVEL_CORE
>>> +#define CPUID_1F_ECX_TOPO_LEVEL_MODULE   3
>>>   #define CPUID_1F_ECX_TOPO_LEVEL_DIE      5
>>>
>>>   /* MSR Feature Bits */
>>> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
>>> index 4ce80555b45c..e5ddb214cb36 100644
>>> --- a/target/i386/kvm/kvm.c
>>> +++ b/target/i386/kvm/kvm.c
>>> @@ -1913,7 +1913,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
>>>               break;
>>>           }
>>>           case 0x1f:
>>> -            if (env->nr_dies < 2) {
>>> +            if (env->nr_modules < 2 && env->nr_dies < 2) {
>>>                   break;
>>>               }
>>>               /* fallthrough */
>>> --
>>> 2.34.1
>>>
>>>
> 




* Re: [PATCH v7 08/16] i386: Expose module level in CPUID[0x1F]
  2024-01-15  4:34       ` Xiaoyao Li
@ 2024-01-15  5:20         ` Yuan Yao
  2024-01-15  6:20           ` Zhao Liu
  2024-01-15  6:12         ` Zhao Liu
  1 sibling, 1 reply; 68+ messages in thread
From: Yuan Yao @ 2024-01-15  5:20 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Zhao Liu, Eduardo Habkost, Marcel Apfelbaum, Michael S . Tsirkin,
	Richard Henderson, Paolo Bonzini, Marcelo Tosatti, qemu-devel,
	kvm, Zhenyu Wang, Zhuocheng Ding, Zhao Liu, Babu Moger,
	Yongwei Ma

On Mon, Jan 15, 2024 at 12:34:12PM +0800, Xiaoyao Li wrote:
> On 1/15/2024 12:09 PM, Zhao Liu wrote:
> > Hi Yuan,
> >
> > On Mon, Jan 15, 2024 at 11:25:24AM +0800, Yuan Yao wrote:
> > > Date: Mon, 15 Jan 2024 11:25:24 +0800
> > > From: Yuan Yao <yuan.yao@linux.intel.com>
> > > Subject: Re: [PATCH v7 08/16] i386: Expose module level in CPUID[0x1F]
> > >
> > > On Mon, Jan 08, 2024 at 04:27:19PM +0800, Zhao Liu wrote:
> > > > From: Zhao Liu <zhao1.liu@intel.com>
> > > >
> > > > Linux kernel (from v6.4, with commit edc0a2b595765 ("x86/topology: Fix
> > > > erroneous smp_num_siblings on Intel Hybrid platforms") is able to
> > > > handle platforms with Module level enumerated via CPUID.1F.
> > > >
> > > > Expose the module level in CPUID[0x1F] if the machine has more than 1
> > > > modules.
> > > >
> > > > (Tested CPU topology in CPUID[0x1F] leaf with various die/cluster
> > > > configurations in "-smp".)
> > > >
> > > > Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
> > > > Tested-by: Babu Moger <babu.moger@amd.com>
> > > > Tested-by: Yongwei Ma <yongwei.ma@intel.com>
> > > > Acked-by: Michael S. Tsirkin <mst@redhat.com>
> > > > ---
> > > > Changes since v3:
> > > >   * New patch to expose module level in 0x1F.
> > > >   * Add Tested-by tag from Yongwei.
> > > > ---
> > > >   target/i386/cpu.c     | 12 +++++++++++-
> > > >   target/i386/cpu.h     |  2 ++
> > > >   target/i386/kvm/kvm.c |  2 +-
> > > >   3 files changed, 14 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> > > > index 294ca6b8947a..a2d39d2198b6 100644
> > > > --- a/target/i386/cpu.c
> > > > +++ b/target/i386/cpu.c
> > > > @@ -277,6 +277,8 @@ static uint32_t num_cpus_by_topo_level(X86CPUTopoInfo *topo_info,
> > > >           return 1;
> > > >       case CPU_TOPO_LEVEL_CORE:
> > > >           return topo_info->threads_per_core;
> > > > +    case CPU_TOPO_LEVEL_MODULE:
> > > > +        return topo_info->threads_per_core * topo_info->cores_per_module;
> > > >       case CPU_TOPO_LEVEL_DIE:
> > > >           return topo_info->threads_per_core * topo_info->cores_per_module *
> > > >                  topo_info->modules_per_die;
> > > > @@ -297,6 +299,8 @@ static uint32_t apicid_offset_by_topo_level(X86CPUTopoInfo *topo_info,
> > > >           return 0;
> > > >       case CPU_TOPO_LEVEL_CORE:
> > > >           return apicid_core_offset(topo_info);
> > > > +    case CPU_TOPO_LEVEL_MODULE:
> > > > +        return apicid_module_offset(topo_info);
> > > >       case CPU_TOPO_LEVEL_DIE:
> > > >           return apicid_die_offset(topo_info);
> > > >       case CPU_TOPO_LEVEL_PACKAGE:
> > > > @@ -316,6 +320,8 @@ static uint32_t cpuid1f_topo_type(enum CPUTopoLevel topo_level)
> > > >           return CPUID_1F_ECX_TOPO_LEVEL_SMT;
> > > >       case CPU_TOPO_LEVEL_CORE:
> > > >           return CPUID_1F_ECX_TOPO_LEVEL_CORE;
> > > > +    case CPU_TOPO_LEVEL_MODULE:
> > > > +        return CPUID_1F_ECX_TOPO_LEVEL_MODULE;
> > > >       case CPU_TOPO_LEVEL_DIE:
> > > >           return CPUID_1F_ECX_TOPO_LEVEL_DIE;
> > > >       default:
> > > > @@ -347,6 +353,10 @@ static void encode_topo_cpuid1f(CPUX86State *env, uint32_t count,
> > > >           if (env->nr_dies > 1) {
> > > >               set_bit(CPU_TOPO_LEVEL_DIE, topo_bitmap);
> > > >           }
> > > > +
> > > > +        if (env->nr_modules > 1) {
> > > > +            set_bit(CPU_TOPO_LEVEL_MODULE, topo_bitmap);
> > > > +        }
> > > >       }
> > > >
> > > >       *ecx = count & 0xff;
> > > > @@ -6394,7 +6404,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
> > > >           break;
> > > >       case 0x1F:
> > > >           /* V2 Extended Topology Enumeration Leaf */
> > > > -        if (topo_info.dies_per_pkg < 2) {
> > > > +        if (topo_info.modules_per_die < 2 && topo_info.dies_per_pkg < 2) {
> > >
> > > A question:
> > > Is the original checking necessary ?
> > > The 0x1f exists even on cpu w/o modules/dies topology on bare metal, I tried
> > > on EMR:
> > >
> > > // leaf 0
> > > 0x00000000 0x00: eax=0x00000020 ebx=0x756e6547 ecx=0x6c65746e edx=0x49656e69
> > >
> > > // leaf 0x1f
> > > 0x0000001f 0x00: eax=0x00000001 ebx=0x00000002 ecx=0x00000100 edx=0x00000004
> > > 0x0000001f 0x01: eax=0x00000007 ebx=0x00000080 ecx=0x00000201 edx=0x00000004
> > > 0x0000001f 0x02: eax=0x00000000 ebx=0x00000000 ecx=0x00000002 edx=0x00000004
> > >
> > > // leaf 0xb
> > > 0x0000000b 0x00: eax=0x00000001 ebx=0x00000002 ecx=0x00000100 edx=0x00000004
> > > 0x0000000b 0x01: eax=0x00000007 ebx=0x00000080 ecx=0x00000201 edx=0x00000004
> > > 0x0000000b 0x02: eax=0x00000000 ebx=0x00000000 ecx=0x00000002 edx=0x00000004
> >
> > The 0x1f leaf was introduced for CascadeLake-AP, with its die level. And
> > yes, the newer machines all have this leaf.
> >
> > >
> > > So here leads to different cpu behavior from bare metal, even in case
> > > of "-cpu host".
> > >
> > > In SDM Vol2, CPUID instruction section:
> > >
> > > " CPUID leaf 1FH is a preferred superset to leaf 0BH. Intel
> > > recommends using leaf 1FH when available rather than leaf
> > > 0BH and ensuring that any leaf 0BH algorithms are updated to
> > > support leaf 1FH. "
> > >
> > > My understanding: if 0x1f is existed (leaf 0.eax >= 0x1f)
> > > then it should have same values in lp/core level as 0xb.
>
> No. Leaf 0x1f reports the same values at the lp/core levels as leaf 0xb
> only when the machine supports just these two levels. If the machine
> supports more levels, they will be different.
>
> e.g., the data on one Alder lake:
>
> 0x0000000b 0x00: eax=0x00000001 ebx=0x00000001 ecx=0x00000100 edx=0x00000006
> 0x0000000b 0x01: eax=0x00000007 ebx=0x00000004 ecx=0x00000201 edx=0x00000006
> 0x0000000b 0x02: eax=0x00000000 ebx=0x00000000 ecx=0x00000002 edx=0x00000006
>
> 0x0000001f 0x00: eax=0x00000001 ebx=0x00000001 ecx=0x00000100 edx=0x00000006
> 0x0000001f 0x01: eax=0x00000003 ebx=0x00000004 ecx=0x00000201 edx=0x00000006
> 0x0000001f 0x02: eax=0x00000007 ebx=0x00000004 ecx=0x00000302 edx=0x00000006
> 0x0000001f 0x03: eax=0x00000000 ebx=0x00000000 ecx=0x00000003 edx=0x00000006

Ah, so my understanding is incorrect on this.

I tried on one Raptor Lake i5-1335U, which is also a hybrid SoC but 
doesn't have a module level; in this case 0x1f and 0xb have the same 
values at the core/lp levels.

>
>
> > Yes, I think it's time to move to default 0x1f.
>
> we don't need to do so until it's necessary.
>
> > The compatibility issue can be solved by a cpuid-0x1f option similar to
> > cpuid-0xb. I'll cook a patch after this patch series.
> >
> > Thanks,
> > Zhao
> >
> > >
> > > >               *eax = *ebx = *ecx = *edx = 0;
> > > >               break;
> > > >           }
> > > > diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> > > > index eecd30bde92b..97b290e10576 100644
> > > > --- a/target/i386/cpu.h
> > > > +++ b/target/i386/cpu.h
> > > > @@ -1018,6 +1018,7 @@ enum CPUTopoLevel {
> > > >       CPU_TOPO_LEVEL_INVALID,
> > > >       CPU_TOPO_LEVEL_SMT,
> > > >       CPU_TOPO_LEVEL_CORE,
> > > > +    CPU_TOPO_LEVEL_MODULE,
> > > >       CPU_TOPO_LEVEL_DIE,
> > > >       CPU_TOPO_LEVEL_PACKAGE,
> > > >       CPU_TOPO_LEVEL_MAX,
> > > > @@ -1032,6 +1033,7 @@ enum CPUTopoLevel {
> > > >   #define CPUID_1F_ECX_TOPO_LEVEL_INVALID  CPUID_B_ECX_TOPO_LEVEL_INVALID
> > > >   #define CPUID_1F_ECX_TOPO_LEVEL_SMT      CPUID_B_ECX_TOPO_LEVEL_SMT
> > > >   #define CPUID_1F_ECX_TOPO_LEVEL_CORE     CPUID_B_ECX_TOPO_LEVEL_CORE
> > > > +#define CPUID_1F_ECX_TOPO_LEVEL_MODULE   3
> > > >   #define CPUID_1F_ECX_TOPO_LEVEL_DIE      5
> > > >
> > > >   /* MSR Feature Bits */
> > > > diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> > > > index 4ce80555b45c..e5ddb214cb36 100644
> > > > --- a/target/i386/kvm/kvm.c
> > > > +++ b/target/i386/kvm/kvm.c
> > > > @@ -1913,7 +1913,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
> > > >               break;
> > > >           }
> > > >           case 0x1f:
> > > > -            if (env->nr_dies < 2) {
> > > > +            if (env->nr_modules < 2 && env->nr_dies < 2) {
> > > >                   break;
> > > >               }
> > > >               /* fallthrough */
> > > > --
> > > > 2.34.1
> > > >
> > > >
> >
>



* Re: [PATCH v7 10/16] i386/cpu: Introduce cluster-id to X86CPU
  2024-01-15  4:18       ` Xiaoyao Li
@ 2024-01-15  5:59         ` Zhao Liu
  2024-01-15  7:45           ` Xiaoyao Li
  0 siblings, 1 reply; 68+ messages in thread
From: Zhao Liu @ 2024-01-15  5:59 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Eduardo Habkost, Marcel Apfelbaum, Michael S . Tsirkin,
	Richard Henderson, Paolo Bonzini, Marcelo Tosatti, qemu-devel,
	kvm, Zhenyu Wang, Zhuocheng Ding, Zhao Liu, Babu Moger,
	Yongwei Ma, Philippe Mathieu-Daudé, Yanan Wang

(Also cc "machine core" maintainers.)

Hi Xiaoyao,

On Mon, Jan 15, 2024 at 12:18:17PM +0800, Xiaoyao Li wrote:
> Date: Mon, 15 Jan 2024 12:18:17 +0800
> From: Xiaoyao Li <xiaoyao.li@intel.com>
> Subject: Re: [PATCH v7 10/16] i386/cpu: Introduce cluster-id to X86CPU
> 
> On 1/15/2024 11:27 AM, Zhao Liu wrote:
> > On Sun, Jan 14, 2024 at 09:49:18PM +0800, Xiaoyao Li wrote:
> > > Date: Sun, 14 Jan 2024 21:49:18 +0800
> > > From: Xiaoyao Li <xiaoyao.li@intel.com>
> > > Subject: Re: [PATCH v7 10/16] i386/cpu: Introduce cluster-id to X86CPU
> > > 
> > > On 1/8/2024 4:27 PM, Zhao Liu wrote:
> > > > From: Zhuocheng Ding <zhuocheng.ding@intel.com>
> > > > 
> > > > Introduce cluster-id other than module-id to be consistent with
> > > > CpuInstanceProperties.cluster-id, and this avoids the confusion
> > > > of parameter names when hotplugging.
> > > 
> > > I don't think reusing 'cluster' from arm for x86's 'module' is a good idea.
> > > It introduces confusion around the code.
> > 
> > There is a precedent: generic "socket" v.s. i386 "package".
> 
> It's not the same thing. "socket" vs. "package" is just software people
> and hardware people choosing different names for the same thing. It's
> purely a naming difference.

No, it's a similar issue. Same physical device, different name only.

Furthermore, the topology was introduced for resource layout and silicon
fabrication, and similar design ideas and fabrication processes are fairly
consistent across common current arches. Therefore, it is possible to
abstract similar topological hierarchies for different arches.

> 
> Here, however, it's a name-reuse issue: 'cluster' is already defined
> for x86. It does introduce confusion.

There's nothing fundamentally different between the x86 module and the
generic cluster, is there? This is the reason that I don't agree with
introducing "modules" in -smp.

> 
> > The direct definition of cluster is the level that is above the "core"
> > and shares the hardware resources including L2. In this sense, arm's
> > cluster is the same as x86's module.
> 
> Then what if Intel implements a tile level in the future? Why should
> ARM's 'cluster' be mapped to 'module', but not 'tile'?

This depends on the actual need.

Module (for x86) and cluster (in general) are similar, and tile (for x86)
is used for L3 in practice, so I use module rather than tile to map
generic clusters.

And, it should be noted that x86 module is mapped to the generic cluster,
not to ARM's. It's just that currently only ARM is using the clusters
option in -smp.

I believe QEMU provides the abstract and unified topology hierarchies in
-smp, not the arch-specific hierarchies.

> 
> reusing 'cluster' for 'module' is just a bad idea.
> 
> > Though different arches have different naming styles, but QEMU's generic
> > code still need the uniform topology hierarchy.
> 
> Generic code can provide as many topology levels as it can; each arch
> can choose to use the ones it supports.
> 
> e.g.,
> 
> in qapi/machine.json, it says,
> 
> # The ordering from highest/coarsest to lowest/finest is:
> # @drawers, @books, @sockets, @dies, @clusters, @cores, @threads.

This ordering is well-defined...

> #
> # Different architectures support different subsets of topology
> # containers.
> #
> # For example, s390x does not have clusters and dies, and the socket
> # is the parent container of cores.
> 
> we can update it to
> 
> # The ordering from highest/coarsest to lowest/finest is:
> # @drawers, @books, @sockets, @dies, @clusters, @modules, @cores,
> # @threads.

...but here it's impossible to figure out why cluster is above module; 
even I can't come up with a difference between cluster and module.

> #
> # Different architectures support different subsets of topology
> # containers.
> #
> # For example, s390x does not have clusters and dies, and the socket
> # is the parent container of cores.
> #
> # For example, x86 does not have drawers and books, and does not support
> # clusters.
> 
> Even if x86's cluster level is supported someday in the future, we can
> remove the ordering requirement from the above description.

x86's cluster is above the package.

To reserve this name for x86, we can't have the well-defined topology
ordering.

But topology ordering is necessary in generic code, and many
calculations depend on the topology ordering.

> 
> > > 
> > > s390 just added 'drawer' and 'book' in cpu topology[1]. I think we can also
> > > add a module level for x86 instead of reusing cluster.
> > > 
> > > (This is also what I want to reply to the cover letter.)
> > > 
> > > [1] https://lore.kernel.org/qemu-devel/20231016183925.2384704-1-nsg@linux.ibm.com/
> > 
> > These two new levels have the clear topological hierarchy relationship
> > and don't duplicate existing ones.
> > 
> > "book" or "drawer" may correspond to intel's "cluster".
> > 
> > Maybe, in the future, we could support for arch-specific alias topologies
> > in -smp.
> 
> I don't think we need an alias; reusing 'cluster' for 'module' doesn't
> gain any benefit except avoiding a new field in SMPConfiguration. All
> the other cluster code is ARM-specific and x86 cannot share it.

The point is that there is no difference between the Intel module and 
the general cluster... Considering only the naming issue, even AMD has 
the "complex" corresponding to Intel's "module".

> 
> I don't think it's a problem to add 'module' to SMPConfiguration.

Adding an option is simple; however, it is not conducive to QEMU's 
topology maintenance. Reusing the existing generic structure should be 
the first consideration, except when a new level is fundamentally 
different.

Thanks,
Zhao




* Re: [PATCH v7 08/16] i386: Expose module level in CPUID[0x1F]
  2024-01-15  6:12         ` Zhao Liu
@ 2024-01-15  6:11           ` Xiaoyao Li
  2024-01-15  6:35             ` Zhao Liu
  0 siblings, 1 reply; 68+ messages in thread
From: Xiaoyao Li @ 2024-01-15  6:11 UTC (permalink / raw)
  To: Zhao Liu
  Cc: Yuan Yao, Eduardo Habkost, Marcel Apfelbaum, Michael S . Tsirkin,
	Richard Henderson, Paolo Bonzini, Marcelo Tosatti, qemu-devel,
	kvm, Zhenyu Wang, Zhuocheng Ding, Zhao Liu, Babu Moger,
	Yongwei Ma

On 1/15/2024 2:12 PM, Zhao Liu wrote:
> Hi Xiaoyao,
> 
> On Mon, Jan 15, 2024 at 12:34:12PM +0800, Xiaoyao Li wrote:
>> Date: Mon, 15 Jan 2024 12:34:12 +0800
>> From: Xiaoyao Li <xiaoyao.li@intel.com>
>> Subject: Re: [PATCH v7 08/16] i386: Expose module level in CPUID[0x1F]
>>
>>> Yes, I think it's time to move to default 0x1f.
>>
>> we don't need to do so until it's necessary.
> 
> Recent and future machines all support 0x1f, and at least SDM has
> emphasized the preferred use of 0x1f.

The preference is a guideline for software, e.g., the OS. QEMU doesn't 
need to emulate CPUID leaf 0x1f for the guest if there are only the SMT 
and core levels, because in that case leaf 0xb and leaf 0x1f are exactly 
the same. We don't need to bother advertising the duplicate data.

> Thanks,
> Zhao
> 




* Re: [PATCH v7 08/16] i386: Expose module level in CPUID[0x1F]
  2024-01-15  4:34       ` Xiaoyao Li
  2024-01-15  5:20         ` Yuan Yao
@ 2024-01-15  6:12         ` Zhao Liu
  2024-01-15  6:11           ` Xiaoyao Li
  1 sibling, 1 reply; 68+ messages in thread
From: Zhao Liu @ 2024-01-15  6:12 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Yuan Yao, Eduardo Habkost, Marcel Apfelbaum, Michael S . Tsirkin,
	Richard Henderson, Paolo Bonzini, Marcelo Tosatti, qemu-devel,
	kvm, Zhenyu Wang, Zhuocheng Ding, Zhao Liu, Babu Moger,
	Yongwei Ma

Hi Xiaoyao,

On Mon, Jan 15, 2024 at 12:34:12PM +0800, Xiaoyao Li wrote:
> Date: Mon, 15 Jan 2024 12:34:12 +0800
> From: Xiaoyao Li <xiaoyao.li@intel.com>
> Subject: Re: [PATCH v7 08/16] i386: Expose module level in CPUID[0x1F]
> 
> > Yes, I think it's time to move to default 0x1f.
> 
> we don't need to do so until it's necessary.

Recent and future machines all support 0x1f, and at least SDM has
emphasized the preferred use of 0x1f.

Thanks,
Zhao




* Re: [PATCH v7 08/16] i386: Expose module level in CPUID[0x1F]
  2024-01-15  5:20         ` Yuan Yao
@ 2024-01-15  6:20           ` Zhao Liu
  2024-01-15  6:57             ` Yuan Yao
  0 siblings, 1 reply; 68+ messages in thread
From: Zhao Liu @ 2024-01-15  6:20 UTC (permalink / raw)
  To: Yuan Yao
  Cc: Xiaoyao Li, Eduardo Habkost, Marcel Apfelbaum,
	Michael S . Tsirkin, Richard Henderson, Paolo Bonzini,
	Marcelo Tosatti, qemu-devel, kvm, Zhenyu Wang, Zhuocheng Ding,
	Zhao Liu, Babu Moger, Yongwei Ma

On Mon, Jan 15, 2024 at 01:20:22PM +0800, Yuan Yao wrote:
> Date: Mon, 15 Jan 2024 13:20:22 +0800
> From: Yuan Yao <yuan.yao@linux.intel.com>
> Subject: Re: [PATCH v7 08/16] i386: Expose module level in CPUID[0x1F]
> 
> Ah, so my understanding is incorrect on this.
> 
> I tried on one Raptor Lake i5-1335U, which is also a hybrid SoC but
> doesn't have a module level; in this case 0x1f and 0xb have the same
> values at the core/lp levels.

Some SoCs have modules/dies but don't expose them in 0x1f.

If the SoC only exposes the thread/core levels in 0x1f, then its 0x1f is 
the same as 0x0b. Otherwise, it will have more subleaves and different 
values.

Thanks,
Zhao




* Re: [PATCH v7 14/16] i386: Use CPUCacheInfo.share_level to encode CPUID[4]
  2024-01-15  4:25       ` Xiaoyao Li
@ 2024-01-15  6:25         ` Zhao Liu
  2024-01-15  7:00           ` Xiaoyao Li
  0 siblings, 1 reply; 68+ messages in thread
From: Zhao Liu @ 2024-01-15  6:25 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Eduardo Habkost, Marcel Apfelbaum, Michael S . Tsirkin,
	Richard Henderson, Paolo Bonzini, Marcelo Tosatti, qemu-devel,
	kvm, Zhenyu Wang, Zhuocheng Ding, Zhao Liu, Babu Moger,
	Yongwei Ma

Hi Xiaoyao,

On Mon, Jan 15, 2024 at 12:25:19PM +0800, Xiaoyao Li wrote:
> Date: Mon, 15 Jan 2024 12:25:19 +0800
> From: Xiaoyao Li <xiaoyao.li@intel.com>
> Subject: Re: [PATCH v7 14/16] i386: Use CPUCacheInfo.share_level to encode
>  CPUID[4]
> 
> On 1/15/2024 11:40 AM, Zhao Liu wrote:
> > > > +{
> > > > +    uint32_t num_ids = 0;
> > > > +
> > > > +    switch (share_level) {
> > > > +    case CPU_TOPO_LEVEL_CORE:
> > > > +        num_ids = 1 << apicid_core_offset(topo_info);
> > > > +        break;
> > > > +    case CPU_TOPO_LEVEL_DIE:
> > > > +        num_ids = 1 << apicid_die_offset(topo_info);
> > > > +        break;
> > > > +    case CPU_TOPO_LEVEL_PACKAGE:
> > > > +        num_ids = 1 << apicid_pkg_offset(topo_info);
> > > > +        break;
> > > > +    default:
> > > > +        /*
> > > > +         * Currently there is no use case for SMT and MODULE, so use
> > > > +         * assert directly to facilitate debugging.
> > > > +         */
> > > > +        g_assert_not_reached();
> > > > +    }
> > > > +
> > > > +    return num_ids - 1;
> > > suggest to just return num_ids, and let the caller to do the -1 work.
> > Emm, SDM calls the whole "num_ids - 1" (CPUID.0x4.EAX[bits 14-25]) as
> > "maximum number of addressable IDs for logical processors sharing this
> > cache"...
> > 
> > So if this helper just names "num_ids" as max_lp_ids_share_the_cache,
> > I'm not sure there would be ambiguity here?
> 
> I don't think it will.
> 
> If this function is going to be used anywhere else, people will need to
> keep in mind to do the +1 to get the actual number.
> 
> Leave the -1 trick to where the CPUID value gets encoded; let's make
> this function generic.

This helper is the complete pattern for getting addressable IDs; that 
is to say, the "- 1" is also part of the calculation.

Its own meaning is self-consistent and generic enough to meet the common 
definitions of both AMD and Intel.

Thanks,
Zhao




* Re: [PATCH v7 08/16] i386: Expose module level in CPUID[0x1F]
  2024-01-15  6:11           ` Xiaoyao Li
@ 2024-01-15  6:35             ` Zhao Liu
  2024-01-15  7:16               ` Xiaoyao Li
  0 siblings, 1 reply; 68+ messages in thread
From: Zhao Liu @ 2024-01-15  6:35 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Yuan Yao, Eduardo Habkost, Marcel Apfelbaum, Michael S . Tsirkin,
	Richard Henderson, Paolo Bonzini, Marcelo Tosatti, qemu-devel,
	kvm, Zhenyu Wang, Zhuocheng Ding, Zhao Liu, Babu Moger,
	Yongwei Ma

On Mon, Jan 15, 2024 at 02:11:17PM +0800, Xiaoyao Li wrote:
> Date: Mon, 15 Jan 2024 14:11:17 +0800
> From: Xiaoyao Li <xiaoyao.li@intel.com>
> Subject: Re: [PATCH v7 08/16] i386: Expose module level in CPUID[0x1F]
> 
> On 1/15/2024 2:12 PM, Zhao Liu wrote:
> > Hi Xiaoyao,
> > 
> > On Mon, Jan 15, 2024 at 12:34:12PM +0800, Xiaoyao Li wrote:
> > > Date: Mon, 15 Jan 2024 12:34:12 +0800
> > > From: Xiaoyao Li <xiaoyao.li@intel.com>
> > > Subject: Re: [PATCH v7 08/16] i386: Expose module level in CPUID[0x1F]
> > > 
> > > > Yes, I think it's time to move to default 0x1f.
> > > 
> > > we don't need to do so until it's necessary.
> > 
> > Recent and future machines all support 0x1f, and at least SDM has
> > emphasized the preferred use of 0x1f.
> 
> The preference is a guideline for software, e.g., the OS. QEMU doesn't
> need to emulate CPUID leaf 0x1f for the guest if there are only the SMT
> and core levels,

Please, QEMU is emulating hardware, not writing software. Is there any 
reason why we shouldn't emulate new and generic hardware behaviors 
instead of sticking with the old ones?

> because in that case leaf 0xb and leaf 0x1f are exactly the same. We
> don't need to bother advertising the duplicate data.

You can't "define" the same 0x0b and 0x1f as duplicates. The SDM has no 
such definition.

Regards,
Zhao




* Re: [PATCH v7 08/16] i386: Expose module level in CPUID[0x1F]
  2024-01-15  6:20           ` Zhao Liu
@ 2024-01-15  6:57             ` Yuan Yao
  2024-01-15  7:20               ` Zhao Liu
  0 siblings, 1 reply; 68+ messages in thread
From: Yuan Yao @ 2024-01-15  6:57 UTC (permalink / raw)
  To: Zhao Liu
  Cc: Xiaoyao Li, Eduardo Habkost, Marcel Apfelbaum,
	Michael S . Tsirkin, Richard Henderson, Paolo Bonzini,
	Marcelo Tosatti, qemu-devel, kvm, Zhenyu Wang, Zhuocheng Ding,
	Zhao Liu, Babu Moger, Yongwei Ma

On Mon, Jan 15, 2024 at 02:20:20PM +0800, Zhao Liu wrote:
> On Mon, Jan 15, 2024 at 01:20:22PM +0800, Yuan Yao wrote:
> > Date: Mon, 15 Jan 2024 13:20:22 +0800
> > From: Yuan Yao <yuan.yao@linux.intel.com>
> > Subject: Re: [PATCH v7 08/16] i386: Expose module level in CPUID[0x1F]
> >
> > Ah, so my understanding is incorrect on this.
> >
> > I tried on one raptor lake i5-i335U, which also hybrid soc but doesn't have
> > module level, in this case 0x1f and 0xb have same values in core/lp level.
>
> Some SoCs have modules/dies but don't expose them in 0x1f.

Do they not expose them because the hardware can't, or because of a 
possible software-level configuration (i.e., disabling some cores in 
the BIOS)?

>
> If the soc only expose thread/core levels in 0x1f, then its 0x1f is same
> as 0x0b. Otherwise, it will have more subleaves and different
> values.
>
> Thanks,
> Zhao
>



* Re: [PATCH v7 14/16] i386: Use CPUCacheInfo.share_level to encode CPUID[4]
  2024-01-15  6:25         ` Zhao Liu
@ 2024-01-15  7:00           ` Xiaoyao Li
  2024-01-15 14:55             ` Zhao Liu
  0 siblings, 1 reply; 68+ messages in thread
From: Xiaoyao Li @ 2024-01-15  7:00 UTC (permalink / raw)
  To: Zhao Liu
  Cc: Eduardo Habkost, Marcel Apfelbaum, Michael S . Tsirkin,
	Richard Henderson, Paolo Bonzini, Marcelo Tosatti, qemu-devel,
	kvm, Zhenyu Wang, Zhuocheng Ding, Zhao Liu, Babu Moger,
	Yongwei Ma

On 1/15/2024 2:25 PM, Zhao Liu wrote:
> Hi Xiaoyao,
> 
> On Mon, Jan 15, 2024 at 12:25:19PM +0800, Xiaoyao Li wrote:
>> Date: Mon, 15 Jan 2024 12:25:19 +0800
>> From: Xiaoyao Li <xiaoyao.li@intel.com>
>> Subject: Re: [PATCH v7 14/16] i386: Use CPUCacheInfo.share_level to encode
>>   CPUID[4]
>>
>> On 1/15/2024 11:40 AM, Zhao Liu wrote:
>>>>> +{
>>>>> +    uint32_t num_ids = 0;
>>>>> +
>>>>> +    switch (share_level) {
>>>>> +    case CPU_TOPO_LEVEL_CORE:
>>>>> +        num_ids = 1 << apicid_core_offset(topo_info);
>>>>> +        break;
>>>>> +    case CPU_TOPO_LEVEL_DIE:
>>>>> +        num_ids = 1 << apicid_die_offset(topo_info);
>>>>> +        break;
>>>>> +    case CPU_TOPO_LEVEL_PACKAGE:
>>>>> +        num_ids = 1 << apicid_pkg_offset(topo_info);
>>>>> +        break;
>>>>> +    default:
>>>>> +        /*
>>>>> +         * Currently there is no use case for SMT and MODULE, so use
>>>>> +         * assert directly to facilitate debugging.
>>>>> +         */
>>>>> +        g_assert_not_reached();
>>>>> +    }
>>>>> +
>>>>> +    return num_ids - 1;
>>>> suggest to just return num_ids, and let the caller to do the -1 work.
>>> Emm, SDM calls the whole "num_ids - 1" (CPUID.0x4.EAX[bits 14-25]) as
>>> "maximum number of addressable IDs for logical processors sharing this
>>> cache"...
>>>
>>> So if this helper just names "num_ids" as max_lp_ids_share_the_cache,
>>> I'm not sure there would be ambiguity here?
>>
>> I don't think it will.
>>
>> if this function is going to used anywhere else, people will need to keep in
>> mind to do +1 stuff to get the actual number.
>>
>> leaving the -1 trick to where CPUID value gets encoded. let's make this
>> function generic.
> 
> This helper is the complete pattern for getting addressable IDs; that
> is to say, the "- 1" is also part of the calculation.
> 
> Its own meaning is self-consistent and generic enough to meet the common
> definitions of both AMD and Intel.

OK. I stop bikeshedding on it.

> Thanks,
> Zhao
> 




* Re: [PATCH v7 08/16] i386: Expose module level in CPUID[0x1F]
  2024-01-15  6:35             ` Zhao Liu
@ 2024-01-15  7:16               ` Xiaoyao Li
  2024-01-15 15:46                 ` Zhao Liu
  0 siblings, 1 reply; 68+ messages in thread
From: Xiaoyao Li @ 2024-01-15  7:16 UTC (permalink / raw)
  To: Zhao Liu
  Cc: Yuan Yao, Eduardo Habkost, Marcel Apfelbaum, Michael S . Tsirkin,
	Richard Henderson, Paolo Bonzini, Marcelo Tosatti, qemu-devel,
	kvm, Zhenyu Wang, Zhuocheng Ding, Zhao Liu, Babu Moger,
	Yongwei Ma

On 1/15/2024 2:35 PM, Zhao Liu wrote:
> On Mon, Jan 15, 2024 at 02:11:17PM +0800, Xiaoyao Li wrote:
>> Date: Mon, 15 Jan 2024 14:11:17 +0800
>> From: Xiaoyao Li <xiaoyao.li@intel.com>
>> Subject: Re: [PATCH v7 08/16] i386: Expose module level in CPUID[0x1F]
>>
>> On 1/15/2024 2:12 PM, Zhao Liu wrote:
>>> Hi Xiaoyao,
>>>
>>> On Mon, Jan 15, 2024 at 12:34:12PM +0800, Xiaoyao Li wrote:
>>>> Date: Mon, 15 Jan 2024 12:34:12 +0800
>>>> From: Xiaoyao Li <xiaoyao.li@intel.com>
>>>> Subject: Re: [PATCH v7 08/16] i386: Expose module level in CPUID[0x1F]
>>>>
>>>>> Yes, I think it's time to move to default 0x1f.
>>>>
>>>> we don't need to do so until it's necessary.
>>>
>>> Recent and future machines all support 0x1f, and at least SDM has
>>> emphasized the preferred use of 0x1f.
>>
>> The preference is the guideline for software e.g., OS. QEMU doesn't need to
>> emulate cpuid leaf 0x1f to guest if there is only smt and core level.
> 
> Please, QEMU is emulating hardware, not writing software.

What I wanted to convey was that the SDM is teaching software how to 
probe the CPU topology, not suggesting how a VMM should advertise the 
CPU topology to a guest.


> Is there any
> reason why we shouldn't emulate new and generic hardware behaviors
> instead of sticking with the old ones?

I didn't say we shouldn't, but we don't need to do it if it's 
unnecessary.

If CPUID 0x1f is advertised to the guest by default, it will also 
introduce an inconsistency: an old product doesn't have CPUID 0x1f, but 
when QEMU emulates that old product, it does.

Sure, we can have code to fix that, exposing 0x1f only for new enough 
CPU models, but that just makes things complicated.

>> because in this case, they are exactly the same in leaf 0xb and 0x1f. we don't
>> need to bother advertising the duplicate data.
> 
> You can't "define" the same 0x0b and 0x1f as duplicates. The SDM has
> no such definition.

For QEMU, they are duplicate data that need to be maintained and passed 
to KVM via KVM_SET_CPUID. For the guest, it's also unnecessary, because 
CPUID leaf 0x1f doesn't provide any additional information over 0xb.

The SDM keeps CPUID 0xb for backwards compatibility.
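
For reference, the compat knob Zhao mentions elsewhere in the thread 
could be sketched like this, modeled on the existing "cpuid-0xb" 
property (the name and default here are hypothetical, not from this 
series):

    /* target/i386/cpu.c: opt-in switch for exposing leaf 0x1f */
    DEFINE_PROP_BOOL("x-cpuid-0x1f", X86CPU, enable_cpuid_0x1f, false),

    /* in cpu_x86_cpuid(), gate the leaf on the property: */
    case 0x1F:
        if (!cpu->enable_cpuid_0x1f &&
            topo_info.modules_per_die < 2 && topo_info.dies_per_pkg < 2) {
            *eax = *ebx = *ecx = *edx = 0;
            break;
        }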

> Regards,
> Zhao
> 




* Re: [PATCH v7 08/16] i386: Expose module level in CPUID[0x1F]
  2024-01-15  6:57             ` Yuan Yao
@ 2024-01-15  7:20               ` Zhao Liu
  2024-01-15  9:03                 ` Yuan Yao
  0 siblings, 1 reply; 68+ messages in thread
From: Zhao Liu @ 2024-01-15  7:20 UTC (permalink / raw)
  To: Yuan Yao
  Cc: Xiaoyao Li, Eduardo Habkost, Marcel Apfelbaum,
	Michael S . Tsirkin, Richard Henderson, Paolo Bonzini,
	Marcelo Tosatti, qemu-devel, kvm, Zhenyu Wang, Zhuocheng Ding,
	Zhao Liu, Babu Moger, Yongwei Ma

On Mon, Jan 15, 2024 at 02:57:30PM +0800, Yuan Yao wrote:
> Date: Mon, 15 Jan 2024 14:57:30 +0800
> From: Yuan Yao <yuan.yao@linux.intel.com>
> Subject: Re: [PATCH v7 08/16] i386: Expose module level in CPUID[0x1F]
> 
> On Mon, Jan 15, 2024 at 02:20:20PM +0800, Zhao Liu wrote:
> > On Mon, Jan 15, 2024 at 01:20:22PM +0800, Yuan Yao wrote:
> > > Date: Mon, 15 Jan 2024 13:20:22 +0800
> > > From: Yuan Yao <yuan.yao@linux.intel.com>
> > > Subject: Re: [PATCH v7 08/16] i386: Expose module level in CPUID[0x1F]
> > >
> > > Ah, so my understanding is incorrect on this.
> > >
> > > I tried on one raptor lake i5-i335U, which also hybrid soc but doesn't have
> > > module level, in this case 0x1f and 0xb have same values in core/lp level.
> >
> > Some socs have modules/dies but they don't expose them in 0x1f.
> 
> Do they not expose them because the hardware can't, or because of a
> possible software-level configuration (i.e., disabling some cores in
> the BIOS)?
>

This leaf is decided at the hardware level. Which levels are exposed 
sometimes depends on whether there is a topology-related feature, but 
there is no clear rule (e.g., in the ADL family neither ADL-S nor ADL-P 
exposes modules, while ADL-N does).

Regards,
Zhao




* Re: [PATCH v7 10/16] i386/cpu: Introduce cluster-id to X86CPU
  2024-01-15  5:59         ` Zhao Liu
@ 2024-01-15  7:45           ` Xiaoyao Li
  2024-01-15 15:18             ` Zhao Liu
  0 siblings, 1 reply; 68+ messages in thread
From: Xiaoyao Li @ 2024-01-15  7:45 UTC (permalink / raw)
  To: Zhao Liu
  Cc: Eduardo Habkost, Marcel Apfelbaum, Michael S . Tsirkin,
	Richard Henderson, Paolo Bonzini, Marcelo Tosatti, qemu-devel,
	kvm, Zhenyu Wang, Zhuocheng Ding, Zhao Liu, Babu Moger,
	Yongwei Ma, Philippe Mathieu-Daudé, Yanan Wang

On 1/15/2024 1:59 PM, Zhao Liu wrote:
> (Also cc "machine core" maintainers.)
> 
> Hi Xiaoyao,
> 
> On Mon, Jan 15, 2024 at 12:18:17PM +0800, Xiaoyao Li wrote:
>> Date: Mon, 15 Jan 2024 12:18:17 +0800
>> From: Xiaoyao Li <xiaoyao.li@intel.com>
>> Subject: Re: [PATCH v7 10/16] i386/cpu: Introduce cluster-id to X86CPU
>>
>> On 1/15/2024 11:27 AM, Zhao Liu wrote:
>>> On Sun, Jan 14, 2024 at 09:49:18PM +0800, Xiaoyao Li wrote:
>>>> Date: Sun, 14 Jan 2024 21:49:18 +0800
>>>> From: Xiaoyao Li <xiaoyao.li@intel.com>
>>>> Subject: Re: [PATCH v7 10/16] i386/cpu: Introduce cluster-id to X86CPU
>>>>
>>>> On 1/8/2024 4:27 PM, Zhao Liu wrote:
>>>>> From: Zhuocheng Ding <zhuocheng.ding@intel.com>
>>>>>
>>>>> Introduce cluster-id other than module-id to be consistent with
>>>>> CpuInstanceProperties.cluster-id, and this avoids the confusion
>>>>> of parameter names when hotplugging.
>>>>
>>>> I don't think reusing 'cluster' from arm for x86's 'module' is a good idea.
>>>> It introduces confusion around the code.
>>>
>>> There is a precedent: generic "socket" v.s. i386 "package".
>>
>> It's not the same thing. "socket" vs "package" is just software people and
>> hardware people chose different name. It's just different naming issue.
> 
> No, it's a similar issue. Same physical device, different name only.
> 
> Furthermore, the topology was introduced for resource layout and silicon
> fabrication, and similar design ideas and fabrication processes are fairly
> consistent across common current arches. Therefore, it is possible to
> abstract similar topological hierarchies for different arches.
> 
>>
>> however, here it's reusing name issue while 'cluster' has been defined for
>> x86. It does introduce confusion.
> 
> There's nothing fundamentally different between the x86 module and the
> generic cluster, is there? This is the reason that I don't agree with
> introducing "modules" in -smp.

A generic cluster just means a cluster of processors, i.e., a group of 
CPUs/LPs. It is just a middle level between die and core.

It can be the module level on Intel, or the tile level. Further, if the 
per-die LP count increases in the future, there might be more middle 
levels on Intel between die and core. At that point, how do we decide 
which level cluster should be mapped to?

>>
>>> The direct definition of cluster is the level that is above the "core"
>>> and shares the hardware resources including L2. In this sense, arm's
>>> cluster is the same as x86's module.
>>
>> then, what about intel implements tile level in the future? why ARM's
>> 'cluster' is mapped to 'module', but not 'tile' ?
> 
> This depends on the actual need.
> 
> Module (for x86) and cluster (in general) are similar, and tile (for x86)
> is used for L3 in practice, so I use module rather than tile to map
> generic cluster.
 >
> And, it should be noted that x86 module is mapped to the generic cluster,
> not to ARM's. It's just that currently only ARM is using the clusters
> option in -smp.
> 
> I believe QEMU provides the abstract and unified topology hierarchies in
> -smp, not the arch-specific hierarchies.
> 
>>
>> reusing 'cluster' for 'module' is just a bad idea.
>>
>>> Though different arches have different naming styles, but QEMU's generic
>>> code still need the uniform topology hierarchy.
>>
>> generic code can provide as many topology levels as it can. each ARCH can
>> choose to use the ones it supports.
>>
>> e.g.,
>>
>> in qapi/machine.json, it says,
>>
>> # The ordering from highest/coarsest to lowest/finest is:
>> # @drawers, @books, @sockets, @dies, @clusters, @cores, @threads.
> 
> This ordering is well-defined...
> 
>> #
>> # Different architectures support different subsets of topology
>> # containers.
>> #
>> # For example, s390x does not have clusters and dies, and the socket
>> # is the parent container of cores.
>>
>> we can update it to
>>
>> # The ordering from highest/coarsest to lowest/finest is:
>> # @drawers, @books, @sockets, @dies, @clusters, @module, @cores,
>> # @threads.
> 
> ...but here it's impossible to figure out why cluster is above module,
> and even I can't come up with the difference between cluster and module.
> 
>> #
>> # Different architectures support different subsets of topology
>> # containers.
>> #
>> # For example, s390x does not have clusters and dies, and the socket
>> # is the parent container of cores.
>> #
>> # For example, x86 does not have drawers and books, and does not support
>> # cluster.
>>
>> even if cluster of x86 is supported someday in the future, we can remove the
>> ordering requirement from above description.
> 
> x86's cluster is above the package.
> 
> To reserve this name for x86, we can't have the well-defined topology
> ordering.
> 
> But topology ordering is necessary in generic code, and many
> calculations depend on the topology ordering.

could you point me to the code?

>>
>>>>
>>>> s390 just added 'drawer' and 'book' in cpu topology[1]. I think we can also
>>>> add a module level for x86 instead of reusing cluster.
>>>>
>>>> (This is also what I want to reply to the cover letter.)
>>>>
>>>> [1] https://lore.kernel.org/qemu-devel/20231016183925.2384704-1-nsg@linux.ibm.com/
>>>
>>> These two new levels have the clear topological hierarchy relationship
>>> and don't duplicate existing ones.
>>>
>>> "book" or "drawer" may correspond to intel's "cluster".
>>>
>>> Maybe, in the future, we could support for arch-specific alias topologies
>>> in -smp.
>>
>> I don't think we need alias, reusing 'cluster' for 'module' doesn't gain any
>> benefit except avoid adding a new field in SMPconfiguration. All the other
>> cluster code is ARM specific and x86 cannot share.
> 
> The point is that there is no difference between intel module and general
> cluster...Considering only the naming issue, even AMD has the "complex" to
> correspond to the Intel's "module".

Does AMD's complex really match Intel's module? L3 cache is shared in 
one complex, while L2 cache is shared in one module for now.

>>
>> I don't think it's a problem to add 'module' to SMPconfiguration.
> 
> Adding an option is simple; however, it is not conducive to the topology
> maintenance of QEMU. Reusing the existing generic structure should be the
> first consideration, except when the new level is fundamentally different.
> 
> Thanks,
> Zhao
> 



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v7 08/16] i386: Expose module level in CPUID[0x1F]
  2024-01-15  7:20               ` Zhao Liu
@ 2024-01-15  9:03                 ` Yuan Yao
  0 siblings, 0 replies; 68+ messages in thread
From: Yuan Yao @ 2024-01-15  9:03 UTC (permalink / raw)
  To: Zhao Liu
  Cc: Xiaoyao Li, Eduardo Habkost, Marcel Apfelbaum,
	Michael S . Tsirkin, Richard Henderson, Paolo Bonzini,
	Marcelo Tosatti, qemu-devel, kvm, Zhenyu Wang, Zhuocheng Ding,
	Zhao Liu, Babu Moger, Yongwei Ma

On Mon, Jan 15, 2024 at 03:20:37PM +0800, Zhao Liu wrote:
> On Mon, Jan 15, 2024 at 02:57:30PM +0800, Yuan Yao wrote:
> > Date: Mon, 15 Jan 2024 14:57:30 +0800
> > From: Yuan Yao <yuan.yao@linux.intel.com>
> > Subject: Re: [PATCH v7 08/16] i386: Expose module level in CPUID[0x1F]
> >
> > On Mon, Jan 15, 2024 at 02:20:20PM +0800, Zhao Liu wrote:
> > > On Mon, Jan 15, 2024 at 01:20:22PM +0800, Yuan Yao wrote:
> > > > Date: Mon, 15 Jan 2024 13:20:22 +0800
> > > > From: Yuan Yao <yuan.yao@linux.intel.com>
> > > > Subject: Re: [PATCH v7 08/16] i386: Expose module level in CPUID[0x1F]
> > > >
> > > > Ah, so my understanding is incorrect on this.
> > > >
> > > > I tried on one raptor lake i5-i335U, which is also a hybrid SoC but
> > > > doesn't have a module level; in this case, 0x1f and 0xb have the same
> > > > values at the core/lp level.
> > >
> > > Some SoCs have modules/dies but don't expose them in 0x1f.
> >
> > Do they not expose those levels because the hardware can't, or because of
> > some software-level configuration (i.e. disabling some cores in the BIOS)?
> >
>
> This leaf is decided at the hardware level. Which levels are exposed sometimes
> depends on whether the topology-related feature is present, but there is no
> clear rule (e.g., in the ADL family, neither ADL-S nor ADL-P exposes modules,
> while ADL-N does).

I see, thanks for your information!

>
> Regards,
> Zhao
>
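As an aside: one quick way to see that 0xb and 0x1f report the same core/LP
data on a part without a module level is to dump both leaves and compare. A
minimal sketch (assuming GCC's cpuid.h is available; field layout per the
SDM: EAX[4:0] shift, EBX[15:0] count, ECX[15:8] level type):

    #include <stdio.h>
    #include <cpuid.h>

    static void dump_topo_leaf(unsigned leaf)
    {
        for (unsigned subleaf = 0; ; subleaf++) {
            unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;

            if (!__get_cpuid_count(leaf, subleaf, &eax, &ebx, &ecx, &edx)) {
                break;  /* leaf not supported at all */
            }
            unsigned type = (ecx >> 8) & 0xff;  /* 0 means invalid subleaf */
            if (!type) {
                break;
            }
            printf("0x%x.%u: type=%u shift=%u count=%u\n",
                   leaf, subleaf, type, eax & 0x1f, ebx & 0xffff);
        }
    }

    int main(void)
    {
        dump_topo_leaf(0x0b);
        dump_topo_leaf(0x1f);
        return 0;
    }

On a module-less part, the two dumps should match subleaf for subleaf.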


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v7 15/16] i386: Use offsets get NumSharingCache for CPUID[0x8000001D].EAX[bits 25:14]
  2024-01-15  4:27       ` Xiaoyao Li
@ 2024-01-15 14:54         ` Zhao Liu
  0 siblings, 0 replies; 68+ messages in thread
From: Zhao Liu @ 2024-01-15 14:54 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Eduardo Habkost, Marcel Apfelbaum, Michael S . Tsirkin,
	Richard Henderson, Paolo Bonzini, Marcelo Tosatti, qemu-devel,
	kvm, Zhenyu Wang, Zhuocheng Ding, Zhao Liu, Babu Moger,
	Yongwei Ma

Hi Xiaoyao,

On Mon, Jan 15, 2024 at 12:27:43PM +0800, Xiaoyao Li wrote:
> Date: Mon, 15 Jan 2024 12:27:43 +0800
> From: Xiaoyao Li <xiaoyao.li@intel.com>
> Subject: Re: [PATCH v7 15/16] i386: Use offsets get NumSharingCache for
>  CPUID[0x8000001D].EAX[bits 25:14]
> 
> On 1/15/2024 11:48 AM, Zhao Liu wrote:
> > Hi Xiaoyao,
> > 
> > On Sun, Jan 14, 2024 at 10:42:41PM +0800, Xiaoyao Li wrote:
> > > Date: Sun, 14 Jan 2024 22:42:41 +0800
> > > From: Xiaoyao Li <xiaoyao.li@intel.com>
> > > Subject: Re: [PATCH v7 15/16] i386: Use offsets get NumSharingCache for
> > >   CPUID[0x8000001D].EAX[bits 25:14]
> > > 
> > > On 1/8/2024 4:27 PM, Zhao Liu wrote:
> > > > From: Zhao Liu <zhao1.liu@intel.com>
> > > > 
> > > > The commit 8f4202fb1080 ("i386: Populate AMD Processor Cache Information
> > > > for cpuid 0x8000001D") adds the cache topology for AMD CPU by encoding
> > > > the number of sharing threads directly.
> > > > 
> > > >   From AMD's APM, NumSharingCache (CPUID[0x8000001D].EAX[bits 25:14])
> > > > means [1]:
> > > > 
> > > > The number of logical processors sharing this cache is the value of
> > > > this field incremented by 1. To determine which logical processors are
> > > > sharing a cache, determine a Share Id for each processor as follows:
> > > > 
> > > > ShareId = LocalApicId >> log2(NumSharingCache+1)
> > > > 
> > > > Logical processors with the same ShareId then share a cache. If
> > > > NumSharingCache+1 is not a power of two, round it up to the next power
> > > > of two.
> > > > 
> > > >   From the description above, the calculation of this field should be
> > > > the same as CPUID[4].EAX[bits 25:14] for Intel CPUs. So also use the
> > > > offsets of APIC ID to calculate this field.
> > > > 
> > > > [1]: APM, vol.3, appendix.E.4.15 Function 8000_001Dh--Cache Topology
> > > >        Information
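To make the quoted rule concrete, here is a small self-contained sketch of
the ShareId computation (an illustration of the APM formula above, not code
from this patch):

    /* ShareId = LocalApicId >> log2(NumSharingCache + 1), where
     * NumSharingCache + 1 is first rounded up to a power of two. */
    static unsigned share_id(unsigned local_apic_id,
                             unsigned num_sharing_cache)
    {
        unsigned n = num_sharing_cache + 1;
        unsigned shift = 0;

        while ((1u << shift) < n) {
            shift++;  /* log2 of the next power of two >= n */
        }
        return local_apic_id >> shift;
    }

E.g., with NumSharingCache = 2 (three LPs per cache), n rounds up to 4 and
the APIC ID is shifted right by 2.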
> > > 
> > > this patch can be dropped because we have next patch.
> > 
> > This patch is mainly meant to explicitly emphasize the change in the
> > encoding and its compliance with the AMD spec... I haven't tested on an
> > AMD machine, so a more granular patch would make it easier for the
> > community to review and test.
> 
> then please move this patch ahead, e.g., after patch 2.
>

OK. Thanks!
-Zhao



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v7 14/16] i386: Use CPUCacheInfo.share_level to encode CPUID[4]
  2024-01-15  7:00           ` Xiaoyao Li
@ 2024-01-15 14:55             ` Zhao Liu
  0 siblings, 0 replies; 68+ messages in thread
From: Zhao Liu @ 2024-01-15 14:55 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Eduardo Habkost, Marcel Apfelbaum, Michael S . Tsirkin,
	Richard Henderson, Paolo Bonzini, Marcelo Tosatti, qemu-devel,
	kvm, Zhenyu Wang, Zhuocheng Ding, Zhao Liu, Babu Moger,
	Yongwei Ma

Hi Xiaoyao,

On Mon, Jan 15, 2024 at 03:00:25PM +0800, Xiaoyao Li wrote:
> Date: Mon, 15 Jan 2024 15:00:25 +0800
> From: Xiaoyao Li <xiaoyao.li@intel.com>
> Subject: Re: [PATCH v7 14/16] i386: Use CPUCacheInfo.share_level to encode
>  CPUID[4]
> 
> On 1/15/2024 2:25 PM, Zhao Liu wrote:
> > Hi Xiaoyao,
> > 
> > On Mon, Jan 15, 2024 at 12:25:19PM +0800, Xiaoyao Li wrote:
> > > Date: Mon, 15 Jan 2024 12:25:19 +0800
> > > From: Xiaoyao Li <xiaoyao.li@intel.com>
> > > Subject: Re: [PATCH v7 14/16] i386: Use CPUCacheInfo.share_level to encode
> > >   CPUID[4]
> > > 
> > > On 1/15/2024 11:40 AM, Zhao Liu wrote:
> > > > > > +{
> > > > > > +    uint32_t num_ids = 0;
> > > > > > +
> > > > > > +    switch (share_level) {
> > > > > > +    case CPU_TOPO_LEVEL_CORE:
> > > > > > +        num_ids = 1 << apicid_core_offset(topo_info);
> > > > > > +        break;
> > > > > > +    case CPU_TOPO_LEVEL_DIE:
> > > > > > +        num_ids = 1 << apicid_die_offset(topo_info);
> > > > > > +        break;
> > > > > > +    case CPU_TOPO_LEVEL_PACKAGE:
> > > > > > +        num_ids = 1 << apicid_pkg_offset(topo_info);
> > > > > > +        break;
> > > > > > +    default:
> > > > > > +        /*
> > > > > > +         * Currently there is no use case for SMT and MODULE, so use
> > > > > > +         * assert directly to facilitate debugging.
> > > > > > +         */
> > > > > > +        g_assert_not_reached();
> > > > > > +    }
> > > > > > +
> > > > > > +    return num_ids - 1;
> > > > > suggest to just return num_ids, and let the caller to do the -1 work.
> > > > Emm, SDM calls the whole "num_ids - 1" (CPUID.0x4.EAX[bits 14-25]) as
> > > > "maximum number of addressable IDs for logical processors sharing this
> > > > cache"...
> > > > 
> > > > So if this helper just names "num_ids" as max_lp_ids_share_the_cache,
> > > > I'm not sure there would be ambiguity here?
> > > 
> > > I don't think it will.
> > > 
> > > If this function is going to be used anywhere else, people will need to
> > > keep in mind to do the +1 to get the actual number.
> > >
> > > Leave the -1 trick to where the CPUID value gets encoded; let's make this
> > > function generic.
> > 
> > This helper is the complete pattern for getting addressable IDs; that is
> > to say, the "- 1" is also part of the calculation.
> > 
> > Its meaning is self-consistent and generic enough to meet the common
> > definitions of AMD and Intel.
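(A worked example of the off-by-one being discussed, assuming 2 threads per
core and a core-level cache: apicid_core_offset() is 1, so num_ids = 1 << 1
= 2, and the helper returns 2 - 1 = 1, which is exactly the "maximum number
of addressable IDs" value that CPUID.0x4.EAX[bits 25:14] expects, with no
further adjustment needed at the encoding site.)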
> 
> OK. I stop bikeshedding on it.
>

Thanks for your review ;-).

Regards,
Zhao



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v7 10/16] i386/cpu: Introduce cluster-id to X86CPU
  2024-01-15  7:45           ` Xiaoyao Li
@ 2024-01-15 15:18             ` Zhao Liu
  2024-01-16 16:40               ` Xiaoyao Li
  0 siblings, 1 reply; 68+ messages in thread
From: Zhao Liu @ 2024-01-15 15:18 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Eduardo Habkost, Marcel Apfelbaum, Michael S . Tsirkin,
	Richard Henderson, Paolo Bonzini, Marcelo Tosatti, qemu-devel,
	kvm, Zhenyu Wang, Zhuocheng Ding, Zhao Liu, Babu Moger,
	Yongwei Ma, Philippe Mathieu-Daudé, Yanan Wang

Hi Xiaoyao,

On Mon, Jan 15, 2024 at 03:45:58PM +0800, Xiaoyao Li wrote:
> Date: Mon, 15 Jan 2024 15:45:58 +0800
> From: Xiaoyao Li <xiaoyao.li@intel.com>
> Subject: Re: [PATCH v7 10/16] i386/cpu: Introduce cluster-id to X86CPU
> 
> On 1/15/2024 1:59 PM, Zhao Liu wrote:
> > (Also cc "machine core" maintainers.)
> > 
> > Hi Xiaoyao,
> > 
> > On Mon, Jan 15, 2024 at 12:18:17PM +0800, Xiaoyao Li wrote:
> > > Date: Mon, 15 Jan 2024 12:18:17 +0800
> > > From: Xiaoyao Li <xiaoyao.li@intel.com>
> > > Subject: Re: [PATCH v7 10/16] i386/cpu: Introduce cluster-id to X86CPU
> > > 
> > > On 1/15/2024 11:27 AM, Zhao Liu wrote:
> > > > On Sun, Jan 14, 2024 at 09:49:18PM +0800, Xiaoyao Li wrote:
> > > > > Date: Sun, 14 Jan 2024 21:49:18 +0800
> > > > > From: Xiaoyao Li <xiaoyao.li@intel.com>
> > > > > Subject: Re: [PATCH v7 10/16] i386/cpu: Introduce cluster-id to X86CPU
> > > > > 
> > > > > On 1/8/2024 4:27 PM, Zhao Liu wrote:
> > > > > > From: Zhuocheng Ding <zhuocheng.ding@intel.com>
> > > > > > 
> > > > > > Introduce cluster-id other than module-id to be consistent with
> > > > > > CpuInstanceProperties.cluster-id, and this avoids the confusion
> > > > > > of parameter names when hotplugging.
> > > > > 
> > > > > I don't think reusing 'cluster' from arm for x86's 'module' is a good idea.
> > > > > It introduces confusion around the code.
> > > > 
> > > > There is a precedent: generic "socket" v.s. i386 "package".
> > > 
> > > It's not the same thing. "socket" vs "package" is just software people and
> > > hardware people chose different name. It's just different naming issue.
> > 
> > No, it's a similar issue. Same physical device, different name only.
> > 
> > Furthermore, the topology was introduced for resource layout and silicon
> > fabrication, and similar design ideas and fabrication processes are fairly
> > consistent across common current arches. Therefore, it is possible to
> > abstract similar topological hierarchies for different arches.
> > 
> > > 
> > > however, here it's reusing name issue while 'cluster' has been defined for
> > > x86. It does introduce confusion.
> > 
> > There's nothing fundamentally different between the x86 module and the
> > generic cluster, is there? This is the reason that I don't agree with
> > introducing "modules" in -smp.
> 
> generic cluster just means the cluster of processors, i.e, a group of
> cpus/lps. It is just a middle level between die and core.

Not sure if you mean the "cluster" device for TCG GDB? The "cluster" device
is different from the "cluster" option in -smp.

When Yanan introduced the "cluster" option in -smp, he mentioned that it
is for sharing L2 and L3 tags, which roughly corresponds to our module.

> 
> It can be the module level in intel, or tile level. Further, if per die lp
> number increases in the future, there might be more middle levels in intel
> between die and core. Then at that time, how to decide what level should
> cluster be mapped to?

Currently, there are 3 levels defined in the SDM between die and
core: diegrp, tile and module. In our products, L2 is shared at the
module, so Intel's module and the general cluster are the best match.
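(For reference, a sketch of the CPUID.0x1F ECX[15:8] level-type encodings as
I read them from the SDM; the MODULE/TILE/DIEGRP values here are my reading
of the spec, not definitions taken from this series:

    #define CPUID_1F_ECX_TOPO_LEVEL_INVALID  0
    #define CPUID_1F_ECX_TOPO_LEVEL_SMT      1
    #define CPUID_1F_ECX_TOPO_LEVEL_CORE     2
    #define CPUID_1F_ECX_TOPO_LEVEL_MODULE   3
    #define CPUID_1F_ECX_TOPO_LEVEL_TILE     4
    #define CPUID_1F_ECX_TOPO_LEVEL_DIE      5
    #define CPUID_1F_ECX_TOPO_LEVEL_DIEGRP   6
)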

There are no commercially available machines for the other levels yet, 
so there's no way to know exactly what the future holds, but we should 
try to avoid fragmentation of the topology hierarchy and try to maintain 
uniform and common topology hierarchies for QEMU.

A new level for -smp should only be introduced in the future if an 
unsolvable problem comes up.

> 
> > > 
> > > > The direct definition of cluster is the level that is above the "core"
> > > > and shares the hardware resources including L2. In this sense, arm's
> > > > cluster is the same as x86's module.
> > > 
> > > then, what about intel implements tile level in the future? why ARM's
> > > 'cluster' is mapped to 'module', but not 'tile' ?
> > 
> > This depends on the actual need.
> > 
> > Module (for x86) and cluster (in general) are similar, and tile (for x86)
> > is used for L3 in practice, so I use module rather than tile to map
> > generic cluster.
> >
> > And, it should be noted that x86 module is mapped to the generic cluster,
> > not to ARM's. It's just that currently only ARM is using the clusters
> > option in -smp.
> > 
> > I believe QEMU provides the abstract and unified topology hierarchies in
> > -smp, not the arch-specific hierarchies.
> > 
> > > 
> > > reusing 'cluster' for 'module' is just a bad idea.
> > > 
> > > > Though different arches have different naming styles, but QEMU's generic
> > > > code still need the uniform topology hierarchy.
> > > 
> > > generic code can provide as many topology levels as it can. each ARCH can
> > > choose to use the ones it supports.
> > > 
> > > e.g.,
> > > 
> > > in qapi/machine.json, it says,
> > > 
> > > # The ordering from highest/coarsest to lowest/finest is:
> > > # @drawers, @books, @sockets, @dies, @clusters, @cores, @threads.
> > 
> > This ordering is well-defined...
> > 
> > > #
> > > # Different architectures support different subsets of topology
> > > # containers.
> > > #
> > > # For example, s390x does not have clusters and dies, and the socket
> > > # is the parent container of cores.
> > > 
> > > we can update it to
> > > 
> > > # The ordering from highest/coarsest to lowest/finest is:
> > > # @drawers, @books, @sockets, @dies, @clusters, @module, @cores,
> > > # @threads.
> > 
> > ...but here it's impossible to figure out why cluster is above module,
> > and even I can't come up with the difference between cluster and module.
> > 
> > > #
> > > # Different architectures support different subsets of topology
> > > # containers.
> > > #
> > > # For example, s390x does not have clusters and dies, and the socket
> > > # is the parent container of cores.
> > > #
> > > # For example, x86 does not have drawers and books, and does not support
> > > # cluster.
> > > 
> > > even if cluster of x86 is supported someday in the future, we can remove the
> > > ordering requirement from above description.
> > 
> > x86's cluster is above the package.
> > 
> > To reserve this name for x86, we can't have the well-defined topology
> > ordering.
> > 
> > But topology ordering is necessary in generic code, and many
> > calculations depend on the topology ordering.
> 
> could you point me to the code?

Yes, e.g., there're 2 helpers: machine_topo_get_cores_per_socket() and
machine_topo_get_threads_per_socket().
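(Roughly, those helpers fold the per-level counts together in topology
order; the following is a sketch of their shape under the current
hierarchy, not a verbatim copy of the QEMU code:

    /* Each factor is one level between socket and core; a level without
     * a fixed position in the ordering would break this chain. */
    static unsigned cores_per_socket(const MachineState *ms)
    {
        return ms->smp.cores * ms->smp.clusters * ms->smp.dies;
    }

    static unsigned threads_per_socket(const MachineState *ms)
    {
        return ms->smp.threads * cores_per_socket(ms);
    }
)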

> 
> > > 
> > > > > 
> > > > > s390 just added 'drawer' and 'book' in cpu topology[1]. I think we can also
> > > > > add a module level for x86 instead of reusing cluster.
> > > > > 
> > > > > (This is also what I want to reply to the cover letter.)
> > > > > 
> > > > > [1] https://lore.kernel.org/qemu-devel/20231016183925.2384704-1-nsg@linux.ibm.com/
> > > > 
> > > > These two new levels have the clear topological hierarchy relationship
> > > > and don't duplicate existing ones.
> > > > 
> > > > "book" or "drawer" may correspond to intel's "cluster".
> > > > 
> > > > Maybe, in the future, we could support for arch-specific alias topologies
> > > > in -smp.
> > > 
> > > I don't think we need alias, reusing 'cluster' for 'module' doesn't gain any
> > > benefit except avoid adding a new field in SMPconfiguration. All the other
> > > cluster code is ARM specific and x86 cannot share.
> > 
> > The point is that there is no difference between intel module and general
> > cluster...Considering only the naming issue, even AMD has the "complex" to
> > correspond to the Intel's "module".
> 
> does complex of AMD really match with intel module? L3 cache is shared in
> one complex, while L2 cache is shared in one module for now.

If so, it could correspond to Intel's tile, which is, after all, a level
below die.

Thanks,
Zhao



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v7 08/16] i386: Expose module level in CPUID[0x1F]
  2024-01-15  7:16               ` Xiaoyao Li
@ 2024-01-15 15:46                 ` Zhao Liu
  0 siblings, 0 replies; 68+ messages in thread
From: Zhao Liu @ 2024-01-15 15:46 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Yuan Yao, Eduardo Habkost, Marcel Apfelbaum, Michael S . Tsirkin,
	Richard Henderson, Paolo Bonzini, Marcelo Tosatti, qemu-devel,
	kvm, Zhenyu Wang, Zhuocheng Ding, Zhao Liu, Babu Moger,
	Yongwei Ma

Hi Xiaoyao,

On Mon, Jan 15, 2024 at 03:16:43PM +0800, Xiaoyao Li wrote:
> Date: Mon, 15 Jan 2024 15:16:43 +0800
> From: Xiaoyao Li <xiaoyao.li@intel.com>
> Subject: Re: [PATCH v7 08/16] i386: Expose module level in CPUID[0x1F]
> 
> On 1/15/2024 2:35 PM, Zhao Liu wrote:
> > On Mon, Jan 15, 2024 at 02:11:17PM +0800, Xiaoyao Li wrote:
> > > Date: Mon, 15 Jan 2024 14:11:17 +0800
> > > From: Xiaoyao Li <xiaoyao.li@intel.com>
> > > Subject: Re: [PATCH v7 08/16] i386: Expose module level in CPUID[0x1F]
> > > 
> > > On 1/15/2024 2:12 PM, Zhao Liu wrote:
> > > > Hi Xiaoyao,
> > > > 
> > > > On Mon, Jan 15, 2024 at 12:34:12PM +0800, Xiaoyao Li wrote:
> > > > > Date: Mon, 15 Jan 2024 12:34:12 +0800
> > > > > From: Xiaoyao Li <xiaoyao.li@intel.com>
> > > > > Subject: Re: [PATCH v7 08/16] i386: Expose module level in CPUID[0x1F]
> > > > > 
> > > > > > Yes, I think it's time to move to default 0x1f.
> > > > > 
> > > > > we don't need to do so until it's necessary.
> > > > 
> > > > Recent and future machines all support 0x1f, and at least SDM has
> > > > emphasized the preferred use of 0x1f.
> > > 
> > > The preference is the guideline for software e.g., OS. QEMU doesn't need to
> > > emulate cpuid leaf 0x1f to guest if there is only smt and core level.
> > 
> > Please, QEMU is emulating hardware not writing software.
> 
> what I want to conveyed was that, SDM is teaching software how to probe the
> cpu topology, not suggesting VMM how to advertise cpu topology to guest.
> 

This reflects the hardware's behavioral tendency. Additionally, given the
SDM's suggestion to prefer 0x1f, lots of new software may rely on 0x1f, so
making 0x1f the default leaf helps to enhance guest compatibility.

> 
> > Is there any
> > reason why we shouldn't emulate new and generic hardware behaviors and
> > stick with the old ones?
> 
> I didn't say we shouldn't, but we don't need to do it if it's unnecessary.

We're probably never going to deprecate 0x0b, and 0x1f is in fact replacing
0x0b, so it's kind of a timing question: when should 0x1f be enabled by
default?

Maybe for some new CPU models, or for -host, we can start making it the
default. This eliminates the difference in the CPU topology enumeration
interface between host and guest. What do you think?

> 
> if cpuid 0x1f is advertised to the guest by default, it will also introduce
> an inconsistency. An old product doesn't have cpuid 0x1f, but when QEMU is
> used to emulate that old product, the guest has it.

Yes, this is a similar case to 0x0b. Old machines don't have 0x0b, and
QEMU uses the cpuid-0xb option to resolve the compatibility issue.
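(For example, assuming the cpuid-0xb property keeps its current spelling, a
guest that should not see leaf 0xb can be started with something like:

    qemu-system-x86_64 -cpu Skylake-Client,cpuid-0xb=off ...

and the same shape of switch would presumably apply to a future 0x1f knob.)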

> 
> sure, we can have code to fix it that only exposes 0x1f to new enough cpu
> models. But it just makes things complicated.
> 
> > > because in this case, they are exactly the same in leaf 0xb and 0x1f. we don't
> > > need to bother advertising the duplicate data.
> > 
> > You can't "define" the same 0x0b and 0x1f as duplicates. SDM doesn't
> > have such the definition.
> 
> for QEMU, they are duplicate data that need to be maintained and passed to
> KVM by KVM_SET_CPUID. For the guest, it's also unnecessary, because cpuid
> leaf 0x1f doesn't provide any additional information.

I understand your concerns. The benefit is to follow the behavior of new
hardware and the spec's recommendations: on new machines people are going
to be more accustomed to using 0x1f to get the topology, and VMs on new
machines that don't have 0x1f will tend to cause confusion.

I could start by having a look at whether we could follow the host in -host
to enable 0x1f. If there isn't too much blocking it, -host is an acceptable
starting point; after all, there are no additional compatibility issues
in this case. ;-)

Thanks,
Zhao

> 
> The SDM keeps cpuid 0xb for backwards compatibility.
> 


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v7 10/16] i386/cpu: Introduce cluster-id to X86CPU
  2024-01-15 15:18             ` Zhao Liu
@ 2024-01-16 16:40               ` Xiaoyao Li
  2024-01-19  7:59                 ` Zhao Liu
  0 siblings, 1 reply; 68+ messages in thread
From: Xiaoyao Li @ 2024-01-16 16:40 UTC (permalink / raw)
  To: Zhao Liu
  Cc: Eduardo Habkost, Marcel Apfelbaum, Michael S . Tsirkin,
	Richard Henderson, Paolo Bonzini, Marcelo Tosatti, qemu-devel,
	kvm, Zhenyu Wang, Zhuocheng Ding, Zhao Liu, Babu Moger,
	Yongwei Ma, Philippe Mathieu-Daudé, Yanan Wang

On 1/15/2024 11:18 PM, Zhao Liu wrote:
> Hi Xiaoyao,
> 
> On Mon, Jan 15, 2024 at 03:45:58PM +0800, Xiaoyao Li wrote:
>> Date: Mon, 15 Jan 2024 15:45:58 +0800
>> From: Xiaoyao Li <xiaoyao.li@intel.com>
>> Subject: Re: [PATCH v7 10/16] i386/cpu: Introduce cluster-id to X86CPU
>>
>> On 1/15/2024 1:59 PM, Zhao Liu wrote:
>>> (Also cc "machine core" maintainers.)
>>>
>>> Hi Xiaoyao,
>>>
>>> On Mon, Jan 15, 2024 at 12:18:17PM +0800, Xiaoyao Li wrote:
>>>> Date: Mon, 15 Jan 2024 12:18:17 +0800
>>>> From: Xiaoyao Li <xiaoyao.li@intel.com>
>>>> Subject: Re: [PATCH v7 10/16] i386/cpu: Introduce cluster-id to X86CPU
>>>>
>>>> On 1/15/2024 11:27 AM, Zhao Liu wrote:
>>>>> On Sun, Jan 14, 2024 at 09:49:18PM +0800, Xiaoyao Li wrote:
>>>>>> Date: Sun, 14 Jan 2024 21:49:18 +0800
>>>>>> From: Xiaoyao Li <xiaoyao.li@intel.com>
>>>>>> Subject: Re: [PATCH v7 10/16] i386/cpu: Introduce cluster-id to X86CPU
>>>>>>
>>>>>> On 1/8/2024 4:27 PM, Zhao Liu wrote:
>>>>>>> From: Zhuocheng Ding <zhuocheng.ding@intel.com>
>>>>>>>
>>>>>>> Introduce cluster-id other than module-id to be consistent with
>>>>>>> CpuInstanceProperties.cluster-id, and this avoids the confusion
>>>>>>> of parameter names when hotplugging.
>>>>>>
>>>>>> I don't think reusing 'cluster' from arm for x86's 'module' is a good idea.
>>>>>> It introduces confusion around the code.
>>>>>
>>>>> There is a precedent: generic "socket" v.s. i386 "package".
>>>>
>>>> It's not the same thing. "socket" vs "package" is just software people and
>>>> hardware people chose different name. It's just different naming issue.
>>>
>>> No, it's a similar issue. Same physical device, different name only.
>>>
>>> Furthermore, the topology was introduced for resource layout and silicon
>>> fabrication, and similar design ideas and fabrication processes are fairly
>>> consistent across common current arches. Therefore, it is possible to
>>> abstract similar topological hierarchies for different arches.
>>>
>>>>
>>>> however, here it's reusing name issue while 'cluster' has been defined for
>>>> x86. It does introduce confusion.
>>>
>>> There's nothing fundamentally different between the x86 module and the
>>> generic cluster, is there? This is the reason that I don't agree with
>>> introducing "modules" in -smp.
>>
>> generic cluster just means the cluster of processors, i.e, a group of
>> cpus/lps. It is just a middle level between die and core.
> 
> Not sure if you mean the "cluster" device for TCG GDB? "cluster" device
> is different with "cluster" option in -smp.

No, I just mean the word 'cluster'. And I thought what you called 
"generic cluster" meant "a cluster of logical processors".

Below I quote the description of Yanan's commit 864c3b5c32f0:

     A cluster generally means a group of CPU cores which share L2 cache
     or other mid-level resources, and it is the shared resources that
     is used to improve scheduler's behavior. From the point of view of
     the size range, it's between CPU die and CPU core. For example, on
     some ARM64 Kunpeng servers, we have 6 clusters in each NUMA node,
     and 4 CPU cores in each cluster. The 4 CPU cores share a separate
     L2 cache and a L3 cache tag, which brings cache affinity advantage.
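(For concreteness, a cluster-aware -smp invocation on an ARM virt machine
looks roughly like:

    qemu-system-aarch64 -M virt -smp 16,sockets=1,clusters=4,cores=4,threads=1

which gives 4 cores per cluster, as in the Kunpeng layout quoted above.)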

What I get from it is that cluster is just a middle level between CPU die 
and CPU core. The CPU cores inside one cluster share some mid-level 
resource; L2 cache is just one example of such a shared resource. So it 
can be either the module level or the tile level in x86, or even the 
diegrp level you mentioned below.

> When Yanan introduced the "cluster" option in -smp, he mentioned that it
> is for sharing L2 and L3 tags, which roughly corresponds to our module.
> 
>>
>> It can be the module level in intel, or tile level. Further, if per die lp
>> number increases in the future, there might be more middle levels in intel
>> between die and core. Then at that time, how to decide what level should
>> cluster be mapped to?
> 
> Currently, there're 3 levels defined in SDM which are between die and
> core: diegrp, tile and module. In our products, L2 is just sharing on the
> module, so the intel's module and the general cluster are the best match.

You said 'generic cluster' a lot of times, but from my point of view 
you are referring to ARM's current cluster instead of a *generic* cluster.

Anyway, cluster is just a mid-level between die and core. We should not 
associate it with any specific resource. The level at which a resource 
is shared can change: e.g., initially the L3 cache was shared in a 
physical package; when multi-die got supported, the L3 cache was shared 
in one die; now, on AMD products, the L3 cache is shared in one complex, 
and one die can have 2 complexes and thus 2 separate L3 caches.

It doesn't matter whether we call it cluster, module, or xyz. It is just 
a name to represent a cpu topology level between die and core. What 
matters is that once it gets accepted, it becomes formal ABI for users: 
'cluster' means 'module' for x86 users. This is definitely a big source 
of confusion for people. Maybe people will try to figure out why, and 
find the reason is that 'cluster' means the level at which the L2 cache 
is shared, and that just happens to be the module level in x86. Maybe in 
the future, "L2 is shared in the module" gets changed, just like the 
example I gave for L3 above. Then that's really a big confusion, and all 
of this becomes the "historic reason" that cluster was chosen to 
represent module in x86.

> There are no commercially available machines for the other levels yet,
> so there's no way to ensure exactly what the future holds, but we should
> try to avoid fragmentation of the topology hierarchy and try to maintain
> the uniform and common topology hierarchies for QEMU.
> 
> Unless a new level for -smp is introduced in the future when an unsolvable
> problem is raised.
> 
>>
>>>>
>>>>> The direct definition of cluster is the level that is above the "core"
>>>>> and shares the hardware resources including L2. In this sense, arm's
>>>>> cluster is the same as x86's module.
>>>>
>>>> then, what about intel implements tile level in the future? why ARM's
>>>> 'cluster' is mapped to 'module', but not 'tile' ?
>>>
>>> This depends on the actual need.
>>>
>>> Module (for x86) and cluster (in general) are similar, and tile (for x86)
>>> is used for L3 in practice, so I use module rather than tile to map
>>> generic cluster.
>>>
>>> And, it should be noted that x86 module is mapped to the generic cluster,
>>> not to ARM's. It's just that currently only ARM is using the clusters
>>> option in -smp.
>>>
>>> I believe QEMU provides the abstract and unified topology hierarchies in
>>> -smp, not the arch-specific hierarchies.
>>>
>>>>
>>>> reusing 'cluster' for 'module' is just a bad idea.
>>>>
>>>>> Though different arches have different naming styles, but QEMU's generic
>>>>> code still need the uniform topology hierarchy.
>>>>
>>>> generic code can provide as many topology levels as it can. each ARCH can
>>>> choose to use the ones it supports.
>>>>
>>>> e.g.,
>>>>
>>>> in qapi/machine.json, it says,
>>>>
>>>> # The ordering from highest/coarsest to lowest/finest is:
>>>> # @drawers, @books, @sockets, @dies, @clusters, @cores, @threads.
>>>
>>> This ordering is well-defined...
>>>
>>>> #
>>>> # Different architectures support different subsets of topology
>>>> # containers.
>>>> #
>>>> # For example, s390x does not have clusters and dies, and the socket
>>>> # is the parent container of cores.
>>>>
>>>> we can update it to
>>>>
>>>> # The ordering from highest/coarsest to lowest/finest is:
>>>> # @drawers, @books, @sockets, @dies, @clusters, @module, @cores,
>>>> # @threads.
>>>
>>> ...but here it's impossible to figure out why cluster is above module,
>>> and even I can't come up with the difference between cluster and module.
>>>
>>>> #
>>>> # Different architectures support different subsets of topology
>>>> # containers.
>>>> #
>>>> # For example, s390x does not have clusters and dies, and the socket
>>>> # is the parent container of cores.
>>>> #
>>>> # For example, x86 does not have drawers and books, and does not support
>>>> # cluster.
>>>>
>>>> even if cluster of x86 is supported someday in the future, we can remove the
>>>> ordering requirement from above description.
>>>
>>> x86's cluster is above the package.
>>>
>>> To reserve this name for x86, we can't have the well-defined topology
>>> ordering.
>>>
>>> But topology ordering is necessary in generic code, and many
>>> calculations depend on the topology ordering.
>>
>> could you point me to the code?
> 
> Yes, e.g., there're 2 helpers: machine_topo_get_cores_per_socket() and
> machine_topo_get_threads_per_socket().

I see. These two helpers are fragile, in that they need to be updated 
every time a new level between core and socket is introduced.

Anyway, we can ensure the ordering for each ARCH, i.e., that the valid 
levels for any ARCH are ordered. E.g., we have

@drawers, @books, @sockets, @dies, @clusters, @module, @cores, @threads

defined,

for s390, the valid levels are

  @drawers, @books, @sockets, @cores, @threads

for arm, the valid levels are

  @sockets, @dies, @clusters, @cores, @threads
  (I'm not sure if die is supported for ARM?)

for x86, the valid levels are

  @sockets, @dies, @module, @cores, @threads

All of them are ordered. The unsupported levels in each ARCH just get 
value 1. That won't cause any issue in the calculation of the default 
values, but the two functions you pointed to may not be so lucky; anyway, 
they can be fixed when we really go with this approach. A sketch of the 
idea follows.
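(A sketch of the "unsupported levels default to 1" idea, restricted to the
within-socket levels; smp.modules as a generic field is hypothetical here,
since it does not exist yet:

    /* With every level present in SMPConfiguration and defaulted to 1,
     * the generic calculation stays uniform across arches. */
    unsigned threads_per_socket =
        ms->smp.dies * ms->smp.clusters * ms->smp.modules *
        ms->smp.cores * ms->smp.threads;

so an arch that never sets modules simply multiplies by 1.)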

>>
>>>>
>>>>>>
>>>>>> s390 just added 'drawer' and 'book' in cpu topology[1]. I think we can also
>>>>>> add a module level for x86 instead of reusing cluster.
>>>>>>
>>>>>> (This is also what I want to reply to the cover letter.)
>>>>>>
>>>>>> [1] https://lore.kernel.org/qemu-devel/20231016183925.2384704-1-nsg@linux.ibm.com/
>>>>>
>>>>> These two new levels have the clear topological hierarchy relationship
>>>>> and don't duplicate existing ones.
>>>>>
>>>>> "book" or "drawer" may correspond to intel's "cluster".
>>>>>
>>>>> Maybe, in the future, we could support for arch-specific alias topologies
>>>>> in -smp.
>>>>
>>>> I don't think we need alias, reusing 'cluster' for 'module' doesn't gain any
>>>> benefit except avoid adding a new field in SMPconfiguration. All the other
>>>> cluster code is ARM specific and x86 cannot share.
>>>
>>> The point is that there is no difference between intel module and general
>>> cluster...Considering only the naming issue, even AMD has the "complex" to
>>> correspond to the Intel's "module".
>>
>> does complex of AMD really match with intel module? L3 cache is shared in
>> one complex, while L2 cache is shared in one module for now.
> 
> If then it could correspond to intel's tile, which is after all a level
> below die.

So if AMD wants to add complex to the smp topology, where should the 
complex level be put? Between die and cluster?

> Thanks,
> Zhao
> 



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v7 10/16] i386/cpu: Introduce cluster-id to X86CPU
  2024-01-16 16:40               ` Xiaoyao Li
@ 2024-01-19  7:59                 ` Zhao Liu
  2024-01-26  3:37                   ` Zhao Liu
  0 siblings, 1 reply; 68+ messages in thread
From: Zhao Liu @ 2024-01-19  7:59 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Eduardo Habkost, Marcel Apfelbaum, Michael S . Tsirkin,
	Richard Henderson, Paolo Bonzini, Marcelo Tosatti, qemu-devel,
	kvm, Zhenyu Wang, Zhuocheng Ding, Zhao Liu, Babu Moger,
	Yongwei Ma, Philippe Mathieu-Daudé, Yanan Wang

On Wed, Jan 17, 2024 at 12:40:12AM +0800, Xiaoyao Li wrote:
> Date: Wed, 17 Jan 2024 00:40:12 +0800
> From: Xiaoyao Li <xiaoyao.li@intel.com>
> Subject: Re: [PATCH v7 10/16] i386/cpu: Introduce cluster-id to X86CPU
> 
> On 1/15/2024 11:18 PM, Zhao Liu wrote:
> > Hi Xiaoyao,
> > 
> > On Mon, Jan 15, 2024 at 03:45:58PM +0800, Xiaoyao Li wrote:
> > > Date: Mon, 15 Jan 2024 15:45:58 +0800
> > > From: Xiaoyao Li <xiaoyao.li@intel.com>
> > > Subject: Re: [PATCH v7 10/16] i386/cpu: Introduce cluster-id to X86CPU
> > > 
> > > On 1/15/2024 1:59 PM, Zhao Liu wrote:
> > > > (Also cc "machine core" maintainers.)
> > > >
> > > > Hi Xiaoyao,
> > > > 
> > > > On Mon, Jan 15, 2024 at 12:18:17PM +0800, Xiaoyao Li wrote:
> > > > > Date: Mon, 15 Jan 2024 12:18:17 +0800
> > > > > From: Xiaoyao Li <xiaoyao.li@intel.com>
> > > > > Subject: Re: [PATCH v7 10/16] i386/cpu: Introduce cluster-id to X86CPU
> > > > > 
> > > > > On 1/15/2024 11:27 AM, Zhao Liu wrote:
> > > > > > On Sun, Jan 14, 2024 at 09:49:18PM +0800, Xiaoyao Li wrote:
> > > > > > > Date: Sun, 14 Jan 2024 21:49:18 +0800
> > > > > > > From: Xiaoyao Li <xiaoyao.li@intel.com>
> > > > > > > Subject: Re: [PATCH v7 10/16] i386/cpu: Introduce cluster-id to X86CPU
> > > > > > > 
> > > > > > > On 1/8/2024 4:27 PM, Zhao Liu wrote:
> > > > > > > > From: Zhuocheng Ding <zhuocheng.ding@intel.com>
> > > > > > > > 
> > > > > > > > Introduce cluster-id other than module-id to be consistent with
> > > > > > > > CpuInstanceProperties.cluster-id, and this avoids the confusion
> > > > > > > > of parameter names when hotplugging.
> > > > > > > 
> > > > > > > I don't think reusing 'cluster' from arm for x86's 'module' is a good idea.
> > > > > > > It introduces confusion around the code.
> > > > > > 
> > > > > > There is a precedent: generic "socket" v.s. i386 "package".
> > > > > 
> > > > > It's not the same thing. "socket" vs "package" is just software people and
> > > > > hardware people chose different name. It's just different naming issue.
> > > > 
> > > > No, it's a similar issue. Same physical device, different name only.
> > > > 
> > > > Furthermore, the topology was introduced for resource layout and silicon
> > > > fabrication, and similar design ideas and fabrication processes are fairly
> > > > consistent across common current arches. Therefore, it is possible to
> > > > abstract similar topological hierarchies for different arches.
> > > > 
> > > > > 
> > > > > however, here it's reusing name issue while 'cluster' has been defined for
> > > > > x86. It does introduce confusion.
> > > > 
> > > > There's nothing fundamentally different between the x86 module and the
> > > > generic cluster, is there? This is the reason that I don't agree with
> > > > introducing "modules" in -smp.
> > > 
> > > generic cluster just means the cluster of processors, i.e, a group of
> > > cpus/lps. It is just a middle level between die and core.
> > 
> > Not sure if you mean the "cluster" device for TCG GDB? "cluster" device
> > is different with "cluster" option in -smp.
> 
> No, I just mean the word 'cluster'. And I thought what you called "generic
> cluster" means "a cluster of logical processors"
> 
> Below I quote the description of Yanan's commit 864c3b5c32f0:
> 
>     A cluster generally means a group of CPU cores which share L2 cache
>     or other mid-level resources, and it is the shared resources that
>     is used to improve scheduler's behavior. From the point of view of
>     the size range, it's between CPU die and CPU core. For example, on
>     some ARM64 Kunpeng servers, we have 6 clusters in each NUMA node,
>     and 4 CPU cores in each cluster. The 4 CPU cores share a separate
>     L2 cache and a L3 cache tag, which brings cache affinity advantage.
> 
> What I get from it, is, cluster is just a middle level between CPU die and
> CPU core.

Here the words "a group of CPU" is not the software concept, but a hardware
topology.

> The cpu cores inside one cluster shares some mid-level resource.
> L2 cache is just one example of the shared mid-level resource. So it can be
> either module level, or tile level in x86, or even the diegrp level you
> mentioned below.

In actual hardware design, ARM's cluster is close to Intel's module.

I'm not seeing any examples of clusters being similar to tile or diegrp.
Even within Intel, the hardware architects have agreed on definitions for
each level.

> 
> > When Yanan introduced the "cluster" option in -smp, he mentioned that it
> > is for sharing L2 and L3 tags, which roughly corresponds to our module.
> > 
> > > 
> > > It can be the module level in intel, or tile level. Further, if per die lp
> > > number increases in the future, there might be more middle levels in intel
> > > between die and core. Then at that time, how to decide what level should
> > > cluster be mapped to?
> > 
> > Currently, there're 3 levels defined in SDM which are between die and
> > core: diegrp, tile and module. In our products, L2 is just sharing on the
> > module, so the intel's module and the general cluster are the best match.
> 
> you said 'generic cluster' a lot of times. But from my point of view, you
> are referring to current ARM's cluster instead of *generic* cluster.

No, I'm always talking about the "cluster" in -smp, not the ARM-specific
cluster. ARM just maps its arch-specific cluster to the general one.

> 
> Anyway, cluster is just a mid-level between die and core. We should not
> associate it any specific resource. A resource is shared in what level can
> change, e.g., initially L3 cache is shared in a physical package. When
> multi-die got supported, L3 cache is shared in one die. Now, on AMD product,
> L3 cache is shared in one complex, and one die can have 2 complexs thus 2
> separate L3 cache in one die.
> It doesn't matter calling it cluster, or module, or xyz. It is just a name
> to represent a cpu topology level between die and core.

In the case of more complex topologies, QEMU may consider supporting
aliasing.

A vendor can support topology aliases for its own arch, but there is no
possibility of discarding QEMU's unified topology hierarchy in favor of
building arch-specific hierarchies.

> What matters is,
> once it gets accepted, it becomes formal ABI for users that 'cluster' means
> 'module' for x86 users. This is definitely a big confusion for people. Maybe
> people try to figure out why, and find the reason is that 'cluster' means
> the level at which L2 cache is shared and that's just the module level in
> x86 shares L2 cache. Maybe in the future, "L2 is shared in module" get
> changed just like the example I give for L3 above.

My decision to map modules to clusters is based on existing and future
product topology characteristics, which are supported by hardware
practice. Of course, in our upcoming cache topology patch series, users
will be able to customize the cache topology hierarchy.

> Then, that's really the big confusion,

I don't think this will confuse users. All the details can be explained
clearly in the documentation.

> and all this become the "historic reason" that cluster is
> chosen to represent module in x86.
> 
> > There are no commercially available machines for the other levels yet,
> > so there's no way to ensure exactly what the future holds, but we should
> > try to avoid fragmentation of the topology hierarchy and try to maintain
> > the uniform and common topology hierarchies for QEMU.
> > 
> > Unless a new level for -smp is introduced in the future when an unsolvable
> > problem is raised.
> > 
> > > 
> > > > > 
> > > > > > The direct definition of cluster is the level that is above the "core"
> > > > > > and shares the hardware resources including L2. In this sense, arm's
> > > > > > cluster is the same as x86's module.
> > > > > 
> > > > > then, what about intel implements tile level in the future? why ARM's
> > > > > 'cluster' is mapped to 'module', but not 'tile' ?
> > > > 
> > > > This depends on the actual need.
> > > > 
> > > > Module (for x86) and cluster (in general) are similar, and tile (for x86)
> > > > is used for L3 in practice, so I use module rather than tile to map
> > > > generic cluster.
> > > > 
> > > > And, it should be noted that x86 module is mapped to the generic cluster,
> > > > not to ARM's. It's just that currently only ARM is using the clusters
> > > > option in -smp.
> > > > 
> > > > I believe QEMU provides the abstract and unified topology hierarchies in
> > > > -smp, not the arch-specific hierarchies.
> > > > 
> > > > > 
> > > > > reusing 'cluster' for 'module' is just a bad idea.
> > > > > 
> > > > > > Though different arches have different naming styles, but QEMU's generic
> > > > > > code still need the uniform topology hierarchy.
> > > > > 
> > > > > generic code can provide as many topology levels as it can. each ARCH can
> > > > > choose to use the ones it supports.
> > > > > 
> > > > > e.g.,
> > > > > 
> > > > > in qapi/machine.json, it says,
> > > > > 
> > > > > # The ordering from highest/coarsest to lowest/finest is:
> > > > > # @drawers, @books, @sockets, @dies, @clusters, @cores, @threads.
> > > > 
> > > > This ordering is well-defined...
> > > > 
> > > > > #
> > > > > # Different architectures support different subsets of topology
> > > > > # containers.
> > > > > #
> > > > > # For example, s390x does not have clusters and dies, and the socket
> > > > > # is the parent container of cores.
> > > > > 
> > > > > we can update it to
> > > > > 
> > > > > # The ordering from highest/coarsest to lowest/finest is:
> > > > > # @drawers, @books, @sockets, @dies, @clusters, @module, @cores,
> > > > > # @threads.
> > > > 
> > > > ...but here it's impossible to figure out why cluster is above module,
> > > > and even I can't come up with the difference between cluster and module.
> > > > 
> > > > > #
> > > > > # Different architectures support different subsets of topology
> > > > > # containers.
> > > > > #
> > > > > # For example, s390x does not have clusters and dies, and the socket
> > > > > # is the parent container of cores.
> > > > > #
> > > > > # For example, x86 does not have drawers and books, and does not support
> > > > > # cluster.
> > > > > 
> > > > > even if cluster of x86 is supported someday in the future, we can remove the
> > > > > ordering requirement from above description.
> > > > 
> > > > x86's cluster is above the package.
> > > > 
> > > > To reserve this name for x86, we can't have the well-defined topology
> > > > ordering.
> > > > 
> > > > But topology ordering is necessary in generic code, and many
> > > > calculations depend on the topology ordering.
> > > 
> > > could you point me to the code?
> > 
> > Yes, e.g., there're 2 helpers: machine_topo_get_cores_per_socket() and
> > machine_topo_get_threads_per_socket().
> 
> I see. these two helpers are fragile, that they need to be updated every
> time new level between core and socket is introduced.
> 
> Anyway, we can ensure the order for each ARCH, that the valid levels for any
> ARCH are ordered. e.g., we have
> 
> @drawers, @books, @sockets, @dies, @clusters, @module, @cores, @threads

Sorry to repeat my previous objection: why cluster should be above 
module can't be well-defined. This is more confusing. It is not possible 
to add a new level without clearly defining the hierarchical 
relationship.

Different vendors have their own names; there is no reason or 
possibility to cram all the vendors' names into this arrangement, and it 
would not be maintainable.

> 
> defined,
> 
> for s390, the valid levels are
> 
>  @drawers, @books, @sockets, @cores, @threads
> 
> for arm, the valid levels are
> 
>  @sockets, @dies, @clusters, @cores, @threads
>  (I'm not sure if die is supported for ARM?)
> 
> for x86, the valid levels are
> 
>  @sockets, @dies, @module, @cores, @threads
> 
> All of them are ordered. those unsupported level in each ARCH just get value
> 1. It won't have any issue in the calculation for the default value, but you
> provided two functions may not be lucky. anyway, they can be fixed at the
> time when we really go this approach.
> 
> > > 
> > > > > 
> > > > > > > 
> > > > > > > s390 just added 'drawer' and 'book' in cpu topology[1]. I think we can also
> > > > > > > add a module level for x86 instead of reusing cluster.
> > > > > > > 
> > > > > > > (This is also what I want to reply to the cover letter.)
> > > > > > > 
> > > > > > > [1] https://lore.kernel.org/qemu-devel/20231016183925.2384704-1-nsg@linux.ibm.com/
> > > > > > 
> > > > > > These two new levels have the clear topological hierarchy relationship
> > > > > > and don't duplicate existing ones.
> > > > > > 
> > > > > > "book" or "drawer" may correspond to intel's "cluster".
> > > > > > 
> > > > > > Maybe, in the future, we could support for arch-specific alias topologies
> > > > > > in -smp.
> > > > > 
> > > > > I don't think we need alias, reusing 'cluster' for 'module' doesn't gain any
> > > > > benefit except avoid adding a new field in SMPconfiguration. All the other
> > > > > cluster code is ARM specific and x86 cannot share.
> > > > 
> > > > The point is that there is no difference between intel module and general
> > > > cluster...Considering only the naming issue, even AMD has the "complex" to
> > > > correspond to the Intel's "module".
> > > 
> > > does complex of AMD really match with intel module? L3 cache is shared in
> > > one complex, while L2 cache is shared in one module for now.
> > 
> > If then it could correspond to intel's tile, which is after all a level
> > below die.
> 
> So if AMD wants to add complex in smp topology, where should complex level
> get put? between die and cluster?

That's just an example, and it just shows that AMD and Intel have naming
differences even though they are both x86. We can't make everyone happy.

Thanks,
Zhao



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v7 05/16] i386: Decouple CPUID[0x1F] subleaf with specific topology level
  2024-01-11  3:19   ` Xiaoyao Li
  2024-01-11  9:07     ` Zhao Liu
@ 2024-01-23  9:56     ` Zhao Liu
  1 sibling, 0 replies; 68+ messages in thread
From: Zhao Liu @ 2024-01-23  9:56 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Eduardo Habkost, Marcel Apfelbaum, Michael S . Tsirkin,
	Richard Henderson, Paolo Bonzini, Marcelo Tosatti, qemu-devel,
	kvm, Zhenyu Wang, Zhuocheng Ding, Zhao Liu, Babu Moger,
	Yongwei Ma

Hi Xiaoyao,

On Thu, Jan 11, 2024 at 11:19:34AM +0800, Xiaoyao Li wrote:
> Date: Thu, 11 Jan 2024 11:19:34 +0800
> From: Xiaoyao Li <xiaoyao.li@intel.com>
> Subject: Re: [PATCH v7 05/16] i386: Decouple CPUID[0x1F] subleaf with
>  specific topology level
> 
> On 1/8/2024 4:27 PM, Zhao Liu wrote:
> > From: Zhao Liu <zhao1.liu@intel.com>
> > 
> > At present, the subleaf 0x02 of CPUID[0x1F] is bound to the "die" level.
> > 
> > In fact, the specific topology level exposed in 0x1F depends on the
> > platform's support for extension levels (module, tile and die).
> > 
> > To help expose "module" level in 0x1F, decouple CPUID[0x1F] subleaf
> > with specific topology level.
> > 
> > Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
> > Tested-by: Babu Moger <babu.moger@amd.com>
> > Tested-by: Yongwei Ma <yongwei.ma@intel.com>
> > Acked-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> > Changes since v3:
> >   * New patch to prepare to expose module level in 0x1F.
> >   * Move the CPUTopoLevel enumeration definition from "i386: Add cache
> >     topology info in CPUCacheInfo" to this patch. Note, to align with
> >     topology types in SDM, revert the name of CPU_TOPO_LEVEL_UNKNOW to
> >     CPU_TOPO_LEVEL_INVALID.
> > ---
> >   target/i386/cpu.c | 136 +++++++++++++++++++++++++++++++++++++---------
> >   target/i386/cpu.h |  15 +++++
> >   2 files changed, 126 insertions(+), 25 deletions(-)
> > 
> > diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> > index bc440477d13d..5c295c9a9e2d 100644
> > --- a/target/i386/cpu.c
> > +++ b/target/i386/cpu.c
> > @@ -269,6 +269,116 @@ static void encode_cache_cpuid4(CPUCacheInfo *cache,
> >              (cache->complex_indexing ? CACHE_COMPLEX_IDX : 0);
> >   }
> > +static uint32_t num_cpus_by_topo_level(X86CPUTopoInfo *topo_info,
> > +                                       enum CPUTopoLevel topo_level)
> > +{
> > +    switch (topo_level) {
> > +    case CPU_TOPO_LEVEL_SMT:
> > +        return 1;
> > +    case CPU_TOPO_LEVEL_CORE:
> > +        return topo_info->threads_per_core;
> > +    case CPU_TOPO_LEVEL_DIE:
> > +        return topo_info->threads_per_core * topo_info->cores_per_die;
> > +    case CPU_TOPO_LEVEL_PACKAGE:
> > +        return topo_info->threads_per_core * topo_info->cores_per_die *
> > +               topo_info->dies_per_pkg;
> > +    default:
> > +        g_assert_not_reached();
> > +    }
> > +    return 0;
> > +}
> > +
> > +static uint32_t apicid_offset_by_topo_level(X86CPUTopoInfo *topo_info,
> > +                                            enum CPUTopoLevel topo_level)
> > +{
> > +    switch (topo_level) {
> > +    case CPU_TOPO_LEVEL_SMT:
> > +        return 0;
> > +    case CPU_TOPO_LEVEL_CORE:
> > +        return apicid_core_offset(topo_info);
> > +    case CPU_TOPO_LEVEL_DIE:
> > +        return apicid_die_offset(topo_info);
> > +    case CPU_TOPO_LEVEL_PACKAGE:
> > +        return apicid_pkg_offset(topo_info);
> > +    default:
> > +        g_assert_not_reached();
> > +    }
> > +    return 0;
> > +}
> > +
> > +static uint32_t cpuid1f_topo_type(enum CPUTopoLevel topo_level)
> > +{
> > +    switch (topo_level) {
> > +    case CPU_TOPO_LEVEL_INVALID:
> > +        return CPUID_1F_ECX_TOPO_LEVEL_INVALID;
> > +    case CPU_TOPO_LEVEL_SMT:
> > +        return CPUID_1F_ECX_TOPO_LEVEL_SMT;
> > +    case CPU_TOPO_LEVEL_CORE:
> > +        return CPUID_1F_ECX_TOPO_LEVEL_CORE;
> > +    case CPU_TOPO_LEVEL_DIE:
> > +        return CPUID_1F_ECX_TOPO_LEVEL_DIE;
> > +    default:
> > +        /* Other types are not supported in QEMU. */
> > +        g_assert_not_reached();
> > +    }
> > +    return 0;
> > +}
> > +
> > +static void encode_topo_cpuid1f(CPUX86State *env, uint32_t count,
> > +                                X86CPUTopoInfo *topo_info,
> > +                                uint32_t *eax, uint32_t *ebx,
> > +                                uint32_t *ecx, uint32_t *edx)
> > +{
> > +    static DECLARE_BITMAP(topo_bitmap, CPU_TOPO_LEVEL_MAX);
> > +    X86CPU *cpu = env_archcpu(env);
> > +    unsigned long level, next_level;
> > +    uint32_t num_cpus_next_level, offset_next_level;
> 
> again, I dislike using the name "cpus" to represent the logical processor
> or thread. We can call it num_lps_next_level, or num_threads_next_level;
> 
> > +
> > +    /*
> > +     * Initialize the bitmap to decide which levels should be
> > +     * encoded in 0x1f.
> > +     */
> > +    if (!count) {
> 
> using a static bitmap and initializing it on (count == 0) looks bad to me.
> It relies heavily on the order in which encode_topo_cpuid1f() is called,
> and is fragile.
> 
> Instead, we can maintain an array in CPUX86State, e.g.,

In my practice, I have found this way to be rather tricky since...

> 
> --- a/target/i386/cpu.h
> +++ b/target/i386/cpu.h
> @@ -1904,6 +1904,8 @@ typedef struct CPUArchState {
> 
>      /* Number of dies within this CPU package. */
>      unsigned nr_dies;
> +
> +    unint8_t valid_cpu_topo[CPU_TOPO_LEVEL_MAX];
>  } CPUX86State;
> 

this array actually pre-binds the 0x1f subleaves to topology levels, so
this approach makes the array initialization stage difficult...

> 
> and initialize it as below, when initializing the env
> 
> env->valid_cpu_topo[0] = CPU_TOPO_LEVEL_SMT;
> env->valid_cpu_topo[1] = CPU_TOPO_LEVEL_CORE;
> if (env->nr_dies > 1) {
> 	env->valid_cpu_topo[2] = CPU_TOPO_LEVEL_DIE;
> }

... as here.

Based on the 0x1f encoding rule, with a module level we may need logic
like this:

// If there's a module level, encode it at ECX=2.
if (env->nr_modules > 1) {
    env->valid_cpu_topo[2] = CPU_TOPO_LEVEL_MODULE;
    if (env->nr_dies > 1) {
        env->valid_cpu_topo[3] = CPU_TOPO_LEVEL_DIE;
    }
} else if (env->nr_dies > 1) { // Otherwise, encode die directly.
    env->valid_cpu_topo[2] = CPU_TOPO_LEVEL_DIE;
}

Such case-by-case checking doesn't scale, and if more levels are
supported in the future, such as tiles, the whole check becomes unclean.
Am I understanding you correctly?

About the static bitmap: declaring it static is an optimization.
Because the count (ECX, e.g., ECX=N) means the Nth topology level, if we
didn't use a static variable we would need to iterate each time to find
the Nth level.

Since we know that the subleaves of 0x1f must be encoded sequentially,
the logic of this static code always holds.
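(For comparison, a non-static variant would have to rebuild the bitmap and
walk to the count-th valid level on every subleaf query, roughly like this
sketch:

    DECLARE_BITMAP(topo_bitmap, CPU_TOPO_LEVEL_MAX) = { 0 };
    unsigned long level;

    set_bit(CPU_TOPO_LEVEL_SMT, topo_bitmap);
    set_bit(CPU_TOPO_LEVEL_CORE, topo_bitmap);
    if (env->nr_dies > 1) {
        set_bit(CPU_TOPO_LEVEL_DIE, topo_bitmap);
    }

    /* Skip "count" levels to reach the one for this subleaf. */
    level = find_first_bit(topo_bitmap, CPU_TOPO_LEVEL_MAX);
    for (uint32_t i = 0; i < count && level != CPU_TOPO_LEVEL_MAX; i++) {
        level = find_next_bit(topo_bitmap, CPU_TOPO_LEVEL_MAX, level + 1);
    }

The static bitmap avoids redoing this walk for every ECX value.)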

What do you think?

Thanks,
Zhao

> 
> then in encode_topo_cpuid1f(), we can get level and next_level as
> 
> level = env->valid_cpu_topo[count];
> next_level = env->valid_cpu_topo[count + 1];
> 
> 
> > +        /* SMT and core levels are exposed in 0x1f leaf by default. */
> > +        set_bit(CPU_TOPO_LEVEL_SMT, topo_bitmap);
> > +        set_bit(CPU_TOPO_LEVEL_CORE, topo_bitmap);
> > +
> > +        if (env->nr_dies > 1) {
> > +            set_bit(CPU_TOPO_LEVEL_DIE, topo_bitmap);
> > +        }
> > +    }
> > +
> > +    *ecx = count & 0xff;
> > +    *edx = cpu->apic_id;
> > +
> > +    level = find_first_bit(topo_bitmap, CPU_TOPO_LEVEL_MAX);
> > +    if (level == CPU_TOPO_LEVEL_MAX) {
> > +        num_cpus_next_level = 0;
> > +        offset_next_level = 0;
> > +
> > +        /* Encode CPU_TOPO_LEVEL_INVALID into the last subleaf of 0x1f. */
> > +        level = CPU_TOPO_LEVEL_INVALID;
> > +    } else {
> > +        next_level = find_next_bit(topo_bitmap, CPU_TOPO_LEVEL_MAX, level + 1);
> > +        if (next_level == CPU_TOPO_LEVEL_MAX) {
> > +            next_level = CPU_TOPO_LEVEL_PACKAGE;
> > +        }
> > +
> > +        num_cpus_next_level = num_cpus_by_topo_level(topo_info, next_level);
> > +        offset_next_level = apicid_offset_by_topo_level(topo_info, next_level);
> > +    }
> > +
> > +    *eax = offset_next_level;
> > +    *ebx = num_cpus_next_level;
> > +    *ecx |= cpuid1f_topo_type(level) << 8;
> > +
> > +    assert(!(*eax & ~0x1f));
> > +    *ebx &= 0xffff; /* The count doesn't need to be reliable. */
> > +    if (level != CPU_TOPO_LEVEL_MAX) {
> > +        clear_bit(level, topo_bitmap);
> > +    }
> > +}
> > +
> >   /* Encode cache info for CPUID[0x80000005].ECX or CPUID[0x80000005].EDX */
> >   static uint32_t encode_cache_cpuid80000005(CPUCacheInfo *cache)
> >   {
> > @@ -6284,31 +6394,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
> >               break;
> >           }
> > -        *ecx = count & 0xff;
> > -        *edx = cpu->apic_id;
> > -        switch (count) {
> > -        case 0:
> > -            *eax = apicid_core_offset(&topo_info);
> > -            *ebx = topo_info.threads_per_core;
> > -            *ecx |= CPUID_1F_ECX_TOPO_LEVEL_SMT << 8;
> > -            break;
> > -        case 1:
> > -            *eax = apicid_die_offset(&topo_info);
> > -            *ebx = topo_info.cores_per_die * topo_info.threads_per_core;
> > -            *ecx |= CPUID_1F_ECX_TOPO_LEVEL_CORE << 8;
> > -            break;
> > -        case 2:
> > -            *eax = apicid_pkg_offset(&topo_info);
> > -            *ebx = cpus_per_pkg;
> > -            *ecx |= CPUID_1F_ECX_TOPO_LEVEL_DIE << 8;
> > -            break;
> > -        default:
> > -            *eax = 0;
> > -            *ebx = 0;
> > -            *ecx |= CPUID_1F_ECX_TOPO_LEVEL_INVALID << 8;
> > -        }
> > -        assert(!(*eax & ~0x1f));
> > -        *ebx &= 0xffff; /* The count doesn't need to be reliable. */
> > +        encode_topo_cpuid1f(env, count, &topo_info, eax, ebx, ecx, edx);
> >           break;
> >       case 0xD: {
> >           /* Processor Extended State */
> > diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> > index f47bad46db5e..9c78cfc3f322 100644
> > --- a/target/i386/cpu.h
> > +++ b/target/i386/cpu.h
> > @@ -1008,6 +1008,21 @@ uint64_t x86_cpu_get_supported_feature_word(FeatureWord w,
> >   #define CPUID_MWAIT_IBE     (1U << 1) /* Interrupts can exit capability */
> >   #define CPUID_MWAIT_EMX     (1U << 0) /* enumeration supported */
> > +/*
> > + * CPUTopoLevel is the general i386 topology hierarchical representation,
> > + * ordered by increasing hierarchical relationship.
> > + * Its enumeration value is not bound to the type value of Intel (CPUID[0x1F])
> > + * or AMD (CPUID[0x80000026]).
> > + */
> > +enum CPUTopoLevel {
> > +    CPU_TOPO_LEVEL_INVALID,
> > +    CPU_TOPO_LEVEL_SMT,
> > +    CPU_TOPO_LEVEL_CORE,
> > +    CPU_TOPO_LEVEL_DIE,
> > +    CPU_TOPO_LEVEL_PACKAGE,
> > +    CPU_TOPO_LEVEL_MAX,
> > +};
> > +
> >   /* CPUID[0xB].ECX level types */
> >   #define CPUID_B_ECX_TOPO_LEVEL_INVALID  0
> >   #define CPUID_B_ECX_TOPO_LEVEL_SMT      1
> 



* Re: [PATCH v7 10/16] i386/cpu: Introduce cluster-id to X86CPU
  2024-01-19  7:59                 ` Zhao Liu
@ 2024-01-26  3:37                   ` Zhao Liu
  0 siblings, 0 replies; 68+ messages in thread
From: Zhao Liu @ 2024-01-26  3:37 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Eduardo Habkost, Marcel Apfelbaum, Michael S . Tsirkin,
	Richard Henderson, Paolo Bonzini, Marcelo Tosatti, qemu-devel,
	kvm, Zhenyu Wang, Zhuocheng Ding, Zhao Liu, Babu Moger,
	Yongwei Ma, Philippe Mathieu-Daudé, Yanan Wang

Hi Xiaoyao,

> > > > generic cluster just means a cluster of processors, i.e., a group of
> > > > cpus/lps. It is just a middle level between die and core.
> > > 
> > > Not sure if you mean the "cluster" device for TCG GDB? The "cluster"
> > > device is different from the "cluster" option in -smp.
> > 
> > No, I just mean the word 'cluster'. And I thought what you called "generic
> > cluster" meant "a cluster of logical processors".
> > 
> > Below I quote the description of Yanan's commit 864c3b5c32f0:
> > 
> >     A cluster generally means a group of CPU cores which share L2 cache
> >     or other mid-level resources, and it is the shared resources that
> >     is used to improve scheduler's behavior. From the point of view of
> >     the size range, it's between CPU die and CPU core. For example, on
> >     some ARM64 Kunpeng servers, we have 6 clusters in each NUMA node,
> >     and 4 CPU cores in each cluster. The 4 CPU cores share a separate
> >     L2 cache and a L3 cache tag, which brings cache affinity advantage.
> > 
> > What I get from it, is, cluster is just a middle level between CPU die and
> > CPU core.
> 
> Here the words "a group of CPU" refer not to a software concept, but to
> the hardware topology.

When I found this material:

https://www.kernel.org/doc/Documentation/devicetree/bindings/cpu/cpu-topology.txt

I realized that the most essential difference between cluster and
module is that clusters support nesting, i.e., a cluster can contain
nested clusters as a layer of the CPU topology.

Even though QEMU's description of cluster looked similar to module when
it was introduced, it is impossible to foresee whether ARM/RISC-V and
other device-tree-based arches will introduce nested clusters in the
future.

To avoid potential conflicts, it would be better to introduce a module
level for x86 to differentiate it from clusters.
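
As a concrete illustration (the option spelling follows the existing
smp.clusters parameter, which this series wires up for the x86 PC
machine; the exact topology numbers are just an example):

-smp 16,sockets=1,dies=2,clusters=2,cores=2,threads=2

On x86 this would give each die two modules of two cores, with
"clusters" on the command line mapping to the module level internally
and showing up as the module type in CPUID[0x1F], while ARM keeps its
own cluster semantics for the same option.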

Thanks,
Zhao




end of thread

Thread overview: 68+ messages
2024-01-08  8:27 [PATCH v7 00/16] Support smp.clusters for x86 in QEMU Zhao Liu
2024-01-08  8:27 ` [PATCH v7 01/16] i386/cpu: Fix i/d-cache topology to core level for Intel CPU Zhao Liu
2024-01-08  8:27 ` [PATCH v7 02/16] i386/cpu: Use APIC ID offset to encode cache topo in CPUID[4] Zhao Liu
2024-01-10  9:31   ` Xiaoyao Li
2024-01-11  8:43     ` Zhao Liu
2024-01-14 14:11       ` Xiaoyao Li
2024-01-15  3:04         ` Zhao Liu
2024-01-15  3:51       ` Xiaoyao Li
2024-01-15  4:16         ` Zhao Liu
2024-01-08  8:27 ` [PATCH v7 03/16] i386/cpu: Consolidate the use of topo_info in cpu_x86_cpuid() Zhao Liu
2024-01-10 11:52   ` Xiaoyao Li
2024-01-11  8:46     ` Zhao Liu
2024-01-08  8:27 ` [PATCH v7 04/16] i386: Split topology types of CPUID[0x1F] from the definitions of CPUID[0xB] Zhao Liu
2024-01-08  8:27 ` [PATCH v7 05/16] i386: Decouple CPUID[0x1F] subleaf with specific topology level Zhao Liu
2024-01-11  3:19   ` Xiaoyao Li
2024-01-11  9:07     ` Zhao Liu
2024-01-23  9:56     ` Zhao Liu
2024-01-08  8:27 ` [PATCH v7 06/16] i386: Introduce module-level cpu topology to CPUX86State Zhao Liu
2024-01-08  8:27 ` [PATCH v7 07/16] i386: Support modules_per_die in X86CPUTopoInfo Zhao Liu
2024-01-11  5:53   ` Xiaoyao Li
2024-01-11  9:18     ` Zhao Liu
2024-01-08  8:27 ` [PATCH v7 08/16] i386: Expose module level in CPUID[0x1F] Zhao Liu
2024-01-11  6:04   ` Xiaoyao Li
2024-01-11  9:21     ` Zhao Liu
2024-01-15  3:25   ` Yuan Yao
2024-01-15  4:09     ` Zhao Liu
2024-01-15  4:34       ` Xiaoyao Li
2024-01-15  5:20         ` Yuan Yao
2024-01-15  6:20           ` Zhao Liu
2024-01-15  6:57             ` Yuan Yao
2024-01-15  7:20               ` Zhao Liu
2024-01-15  9:03                 ` Yuan Yao
2024-01-15  6:12         ` Zhao Liu
2024-01-15  6:11           ` Xiaoyao Li
2024-01-15  6:35             ` Zhao Liu
2024-01-15  7:16               ` Xiaoyao Li
2024-01-15 15:46                 ` Zhao Liu
2024-01-08  8:27 ` [PATCH v7 09/16] i386: Support module_id in X86CPUTopoIDs Zhao Liu
2024-01-14 12:42   ` Xiaoyao Li
2024-01-15  3:52     ` Zhao Liu
2024-01-08  8:27 ` [PATCH v7 10/16] i386/cpu: Introduce cluster-id to X86CPU Zhao Liu
2024-01-14 13:49   ` Xiaoyao Li
2024-01-15  3:27     ` Zhao Liu
2024-01-15  4:18       ` Xiaoyao Li
2024-01-15  5:59         ` Zhao Liu
2024-01-15  7:45           ` Xiaoyao Li
2024-01-15 15:18             ` Zhao Liu
2024-01-16 16:40               ` Xiaoyao Li
2024-01-19  7:59                 ` Zhao Liu
2024-01-26  3:37                   ` Zhao Liu
2024-01-08  8:27 ` [PATCH v7 11/16] tests: Add test case of APIC ID for module level parsing Zhao Liu
2024-01-08  8:27 ` [PATCH v7 12/16] hw/i386/pc: Support smp.clusters for x86 PC machine Zhao Liu
2024-01-08  8:27 ` [PATCH v7 13/16] i386: Add cache topology info in CPUCacheInfo Zhao Liu
2024-01-08  8:27 ` [PATCH v7 14/16] i386: Use CPUCacheInfo.share_level to encode CPUID[4] Zhao Liu
2024-01-14 14:31   ` Xiaoyao Li
2024-01-15  3:40     ` Zhao Liu
2024-01-15  4:25       ` Xiaoyao Li
2024-01-15  6:25         ` Zhao Liu
2024-01-15  7:00           ` Xiaoyao Li
2024-01-15 14:55             ` Zhao Liu
2024-01-08  8:27 ` [PATCH v7 15/16] i386: Use offsets get NumSharingCache for CPUID[0x8000001D].EAX[bits 25:14] Zhao Liu
2024-01-14 14:42   ` Xiaoyao Li
2024-01-15  3:48     ` Zhao Liu
2024-01-15  4:27       ` Xiaoyao Li
2024-01-15 14:54         ` Zhao Liu
2024-01-08  8:27 ` [PATCH v7 16/16] i386: Use CPUCacheInfo.share_level to encode " Zhao Liu
2024-01-08 17:46 ` [PATCH v7 00/16] Support smp.clusters for x86 in QEMU Moger, Babu
2024-01-09  1:48   ` Zhao Liu
