public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
From: Xiaoyao Li <xiaoyao.li@intel.com>
To: Zhao Liu <zhao1.liu@linux.intel.com>
Cc: Eduardo Habkost <eduardo@habkost.net>,
	Marcel Apfelbaum <marcel.apfelbaum@gmail.com>,
	"Michael S . Tsirkin" <mst@redhat.com>,
	Richard Henderson <richard.henderson@linaro.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Marcelo Tosatti <mtosatti@redhat.com>,
	qemu-devel@nongnu.org, kvm@vger.kernel.org,
	Zhenyu Wang <zhenyu.z.wang@intel.com>,
	Zhuocheng Ding <zhuocheng.ding@intel.com>,
	Zhao Liu <zhao1.liu@intel.com>,
	Robert Hoo <robert.hu@linux.intel.com>,
	Babu Moger <babu.moger@amd.com>,
	Yongwei Ma <yongwei.ma@intel.com>
Subject: Re: [PATCH v7 02/16] i386/cpu: Use APIC ID offset to encode cache topo in CPUID[4]
Date: Sun, 14 Jan 2024 22:11:59 +0800	[thread overview]
Message-ID: <a2ee40c0-a198-41cd-86af-7ef52e6d591f@intel.com> (raw)
In-Reply-To: <ZZ+qGfykupOEFPA2@intel.com>

On 1/11/2024 4:43 PM, Zhao Liu wrote:
> Hi Xiaoyao,
> 
> On Wed, Jan 10, 2024 at 05:31:28PM +0800, Xiaoyao Li wrote:
>> Date: Wed, 10 Jan 2024 17:31:28 +0800
>> From: Xiaoyao Li <xiaoyao.li@intel.com>
>> Subject: Re: [PATCH v7 02/16] i386/cpu: Use APIC ID offset to encode cache
>>   topo in CPUID[4]
>>
>> On 1/8/2024 4:27 PM, Zhao Liu wrote:
>>> From: Zhao Liu <zhao1.liu@intel.com>
>>>
>>> Refer to the fixes of cache_info_passthrough ([1], [2]) and SDM, the
>>> CPUID.04H:EAX[bits 25:14] and CPUID.04H:EAX[bits 31:26] should use the
>>> nearest power-of-2 integer.
>>>
>>> The nearest power-of-2 integer can be calculated by pow2ceil() or by
>>> using APIC ID offset (like L3 topology using 1 << die_offset [3]).
>>>
>>> But in fact, CPUID.04H:EAX[bits 25:14] and CPUID.04H:EAX[bits 31:26]
>>> are associated with APIC ID. For example, in linux kernel, the field
>>> "num_threads_sharing" (Bits 25 - 14) is parsed with APIC ID.
>>
>> And for
>>> another example, on Alder Lake P, the CPUID.04H:EAX[bits 31:26] is not
>>> matched with actual core numbers and it's calculated by:
>>> "(1 << (pkg_offset - core_offset)) - 1".
>>
>> could you elaborate it more? what is the value of actual core numbers on
>> Alder lake P? and what is the pkg_offset and core_offset?
> 
> For example, the following's the CPUID dump of an ADL-S machine:
> 
> CPUID.04H:
> 
> 0x00000004 0x00: eax=0xfc004121 ebx=0x01c0003f ecx=0x0000003f edx=0x00000000
> 0x00000004 0x01: eax=0xfc004122 ebx=0x01c0003f ecx=0x0000007f edx=0x00000000
> 0x00000004 0x02: eax=0xfc01c143 ebx=0x03c0003f ecx=0x000007ff edx=0x00000000
> 0x00000004 0x03: eax=0xfc1fc163 ebx=0x0240003f ecx=0x00009fff edx=0x00000004
> 0x00000004 0x04: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
> 
> 
> CPUID.1FH:
> 
> 0x0000001f 0x00: eax=0x00000001 ebx=0x00000001 ecx=0x00000100 edx=0x0000004c
> 0x0000001f 0x01: eax=0x00000007 ebx=0x00000014 ecx=0x00000201 edx=0x0000004c
> 0x0000001f 0x02: eax=0x00000000 ebx=0x00000000 ecx=0x00000002 edx=0x0000004c
> 
> The CPUID.04H:EAX[bits 31:26] is 63.
>  From CPUID.1FH.00H:EAX[bits 04:00], the core_offset is 1, and from
> CPUID.1FH.01H:EAX[bits 04:00], the pkg_offset is 7.
> 
> Thus we can verify that the above equation as:
> 
> 1 << (0x7 - 0x1) - 1 = 63.
> 
> "Maximum number of addressable IDs" refers to the maximum number of IDs
> that can be enumerated in the APIC ID's topology layout, which does not
> necessarily correspond to the actual number of topology domains.
> 
>>
>>> Therefore the offset of APIC ID should be preferred to calculate nearest
>>> power-of-2 integer for CPUID.04H:EAX[bits 25:14] and CPUID.04H:EAX[bits
>>> 31:26]:
>>> 1. d/i cache is shared in a core, 1 << core_offset should be used
>>>      instand of "cs->nr_threads" in encode_cache_cpuid4() for
>>
>> /s/instand/instead
> 
> Thanks!
> 
>>
>>>      CPUID.04H.00H:EAX[bits 25:14] and CPUID.04H.01H:EAX[bits 25:14].
>>> 2. L2 cache is supposed to be shared in a core as for now, thereby
>>>      1 << core_offset should also be used instand of "cs->nr_threads" in
>>
>> ditto
> 
> Okay.
> 
>>
>>>      encode_cache_cpuid4() for CPUID.04H.02H:EAX[bits 25:14].
>>> 3. Similarly, the value for CPUID.04H:EAX[bits 31:26] should also be
>>>      calculated with the bit width between the Package and SMT levels in
>>>      the APIC ID (1 << (pkg_offset - core_offset) - 1).
>>>
>>> In addition, use APIC ID offset to replace "pow2ceil()" for
>>> cache_info_passthrough case.
>>>
>>> [1]: efb3934adf9e ("x86: cpu: make sure number of addressable IDs for processor cores meets the spec")
>>> [2]: d7caf13b5fcf ("x86: cpu: fixup number of addressable IDs for logical processors sharing cache")
>>> [3]: d65af288a84d ("i386: Update new x86_apicid parsing rules with die_offset support")
>>>
>>> Fixes: 7e3482f82480 ("i386: Helpers to encode cache information consistently")
>>> Suggested-by: Robert Hoo <robert.hu@linux.intel.com>
>>> Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
>>> Tested-by: Babu Moger <babu.moger@amd.com>
>>> Tested-by: Yongwei Ma <yongwei.ma@intel.com>
>>> Acked-by: Michael S. Tsirkin <mst@redhat.com>
>>> ---
>>> Changes since v3:
>>>    * Fix compile warnings. (Babu)
>>>    * Fix spelling typo.
>>>
>>> Changes since v1:
>>>    * Use APIC ID offset to replace "pow2ceil()" for cache_info_passthrough
>>>      case. (Yanan)
>>>    * Split the L1 cache fix into a separate patch.
>>>    * Rename the title of this patch (the original is "i386/cpu: Fix number
>>>      of addressable IDs in CPUID.04H").
>>> ---
>>>    target/i386/cpu.c | 30 +++++++++++++++++++++++-------
>>>    1 file changed, 23 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
>>> index 5a3678a789cf..c8d2a585723a 100644
>>> --- a/target/i386/cpu.c
>>> +++ b/target/i386/cpu.c
>>> @@ -6014,7 +6014,6 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
>>>    {
>>>        X86CPU *cpu = env_archcpu(env);
>>>        CPUState *cs = env_cpu(env);
>>> -    uint32_t die_offset;
>>>        uint32_t limit;
>>>        uint32_t signature[3];
>>>        X86CPUTopoInfo topo_info;
>>> @@ -6098,39 +6097,56 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
>>>                    int host_vcpus_per_cache = 1 + ((*eax & 0x3FFC000) >> 14);
>>>                    int vcpus_per_socket = cs->nr_cores * cs->nr_threads;
>>>                    if (cs->nr_cores > 1) {
>>> +                    int addressable_cores_offset =
>>> +                                                apicid_pkg_offset(&topo_info) -
>>> +                                                apicid_core_offset(&topo_info);
>>> +
>>>                        *eax &= ~0xFC000000;
>>> -                    *eax |= (pow2ceil(cs->nr_cores) - 1) << 26;
>>> +                    *eax |= (1 << (addressable_cores_offset - 1)) << 26;
>>
>> it should be ((1 << addressable_cores_offset) - 1) << 26
> 
> Good catch! The helper wrapped in a subsequent patch masks the error here.
> 
>>
>> I think naming it addressable_cores_width is better than
>> addressable_cores_offset. It's not offset because offset means the bit
>> position from bit 0.
> 
> I agree, "width" is better.
> 
>>
>> And we can get the width by another algorithm:
>>
>> int addressable_cores_width = apicid_core_width(&topo_info) +
>> apicid_die_width(&topo_info);
>> *eax |= ((1 << addressable_cores_width) - 1)) << 26;
> 
> This algorithm lacks flexibility because there will be more topology
> levels between package and core, such as the cluster being introduced...
> 
> Using "addressable_cores_width" is clear enough.
> 
>> 		
>>>                    }
>>>                    if (host_vcpus_per_cache > vcpus_per_socket) {
>>> +                    int pkg_offset = apicid_pkg_offset(&topo_info);
>>> +
>>>                        *eax &= ~0x3FFC000;
>>> -                    *eax |= (pow2ceil(vcpus_per_socket) - 1) << 14;
>>> +                    *eax |= (1 << (pkg_offset - 1)) << 14;
>>
>> Ditto, ((1 << pkg_offset) - 1) << 14
> 
> Thanks!
> 
>>
>> For this one, I think pow2ceil(vcpus_per_socket) is better. Because it's
>> intuitive that when host_vcpus_per_cache > vcpus_per_socket, we expose
>> vcpus_per_cache (configured by users) to VM.
> 
> I tend to use a uniform calculation that is less confusing and easier to
> maintain. 

less confusing?

the original code is

	if (host_vcpus_per_cache > vcpus_per_socket) {
		*eax |= (pow2ceil(vcpus_per_socket) - 1) << 14;
	}

and this patch is going to change it to

	if (host_vcpus_per_cache > vcpus_per_socket) {
		int pkg_offset = apicid_pkg_offset(&topo_info);
		*eax |= (1 << pkg_offset - 1)) << 14;
	}

Apparently, the former is clearer that everyone knows what is wants to 
do is "when guest's total vcpus_per_socket is even smaller than host's 
vcpu_per_cache, using guest's configuration". While the latter is more 
confusing.

> Since this field encodes "Maximum number of addressable IDs",
> OS can't get the exact number of CPUs/vCPUs sharing L3 from here, it can
> only know that L3 is shared at the package level.

It doesn't matter with L3. What the code want to fulfill is that,

host_vcpus_per_cache is the actual number of LPs that share this level 
of cache. While vcpus_per_socket is the maximum numbere of LPs that can 
share a cache (at any level) in guest. When guest's maximum number is 
even smaller than host's, use guest's value.

> Thanks,
> Zhao
> 


  reply	other threads:[~2024-01-14 14:12 UTC|newest]

Thread overview: 68+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-08  8:27 [PATCH v7 00/16] Support smp.clusters for x86 in QEMU Zhao Liu
2024-01-08  8:27 ` [PATCH v7 01/16] i386/cpu: Fix i/d-cache topology to core level for Intel CPU Zhao Liu
2024-01-08  8:27 ` [PATCH v7 02/16] i386/cpu: Use APIC ID offset to encode cache topo in CPUID[4] Zhao Liu
2024-01-10  9:31   ` Xiaoyao Li
2024-01-11  8:43     ` Zhao Liu
2024-01-14 14:11       ` Xiaoyao Li [this message]
2024-01-15  3:04         ` Zhao Liu
2024-01-15  3:51       ` Xiaoyao Li
2024-01-15  4:16         ` Zhao Liu
2024-01-08  8:27 ` [PATCH v7 03/16] i386/cpu: Consolidate the use of topo_info in cpu_x86_cpuid() Zhao Liu
2024-01-10 11:52   ` Xiaoyao Li
2024-01-11  8:46     ` Zhao Liu
2024-01-08  8:27 ` [PATCH v7 04/16] i386: Split topology types of CPUID[0x1F] from the definitions of CPUID[0xB] Zhao Liu
2024-01-08  8:27 ` [PATCH v7 05/16] i386: Decouple CPUID[0x1F] subleaf with specific topology level Zhao Liu
2024-01-11  3:19   ` Xiaoyao Li
2024-01-11  9:07     ` Zhao Liu
2024-01-23  9:56     ` Zhao Liu
2024-01-08  8:27 ` [PATCH v7 06/16] i386: Introduce module-level cpu topology to CPUX86State Zhao Liu
2024-01-08  8:27 ` [PATCH v7 07/16] i386: Support modules_per_die in X86CPUTopoInfo Zhao Liu
2024-01-11  5:53   ` Xiaoyao Li
2024-01-11  9:18     ` Zhao Liu
2024-01-08  8:27 ` [PATCH v7 08/16] i386: Expose module level in CPUID[0x1F] Zhao Liu
2024-01-11  6:04   ` Xiaoyao Li
2024-01-11  9:21     ` Zhao Liu
2024-01-15  3:25   ` Yuan Yao
2024-01-15  4:09     ` Zhao Liu
2024-01-15  4:34       ` Xiaoyao Li
2024-01-15  5:20         ` Yuan Yao
2024-01-15  6:20           ` Zhao Liu
2024-01-15  6:57             ` Yuan Yao
2024-01-15  7:20               ` Zhao Liu
2024-01-15  9:03                 ` Yuan Yao
2024-01-15  6:12         ` Zhao Liu
2024-01-15  6:11           ` Xiaoyao Li
2024-01-15  6:35             ` Zhao Liu
2024-01-15  7:16               ` Xiaoyao Li
2024-01-15 15:46                 ` Zhao Liu
2024-01-08  8:27 ` [PATCH v7 09/16] i386: Support module_id in X86CPUTopoIDs Zhao Liu
2024-01-14 12:42   ` Xiaoyao Li
2024-01-15  3:52     ` Zhao Liu
2024-01-08  8:27 ` [PATCH v7 10/16] i386/cpu: Introduce cluster-id to X86CPU Zhao Liu
2024-01-14 13:49   ` Xiaoyao Li
2024-01-15  3:27     ` Zhao Liu
2024-01-15  4:18       ` Xiaoyao Li
2024-01-15  5:59         ` Zhao Liu
2024-01-15  7:45           ` Xiaoyao Li
2024-01-15 15:18             ` Zhao Liu
2024-01-16 16:40               ` Xiaoyao Li
2024-01-19  7:59                 ` Zhao Liu
2024-01-26  3:37                   ` Zhao Liu
2024-01-08  8:27 ` [PATCH v7 11/16] tests: Add test case of APIC ID for module level parsing Zhao Liu
2024-01-08  8:27 ` [PATCH v7 12/16] hw/i386/pc: Support smp.clusters for x86 PC machine Zhao Liu
2024-01-08  8:27 ` [PATCH v7 13/16] i386: Add cache topology info in CPUCacheInfo Zhao Liu
2024-01-08  8:27 ` [PATCH v7 14/16] i386: Use CPUCacheInfo.share_level to encode CPUID[4] Zhao Liu
2024-01-14 14:31   ` Xiaoyao Li
2024-01-15  3:40     ` Zhao Liu
2024-01-15  4:25       ` Xiaoyao Li
2024-01-15  6:25         ` Zhao Liu
2024-01-15  7:00           ` Xiaoyao Li
2024-01-15 14:55             ` Zhao Liu
2024-01-08  8:27 ` [PATCH v7 15/16] i386: Use offsets get NumSharingCache for CPUID[0x8000001D].EAX[bits 25:14] Zhao Liu
2024-01-14 14:42   ` Xiaoyao Li
2024-01-15  3:48     ` Zhao Liu
2024-01-15  4:27       ` Xiaoyao Li
2024-01-15 14:54         ` Zhao Liu
2024-01-08  8:27 ` [PATCH v7 16/16] i386: Use CPUCacheInfo.share_level to encode " Zhao Liu
2024-01-08 17:46 ` [PATCH v7 00/16] Support smp.clusters for x86 in QEMU Moger, Babu
2024-01-09  1:48   ` Zhao Liu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a2ee40c0-a198-41cd-86af-7ef52e6d591f@intel.com \
    --to=xiaoyao.li@intel.com \
    --cc=babu.moger@amd.com \
    --cc=eduardo@habkost.net \
    --cc=kvm@vger.kernel.org \
    --cc=marcel.apfelbaum@gmail.com \
    --cc=mst@redhat.com \
    --cc=mtosatti@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=richard.henderson@linaro.org \
    --cc=robert.hu@linux.intel.com \
    --cc=yongwei.ma@intel.com \
    --cc=zhao1.liu@intel.com \
    --cc=zhao1.liu@linux.intel.com \
    --cc=zhenyu.z.wang@intel.com \
    --cc=zhuocheng.ding@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox