From: Xiaoyao Li <xiaoyao.li@intel.com>
To: Chuang Xu <xuchuangxclwt@bytedance.com>
Cc: pbonzini@redhat.com, imammedo@redhat.com,
xieyongji@bytedance.com, chaiwen.cc@bytedance.com,
zhao1.liu@intel.com, qemu-stable@nongnu.org,
Guixiong Wei <weiguixiong@bytedance.com>,
Yipeng Yin <yinyipeng@bytedance.com>,
qemu-devel@nongnu.org
Subject: Re: [PATCH v6] i386/cpu: fixup number of addressable IDs for logical processors in the physical package
Date: Mon, 14 Oct 2024 09:32:45 +0800 [thread overview]
Message-ID: <34bc76ae-4d8f-430e-a6f5-5e4c73606644@intel.com> (raw)
In-Reply-To: <a48fcd78-d1c4-4359-bc18-d04147a93f50@intel.com>
On 10/14/2024 8:36 AM, Xiaoyao Li wrote:
> On 10/12/2024 5:35 PM, Chuang Xu wrote:
>>
>> On 10/12/24 下午4:21, Xiaoyao Li wrote:
>>> On 10/9/2024 11:56 AM, Chuang Xu wrote:
>>>> When QEMU is started with:
>>>> -cpu host,migratable=on,host-cache-info=on,l3-cache=off
>>>> -smp 180,sockets=2,dies=1,cores=45,threads=2
>>>>
>>>> On Intel platform:
>>>> CPUID.01H.EBX[23:16] is defined as "max number of addressable IDs for
>>>> logical processors in the physical package".
>>>>
>>>> When executing "cpuid -1 -l 1 -r" in the guest, we obtain a value of
>>>> 90 for
>>>> CPUID.01H.EBX[23:16], whereas the expected value is 128. Additionally,
>>>> executing "cpuid -1 -l 4 -r" in the guest yields a value of 63 for
>>>> CPUID.04H.EAX[31:26], which matches the expected result.
>>>>
>>>> As (1+CPUID.04H.EAX[31:26]) rounds up to the nearest power-of-2
>>>> integer,
>>>> we'd beter round up CPUID.01H.EBX[23:16] to the nearest power-of-2
>>>> integer too. Otherwise we may encounter unexpected results in guest.
>>>>
>>>> For example, when QEMU is started with CLI above and xtopology is
>>>> disabled,
>>>> guest kernel 5.15.120 uses CPUID.01H.EBX[23:16]/
>>>> (1+CPUID.04H.EAX[31:26]) to
>>>> calculate threads-per-core in detect_ht(). Then guest will get "90/
>>>> (1+63)=1"
>>>> as the result, even though threads-per-core should actually be 2.
>>>
>>> It's kernel's bug instead.
>>>
>>> In 1.5.3 "Sub ID Extraction Parameters for initial APIC ID" of "Intel
>>> 64 Architecture Processor Topology Enumeration" [1], it is
>>>
>>> - SMT_Mask_Width = Log2(RoundToNearestPof2(CPUID.1:EBX[23:16])/
>>> (CPUID.(EAX=4,ECX=0):EAX[31:26]) + 1))
>>>
>>> The value of CPUID.1:EBX[23:16] needs to be *rounded* to the neartest
>>> power-of-two integer instead of itself being the power-of-two.
>>>
>>> This also is consistency with the SDM, where the comment for bit
>>> 23-16 of CPUID.1:EBX is:
>>>
>>> The nearest power-of-2 integer that is not smaller than EBX[23:16] is
>>> the number of unique initial APIC IDs reserved for addressing
>>> different logical processors in a physical package.
>>>
>>> What I read from this is, the nearest power-of-2 integer that is not
>>> smaller than EBX[23:16] is a different thing than EBX[23:16]. i.e.,
>>
>> Yes, when I read sdm, I also thought it was a kernel bug. But on my
>> 192ht spr host, the value of CPUID.1:EBX[23:16] was indeed rounded up
>>
>> to the nearest power of 2 by the hardware. After communicating with
>> Intel technical support staff, we thought that perhaps the description
>> in sdm
>>
>> is not so accurate, and rounding up CPUID.1:EBX[23:16] to the power of
>> 2 in qemu may be more in line with the hardware behavior.
>
> I think above justification is important. We need to justify our changes
> with the fact and correct reason.
>
> I somehow agree to set EBX[23:16] to a value of power-of-2, because the
> 1.5.3 "Sub ID Extraction Parameters for initial APIC ID" of "Intel 64
> Architecture Processor Topology Enumeration" spec says
>
> CPUID.1:EBX[23:16] represents the maximum number of addressable IDs
> (initial APIC ID) that can be assigned to logical processors in a
> physical package. The value may not be the same as the number of
> logical processors that are present in the hardware of a physical
> package.
>
> It uses the word "may not".
>
> However, the justification of the change cannot be "it leads to
> unexpected results in guest" because the guest implementation is not
> correct.
FYI, latest linux already fix the issue, it calculates the shift via
tscan->ebx1_nproc_shift = get_count_order(ebx.nproc);
>>>
>>> - EBX[23:16]: Maximum number of addressable IDs for logical processors
>>> in this physical package
>>>
>>> - pow2ceil(EBX[23:16]): the number of unique initial APIC IDs reserved
>>> for addressing different logical processors in a physical package.
>>>
>>> [1] https://cdrdv2-public.intel.com/759067/intel-64-architecture-
>>> processor-topology-enumeration.pdf
>>>
>>>> And on AMD platform:
>>>> CPUID.01H.EBX[23:16] is defined as "Logical processor count". Current
>>>> result meets our expectation.
>>>>
>>>> So let us round up CPUID.01H.EBX[23:16] to the nearest power-of-2
>>>> integer
>>>> only for Intel platform to solve the unexpected result.
>>>>
>>>> Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
>>>> Acked-by: Igor Mammedov <imammedo@redhat.com>
>>>> Signed-off-by: Guixiong Wei <weiguixiong@bytedance.com>
>>>> Signed-off-by: Yipeng Yin <yinyipeng@bytedance.com>
>>>> Signed-off-by: Chuang Xu <xuchuangxclwt@bytedance.com>
>>>> ---
>>>> target/i386/cpu.c | 10 +++++++++-
>>>> 1 file changed, 9 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
>>>> index ff227a8c5c..641d4577b0 100644
>>>> --- a/target/i386/cpu.c
>>>> +++ b/target/i386/cpu.c
>>>> @@ -6462,7 +6462,15 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t
>>>> index, uint32_t count,
>>>> }
>>>> *edx = env->features[FEAT_1_EDX];
>>>> if (threads_per_pkg > 1) {
>>>> - *ebx |= threads_per_pkg << 16;
>>>> + /*
>>>> + * AMD requires logical processor count, but Intel
>>>> needs maximum
>>>> + * number of addressable IDs for logical processors per
>>>> package.
>>>> + */
>>>> + if (cpu->vendor_cpuid_only && IS_AMD_CPU(env)) {
>>>> + *ebx |= threads_per_pkg << 16;
>>>> + } else {
>>>> + *ebx |= 1 << apicid_pkg_offset(&topo_info) << 16;
>>>> + }
>>>> *edx |= CPUID_HT;
>>>> }
>>>> if (!cpu->enable_pmu) {
>>>
>
>
next prev parent reply other threads:[~2024-10-14 1:33 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-10-09 3:56 [PATCH v6] i386/cpu: fixup number of addressable IDs for logical processors in the physical package Chuang Xu
2024-10-09 4:21 ` Zhao Liu
2024-10-12 7:13 ` Xiaoyao Li
2024-10-12 8:10 ` Chuang Xu
2024-10-12 8:32 ` Xiaoyao Li
2024-10-12 8:56 ` Zhao Liu
2024-10-12 8:21 ` Xiaoyao Li
2024-10-12 9:28 ` Zhao Liu
2024-10-12 9:35 ` Chuang Xu
2024-10-14 0:36 ` Xiaoyao Li
2024-10-14 1:32 ` Xiaoyao Li [this message]
2024-10-14 3:36 ` Zhao Liu
2024-10-17 8:18 ` Xiaoyao Li
2024-10-17 9:03 ` Zhao Liu
2024-10-28 16:07 ` Xiaoyao Li
2024-12-03 7:33 ` Zhao Liu
2024-12-03 15:04 ` Xiaoyao Li
2024-12-03 15:35 ` Zhao Liu
2024-12-03 15:29 ` Daniel P. Berrangé
2024-12-03 7:36 ` Zhao Liu
2024-12-03 7:29 ` Chuang Xu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=34bc76ae-4d8f-430e-a6f5-5e4c73606644@intel.com \
--to=xiaoyao.li@intel.com \
--cc=chaiwen.cc@bytedance.com \
--cc=imammedo@redhat.com \
--cc=pbonzini@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=qemu-stable@nongnu.org \
--cc=weiguixiong@bytedance.com \
--cc=xieyongji@bytedance.com \
--cc=xuchuangxclwt@bytedance.com \
--cc=yinyipeng@bytedance.com \
--cc=zhao1.liu@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).