qemu-devel.nongnu.org archive mirror
From: Xiaoyao Li <xiaoyao.li@intel.com>
To: Zhao Liu <zhao1.liu@intel.com>
Cc: Chuang Xu <xuchuangxclwt@bytedance.com>,
	pbonzini@redhat.com, imammedo@redhat.com,
	xieyongji@bytedance.com, chaiwen.cc@bytedance.com,
	qemu-stable@nongnu.org, Guixiong Wei <weiguixiong@bytedance.com>,
	Yipeng Yin <yinyipeng@bytedance.com>,
	qemu-devel@nongnu.org
Subject: Re: [PATCH v6] i386/cpu: fixup number of addressable IDs for logical processors in the physical package
Date: Thu, 17 Oct 2024 16:18:06 +0800
Message-ID: <bbcfcbbd-1666-4e97-ae18-f47202d89009@intel.com>
In-Reply-To: <ZwyRsq4EIooifRvb@intel.com>

On 10/14/2024 11:36 AM, Zhao Liu wrote:
>>>> On 10/9/2024 11:56 AM, Chuang Xu wrote:
>>>>> When QEMU is started with:
>>>>> -cpu host,migratable=on,host-cache-info=on,l3-cache=off
>>>>> -smp 180,sockets=2,dies=1,cores=45,threads=2
>>>>>
>>>>> On Intel platform:
>>>>> CPUID.01H.EBX[23:16] is defined as "max number of addressable IDs for
>>>>> logical processors in the physical package".
>>>>>
>>>>> When executing "cpuid -1 -l 1 -r" in the guest, we obtain a value of 90 for
>>>>> CPUID.01H.EBX[23:16], whereas the expected value is 128. Additionally,
>>>>> executing "cpuid -1 -l 4 -r" in the guest yields a value of 63 for
>>>>> CPUID.04H.EAX[31:26], which matches the expected result.
>>>>>
>>>>> As (1+CPUID.04H.EAX[31:26]) is rounded up to the nearest power-of-2
>>>>> integer, we'd better round up CPUID.01H.EBX[23:16] to the nearest
>>>>> power-of-2 integer too. Otherwise we may encounter unexpected results
>>>>> in the guest.
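A minimal C sketch of the proposed rounding, assuming the per-package logical-processor count is simply rounded up with a pow2ceil()-style helper before being packed into bits 23:16 of CPUID.01H.EBX. The names round_up_pow2() and set_leaf1_addressable_ids() are hypothetical illustrations, not QEMU's code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Smallest power of two that is >= n. Plays the role of QEMU's pow2ceil()
 * here, but is a stand-alone illustration rather than QEMU's code.
 */
static uint32_t round_up_pow2(uint32_t n)
{
    assert(n >= 1 && n <= 128); /* result must still fit in the 8-bit field */
    uint32_t p = 1;
    while (p < n) {
        p <<= 1;
    }
    return p;
}

/*
 * Hypothetical helper (not a QEMU function): replace CPUID.01H.EBX[23:16]
 * with the logical-processor count rounded up to a power of two.
 */
static uint32_t set_leaf1_addressable_ids(uint32_t ebx, uint32_t lps_per_pkg)
{
    ebx &= ~0x00ff0000u;                     /* clear bits 23:16 */
    ebx |= round_up_pow2(lps_per_pkg) << 16; /* e.g. 90 -> 128 (0x80) */
    return ebx;
}
```

With -smp 180,sockets=2 each package holds 90 logical processors, so the field becomes 128, while 1 + CPUID.04H.EAX[31:26] = 64 is already a power of two and is left unchanged by the same rounding.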
>>>>>
>>>>> For example, when QEMU is started with the CLI above and xtopology
>>>>> is disabled, guest kernel 5.15.120 uses CPUID.01H.EBX[23:16] /
>>>>> (1+CPUID.04H.EAX[31:26]) to calculate threads-per-core in detect_ht().
>>>>> The guest then gets "90/(1+63)=1" as the result, even though
>>>>> threads-per-core should actually be 2.
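The failure comes from integer truncation. The following is a hypothetical C sketch of the division described above, not the kernel's detect_ht() source: it decodes the two CPUID fields from raw register values and divides them the way the commit message describes.

```c
#include <stdint.h>

/*
 * Hypothetical sketch (not the kernel's detect_ht() source) of the
 * legacy calculation described above:
 *   threads_per_core = CPUID.01H.EBX[23:16] / (1 + CPUID.04H.EAX[31:26])
 * C integer division truncates, so a non-power-of-2 EBX value loses threads.
 */
static uint32_t legacy_threads_per_core(uint32_t leaf1_ebx, uint32_t leaf4_eax)
{
    uint32_t addressable_ids = (leaf1_ebx >> 16) & 0xff;     /* EBX[23:16] */
    uint32_t cores_per_pkg = ((leaf4_eax >> 26) & 0x3f) + 1; /* 1 + EAX[31:26] */
    return addressable_ids / cores_per_pkg;
}
```

With EBX[23:16] = 90 and EAX[31:26] = 63 this yields 90 / 64 = 1; with EBX[23:16] = 128 it yields the expected 2.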
>>>>
>>>> It's a kernel bug instead.
>>>>
>>>> In 1.5.3 "Sub ID Extraction Parameters for initial APIC ID" of
>>>> "Intel 64 Architecture Processor Topology Enumeration" [1], it is
>>>>
>>>>    - SMT_Mask_Width = Log2(RoundToNearestPof2(CPUID.1:EBX[23:16]) /
>>>>                       (CPUID.(EAX=4,ECX=0):EAX[31:26] + 1))
>>>>
>>>> The value of CPUID.1:EBX[23:16] needs to be *rounded* to the
>>>> nearest power-of-two integer instead of itself being a
>>>> power of two.
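A hedged C sketch of the SMT_Mask_Width formula quoted above, reading "RoundToNearestPof2" as the SDM's "nearest power-of-2 integer that is not smaller than". It assumes 1 + EAX[31:26] is a power of two no larger than the rounded EBX value; all function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest power of two that is >= n ("nearest power of 2 not smaller than n"). */
static uint32_t round_up_pow2(uint32_t n)
{
    assert(n >= 1 && n <= 255); /* EBX[23:16] is an 8-bit field */
    uint32_t p = 1;
    while (p < n) {
        p <<= 1;
    }
    return p;
}

/* Log2 of a value that is already a power of two. */
static uint32_t log2_pow2(uint32_t p)
{
    assert(p != 0 && (p & (p - 1)) == 0);
    uint32_t w = 0;
    while (p > 1) {
        p >>= 1;
        w++;
    }
    return w;
}

/*
 * SMT_Mask_Width = Log2(RoundToNearestPof2(CPUID.1:EBX[23:16]) /
 *                       (CPUID.(EAX=4,ECX=0):EAX[31:26] + 1))
 * Assumes 1 + EAX[31:26] is a power of two no larger than the rounded EBX value.
 */
static uint32_t smt_mask_width(uint32_t ebx_23_16, uint32_t eax_31_26)
{
    uint32_t ids_per_pkg = round_up_pow2(ebx_23_16);
    return log2_pow2(ids_per_pkg / (eax_31_26 + 1));
}
```

Because the rounding happens inside the formula, a raw EBX[23:16] of 90 and a pre-rounded 128 both give a 1-bit SMT field, i.e. 2 threads per core.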
>>>>
>>>> This is also consistent with the SDM, where the comment for bits
>>>> 23-16 of CPUID.1:EBX is:
>>>>
>>>>    The nearest power-of-2 integer that is not smaller than EBX[23:16] is
>>>>    the number of unique initial APIC IDs reserved for addressing
>>>>    different logical processors in a physical package.
>>>>
>>>> What I read from this is that the nearest power-of-2 integer that is
>>>> not smaller than EBX[23:16] is a different thing from EBX[23:16]. i.e.,
>>>
>>> Yes, when I read the SDM, I also thought it was a kernel bug. But on my
>>> 192-thread SPR host, the value of CPUID.1:EBX[23:16] was indeed rounded
>>> up to the nearest power of 2 by the hardware. After communicating with
>>> Intel technical support staff, we thought that the description in the
>>> SDM is perhaps not entirely accurate, and that rounding up
>>> CPUID.1:EBX[23:16] to a power of 2 in QEMU may be more in line with the
>>> hardware behavior.
>>
>> I think the above justification is important. We need to justify our
>> changes with facts and the correct reasoning.
>>
>> I somewhat agree with setting EBX[23:16] to a power-of-2 value, because the
>> 1.5.3 "Sub ID Extraction Parameters for initial APIC ID" of "Intel 64
>> Architecture Processor Topology Enumeration" spec says
>>
>>      CPUID.1:EBX[23:16] represents the maximum number of addressable IDs
>>      (initial APIC ID) that can be assigned to logical processors in a
>>      physical package. The value may not be the same as the number of
>>      logical processors that are present in the hardware of a physical
>>      package.
>>
>> It uses the word "may not".
> 
> IMO, I don't quite understand your confusion regarding this. I've already
> explained the meaning of addressable ID, and the spec you referenced also
> clarifies its significance. The reason for this modification is not
> because of the two words "may not".
> 
> Whether it is "be" or "not be" the same as the number of logical
> processors, the essence is that due to topology, the actual number of
> initial IDs that can be accommodated in the APIC ID may exceed the number
> of logical processors.
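To make the "APIC ID holes" point concrete, here is a minimal C sketch under a loudly stated assumption: a hypothetical APIC ID layout in which each topology level (SMT, core, die) gets a field wide enough for its count, rounded up to a power of two. It is not QEMU's implementation; id_width() and addressable_ids_per_pkg() are illustrative names.

```c
#include <assert.h>
#include <stdint.h>

/* Bits needed to give n items distinct IDs: ceil(log2(n)), and 0 when n == 1. */
static uint32_t id_width(uint32_t n)
{
    assert(n >= 1 && n <= (1u << 31));
    uint32_t w = 0;
    while ((1u << w) < n) {
        w++;
    }
    return w;
}

/*
 * Hypothetical model of the APIC ID layout inside one package: an SMT
 * field in the low bits, then a core field, then a die field, each sized
 * to a power of two. Returns how many initial APIC IDs one package spans.
 */
static uint32_t addressable_ids_per_pkg(uint32_t dies, uint32_t cores,
                                        uint32_t threads)
{
    uint32_t bits = id_width(threads) + id_width(cores) + id_width(dies);
    return 1u << bits;
}
```

For dies=1, cores=45, threads=2 this gives 1 + 6 + 0 = 7 bits, so a package spans 128 addressable IDs while holding only 90 logical processors; the 38 unused IDs are the holes.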

I am confused because, whether from the SDM:

   Bit 23-16: Maximum number of addressable IDs for logical processors in
              this physical package*

   * The nearest power-of-2 integer that is not smaller than EBX[23:16]
     is the number of unique initial APIC IDs reserved for addressing
     different logical processors in a physical package.

or from the "Intel 64 Architecture Processor Topology Enumeration" spec
(Jan 2018, revision 1.1), 1.5.3 "Sub ID Extraction Parameters for Initial APIC ID":

   RoundToNearestPof2(CPUID.1:EBX[23:16])

or from the "Intel 64 Architecture Processor Topology Enumeration" spec
(April 2023, revision 2.0), 1.6.1 "Legacy Extraction Algorithm":

https://cdrdv2-public.intel.com/775917/intel-64-architecture-processor-topology-enumeration.pdf

   "MaximumLogicalProcessorIDsPerPackage" is calculated by rounding
    CPUID.01H.EBX[23:16] to nearest power of 2.

What I read from them is that EBX[23:16] is a different thing from
pow2ceil(EBX[23:16]) and that EBX[23:16] doesn't need to be a power of 2,
but the patch is trying to make it a power of 2.
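A short C sketch of why the distinction matters in practice, assuming "rounding to nearest power of 2" means rounding up as the SDM footnote states: the documented rounding is idempotent, so software following the legacy extraction algorithm gets the same MaximumLogicalProcessorIDsPerPackage from a raw 90 as from a pre-rounded 128. Only software that uses the raw field directly sees a different value. Function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* pow2ceil()-style rounding: smallest power of two that is >= n. */
static uint32_t round_up_pow2(uint32_t n)
{
    assert(n >= 1 && n <= 255); /* EBX[23:16] is an 8-bit field */
    uint32_t p = 1;
    while (p < n) {
        p <<= 1;
    }
    return p;
}

/*
 * "MaximumLogicalProcessorIDsPerPackage" as the legacy extraction algorithm
 * describes it: CPUID.01H.EBX[23:16] rounded up to a power of two.
 */
static uint32_t max_lp_ids_per_pkg(uint32_t ebx_23_16)
{
    return round_up_pow2(ebx_23_16);
}
```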

Then I consulted an Intel internal architect. I was told that EBX[23:16]
was originally defined such that software was expected to round it up to
the next power of 2. However, a long time ago software ran into issues,
because applications could compute the wrong power of 2 based on APIC ID
holes, or some applications would use the value directly (without
rounding it up to a power of 2). So Intel began reporting an exact power
of 2, and this behavior is not documented.


Thread overview: 21+ messages
2024-10-09  3:56 [PATCH v6] i386/cpu: fixup number of addressable IDs for logical processors in the physical package Chuang Xu
2024-10-09  4:21 ` Zhao Liu
2024-10-12  7:13 ` Xiaoyao Li
2024-10-12  8:10   ` Chuang Xu
2024-10-12  8:32     ` Xiaoyao Li
2024-10-12  8:56       ` Zhao Liu
2024-10-12  8:21 ` Xiaoyao Li
2024-10-12  9:28   ` Zhao Liu
2024-10-12  9:35   ` Chuang Xu
2024-10-14  0:36     ` Xiaoyao Li
2024-10-14  1:32       ` Xiaoyao Li
2024-10-14  3:36       ` Zhao Liu
2024-10-17  8:18         ` Xiaoyao Li [this message]
2024-10-17  9:03           ` Zhao Liu
2024-10-28 16:07             ` Xiaoyao Li
2024-12-03  7:33               ` Zhao Liu
2024-12-03 15:04                 ` Xiaoyao Li
2024-12-03 15:35                   ` Zhao Liu
2024-12-03 15:29                 ` Daniel P. Berrangé
2024-12-03  7:36 ` Zhao Liu
2024-12-03  7:29   ` Chuang Xu
