Re: [PATCH v3] i386/cpu: fixup number of addressable IDs for logical processors in the physical package

From: Zhao Liu <zhao1.liu@intel.com>
To: Igor Mammedov <imammedo@redhat.com>
Cc: Chuang Xu <xuchuangxclwt@bytedance.com>,
	qemu-devel@nongnu.org, pbonzini@redhat.com,
	xieyongji@bytedance.com, chaiwen.cc@bytedance.com,
	qemu-stable@nongnu.org, Guixiong Wei <weiguixiong@bytedance.com>,
	Yipeng Yin <yinyipeng@bytedance.com>,
	Zhao Liu <zhao1.liu@intel.com>
Subject: Re: [PATCH v3] i386/cpu: fixup number of addressable IDs for logical processors in the physical package
Date: Wed, 25 Sep 2024 16:49:19 +0800
Message-ID: <ZvPOj82NvTbGlxsV@intel.com>
In-Reply-To: <20240920130827.751ccfb1@imammedo.users.ipa.redhat.com>

On Fri, Sep 20, 2024 at 01:08:27PM +0200, Igor Mammedov wrote:
> Date: Fri, 20 Sep 2024 13:08:27 +0200
> From: Igor Mammedov <imammedo@redhat.com>
> Subject: Re: [PATCH v3] i386/cpu: fixup number of addressable IDs for
>  logical processors in the physical package
> X-Mailer: Claws Mail 4.3.0 (GTK 3.24.43; x86_64-redhat-linux-gnu)
> 
> On Fri, 20 Sep 2024 02:29:46 +0800
> Zhao Liu <zhao1.liu@intel.com> wrote:
> 
> > Hi Chuang and Igor,
> > 
> > Sorry for late reply,
> > 
> > On Wed, Sep 18, 2024 at 09:18:15PM +0800, Chuang Xu wrote:
> > > Date: Wed, 18 Sep 2024 21:18:15 +0800
> > > From: Chuang Xu <xuchuangxclwt@bytedance.com>
> > > Subject: [PATCH v3] i386/cpu: fixup number of addressable IDs for logical
> > >  processors in the physical package
> > > X-Mailer: git-send-email 2.24.3 (Apple Git-128)
> > > 
> > > When QEMU is started with:
> > > -cpu host,migratable=on,host-cache-info=on,l3-cache=off
> > > -smp 180,sockets=2,dies=1,cores=45,threads=2
> > > 
> > > Executing "cpuid -1 -l 1 -r" in the guest then yields a value of 90 for
> > > CPUID.01H.EBX[23:16], while the expected value is 128. Executing
> > > "cpuid -1 -l 4 -r" in the guest yields a value of 63 for
> > > CPUID.04H.EAX[31:26], as expected.
> > > 
> > > Since (1+CPUID.04H.EAX[31:26]) is rounded up to the nearest power-of-2
> > > integer, we had better round up CPUID.01H.EBX[23:16] to the nearest
> > > power-of-2 integer too. Otherwise we may encounter unexpected results in
> > > the guest.
> > > 
> > > For example, when QEMU is started with the CLI above and xtopology is
> > > disabled, guest kernel 5.15.120 uses
> > > CPUID.01H.EBX[23:16]/(1+CPUID.04H.EAX[31:26]) to calculate
> > > threads-per-core in detect_ht(). The guest then gets "90/(1+63)=1" as the
> > > result, even though threads-per-core should actually be 2.
> > > 
> > > So let us round up CPUID.01H.EBX[23:16] to the nearest power-of-2 integer
> > > to fix this.
> > > 
> > > Signed-off-by: Guixiong Wei <weiguixiong@bytedance.com>
> > > Signed-off-by: Yipeng Yin <yinyipeng@bytedance.com>
> > > Signed-off-by: Chuang Xu <xuchuangxclwt@bytedance.com>
> > > ---
> > >  target/i386/cpu.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> > > index 4c2e6f3a71..3710ae5283 100644
> > > --- a/target/i386/cpu.c
> > > +++ b/target/i386/cpu.c
> > > @@ -6417,7 +6417,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
> > >          }
> > >          *edx = env->features[FEAT_1_EDX];
> > >          if (threads_per_pkg > 1) {
> > > -            *ebx |= threads_per_pkg << 16;
> > > +            *ebx |= pow2ceil(threads_per_pkg) << 16;  
> > 
> > Yes, the fix is right.
> > 
> > About the "Maximum number of addressable IDs": commit 88dd4ca06c83
> > ("i386/cpu: Use APIC ID info to encode cache topo in CPUID[4]")
> > introduced a new way to calculate it.
> > 
> > pow2ceil() works for the current SMP topology, but may be wrong on a
> > hybrid topology, for the reason I gave in that commit message:
> > 
> > > The nearest power-of-2 integer can be calculated by pow2ceil() or by
> > > using APIC ID offset/width (like L3 topology using 1 << die_offset [3]).  
> > 
> > > But in fact, CPUID.04H:EAX[bits 25:14] and CPUID.04H:EAX[bits 31:26]
> > > are associated with APIC ID. For example, in linux kernel, the field
> > > "num_threads_sharing" (Bits 25 - 14) is parsed with APIC ID. And for
> > > another example, on Alder Lake P, the CPUID.04H:EAX[bits 31:26] is not
> > > matched with actual core numbers and it's calculated by:
> > > "(1 << (pkg_offset - core_offset)) - 1".  
> > 
> > Calculating from the APIC ID offset is the hardware's approach, so I
> > used the APIC ID instead of pow2ceil() and replaced all pow2ceil() cases.
> 
> Well, hybrid case needs some more explanation then.
> 
> 'pow2ceil(threads_per_pkg) << 16' - does exactly what SDM says for CPUID.01H.EBX[23:16]
> 
> Can you point to a spec that confirms that the above is wrong, and
> explain in more detail how the hybrid case is supposed to work
> and where it's documented?
>  

This is mainly about the meaning of "addressable ID". The relevant spec is
"Intel 64 Architecture Processor Topology Enumeration" [1].

Section 1.5.3 says that the "addressable ID" in CPUID.01H.EBX[23:16]
refers to the "initial APIC ID". And for CPUID.04H:EAX[bits 31:26], the
corresponding field is "core_id", which is part of the initial APIC ID.

Therefore, "maximum number of addressable IDs" means the maximum number
of IDs that each subfield of the APIC ID (which we build from the smp
topology) can hold, and that is determined by the subfield's width.

For example, if the width of the thread-level APIC sub-ID is 2, it can
hold a maximum of 4 sub-IDs. On an SMP system, the sub-topology fields of
the APIC IDs are filled with no holes, so pow2ceil() is always valid.
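
To make the SMP arithmetic concrete, here is a minimal sketch using the
numbers from the commit message (90 threads and 45 cores per package). The
helper names pow2ceil_u32() and threads_per_core_legacy() are hypothetical;
the first only stands in for QEMU's pow2ceil() and the second only mirrors
the division the commit message attributes to detect_ht(). Neither is the
actual QEMU or kernel code.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for QEMU's pow2ceil(): smallest power of two
 * that is >= n. Assumes 1 <= n <= 2^31 so the shift cannot overflow.
 */
static uint32_t pow2ceil_u32(uint32_t n)
{
    uint32_t p = 1;

    while (p < n) {
        p <<= 1;
    }
    return p;
}

/*
 * Hypothetical helper mirroring the division described in the commit
 * message: CPUID.01H.EBX[23:16] / (1 + CPUID.04H.EAX[31:26]).
 */
static uint32_t threads_per_core_legacy(uint32_t ebx_23_16,
                                        uint32_t eax_31_26)
{
    return ebx_23_16 / (1 + eax_31_26);
}
```

With EAX[31:26] = 63, passing the unrounded 90 gives 1 thread per core,
while passing pow2ceil_u32(90) = 128 gives the expected 2.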

For hybrid, there will be holes in the subfields. For example, suppose 4 E
cores form a module and a P core has a module to itself. For the E cores'
APIC IDs, the core field must be at least 2 bits wide to accommodate the
IDs of the 4 cores. The P core needs only 1 bit, but it still uses the
same width as the E core in order to align with it. In this case,
pow2ceil() fails.
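
A rough sketch of why rounding the actual core count diverges from the
width-based count once there are holes. The package layout used in the
usage note (three single-core P modules plus two 4-core E modules) is
purely an assumption for illustration, and all helper names are
hypothetical; this is not how QEMU models hybrid topology.

```c
#include <stdint.h>

/* Hypothetical helper: bits needed to encode IDs 0..count-1 (count >= 1). */
static unsigned id_width(uint32_t count)
{
    unsigned w = 0;

    while ((1u << w) < count) {
        w++;
    }
    return w;
}

/* Sum of the real cores; module_cores[i] is the core count of module i. */
static uint32_t actual_cores(const uint32_t *module_cores, unsigned n)
{
    uint32_t sum = 0;

    for (unsigned i = 0; i < n; i++) {
        sum += module_cores[i];
    }
    return sum;
}

/*
 * Core IDs the APIC ID layout can address: every module uses the core
 * field width of the largest module, and the module field must be wide
 * enough for n modules.
 */
static uint32_t addressable_core_ids(const uint32_t *module_cores, unsigned n)
{
    uint32_t max_cores = 1;

    for (unsigned i = 0; i < n; i++) {
        if (module_cores[i] > max_cores) {
            max_cores = module_cores[i];
        }
    }
    return 1u << (id_width(max_cores) + id_width(n));
}
```

For the assumed layout {1, 1, 1, 4, 4} there are 11 real cores, so rounding
up the count gives 16, while the layout (2-bit core field, 3-bit module
field) addresses 32 core IDs.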

The example I mentioned before ("on Alder Lake P, the CPUID.04H:EAX[
bits 31:26] is not matched with actual core numbers and it's calculated
by: '(1 << (pkg_offset - core_offset)) - 1'.") has the same cause: the
APIC ID layout has holes, so there are more possible APIC IDs than
actual cores.
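
As a sanity check, the offset formula can be evaluated for the SMP
configuration from the patch (threads=2, cores=45, dies=1). This is a
sketch with hypothetical helper names; the offsets are derived here from
plain bit widths, assuming the core field starts right after the thread
field and the package field right after the core field (dies=1 adds no
bits). Applying "1 << pkg_offset" to CPUID.01H.EBX[23:16] is my
extrapolation by analogy with the "1 << die_offset" remark quoted above,
not something stated in this thread.

```c
#include <stdint.h>

/* Hypothetical helper: bits needed to encode IDs 0..count-1 (count >= 1). */
static unsigned id_width(uint32_t count)
{
    unsigned w = 0;

    while ((1u << w) < count) {
        w++;
    }
    return w;
}

/*
 * CPUID.04H:EAX[31:26] per the formula quoted above:
 * (1 << (pkg_offset - core_offset)) - 1.
 */
static uint32_t cpuid4_max_core_ids(unsigned pkg_offset, unsigned core_offset)
{
    return (1u << (pkg_offset - core_offset)) - 1;
}
```

With core_offset = id_width(2) = 1 and pkg_offset = 1 + id_width(45) = 7,
the formula yields 63, matching the CPUID.04H.EAX[31:26] value reported in
the commit message, and 1 << pkg_offset is 128, the value expected for
CPUID.01H.EBX[23:16].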

[1]: https://cdrdv2-public.intel.com/759067/intel-64-architecture-processor-topology-enumeration.pdf

Regards,
Zhao

Thread overview: 5+ messages
2024-09-18 13:18 [PATCH v3] i386/cpu: fixup number of addressable IDs for logical processors in the physical package Chuang Xu
2024-09-19  7:06 ` Igor Mammedov
2024-09-19 18:29 ` Zhao Liu
2024-09-20 11:08   ` Igor Mammedov
2024-09-25  8:49     ` Zhao Liu [this message]
