Re: [PATCH v7 02/16] i386/cpu: Use APIC ID offset to encode cache topo in CPUID[4]

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: Zhao Liu <zhao1.liu@linux.intel.com>
To: Xiaoyao Li <xiaoyao.li@intel.com>
Cc: Eduardo Habkost <eduardo@habkost.net>,
	Marcel Apfelbaum <marcel.apfelbaum@gmail.com>,
	"Michael S . Tsirkin" <mst@redhat.com>,
	Richard Henderson <richard.henderson@linaro.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Marcelo Tosatti <mtosatti@redhat.com>,
	qemu-devel@nongnu.org, kvm@vger.kernel.org,
	Zhenyu Wang <zhenyu.z.wang@intel.com>,
	Zhuocheng Ding <zhuocheng.ding@intel.com>,
	Zhao Liu <zhao1.liu@intel.com>,
	Robert Hoo <robert.hu@linux.intel.com>,
	Babu Moger <babu.moger@amd.com>,
	Yongwei Ma <yongwei.ma@intel.com>
Subject: Re: [PATCH v7 02/16] i386/cpu: Use APIC ID offset to encode cache topo in CPUID[4]
Date: Mon, 15 Jan 2024 11:04:33 +0800	[thread overview]
Message-ID: <ZaSgwWPm31MHzGyU@intel.com> (raw)
In-Reply-To: <a2ee40c0-a198-41cd-86af-7ef52e6d591f@intel.com>

Hi Xiaoyao,

On Sun, Jan 14, 2024 at 10:11:59PM +0800, Xiaoyao Li wrote:
> Date: Sun, 14 Jan 2024 22:11:59 +0800
> From: Xiaoyao Li <xiaoyao.li@intel.com>
> Subject: Re: [PATCH v7 02/16] i386/cpu: Use APIC ID offset to encode cache
>  topo in CPUID[4]
> 
> On 1/11/2024 4:43 PM, Zhao Liu wrote:
> > Hi Xiaoyao,
> > 
> > On Wed, Jan 10, 2024 at 05:31:28PM +0800, Xiaoyao Li wrote:
> > > Date: Wed, 10 Jan 2024 17:31:28 +0800
> > > From: Xiaoyao Li <xiaoyao.li@intel.com>
> > > Subject: Re: [PATCH v7 02/16] i386/cpu: Use APIC ID offset to encode cache
> > >   topo in CPUID[4]
> > > 
> > > On 1/8/2024 4:27 PM, Zhao Liu wrote:
> > > > From: Zhao Liu <zhao1.liu@intel.com>
> > > > 
> > > > Refer to the fixes of cache_info_passthrough ([1], [2]) and SDM, the
> > > > CPUID.04H:EAX[bits 25:14] and CPUID.04H:EAX[bits 31:26] should use the
> > > > nearest power-of-2 integer.
> > > > 
> > > > The nearest power-of-2 integer can be calculated by pow2ceil() or by
> > > > using APIC ID offset (like L3 topology using 1 << die_offset [3]).
> > > > 
> > > > But in fact, CPUID.04H:EAX[bits 25:14] and CPUID.04H:EAX[bits 31:26]
> > > > are associated with APIC ID. For example, in linux kernel, the field
> > > > "num_threads_sharing" (Bits 25 - 14) is parsed with APIC ID.
> > > 
> > > And for
> > > > another example, on Alder Lake P, the CPUID.04H:EAX[bits 31:26] is not
> > > > matched with actual core numbers and it's calculated by:
> > > > "(1 << (pkg_offset - core_offset)) - 1".
> > > 
> > > could you elaborate it more? what is the value of actual core numbers on
> > > Alder lake P? and what is the pkg_offset and core_offset?
> > 
> > For example, the following's the CPUID dump of an ADL-S machine:
> > 
> > CPUID.04H:
> > 
> > 0x00000004 0x00: eax=0xfc004121 ebx=0x01c0003f ecx=0x0000003f edx=0x00000000
> > 0x00000004 0x01: eax=0xfc004122 ebx=0x01c0003f ecx=0x0000007f edx=0x00000000
> > 0x00000004 0x02: eax=0xfc01c143 ebx=0x03c0003f ecx=0x000007ff edx=0x00000000
> > 0x00000004 0x03: eax=0xfc1fc163 ebx=0x0240003f ecx=0x00009fff edx=0x00000004
> > 0x00000004 0x04: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
> > 
> > 
> > CPUID.1FH:
> > 
> > 0x0000001f 0x00: eax=0x00000001 ebx=0x00000001 ecx=0x00000100 edx=0x0000004c
> > 0x0000001f 0x01: eax=0x00000007 ebx=0x00000014 ecx=0x00000201 edx=0x0000004c
> > 0x0000001f 0x02: eax=0x00000000 ebx=0x00000000 ecx=0x00000002 edx=0x0000004c
> > 
> > The CPUID.04H:EAX[bits 31:26] is 63.
> >  From CPUID.1FH.00H:EAX[bits 04:00], the core_offset is 1, and from
> > CPUID.1FH.01H:EAX[bits 04:00], the pkg_offset is 7.
> > 
> > Thus we can verify that the above equation as:
> > 
> > 1 << (0x7 - 0x1) - 1 = 63.
> > 
> > "Maximum number of addressable IDs" refers to the maximum number of IDs
> > that can be enumerated in the APIC ID's topology layout, which does not
> > necessarily correspond to the actual number of topology domains.
> > 
> > > 
> > > > Therefore the offset of APIC ID should be preferred to calculate nearest
> > > > power-of-2 integer for CPUID.04H:EAX[bits 25:14] and CPUID.04H:EAX[bits
> > > > 31:26]:
> > > > 1. d/i cache is shared in a core, 1 << core_offset should be used
> > > >      instand of "cs->nr_threads" in encode_cache_cpuid4() for
> > > 
> > > /s/instand/instead
> > 
> > Thanks!
> > 
> > > 
> > > >      CPUID.04H.00H:EAX[bits 25:14] and CPUID.04H.01H:EAX[bits 25:14].
> > > > 2. L2 cache is supposed to be shared in a core as for now, thereby
> > > >      1 << core_offset should also be used instand of "cs->nr_threads" in
> > > 
> > > ditto
> > 
> > Okay.
> > 
> > > 
> > > >      encode_cache_cpuid4() for CPUID.04H.02H:EAX[bits 25:14].
> > > > 3. Similarly, the value for CPUID.04H:EAX[bits 31:26] should also be
> > > >      calculated with the bit width between the Package and SMT levels in
> > > >      the APIC ID (1 << (pkg_offset - core_offset) - 1).
> > > > 
> > > > In addition, use APIC ID offset to replace "pow2ceil()" for
> > > > cache_info_passthrough case.
> > > > 
> > > > [1]: efb3934adf9e ("x86: cpu: make sure number of addressable IDs for processor cores meets the spec")
> > > > [2]: d7caf13b5fcf ("x86: cpu: fixup number of addressable IDs for logical processors sharing cache")
> > > > [3]: d65af288a84d ("i386: Update new x86_apicid parsing rules with die_offset support")
> > > > 
> > > > Fixes: 7e3482f82480 ("i386: Helpers to encode cache information consistently")
> > > > Suggested-by: Robert Hoo <robert.hu@linux.intel.com>
> > > > Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
> > > > Tested-by: Babu Moger <babu.moger@amd.com>
> > > > Tested-by: Yongwei Ma <yongwei.ma@intel.com>
> > > > Acked-by: Michael S. Tsirkin <mst@redhat.com>
> > > > ---
> > > > Changes since v3:
> > > >    * Fix compile warnings. (Babu)
> > > >    * Fix spelling typo.
> > > > 
> > > > Changes since v1:
> > > >    * Use APIC ID offset to replace "pow2ceil()" for cache_info_passthrough
> > > >      case. (Yanan)
> > > >    * Split the L1 cache fix into a separate patch.
> > > >    * Rename the title of this patch (the original is "i386/cpu: Fix number
> > > >      of addressable IDs in CPUID.04H").
> > > > ---
> > > >    target/i386/cpu.c | 30 +++++++++++++++++++++++-------
> > > >    1 file changed, 23 insertions(+), 7 deletions(-)
> > > > 
> > > > diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> > > > index 5a3678a789cf..c8d2a585723a 100644
> > > > --- a/target/i386/cpu.c
> > > > +++ b/target/i386/cpu.c
> > > > @@ -6014,7 +6014,6 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
> > > >    {
> > > >        X86CPU *cpu = env_archcpu(env);
> > > >        CPUState *cs = env_cpu(env);
> > > > -    uint32_t die_offset;
> > > >        uint32_t limit;
> > > >        uint32_t signature[3];
> > > >        X86CPUTopoInfo topo_info;
> > > > @@ -6098,39 +6097,56 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
> > > >                    int host_vcpus_per_cache = 1 + ((*eax & 0x3FFC000) >> 14);
> > > >                    int vcpus_per_socket = cs->nr_cores * cs->nr_threads;
> > > >                    if (cs->nr_cores > 1) {
> > > > +                    int addressable_cores_offset =
> > > > +                                                apicid_pkg_offset(&topo_info) -
> > > > +                                                apicid_core_offset(&topo_info);
> > > > +
> > > >                        *eax &= ~0xFC000000;
> > > > -                    *eax |= (pow2ceil(cs->nr_cores) - 1) << 26;
> > > > +                    *eax |= (1 << (addressable_cores_offset - 1)) << 26;
> > > 
> > > it should be ((1 << addressable_cores_offset) - 1) << 26
> > 
> > Good catch! The helper wrapped in a subsequent patch masks the error here.
> > 
> > > 
> > > I think naming it addressable_cores_width is better than
> > > addressable_cores_offset. It's not offset because offset means the bit
> > > position from bit 0.
> > 
> > I agree, "width" is better.
> > 
> > > 
> > > And we can get the width by another algorithm:
> > > 
> > > int addressable_cores_width = apicid_core_width(&topo_info) +
> > > apicid_die_width(&topo_info);
> > > *eax |= ((1 << addressable_cores_width) - 1)) << 26;
> > 
> > This algorithm lacks flexibility because there will be more topology
> > levels between package and core, such as the cluster being introduced...
> > 
> > Using "addressable_cores_width" is clear enough.
> > 
> > > 		
> > > >                    }
> > > >                    if (host_vcpus_per_cache > vcpus_per_socket) {
> > > > +                    int pkg_offset = apicid_pkg_offset(&topo_info);
> > > > +
> > > >                        *eax &= ~0x3FFC000;
> > > > -                    *eax |= (pow2ceil(vcpus_per_socket) - 1) << 14;
> > > > +                    *eax |= (1 << (pkg_offset - 1)) << 14;
> > > 
> > > Ditto, ((1 << pkg_offset) - 1) << 14
> > 
> > Thanks!
> > 
> > > 
> > > For this one, I think pow2ceil(vcpus_per_socket) is better. Because it's
> > > intuitive that when host_vcpus_per_cache > vcpus_per_socket, we expose
> > > vcpus_per_cache (configured by users) to VM.
> > 
> > I tend to use a uniform calculation that is less confusing and easier to
> > maintain.
> 
> less confusing?
> 
> the original code is
> 
> 	if (host_vcpus_per_cache > vcpus_per_socket) {
> 		*eax |= (pow2ceil(vcpus_per_socket) - 1) << 14;
> 	}
> 
> and this patch is going to change it to
> 
> 	if (host_vcpus_per_cache > vcpus_per_socket) {
> 		int pkg_offset = apicid_pkg_offset(&topo_info);
> 		*eax |= (1 << pkg_offset - 1)) << 14;
> 	}
> 
> Apparently, the former is clearer that everyone knows what is wants to do is
> "when guest's total vcpus_per_socket is even smaller than host's
> vcpu_per_cache, using guest's configuration". While the latter is more
> confusing.

IMO, the only differences are the variable naming and the way the
details are encoded, what is actually trying to be expressed is the
same - both set the cache topology at the package level.

There is no reason to use two encoding ways for the same field, and
it'll be a code maintenance disaster.

I can add comment here to allay your concern.

> 
> > Since this field encodes "Maximum number of addressable IDs",
> > OS can't get the exact number of CPUs/vCPUs sharing L3 from here, it can
> > only know that L3 is shared at the package level.
> 
> It doesn't matter with L3. What the code want to fulfill is that,

Yes, I misremembered here.

> 
> host_vcpus_per_cache is the actual number of LPs that share this level of
> cache. While vcpus_per_socket is the maximum numbere of LPs that can share a
> cache (at any level) in guest. When guest's maximum number is even smaller
> than host's, use guest's value.
> 

From the Guest's view, the cache is shared at package level. In hardware,
this one field only reflects the topology level and does not accurately
reflect the number of sharing CPUs.

So, we just need to make it clear that in this case the Guest cache
topology level is package.

Thanks,
Zhao

next prev parent reply	other threads:[~2024-01-15  2:52 UTC|newest]

Thread overview: 68+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-08  8:27 [PATCH v7 00/16] Support smp.clusters for x86 in QEMU Zhao Liu
2024-01-08  8:27 ` [PATCH v7 01/16] i386/cpu: Fix i/d-cache topology to core level for Intel CPU Zhao Liu
2024-01-08  8:27 ` [PATCH v7 02/16] i386/cpu: Use APIC ID offset to encode cache topo in CPUID[4] Zhao Liu
2024-01-10  9:31   ` Xiaoyao Li
2024-01-11  8:43     ` Zhao Liu
2024-01-14 14:11       ` Xiaoyao Li
2024-01-15  3:04         ` Zhao Liu [this message]
2024-01-15  3:51       ` Xiaoyao Li
2024-01-15  4:16         ` Zhao Liu
2024-01-08  8:27 ` [PATCH v7 03/16] i386/cpu: Consolidate the use of topo_info in cpu_x86_cpuid() Zhao Liu
2024-01-10 11:52   ` Xiaoyao Li
2024-01-11  8:46     ` Zhao Liu
2024-01-08  8:27 ` [PATCH v7 04/16] i386: Split topology types of CPUID[0x1F] from the definitions of CPUID[0xB] Zhao Liu
2024-01-08  8:27 ` [PATCH v7 05/16] i386: Decouple CPUID[0x1F] subleaf with specific topology level Zhao Liu
2024-01-11  3:19   ` Xiaoyao Li
2024-01-11  9:07     ` Zhao Liu
2024-01-23  9:56     ` Zhao Liu
2024-01-08  8:27 ` [PATCH v7 06/16] i386: Introduce module-level cpu topology to CPUX86State Zhao Liu
2024-01-08  8:27 ` [PATCH v7 07/16] i386: Support modules_per_die in X86CPUTopoInfo Zhao Liu
2024-01-11  5:53   ` Xiaoyao Li
2024-01-11  9:18     ` Zhao Liu
2024-01-08  8:27 ` [PATCH v7 08/16] i386: Expose module level in CPUID[0x1F] Zhao Liu
2024-01-11  6:04   ` Xiaoyao Li
2024-01-11  9:21     ` Zhao Liu
2024-01-15  3:25   ` Yuan Yao
2024-01-15  4:09     ` Zhao Liu
2024-01-15  4:34       ` Xiaoyao Li
2024-01-15  5:20         ` Yuan Yao
2024-01-15  6:20           ` Zhao Liu
2024-01-15  6:57             ` Yuan Yao
2024-01-15  7:20               ` Zhao Liu
2024-01-15  9:03                 ` Yuan Yao
2024-01-15  6:12         ` Zhao Liu
2024-01-15  6:11           ` Xiaoyao Li
2024-01-15  6:35             ` Zhao Liu
2024-01-15  7:16               ` Xiaoyao Li
2024-01-15 15:46                 ` Zhao Liu
2024-01-08  8:27 ` [PATCH v7 09/16] i386: Support module_id in X86CPUTopoIDs Zhao Liu
2024-01-14 12:42   ` Xiaoyao Li
2024-01-15  3:52     ` Zhao Liu
2024-01-08  8:27 ` [PATCH v7 10/16] i386/cpu: Introduce cluster-id to X86CPU Zhao Liu
2024-01-14 13:49   ` Xiaoyao Li
2024-01-15  3:27     ` Zhao Liu
2024-01-15  4:18       ` Xiaoyao Li
2024-01-15  5:59         ` Zhao Liu
2024-01-15  7:45           ` Xiaoyao Li
2024-01-15 15:18             ` Zhao Liu
2024-01-16 16:40               ` Xiaoyao Li
2024-01-19  7:59                 ` Zhao Liu
2024-01-26  3:37                   ` Zhao Liu
2024-01-08  8:27 ` [PATCH v7 11/16] tests: Add test case of APIC ID for module level parsing Zhao Liu
2024-01-08  8:27 ` [PATCH v7 12/16] hw/i386/pc: Support smp.clusters for x86 PC machine Zhao Liu
2024-01-08  8:27 ` [PATCH v7 13/16] i386: Add cache topology info in CPUCacheInfo Zhao Liu
2024-01-08  8:27 ` [PATCH v7 14/16] i386: Use CPUCacheInfo.share_level to encode CPUID[4] Zhao Liu
2024-01-14 14:31   ` Xiaoyao Li
2024-01-15  3:40     ` Zhao Liu
2024-01-15  4:25       ` Xiaoyao Li
2024-01-15  6:25         ` Zhao Liu
2024-01-15  7:00           ` Xiaoyao Li
2024-01-15 14:55             ` Zhao Liu
2024-01-08  8:27 ` [PATCH v7 15/16] i386: Use offsets get NumSharingCache for CPUID[0x8000001D].EAX[bits 25:14] Zhao Liu
2024-01-14 14:42   ` Xiaoyao Li
2024-01-15  3:48     ` Zhao Liu
2024-01-15  4:27       ` Xiaoyao Li
2024-01-15 14:54         ` Zhao Liu
2024-01-08  8:27 ` [PATCH v7 16/16] i386: Use CPUCacheInfo.share_level to encode " Zhao Liu
2024-01-08 17:46 ` [PATCH v7 00/16] Support smp.clusters for x86 in QEMU Moger, Babu
2024-01-09  1:48   ` Zhao Liu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZaSgwWPm31MHzGyU@intel.com \
    --to=zhao1.liu@linux.intel.com \
    --cc=babu.moger@amd.com \
    --cc=eduardo@habkost.net \
    --cc=kvm@vger.kernel.org \
    --cc=marcel.apfelbaum@gmail.com \
    --cc=mst@redhat.com \
    --cc=mtosatti@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=richard.henderson@linaro.org \
    --cc=robert.hu@linux.intel.com \
    --cc=xiaoyao.li@intel.com \
    --cc=yongwei.ma@intel.com \
    --cc=zhao1.liu@intel.com \
    --cc=zhenyu.z.wang@intel.com \
    --cc=zhuocheng.ding@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).