From: Dimitri Sivanich <sivanich@hpe.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: LKML <linux-kernel@vger.kernel.org>,
x86@kernel.org, Tom Lendacky <thomas.lendacky@amd.com>,
Andrew Cooper <andrew.cooper3@citrix.com>,
Arjan van de Ven <arjan@linux.intel.com>,
"James E.J. Bottomley" <jejb@linux.ibm.com>,
Dick Kennedy <dick.kennedy@broadcom.com>,
James Smart <james.smart@broadcom.com>,
"Martin K. Petersen" <martin.petersen@oracle.com>,
linux-scsi@vger.kernel.org, linux-hwmon@vger.kernel.org,
Jean Delvare <jdelvare@suse.com>, Huang Rui <ray.huang@amd.com>,
Guenter Roeck <linux@roeck-us.net>,
Steve Wahl <steve.wahl@hpe.com>,
Dimitri Sivanich <dimitri.sivanich@hpe.com>,
Russ Anderson <russ.anderson@hpe.com>
Subject: Re: [patch 00/29] x86/cpu: Rework the topology evaluation
Date: Tue, 25 Jul 2023 15:17:33 -0500 [thread overview]
Message-ID: <20230725201614.GB97682@hpe.com> (raw)
In-Reply-To: <20230724155329.474037902@linutronix.de>
We have successfully booted a 32-socket, 3840 cpu HPE Sapphire Rapids system
with your patchset applied, as found in:
git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git topo-cpuid-v1
# cat /sys/kernel/debug/x86/topo/cpus/3839
initial_apicid: ff7
apicid: ff7
pkg_id: 31
die_id: 31
cu_id: 255
core_id: 59
logical_pkg_id: 31
logical_die_id: 31
llc_id: 3968
l2c_id: 4086
amd_node_id: 0
amd_nodes_per_pkg: 0
max_cores: 60
max_die_per_pkg: 1
smp_num_siblings: 2
We were also able to boot a 60-socket, 2160 cpu HPE SuperDome Flex (SkyLake)
with the patchset.
# cat /sys/kernel/debug/x86/topo/cpus/2159
initial_apicid: ef7
apicid: ef7
pkg_id: 59
die_id: 59
cu_id: 255
core_id: 27
logical_pkg_id: 59
logical_die_id: 59
llc_id: 3776
l2c_id: 3830
amd_node_id: 0
amd_nodes_per_pkg: 0
max_cores: 18
max_die_per_pkg: 1
smp_num_siblings: 2
On Mon, Jul 24, 2023 at 07:43:50PM +0200, Thomas Gleixner wrote:
> Hi!
>
> A recent commit to the CPUID leaf 0xb/0x1f parser made me look deeper at
> the way how topology is evaluated. That "fix" is just yet another cure the
> sypmtom hack which completely ignores the underlying disaster.
>
> The way how topology evaluation works is to overwrite the relevant
> variables as often as possible. E.g. smp_num_siblings gets overwritten a
> gazillion times, which is wrong to begin with. The boot CPU writes it 3
> times, each AP two times.
>
> What's worse is that this just works by chance on hybrid systems due to the
> fact that the existing ones all seem to boot on a P-Core which has
> SMT. Would it boot on a E-Core which has no SMT, then parts of the early
> topology evaluation including the primary thread mask which is required for
> parallel CPU bringup would be completely wrong. Overwriting it later on
> with the correct value does not help at all.
>
> What's wrong today with hybrid already is the number of cores per package.
> On an ADL with 8 P-Cores and 8 E-cores the resulting number of cores per
> package is evaluated to be 12. Which is not further surprising because the
> CPUID 0xb/0x1f parser looks at the number of logical processors at core
> level and divides them by the number of SMP siblings.
>
> 24 / 2 = 12
>
> Just that this CPU has obviously 16 cores not 12.
>
> It's is even clearly documented in the SDM that this is wrong.
>
> "Bits 15-00: The number of logical processors across all instances of this
> domain within the next higher- scoped domain relative to this current
> logical processor. (For example, in a processor socket/package comprising
> "M" dies of "N" cores each, where each core has "L" processors, the
> "die" domain subleaf value of this field would be M*N*L. In an asymmetric
> topology this would be the summation of the value across the lower domain
> level instances to create each upper domain level instance.) This number
> reflects configuration as shipped by Intel. Note, software must not use
> this field to enumerate processor topology.
>
> Software must not use the value of EBX[15:0] to enumerate processor topology
> of the system. The value is only intended for display and diagnostic purposes.
> The actual number of logical processors available to BIOS/OS/Applications
> may be different from the value of EBX[15:0], depending on software and
> platform hardware configurations."
>
> This "_NOT_ to use for topology evaluation" sentence existed even before
> hybrid came along and got ignored. The code worked by chance, but with
> hybrid all bets are off. The code completely falls apart once CPUID leaf
> 0x1f enumerates any topology level between CORE and DIE, but that's not a
> suprise.
>
> The proper thing to do is to actually evaluate the full topology including
> the non-present (hotpluggable) CPUs based on the APICIDs which are provided
> by the firmware and a proper topology domain level parser. This can exactly
> tell the number of physical packages, logical packages etc. _before_ even
> booting a single AP. All of that can be evaluated upfront.
>
> Aside of that there are too many places which do their own topology
> evaluation, but there is absolutely no central point which can actually
> provide all of that information in a consistent way. This needs to change.
>
> This series implements another piece towards this: sane CPUID evaluation,
> which is done at _one_ place in a proper well defined order instead of
> having it sprinkled all over the CPUID evaluation code.
>
> At the end of this series this is pretty much bug compatible with the
> current CPUID evaluation code in respect to the cores per package
> evaluation, but it gets rid of overwriting things like smp_num_siblings,
> which is now written once, but is still not capable to work correctly on a
> hybrid machine which boots from a non SMT core. These things can only be
> fixed up in the next step(s).
>
> When I tried to go further with this I ran into yet another pile of
> historical layers of duct tape and haywire with a gazillion of random
> variables sprinkled all over the place. That's still work in progress. It
> actually works, but the last step which switches over is not yet in a shape
> that can be easily reviewed. Stay tuned.
>
> The series is based on the APIC cleanup series:
>
> https://lore.kernel.org/lkml/20230724131206.500814398@linutronix.de
>
> and also available on top of that from git:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git topo-cpuid-v1
>
> Thanks,
>
> tglx
> ---
> arch/x86/kernel/cpu/topology.c | 168 --------------------
> b/Documentation/arch/x86/topology.rst | 12 -
> b/arch/x86/events/amd/core.c | 2
> b/arch/x86/events/amd/uncore.c | 2
> b/arch/x86/events/intel/uncore.c | 2
> b/arch/x86/include/asm/apic.h | 1
> b/arch/x86/include/asm/cacheinfo.h | 3
> b/arch/x86/include/asm/cpuid.h | 32 +++
> b/arch/x86/include/asm/processor.h | 58 ++++--
> b/arch/x86/include/asm/smp.h | 2
> b/arch/x86/include/asm/topology.h | 51 +++++-
> b/arch/x86/include/asm/x86_init.h | 2
> b/arch/x86/kernel/amd_nb.c | 8
> b/arch/x86/kernel/apic/apic_flat_64.c | 7
> b/arch/x86/kernel/apic/apic_noop.c | 3
> b/arch/x86/kernel/apic/apic_numachip.c | 11 -
> b/arch/x86/kernel/apic/bigsmp_32.c | 6
> b/arch/x86/kernel/apic/local.h | 1
> b/arch/x86/kernel/apic/probe_32.c | 6
> b/arch/x86/kernel/apic/x2apic_cluster.c | 1
> b/arch/x86/kernel/apic/x2apic_phys.c | 6
> b/arch/x86/kernel/apic/x2apic_uv_x.c | 63 +------
> b/arch/x86/kernel/cpu/Makefile | 5
> b/arch/x86/kernel/cpu/amd.c | 156 ------------------
> b/arch/x86/kernel/cpu/cacheinfo.c | 51 ++----
> b/arch/x86/kernel/cpu/centaur.c | 4
> b/arch/x86/kernel/cpu/common.c | 108 +-----------
> b/arch/x86/kernel/cpu/cpu.h | 14 +
> b/arch/x86/kernel/cpu/debugfs.c | 98 +++++++++++
> b/arch/x86/kernel/cpu/hygon.c | 133 ---------------
> b/arch/x86/kernel/cpu/intel.c | 38 ----
> b/arch/x86/kernel/cpu/mce/amd.c | 4
> b/arch/x86/kernel/cpu/mce/apei.c | 4
> b/arch/x86/kernel/cpu/mce/core.c | 4
> b/arch/x86/kernel/cpu/mce/inject.c | 7
> b/arch/x86/kernel/cpu/proc.c | 8
> b/arch/x86/kernel/cpu/topology.h | 51 ++++++
> b/arch/x86/kernel/cpu/topology_amd.c | 179 +++++++++++++++++++++
> b/arch/x86/kernel/cpu/topology_common.c | 233 ++++++++++++++++++++++++++++
> b/arch/x86/kernel/cpu/topology_ext.c | 136 ++++++++++++++++
> b/arch/x86/kernel/cpu/zhaoxin.c | 18 --
> b/arch/x86/kernel/smpboot.c | 64 ++++---
> b/arch/x86/kernel/vsmp_64.c | 13 -
> b/arch/x86/mm/amdtopology.c | 35 +---
> b/arch/x86/xen/apic.c | 8
> b/drivers/edac/amd64_edac.c | 4
> b/drivers/edac/mce_amd.c | 4
> b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 2
> b/drivers/hwmon/fam15h_power.c | 7
> b/drivers/scsi/lpfc/lpfc_init.c | 8
> b/drivers/virt/acrn/hsm.c | 2
> 51 files changed, 964 insertions(+), 881 deletions(-)
>
>
>
next prev parent reply other threads:[~2023-07-25 20:18 UTC|newest]
Thread overview: 40+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-07-24 17:43 [patch 00/29] x86/cpu: Rework the topology evaluation Thomas Gleixner
2023-07-24 17:43 ` [patch 01/29] x86/cpu: Encapsulate topology information in cpuinfo_x86 Thomas Gleixner
2023-07-25 7:46 ` Thomas Gleixner
2023-07-24 17:43 ` [patch 02/29] x86/cpu: Move phys_proc_id into topology info Thomas Gleixner
2023-07-24 17:43 ` [patch 03/29] x86/cpu: Move cpu_die_id " Thomas Gleixner
2023-07-24 17:43 ` [patch 04/29] scsi: lpfc: Use topology_core_id() Thomas Gleixner
2023-07-24 17:43 ` [patch 05/29] hwmon: (fam15h_power) " Thomas Gleixner
2023-07-24 17:53 ` Guenter Roeck
2023-07-24 17:43 ` [patch 06/29] x86/cpu: Move cpu_core_id into topology info Thomas Gleixner
2023-07-24 17:44 ` [patch 07/29] x86/cpu: Move cu_id " Thomas Gleixner
2023-07-24 17:44 ` [patch 08/29] x86/cpu: Remove pointless evaluation of x86_coreid_bits Thomas Gleixner
2023-07-24 17:44 ` [patch 09/29] x86/cpu: Move logical package and die IDs into topology info Thomas Gleixner
2023-07-24 17:44 ` [patch 10/29] x86/cpu: Move cpu_l[l2]c_id " Thomas Gleixner
2023-07-24 17:44 ` [patch 11/29] x86/cpu: Provide debug interface Thomas Gleixner
2023-07-24 17:44 ` [patch 12/29] x86/cpu: Provide cpuid_read() et al Thomas Gleixner
2023-07-24 17:44 ` [patch 13/29] x86/cpu: Provide cpu_init/parse_topology() Thomas Gleixner
2023-07-24 17:44 ` [patch 14/29] x86/cpu: Add legacy topology parser Thomas Gleixner
2023-07-24 17:44 ` [patch 15/29] x86/cpu: Use common topology code for Centaur and Zhaoxin Thomas Gleixner
2023-07-24 17:44 ` [patch 16/29] x86/cpu: Move __max_die_per_package to common.c Thomas Gleixner
2023-07-24 17:44 ` [patch 17/29] x86/cpu: Provide a sane leaf 0xb/0x1f parser Thomas Gleixner
2023-07-24 20:49 ` Peter Zijlstra
2023-07-24 21:02 ` Peter Zijlstra
2023-07-25 6:51 ` Thomas Gleixner
2023-07-24 17:44 ` [patch 18/29] x86/cpu: Use common topology code for Intel Thomas Gleixner
2023-07-24 17:44 ` [patch 19/29] x86/cpu/amd: Provide a separate acessor for Node ID Thomas Gleixner
2023-07-24 17:44 ` [patch 20/29] x86/cpu: Provide an AMD/HYGON specific topology parser Thomas Gleixner
2023-07-24 17:44 ` [patch 21/29] x86/smpboot: Teach it about topo.amd_node_id Thomas Gleixner
2023-07-24 17:44 ` [patch 22/29] x86/cpu: Use common topology code for AMD Thomas Gleixner
2023-07-24 17:44 ` [patch 23/29] x86/cpu: Use common topology code for HYGON Thomas Gleixner
2023-07-24 17:44 ` [patch 24/29] x86/mm/numa: Use core domain size on AMD Thomas Gleixner
2023-07-24 17:44 ` [patch 25/29] x86/cpu: Make topology_amd_node_id() use the actual node info Thomas Gleixner
2023-07-24 17:44 ` [patch 26/29] x86/cpu: Remove topology.c Thomas Gleixner
2023-07-24 17:44 ` [patch 27/29] x86/cpu: Remove x86_coreid_bits Thomas Gleixner
2023-07-24 17:44 ` [patch 28/29] x86/apic: Remove unused phys_pkg_id() callback Thomas Gleixner
2023-07-24 17:44 ` [patch 29/29] x86/apic/uv: Remove the private leaf 0xb parser Thomas Gleixner
2023-07-24 21:05 ` [patch 00/29] x86/cpu: Rework the topology evaluation Peter Zijlstra
2023-07-25 20:17 ` Dimitri Sivanich [this message]
2023-07-26 22:15 ` Sohil Mehta
2023-07-26 22:38 ` Thomas Gleixner
2023-07-26 23:27 ` Sohil Mehta
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230725201614.GB97682@hpe.com \
--to=sivanich@hpe.com \
--cc=andrew.cooper3@citrix.com \
--cc=arjan@linux.intel.com \
--cc=dick.kennedy@broadcom.com \
--cc=dimitri.sivanich@hpe.com \
--cc=james.smart@broadcom.com \
--cc=jdelvare@suse.com \
--cc=jejb@linux.ibm.com \
--cc=linux-hwmon@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-scsi@vger.kernel.org \
--cc=linux@roeck-us.net \
--cc=martin.petersen@oracle.com \
--cc=ray.huang@amd.com \
--cc=russ.anderson@hpe.com \
--cc=steve.wahl@hpe.com \
--cc=tglx@linutronix.de \
--cc=thomas.lendacky@amd.com \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox