public inbox for linux-hwmon@vger.kernel.org
 help / color / mirror / Atom feed
From: Dimitri Sivanich <sivanich@hpe.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: LKML <linux-kernel@vger.kernel.org>,
	x86@kernel.org, Tom Lendacky <thomas.lendacky@amd.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	Arjan van de Ven <arjan@linux.intel.com>,
	"James E.J. Bottomley" <jejb@linux.ibm.com>,
	Dick Kennedy <dick.kennedy@broadcom.com>,
	James Smart <james.smart@broadcom.com>,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	linux-scsi@vger.kernel.org, linux-hwmon@vger.kernel.org,
	Jean Delvare <jdelvare@suse.com>, Huang Rui <ray.huang@amd.com>,
	Guenter Roeck <linux@roeck-us.net>,
	Steve Wahl <steve.wahl@hpe.com>,
	Dimitri Sivanich <dimitri.sivanich@hpe.com>,
	Russ Anderson <russ.anderson@hpe.com>
Subject: Re: [patch 00/29] x86/cpu: Rework the topology evaluation
Date: Tue, 25 Jul 2023 15:17:33 -0500	[thread overview]
Message-ID: <20230725201614.GB97682@hpe.com> (raw)
In-Reply-To: <20230724155329.474037902@linutronix.de>

We have successfully booted a 32-socket, 3840 cpu HPE Sapphire Rapids system
with your patchset applied, as found in:
  git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git topo-cpuid-v1

# cat /sys/kernel/debug/x86/topo/cpus/3839
initial_apicid: ff7
apicid: ff7
pkg_id: 31
die_id: 31
cu_id: 255
core_id: 59
logical_pkg_id: 31
logical_die_id: 31
llc_id: 3968
l2c_id: 4086
amd_node_id: 0
amd_nodes_per_pkg: 0
max_cores: 60
max_die_per_pkg: 1
smp_num_siblings: 2


We were also able to boot a 60-socket, 2160 cpu HPE SuperDome Flex (SkyLake)
with the patchset.

# cat /sys/kernel/debug/x86/topo/cpus/2159
initial_apicid: ef7
apicid: ef7
pkg_id: 59
die_id: 59
cu_id: 255
core_id: 27
logical_pkg_id: 59
logical_die_id: 59
llc_id: 3776
l2c_id: 3830
amd_node_id: 0
amd_nodes_per_pkg: 0
max_cores: 18
max_die_per_pkg: 1
smp_num_siblings: 2


On Mon, Jul 24, 2023 at 07:43:50PM +0200, Thomas Gleixner wrote:
> Hi!
> 
> A recent commit to the CPUID leaf 0xb/0x1f parser made me look deeper at
> the way how topology is evaluated. That "fix" is just yet another cure the
> sypmtom hack which completely ignores the underlying disaster.
> 
> The way how topology evaluation works is to overwrite the relevant
> variables as often as possible. E.g. smp_num_siblings gets overwritten a
> gazillion times, which is wrong to begin with. The boot CPU writes it 3
> times, each AP two times.
> 
> What's worse is that this just works by chance on hybrid systems due to the
> fact that the existing ones all seem to boot on a P-Core which has
> SMT. Would it boot on a E-Core which has no SMT, then parts of the early
> topology evaluation including the primary thread mask which is required for
> parallel CPU bringup would be completely wrong. Overwriting it later on
> with the correct value does not help at all.
> 
> What's wrong today with hybrid already is the number of cores per package.
> On an ADL with 8 P-Cores and 8 E-cores the resulting number of cores per
> package is evaluated to be 12. Which is not further surprising because the
> CPUID 0xb/0x1f parser looks at the number of logical processors at core
> level and divides them by the number of SMP siblings.
> 
>    24 / 2 = 12
> 
> Just that this CPU has obviously 16 cores not 12.
> 
> It's is even clearly documented in the SDM that this is wrong.
> 
>  "Bits 15-00: The number of logical processors across all instances of this
>   domain within the next higher- scoped domain relative to this current
>   logical processor. (For example, in a processor socket/package comprising
>   "M" dies of "N" cores each, where each core has "L" processors, the
>   "die" domain subleaf value of this field would be M*N*L. In an asymmetric
>   topology this would be the summation of the value across the lower domain
>   level instances to create each upper domain level instance.) This number
>   reflects configuration as shipped by Intel. Note, software must not use
>   this field to enumerate processor topology.
> 
>   Software must not use the value of EBX[15:0] to enumerate processor topology
>   of the system. The value is only intended for display and diagnostic purposes.
>   The actual number of logical processors available to BIOS/OS/Applications
>   may be different from the value of EBX[15:0], depending on software and
>   platform hardware configurations."
> 
> This "_NOT_ to use for topology evaluation" sentence existed even before
> hybrid came along and got ignored. The code worked by chance, but with
> hybrid all bets are off. The code completely falls apart once CPUID leaf
> 0x1f enumerates any topology level between CORE and DIE, but that's not a
> suprise.
> 
> The proper thing to do is to actually evaluate the full topology including
> the non-present (hotpluggable) CPUs based on the APICIDs which are provided
> by the firmware and a proper topology domain level parser. This can exactly
> tell the number of physical packages, logical packages etc. _before_ even
> booting a single AP. All of that can be evaluated upfront.
> 
> Aside of that there are too many places which do their own topology
> evaluation, but there is absolutely no central point which can actually
> provide all of that information in a consistent way. This needs to change.
> 
> This series implements another piece towards this: sane CPUID evaluation,
> which is done at _one_ place in a proper well defined order instead of
> having it sprinkled all over the CPUID evaluation code.
> 
> At the end of this series this is pretty much bug compatible with the
> current CPUID evaluation code in respect to the cores per package
> evaluation, but it gets rid of overwriting things like smp_num_siblings,
> which is now written once, but is still not capable to work correctly on a
> hybrid machine which boots from a non SMT core. These things can only be
> fixed up in the next step(s).
> 
> When I tried to go further with this I ran into yet another pile of
> historical layers of duct tape and haywire with a gazillion of random
> variables sprinkled all over the place. That's still work in progress.  It
> actually works, but the last step which switches over is not yet in a shape
> that can be easily reviewed. Stay tuned.
> 
> The series is based on the APIC cleanup series:
> 
>   https://lore.kernel.org/lkml/20230724131206.500814398@linutronix.de
> 
> and also available on top of that from git:
> 
>  git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git topo-cpuid-v1
> 
> Thanks,
> 
> 	tglx
> ---
>  arch/x86/kernel/cpu/topology.c              |  168 --------------------
>  b/Documentation/arch/x86/topology.rst       |   12 -
>  b/arch/x86/events/amd/core.c                |    2 
>  b/arch/x86/events/amd/uncore.c              |    2 
>  b/arch/x86/events/intel/uncore.c            |    2 
>  b/arch/x86/include/asm/apic.h               |    1 
>  b/arch/x86/include/asm/cacheinfo.h          |    3 
>  b/arch/x86/include/asm/cpuid.h              |   32 +++
>  b/arch/x86/include/asm/processor.h          |   58 ++++--
>  b/arch/x86/include/asm/smp.h                |    2 
>  b/arch/x86/include/asm/topology.h           |   51 +++++-
>  b/arch/x86/include/asm/x86_init.h           |    2 
>  b/arch/x86/kernel/amd_nb.c                  |    8 
>  b/arch/x86/kernel/apic/apic_flat_64.c       |    7 
>  b/arch/x86/kernel/apic/apic_noop.c          |    3 
>  b/arch/x86/kernel/apic/apic_numachip.c      |   11 -
>  b/arch/x86/kernel/apic/bigsmp_32.c          |    6 
>  b/arch/x86/kernel/apic/local.h              |    1 
>  b/arch/x86/kernel/apic/probe_32.c           |    6 
>  b/arch/x86/kernel/apic/x2apic_cluster.c     |    1 
>  b/arch/x86/kernel/apic/x2apic_phys.c        |    6 
>  b/arch/x86/kernel/apic/x2apic_uv_x.c        |   63 +------
>  b/arch/x86/kernel/cpu/Makefile              |    5 
>  b/arch/x86/kernel/cpu/amd.c                 |  156 ------------------
>  b/arch/x86/kernel/cpu/cacheinfo.c           |   51 ++----
>  b/arch/x86/kernel/cpu/centaur.c             |    4 
>  b/arch/x86/kernel/cpu/common.c              |  108 +-----------
>  b/arch/x86/kernel/cpu/cpu.h                 |   14 +
>  b/arch/x86/kernel/cpu/debugfs.c             |   98 +++++++++++
>  b/arch/x86/kernel/cpu/hygon.c               |  133 ---------------
>  b/arch/x86/kernel/cpu/intel.c               |   38 ----
>  b/arch/x86/kernel/cpu/mce/amd.c             |    4 
>  b/arch/x86/kernel/cpu/mce/apei.c            |    4 
>  b/arch/x86/kernel/cpu/mce/core.c            |    4 
>  b/arch/x86/kernel/cpu/mce/inject.c          |    7 
>  b/arch/x86/kernel/cpu/proc.c                |    8 
>  b/arch/x86/kernel/cpu/topology.h            |   51 ++++++
>  b/arch/x86/kernel/cpu/topology_amd.c        |  179 +++++++++++++++++++++
>  b/arch/x86/kernel/cpu/topology_common.c     |  233 ++++++++++++++++++++++++++++
>  b/arch/x86/kernel/cpu/topology_ext.c        |  136 ++++++++++++++++
>  b/arch/x86/kernel/cpu/zhaoxin.c             |   18 --
>  b/arch/x86/kernel/smpboot.c                 |   64 ++++---
>  b/arch/x86/kernel/vsmp_64.c                 |   13 -
>  b/arch/x86/mm/amdtopology.c                 |   35 +---
>  b/arch/x86/xen/apic.c                       |    8 
>  b/drivers/edac/amd64_edac.c                 |    4 
>  b/drivers/edac/mce_amd.c                    |    4 
>  b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c |    2 
>  b/drivers/hwmon/fam15h_power.c              |    7 
>  b/drivers/scsi/lpfc/lpfc_init.c             |    8 
>  b/drivers/virt/acrn/hsm.c                   |    2 
>  51 files changed, 964 insertions(+), 881 deletions(-)
> 
> 
> 

  parent reply	other threads:[~2023-07-25 20:18 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-24 17:43 [patch 00/29] x86/cpu: Rework the topology evaluation Thomas Gleixner
2023-07-24 17:43 ` [patch 01/29] x86/cpu: Encapsulate topology information in cpuinfo_x86 Thomas Gleixner
2023-07-25  7:46   ` Thomas Gleixner
2023-07-24 17:43 ` [patch 02/29] x86/cpu: Move phys_proc_id into topology info Thomas Gleixner
2023-07-24 17:43 ` [patch 03/29] x86/cpu: Move cpu_die_id " Thomas Gleixner
2023-07-24 17:43 ` [patch 04/29] scsi: lpfc: Use topology_core_id() Thomas Gleixner
2023-07-24 17:43 ` [patch 05/29] hwmon: (fam15h_power) " Thomas Gleixner
2023-07-24 17:53   ` Guenter Roeck
2023-07-24 17:43 ` [patch 06/29] x86/cpu: Move cpu_core_id into topology info Thomas Gleixner
2023-07-24 17:44 ` [patch 07/29] x86/cpu: Move cu_id " Thomas Gleixner
2023-07-24 17:44 ` [patch 08/29] x86/cpu: Remove pointless evaluation of x86_coreid_bits Thomas Gleixner
2023-07-24 17:44 ` [patch 09/29] x86/cpu: Move logical package and die IDs into topology info Thomas Gleixner
2023-07-24 17:44 ` [patch 10/29] x86/cpu: Move cpu_l[l2]c_id " Thomas Gleixner
2023-07-24 17:44 ` [patch 11/29] x86/cpu: Provide debug interface Thomas Gleixner
2023-07-24 17:44 ` [patch 12/29] x86/cpu: Provide cpuid_read() et al Thomas Gleixner
2023-07-24 17:44 ` [patch 13/29] x86/cpu: Provide cpu_init/parse_topology() Thomas Gleixner
2023-07-24 17:44 ` [patch 14/29] x86/cpu: Add legacy topology parser Thomas Gleixner
2023-07-24 17:44 ` [patch 15/29] x86/cpu: Use common topology code for Centaur and Zhaoxin Thomas Gleixner
2023-07-24 17:44 ` [patch 16/29] x86/cpu: Move __max_die_per_package to common.c Thomas Gleixner
2023-07-24 17:44 ` [patch 17/29] x86/cpu: Provide a sane leaf 0xb/0x1f parser Thomas Gleixner
2023-07-24 20:49   ` Peter Zijlstra
2023-07-24 21:02     ` Peter Zijlstra
2023-07-25  6:51     ` Thomas Gleixner
2023-07-24 17:44 ` [patch 18/29] x86/cpu: Use common topology code for Intel Thomas Gleixner
2023-07-24 17:44 ` [patch 19/29] x86/cpu/amd: Provide a separate acessor for Node ID Thomas Gleixner
2023-07-24 17:44 ` [patch 20/29] x86/cpu: Provide an AMD/HYGON specific topology parser Thomas Gleixner
2023-07-24 17:44 ` [patch 21/29] x86/smpboot: Teach it about topo.amd_node_id Thomas Gleixner
2023-07-24 17:44 ` [patch 22/29] x86/cpu: Use common topology code for AMD Thomas Gleixner
2023-07-24 17:44 ` [patch 23/29] x86/cpu: Use common topology code for HYGON Thomas Gleixner
2023-07-24 17:44 ` [patch 24/29] x86/mm/numa: Use core domain size on AMD Thomas Gleixner
2023-07-24 17:44 ` [patch 25/29] x86/cpu: Make topology_amd_node_id() use the actual node info Thomas Gleixner
2023-07-24 17:44 ` [patch 26/29] x86/cpu: Remove topology.c Thomas Gleixner
2023-07-24 17:44 ` [patch 27/29] x86/cpu: Remove x86_coreid_bits Thomas Gleixner
2023-07-24 17:44 ` [patch 28/29] x86/apic: Remove unused phys_pkg_id() callback Thomas Gleixner
2023-07-24 17:44 ` [patch 29/29] x86/apic/uv: Remove the private leaf 0xb parser Thomas Gleixner
2023-07-24 21:05 ` [patch 00/29] x86/cpu: Rework the topology evaluation Peter Zijlstra
2023-07-25 20:17 ` Dimitri Sivanich [this message]
2023-07-26 22:15 ` Sohil Mehta
2023-07-26 22:38   ` Thomas Gleixner
2023-07-26 23:27     ` Sohil Mehta

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230725201614.GB97682@hpe.com \
    --to=sivanich@hpe.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=arjan@linux.intel.com \
    --cc=dick.kennedy@broadcom.com \
    --cc=dimitri.sivanich@hpe.com \
    --cc=james.smart@broadcom.com \
    --cc=jdelvare@suse.com \
    --cc=jejb@linux.ibm.com \
    --cc=linux-hwmon@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=linux@roeck-us.net \
    --cc=martin.petersen@oracle.com \
    --cc=ray.huang@amd.com \
    --cc=russ.anderson@hpe.com \
    --cc=steve.wahl@hpe.com \
    --cc=tglx@linutronix.de \
    --cc=thomas.lendacky@amd.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox