public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Michal Pecio <michal.pecio@gmail.com>
To: Yazen Ghannam <yazen.ghannam@amd.com>
Cc: x86@kernel.org, regressions@lists.linux.dev,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	Mario Limonciello <mario.limonciello@amd.com>,
	Eric DeVolder <eric.devolder@oracle.com>,
	linux-kernel@vger.kernel.org
Subject: Re: AMD topology broken on various 754/AM2+/AM3/AM3+ systems causes NB/EDAC/GART regression since 6.14
Date: Mon, 3 Nov 2025 08:40:21 +0100	[thread overview]
Message-ID: <20251103084021.61971a89.michal.pecio@gmail.com> (raw)
In-Reply-To: <20251024213204.GA311478@yaz-khff2.amd.com>

On Fri, 24 Oct 2025 17:32:04 -0400, Yazen Ghannam wrote:
> So far, I think the way to go is add explicit quirk for known issues.
> 
> Please see the patch below.
> 
> Thanks,
> Yazen
> 
> 
> From eeb0745e973055d8840b536cfa842d6f2bf4ac52 Mon Sep 17 00:00:00 2001
> From: Yazen Ghannam <yazen.ghannam@amd.com>
> Date: Fri, 24 Oct 2025 21:19:26 +0000
> Subject: [PATCH] x86/topology: Add helper to ignore bogus MADT entries
> 
> Some older Intel and AMD systems include bogus ACPI MADT entries. These
> entries show as "disabled". And it's not clear if they are physically
> present but offline, i.e halted. Or if they are not physically present
> at all.
> 
> Ideally, if they are not physically present, then they should not be
> listed in MADT. There doesn't seem to be any explicit x86 topology info
> that can be used to verify if the entries are bogus or not.
> 
> Add a  helper function to collect vendor-specific checks to ignore bogus
> APIC IDs. Start with known quirks for an Intel SNB model and older AMD
> K10 models.
> 
> Fixes: f0551af02130 ("x86/topology: Ignore non-present APIC IDs in a present package")
> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
> ---
>  arch/x86/kernel/cpu/topology.c | 52 ++++++++++++++++++++++++++--------
>  1 file changed, 40 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/topology.c b/arch/x86/kernel/cpu/topology.c
> index 6073a16628f9..704788b92395 100644
> --- a/arch/x86/kernel/cpu/topology.c
> +++ b/arch/x86/kernel/cpu/topology.c
> @@ -219,6 +219,45 @@ static unsigned int topo_unit_count(u32 lvlid, enum x86_topology_domains at_leve
>  	return cnt;
>  }
>  
> +/*
> + * Some older BIOSes include extra entries in MADT.
> + * Do some vendor-specific checks to ignore them.
> + */
> +static bool ignore_extra_apic_entry(u32 apic_id)
> +{
> +	u32 pkgid = topo_apicid(apic_id, TOPO_PKG_DOMAIN);
> +	struct cpuinfo_x86 *c = &boot_cpu_data;
> +
> +	/* Allow "physically not possible" cases if in a guest. */
> +	if (!hypervisor_is_type(X86_HYPER_NATIVE))
> +	       return false;
> +
> +	/* This model only supports 8 threads in a package. */
> +	if (c->x86_vendor == X86_VENDOR_INTEL &&
> +	    c->x86 == 0x6 && c->x86_model == 0x2d) {
> +		if (topo_unit_count(pkgid, TOPO_PKG_DOMAIN, phys_cpu_present_map) >= 8)

Looks like possible functional change compared to the code it replaces,
perhaps it should be a separate patch?

> +			goto reject;
> +	}
> +
> +	/*
> +	 * Various older models have extra entries. A common trait is that the
> +	 * package ID derived from the APIC ID would be more than was ever supported.
> +	 */
> +	if (c->x86_vendor == X86_VENDOR_AMD && c->x86 < 0x17) {
> +		pkgid >>= x86_topo_system.dom_shifts[TOPO_PKG_DOMAIN - 1];
> +
> +		if (pkgid >= 8)
> +			goto reject;

Yes, this works. The excess entries are counted as "rejected" rather
than "hotpluggable", package count is correct and EDAC/GART are back.

But to be exact, the known case is (apic_id >= 0x80) rather than
(pkgid >= 8). The latter assumes not only that there are no more than
8 packages, but also that their IDs start from zero.

I have this AM4 system with some proprietary HP BIOS:

[02Fh 0047 001h]               Local Apic ID : 10
[037h 0055 001h]               Local Apic ID : 11
[03Fh 0063 001h]               Local Apic ID : 12
[047h 0071 001h]               Local Apic ID : 13

domain: Thread     shift: 0 dom_size:     1 max_threads:     1
domain: Core       shift: 4 dom_size:    16 max_threads:    16
domain: Module     shift: 4 dom_size:     1 max_threads:    16
domain: Tile       shift: 4 dom_size:     1 max_threads:    16
domain: Die        shift: 4 dom_size:     1 max_threads:    16
domain: DieGrp     shift: 4 dom_size:     1 max_threads:    16
domain: Package    shift: 4 dom_size:     1 max_threads:    16

It seems that pkgid is 0x1 here, which is not a problem because
it's single socket, but dunno if HP or somebody else couldn't do
similar things in an 8-socket system and end up with pkgid > 8.


By the way, do you know what's the reason I don't have /sys directories
for those phantom CPUs (before this patch) and should I have them if
they were legitimate hotplug CPUs?

Maybe there is already some check performed in other part of the kernel
which rejects those CPUs and which could be replicated here.

> +	}
> +
> +	return false;
> +
> +reject:
> +	pr_info_once("Ignoring hot-pluggable APIC ID %x.\n", apic_id);
> +	topo_info.nr_rejected_cpus++;
> +	return true;
> +}
> +
>  static __init void topo_register_apic(u32 apic_id, u32 acpi_id, bool present)
>  {
>  	int cpu, dom;
> @@ -240,19 +279,8 @@ static __init void topo_register_apic(u32 apic_id, u32 acpi_id, bool present)
>  		cpuid_to_apicid[cpu] = apic_id;
>  		topo_set_cpuids(cpu, apic_id, acpi_id);
>  	} else {
> -		u32 pkgid = topo_apicid(apic_id, TOPO_PKG_DOMAIN);
> -
> -		/*
> -		 * Check for present APICs in the same package when running
> -		 * on bare metal. Allow the bogosity in a guest.
> -		 */
> -		if (hypervisor_is_type(X86_HYPER_NATIVE) &&
> -		    topo_unit_count(pkgid, TOPO_PKG_DOMAIN, phys_cpu_present_map)) {
> -			pr_info_once("Ignoring hot-pluggable APIC ID %x in present package.\n",
> -				     apic_id);
> -			topo_info.nr_rejected_cpus++;
> +		if (ignore_extra_apic_entry(apic_id))
>  			return;
> -		}
>  
>  		topo_info.nr_disabled_cpus++;
>  	}
> -- 
> 2.51.1
> 

  parent reply	other threads:[~2025-11-03  7:40 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20251024204658.3da9bf3f.michal.pecio@gmail.com>
2025-10-24 21:32 ` AMD topology broken on various 754/AM2+/AM3/AM3+ systems causes NB/EDAC/GART regression since 6.14 Yazen Ghannam
2025-10-27 16:18   ` Mario Limonciello
2025-10-28 14:36     ` Yazen Ghannam
2025-11-03  7:40   ` Michal Pecio [this message]
2025-11-03 14:38     ` Yazen Ghannam
2025-11-03 17:12       ` Michal Pecio
2025-11-04 15:23         ` Yazen Ghannam

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20251103084021.61971a89.michal.pecio@gmail.com \
    --to=michal.pecio@gmail.com \
    --cc=bp@alien8.de \
    --cc=eric.devolder@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mario.limonciello@amd.com \
    --cc=mingo@redhat.com \
    --cc=regressions@lists.linux.dev \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    --cc=yazen.ghannam@amd.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox