From: Thomas Gleixner <tglx@linutronix.de>
To: Peter Schneider <pschneider1968@googlemail.com>
Cc: LKML <linux-kernel@vger.kernel.org>,
x86@kernel.org, stable@vger.kernel.org,
regressions@lists.linux.dev
Subject: Re: Kernel 6.9 regression: X86: Bogus messages from topology detection
Date: Thu, 30 May 2024 15:35:55 +0200
Message-ID: <87r0dj8ls4.ffs@tglx>
In-Reply-To: <76b1e0b9-26ae-4915-920d-9093f057796b@googlemail.com>
Peter!
On Thu, May 30 2024 at 12:06, Peter Schneider wrote:
> On 30.05.24 at 10:30, Thomas Gleixner wrote:
>
>> Can you please apply the debug patch below and provide the full dmesg
>> after boot?
>
> Here you go... The patch applied cleanly against 6.9.3, which I saw
> was just released by Greg, so I used that. If you want, I can repeat
> the test against 6.9.2, too.
6.9.3 is fine.
> Please note: to be able to boot any kernel >= 6.8.4 on my machine, I also had to apply
> this patch by Martin Petersen, fixing another (unrelated SCSI) regression I reported some
> time ago, see here:
>
> https://lore.kernel.org/all/20240521023040.2703884-1-martin.petersen@oracle.com/
>
> But I think these two issues are not connected in any way. It was during testing the above
> patch by Martin that I noticed this new issue in 6.9 BTW.
Right. It's a separate problem.
> I have attached resulting file dmesg_6.9.3-dirty_Bad_wDebugInfo.txt,
> and I hope you can make some sense of it.
It's exactly what I expected but it does not make any sense at all.
> [ 0.000000] Legacy: 2 5 5
So that means that during early boot, when the topology parameters are
decoded from CPUID, the CPUID evaluation code sees 0x02 as the maximum
supported CPUID leaf and therefore reads complete nonsense.
Later on, when the full CPUID evaluation happens, it sees the full leaf
space and uses leaf 0xb.
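As an illustration only (this is not the kernel's code), the leaf selection described above can be sketched in plain C. The names `topo_source`, `leaf_usable()` and `pick_topo_source()` are made up for this example, and the sketch models nothing but the decision on the maximum basic leaf reported in EAX of leaf 0:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names, for illustration only. */
enum topo_source {
	TOPO_SRC_LEGACY,	/* fall back to the legacy evaluation */
	TOPO_SRC_LEAF_0B,	/* use leaf 0xb */
};

/* A basic leaf may only be evaluated if it is <= CPUID.0:EAX. */
static bool leaf_usable(uint32_t max_basic_leaf, uint32_t leaf)
{
	return leaf <= max_basic_leaf;
}

/* Pick the topology source from the maximum basic leaf alone. */
static enum topo_source pick_topo_source(uint32_t max_basic_leaf)
{
	if (leaf_usable(max_basic_leaf, 0x0b))
		return TOPO_SRC_LEAF_0B;
	return TOPO_SRC_LEGACY;
}
```

With a maximum leaf of 0x02, as in the "Legacy: 2 5 5" line, this sketch ends up on the legacy path. Once the full leaf space is visible, it selects leaf 0xb.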
> [ 1.687649] L:b 0 0 S:1 N:2 T:1
> [ 1.687652] D: 0
> [ 1.687653] L:b 1 1 S:5 N:24 T:2
> [ 1.687655] D: 1
> [ 1.687656] L:b 2 2 S:0 N:0 T:0
> [ 1.687658] [Firmware Bug]: CPU0: Topology domain 0 shift 1 != 5
And this obviously sees the proper numbers and complains about the
inconsistency.
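To make the debug output above easier to read, here is a minimal sketch of how one leaf 0xb subleaf decodes into the S/N/T values and how the shift comparison could look. The field positions follow the documented CPUID leaf 0xb layout. The struct and function names are hypothetical, and the comparison is a simplified stand-in for what the kernel does:

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of one leaf 0xb subleaf (hypothetical struct). */
struct topo_level {
	uint32_t shift;	/* EAX[4:0]:  x2APIC ID shift to the next level */
	uint32_t nproc;	/* EBX[15:0]: logical processors at this level  */
	uint32_t level;	/* ECX[7:0]:  level number                      */
	uint32_t type;	/* ECX[15:8]: 0 = invalid, 1 = SMT, 2 = core    */
};

static struct topo_level decode_leaf_0b(uint32_t eax, uint32_t ebx, uint32_t ecx)
{
	struct topo_level l = {
		.shift = eax & 0x1f,
		.nproc = ebx & 0xffff,
		.level = ecx & 0xff,
		.type  = (ecx >> 8) & 0xff,
	};

	return l;
}

/* True if the early (legacy) shift and the later leaf 0xb shift agree. */
static bool shift_consistent(uint32_t early_shift, uint32_t late_shift)
{
	return early_shift == late_shift;
}
```

For example, `decode_leaf_0b(0x1, 0x2, 0x100)` yields shift 1, 2 logical processors, level 0 and type 1, which matches the "L:b 0 0 S:1 N:2 T:1" line. These register values are constructed to fit the log, not read from the machine. Comparing that shift with the early SMT shift of 5 fails, which corresponds to "Topology domain 0 shift 1 != 5".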
So something on this CPU is broken. The same problem exists on all APs:
> [ 1.790035] .... node #0, CPUs: #4
> [ 1.790312] .... node #1, CPUs: #12 #16
> [ 0.011992] Legacy: 2 5 5
> [ 0.011992] Legacy: 2 5 5
> [ 0.011992] Legacy: 2 5 5
> [ 0.011992] Legacy: 2 5 5
.....
Now the million-dollar question is what unlocks CPUID to read the proper
value of EAX of leaf 0. All I could come up with is to sprinkle a dozen
printks into that code. Updated debug patch below.
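The probing approach itself can be sketched outside the kernel as a small harness: run a sequence of steps, re-read a value after each one, and report the first step after which it changes. Everything here is hypothetical: `fake_probe()` stands in for `cpuid_eax(0)`, the step functions stand in for the init calls, and the values 0x02 and 0x0b are only chosen to mirror the leaves mentioned above.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef uint32_t (*probe_fn)(void);

struct step {
	const char *name;
	void (*run)(void);
};

/*
 * Run the steps in order and probe after each one. Returns the index of
 * the first step after which the probed value differs from the value
 * seen before any step ran, or -1 if it never changes.
 */
static int first_step_changing(const struct step *steps, size_t n, probe_fn probe)
{
	uint32_t start = probe();

	printf("MAXL0: %x\n", start);
	for (size_t i = 0; i < n; i++) {
		steps[i].run();
		uint32_t cur = probe();

		printf("MAXL%zu: %x (after %s)\n", i + 1, cur, steps[i].name);
		if (cur != start)
			return (int)i;
	}
	return -1;
}

/* Stand-ins for demonstration; none of this touches real CPUID. */
static uint32_t fake_max_leaf = 0x02;
static uint32_t fake_probe(void) { return fake_max_leaf; }
static void step_noop(void) { }
static void step_unlock(void) { fake_max_leaf = 0x0b; }
```

Calling `first_step_changing()` with the steps `step_noop`, `step_unlock`, `step_noop` returns 1, the index of the step that changed the value. In a userspace test on real x86 hardware, the probe could instead wrap `__get_cpuid_max(0, NULL)` from the GCC/Clang `<cpuid.h>` header.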
Thanks,
tglx
---
--- a/arch/x86/kernel/cpu/topology_common.c
+++ b/arch/x86/kernel/cpu/topology_common.c
@@ -65,6 +65,7 @@ static void parse_legacy(struct topo_sca
 		cores <<= smt_shift;
 	}
+	pr_info("Legacy: %u %u %u\n", c->cpuid_level, smt_shift, core_shift);
 	topology_set_dom(tscan, TOPO_SMT_DOMAIN, smt_shift, 1U << smt_shift);
 	topology_set_dom(tscan, TOPO_CORE_DOMAIN, core_shift, cores);
 }
--- a/arch/x86/kernel/cpu/topology_ext.c
+++ b/arch/x86/kernel/cpu/topology_ext.c
@@ -72,6 +72,9 @@ static inline bool topo_subleaf(struct t
 	cpuid_subleaf(leaf, subleaf, &sl);
+	pr_info("L:%0x %0x %0x S:%u N:%u T:%u\n", leaf, subleaf, sl.level, sl.x2apic_shift,
+		sl.num_processors, sl.type);
+
 	if (!sl.num_processors || sl.type == INVALID_TYPE)
 		return false;
@@ -97,6 +100,7 @@ static inline bool topo_subleaf(struct t
 			     leaf, subleaf, tscan->c->topo.initial_apicid, sl.x2apic_id);
 	}
+	pr_info("D: %u\n", dom);
 	topology_set_dom(tscan, dom, sl.x2apic_shift, sl.num_processors);
 	return true;
 }
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1584,22 +1584,30 @@ static void __init early_identify_cpu(st
 	/* cyrix could have cpuid enabled via c_identify()*/
 	if (have_cpuid_p()) {
 		cpu_detect(c);
+		pr_info("MAXL1: %x\n", cpuid_eax(0));
 		get_cpu_vendor(c);
+		pr_info("MAXL2: %x\n", cpuid_eax(0));
 		get_cpu_cap(c);
+		pr_info("MAXL3: %x\n", cpuid_eax(0));
 		setup_force_cpu_cap(X86_FEATURE_CPUID);
 		get_cpu_address_sizes(c);
+		pr_info("MAXL4: %x\n", cpuid_eax(0));
 		cpu_parse_early_param();
+		pr_info("MAXL5: %x\n", cpuid_eax(0));
 		cpu_init_topology(c);
+		pr_info("MAXL6: %x\n", cpuid_eax(0));
 		if (this_cpu->c_early_init)
 			this_cpu->c_early_init(c);
+		pr_info("MAXL7: %x\n", cpuid_eax(0));
 		c->cpu_index = 0;
 		filter_cpuid_features(c, false);
 		if (this_cpu->c_bsp_init)
 			this_cpu->c_bsp_init(c);
+		pr_info("MAXL8: %x\n", cpuid_eax(0));
 	} else {
 		setup_clear_cpu_cap(X86_FEATURE_CPUID);
 		get_cpu_address_sizes(c);
@@ -1797,9 +1805,12 @@ static void identify_cpu(struct cpuinfo_
 #ifdef CONFIG_X86_VMX_FEATURE_NAMES
 	memset(&c->vmx_capability, 0, sizeof(c->vmx_capability));
 #endif
+	pr_info("MAXLG1: %x\n", cpuid_eax(0));
 	generic_identify(c);
+	pr_info("MAXLG2: %x\n", cpuid_eax(0));
+
 	cpu_parse_topology(c);
 	if (this_cpu->c_identify)