* Re: [REGRESSION] x86/smpboot/64: System will not boot on v6.5, v6.7-rc4
[not found] <CA+2tU59853R49EaU_tyvOZuOTDdcU0RshGyydccp9R1NX9bEeQ@mail.gmail.com>
@ 2023-12-10 11:05 ` Borislav Petkov
[not found] ` <CA+2tU585yW2z37NtYqjVe+ZMw+oU_vbS4X=Q=sAxME8GrPFDpg@mail.gmail.com>
` (2 more replies)
0 siblings, 3 replies; 11+ messages in thread
From: Borislav Petkov @ 2023-12-10 11:05 UTC (permalink / raw)
To: Chris Lindee
Cc: Thomas Gleixner, regressions, Ingo Molnar, Dave Hansen, x86,
Peter Zijlstra (Intel), Ashok Raj, David Woodhouse
On Sat, Dec 09, 2023 at 09:31:48PM -0600, Chris Lindee wrote:
> My Dell-EMC PowerEdge T340 server worked with v6.4.15, but it will not
> start with v6.5.x - it won't even display dmesg events during boot. I
> reproduced the issue on v6.7-rc4 and bisected the first problematic commit
> to:
>
> 0c7ffa32dbd6 x86/smpboot/64: Implement arch_cpuhp_init_parallel_bringup()
> and enable it
I presume booting with "cpuhp.parallel=0" on the kernel cmdline fixes
it?
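For anyone reproducing this, a minimal sketch for checking whether a given kernel command line carries that switch. The function name is hypothetical, and the sketch only recognizes the literal "0" used in this thread; other boolean spellings the kernel may accept are ignored here, and "last occurrence wins" is an assumption of the sketch.

```python
def parallel_bringup_disabled(cmdline: str) -> bool:
    """Return True if the command line explicitly sets cpuhp.parallel=0.

    Hypothetical helper for illustration; it is not part of any kernel tool.
    """
    disabled = False
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "cpuhp.parallel" and sep:
            # Assumption: a later occurrence overrides an earlier one.
            disabled = value == "0"
    return disabled
```

On a running system the string to pass in would typically be the contents of /proc/cmdline.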
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
* Re: [REGRESSION] x86/smpboot/64: System will not boot on v6.5, v6.7-rc4
[not found] ` <CA+2tU585yW2z37NtYqjVe+ZMw+oU_vbS4X=Q=sAxME8GrPFDpg@mail.gmail.com>
@ 2023-12-12 14:34 ` Thomas Gleixner
[not found] ` <CA+2tU5_TzDQO2U8SGDYhsVPR8iYh8Q8vTqv9+HUc7LN3cV=2Sg@mail.gmail.com>
0 siblings, 1 reply; 11+ messages in thread
From: Thomas Gleixner @ 2023-12-12 14:34 UTC (permalink / raw)
To: Chris Lindee, Borislav Petkov
Cc: regressions, Ingo Molnar, Dave Hansen, x86,
Peter Zijlstra (Intel), Ashok Raj, David Woodhouse
Chris!
On Mon, Dec 11 2023 at 12:32, Chris Lindee wrote:
> My system hardware remains available for testing. What further information
> do you need?
Can you please provide dmesg output from 6.7-rc4 with "cpuhp.parallel=0"
on the command line?
Thanks,
tglx
* Re: [REGRESSION] x86/smpboot/64: System will not boot on v6.5, v6.7-rc4
2023-12-10 11:05 ` [REGRESSION] x86/smpboot/64: System will not boot on v6.5, v6.7-rc4 Borislav Petkov
[not found] ` <CA+2tU585yW2z37NtYqjVe+ZMw+oU_vbS4X=Q=sAxME8GrPFDpg@mail.gmail.com>
@ 2023-12-12 15:32 ` Linux regression tracking #adding (Thorsten Leemhuis)
2024-03-11 19:47 ` Guenter Roeck
2 siblings, 0 replies; 11+ messages in thread
From: Linux regression tracking #adding (Thorsten Leemhuis) @ 2023-12-12 15:32 UTC (permalink / raw)
To: regressions
[TLDR: I'm adding this report to the list of tracked Linux kernel
regressions; the text you find below is based on a few template
paragraphs you might have encountered already in similar form.
See link in footer if these mails annoy you.]
On 10.12.23 12:05, Borislav Petkov wrote:
> On Sat, Dec 09, 2023 at 09:31:48PM -0600, Chris Lindee wrote:
>> My Dell-EMC PowerEdge T340 server worked with v6.4.15, but it will not
>> start with v6.5.x - it won't even display dmesg events during boot. I
>> reproduced the issue on v6.7-rc4 and bisected the first problematic commit
>> to:
>>
>> 0c7ffa32dbd6 x86/smpboot/64: Implement arch_cpuhp_init_parallel_bringup()
>> and enable it
>
> I presume booting with "cpuhp.parallel=0" on the kernel cmdline fixes
> it?
To be sure the issue doesn't fall through the cracks unnoticed, I'm
adding it to regzbot, the Linux kernel regression tracking bot:
#regzbot ^introduced 0c7ffa32dbd6
#regzbot title x86: system stopped booting with v6.5
#regzbot ignore-activity
This isn't a regression? This issue or a fix for it are already
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply and tell me -- ideally
while also telling regzbot about it, as explained by the page listed in
the footer of this mail.
Developers: When fixing the issue, remember to add 'Link:' tags pointing
to the report (the parent of this mail). See page linked in footer for
details.
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.
* Re: [REGRESSION] x86/smpboot/64: System will not boot on v6.5, v6.7-rc4
[not found] ` <CA+2tU5_TzDQO2U8SGDYhsVPR8iYh8Q8vTqv9+HUc7LN3cV=2Sg@mail.gmail.com>
@ 2023-12-13 14:34 ` Thomas Gleixner
2023-12-13 21:08 ` Thomas Gleixner
[not found] ` <CA+2tU5-RoR8qCGRdsMH0wo5n8v4fV8A_H5yT+mZadsF7E+QkWg@mail.gmail.com>
0 siblings, 2 replies; 11+ messages in thread
From: Thomas Gleixner @ 2023-12-13 14:34 UTC (permalink / raw)
To: Chris Lindee
Cc: Borislav Petkov, regressions, Ingo Molnar, Dave Hansen, x86,
Peter Zijlstra (Intel), Ashok Raj, David Woodhouse
On Tue, Dec 12 2023 at 13:11, Chris Lindee wrote:
> On Tue, Dec 12, 2023 at 8:34 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>> Can you please provide dmesg output from 6.7-rc4 with "cpuhp.parallel=0"
>> on the command line?
>
> Oops, I already upgraded to 6.7-rc5
> <https://koji.fedoraproject.org/koji/buildinfo?buildID=2332379>; let me
> know if you would prefer I downgrade.
Nah. That's fine.
> [ 0.002775] x2apic: enabled by BIOS, switching to x2apic ops
So BIOS has x2apic enabled, but ...
> [ 0.136888] smp: Bringing up secondary CPUs ...
> [ 0.136888] smpboot: x86: Booting SMP configuration:
> [ 0.136888] .... node #0, CPUs: #1
> [ 0.013471] x2apic enabled
this is really strange as it means that the APs do not come up with
X2APIC enabled. Oh well. BIOS creativity is unlimited...
> [ 0.137980] #2 #3
> [ 0.139937] smp: Brought up 1 node, 4 CPUs
Though that does not really explain why the parallel bringup goes south
as that is not really different from what the serialized bringup is
doing in terms of kicking the APs into life.
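The mismatch noted above (boot CPU in x2APIC mode, APs not) comes down to one bit in each CPU's IA32_APIC_BASE MSR. A minimal sketch of decoding that register, assuming the architectural bit layout (bit 8 BSP, bit 10 x2APIC enable, bit 11 global enable); the example values in the usage note are made up and are not taken from the reporter's machine.

```python
# Architectural bit positions in IA32_APIC_BASE (MSR 0x1b).
APIC_BASE_BSP = 1 << 8    # set on the bootstrap processor
X2APIC_ENABLE = 1 << 10   # x2APIC mode enabled
XAPIC_ENABLE = 1 << 11    # local APIC globally enabled


def apic_mode(apic_base: int) -> str:
    """Classify a raw IA32_APIC_BASE value (illustrative helper)."""
    if not apic_base & XAPIC_ENABLE:
        # Also covers the invalid "x2APIC bit set, global enable clear" case.
        return "disabled"
    if apic_base & X2APIC_ENABLE:
        return "x2apic"
    return "xapic"


def is_bsp(apic_base: int) -> bool:
    """True if the value belongs to the bootstrap processor."""
    return bool(apic_base & APIC_BASE_BSP)
```

With hypothetical values, apic_mode(0xfee00d00) classifies a boot CPU in x2APIC mode and apic_mode(0xfee00800) an AP left in xAPIC mode, which is the combination described for this BIOS.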
Can you try and add "ignore_loglevel earlyprintk=ttyS0,115200n8" to the
kernel command line and capture dmesg on the serial port when booting
with the parallel hotplug enabled?
Thanks,
tglx
* Re: [REGRESSION] x86/smpboot/64: System will not boot on v6.5, v6.7-rc4
2023-12-13 14:34 ` Thomas Gleixner
@ 2023-12-13 21:08 ` Thomas Gleixner
[not found] ` <CA+2tU5-RoR8qCGRdsMH0wo5n8v4fV8A_H5yT+mZadsF7E+QkWg@mail.gmail.com>
1 sibling, 0 replies; 11+ messages in thread
From: Thomas Gleixner @ 2023-12-13 21:08 UTC (permalink / raw)
To: Chris Lindee
Cc: Borislav Petkov, regressions, Ingo Molnar, Dave Hansen, x86,
Peter Zijlstra (Intel), Ashok Raj, David Woodhouse
On Wed, Dec 13 2023 at 15:34, Thomas Gleixner wrote:
> On Tue, Dec 12 2023 at 13:11, Chris Lindee wrote:
> Though that does not really explain why the parallel bringup goes south
> as that is not really different from what the serialized bringup is
> doing in terms of kicking the APs into life.
>
> Can you try and add "ignore_loglevel earlyprintk=ttyS0,115200n8" to the
> kernel command line and capture dmesg on the serial port when booting
> with the parallel hotplug enabled?
Don't bother, it won't tell us much and I decoded the issue already. Will
send a patch tomorrow.
* Re: [REGRESSION] x86/smpboot/64: System will not boot on v6.5, v6.7-rc4
[not found] ` <CA+2tU5-18YQSpFzgUKyeZdJXe0Rs0GxtYY2s5QjznzTucJKCiQ@mail.gmail.com>
@ 2023-12-14 7:04 ` Thomas Gleixner
[not found] ` <CA+2tU596G+auWJ2MzYFJ3bv=Y4zP2NSrdtak7SdF9SH6Ht_dzg@mail.gmail.com>
0 siblings, 1 reply; 11+ messages in thread
From: Thomas Gleixner @ 2023-12-14 7:04 UTC (permalink / raw)
To: Chris Lindee
Cc: Borislav Petkov, regressions, Ingo Molnar, Dave Hansen, x86,
Peter Zijlstra (Intel), Ashok Raj, David Woodhouse
On Wed, Dec 13 2023 at 23:04, Chris Lindee wrote:
> On Wed, Dec 13, 2023 at 7:35 PM Chris Lindee <chris.lindee@gmail.com> wrote:
>
>> The following are the processor options made available by the BIOS, with
>> the currently selected value. If there are any interesting combinations
>> that you'd like tested, I can try a handful of them.
>>
>
>> x2APIC Mode: Enabled (requires Virtualization Technology to enable)
>>
>
> I disabled x2APIC and the system would boot, both without "cpuhp.parallel"
> and with "cpuhp.parallel=1" on the command line.
I would have expected that.
With the completely untested patch below it should also boot in X2APIC
mode even with the broken AP configuration in the BIOS.
Thanks,
tglx
---
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -255,6 +255,22 @@ SYM_INNER_LABEL(secondary_startup_64_no_
testl $X2APIC_ENABLE, %eax
jnz .Lread_apicid_msr
+#ifdef CONFIG_X86_X2APIC
+ /*
+ * If system is in X2APIC mode then MMIO base mignt not be
+ * mapped causing the MMIO read below to fault. Faults can't
+ * be handled at that point.
+ */
+ cmpl $0, x2apic_mode(%rip)
+ jz .Lread_apicid_mmio
+
+ /* Force the AP into X2APIC mode. */
+ orl $X2APIC_ENABLE, %eax
+ wrmsr
+ jmp .Lread_apicid_msr
+#endif
+
+.Lread_apicid_mmio:
/* Read the APIC ID from the fix-mapped MMIO space. */
movq apic_mmio_base(%rip), %rcx
addq $APIC_ID, %rcx
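The branch structure of the hunk above can be modelled in a few lines. This is only a sketch of the control flow, not kernel code: the function name and the tuple return are invented for illustration, kernel_x2apic_mode stands in for the x2apic_mode variable, and config_x2apic stands in for CONFIG_X86_X2APIC.

```python
X2APIC_ENABLE = 1 << 10  # same bit the assembly tests in the APIC base MSR


def apicid_read_path(apic_base: int, kernel_x2apic_mode: bool,
                     config_x2apic: bool = True):
    """Return (path, apic_base_after) for an AP reading its APIC ID.

    path is "msr" for the x2APIC MSR readout or "mmio" for the
    memory-mapped readout.
    """
    if apic_base & X2APIC_ENABLE:
        # AP already came up in x2APIC mode: unchanged behaviour.
        return "msr", apic_base
    if config_x2apic and kernel_x2apic_mode:
        # New in the patch: force the AP into x2APIC mode (orl + wrmsr)
        # so the possibly unmapped MMIO window is never touched.
        return "msr", apic_base | X2APIC_ENABLE
    # Kernel is not in x2APIC mode: the MMIO read stays valid.
    return "mmio", apic_base
```

The case reported in this thread corresponds to an AP value without the x2APIC bit while the kernel is in x2APIC mode; before the patch that combination fell through to the MMIO read.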
* [PATCH] x86/smpboot/64: Handle X2APIC BIOS inconsistency gracefully
[not found] ` <CA+2tU596G+auWJ2MzYFJ3bv=Y4zP2NSrdtak7SdF9SH6Ht_dzg@mail.gmail.com>
@ 2023-12-15 8:58 ` Thomas Gleixner
2023-12-15 15:41 ` Ashok Raj
0 siblings, 1 reply; 11+ messages in thread
From: Thomas Gleixner @ 2023-12-15 8:58 UTC (permalink / raw)
To: Chris Lindee
Cc: Borislav Petkov, regressions, Ingo Molnar, Dave Hansen, x86,
Peter Zijlstra (Intel), Ashok Raj, David Woodhouse
Chris reported that a Dell PowerEdge T340 system stopped booting after
upgrading to a kernel which contains the parallel hotplug changes.
Disabling parallel hotplug on the kernel command line makes it boot again.
It turns out that the Dell BIOS has x2APIC enabled and the boot CPU comes
up in X2APIC mode, but the APs come up inconsistently in xAPIC mode.
Parallel hotplug requires that the upcoming CPU reads out its APIC ID from
the local APIC in order to map it to the Linux CPU number.
In this particular case the readout on the APs uses the MMIO mapped
registers because the BIOS failed to enable x2APIC mode. That readout
results in a page fault because the kernel does not have the APIC MMIO space
mapped when X2APIC mode was enabled by the BIOS on the boot CPU and the
kernel switched to X2APIC mode early. That page fault can't be handled on
the upcoming CPU that early and results in a silent boot failure.
If parallel hotplug is disabled the system boots because in that case
the APIC ID read is not required as the Linux CPU number is provided to
the AP in the smpboot control word. When the kernel uses x2APIC mode
then the APs are switched to x2APIC mode as well, slightly later in the
bringup process, but there is no reason to do it that late.
Cure the BIOS bogosity by checking in the parallel bootup path whether the
kernel uses x2APIC mode and if so switching over the APs to x2APIC mode
before the APIC ID readout.
Fixes: 0c7ffa32dbd6 ("x86/smpboot/64: Implement arch_cpuhp_init_parallel_bringup() and enable it")
Reported-by: Chris Lindee <chris.lindee@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Chris Lindee <chris.lindee@gmail.com>
Cc: stable@vger.kernel.org
---
arch/x86/kernel/head_64.S | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -268,6 +268,22 @@ SYM_INNER_LABEL(secondary_startup_64_no_
testl $X2APIC_ENABLE, %eax
jnz .Lread_apicid_msr
+#ifdef CONFIG_X86_X2APIC
+ /*
+ * If system is in X2APIC mode then MMIO base mignt not be
+ * mapped causing the MMIO read below to fault. Faults can't
+ * be handled at that point.
+ */
+ cmpl $0, x2apic_mode(%rip)
+ jz .Lread_apicid_mmio
+
+ /* Force the AP into X2APIC mode. */
+ orl $X2APIC_ENABLE, %eax
+ wrmsr
+ jmp .Lread_apicid_msr
+#endif
+
+.Lread_apicid_mmio:
/* Read the APIC ID from the fix-mapped MMIO space. */
movq apic_mmio_base(%rip), %rcx
addq $APIC_ID, %rcx
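The changelog above contrasts the two ways an AP learns its Linux CPU number. A rough sketch of that distinction follows; the flag and mask values, the function name and the lookup table are assumptions made for illustration and should be checked against the kernel headers before relying on them.

```python
STARTUP_READ_APICID = 0x80000000  # assumed flag: AP must read its own APIC ID
CPU_NUMBER_MASK = 0x00FFFFFF      # assumed: low bits carry the CPU number


def resolve_cpu_number(smpboot_control: int, read_apic_id, apicid_to_cpu):
    """Model how an AP finds its CPU number during bringup.

    read_apic_id is a callable standing in for the MSR or MMIO readout;
    apicid_to_cpu maps APIC IDs to Linux CPU numbers.
    """
    if smpboot_control & STARTUP_READ_APICID:
        # Parallel bringup: every AP identifies itself. This readout is
        # the step that faulted on the affected system.
        return apicid_to_cpu[read_apic_id()]
    # Serialized bringup: the control word already names the CPU, so
    # the APIC is not consulted this early.
    return smpboot_control & CPU_NUMBER_MASK
```

In the serialized case the readout callable is never invoked, which matches the observation that "cpuhp.parallel=0" lets the machine boot.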
* Re: [PATCH] x86/smpboot/64: Handle X2APIC BIOS inconsistency gracefully
2023-12-15 8:58 ` [PATCH] x86/smpboot/64: Handle X2APIC BIOS inconsistency gracefully Thomas Gleixner
@ 2023-12-15 15:41 ` Ashok Raj
[not found] ` <CA+2tU5_hD-atkp9UcCJnL6TMneSOBxL87=ppvrQ1ugUn_0-7NA@mail.gmail.com>
0 siblings, 1 reply; 11+ messages in thread
From: Ashok Raj @ 2023-12-15 15:41 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Chris Lindee, Borislav Petkov, regressions, Ingo Molnar,
Dave Hansen, x86, Peter Zijlstra (Intel), David Woodhouse,
Ashok Raj
On Fri, Dec 15, 2023 at 09:58:58AM +0100, Thomas Gleixner wrote:
> Chris reported that a Dell PowerEdge T340 system stopped booting after
> upgrading to a kernel which contains the parallel hotplug changes.
> Disabling parallel hotplug on the kernel command line makes it boot again.
>
> It turns out that the Dell BIOS has x2APIC enabled and the boot CPU comes
> up in X2APIC mode, but the APs come up inconsistently in xAPIC mode.
>
> Parallel hotplug requires that the upcoming CPU reads out its APIC ID from
> the local APIC in order to map it to the Linux CPU number.
>
> In this particular case the readout on the APs uses the MMIO mapped
> registers because the BIOS failed to enable x2APIC mode. That readout
> results in a page fault because the kernel does not have the APIC MMIO space
> mapped when X2APIC mode was enabled by the BIOS on the boot CPU and the
> kernel switched to X2APIC mode early. That page fault can't be handled on
> the upcoming CPU that early and results in a silent boot failure.
>
> If parallel hotplug is disabled the system boots because in that case
> the APIC ID read is not required as the Linux CPU number is provided to
> the AP in the smpboot control word. When the kernel uses x2APIC mode
> then the APs are switched to x2APIC mode as well, slightly later in the
> bringup process, but there is no reason to do it that late.
>
> Cure the BIOS bogosity by checking in the parallel bootup path whether the
> kernel uses x2APIC mode and if so switching over the APs to x2APIC mode
> before the APIC ID readout.
>
> Fixes: 0c7ffa32dbd6 ("x86/smpboot/64: Implement arch_cpuhp_init_parallel_bringup() and enable it")
> Reported-by: Chris Lindee <chris.lindee@gmail.com>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Tested-by: Chris Lindee <chris.lindee@gmail.com>
> Cc: stable@vger.kernel.org
> ---
> arch/x86/kernel/head_64.S | 16 ++++++++++++++++
> 1 file changed, 16 insertions(+)
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
* Re: [PATCH] x86/smpboot/64: Handle X2APIC BIOS inconsistency gracefully
[not found] ` <CA+2tU5_hD-atkp9UcCJnL6TMneSOBxL87=ppvrQ1ugUn_0-7NA@mail.gmail.com>
@ 2023-12-15 18:35 ` Borislav Petkov
0 siblings, 0 replies; 11+ messages in thread
From: Borislav Petkov @ 2023-12-15 18:35 UTC (permalink / raw)
To: Chris Lindee
Cc: Ashok Raj, Thomas Gleixner, regressions, Ingo Molnar, Dave Hansen,
x86, Peter Zijlstra (Intel), David Woodhouse
On Fri, Dec 15, 2023 at 12:18:43PM -0600, Chris Lindee wrote:
> I think there is a typo in the code comment: s/mignt/might/
Fixed, thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
* Re: [REGRESSION] x86/smpboot/64: System will not boot on v6.5, v6.7-rc4
2023-12-10 11:05 ` [REGRESSION] x86/smpboot/64: System will not boot on v6.5, v6.7-rc4 Borislav Petkov
[not found] ` <CA+2tU585yW2z37NtYqjVe+ZMw+oU_vbS4X=Q=sAxME8GrPFDpg@mail.gmail.com>
2023-12-12 15:32 ` [REGRESSION] x86/smpboot/64: System will not boot on v6.5, v6.7-rc4 Linux regression tracking #adding (Thorsten Leemhuis)
@ 2024-03-11 19:47 ` Guenter Roeck
2024-03-12 9:13 ` Thorsten Leemhuis
2 siblings, 1 reply; 11+ messages in thread
From: Guenter Roeck @ 2024-03-11 19:47 UTC (permalink / raw)
To: regressions
Cc: Chris Lindee, Thomas Gleixner, regressions, Ingo Molnar,
Dave Hansen, x86, Peter Zijlstra (Intel), Ashok Raj,
David Woodhouse
On Sun, Dec 10, 2023 at 12:05:18PM +0100, Borislav Petkov wrote:
> On Sat, Dec 09, 2023 at 09:31:48PM -0600, Chris Lindee wrote:
> > My Dell-EMC PowerEdge T340 server worked with v6.4.15, but it will not
> > start with v6.5.x - it won't even display dmesg events during boot. I
> > reproduced the issue on v6.7-rc4 and bisected the first problematic commit
> > to:
> >
> > 0c7ffa32dbd6 x86/smpboot/64: Implement arch_cpuhp_init_parallel_bringup()
> > and enable it
>
> I presume booting with "cpuhp.parallel=0" on the kernel cmdline fixes
> it?
>
#regzbot fix: 69a7386c1ec2
* Re: [REGRESSION] x86/smpboot/64: System will not boot on v6.5, v6.7-rc4
2024-03-11 19:47 ` Guenter Roeck
@ 2024-03-12 9:13 ` Thorsten Leemhuis
0 siblings, 0 replies; 11+ messages in thread
From: Thorsten Leemhuis @ 2024-03-12 9:13 UTC (permalink / raw)
To: regressions, Guenter Roeck
On 11.03.24 20:47, Guenter Roeck wrote:
> On Sun, Dec 10, 2023 at 12:05:18PM +0100, Borislav Petkov wrote:
>> On Sat, Dec 09, 2023 at 09:31:48PM -0600, Chris Lindee wrote:
>>> My Dell-EMC PowerEdge T340 server worked with v6.4.15, but it will not
>>> start with v6.5.x - it won't even display dmesg events during boot. I
>>> reproduced the issue on v6.7-rc4 and bisected the first problematic commit
>>> to:
>>>
>>> 0c7ffa32dbd6 x86/smpboot/64: Implement arch_cpuhp_init_parallel_bringup()
>>> and enable it
>>
>> I presume booting with "cpuhp.parallel=0" on the kernel cmdline fixes
>> it?
>>
> #regzbot fix: 69a7386c1ec2
Many thx, I had missed that. Sadly regzbot was unable to handle this, as
there was no blank line before the regzbot command. I wonder if I should
drop that requirement...
#regzbot fix: 69a7386c1ec2
Ciao, Thorsten
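To illustrate the blank-line requirement mentioned above, here is a small sketch of a parser that only accepts a block of "#regzbot" lines when the block is preceded by an empty line. It is not regzbot's actual implementation; the function name and the exact rule (including treating the start of the mail as blank) are assumptions.

```python
def regzbot_commands(body: str) -> list:
    """Collect '#regzbot' lines that start after a blank line.

    Consecutive command lines following an accepted one are accepted too.
    """
    commands = []
    previous_blank = True   # assumption: start of mail counts as blank
    in_block = False
    for line in body.splitlines():
        stripped = line.strip()
        if stripped.startswith("#regzbot") and (previous_blank or in_block):
            commands.append(stripped)
            in_block = True
        else:
            in_block = False
        previous_blank = stripped == ""
    return commands
```

Under this rule a command that directly follows a quoted ">" line is skipped, because that line is not empty, while the same command after an empty line is picked up.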