linux-arch.vger.kernel.org archive mirror
* [PATCH 1/1] x86: SMT broken on Xen PV DomU since 6.9
@ 2024-09-11  8:53 Niels Dettenbach
  2024-09-11 15:49 ` [PATCH 1/1] x86: SMT broken on Xen PV DomU since 6.9 (updated) Niels Dettenbach
  0 siblings, 1 reply; 3+ messages in thread
From: Niels Dettenbach @ 2024-09-11  8:53 UTC (permalink / raw)
  To: linux-arch; +Cc: stable, trivial

Virtual machines (DomUs) running under the Xen hypervisor in PV mode use a
special, nonstandard synthesized CPU topology which "just works" under
kernels before 6.9. Since 6.9, the kernel applies new topology checks
which are not appropriate for the Xen PV platform: it reports the bogus
error "[Firmware Bug]: APIC enumeration order not specification
compliant" and assumes a "crash kernel". As a result, the kernel disables
SMT and activates just one CPU core within the VM (DomU).

The patch disables the affected checks only when running in Xen PV mode
and brings back SMT / all CPUs to such DomU VMs, as in the past.

Signed-off-by: Niels Dettenbach <nd@syndicat.com>

---


The current behaviour renders all of our production Xen host platforms
unusable after updating to newer Linux kernels (with just one CPU
available/activated per VM), while older kernels and other OSes still work
fully (and have been stable on the platform for many years).

Xen PV mode is still provided by current Xen and widely used, even
if less widely than the newer Xen PVH mode today. So a solution will
probably be required.

We therefore assume that this bug is relevant for stable@vger.kernel.org
as well.


dmesg from affected kernel:

-- snip --
Sep 10 14:35:50 ffm1 kernel: [    0.640364] CPU topo: Enumerated BSP APIC 0 is not marked in APICBASE MSR
Sep 10 14:35:50 ffm1 kernel: [    0.640367] CPU topo: Assuming crash kernel. Limiting to one CPU to prevent machine INIT
Sep 10 14:35:50 ffm1 kernel: [    0.640368] CPU topo: [Firmware Bug]: APIC enumeration order not specification compliant
Sep 10 14:35:50 ffm1 kernel: [    0.640376] CPU topo: Max. logical packages:   1
Sep 10 14:35:50 ffm1 kernel: [    0.640378] CPU topo: Max. logical dies:       1
Sep 10 14:35:50 ffm1 kernel: [    0.640379] CPU topo: Max. dies per package:   1
Sep 10 14:35:50 ffm1 kernel: [    0.640386] CPU topo: Max. threads per core:   1
Sep 10 14:35:50 ffm1 kernel: [    0.640388] CPU topo: Num. cores per package:     1
Sep 10 14:35:50 ffm1 kernel: [    0.640389] CPU topo: Num. threads per package:   1
Sep 10 14:35:50 ffm1 kernel: [    0.640390] CPU topo: Allowing 1 present CPUs plus 0 hotplug CPUs
Sep 10 14:35:50 ffm1 kernel: [    0.640402] Cannot find an available gap in the 32-bit address range
-- snap --

We have tested the patch intensively under production / high load for 2 days now with no issues.


references:
arch/x86/kernel/cpu/topology.c
[line 448]
--- snip ---
        /*
         * XEN PV is special as it does not advertise the local APIC
         * properly, but provides a fake topology for it so that the
         * infrastructure works. So don't apply the restrictions vs. APIC
         * here.
         */
---snap ---
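
To illustrate the decision path visible in the dmesg output above, here
is a minimal userspace sketch in C. It is not the kernel function: the
names bsp_verdict, boot_state and model_check are hypothetical, the
xen_pv field merely stands in for xen_pv_domain(), and the model covers
only the case where the first enumerated APIC ID equals the boot CPU's
APIC ID on a CPU that has an APICBASE MSR.

```c
#include <stdbool.h>

/* Hypothetical outcomes of the modeled check (illustrative names only). */
enum bsp_verdict {
	BSP_TRUSTED,       /* enumeration accepted, all CPUs stay usable */
	BSP_LIMIT_ONE_CPU  /* "Assuming crash kernel", limited to one CPU */
};

/* Simplified boot-time state; not a kernel data structure. */
struct boot_state {
	bool bsp_bit_set;  /* BSP bit as read from the APICBASE MSR */
	bool xen_pv;       /* stand-in for xen_pv_domain() */
};

/*
 * Model of the check for the first enumerated APIC ID, assuming it equals
 * the boot CPU's APIC ID. with_patch selects whether the Xen PV early
 * exit proposed in this thread is applied.
 */
enum bsp_verdict model_check(const struct boot_state *s, bool with_patch)
{
	/* Proposed behaviour: trust the synthetic Xen PV topology up front. */
	if (with_patch && s->xen_pv)
		return BSP_TRUSTED;

	/* Boot CPU is enumerated first and marked as BSP: nothing to fix. */
	if (s->bsp_bit_set)
		return BSP_TRUSTED;

	/*
	 * BSP bit is clear: the check assumes a crash kernel and limits
	 * the system to one CPU (the situation in the log above).
	 */
	return BSP_LIMIT_ONE_CPU;
}
```

For a PV DomU whose BSP bit reads as clear, model_check() yields
BSP_LIMIT_ONE_CPU without the patch and BSP_TRUSTED with it; a machine
with the BSP bit set is unaffected either way.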

Related errors / tickets:
https://forum.qubes-os.org/t/fedora-sees-only-1-cpu-core-after-updating-the-kernel-to-6-9-x/27205/5






--- linux/arch/x86/kernel/cpu/topology.c.orig   2024-09-11 09:53:16.194095250 +0200
+++ linux/arch/x86/kernel/cpu/topology.c        2024-09-11 09:55:17.338448094 +0200
@@ -158,7 +158,7 @@ static __init bool check_for_real_bsp(u3
                is_bsp = !!(msr & MSR_IA32_APICBASE_BSP);
        }

-       if (apic_id == topo_info.boot_cpu_apic_id) {
+       if (!xen_pv_domain() && apic_id == topo_info.boot_cpu_apic_id) {
                /*
                 * If the boot CPU has the APIC BSP bit set then the
                 * firmware enumeration is agreeing. If the CPU does not
@@ -185,7 +185,7 @@ static __init bool check_for_real_bsp(u3
        pr_warn("Boot CPU APIC ID not the first enumerated APIC ID: %x != %x\n",
                topo_info.boot_cpu_apic_id, apic_id);

-       if (is_bsp) {
+       if (!xen_pv_domain() && is_bsp) {
                /*
                 * The boot CPU has the APIC BSP bit set. Use it and complain
                 * about the broken firmware enumeration.








* Re: [PATCH 1/1] x86: SMT broken on Xen PV DomU since 6.9 (updated)
  2024-09-11  8:53 [PATCH 1/1] x86: SMT broken on Xen PV DomU since 6.9 Niels Dettenbach
@ 2024-09-11 15:49 ` Niels Dettenbach
  2024-09-12  5:21   ` Greg KH
  0 siblings, 1 reply; 3+ messages in thread
From: Niels Dettenbach @ 2024-09-11 15:49 UTC (permalink / raw)
  To: linux-arch; +Cc: stable, trivial

On Wednesday, 11 September 2024, 10:53:30, Niels Dettenbach wrote:
> virtual machines under Xen Hypervisor (DomU) running in Xen PV mode use a
> special, nonstandard synthetized CPU topology which "just works" under
> kernels 6.9.x while newer kernels assuming a "crash kernel" and disable
> SMT (reducing to one CPU core) because the newer topology implementation
> produces a wrong error "[Firmware Bug]: APIC enumeration order not
> specification compliant" after new topology checks which are improper for
> Xen PV platform. As a result, the kernel disables SMT and activates just
> one CPU core within the VM (DomU).
> 
> The patch disables the regarding checks if it is running in Xen PV
> mode (only) and bring back SMT / all CPUs as in the past to such DomU
> VMs.
> 
> Signed-off-by: Niels Dettenbach <nd@syndicat.com>
> 

Signed-off-by: Niels Dettenbach <nd@syndicat.com>
---

A reworked proposed patch which replaces my initially proposed patch:


--- linux/arch/x86/kernel/cpu/topology.c.orig   2024-09-11 09:53:16.194095250 +0200
+++ linux/arch/x86/kernel/cpu/topology.c        2024-09-11 17:42:42.699278317 +0200
@@ -132,6 +132,14 @@
        u64 msr;

        /*
+        * assume Xen PV has a working (special) topology
+        */
+       if (xen_pv_domain()) {
+               topo_info.real_bsp_apic_id = topo_info.boot_cpu_apic_id;
+               return false;
+       }
+
+       /*
         * There is no real good way to detect whether this a kdump()
         * kernel, but except on the Voyager SMP monstrosity which is not
         * longer supported, the real BSP APIC ID is the first one which is

 








* Re: [PATCH 1/1] x86: SMT broken on Xen PV DomU since 6.9 (updated)
  2024-09-11 15:49 ` [PATCH 1/1] x86: SMT broken on Xen PV DomU since 6.9 (updated) Niels Dettenbach
@ 2024-09-12  5:21   ` Greg KH
  0 siblings, 0 replies; 3+ messages in thread
From: Greg KH @ 2024-09-12  5:21 UTC (permalink / raw)
  To: Niels Dettenbach; +Cc: linux-arch, stable, trivial

On Wed, Sep 11, 2024 at 05:49:46PM +0200, Niels Dettenbach wrote:
> On Wednesday, 11 September 2024, 10:53:30, Niels Dettenbach wrote:
> > virtual machines under Xen Hypervisor (DomU) running in Xen PV mode use a
> > special, nonstandard synthetized CPU topology which "just works" under
> > kernels 6.9.x while newer kernels assuming a "crash kernel" and disable
> > SMT (reducing to one CPU core) because the newer topology implementation
> > produces a wrong error "[Firmware Bug]: APIC enumeration order not
> > specification compliant" after new topology checks which are improper for
> > Xen PV platform. As a result, the kernel disables SMT and activates just
> > one CPU core within the VM (DomU).
> > 
> > The patch disables the regarding checks if it is running in Xen PV
> > mode (only) and bring back SMT / all CPUs as in the past to such DomU
> > VMs.
> > 
> > Signed-off-by: Niels Dettenbach <nd@syndicat.com>
> > 
> 
> Signed-off-by: Niels Dettenbach <nd@syndicat.com>
> ---
> 
> A reworked proposal patch which substitutes my initial proposed patch:
> 
> 
> --- linux/arch/x86/kernel/cpu/topology.c        2024-09-11 17:42:42.699278317 +0200
> +++ linux/arch/x86/kernel/cpu/topology.c.orig   2024-09-11 09:53:16.194095250 +0200

<snip>

Please submit this to the proper developers and maintainers for this
code, as the scripts/get_maintainer.pl script will tell you and as the
documentation states.  As it is, this isn't going anywhere as you did not
include them properly.

good luck!

greg k-h

