public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Thomas Gleixner <tglx@linutronix.de>
To: LKML <linux-kernel@vger.kernel.org>
Cc: x86@kernel.org, Tom Lendacky <thomas.lendacky@amd.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	Arjan van de Ven <arjan@linux.intel.com>,
	Huang Rui <ray.huang@amd.com>, Juergen Gross <jgross@suse.com>,
	Dimitri Sivanich <dimitri.sivanich@hpe.com>,
	Sohil Mehta <sohil.mehta@intel.com>,
	K Prateek Nayak <kprateek.nayak@amd.com>,
	Kan Liang <kan.liang@linux.intel.com>,
	Zhang Rui <rui.zhang@intel.com>,
	"Paul E. McKenney" <paulmck@kernel.org>,
	Feng Tang <feng.tang@intel.com>,
	Andy Shevchenko <andy@infradead.org>,
	Michael Kelley <mhklinux@outlook.com>,
	"Peter Zijlstra (Intel)" <peterz@infradead.org>
Subject: [patch v2 15/30] x86/cpu: Detect real BSP on crash kernels
Date: Tue, 23 Jan 2024 14:11:14 +0100 (CET)	[thread overview]
Message-ID: <20240118123649.400523172@linutronix.de> (raw)
In-Reply-To: 20240118123127.055361964@linutronix.de

From: Thomas Gleixner <tglx@linutronix.de>

When a kdump kernel is started from a crashing CPU then there is no
guarantee that this CPU is the real boot CPU (BSP). If the kdump kernel
tries to online the BSP then the INIT sequence will reset the machine.

There is a command line option to prevent this, but in case of nested kdump
kernels this is wrong.

But that command line option is not required at all because the real
BSP is enumerated as the first CPU by firmware. Support for the only
known system which was different (Voyager) got removed long ago.

Detect whether the boot CPU APIC ID is the first APIC ID enumerated by
the firmware. If the first APIC ID enumerated is not matching the boot
CPU APIC ID then skip registering it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V2: Check for the first enumerated APIC ID (Rui)
---
 Documentation/admin-guide/kdump/kdump.rst       |    7 -
 Documentation/admin-guide/kernel-parameters.txt |    9 --
 arch/x86/kernel/cpu/topology.c                  |   99 ++++++++++++++----------
 3 files changed, 61 insertions(+), 54 deletions(-)
---
--- a/Documentation/admin-guide/kdump/kdump.rst
+++ b/Documentation/admin-guide/kdump/kdump.rst
@@ -191,9 +191,7 @@ Dump-capture kernel config options (Arch
    CPU is enough for kdump kernel to dump vmcore on most of systems.
 
    However, you can also specify nr_cpus=X to enable multiple processors
-   in kdump kernel. In this case, "disable_cpu_apicid=" is needed to
-   tell kdump kernel which cpu is 1st kernel's BSP. Please refer to
-   admin-guide/kernel-parameters.txt for more details.
+   in kdump kernel.
 
    With CONFIG_SMP=n, the above things are not related.
 
@@ -454,8 +452,7 @@ loading dump-capture kernel.
   to use multi-thread programs with it, such as parallel dump feature of
   makedumpfile. Otherwise, the multi-thread program may have a great
   performance degradation. To enable multi-cpu support, you should bring up an
-  SMP dump-capture kernel and specify maxcpus/nr_cpus, disable_cpu_apicid=[X]
-  options while loading it.
+  SMP dump-capture kernel and specify maxcpus/nr_cpus options while loading it.
 
 * For s390x there are two kdump modes: If a ELF header is specified with
   the elfcorehdr= kernel parameter, it is used by the kdump kernel as it
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1098,15 +1098,6 @@
 			Disable TLBIE instruction. Currently does not work
 			with KVM, with HASH MMU, or with coherent accelerators.
 
-	disable_cpu_apicid= [X86,APIC,SMP]
-			Format: <int>
-			The number of initial APIC ID for the
-			corresponding CPU to be disabled at boot,
-			mostly used for the kdump 2nd kernel to
-			disable BSP to wake up multiple CPUs without
-			causing system reset or hang due to sending
-			INIT from AP to BSP.
-
 	disable_ddw	[PPC/PSERIES]
 			Disable Dynamic DMA Window support. Use this
 			to workaround buggy firmware.
--- a/arch/x86/kernel/cpu/topology.c
+++ b/arch/x86/kernel/cpu/topology.c
@@ -32,18 +32,13 @@ static struct {
 	unsigned int		nr_disabled_cpus;
 	unsigned int		nr_rejected_cpus;
 	u32			boot_cpu_apic_id;
+	u32			real_bsp_apic_id;
 } topo_info __read_mostly = {
 	.nr_assigned_cpus	= 1,
 	.boot_cpu_apic_id	= BAD_APICID,
+	.real_bsp_apic_id	= BAD_APICID,
 };
 
-/*
- * Processor to be disabled specified by kernel parameter
- * disable_cpu_apicid=<int>, mostly used for the kdump 2nd kernel to
- * avoid undefined behaviour caused by sending INIT from AP to BSP.
- */
-static u32 disabled_cpu_apicid __ro_after_init = BAD_APICID;
-
 bool arch_match_cpu_phys_id(int cpu, u64 phys_id)
 {
 	return phys_id == (u64)cpuid_to_apicid[cpu];
@@ -123,34 +118,40 @@ static void topo_set_cpuids(unsigned int
 		cpu_mark_primary_thread(cpu, apic_id);
 }
 
-/**
- * topology_register_apic - Register an APIC in early topology maps
- * @apic_id:	The APIC ID to set up
- * @acpi_id:	The ACPI ID associated to the APIC
- * @present:	True if the corresponding CPU is present
- */
-void __init topology_register_apic(u32 apic_id, u32 acpi_id, bool present)
+static __init bool check_for_real_bsp(u32 apic_id)
 {
-	int cpu;
+	/*
+	 * There is no real good way to detect whether this a kdump()
+	 * kernel, but except on the Voyager SMP monstrosity which is not
+	 * longer supported, the real BSP APIC ID is the first one which is
+	 * enumerated by firmware. That allows to detect whether the boot
+	 * CPU is the real BSP. If it is not, then do not register the APIC
+	 * because sending INIT to the real BSP would reset the whole
+	 * system.
+	 *
+	 * The first APIC ID which is enumerated by firmware is detectable
+	 * because the boot CPU APIC ID is registered before that without
+	 * invoking this code.
+	 */
+	if (topo_info.real_bsp_apic_id != BAD_APICID)
+		return false;
 
-	if (apic_id >= MAX_LOCAL_APIC) {
-		pr_err_once("APIC ID %x exceeds kernel limit of: %x\n", apic_id, MAX_LOCAL_APIC - 1);
-		topo_info.nr_rejected_cpus++;
-		return;
+	if (apic_id == topo_info.boot_cpu_apic_id) {
+		topo_info.real_bsp_apic_id = apic_id;
+		return false;
 	}
 
-	/* CPU numbers exhausted? */
-	if (topo_info.nr_assigned_cpus >= nr_cpu_ids) {
-		pr_warn_once("CPU limit of %d reached. Ignoring further CPUs\n", nr_cpu_ids);
-		topo_info.nr_rejected_cpus++;
-		return;
-	}
+	pr_warn("Boot CPU APIC ID not the first enumerated APIC ID: %x > %x\n",
+		topo_info.boot_cpu_apic_id, apic_id);
+	pr_warn("Crash kernel detected. Disabling real BSP to prevent machine INIT\n");
 
-	if (disabled_cpu_apicid == apic_id) {
-		pr_info("Disabling CPU as requested via 'disable_cpu_apicid=0x%x'.\n", apic_id);
-		topo_info.nr_rejected_cpus++;
-		return;
-	}
+	topo_info.real_bsp_apic_id = apic_id;
+	return true;
+}
+
+static __init void topo_register_apic(u32 apic_id, u32 acpi_id, bool present)
+{
+	int cpu;
 
 	if (present) {
 		/*
@@ -173,6 +174,33 @@ void __init topology_register_apic(u32 a
 }
 
 /**
+ * topology_register_apic - Register an APIC in early topology maps
+ * @apic_id:	The APIC ID to set up
+ * @acpi_id:	The ACPI ID associated to the APIC
+ * @present:	True if the corresponding CPU is present
+ */
+void __init topology_register_apic(u32 apic_id, u32 acpi_id, bool present)
+{
+	if (apic_id >= MAX_LOCAL_APIC) {
+		pr_err_once("APIC ID %x exceeds kernel limit of: %x\n", apic_id, MAX_LOCAL_APIC - 1);
+		topo_info.nr_rejected_cpus++;
+		return;
+	}
+
+	/* CPU numbers exhausted? */
+	if (topo_info.nr_assigned_cpus >= nr_cpu_ids) {
+		pr_warn_once("CPU limit of %d reached. Ignoring further CPUs\n", nr_cpu_ids);
+		topo_info.nr_rejected_cpus++;
+		return;
+	}
+
+	if (check_for_real_bsp(apic_id))
+		return;
+
+	topo_register_apic(apic_id, acpi_id, present);
+}
+
+/**
  * topology_register_boot_apic - Register the boot CPU APIC
  * @apic_id:	The APIC ID to set up
  *
@@ -183,7 +211,7 @@ void __init topology_register_boot_apic(
 	WARN_ON_ONCE(topo_info.boot_cpu_apic_id != BAD_APICID);
 
 	topo_info.boot_cpu_apic_id = apic_id;
-	topology_register_apic(apic_id, CPU_ACPIID_INVALID, true);
+	topo_register_apic(apic_id, CPU_ACPIID_INVALID, true);
 }
 
 #ifdef CONFIG_ACPI_HOTPLUG_CPU
@@ -336,12 +364,3 @@ static int __init setup_possible_cpus(ch
 }
 early_param("possible_cpus", setup_possible_cpus);
 #endif
-
-static int __init apic_set_disabled_cpu_apicid(char *arg)
-{
-	if (!arg || !get_option(&arg, &disabled_cpu_apicid))
-		return -EINVAL;
-
-	return 0;
-}
-early_param("disable_cpu_apicid", apic_set_disabled_cpu_apicid);


  parent reply	other threads:[~2024-01-23 13:11 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-23 13:10 [patch v2 00/30] x86/apic: Rework APIC registration Thomas Gleixner
2024-01-23 13:10 ` [patch v2 01/30] x86/cpu/topology: Move registration out of APIC code Thomas Gleixner
2024-01-23 13:10 ` [patch v2 02/30] x86/cpu/topology: Provide separate APIC registration functions Thomas Gleixner
2024-01-23 13:10 ` [patch v2 03/30] x86/acpi: Use new " Thomas Gleixner
2024-01-23 13:10 ` [patch v2 04/30] x86/jailhouse: Use new APIC registration function Thomas Gleixner
2024-01-23 13:10 ` [patch v2 05/30] x86/of: Use new APIC registration functions Thomas Gleixner
2024-01-23 13:10 ` [patch v2 06/30] x86/mpparse: Use new APIC registration function Thomas Gleixner
2024-01-23 13:11 ` [patch v2 07/30] x86/acpi: Dont invoke topology_register_apic() for XEN PV Thomas Gleixner
2024-01-23 13:11 ` [patch v2 08/30] x86/xen/smp_pv: Register fake APICs Thomas Gleixner
2024-01-23 13:11 ` [patch v2 09/30] x86/cpu/topology: Confine topology information Thomas Gleixner
2024-01-23 13:11 ` [patch v2 10/30] x86/cpu/topology: Simplify APIC registration Thomas Gleixner
2024-01-23 13:11 ` [patch v2 11/30] x86/cpu/topology: Use a data structure for topology info Thomas Gleixner
2024-01-23 13:11 ` [patch v2 12/30] x86/smpboot: Make error message actually useful Thomas Gleixner
2024-01-23 13:11 ` [patch v2 13/30] x86/cpu/topology: Sanitize the APIC admission logic Thomas Gleixner
2024-01-23 13:11 ` [patch v2 14/30] x86/cpu/topology: Rework possible CPU management Thomas Gleixner
2024-01-31 23:47   ` Sohil Mehta
2024-01-23 13:11 ` Thomas Gleixner [this message]
2024-01-31 17:59   ` [patch v2 15/30] x86/cpu: Detect real BSP on crash kernels Michael Kelley
2024-01-23 13:11 ` [patch v2 16/30] x86/topology: Add a mechanism to track topology via APIC IDs Thomas Gleixner
2024-01-23 13:11 ` [patch v2 17/30] x86/cpu/topology: Reject unknown APIC IDs on ACPI hotplug Thomas Gleixner
2024-01-23 13:11 ` [patch v2 18/30] x86/cpu/topology: Assign hotpluggable CPUIDs during init Thomas Gleixner
2024-01-23 13:11 ` [patch v2 19/30] x86/xen/smp_pv: Count number of vCPUs early Thomas Gleixner
2024-01-23 13:11 ` [patch v2 20/30] x86/cpu/topology: Let XEN/PV use topology from CPUID/MADT Thomas Gleixner
2024-01-23 13:11 ` [patch v2 21/30] x86/cpu/topology: Use topology bitmaps for sizing Thomas Gleixner
2024-01-26  7:07   ` Zhang, Rui
2024-01-26 20:22     ` Thomas Gleixner
2024-01-28 20:01       ` Paul E. McKenney
2024-02-12 16:40       ` Thomas Gleixner
2024-02-12 19:49         ` Michael Kelley
2024-02-13 20:23         ` Sohil Mehta
2024-01-23 13:11 ` [patch v2 22/30] x86/cpu/topology: Mop up primary thread mask handling Thomas Gleixner
2024-01-23 13:11 ` [patch v2 23/30] x86/cpu/topology: Simplify cpu_mark_primary_thread() Thomas Gleixner
2024-01-23 13:11 ` [patch v2 24/30] x86/cpu/topology: Provide logical pkg/die mapping Thomas Gleixner
2024-01-23 13:11 ` [patch v2 25/30] x86/cpu/topology: Use topology logical mapping mechanism Thomas Gleixner
2024-02-01 22:31   ` Sohil Mehta
2024-02-02  6:45   ` Zhang, Rui
2024-02-12 16:21     ` Thomas Gleixner
2024-01-23 13:11 ` [patch v2 26/30] x86/cpu/topology: Retrieve cores per package from topology bitmaps Thomas Gleixner
2024-01-23 13:11 ` [patch v2 27/30] x86/cpu/topology: Rename smp_num_siblings Thomas Gleixner
2024-01-23 13:11 ` [patch v2 28/30] x86/cpu/topology: Rename topology_max_die_per_package() Thomas Gleixner
2024-01-23 13:11 ` [patch v2 29/30] x86/cpu/topology: Provide __num_[cores|threads]_per_package Thomas Gleixner
2024-01-23 13:11 ` [patch v2 30/30] x86/cpu/topology: Get rid of cpuinfo::x86_max_cores Thomas Gleixner
2024-01-24 14:31 ` [patch v2 00/30] x86/apic: Rework APIC registration Zhang, Rui
2024-02-01 22:10 ` Sohil Mehta

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240118123649.400523172@linutronix.de \
    --to=tglx@linutronix.de \
    --cc=andrew.cooper3@citrix.com \
    --cc=andy@infradead.org \
    --cc=arjan@linux.intel.com \
    --cc=dimitri.sivanich@hpe.com \
    --cc=feng.tang@intel.com \
    --cc=jgross@suse.com \
    --cc=kan.liang@linux.intel.com \
    --cc=kprateek.nayak@amd.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mhklinux@outlook.com \
    --cc=paulmck@kernel.org \
    --cc=peterz@infradead.org \
    --cc=ray.huang@amd.com \
    --cc=rui.zhang@intel.com \
    --cc=sohil.mehta@intel.com \
    --cc=thomas.lendacky@amd.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox