From: Shivang Upadhyay <shivangu@linux.ibm.com>
To: linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org
Cc: Shivang Upadhyay <shivangu@linux.ibm.com>,
	Madhavan Srinivasan <maddy@linux.ibm.com>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Nicholas Piggin <npiggin@gmail.com>,
	Christophe Leroy <christophe.leroy@csgroup.eu>,
	Srikar Dronamraju <srikar@linux.ibm.com>,
	Shrikanth Hegde <sshegde@linux.ibm.com>,
	"Nysal Jan K.A." <nysal@linux.ibm.com>,
	Vishal Chourasia <vishalc@linux.ibm.com>,
	Ritesh Harjani <ritesh.list@gmail.com>,
	Sourabh Jain <sourabhjain@linux.ibm.com>,
	Anushree Mathur <anushree.mathur@linux.vnet.ibm.com>
Subject: [PATCH v2] pseries/kexec: skip resetting CPUs added by firmware but not started by the kernel
Date: Mon, 30 Mar 2026 11:52:06 +0530
Message-ID: <20260330062206.170437-1-shivangu@linux.ibm.com>

During DLPAR CPU add operations, the newly added CPUs start in a halted
state. The kernel then takes some time to initialize those CPUs internally
and start them using the "start-cpu" RTAS call. However, if a crash
triggers kdump in this window (before the new CPU has been initialized),
the NMI IPI sent by the crashing CPU tries to reset all other CPUs.
This leads to firmware starting the uninitialized CPUs as well.

This can cause the kdump kernel to hang during bring-up.

Sample Log:
  [175993.028231][ T1502] NIP [00007fffb953f394] 0x7fffb953f394
  [175993.028314][ T1502] LR [00007fffb953f394] 0x7fffb953f394
  [175993.028390][ T1502] --- interrupt: 3000
  [    5.519483][    T1] Processor 0 is stuck.
  [   11.089481][    T1] Processor 1 is stuck.

To fix this, only issue the system-reset hcall to CPUs that have
actually been started by the kernel.
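
The selection rule can be sketched in plain userspace C as follows. This
is only an illustration under assumptions: struct cpu_state and
crash_reset_targets() are hypothetical names and are not part of the
kernel or of this patch. The patch itself reads the paca cpu_start field
and calls plpar_signal_sys_reset() for each selected CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for per-CPU state; the patch reads paca cpu_start. */
struct cpu_state {
	bool present;	/* CPU has been added (for example by DLPAR) */
	bool started;	/* kernel has released it via "start-cpu" */
	int hw_id;	/* hardware id that would be passed to the hcall */
};

/*
 * Collect the hardware ids that should receive a system reset.
 * cpus[] is indexed by logical CPU number; self is the crashing CPU.
 * Returns the number of ids written to out[], which must have room
 * for ncpus entries.
 */
static size_t crash_reset_targets(const struct cpu_state *cpus, size_t ncpus,
				  size_t self, int *out)
{
	size_t n = 0;

	assert(self < ncpus);
	for (size_t k = 0; k < ncpus; k++) {
		/* Never reset the crashing CPU or a CPU that is absent. */
		if (k == self || !cpus[k].present)
			continue;
		/* Present but never started: leave it halted. */
		if (!cpus[k].started)
			continue;
		out[n++] = cpus[k].hw_id;
	}
	return n;
}
```

In this sketch a CPU that is present but not yet started is skipped, so
it stays halted instead of being started by the reset.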

Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Srikar Dronamraju <srikar@linux.ibm.com>
Cc: Shrikanth Hegde <sshegde@linux.ibm.com>
Cc: Nysal Jan K.A. <nysal@linux.ibm.com>
Cc: Vishal Chourasia <vishalc@linux.ibm.com>
Cc: Ritesh Harjani <ritesh.list@gmail.com>
Cc: Sourabh Jain <sourabhjain@linux.ibm.com>
Reported-by: Anushree Mathur <anushree.mathur@linux.vnet.ibm.com>
Signed-off-by: Shivang Upadhyay <shivangu@linux.ibm.com>
---
Changelog:

V2:
  * added set_crash_nmi_ipi to separate the crash case from other nmi_ipi
    users

V1:
  * https://lore.kernel.org/all/20251205142825.44698-1-shivangu@linux.ibm.com/
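
The V2 hook can be pictured as an optional entry in an ops table. The
sketch below is userspace C under assumptions: struct demo_smp_ops,
demo_set_crash_nmi_ipi() and demo_mark_crash() are hypothetical names,
not the kernel's struct smp_ops_t or any function in this patch. It shows
one way a caller could invoke such a hook only on platforms that set it.

```c
#include <stddef.h>

/* Hypothetical, trimmed-down ops table; not the kernel's struct smp_ops_t. */
struct demo_smp_ops {
	void (*set_crash_nmi_ipi)(void);	/* may be NULL on some platforms */
};

/* Set once the crash path has announced itself. */
static int demo_crash_flag;

/* Hypothetical platform hook: record that the next NMI IPI is for a crash. */
static void demo_set_crash_nmi_ipi(void)
{
	demo_crash_flag = 1;
}

/*
 * Call the hook only if the platform provides one.
 * Returns 1 if the hook was called, 0 otherwise.
 */
static int demo_mark_crash(const struct demo_smp_ops *ops)
{
	if (!ops || !ops->set_crash_nmi_ipi)
		return 0;
	ops->set_crash_nmi_ipi();
	return 1;
}
```

With a table whose hook is NULL, demo_mark_crash() returns 0 and leaves
the flag untouched, so other NMI IPI users keep the existing behaviour.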
---
 arch/powerpc/include/asm/smp.h       |  1 +
 arch/powerpc/kernel/smp.c            |  1 +
 arch/powerpc/platforms/pseries/smp.c | 29 +++++++++++++++++++++++++++-
 3 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
index e41b9ea42122..cb74201f5674 100644
--- a/arch/powerpc/include/asm/smp.h
+++ b/arch/powerpc/include/asm/smp.h
@@ -47,6 +47,7 @@ struct smp_ops_t {
 	void  (*cause_ipi)(int cpu);
 #endif
 	int   (*cause_nmi_ipi)(int cpu);
+	void  (*set_crash_nmi_ipi)(void);
 	void  (*probe)(void);
 	int   (*kick_cpu)(int nr);
 	int   (*prepare_cpu)(int nr);
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 3467f86fd78f..3390ee8adf79 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -594,6 +594,7 @@ void crash_send_ipi(void (*crash_ipi_callback)(struct pt_regs *))
 {
 	int cpu;
 
+	smp_ops->set_crash_nmi_ipi();
 	smp_send_nmi_ipi(NMI_IPI_ALL_OTHERS, crash_ipi_callback, 1000000);
 	if (kdump_in_progress() && crash_wake_offline) {
 		for_each_present_cpu(cpu) {
diff --git a/arch/powerpc/platforms/pseries/smp.c b/arch/powerpc/platforms/pseries/smp.c
index db99725e752b..c6c2baacca9a 100644
--- a/arch/powerpc/platforms/pseries/smp.c
+++ b/arch/powerpc/platforms/pseries/smp.c
@@ -51,6 +51,9 @@
  */
 static cpumask_var_t of_spin_mask;
 
+
+static int crash_nmi_ipi;
+
 /* Query where a cpu is now.  Return codes #defined in plpar_wrappers.h */
 int smp_query_cpu_stopped(unsigned int pcpu)
 {
@@ -171,12 +174,35 @@ static void dbell_or_ic_cause_ipi(int cpu)
 	ic_cause_ipi(cpu);
 }
 
+static void pseries_set_crash_nmi_ipi(void)
+{
+	crash_nmi_ipi = 1;
+}
+
 static int pseries_cause_nmi_ipi(int cpu)
 {
 	int hwcpu;
+	int k, curcpu;
 
+	curcpu = smp_processor_id();
 	if (cpu == NMI_IPI_ALL_OTHERS) {
-		hwcpu = H_SIGNAL_SYS_RESET_ALL_OTHERS;
+		if (crash_nmi_ipi) {
+			for_each_present_cpu(k) {
+				if (k != curcpu) {
+					hwcpu = get_hard_smp_processor_id(k);
+
+					/* it is possible that cpu is present,
+					 * but not started yet.
+					 */
+
+					if (paca_ptrs[k]->cpu_start == 1) {
+						plpar_signal_sys_reset(hwcpu);
+					}
+				}
+			}
+			return 1;
+		} else
+			hwcpu = H_SIGNAL_SYS_RESET_ALL_OTHERS;
 	} else {
 		if (cpu < 0) {
 			WARN_ONCE(true, "incorrect cpu parameter %d", cpu);
@@ -243,6 +269,7 @@ static struct smp_ops_t pseries_smp_ops = {
 	.message_pass	= NULL,	/* Use smp_muxed_ipi_message_pass */
 	.cause_ipi	= NULL,	/* Filled at runtime by pSeries_smp_probe() */
 	.cause_nmi_ipi	= pseries_cause_nmi_ipi,
+	.set_crash_nmi_ipi = pseries_set_crash_nmi_ipi,
 	.probe		= pSeries_smp_probe,
 	.prepare_cpu	= pseries_smp_prepare_cpu,
 	.kick_cpu	= smp_pSeries_kick_cpu,
-- 
2.53.0


