* [PATCH v3 1/8] drivers: firmware: riscv: add SSE NMI support
2025-11-27 12:52 [PATCH v3 0/8] Add NMI Support to RISC-V via SSE Yunhui Cui
@ 2025-11-27 12:52 ` Yunhui Cui
2025-11-27 12:52 ` [PATCH v3 2/8] riscv: smp: move ipi_cpu_crash_stop() declaration to smp.h Yunhui Cui
` (6 subsequent siblings)
7 siblings, 0 replies; 18+ messages in thread
From: Yunhui Cui @ 2025-11-27 12:52 UTC (permalink / raw)
To: conor, paul.walmsley, palmer, aou, alex, cuiyunhui, luxu.kernel,
linux-kernel, linux-riscv, jassisinghbrar, conor.dooley,
valentina.fernandezalanis, catalin.marinas, will, maz,
timothy.hayes, lpieralisi, arnd, kees, tglx, viresh.kumar,
boqun.feng, linux-arm-kernel, cleger, atishp, ajones
Add support for handling Non-Maskable Interrupts (NMIs) through the
RISC-V Supervisor Software Events (SSE) framework. Add basic NMI
functionality via SBI_SSE_EVENT_LOCAL_SOFTWARE_INJECTED registration
and enabling.
Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com>
---
MAINTAINERS | 8 +++
drivers/firmware/riscv/Kconfig | 10 +++
drivers/firmware/riscv/Makefile | 1 +
drivers/firmware/riscv/riscv_sse_nmi.c | 90 ++++++++++++++++++++++++++
include/linux/riscv_sse_nmi.h | 26 ++++++++
5 files changed, 135 insertions(+)
create mode 100644 drivers/firmware/riscv/riscv_sse_nmi.c
create mode 100644 include/linux/riscv_sse_nmi.h
diff --git a/MAINTAINERS b/MAINTAINERS
index 8bf5416953f45..c06658da8af96 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -22057,6 +22057,14 @@ S: Maintained
F: drivers/firmware/riscv/riscv_sse.c
F: include/linux/riscv_sse.h
+RISC-V SSE NMI SUPPORT
+M: Yunhui Cui <cuiyunhui@bytedance.com>
+R: Xu Lu <luxu.kernel@bytedance.com>
+L: linux-riscv@lists.infradead.org
+S: Maintained
+F: drivers/firmware/riscv/riscv_sse_nmi.c
+F: include/linux/riscv_sse_nmi.h
+
RISC-V THEAD SoC SUPPORT
M: Drew Fustini <fustini@kernel.org>
M: Guo Ren <guoren@kernel.org>
diff --git a/drivers/firmware/riscv/Kconfig b/drivers/firmware/riscv/Kconfig
index ed5b663ac5f91..6c77c7823571a 100644
--- a/drivers/firmware/riscv/Kconfig
+++ b/drivers/firmware/riscv/Kconfig
@@ -12,4 +12,14 @@ config RISCV_SBI_SSE
this option provides support to register callbacks on specific SSE
events.
+config RISCV_SSE_NMI
+ bool "Enable SBI Supervisor Software Events NMI support"
+ depends on RISCV_SBI_SSE && SMP
+ default y
+ help
+ This option enables support for delivering Non-Maskable Interrupt
+ (NMI) notifications through the Supervisor Software Events (SSE)
+ framework. When enabled, the system can deliver local, unknown and
+ other types of NMIs.
+
endmenu
diff --git a/drivers/firmware/riscv/Makefile b/drivers/firmware/riscv/Makefile
index c8795d4bbb2ea..fbc182b53ae53 100644
--- a/drivers/firmware/riscv/Makefile
+++ b/drivers/firmware/riscv/Makefile
@@ -1,3 +1,4 @@
# SPDX-License-Identifier: GPL-2.0
obj-$(CONFIG_RISCV_SBI_SSE) += riscv_sbi_sse.o
+obj-$(CONFIG_RISCV_SSE_NMI) += riscv_sse_nmi.o
diff --git a/drivers/firmware/riscv/riscv_sse_nmi.c b/drivers/firmware/riscv/riscv_sse_nmi.c
new file mode 100644
index 0000000000000..752ee88b230da
--- /dev/null
+++ b/drivers/firmware/riscv/riscv_sse_nmi.c
@@ -0,0 +1,90 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+
+#define pr_fmt(fmt) "SSE NMI: " fmt
+
+#include <linux/atomic.h>
+#include <linux/riscv_sbi_sse.h>
+#include <linux/riscv_sse_nmi.h>
+
+#include <asm/irq_regs.h>
+#include <asm/sbi.h>
+#include <asm/smp.h>
+
+static bool nmi_available;
+static struct sse_event *local_nmi_evt;
+static DEFINE_PER_CPU(atomic_t, local_nmi) = ATOMIC_INIT(LOCAL_NMI_NONE);
+
+bool nmi_support(void)
+{
+ return READ_ONCE(nmi_available);
+}
+
+static inline struct sbiret sbi_sse_ecall(int fid, unsigned long arg0,
+ unsigned long arg1)
+{
+ return sbi_ecall(SBI_EXT_SSE, fid, arg0, arg1, 0, 0, 0, 0);
+}
+
+void send_nmi_single(unsigned int cpu, enum local_nmi_type type)
+{
+ unsigned int hart_id = cpuid_to_hartid_map(cpu);
+ u32 evt = SBI_SSE_EVENT_LOCAL_SOFTWARE_INJECTED;
+ struct sbiret ret;
+
+ atomic_or(type, per_cpu_ptr(&local_nmi, cpu));
+
+ ret = sbi_sse_ecall(SBI_SSE_EVENT_INJECT, evt, hart_id);
+ if (ret.error)
+ pr_err("Failed to signal event %x to hartid %d, error %ld\n",
+ evt, hart_id, ret.error);
+}
+
+void send_nmi_mask(cpumask_t *mask, enum local_nmi_type type)
+{
+ unsigned int cpu;
+
+ for_each_cpu(cpu, mask)
+ send_nmi_single(cpu, type);
+}
+
+static int local_nmi_handler(u32 evt, void *arg, struct pt_regs *regs)
+{
+ return 0;
+}
+
+static int __init local_nmi_init(void)
+{
+ int ret;
+
+ local_nmi_evt = sse_event_register(SBI_SSE_EVENT_LOCAL_SOFTWARE_INJECTED, 0,
+ local_nmi_handler, NULL);
+ if (IS_ERR(local_nmi_evt))
+ return PTR_ERR(local_nmi_evt);
+
+ ret = sse_event_enable(local_nmi_evt);
+ if (ret) {
+ sse_event_unregister(local_nmi_evt);
+ return ret;
+ }
+
+ pr_info("Using SSE for Local NMI event delivery\n");
+
+ return 0;
+}
+
+static int __init sse_nmi_init(void)
+{
+ int ret;
+
+ ret = local_nmi_init();
+ if (ret) {
+ pr_err("Local_nmi_init failed with error %d\n", ret);
+ return ret;
+ }
+
+ WRITE_ONCE(nmi_available, true);
+
+ return 0;
+}
+
+late_initcall(sse_nmi_init);
diff --git a/include/linux/riscv_sse_nmi.h b/include/linux/riscv_sse_nmi.h
new file mode 100644
index 0000000000000..16db85c5162f5
--- /dev/null
+++ b/include/linux/riscv_sse_nmi.h
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+
+#ifndef __LINUX_RISCV_SSE_NMI_H
+#define __LINUX_RISCV_SSE_NMI_H
+
+#include <linux/cpumask.h>
+
+enum local_nmi_type {
+ LOCAL_NMI_NONE = 0U,
+ LOCAL_NMI_STOP = BIT(0),
+ LOCAL_NMI_CRASH = BIT(1),
+ LOCAL_NMI_BACKTRACE = BIT(2),
+ LOCAL_NMI_KGDB = BIT(3),
+};
+
+#ifdef CONFIG_RISCV_SSE_NMI
+bool nmi_support(void);
+void send_nmi_mask(cpumask_t *mask, enum local_nmi_type type);
+void send_nmi_single(unsigned int cpu, enum local_nmi_type type);
+#else
+static inline bool nmi_support(void) { return false; }
+static inline void send_nmi_mask(cpumask_t *mask, enum local_nmi_type type) { }
+static inline void send_nmi_single(unsigned int cpu, enum local_nmi_type type) { }
+#endif
+
+#endif
--
2.39.5
* [PATCH v3 2/8] riscv: smp: move ipi_cpu_crash_stop() declaration to smp.h
2025-11-27 12:52 [PATCH v3 0/8] Add NMI Support to RISC-V via SSE Yunhui Cui
2025-11-27 12:52 ` [PATCH v3 1/8] drivers: firmware: riscv: add SSE NMI support Yunhui Cui
@ 2025-11-27 12:52 ` Yunhui Cui
2025-11-27 12:53 ` [PATCH v3 3/8] smp: move num_other_online_cpus() into smp.h Yunhui Cui
` (5 subsequent siblings)
7 siblings, 0 replies; 18+ messages in thread
From: Yunhui Cui @ 2025-11-27 12:52 UTC (permalink / raw)
To: conor, paul.walmsley, palmer, aou, alex, cuiyunhui, luxu.kernel,
linux-kernel, linux-riscv, jassisinghbrar, conor.dooley,
valentina.fernandezalanis, catalin.marinas, will, maz,
timothy.hayes, lpieralisi, arnd, kees, tglx, viresh.kumar,
boqun.feng, linux-arm-kernel, cleger, atishp, ajones
Rename ipi_cpu_crash_stop() to cpu_crash_stop(), make it non-static, and
declare it in asm/smp.h so that code outside smp.c can call it.
Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com>
---
arch/riscv/include/asm/smp.h | 9 +++++++++
arch/riscv/kernel/smp.c | 9 ++-------
2 files changed, 11 insertions(+), 7 deletions(-)
diff --git a/arch/riscv/include/asm/smp.h b/arch/riscv/include/asm/smp.h
index 7ac80e9f22889..f53f1f0e7aa9e 100644
--- a/arch/riscv/include/asm/smp.h
+++ b/arch/riscv/include/asm/smp.h
@@ -54,6 +54,15 @@ void riscv_ipi_set_virq_range(int virq, int nr);
/* Check other CPUs stop or not */
bool smp_crash_stop_failed(void);
+#ifdef CONFIG_KEXEC_CORE
+void cpu_crash_stop(unsigned int cpu, struct pt_regs *regs);
+#else
+static inline void cpu_crash_stop(unsigned int cpu, struct pt_regs *regs)
+{
+ unreachable();
+}
+#endif
+
/* Secondary hart entry */
asmlinkage void smp_callin(void);
diff --git a/arch/riscv/kernel/smp.c b/arch/riscv/kernel/smp.c
index e650dec448176..9dbcb9a06a96d 100644
--- a/arch/riscv/kernel/smp.c
+++ b/arch/riscv/kernel/smp.c
@@ -78,7 +78,7 @@ static void ipi_stop(void)
#ifdef CONFIG_KEXEC_CORE
static atomic_t waiting_for_crash_ipi = ATOMIC_INIT(0);
-static inline void ipi_cpu_crash_stop(unsigned int cpu, struct pt_regs *regs)
+void cpu_crash_stop(unsigned int cpu, struct pt_regs *regs)
{
crash_save_cpu(regs, cpu);
@@ -94,11 +94,6 @@ static inline void ipi_cpu_crash_stop(unsigned int cpu, struct pt_regs *regs)
for(;;)
wait_for_interrupt();
}
-#else
-static inline void ipi_cpu_crash_stop(unsigned int cpu, struct pt_regs *regs)
-{
- unreachable();
-}
#endif
static void send_ipi_mask(const struct cpumask *mask, enum ipi_message_type op)
@@ -134,7 +129,7 @@ static irqreturn_t handle_IPI(int irq, void *data)
ipi_stop();
break;
case IPI_CPU_CRASH_STOP:
- ipi_cpu_crash_stop(cpu, get_irq_regs());
+ cpu_crash_stop(cpu, get_irq_regs());
break;
case IPI_IRQ_WORK:
irq_work_run();
--
2.39.5
* [PATCH v3 3/8] smp: move num_other_online_cpus() into smp.h
2025-11-27 12:52 [PATCH v3 0/8] Add NMI Support to RISC-V via SSE Yunhui Cui
2025-11-27 12:52 ` [PATCH v3 1/8] drivers: firmware: riscv: add SSE NMI support Yunhui Cui
2025-11-27 12:52 ` [PATCH v3 2/8] riscv: smp: move ipi_cpu_crash_stop() declaration to smp.h Yunhui Cui
@ 2025-11-27 12:53 ` Yunhui Cui
2025-11-27 12:53 ` [PATCH v3 4/8] riscv: smp: use NMI for crash stop Yunhui Cui
` (4 subsequent siblings)
7 siblings, 0 replies; 18+ messages in thread
From: Yunhui Cui @ 2025-11-27 12:53 UTC (permalink / raw)
To: conor, paul.walmsley, palmer, aou, alex, cuiyunhui, luxu.kernel,
linux-kernel, linux-riscv, jassisinghbrar, conor.dooley,
valentina.fernandezalanis, catalin.marinas, will, maz,
timothy.hayes, lpieralisi, arnd, kees, tglx, viresh.kumar,
boqun.feng, linux-arm-kernel, cleger, atishp, ajones
Remove the duplicate num_other_online_cpus() definitions from arm64 and
riscv, and add a single shared implementation to include/linux/smp.h.
Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com>
---
arch/arm64/kernel/smp.c | 11 -----------
arch/riscv/kernel/smp.c | 11 -----------
include/linux/smp.h | 11 +++++++++++
3 files changed, 11 insertions(+), 22 deletions(-)
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 68cea3a4a35ca..2d1e7839dc9b0 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -1171,17 +1171,6 @@ void tick_broadcast(const struct cpumask *mask)
}
#endif
-/*
- * The number of CPUs online, not counting this CPU (which may not be
- * fully online and so not counted in num_online_cpus()).
- */
-static inline unsigned int num_other_online_cpus(void)
-{
- unsigned int this_cpu_online = cpu_online(smp_processor_id());
-
- return num_online_cpus() - this_cpu_online;
-}
-
void smp_send_stop(void)
{
static unsigned long stop_in_progress;
diff --git a/arch/riscv/kernel/smp.c b/arch/riscv/kernel/smp.c
index 9dbcb9a06a96d..669325e68a21a 100644
--- a/arch/riscv/kernel/smp.c
+++ b/arch/riscv/kernel/smp.c
@@ -272,17 +272,6 @@ void smp_send_stop(void)
}
#ifdef CONFIG_KEXEC_CORE
-/*
- * The number of CPUs online, not counting this CPU (which may not be
- * fully online and so not counted in num_online_cpus()).
- */
-static inline unsigned int num_other_online_cpus(void)
-{
- unsigned int this_cpu_online = cpu_online(smp_processor_id());
-
- return num_online_cpus() - this_cpu_online;
-}
-
void crash_smp_send_stop(void)
{
static int cpus_stopped;
diff --git a/include/linux/smp.h b/include/linux/smp.h
index 18e9c918325e5..5300c5c14232b 100644
--- a/include/linux/smp.h
+++ b/include/linux/smp.h
@@ -275,6 +275,17 @@ static inline int get_boot_cpu_id(void)
#define get_cpu() ({ preempt_disable(); __smp_processor_id(); })
#define put_cpu() preempt_enable()
+/*
+ * The number of CPUs online, not counting this CPU (which may not be
+ * fully online and so not counted in num_online_cpus()).
+ */
+static inline unsigned int num_other_online_cpus(void)
+{
+ unsigned int this_cpu_online = cpu_online(smp_processor_id());
+
+ return num_online_cpus() - this_cpu_online;
+}
+
/*
* Callback to arch code if there's nosmp or maxcpus=0 on the
* boot command line:
--
2.39.5
* [PATCH v3 4/8] riscv: smp: use NMI for crash stop
2025-11-27 12:52 [PATCH v3 0/8] Add NMI Support to RISC-V via SSE Yunhui Cui
` (2 preceding siblings ...)
2025-11-27 12:53 ` [PATCH v3 3/8] smp: move num_other_online_cpus() into smp.h Yunhui Cui
@ 2025-11-27 12:53 ` Yunhui Cui
2025-11-27 12:53 ` [PATCH v3 5/8] riscv: smp: use NMI for CPU stop Yunhui Cui
` (3 subsequent siblings)
7 siblings, 0 replies; 18+ messages in thread
From: Yunhui Cui @ 2025-11-27 12:53 UTC (permalink / raw)
To: conor, paul.walmsley, palmer, aou, alex, cuiyunhui, luxu.kernel,
linux-kernel, linux-riscv, jassisinghbrar, conor.dooley,
valentina.fernandezalanis, catalin.marinas, will, maz,
timothy.hayes, lpieralisi, arnd, kees, tglx, viresh.kumar,
boqun.feng, linux-arm-kernel, cleger, atishp, ajones
Use NMI instead of IPI for crash stop if RISC-V SSE NMI is supported.
Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com>
---
arch/riscv/kernel/smp.c | 11 ++++++++++-
drivers/firmware/riscv/riscv_sse_nmi.c | 12 ++++++++++++
2 files changed, 22 insertions(+), 1 deletion(-)
diff --git a/arch/riscv/kernel/smp.c b/arch/riscv/kernel/smp.c
index 669325e68a21a..1b8cf986abbd0 100644
--- a/arch/riscv/kernel/smp.c
+++ b/arch/riscv/kernel/smp.c
@@ -16,6 +16,7 @@
#include <linux/kgdb.h>
#include <linux/percpu.h>
#include <linux/profile.h>
+#include <linux/riscv_sse_nmi.h>
#include <linux/smp.h>
#include <linux/sched.h>
#include <linux/seq_file.h>
@@ -300,7 +301,15 @@ void crash_smp_send_stop(void)
atomic_set(&waiting_for_crash_ipi, num_other_online_cpus());
pr_crit("SMP: stopping secondary CPUs\n");
- send_ipi_mask(&mask, IPI_CPU_CRASH_STOP);
+
+ /*
+ * This is a rare, terminal operation, so prefer an NMI over an IPI
+ * for reliability whenever SSE NMI support is available.
+ */
+ if (!nmi_support())
+ send_ipi_mask(&mask, IPI_CPU_CRASH_STOP);
+ else
+ send_nmi_mask(&mask, LOCAL_NMI_CRASH);
/* Wait up to one second for other CPUs to stop */
timeout = USEC_PER_SEC;
diff --git a/drivers/firmware/riscv/riscv_sse_nmi.c b/drivers/firmware/riscv/riscv_sse_nmi.c
index 752ee88b230da..add028efd25a0 100644
--- a/drivers/firmware/riscv/riscv_sse_nmi.c
+++ b/drivers/firmware/riscv/riscv_sse_nmi.c
@@ -10,6 +10,9 @@
#include <asm/sbi.h>
#include <asm/smp.h>
+#define NMI_HANDLE(mask, func, ...) \
+ do { if (type & (mask)) func(__VA_ARGS__); } while (0)
+
static bool nmi_available;
static struct sse_event *local_nmi_evt;
static DEFINE_PER_CPU(atomic_t, local_nmi) = ATOMIC_INIT(LOCAL_NMI_NONE);
@@ -49,6 +52,15 @@ void send_nmi_mask(cpumask_t *mask, enum local_nmi_type type)
static int local_nmi_handler(u32 evt, void *arg, struct pt_regs *regs)
{
+ enum local_nmi_type type;
+ unsigned int cpu = smp_processor_id();
+
+ type = atomic_read(this_cpu_ptr(&local_nmi));
+
+ NMI_HANDLE(LOCAL_NMI_CRASH, cpu_crash_stop, cpu, regs);
+
+ atomic_andnot(type, this_cpu_ptr(&local_nmi));
+
return 0;
}
--
2.39.5
* [PATCH v3 5/8] riscv: smp: use NMI for CPU stop
2025-11-27 12:52 [PATCH v3 0/8] Add NMI Support to RISC-V via SSE Yunhui Cui
` (3 preceding siblings ...)
2025-11-27 12:53 ` [PATCH v3 4/8] riscv: smp: use NMI for crash stop Yunhui Cui
@ 2025-11-27 12:53 ` Yunhui Cui
2025-12-04 4:07 ` Radim Krčmář
2025-11-27 12:53 ` [PATCH v3 6/8] riscv: smp: use NMI for backtrace Yunhui Cui
` (2 subsequent siblings)
7 siblings, 1 reply; 18+ messages in thread
From: Yunhui Cui @ 2025-11-27 12:53 UTC (permalink / raw)
To: conor, paul.walmsley, palmer, aou, alex, cuiyunhui, luxu.kernel,
linux-kernel, linux-riscv, jassisinghbrar, conor.dooley,
valentina.fernandezalanis, catalin.marinas, will, maz,
timothy.hayes, lpieralisi, arnd, kees, tglx, viresh.kumar,
boqun.feng, linux-arm-kernel, cleger, atishp, ajones
Use NMI instead of IPI for CPU stop if RISC-V SSE NMI is supported.
Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com>
---
arch/riscv/include/asm/smp.h | 2 ++
arch/riscv/kernel/smp.c | 10 +++++++---
drivers/firmware/riscv/riscv_sse_nmi.c | 1 +
3 files changed, 10 insertions(+), 3 deletions(-)
diff --git a/arch/riscv/include/asm/smp.h b/arch/riscv/include/asm/smp.h
index f53f1f0e7aa9e..e01ea962adfc4 100644
--- a/arch/riscv/include/asm/smp.h
+++ b/arch/riscv/include/asm/smp.h
@@ -63,6 +63,8 @@ static inline void cpu_crash_stop(unsigned int cpu, struct pt_regs *regs)
}
#endif
+void cpu_stop(void);
+
/* Secondary hart entry */
asmlinkage void smp_callin(void);
diff --git a/arch/riscv/kernel/smp.c b/arch/riscv/kernel/smp.c
index 1b8cf986abbd0..bca95ec0b0f74 100644
--- a/arch/riscv/kernel/smp.c
+++ b/arch/riscv/kernel/smp.c
@@ -69,7 +69,7 @@ int riscv_hartid_to_cpuid(unsigned long hartid)
return -ENOENT;
}
-static void ipi_stop(void)
+void cpu_stop(void)
{
set_cpu_online(smp_processor_id(), false);
while (1)
@@ -127,7 +127,7 @@ static irqreturn_t handle_IPI(int irq, void *data)
generic_smp_call_function_interrupt();
break;
case IPI_CPU_STOP:
- ipi_stop();
+ cpu_stop();
break;
case IPI_CPU_CRASH_STOP:
cpu_crash_stop(cpu, get_irq_regs());
@@ -259,7 +259,11 @@ void smp_send_stop(void)
if (system_state <= SYSTEM_RUNNING)
pr_crit("SMP: stopping secondary CPUs\n");
- send_ipi_mask(&mask, IPI_CPU_STOP);
+
+ if (!nmi_support())
+ send_ipi_mask(&mask, IPI_CPU_STOP);
+ else
+ send_nmi_mask(&mask, LOCAL_NMI_STOP);
}
/* Wait up to one second for other CPUs to stop */
diff --git a/drivers/firmware/riscv/riscv_sse_nmi.c b/drivers/firmware/riscv/riscv_sse_nmi.c
index add028efd25a0..02e2de2bb70f7 100644
--- a/drivers/firmware/riscv/riscv_sse_nmi.c
+++ b/drivers/firmware/riscv/riscv_sse_nmi.c
@@ -58,6 +58,7 @@ static int local_nmi_handler(u32 evt, void *arg, struct pt_regs *regs)
type = atomic_read(this_cpu_ptr(&local_nmi));
NMI_HANDLE(LOCAL_NMI_CRASH, cpu_crash_stop, cpu, regs);
+ NMI_HANDLE(LOCAL_NMI_STOP, cpu_stop);
atomic_andnot(type, this_cpu_ptr(&local_nmi));
--
2.39.5
* Re: [PATCH v3 5/8] riscv: smp: use NMI for CPU stop
2025-11-27 12:53 ` [PATCH v3 5/8] riscv: smp: use NMI for CPU stop Yunhui Cui
@ 2025-12-04 4:07 ` Radim Krčmář
2025-12-04 5:28 ` [External] " yunhui cui
0 siblings, 1 reply; 18+ messages in thread
From: Radim Krčmář @ 2025-12-04 4:07 UTC (permalink / raw)
To: Yunhui Cui, conor, paul.walmsley, palmer, aou, alex, luxu.kernel,
linux-kernel, linux-riscv, jassisinghbrar, conor.dooley,
valentina.fernandezalanis, catalin.marinas, will, maz,
timothy.hayes, lpieralisi, arnd, kees, tglx, viresh.kumar,
boqun.feng, linux-arm-kernel, cleger, atishp, ajones
Cc: linux-riscv
2025-11-27T20:53:02+08:00, Yunhui Cui <cuiyunhui@bytedance.com>:
> Use NMI instead of IPI for CPU stop if RISC-V SSE NMI is supported.
>
> Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com>
> ---
> diff --git a/drivers/firmware/riscv/riscv_sse_nmi.c b/drivers/firmware/riscv/riscv_sse_nmi.c
> @@ -58,6 +58,7 @@ static int local_nmi_handler(u32 evt, void *arg, struct pt_regs *regs)
> type = atomic_read(this_cpu_ptr(&local_nmi));
>
> NMI_HANDLE(LOCAL_NMI_CRASH, cpu_crash_stop, cpu, regs);
> + NMI_HANDLE(LOCAL_NMI_STOP, cpu_stop);
Please document the intended preemption design for all SSE events,
because it will be a nightmare if we forget some assumptions in the
coming years. (That includes the relative priorities of RAS/PMU/...)
Thanks.
* Re: [External] Re: [PATCH v3 5/8] riscv: smp: use NMI for CPU stop
2025-12-04 4:07 ` Radim Krčmář
@ 2025-12-04 5:28 ` yunhui cui
2025-12-04 13:16 ` Radim Krčmář
0 siblings, 1 reply; 18+ messages in thread
From: yunhui cui @ 2025-12-04 5:28 UTC (permalink / raw)
To: Radim Krčmář
Cc: conor, paul.walmsley, palmer, aou, alex, luxu.kernel,
linux-kernel, linux-riscv, jassisinghbrar, conor.dooley,
valentina.fernandezalanis, catalin.marinas, will, maz,
timothy.hayes, lpieralisi, arnd, kees, tglx, viresh.kumar,
boqun.feng, linux-arm-kernel, cleger, atishp, ajones, linux-riscv
Hi Radim,
On Thu, Dec 4, 2025 at 12:07 PM Radim Krčmář <rkrcmar@ventanamicro.com> wrote:
>
> 2025-11-27T20:53:02+08:00, Yunhui Cui <cuiyunhui@bytedance.com>:
> > Use NMI instead of IPI for CPU stop if RISC-V SSE NMI is supported.
> >
> > Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com>
> > ---
> > diff --git a/drivers/firmware/riscv/riscv_sse_nmi.c b/drivers/firmware/riscv/riscv_sse_nmi.c
> > @@ -58,6 +58,7 @@ static int local_nmi_handler(u32 evt, void *arg, struct pt_regs *regs)
> > type = atomic_read(this_cpu_ptr(&local_nmi));
> >
> > NMI_HANDLE(LOCAL_NMI_CRASH, cpu_crash_stop, cpu, regs);
> > + NMI_HANDLE(LOCAL_NMI_STOP, cpu_stop);
>
> Please document the intended preemption design for all SSE events,
> because it will be a nightmare if we forget some assumptions in the
> coming years. (That includes the relative priorities of RAS/PMU/...)
Actually, LOCAL_NMI_CRASH, LOCAL_NMI_STOP, LOCAL_NMI_BACKTRACE,
LOCAL_NMI_KGDB, ... are all implemented via the single SSE event
SBI_SSE_EVENT_LOCAL_SOFTWARE_INJECTED. Per the SSE design, no
preemption will occur among CRASH, STOP, BACKTRACE, and KGDB events.
>
> Thanks.
Thanks,
Yunhui
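
The multiplexing described in this reply (several request types carried by
one SSE event and told apart by a per-CPU bitmask) can be modelled in plain
userspace C. This is only a sketch under stated assumptions: the names
nmi_type, model_send_nmi() and model_nmi_handler() are hypothetical, a single
"CPU" is modelled, and the handler claims the pending bits with one atomic
exchange rather than the atomic_read() plus atomic_andnot() pair used in the
patch. It is not the driver code.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical mirror of enum local_nmi_type from the patch. */
enum nmi_type {
	NMI_NONE      = 0u,
	NMI_STOP      = 1u << 0,
	NMI_CRASH     = 1u << 1,
	NMI_BACKTRACE = 1u << 2,
	NMI_KGDB      = 1u << 3,
};

/* Stand-in for the per-CPU pending mask; only one "CPU" is modelled. */
static atomic_uint pending = NMI_NONE;

/* Records the order in which sub-handlers ran, for inspection. */
static unsigned int order[4];
static unsigned int nr_run;

/* Sender side: publish the request bit before the event would be injected. */
static void model_send_nmi(enum nmi_type type)
{
	atomic_fetch_or(&pending, type);
}

/* Run the sub-handler for one type if its bit was claimed. */
static void run_one(unsigned int claimed, enum nmi_type type)
{
	if (claimed & type) {
		assert(nr_run < 4);
		order[nr_run++] = type;
	}
}

/*
 * Handler side: claim every pending bit in one atomic exchange, then run
 * the sub-handlers in a fixed order. Returns the mask that was claimed.
 */
static unsigned int model_nmi_handler(void)
{
	unsigned int claimed = atomic_exchange(&pending, NMI_NONE);

	run_one(claimed, NMI_CRASH);
	run_one(claimed, NMI_STOP);
	run_one(claimed, NMI_BACKTRACE);
	run_one(claimed, NMI_KGDB);

	return claimed;
}
```

In this model the only ordering between the request types is the fixed order
of the run_one() calls, since all of them arrive through the same event and
therefore cannot preempt one another.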
* Re: [External] Re: [PATCH v3 5/8] riscv: smp: use NMI for CPU stop
2025-12-04 5:28 ` [External] " yunhui cui
@ 2025-12-04 13:16 ` Radim Krčmář
2025-12-08 11:40 ` yunhui cui
0 siblings, 1 reply; 18+ messages in thread
From: Radim Krčmář @ 2025-12-04 13:16 UTC (permalink / raw)
To: yunhui cui
Cc: conor, paul.walmsley, palmer, aou, alex, luxu.kernel,
linux-kernel, linux-riscv, jassisinghbrar, conor.dooley,
valentina.fernandezalanis, catalin.marinas, will, maz,
timothy.hayes, lpieralisi, arnd, kees, tglx, viresh.kumar,
boqun.feng, linux-arm-kernel, cleger, atishp, ajones, linux-riscv
2025-12-04T13:28:45+08:00, yunhui cui <cuiyunhui@bytedance.com>:
> Hi Radim,
>
> On Thu, Dec 4, 2025 at 12:07 PM Radim Krčmář <rkrcmar@ventanamicro.com> wrote:
>>
>> 2025-11-27T20:53:02+08:00, Yunhui Cui <cuiyunhui@bytedance.com>:
>> > Use NMI instead of IPI for CPU stop if RISC-V SSE NMI is supported.
>> >
>> > Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com>
>> > ---
>> > diff --git a/drivers/firmware/riscv/riscv_sse_nmi.c b/drivers/firmware/riscv/riscv_sse_nmi.c
>> > @@ -58,6 +58,7 @@ static int local_nmi_handler(u32 evt, void *arg, struct pt_regs *regs)
>> > type = atomic_read(this_cpu_ptr(&local_nmi));
>> >
>> > NMI_HANDLE(LOCAL_NMI_CRASH, cpu_crash_stop, cpu, regs);
>> > + NMI_HANDLE(LOCAL_NMI_STOP, cpu_stop);
>>
>> Please document the intended preemption design for all SSE events,
>> because it will be a nightmare if we forget some assumptions in the
>> coming years. (That includes the relative priorities of RAS/PMU/...)
>
> Actually, LOCAL_NMI_CRASH, LOCAL_NMI_STOP, LOCAL_NMI_BACKTRACE,
> LOCAL_NMI_KGDB, ... are all implemented via the single SSE event
> SBI_SSE_EVENT_LOCAL_SOFTWARE_INJECTED. Per the SSE design, no
> preemption will occur among CRASH, STOP, BACKTRACE, and KGDB events.
That is how it is. I don't understand why it must be like that.
For example: PMU_OVERFLOW has lower event_id than SOFTWARE_INJECTED, so
it will currently interrupt NMI_CRASH as they both have priority 0,
although NMI_CRASH probably shouldn't be masked by anything, and should
preempt everything.
NMI_BACKTRACE, on the other hand, probably shouldn't have that high
priority as there seem more important events (e.g. RAS and NMI_CRASH).
The issues can be avoided by event priorities, masking, or deemed as
non-issue, but I think it would be beneficial to provide some reasoning
behind the design, as the choices don't seem obvious to me.
Thanks.
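
The tie-break described in this message can be written down as a small
userspace C model. It is a sketch of the rule as stated in the thread, not
firmware or kernel code, and it rests on two assumptions: that a numerically
lower priority value is more urgent, and that equal priorities are ordered by
event ID with the lower ID winning. The struct, the function names and the ID
values are illustrative; the IDs are chosen only to preserve the ordering
mentioned above (PMU_OVERFLOW below SOFTWARE_INJECTED).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the two attributes that decide preemption. */
struct sse_evt_model {
	uint32_t event_id;
	uint32_t priority;	/* assumed: lower value = more urgent */
};

/* True if event a would be allowed to interrupt a running handler for b. */
static bool model_preempts(const struct sse_evt_model *a,
			   const struct sse_evt_model *b)
{
	if (a->priority != b->priority)
		return a->priority < b->priority;

	/* Equal priority: fall back to the event ID, lower ID wins. */
	return a->event_id < b->event_id;
}

/* Usage example: the scenario raised in the review. */
static void model_example(void)
{
	/* Illustrative IDs; only their relative order matters here. */
	struct sse_evt_model pmu = { .event_id = 0x00010000, .priority = 0 };
	struct sse_evt_model sw  = { .event_id = 0xffff0000, .priority = 0 };

	/* With both at priority 0, PMU_OVERFLOW can interrupt the NMI path. */
	assert(model_preempts(&pmu, &sw));
	assert(!model_preempts(&sw, &pmu));

	/* Giving PMU_OVERFLOW a less urgent priority reverses the outcome. */
	pmu.priority = 1;
	assert(model_preempts(&sw, &pmu));
	assert(!model_preempts(&pmu, &sw));
}
```

The second half of model_example() shows why explicit priorities matter:
without them, the outcome depends purely on how the event IDs happen to be
numbered.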
* Re: [External] Re: [PATCH v3 5/8] riscv: smp: use NMI for CPU stop
2025-12-04 13:16 ` Radim Krčmář
@ 2025-12-08 11:40 ` yunhui cui
2025-12-10 14:22 ` Radim Krčmář
0 siblings, 1 reply; 18+ messages in thread
From: yunhui cui @ 2025-12-08 11:40 UTC (permalink / raw)
To: Radim Krčmář
Cc: conor, paul.walmsley, palmer, aou, alex, luxu.kernel,
linux-kernel, linux-riscv, jassisinghbrar, conor.dooley,
valentina.fernandezalanis, catalin.marinas, will, maz,
timothy.hayes, lpieralisi, arnd, kees, tglx, viresh.kumar,
boqun.feng, linux-arm-kernel, cleger, atishp, ajones, linux-riscv
Hi Radim,
On Thu, Dec 4, 2025 at 9:16 PM Radim Krčmář <rkrcmar@ventanamicro.com> wrote:
>
> 2025-12-04T13:28:45+08:00, yunhui cui <cuiyunhui@bytedance.com>:
> > Hi Radim,
> >
> > On Thu, Dec 4, 2025 at 12:07 PM Radim Krčmář <rkrcmar@ventanamicro.com> wrote:
> >>
> >> 2025-11-27T20:53:02+08:00, Yunhui Cui <cuiyunhui@bytedance.com>:
> >> > Use NMI instead of IPI for CPU stop if RISC-V SSE NMI is supported.
> >> >
> >> > Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com>
> >> > ---
> >> > diff --git a/drivers/firmware/riscv/riscv_sse_nmi.c b/drivers/firmware/riscv/riscv_sse_nmi.c
> >> > @@ -58,6 +58,7 @@ static int local_nmi_handler(u32 evt, void *arg, struct pt_regs *regs)
> >> > type = atomic_read(this_cpu_ptr(&local_nmi));
> >> >
> >> > NMI_HANDLE(LOCAL_NMI_CRASH, cpu_crash_stop, cpu, regs);
> >> > + NMI_HANDLE(LOCAL_NMI_STOP, cpu_stop);
> >>
> >> Please document the intended preemption design for all SSE events,
> >> because it will be a nightmare if we forget some assumptions in the
> >> coming years. (That includes the relative priorities of RAS/PMU/...)
> >
> > Actually, LOCAL_NMI_CRASH, LOCAL_NMI_STOP, LOCAL_NMI_BACKTRACE,
> > LOCAL_NMI_KGDB, ... are all implemented via the single SSE event
> > SBI_SSE_EVENT_LOCAL_SOFTWARE_INJECTED. Per the SSE design, no
> > preemption will occur among CRASH, STOP, BACKTRACE, and KGDB events.
>
> That is how it is. I don't understand why it must be like that.
>
> For example: PMU_OVERFLOW has lower event_id than SOFTWARE_INJECTED, so
> it will currently interrupt NMI_CRASH as they both have priority 0,
> although NMI_CRASH probably shouldn't be masked by anything, and should
> preempt everything.
> NMI_BACKTRACE, on the other hand, probably shouldn't have that high
> priority as there seem more important events (e.g. RAS and NMI_CRASH).
>
> The issues can be avoided by event priorities, masking, or deemed as
> non-issue, but I think it would be beneficial to provide some reasoning
> behind the design, as the choices don't seem obvious to me.
Indeed, it is necessary to consider the priority among different
events. Should different priorities also be assigned to NMI_CRASH,
NMI_BACKTRACE, NMI_STOP, and NMI_KGDB? Do these operations need to be
visible to the BIOS? Could you kindly provide some good suggestions?
>
> Thanks.
Thanks,
Yunhui
* Re: [External] Re: [PATCH v3 5/8] riscv: smp: use NMI for CPU stop
2025-12-08 11:40 ` yunhui cui
@ 2025-12-10 14:22 ` Radim Krčmář
2025-12-12 3:09 ` yunhui cui
0 siblings, 1 reply; 18+ messages in thread
From: Radim Krčmář @ 2025-12-10 14:22 UTC (permalink / raw)
To: yunhui cui
Cc: conor, paul.walmsley, palmer, aou, alex, luxu.kernel,
linux-kernel, linux-riscv, jassisinghbrar, conor.dooley,
valentina.fernandezalanis, catalin.marinas, will, maz,
timothy.hayes, lpieralisi, arnd, kees, tglx, viresh.kumar,
boqun.feng, linux-arm-kernel, cleger, atishp, ajones, linux-riscv
2025-12-08T19:40:39+08:00, yunhui cui <cuiyunhui@bytedance.com>:
> Hi Radim,
>
> On Thu, Dec 4, 2025 at 9:16 PM Radim Krčmář <rkrcmar@ventanamicro.com> wrote:
>>
>> 2025-12-04T13:28:45+08:00, yunhui cui <cuiyunhui@bytedance.com>:
>> > Hi Radim,
>> >
>> > On Thu, Dec 4, 2025 at 12:07 PM Radim Krčmář <rkrcmar@ventanamicro.com> wrote:
>> >>
>> >> 2025-11-27T20:53:02+08:00, Yunhui Cui <cuiyunhui@bytedance.com>:
>> >> > Use NMI instead of IPI for CPU stop if RISC-V SSE NMI is supported.
>> >> >
>> >> > Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com>
>> >> > ---
>> >> > diff --git a/drivers/firmware/riscv/riscv_sse_nmi.c b/drivers/firmware/riscv/riscv_sse_nmi.c
>> >> > @@ -58,6 +58,7 @@ static int local_nmi_handler(u32 evt, void *arg, struct pt_regs *regs)
>> >> > type = atomic_read(this_cpu_ptr(&local_nmi));
>> >> >
>> >> > NMI_HANDLE(LOCAL_NMI_CRASH, cpu_crash_stop, cpu, regs);
>> >> > + NMI_HANDLE(LOCAL_NMI_STOP, cpu_stop);
>> >>
>> >> Please document the intended preemption design for all SSE events,
>> >> because it will be a nightmare if we forget some assumptions in the
>> >> coming years. (That includes the relative priorities of RAS/PMU/...)
>> >
>> > Actually, LOCAL_NMI_CRASH, LOCAL_NMI_STOP, LOCAL_NMI_BACKTRACE,
>> > LOCAL_NMI_KGDB, ... are all implemented via the single SSE event
>> > SBI_SSE_EVENT_LOCAL_SOFTWARE_INJECTED. Per the SSE design, no
>> > preemption will occur among CRASH, STOP, BACKTRACE, and KGDB events.
>>
>> That is how it is. I don't understand why it must be like that.
>>
>> For example: PMU_OVERFLOW has lower event_id than SOFTWARE_INJECTED, so
>> it will currently interrupt NMI_CRASH as they both have priority 0,
>> although NMI_CRASH probably shouldn't be masked by anything, and should
>> preempt everything.
>> NMI_BACKTRACE, on the other hand, probably shouldn't have that high
>> priority as there seem more important events (e.g. RAS and NMI_CRASH).
>>
>> The issues can be avoided by event priorities, masking, or deemed as
>> non-issue, but I think it would be beneficial to provide some reasoning
>> behind the design, as the choices don't seem obvious to me.
>
> Indeed, it is necessary to consider the priority among different
> events. Should different priorities also be assigned to NMI_CRASH,
> NMI_BACKTRACE, NMI_STOP, and NMI_KGDB?
I think it would be beneficial to document the desired behavior even if
we can't (currently?) implement it, because like you said, SSE can't
directly express the priority when multiplexing SOFTWARE_INJECTED.
> Do these operations need to be
> visible to the BIOS?
BIOS shouldn't care what lower privilege wants to do.
SBI could define more events for software use, though.
> Could you kindly provide some good suggestions?
I think it would be good practice to explicitly set a unique priority
when registering SSE events. Maybe through a global priority enum, and
make sure that all event registrations are passing a value from that
enum.
That would make sure that different events interact like we expect them
to, but it doesn't solve the multiplexing issue of SOFTWARE_INJECTED.
If we're fine with all SOFTWARE_INJECTED sub-handlers having the maximal
priority (higher than RAS/PMU/UNKNOWN_NMI/...), then we could hope that
lower importance handlers (e.g. BACKTRACE) won't hang, so the higher
importance handlers (e.g. CRASH) would eventually run.
We're dealing with low-occurrence scenarios, so this might be "good
enough for now"...
Situation would get simpler if we could avoid some sub-handlers;
alternatively, it would get more complicated if SOFTWARE_INJECTED had
lower priority than some other event -- we'd make CRASH partially
recover its high priority image by masking other SSE events during its
execution (and we'd need warding amulets against hangs and starvation).
Thanks.
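
One possible shape for the "global priority enum" suggested above is sketched
below in plain C. Everything here is hypothetical: the enum, its member names,
the table and the checker do not exist in the kernel, and the ordering shown
(SOFTWARE_INJECTED most urgent) is just the option discussed in this message,
assuming a lower value means a more urgent event. The point is only that a
single enum makes the relative order visible in one place and lets a simple
check reject two registrations that share a priority.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical global priority list; assumed: lower value = more urgent. */
enum sse_prio_model {
	SSE_PRIO_SOFTWARE_INJECTED = 0,
	SSE_PRIO_HIGH_RAS,
	SSE_PRIO_UNKNOWN_NMI,
	SSE_PRIO_PMU_OVERFLOW,
	SSE_PRIO_MAX,
};

/* Hypothetical record of one event registration. */
struct sse_reg_model {
	const char *name;
	enum sse_prio_model prio;
};

/* Example registrations, each taking its priority from the enum. */
static const struct sse_reg_model model_regs[] = {
	{ "software-injected", SSE_PRIO_SOFTWARE_INJECTED },
	{ "high-prio-ras",     SSE_PRIO_HIGH_RAS },
	{ "unknown-nmi",       SSE_PRIO_UNKNOWN_NMI },
	{ "pmu-overflow",      SSE_PRIO_PMU_OVERFLOW },
};

/* Returns 1 if every registration has a valid, unique priority, else 0. */
static int model_prios_unique(const struct sse_reg_model *r, size_t n)
{
	unsigned char seen[SSE_PRIO_MAX] = { 0 };
	size_t i;

	assert(r != NULL);

	for (i = 0; i < n; i++) {
		unsigned int p = (unsigned int)r[i].prio;

		if (p >= SSE_PRIO_MAX || seen[p])
			return 0;
		seen[p] = 1;
	}

	return 1;
}
```

This does not address the multiplexing problem raised above: all sub-handlers
behind SOFTWARE_INJECTED would still share the single SSE_PRIO_SOFTWARE_INJECTED
slot.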
* Re: [External] Re: [PATCH v3 5/8] riscv: smp: use NMI for CPU stop
2025-12-10 14:22 ` Radim Krčmář
@ 2025-12-12 3:09 ` yunhui cui
0 siblings, 0 replies; 18+ messages in thread
From: yunhui cui @ 2025-12-12 3:09 UTC (permalink / raw)
To: Radim Krčmář
Cc: conor, paul.walmsley, palmer, aou, alex, luxu.kernel,
linux-kernel, linux-riscv, jassisinghbrar, conor.dooley,
valentina.fernandezalanis, catalin.marinas, will, maz,
timothy.hayes, lpieralisi, arnd, kees, tglx, viresh.kumar,
boqun.feng, linux-arm-kernel, cleger, atishp, ajones, linux-riscv
Hi Radim,
On Wed, Dec 10, 2025 at 10:22 PM Radim Krčmář <rkrcmar@ventanamicro.com> wrote:
>
> 2025-12-08T19:40:39+08:00, yunhui cui <cuiyunhui@bytedance.com>:
> > Hi Radim,
> >
> > On Thu, Dec 4, 2025 at 9:16 PM Radim Krčmář <rkrcmar@ventanamicro.com> wrote:
> >>
> >> 2025-12-04T13:28:45+08:00, yunhui cui <cuiyunhui@bytedance.com>:
> >> > Hi Radim,
> >> >
> >> > On Thu, Dec 4, 2025 at 12:07 PM Radim Krčmář <rkrcmar@ventanamicro.com> wrote:
> >> >>
> >> >> 2025-11-27T20:53:02+08:00, Yunhui Cui <cuiyunhui@bytedance.com>:
> >> >> > Use NMI instead of IPI for CPU stop if RISC-V SSE NMI is supported.
> >> >> >
> >> >> > Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com>
> >> >> > ---
> >> >> > diff --git a/drivers/firmware/riscv/riscv_sse_nmi.c b/drivers/firmware/riscv/riscv_sse_nmi.c
> >> >> > @@ -58,6 +58,7 @@ static int local_nmi_handler(u32 evt, void *arg, struct pt_regs *regs)
> >> >> > type = atomic_read(this_cpu_ptr(&local_nmi));
> >> >> >
> >> >> > NMI_HANDLE(LOCAL_NMI_CRASH, cpu_crash_stop, cpu, regs);
> >> >> > + NMI_HANDLE(LOCAL_NMI_STOP, cpu_stop);
> >> >>
> >> >> Please document the intended preemption design for all SSE events,
> >> >> because it will be a nightmare if we forget some assumptions in the
> >> >> coming years. (That includes the relative priorities of RAS/PMU/...)
> >> >
> >> > Actually, LOCAL_NMI_CRASH, LOCAL_NMI_STOP, LOCAL_NMI_BACKTRACE,
> >> > LOCAL_NMI_KGDB, ... are all implemented via the single SSE event
> >> > SBI_SSE_EVENT_LOCAL_SOFTWARE_INJECTED. Per the SSE design, no
> >> > preemption will occur among CRASH, STOP, BACKTRACE, and KGDB events.
> >>
> >> That is how it is. I don't understand why it must be like that.
> >>
> >> For example: PMU_OVERFLOW has lower event_id than SOFTWARE_INJECTED, so
> >> it will currently interrupt NMI_CRASH as they both have priority 0,
> >> although NMI_CRASH probably shouldn't be masked by anything, and should
> >> preempt everything.
> >> NMI_BACKTRACE, on the other hand, probably shouldn't have that high
> >> priority as there seem more important events (e.g. RAS and NMI_CRASH).
> >>
> >> The issues can be avoided by event priorities, masking, or deemed as
> >> non-issue, but I think it would be beneficial to provide some reasoning
> >> behind the design, as the choices don't seem obvious to me.
> >
> > Indeed, it is necessary to consider the priority among different
> > events. Should different priorities also be assigned to NMI_CRASH,
> > NMI_BACKTRACE, NMI_STOP, and NMI_KGDB?
>
> I think it would be beneficial to document the desired behavior even if
> we can't (currently?) implement it, because like you said, SSE can't
> directly express the priority when multiplexing SOFTWARE_INJECTED.
>
> > Do these operations need to be
> > visible to the BIOS?
>
> BIOS shouldn't care what lower privilege wants to do.
> SBI could define more events for software use, though.
>
> > Could you kindly provide some good suggestions?
>
> I think it would be good practice to explicitly set a unique priority
> when registering SSE events. Maybe through a global priority enum, and
> make sure that all event registrations are passing a value from that
> enum.
Yes, each distinct event should have a preset priority. I noticed that
SBI_SSE_EVENT_LOCAL_PMU_OVERFLOW in Clément's patchset
(https://lore.kernel.org/all/20251105082639.342973-5-cleger@rivosinc.com/)
also does not explicitly set a priority.
As you mentioned, a "global priority enum" is required. I will also
respond to Clément's email.
>
> That would make sure that different events interact like we expect them
> to, but it doesn't solve the multiplexing issue of SOFTWARE_INJECTED.
>
> If we're fine with all SOFTWARE_INJECTED sub-handlers having the maximal
> priority (higher than RAS/PMU/UNKNOWN_NMI/...), then we could hope that
> lower importance handlers (e.g. BACKTRACE) won't hang, so the higher
> importance handlers (e.g. CRASH) would eventually run.
> We're dealing with low-occurrence scenarios, so this might be "good
> enough for now"...
Currently, x86 also handles multiple local NMI events, including
BACKTRACE, via a single NMI vector. We might as well set the priority
of SOFTWARE_INJECTED to the highest.
>
> Situation would get simpler if we could avoid some sub-handlers;
> alternatively, it would get more complicated if SOFTWARE_INJECTED had
> lower priority than some other event -- we'd make CRASH partially
> recover its high priority image by masking other SSE events during its
> execution (and we'd need warding amulets against hangs and starvation).
>
> Thanks.
Thanks,
Yunhui
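
The "global priority enum" idea discussed above could look roughly like the sketch below. This is only an illustration, not code from this series: the enum name, its ordering, the descriptor struct and the helper are hypothetical, and the PMU event ID used in the usage note is a placeholder. It assumes the rule referred to in the thread, that a numerically lower priority wins and that equal priorities fall back to the lower event ID.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical global priority table. A lower value means a higher
 * priority; the ordering here is an example, not a proposal.
 */
enum sse_prio {
	SSE_PRIO_SOFTWARE_INJECTED = 0,	/* CRASH/STOP/BACKTRACE/KGDB share this */
	SSE_PRIO_UNKNOWN_NMI,
	SSE_PRIO_RAS,
	SSE_PRIO_PMU_OVERFLOW,
};

/* Hypothetical descriptor pairing an event ID with its priority. */
struct sse_evt_desc {
	uint32_t event_id;
	enum sse_prio prio;
};

/* Returns true if event a would be allowed to preempt a handler for b. */
static bool sse_preempts(const struct sse_evt_desc *a,
			 const struct sse_evt_desc *b)
{
	if (a->prio != b->prio)
		return a->prio < b->prio;
	/* Equal priority: the lower event ID wins the tie. */
	return a->event_id < b->event_id;
}
```

With every registration passing a distinct value from such an enum, the tie-break on event ID never applies, so SOFTWARE_INJECTED (0xffff0000) would no longer lose to an event with a lower ID merely because both were left at priority 0.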
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v3 6/8] riscv: smp: use NMI for backtrace
2025-11-27 12:52 [PATCH v3 0/8] Add NMI Support to RISC-V via SSE Yunhui Cui
` (4 preceding siblings ...)
2025-11-27 12:53 ` [PATCH v3 5/8] riscv: smp: use NMI for CPU stop Yunhui Cui
@ 2025-11-27 12:53 ` Yunhui Cui
2025-11-27 12:53 ` [PATCH v3 7/8] riscv: smp: kgdb: use NMI for CPU roundup Yunhui Cui
2025-11-27 12:53 ` [PATCH v3 8/8] drivers: firmware: riscv: add unknown nmi support Yunhui Cui
7 siblings, 0 replies; 18+ messages in thread
From: Yunhui Cui @ 2025-11-27 12:53 UTC (permalink / raw)
To: conor, paul.walmsley, palmer, aou, alex, cuiyunhui, luxu.kernel,
linux-kernel, linux-riscv, jassisinghbrar, conor.dooley,
valentina.fernandezalanis, catalin.marinas, will, maz,
timothy.hayes, lpieralisi, arnd, kees, tglx, viresh.kumar,
boqun.feng, linux-arm-kernel, cleger, atishp, ajones
Use NMI instead of IPI for backtrace if RISC-V SSE NMI is supported.
Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com>
---
arch/riscv/kernel/smp.c | 12 +++++++++++-
drivers/firmware/riscv/riscv_sse_nmi.c | 2 ++
2 files changed, 13 insertions(+), 1 deletion(-)
diff --git a/arch/riscv/kernel/smp.c b/arch/riscv/kernel/smp.c
index bca95ec0b0f74..6d9a67c2c2a6e 100644
--- a/arch/riscv/kernel/smp.c
+++ b/arch/riscv/kernel/smp.c
@@ -342,9 +342,19 @@ static void riscv_backtrace_ipi(cpumask_t *mask)
send_ipi_mask(mask, IPI_CPU_BACKTRACE);
}
+static void riscv_backtrace_nmi(cpumask_t *mask)
+{
+ send_nmi_mask(mask, LOCAL_NMI_BACKTRACE);
+}
+
void arch_trigger_cpumask_backtrace(const cpumask_t *mask, int exclude_cpu)
{
- nmi_trigger_cpumask_backtrace(mask, exclude_cpu, riscv_backtrace_ipi);
+ if (!nmi_support())
+ nmi_trigger_cpumask_backtrace(mask, exclude_cpu,
+ riscv_backtrace_ipi);
+ else
+ nmi_trigger_cpumask_backtrace(mask, exclude_cpu,
+ riscv_backtrace_nmi);
}
#ifdef CONFIG_KGDB
diff --git a/drivers/firmware/riscv/riscv_sse_nmi.c b/drivers/firmware/riscv/riscv_sse_nmi.c
index 02e2de2bb70f7..a138d6bdbc0d1 100644
--- a/drivers/firmware/riscv/riscv_sse_nmi.c
+++ b/drivers/firmware/riscv/riscv_sse_nmi.c
@@ -3,6 +3,7 @@
#define pr_fmt(fmt) "SSE NMI: " fmt
#include <linux/atomic.h>
+#include <linux/nmi.h>
#include <linux/riscv_sbi_sse.h>
#include <linux/riscv_sse_nmi.h>
@@ -59,6 +60,7 @@ static int local_nmi_handler(u32 evt, void *arg, struct pt_regs *regs)
NMI_HANDLE(LOCAL_NMI_CRASH, cpu_crash_stop, cpu, regs);
NMI_HANDLE(LOCAL_NMI_STOP, cpu_stop);
+ NMI_HANDLE(LOCAL_NMI_BACKTRACE, nmi_cpu_backtrace, regs);
atomic_andnot(type, this_cpu_ptr(&local_nmi));
--
2.39.5
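
For readers following the multiplexing discussion, here is a minimal user-space sketch of how several request types can share one event through a pending bitmask, in the spirit of the NMI_HANDLE() dispatch in the hunk above. It is an assumption-laden illustration: the bit values, the names ending in _sketch, and the use of C11 atomics in place of the kernel's per-CPU atomic_t are all hypothetical and not taken from riscv_sse_nmi.h.

```c
#include <stdatomic.h>

/* Hypothetical bit assignments; the real enum lives in riscv_sse_nmi.h. */
enum nmi_type_sketch {
	NMI_SKETCH_CRASH     = 1u << 0,
	NMI_SKETCH_STOP      = 1u << 1,
	NMI_SKETCH_BACKTRACE = 1u << 2,
};

static atomic_uint pending_sketch;	/* stands in for the per-CPU local_nmi */
static unsigned int handled_log;	/* records which sub-handlers ran */

/* Sender side: mark a request as pending before injecting the event. */
static void request_nmi_sketch(unsigned int type)
{
	atomic_fetch_or(&pending_sketch, type);
}

static void on_crash(void)     { handled_log |= NMI_SKETCH_CRASH; }
static void on_stop(void)      { handled_log |= NMI_SKETCH_STOP; }
static void on_backtrace(void) { handled_log |= NMI_SKETCH_BACKTRACE; }

/* Same shape as the patch's macro: run func if its bit is set in type. */
#define NMI_HANDLE_SKETCH(mask, func) \
	do { if (type & (mask)) func(); } while (0)

/* Receiver side: snapshot the mask, dispatch in fixed order, clear it. */
static void local_nmi_handler_sketch(void)
{
	unsigned int type = atomic_load(&pending_sketch);

	NMI_HANDLE_SKETCH(NMI_SKETCH_CRASH, on_crash);
	NMI_HANDLE_SKETCH(NMI_SKETCH_STOP, on_stop);
	NMI_HANDLE_SKETCH(NMI_SKETCH_BACKTRACE, on_backtrace);

	/* Clear only the bits seen in the snapshot, keeping later requests. */
	atomic_fetch_and(&pending_sketch, ~type);
}
```

The order of the NMI_HANDLE lines is the only priority among sub-handlers in this scheme, which is why a hanging earlier handler would starve the later ones.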
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v3 7/8] riscv: smp: kgdb: use NMI for CPU roundup
2025-11-27 12:52 [PATCH v3 0/8] Add NMI Support to RISC-V via SSE Yunhui Cui
` (5 preceding siblings ...)
2025-11-27 12:53 ` [PATCH v3 6/8] riscv: smp: use NMI for backtrace Yunhui Cui
@ 2025-11-27 12:53 ` Yunhui Cui
2025-11-27 12:53 ` [PATCH v3 8/8] drivers: firmware: riscv: add unknown nmi support Yunhui Cui
7 siblings, 0 replies; 18+ messages in thread
From: Yunhui Cui @ 2025-11-27 12:53 UTC (permalink / raw)
To: conor, paul.walmsley, palmer, aou, alex, cuiyunhui, luxu.kernel,
linux-kernel, linux-riscv, jassisinghbrar, conor.dooley,
valentina.fernandezalanis, catalin.marinas, will, maz,
timothy.hayes, lpieralisi, arnd, kees, tglx, viresh.kumar,
boqun.feng, linux-arm-kernel, cleger, atishp, ajones
Use NMI for kgdb CPU roundup if RISC-V SSE NMI is available.
Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com>
---
arch/riscv/kernel/smp.c | 5 ++++-
drivers/firmware/riscv/riscv_sse_nmi.c | 2 ++
2 files changed, 6 insertions(+), 1 deletion(-)
diff --git a/arch/riscv/kernel/smp.c b/arch/riscv/kernel/smp.c
index 6d9a67c2c2a6e..84436e348b6ea 100644
--- a/arch/riscv/kernel/smp.c
+++ b/arch/riscv/kernel/smp.c
@@ -368,7 +368,10 @@ void kgdb_roundup_cpus(void)
if (cpu == this_cpu)
continue;
- send_ipi_single(cpu, IPI_KGDB_ROUNDUP);
+ if (!nmi_support())
+ send_ipi_single(cpu, IPI_KGDB_ROUNDUP);
+ else
+ send_nmi_single(cpu, LOCAL_NMI_KGDB);
}
}
#endif
diff --git a/drivers/firmware/riscv/riscv_sse_nmi.c b/drivers/firmware/riscv/riscv_sse_nmi.c
index a138d6bdbc0d1..85aa65f31943b 100644
--- a/drivers/firmware/riscv/riscv_sse_nmi.c
+++ b/drivers/firmware/riscv/riscv_sse_nmi.c
@@ -3,6 +3,7 @@
#define pr_fmt(fmt) "SSE NMI: " fmt
#include <linux/atomic.h>
+#include <linux/kgdb.h>
#include <linux/nmi.h>
#include <linux/riscv_sbi_sse.h>
#include <linux/riscv_sse_nmi.h>
@@ -61,6 +62,7 @@ static int local_nmi_handler(u32 evt, void *arg, struct pt_regs *regs)
NMI_HANDLE(LOCAL_NMI_CRASH, cpu_crash_stop, cpu, regs);
NMI_HANDLE(LOCAL_NMI_STOP, cpu_stop);
NMI_HANDLE(LOCAL_NMI_BACKTRACE, nmi_cpu_backtrace, regs);
+ NMI_HANDLE(LOCAL_NMI_KGDB, kgdb_nmicallback, cpu, regs);
atomic_andnot(type, this_cpu_ptr(&local_nmi));
--
2.39.5
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v3 8/8] drivers: firmware: riscv: add unknown nmi support
2025-11-27 12:52 [PATCH v3 0/8] Add NMI Support to RISC-V via SSE Yunhui Cui
` (6 preceding siblings ...)
2025-11-27 12:53 ` [PATCH v3 7/8] riscv: smp: kgdb: use NMI for CPU roundup Yunhui Cui
@ 2025-11-27 12:53 ` Yunhui Cui
2025-12-04 4:11 ` Radim Krčmář
7 siblings, 1 reply; 18+ messages in thread
From: Yunhui Cui @ 2025-11-27 12:53 UTC (permalink / raw)
To: conor, paul.walmsley, palmer, aou, alex, cuiyunhui, luxu.kernel,
linux-kernel, linux-riscv, jassisinghbrar, conor.dooley,
valentina.fernandezalanis, catalin.marinas, will, maz,
timothy.hayes, lpieralisi, arnd, kees, tglx, viresh.kumar,
boqun.feng, linux-arm-kernel, cleger, atishp, ajones
Register unknown_nmi_handler() as the handler for the UNKNOWN_NMI
event. When the system becomes unresponsive, unknown_nmi_handler()
can be manually triggered, which in turn invokes nmi_panic() to
collect vmcore for root cause analysis.
Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com>
---
arch/riscv/include/asm/sbi.h | 1 +
drivers/firmware/riscv/riscv_sse_nmi.c | 68 ++++++++++++++++++++++++++
2 files changed, 69 insertions(+)
diff --git a/arch/riscv/include/asm/sbi.h b/arch/riscv/include/asm/sbi.h
index 874cc1d7603a5..52d3fdf2d4cc1 100644
--- a/arch/riscv/include/asm/sbi.h
+++ b/arch/riscv/include/asm/sbi.h
@@ -486,6 +486,7 @@ enum sbi_sse_attr_id {
#define SBI_SSE_EVENT_LOCAL_LOW_PRIO_RAS 0x00100000
#define SBI_SSE_EVENT_GLOBAL_LOW_PRIO_RAS 0x00108000
#define SBI_SSE_EVENT_LOCAL_SOFTWARE_INJECTED 0xffff0000
+#define SBI_SSE_EVENT_LOCAL_UNKNOWN_NMI 0xffff0001
#define SBI_SSE_EVENT_GLOBAL_SOFTWARE_INJECTED 0xffff8000
#define SBI_SSE_EVENT_PLATFORM BIT(14)
diff --git a/drivers/firmware/riscv/riscv_sse_nmi.c b/drivers/firmware/riscv/riscv_sse_nmi.c
index 85aa65f31943b..d98015d1cb893 100644
--- a/drivers/firmware/riscv/riscv_sse_nmi.c
+++ b/drivers/firmware/riscv/riscv_sse_nmi.c
@@ -7,6 +7,7 @@
#include <linux/nmi.h>
#include <linux/riscv_sbi_sse.h>
#include <linux/riscv_sse_nmi.h>
+#include <linux/sysctl.h>
#include <asm/irq_regs.h>
#include <asm/sbi.h>
@@ -16,7 +17,10 @@
do { if (type & (mask)) func(__VA_ARGS__); } while (0)
static bool nmi_available;
+static int unknown_nmi_panic;
static struct sse_event *local_nmi_evt;
+static struct sse_event *unknown_nmi_evt;
+static struct ctl_table_header *unknown_nmi_sysctl_header;
static DEFINE_PER_CPU(atomic_t, local_nmi) = ATOMIC_INIT(LOCAL_NMI_NONE);
bool nmi_support(void)
@@ -52,6 +56,35 @@ void send_nmi_mask(cpumask_t *mask, enum local_nmi_type type)
send_nmi_single(cpu, type);
}
+static int __init setup_unknown_nmi_panic(char *str)
+{
+ unknown_nmi_panic = 1;
+ return 1;
+}
+__setup("unknown_nmi_panic", setup_unknown_nmi_panic);
+
+static const struct ctl_table unknown_nmi_table[] = {
+ {
+ .procname = "unknown_nmi_panic",
+ .data = &unknown_nmi_panic,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec,
+ },
+};
+
+static int unknown_nmi_handler(u32 evt, void *arg, struct pt_regs *regs)
+{
+ pr_emerg("NMI received for unknown on CPU %d.\n", smp_processor_id());
+
+ if (unknown_nmi_panic)
+ nmi_panic(regs, "NMI: Not continuing");
+
+ pr_emerg("Dazed and confused, but trying to continue\n");
+
+ return 0;
+}
+
static int local_nmi_handler(u32 evt, void *arg, struct pt_regs *regs)
{
enum local_nmi_type type;
@@ -69,6 +102,35 @@ static int local_nmi_handler(u32 evt, void *arg, struct pt_regs *regs)
return 0;
}
+static int unknown_nmi_init(void)
+{
+ int ret;
+
+ unknown_nmi_evt = sse_event_register(SBI_SSE_EVENT_LOCAL_UNKNOWN_NMI, 0,
+ unknown_nmi_handler, NULL);
+ if (IS_ERR(unknown_nmi_evt))
+ return PTR_ERR(unknown_nmi_evt);
+
+ ret = sse_event_enable(unknown_nmi_evt);
+ if (ret)
+ goto err_unregister;
+
+ unknown_nmi_sysctl_header = register_sysctl("kernel", unknown_nmi_table);
+ if (!unknown_nmi_sysctl_header) {
+ ret = -ENOMEM;
+ goto err_disable;
+ }
+
+ pr_info("Using SSE for unknown NMI event delivery\n");
+ return 0;
+
+err_disable:
+ sse_event_disable(unknown_nmi_evt);
+err_unregister:
+ sse_event_unregister(unknown_nmi_evt);
+ return ret;
+}
+
static int __init local_nmi_init(void)
{
int ret;
@@ -101,6 +163,12 @@ static int __init sse_nmi_init(void)
WRITE_ONCE(nmi_available, true);
+ ret = unknown_nmi_init();
+ if (ret) {
+ pr_err("Unknown_nmi_init failed with error %d\n", ret);
+ return ret;
+ }
+
return 0;
}
--
2.39.5
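
The error handling in unknown_nmi_init() above follows the usual goto-unwind pattern: each later failure undoes the earlier steps in reverse order. A stand-alone sketch of that control flow is shown below; the step functions, the fail_step knob and the state flags are hypothetical stand-ins for sse_event_register(), sse_event_enable() and register_sysctl(), chosen only so the flow can be exercised outside the kernel.

```c
#include <errno.h>

/* Hypothetical test knob: 0 = no failure, 1 = register, 2 = enable, 3 = sysctl. */
static int fail_step;
static int registered, enabled;

static int step_register(void)
{
	if (fail_step == 1)
		return -ENODEV;
	registered = 1;
	return 0;
}

static int step_enable(void)
{
	if (fail_step == 2)
		return -EIO;
	enabled = 1;
	return 0;
}

static int step_sysctl(void)
{
	return fail_step == 3 ? -ENOMEM : 0;
}

/* Mirrors the shape of unknown_nmi_init(): unwind in reverse on failure. */
static int init_sketch(void)
{
	int ret;

	ret = step_register();
	if (ret)
		return ret;

	ret = step_enable();
	if (ret)
		goto err_unregister;

	ret = step_sysctl();
	if (ret)
		goto err_disable;

	return 0;

err_disable:
	enabled = 0;
err_unregister:
	registered = 0;
	return ret;
}
```

Setting fail_step to 3 makes init_sketch() return -ENOMEM with both flags cleared again, which matches the err_disable then err_unregister fall-through in the patch.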
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH v3 8/8] drivers: firmware: riscv: add unknown nmi support
2025-11-27 12:53 ` [PATCH v3 8/8] drivers: firmware: riscv: add unknown nmi support Yunhui Cui
@ 2025-12-04 4:11 ` Radim Krčmář
2025-12-04 5:18 ` [External] " yunhui cui
0 siblings, 1 reply; 18+ messages in thread
From: Radim Krčmář @ 2025-12-04 4:11 UTC (permalink / raw)
To: Yunhui Cui, conor, paul.walmsley, palmer, aou, alex, luxu.kernel,
linux-kernel, linux-riscv, jassisinghbrar, conor.dooley,
valentina.fernandezalanis, catalin.marinas, will, maz,
timothy.hayes, lpieralisi, arnd, kees, tglx, viresh.kumar,
boqun.feng, linux-arm-kernel, cleger, atishp, ajones
Cc: linux-riscv
2025-11-27T20:53:05+08:00, Yunhui Cui <cuiyunhui@bytedance.com>:
> Register unknown_nmi_handler() as the handler for the UNKNOWN_NMI
> event. When the system becomes unresponsive, unknown_nmi_handler()
> can be manually triggered, which in turn invokes nmi_panic() to
> collect vmcore for root cause analysis.
Is UNKNOWN_NMI what we expect the watchdog to send?
Thanks.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [External] Re: [PATCH v3 8/8] drivers: firmware: riscv: add unknown nmi support
2025-12-04 4:11 ` Radim Krčmář
@ 2025-12-04 5:18 ` yunhui cui
2025-12-04 13:26 ` Radim Krčmář
0 siblings, 1 reply; 18+ messages in thread
From: yunhui cui @ 2025-12-04 5:18 UTC (permalink / raw)
To: Radim Krčmář
Cc: conor, paul.walmsley, palmer, aou, alex, luxu.kernel,
linux-kernel, linux-riscv, jassisinghbrar, conor.dooley,
valentina.fernandezalanis, catalin.marinas, will, maz,
timothy.hayes, lpieralisi, arnd, kees, tglx, viresh.kumar,
boqun.feng, linux-arm-kernel, cleger, atishp, ajones, linux-riscv
Hi Radim,
On Thu, Dec 4, 2025 at 12:11 PM Radim Krčmář <rkrcmar@ventanamicro.com> wrote:
>
> 2025-11-27T20:53:05+08:00, Yunhui Cui <cuiyunhui@bytedance.com>:
> > Register unknown_nmi_handler() as the handler for the UNKNOWN_NMI
> > event. When the system becomes unresponsive, unknown_nmi_handler()
> > can be manually triggered, which in turn invokes nmi_panic() to
> > collect vmcore for root cause analysis.
>
> Is UNKNOWN_NMI what we expect the watchdog to send?
For reference: As stated in
https://github.com/riscv-non-isa/riscv-sbi-doc/pull/223, "Generally,
an external interrupt is used as an Unknown NMI pin, and an Unknown
NMI event is sent to the SBI firmware by triggering this pin. Then the
SBI firmware will send SBI_SSE_EVENT_GLOBAL_UNKNOWN_NMI to the
kernel."
When the Linux system is unresponsive, we can manually trigger it via
BMC (ipmitool).
>
> Thanks.
Thanks,
Yunhui
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [External] Re: [PATCH v3 8/8] drivers: firmware: riscv: add unknown nmi support
2025-12-04 5:18 ` [External] " yunhui cui
@ 2025-12-04 13:26 ` Radim Krčmář
0 siblings, 0 replies; 18+ messages in thread
From: Radim Krčmář @ 2025-12-04 13:26 UTC (permalink / raw)
To: yunhui cui
Cc: conor, paul.walmsley, palmer, aou, alex, luxu.kernel,
linux-kernel, linux-riscv, jassisinghbrar, conor.dooley,
valentina.fernandezalanis, catalin.marinas, will, maz,
timothy.hayes, lpieralisi, arnd, kees, tglx, viresh.kumar,
boqun.feng, linux-arm-kernel, cleger, atishp, ajones, linux-riscv
2025-12-04T13:18:00+08:00, yunhui cui <cuiyunhui@bytedance.com>:
> Hi Radim,
>
> On Thu, Dec 4, 2025 at 12:11 PM Radim Krčmář <rkrcmar@ventanamicro.com> wrote:
>>
>> 2025-11-27T20:53:05+08:00, Yunhui Cui <cuiyunhui@bytedance.com>:
>> > Register unknown_nmi_handler() as the handler for the UNKNOWN_NMI
>> > event. When the system becomes unresponsive, unknown_nmi_handler()
>> > can be manually triggered, which in turn invokes nmi_panic() to
>> > collect vmcore for root cause analysis.
>>
>> Is UNKNOWN_NMI what we expect the watchdog to send?
>
> For reference: As stated in
> https://github.com/riscv-non-isa/riscv-sbi-doc/pull/223, "Generally,
> an external interrupt is used as an Unknown NMI pin, and an Unknown
> NMI event is sent to the SBI firmware by triggering this pin. Then the
> SBI firmware will send SBI_SSE_EVENT_GLOBAL_UNKNOWN_NMI to the
> kernel."
>
> When the Linux system is unresponsive, we can manually trigger it via
> BMC (ipmitool).
Makes sense, thanks, and do we plan to deal with other crash sources?
For example if a watchdog/sysrq triggers a crash, and then gets
interrupted with UNKNOWN_NMI.
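
As background for the re-entrancy question, the sketch below models how a single "panic owner" can be chosen with one compare-and-exchange, which is, as far as I understand it, the approach the generic nmi_panic() takes with its panic_cpu variable. Treat it as an assumption-labeled model, not the kernel code: the names ending in _sketch, the action enum and the user-space C11 atomics are all hypothetical.

```c
#include <stdatomic.h>

#define PANIC_CPU_INVALID_SKETCH (-1)

/* What a CPU entering the NMI panic path should do next. */
enum nmi_panic_action {
	NMI_PANIC_START,	/* first CPU in: proceed with the panic */
	NMI_PANIC_SELF_STOP,	/* another CPU owns the panic: park this one */
	NMI_PANIC_RETURN,	/* this CPU is already panicking: just return */
};

static atomic_int panic_cpu_sketch = PANIC_CPU_INVALID_SKETCH;

static enum nmi_panic_action nmi_panic_decide(int this_cpu)
{
	int old_cpu = PANIC_CPU_INVALID_SKETCH;

	/* On failure, old_cpu is updated to the current owner. */
	if (atomic_compare_exchange_strong(&panic_cpu_sketch, &old_cpu, this_cpu))
		return NMI_PANIC_START;

	if (old_cpu != this_cpu)
		return NMI_PANIC_SELF_STOP;

	return NMI_PANIC_RETURN;
}
```

Under this model, an UNKNOWN_NMI arriving on a CPU that is already inside a watchdog or sysrq triggered panic would take the NMI_PANIC_RETURN branch; whether the handler's subsequent "trying to continue" message is acceptable in that case is a separate question for the series.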
^ permalink raw reply [flat|nested] 18+ messages in thread