[PATCH v9 0/5] riscv: add SBI Supervisor Software Events support

Linux Perf Users
 help / color / mirror / Atom feed

* [PATCH v9 0/5] riscv: add SBI Supervisor Software Events support
@ 2026-05-09 13:09 Zhanpeng Zhang
  2026-05-09 13:09 ` [PATCH v9 1/5] riscv: add SBI SSE extension definitions Zhanpeng Zhang
                   ` (4 more replies)
  0 siblings, 5 replies; 11+ messages in thread
From: Zhanpeng Zhang @ 2026-05-09 13:09 UTC (permalink / raw)
  To: linux-riscv
  Cc: linux-kernel, linux-perf-users, linux-kselftest, kvm, kvm-riscv,
	Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
	Conor Dooley, Clément Léger, Himanshu Chauhan,
	Will Deacon, Mark Rutland, Atish Patra, Anup Patel, Shuah Khan,
	Yunhui Cui, Zhu Yuan, Zhanpeng Zhang


This series continues Clément's SBI Supervisor Software Events (SSE)
Linux support. Most of the framework is still Clément's work; many
thanks to him for bringing the series this far. Clément reached out and
suggested that I could continue the upstream work, so I am picking up
the series from here.

SSE gives the SBI a firmware-to-S/HS-mode event delivery path that does
not depend on ordinary S-mode interrupt delivery. This is useful for
NMI-style platform events and for PMU overflow delivery through firmware.

This respin mainly fixes two issues:

  - KVM Context Corruption

    An SSE can interrupt HS-mode code around guest execution, run a
    synthetic SSE handler, and then complete back through firmware.  The
    handler episode must be transparent to Linux-owned trap and
    virtualization state.  This version folds the stvec and hstatus
    preservation into the Linux SSE entry wrapper, rather than relying on
    OpenSBI to keep extra hidden resume state. This follows the earlier
    discussion [1].

  - SSE Handler's SSTATUS Leakage

    With CONFIG_RISCV_PMU_SBI_SSE, perf can use the local PMU overflow
    SSE event instead of the ordinary PMU interrupt. This makes the PMU
    overflow handler a real hot path for SSE: workloads such as
    "perf top" can enter the SSE handler and completion paths at a high
    rate. The previous code did not preserve the interrupted
    sstatus.SIE/SPIE/SPP state separately from the synthetic handler
    trap-return state, so high-frequency PMU-SSE delivery could quickly
    expose the leakage. One observed failure mode was an "environment
    call from U-mode" panic. This series addresses that by making the
    SSE entry wrapper preserve the live sstatus state around the handler
    before completing the event.

The selftest has also been extended to cover the high-frequency PMU-SSE
patterns that exposed these bugs.  The new stress modes exercise repeated
handler entry and completion, SBI calls from a handler, PMU stop/start
activity, and re-injection.

How to test:

  Build a kernel with at least:

    CONFIG_RISCV_SBI_SSE=y
    CONFIG_RISCV_PMU_SBI=y
    CONFIG_RISCV_PMU_SBI_SSE=y

  Build the selftest module against that kernel and run the smoke test:

    ./run_sse_test.sh

  Then run the stress layers.  Use stress=1 to run them after the smoke
  test, or stress=2 to run only the stress layers:

    ./run_sse_test.sh stress={0,1,2}

Known limitations and follow-up work:

SSE is a relatively large extension which builds synthetic supervisor
contexts and then resumes the interrupted context through firmware. This
series fixes the issues found so far in that entry/completion path, but I
think it still needs more testing and review. There may still be corner
cases in the constructed handler context or in the state restored after
completion that this series does not cover yet.

The next planned area is virtualization support. Follow-up work is
expected to distinguish host and guest PMU-SSE samples, and to support
delivering SSE events to guests, i.e. the KVM-SSE work from earlier
revisions.

[1] https://lore.kernel.org/r/9290f53d-3545-4299-9781-c1c558f71158@rivosinc.com

---

Changes in v9:
  - Picked up Clément's v8 series and rebased it onto the current RISC-V
    for-next branch.
  - Fixed KVM Context Corruption by keeping stvec and hstatus stable across
    the synthetic SSE handler episode.
  - Fixed SSE Handler's SSTATUS Leakage, where interrupted
    sstatus.SIE/SPIE/SPP state could be confused with the handler
    trap-return state under high-frequency PMU-SSE delivery.
  - Extended the SSE selftest with stress={0,1,2} modes that cover repeated
    handler entry, SBI calls, PMU activity and re-injection, and fixed event
    registration and attribute-buffer lifetime corner cases.
  - Updated MAINTAINERS for the RISC-V SSE driver.

Previous versions:
  v8:
    https://lore.kernel.org/r/20251105082639.342973-1-cleger@rivosinc.com

Clément Léger (5):
  riscv: add SBI SSE extension definitions
  riscv: add support for SBI Supervisor Software Events extension
  drivers: firmware: add riscv SSE support
  perf: RISC-V: add support for SSE event
  selftests/riscv: add SSE test module

 MAINTAINERS                                   |  17 +
 arch/riscv/include/asm/asm.h                  |  14 +-
 arch/riscv/include/asm/sbi.h                  |  61 ++
 arch/riscv/include/asm/scs.h                  |   7 +
 arch/riscv/include/asm/sse.h                  |  47 ++
 arch/riscv/include/asm/switch_to.h            |  14 +
 arch/riscv/include/asm/thread_info.h          |   1 +
 arch/riscv/kernel/Makefile                    |   1 +
 arch/riscv/kernel/asm-offsets.c               |  14 +
 arch/riscv/kernel/sbi_sse.c                   | 162 ++++
 arch/riscv/kernel/sbi_sse_entry.S             | 187 +++++
 drivers/firmware/Kconfig                      |   1 +
 drivers/firmware/Makefile                     |   1 +
 drivers/firmware/riscv/Kconfig                |  15 +
 drivers/firmware/riscv/Makefile               |   3 +
 drivers/firmware/riscv/riscv_sbi_sse.c        | 694 ++++++++++++++++
 drivers/perf/Kconfig                          |  10 +
 drivers/perf/riscv_pmu.c                      |  23 +
 drivers/perf/riscv_pmu_sbi.c                  |  71 +-
 include/linux/perf/riscv_pmu.h                |   5 +
 include/linux/riscv_sbi_sse.h                 |  57 ++
 tools/testing/selftests/riscv/Makefile        |   2 +-
 tools/testing/selftests/riscv/sse/Makefile    |   5 +
 .../selftests/riscv/sse/module/Makefile       |  16 +
 .../riscv/sse/module/riscv_sse_test.c         | 769 ++++++++++++++++++
 .../selftests/riscv/sse/run_sse_test.sh       |  48 ++
 26 files changed, 2231 insertions(+), 14 deletions(-)
 create mode 100644 arch/riscv/include/asm/sse.h
 create mode 100644 arch/riscv/kernel/sbi_sse.c
 create mode 100644 arch/riscv/kernel/sbi_sse_entry.S
 create mode 100644 drivers/firmware/riscv/Kconfig
 create mode 100644 drivers/firmware/riscv/Makefile
 create mode 100644 drivers/firmware/riscv/riscv_sbi_sse.c
 create mode 100644 include/linux/riscv_sbi_sse.h
 create mode 100644 tools/testing/selftests/riscv/sse/Makefile
 create mode 100644 tools/testing/selftests/riscv/sse/module/Makefile
 create mode 100644 tools/testing/selftests/riscv/sse/module/riscv_sse_test.c
 create mode 100644 tools/testing/selftests/riscv/sse/run_sse_test.sh

-- 
2.50.1 (Apple Git-155)

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v9 1/5] riscv: add SBI SSE extension definitions
  2026-05-09 13:09 [PATCH v9 0/5] riscv: add SBI Supervisor Software Events support Zhanpeng Zhang
@ 2026-05-09 13:09 ` Zhanpeng Zhang
  2026-05-09 13:09 ` [PATCH v9 2/5] riscv: add support for SBI Supervisor Software Events extension Zhanpeng Zhang
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 11+ messages in thread
From: Zhanpeng Zhang @ 2026-05-09 13:09 UTC (permalink / raw)
  To: linux-riscv
  Cc: linux-kernel, linux-perf-users, linux-kselftest, kvm, kvm-riscv,
	Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
	Conor Dooley, Clément Léger, Himanshu Chauhan,
	Will Deacon, Mark Rutland, Atish Patra, Anup Patel, Shuah Khan,
	Yunhui Cui, Zhu Yuan, Zhanpeng Zhang

From: Clément Léger <cleger@rivosinc.com>

Add needed definitions for SBI Supervisor Software Events extension [1].
This extension enables the SBI to inject events into supervisor software
much like ARM SDEI.

[1] https://lists.riscv.org/g/tech-prs/message/515

Signed-off-by: Clément Léger <cleger@rivosinc.com>
Co-developed-by: Zhanpeng Zhang <zhangzhanpeng.jasper@bytedance.com>
Signed-off-by: Zhanpeng Zhang <zhangzhanpeng.jasper@bytedance.com>
---
 arch/riscv/include/asm/sbi.h | 61 ++++++++++++++++++++++++++++++++++++
 1 file changed, 61 insertions(+)

diff --git a/arch/riscv/include/asm/sbi.h b/arch/riscv/include/asm/sbi.h
index 5725e0ca4dda..3403b0f706b4 100644
--- a/arch/riscv/include/asm/sbi.h
+++ b/arch/riscv/include/asm/sbi.h
@@ -38,6 +38,7 @@ enum sbi_ext_id {
 	SBI_EXT_FWFT = 0x46574654,
 	SBI_EXT_MPXY = 0x4D505859,
 	SBI_EXT_DBTR = 0x44425452,
+	SBI_EXT_SSE = 0x535345,
 
 	/* Experimentals extensions must lie within this range */
 	SBI_EXT_EXPERIMENTAL_START = 0x08000000,
@@ -506,6 +507,66 @@ enum sbi_mpxy_rpmi_attribute_id {
 #define SBI_MPXY_CHAN_CAP_SEND_WITHOUT_RESP	BIT(4)
 #define SBI_MPXY_CHAN_CAP_GET_NOTIFICATIONS	BIT(5)
 
+enum sbi_ext_sse_fid {
+	SBI_SSE_EVENT_ATTR_READ = 0,
+	SBI_SSE_EVENT_ATTR_WRITE,
+	SBI_SSE_EVENT_REGISTER,
+	SBI_SSE_EVENT_UNREGISTER,
+	SBI_SSE_EVENT_ENABLE,
+	SBI_SSE_EVENT_DISABLE,
+	SBI_SSE_EVENT_COMPLETE,
+	SBI_SSE_EVENT_INJECT,
+	SBI_SSE_HART_UNMASK,
+	SBI_SSE_HART_MASK,
+};
+
+enum sbi_sse_state {
+	SBI_SSE_STATE_UNUSED     = 0,
+	SBI_SSE_STATE_REGISTERED = 1,
+	SBI_SSE_STATE_ENABLED    = 2,
+	SBI_SSE_STATE_RUNNING    = 3,
+};
+
+/* SBI SSE Event Attributes. */
+enum sbi_sse_attr_id {
+	SBI_SSE_ATTR_STATUS		= 0x00000000,
+	SBI_SSE_ATTR_PRIO		= 0x00000001,
+	SBI_SSE_ATTR_CONFIG		= 0x00000002,
+	SBI_SSE_ATTR_PREFERRED_HART	= 0x00000003,
+	SBI_SSE_ATTR_ENTRY_PC		= 0x00000004,
+	SBI_SSE_ATTR_ENTRY_ARG		= 0x00000005,
+	SBI_SSE_ATTR_INTERRUPTED_SEPC	= 0x00000006,
+	SBI_SSE_ATTR_INTERRUPTED_FLAGS	= 0x00000007,
+	SBI_SSE_ATTR_INTERRUPTED_A6	= 0x00000008,
+	SBI_SSE_ATTR_INTERRUPTED_A7	= 0x00000009,
+
+	SBI_SSE_ATTR_MAX		= 0x0000000A
+};
+
+#define SBI_SSE_ATTR_STATUS_STATE_OFFSET	0
+#define SBI_SSE_ATTR_STATUS_STATE_MASK		0x3
+#define SBI_SSE_ATTR_STATUS_PENDING_OFFSET	2
+#define SBI_SSE_ATTR_STATUS_INJECT_OFFSET	3
+
+#define SBI_SSE_ATTR_CONFIG_ONESHOT	BIT(0)
+
+#define SBI_SSE_ATTR_INTERRUPTED_FLAGS_SSTATUS_SPP	BIT(0)
+#define SBI_SSE_ATTR_INTERRUPTED_FLAGS_SSTATUS_SPIE	BIT(1)
+#define SBI_SSE_ATTR_INTERRUPTED_FLAGS_HSTATUS_SPV	BIT(2)
+#define SBI_SSE_ATTR_INTERRUPTED_FLAGS_HSTATUS_SPVP	BIT(3)
+
+#define SBI_SSE_EVENT_LOCAL_HIGH_PRIO_RAS	0x00000000
+#define SBI_SSE_EVENT_LOCAL_DOUBLE_TRAP		0x00000001
+#define SBI_SSE_EVENT_GLOBAL_HIGH_PRIO_RAS	0x00008000
+#define SBI_SSE_EVENT_LOCAL_PMU_OVERFLOW	0x00010000
+#define SBI_SSE_EVENT_LOCAL_LOW_PRIO_RAS	0x00100000
+#define SBI_SSE_EVENT_GLOBAL_LOW_PRIO_RAS	0x00108000
+#define SBI_SSE_EVENT_LOCAL_SOFTWARE_INJECTED	0xffff0000
+#define SBI_SSE_EVENT_GLOBAL_SOFTWARE_INJECTED	0xffff8000
+
+#define SBI_SSE_EVENT_PLATFORM		BIT(14)
+#define SBI_SSE_EVENT_GLOBAL		BIT(15)
+
 /* SBI debug triggers function IDs */
 enum sbi_ext_dbtr_fid {
 	SBI_EXT_DBTR_NUM_TRIGGERS = 0,
-- 
2.50.1 (Apple Git-155)

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v9 2/5] riscv: add support for SBI Supervisor Software Events extension
  2026-05-09 13:09 [PATCH v9 0/5] riscv: add SBI Supervisor Software Events support Zhanpeng Zhang
  2026-05-09 13:09 ` [PATCH v9 1/5] riscv: add SBI SSE extension definitions Zhanpeng Zhang
@ 2026-05-09 13:09 ` Zhanpeng Zhang
  2026-05-09 13:53   ` sashiko-bot
  2026-05-09 13:10 ` [PATCH v9 3/5] drivers: firmware: add riscv SSE support Zhanpeng Zhang
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 11+ messages in thread
From: Zhanpeng Zhang @ 2026-05-09 13:09 UTC (permalink / raw)
  To: linux-riscv
  Cc: linux-kernel, linux-perf-users, linux-kselftest, kvm, kvm-riscv,
	Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
	Conor Dooley, Clément Léger, Himanshu Chauhan,
	Will Deacon, Mark Rutland, Atish Patra, Anup Patel, Shuah Khan,
	Yunhui Cui, Zhu Yuan, Zhanpeng Zhang

From: Clément Léger <cleger@rivosinc.com>

The SBI SSE extension allows the supervisor software to be notified by
the SBI of specific events that are not maskable. The context switch is
handled partially by the firmware which will save registers a6 and a7.
When entering kernel we can rely on these 2 registers to setup the stack
and save all the registers.

Since SSE events can be delivered at any time to the kernel, including
during exception handling, we need a way to locate the current_task for
context tracking. On RISC-V, it is stored in scratch when in user space
or tp when in kernel space (in which case SSCRATCH is zero). But at the
beginning of exception handling, SSCRATCH is used to swap tp and
check the origin of the exception. If interrupted at that point, then,
there is no way to reliably know where is located the current
task_struct. Even checking the interruption location won't work as SSE
event can be nested on top of each other so the original interruption
site might be lost at some point. In order to retrieve it reliably,
store the current task in an additional __sse_entry_task per_cpu array.
This array is then used to retrieve the current task based on the
hart ID that is passed to the SSE event handler in a6.

That being said, the way the current task struct is stored should
probably be reworked to find a better reliable alternative.

Since each event (and each CPU for local events) have their own
context and can preempt each other, allocate a stack (and a shadow stack
if needed) for each of them (and for each cpu for local events).

The SSE handler runs as a synthetic handler episode, but completion must
resume the context that was interrupted by the SSE event.  Preserve stvec
and, when the hypervisor extension is present, hstatus across do_sse() so
the handler episode does not leak trap-vector or virtualization state into
the interrupted context.

When completing the event, if we were coming from kernel with interrupts
disabled, simply return there. If coming from userspace or kernel with
interrupts enabled, simulate an interrupt exception by setting IE_SIE in
CSR_IP to allow delivery of signals to user task. For instance this can
happen when a RAS event has been generated by a user application and a
SIGBUS has been sent to a task.

Signed-off-by: Clément Léger <cleger@rivosinc.com>
Co-developed-by: Zhanpeng Zhang <zhangzhanpeng.jasper@bytedance.com>
Signed-off-by: Zhanpeng Zhang <zhangzhanpeng.jasper@bytedance.com>
---
 arch/riscv/include/asm/asm.h         |  14 +-
 arch/riscv/include/asm/scs.h         |   7 +
 arch/riscv/include/asm/sse.h         |  47 +++++++
 arch/riscv/include/asm/switch_to.h   |  14 ++
 arch/riscv/include/asm/thread_info.h |   1 +
 arch/riscv/kernel/Makefile           |   1 +
 arch/riscv/kernel/asm-offsets.c      |  14 ++
 arch/riscv/kernel/sbi_sse.c          | 162 +++++++++++++++++++++++
 arch/riscv/kernel/sbi_sse_entry.S    | 187 +++++++++++++++++++++++++++
 9 files changed, 444 insertions(+), 3 deletions(-)
 create mode 100644 arch/riscv/include/asm/sse.h
 create mode 100644 arch/riscv/kernel/sbi_sse.c
 create mode 100644 arch/riscv/kernel/sbi_sse_entry.S

diff --git a/arch/riscv/include/asm/asm.h b/arch/riscv/include/asm/asm.h
index e9e8ba83e632..50bace00053c 100644
--- a/arch/riscv/include/asm/asm.h
+++ b/arch/riscv/include/asm/asm.h
@@ -90,16 +90,24 @@
 .endm
 
 #ifdef CONFIG_SMP
-.macro asm_per_cpu dst sym tmp
-	lw    \tmp, TASK_TI_CPU_NUM(tp)
-	slli  \tmp, \tmp, RISCV_LGPTR
+.macro asm_per_cpu_with_cpu dst sym tmp cpu
+	slli  \tmp, \cpu, RISCV_LGPTR
 	la    \dst, __per_cpu_offset
 	add   \dst, \dst, \tmp
 	REG_L \tmp, 0(\dst)
 	la    \dst, \sym
 	add   \dst, \dst, \tmp
 .endm
+
+.macro asm_per_cpu dst sym tmp
+	lw \tmp, TASK_TI_CPU_NUM(tp)
+	asm_per_cpu_with_cpu \dst \sym \tmp \tmp
+.endm
 #else /* CONFIG_SMP */
+.macro asm_per_cpu_with_cpu dst sym tmp cpu
+	la    \dst, \sym
+.endm
+
 .macro asm_per_cpu dst sym tmp
 	la    \dst, \sym
 .endm
diff --git a/arch/riscv/include/asm/scs.h b/arch/riscv/include/asm/scs.h
index 023a412fe38d..0d70a35bc01a 100644
--- a/arch/riscv/include/asm/scs.h
+++ b/arch/riscv/include/asm/scs.h
@@ -17,6 +17,11 @@
 	load_per_cpu gp, irq_shadow_call_stack_ptr, \tmp
 .endm
 
+/* Load the per-CPU IRQ shadow call stack to gp. */
+.macro scs_load_sse_stack reg_evt
+	REG_L gp, SSE_REG_EVT_SHADOW_STACK(\reg_evt)
+.endm
+
 /* Load task_scs_sp(current) to gp. */
 .macro scs_load_current
 	REG_L	gp, TASK_TI_SCS_SP(tp)
@@ -40,6 +45,8 @@
 .endm
 .macro scs_load_irq_stack tmp
 .endm
+.macro scs_load_sse_stack reg_evt
+.endm
 .macro scs_load_current
 .endm
 .macro scs_load_current_if_task_changed prev
diff --git a/arch/riscv/include/asm/sse.h b/arch/riscv/include/asm/sse.h
new file mode 100644
index 000000000000..d3ce8c2b5221
--- /dev/null
+++ b/arch/riscv/include/asm/sse.h
@@ -0,0 +1,47 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2024 Rivos Inc.
+ */
+#ifndef __ASM_SSE_H
+#define __ASM_SSE_H
+
+#include <asm/sbi.h>
+
+#ifdef CONFIG_RISCV_SBI_SSE
+
+struct sse_event_interrupted_state {
+	unsigned long a6;
+	unsigned long a7;
+};
+
+struct sse_event_arch_data {
+	void *stack;
+	void *shadow_stack;
+	unsigned long tmp;
+	struct sse_event_interrupted_state interrupted;
+	unsigned long interrupted_phys;
+	u32 evt_id;
+	unsigned int hart_id;
+	unsigned int cpu_id;
+};
+
+static inline bool sse_event_is_global(u32 evt)
+{
+	return !!(evt & SBI_SSE_EVENT_GLOBAL);
+}
+
+void arch_sse_event_update_cpu(struct sse_event_arch_data *arch_evt, int cpu);
+int arch_sse_init_event(struct sse_event_arch_data *arch_evt, u32 evt_id,
+			int cpu);
+void arch_sse_free_event(struct sse_event_arch_data *arch_evt);
+int arch_sse_register_event(struct sse_event_arch_data *arch_evt);
+
+void sse_handle_event(struct sse_event_arch_data *arch_evt,
+		      struct pt_regs *regs);
+asmlinkage void handle_sse(void);
+asmlinkage void do_sse(struct sse_event_arch_data *arch_evt,
+		       struct pt_regs *reg);
+
+#endif
+
+#endif
diff --git a/arch/riscv/include/asm/switch_to.h b/arch/riscv/include/asm/switch_to.h
index 0e71eb82f920..ded8dc440b12 100644
--- a/arch/riscv/include/asm/switch_to.h
+++ b/arch/riscv/include/asm/switch_to.h
@@ -88,6 +88,19 @@ static inline void __switch_to_envcfg(struct task_struct *next)
 			:: "r" (next->thread.envcfg) : "memory");
 }
 
+#ifdef CONFIG_RISCV_SBI_SSE
+DECLARE_PER_CPU(struct task_struct *, __sbi_sse_entry_task);
+
+static inline void __switch_sbi_sse_entry_task(struct task_struct *next)
+{
+	__this_cpu_write(__sbi_sse_entry_task, next);
+}
+#else
+static inline void __switch_sbi_sse_entry_task(struct task_struct *next)
+{
+}
+#endif
+
 extern struct task_struct *__switch_to(struct task_struct *,
 				       struct task_struct *);
 
@@ -122,6 +135,7 @@ do {							\
 	if (switch_to_should_flush_icache(__next))	\
 		local_flush_icache_all();		\
 	__switch_to_envcfg(__next);			\
+	__switch_sbi_sse_entry_task(__next);		\
 	((last) = __switch_to(__prev, __next));		\
 } while (0)
 
diff --git a/arch/riscv/include/asm/thread_info.h b/arch/riscv/include/asm/thread_info.h
index 55019fdfa9ec..d14b45610c73 100644
--- a/arch/riscv/include/asm/thread_info.h
+++ b/arch/riscv/include/asm/thread_info.h
@@ -36,6 +36,7 @@
 #define OVERFLOW_STACK_SIZE     SZ_4K
 
 #define IRQ_STACK_SIZE		THREAD_SIZE
+#define SSE_STACK_SIZE		THREAD_SIZE
 
 #ifndef __ASSEMBLER__
 
diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
index cabb99cadfb6..8d4aed6368a6 100644
--- a/arch/riscv/kernel/Makefile
+++ b/arch/riscv/kernel/Makefile
@@ -101,6 +101,7 @@ obj-$(CONFIG_DYNAMIC_FTRACE)	+= mcount-dyn.o
 obj-$(CONFIG_PERF_EVENTS)	+= perf_callchain.o
 obj-$(CONFIG_HAVE_PERF_REGS)	+= perf_regs.o
 obj-$(CONFIG_RISCV_SBI)		+= sbi.o sbi_ecall.o
+obj-$(CONFIG_RISCV_SBI_SSE)	+= sbi_sse.o sbi_sse_entry.o
 ifeq ($(CONFIG_RISCV_SBI), y)
 obj-$(CONFIG_SMP)		+= sbi-ipi.o
 obj-$(CONFIG_SMP) += cpu_ops_sbi.o
diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
index af827448a609..7cc03f5a0673 100644
--- a/arch/riscv/kernel/asm-offsets.c
+++ b/arch/riscv/kernel/asm-offsets.c
@@ -15,6 +15,8 @@
 #include <asm/ptrace.h>
 #include <asm/cpu_ops_sbi.h>
 #include <asm/stacktrace.h>
+#include <asm/sbi.h>
+#include <asm/sse.h>
 #include <asm/suspend.h>
 
 void asm_offsets(void);
@@ -533,6 +535,18 @@ void asm_offsets(void)
 	DEFINE(FREGS_A6,	    offsetof(struct __arch_ftrace_regs, a6));
 	DEFINE(FREGS_A7,	    offsetof(struct __arch_ftrace_regs, a7));
 #endif
+
+#ifdef CONFIG_RISCV_SBI_SSE
+	OFFSET(SSE_REG_EVT_STACK, sse_event_arch_data, stack);
+	OFFSET(SSE_REG_EVT_SHADOW_STACK, sse_event_arch_data, shadow_stack);
+	OFFSET(SSE_REG_EVT_TMP, sse_event_arch_data, tmp);
+	OFFSET(SSE_REG_HART_ID, sse_event_arch_data, hart_id);
+	OFFSET(SSE_REG_CPU_ID, sse_event_arch_data, cpu_id);
+
+	DEFINE(SBI_EXT_SSE, SBI_EXT_SSE);
+	DEFINE(SBI_SSE_EVENT_COMPLETE, SBI_SSE_EVENT_COMPLETE);
+	DEFINE(ASM_NR_CPUS, CONFIG_NR_CPUS);
+#endif
 #ifdef CONFIG_RISCV_SBI
 	DEFINE(SBI_EXT_FWFT, SBI_EXT_FWFT);
 	DEFINE(SBI_EXT_FWFT_SET, SBI_EXT_FWFT_SET);
diff --git a/arch/riscv/kernel/sbi_sse.c b/arch/riscv/kernel/sbi_sse.c
new file mode 100644
index 000000000000..f30ae9ff77c9
--- /dev/null
+++ b/arch/riscv/kernel/sbi_sse.c
@@ -0,0 +1,162 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (C) 2025 Rivos Inc.
+ */
+#include <linux/nmi.h>
+#include <linux/scs.h>
+#include <linux/bitfield.h>
+#include <linux/percpu-defs.h>
+
+#include <asm/asm-prototypes.h>
+#include <asm/switch_to.h>
+#include <asm/irq_stack.h>
+#include <asm/sbi.h>
+#include <asm/sse.h>
+
+DEFINE_PER_CPU(struct task_struct *, __sbi_sse_entry_task);
+
+void __weak sse_handle_event(struct sse_event_arch_data *arch_evt, struct pt_regs *regs)
+{
+}
+
+void do_sse(struct sse_event_arch_data *arch_evt, struct pt_regs *regs)
+{
+	nmi_enter();
+
+	/* Retrieve missing GPRs from SBI */
+	sbi_ecall(SBI_EXT_SSE, SBI_SSE_EVENT_ATTR_READ, arch_evt->evt_id,
+		  SBI_SSE_ATTR_INTERRUPTED_A6,
+		  (SBI_SSE_ATTR_INTERRUPTED_A7 - SBI_SSE_ATTR_INTERRUPTED_A6) + 1,
+		  arch_evt->interrupted_phys, 0, 0);
+
+	memcpy(&regs->a6, &arch_evt->interrupted, sizeof(arch_evt->interrupted));
+
+	sse_handle_event(arch_evt, regs);
+
+	nmi_exit();
+}
+
+static void *alloc_to_stack_pointer(void *alloc)
+{
+	return alloc ? alloc + SSE_STACK_SIZE : NULL;
+}
+
+static void *stack_pointer_to_alloc(void *stack)
+{
+	return stack - SSE_STACK_SIZE;
+}
+
+#ifdef CONFIG_VMAP_STACK
+static void *sse_stack_alloc(unsigned int cpu)
+{
+	void *stack = arch_alloc_vmap_stack(SSE_STACK_SIZE, cpu_to_node(cpu));
+
+	return alloc_to_stack_pointer(stack);
+}
+
+static void sse_stack_free(void *stack)
+{
+	vfree(stack_pointer_to_alloc(stack));
+}
+
+static void arch_sse_stack_cpu_sync(struct sse_event_arch_data *arch_evt)
+{
+	void *p_stack = arch_evt->stack;
+	unsigned long stack = (unsigned long)stack_pointer_to_alloc(p_stack);
+	unsigned long stack_end = stack + SSE_STACK_SIZE;
+
+	/*
+	 * Flush the tlb to avoid taking any exception when accessing the
+	 * vmapped stack inside the SSE handler
+	 */
+	if (sse_event_is_global(arch_evt->evt_id))
+		flush_tlb_kernel_range(stack, stack_end);
+	else
+		local_flush_tlb_kernel_range(stack, stack_end);
+}
+#else /* CONFIG_VMAP_STACK */
+static void *sse_stack_alloc(unsigned int cpu)
+{
+	void *stack = kmalloc(SSE_STACK_SIZE, GFP_KERNEL);
+
+	return alloc_to_stack_pointer(stack);
+}
+
+static void sse_stack_free(void *stack)
+{
+	kfree(stack_pointer_to_alloc(stack));
+}
+
+static void arch_sse_stack_cpu_sync(struct sse_event_arch_data *arch_evt) {}
+#endif /* CONFIG_VMAP_STACK */
+
+static int sse_init_scs(int cpu, struct sse_event_arch_data *arch_evt)
+{
+	void *stack;
+
+	if (!scs_is_enabled())
+		return 0;
+
+	stack = scs_alloc(cpu_to_node(cpu));
+	if (!stack)
+		return -ENOMEM;
+
+	arch_evt->shadow_stack = stack;
+
+	return 0;
+}
+
+void arch_sse_event_update_cpu(struct sse_event_arch_data *arch_evt, int cpu)
+{
+	arch_evt->cpu_id = cpu;
+	arch_evt->hart_id = cpuid_to_hartid_map(cpu);
+}
+
+int arch_sse_init_event(struct sse_event_arch_data *arch_evt, u32 evt_id,
+			int cpu)
+{
+	void *stack;
+
+	arch_evt->evt_id = evt_id;
+	stack = sse_stack_alloc(cpu);
+	if (!stack)
+		return -ENOMEM;
+
+	arch_evt->stack = stack;
+
+	if (sse_init_scs(cpu, arch_evt)) {
+		sse_stack_free(arch_evt->stack);
+		return -ENOMEM;
+	}
+
+	if (sse_event_is_global(evt_id)) {
+		arch_evt->interrupted_phys =
+					virt_to_phys(&arch_evt->interrupted);
+	} else {
+		arch_evt->interrupted_phys =
+				per_cpu_ptr_to_phys(&arch_evt->interrupted);
+	}
+
+	arch_sse_event_update_cpu(arch_evt, cpu);
+
+	return 0;
+}
+
+void arch_sse_free_event(struct sse_event_arch_data *arch_evt)
+{
+	scs_free(arch_evt->shadow_stack);
+	sse_stack_free(arch_evt->stack);
+}
+
+int arch_sse_register_event(struct sse_event_arch_data *arch_evt)
+{
+	struct sbiret sret;
+
+	arch_sse_stack_cpu_sync(arch_evt);
+
+	sret = sbi_ecall(SBI_EXT_SSE, SBI_SSE_EVENT_REGISTER, arch_evt->evt_id,
+			 (unsigned long)handle_sse, (unsigned long)arch_evt, 0,
+			 0, 0);
+
+	return sbi_err_map_linux_errno(sret.error);
+}
diff --git a/arch/riscv/kernel/sbi_sse_entry.S b/arch/riscv/kernel/sbi_sse_entry.S
new file mode 100644
index 000000000000..6086751e6644
--- /dev/null
+++ b/arch/riscv/kernel/sbi_sse_entry.S
@@ -0,0 +1,187 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Copyright (C) 2025 Rivos Inc.
+ */
+
+#include <linux/init.h>
+#include <linux/linkage.h>
+
+#include <asm/alternative-macros.h>
+#include <asm/asm.h>
+#include <asm/csr.h>
+#include <asm/hwcap.h>
+#include <asm/scs.h>
+
+/* When entering handle_sse, the following registers are set:
+ * a6: contains the hartid
+ * a7: contains a sse_event_arch_data struct pointer
+ */
+SYM_CODE_START(handle_sse)
+	/* Save stack temporarily */
+	REG_S sp, SSE_REG_EVT_TMP(a7)
+	/* Set entry stack */
+	REG_L sp, SSE_REG_EVT_STACK(a7)
+
+	addi sp, sp, -(PT_SIZE_ON_STACK)
+	REG_S ra, PT_RA(sp)
+	REG_S s0, PT_S0(sp)
+	REG_S s1, PT_S1(sp)
+	REG_S s2, PT_S2(sp)
+	REG_S s3, PT_S3(sp)
+	REG_S s4, PT_S4(sp)
+	REG_S s5, PT_S5(sp)
+	REG_S s6, PT_S6(sp)
+	REG_S s7, PT_S7(sp)
+	REG_S s8, PT_S8(sp)
+	REG_S s9, PT_S9(sp)
+	REG_S s10, PT_S10(sp)
+	REG_S s11, PT_S11(sp)
+	REG_S tp, PT_TP(sp)
+	REG_S t0, PT_T0(sp)
+	REG_S t1, PT_T1(sp)
+	REG_S t2, PT_T2(sp)
+	REG_S t3, PT_T3(sp)
+	REG_S t4, PT_T4(sp)
+	REG_S t5, PT_T5(sp)
+	REG_S t6, PT_T6(sp)
+	REG_S gp, PT_GP(sp)
+	REG_S a0, PT_A0(sp)
+	REG_S a1, PT_A1(sp)
+	REG_S a2, PT_A2(sp)
+	REG_S a3, PT_A3(sp)
+	REG_S a4, PT_A4(sp)
+	REG_S a5, PT_A5(sp)
+
+	/* Retrieve entry sp */
+	REG_L a4, SSE_REG_EVT_TMP(a7)
+	/* Save CSRs */
+	csrr a0, CSR_EPC
+	csrr a1, CSR_SSTATUS
+	csrr a2, CSR_STVAL
+	csrr a3, CSR_SCAUSE
+
+	REG_S a0, PT_EPC(sp)
+	REG_S a1, PT_STATUS(sp)
+	REG_S a2, PT_BADADDR(sp)
+	REG_S a3, PT_CAUSE(sp)
+	REG_S a4, PT_SP(sp)
+
+	/* Disable user memory access and floating/vector computing */
+	li t0, SR_SUM | SR_FS_VS
+	csrc CSR_STATUS, t0
+
+	load_global_pointer
+	scs_load_sse_stack a7
+
+#ifdef CONFIG_SMP
+	lw t4, SSE_REG_HART_ID(a7)
+	lw t3, SSE_REG_CPU_ID(a7)
+
+	bne t4, a6, .Lfind_hart_id_slowpath
+
+.Lcpu_id_found:
+#else
+	mv t3, zero
+#endif
+
+	asm_per_cpu_with_cpu t2 __sbi_sse_entry_task t1 t3
+	REG_L tp, 0(t2)
+
+	mv a1, sp /* pt_regs on stack */
+
+	/*
+	 * Run the SSE handler with the normal exception vector, but restore the
+	 * interrupted stvec before completing the event. SSE can arrive while
+	 * the kernel is using a temporary trap vector in a sensitive entry path.
+	 */
+	csrr s3, CSR_STVEC
+	la t0, handle_exception
+	csrw CSR_STVEC, t0
+
+	/*
+	 * Preserve the full HS-mode virtualization state across the handler.
+	 * hstatus is live supervisor state rather than an SSE interrupted
+	 * attribute, and OpenSBI consumes hstatus.SPV during event completion.
+	 * Saving the whole CSR keeps the handler episode transparent to KVM and
+	 * avoids having to infer which hstatus bits may matter to a guest resume.
+	 */
+	ALTERNATIVE("nop", "csrr s5, hstatus", 0, RISCV_ISA_EXT_h, 1)
+
+	/*
+	 * Save sscratch for restoration since we might have interrupted the
+	 * kernel in early exception path and thus, we don't know the content of
+	 * sscratch.
+	 */
+	csrrw s4, CSR_SSCRATCH, x0
+
+	mv a0, a7
+
+	call do_sse
+
+	csrw CSR_SSCRATCH, s4
+	ALTERNATIVE("nop", "csrw hstatus, s5", 0, RISCV_ISA_EXT_h, 1)
+	csrw CSR_STVEC, s3
+
+	REG_L a0, PT_STATUS(sp)
+	REG_L a1, PT_EPC(sp)
+	REG_L a2, PT_BADADDR(sp)
+	REG_L a3, PT_CAUSE(sp)
+	csrw CSR_SSTATUS, a0
+	csrw CSR_EPC, a1
+	csrw CSR_STVAL, a2
+	csrw CSR_SCAUSE, a3
+
+	REG_L ra, PT_RA(sp)
+	REG_L s0, PT_S0(sp)
+	REG_L s1, PT_S1(sp)
+	REG_L s2, PT_S2(sp)
+	REG_L s3, PT_S3(sp)
+	REG_L s4, PT_S4(sp)
+	REG_L s5, PT_S5(sp)
+	REG_L s6, PT_S6(sp)
+	REG_L s7, PT_S7(sp)
+	REG_L s8, PT_S8(sp)
+	REG_L s9, PT_S9(sp)
+	REG_L s10, PT_S10(sp)
+	REG_L s11, PT_S11(sp)
+	REG_L tp, PT_TP(sp)
+	REG_L t0, PT_T0(sp)
+	REG_L t1, PT_T1(sp)
+	REG_L t2, PT_T2(sp)
+	REG_L t3, PT_T3(sp)
+	REG_L t4, PT_T4(sp)
+	REG_L t5, PT_T5(sp)
+	REG_L t6, PT_T6(sp)
+	REG_L gp, PT_GP(sp)
+	REG_L a0, PT_A0(sp)
+	REG_L a1, PT_A1(sp)
+	REG_L a2, PT_A2(sp)
+	REG_L a3, PT_A3(sp)
+	REG_L a4, PT_A4(sp)
+	REG_L a5, PT_A5(sp)
+
+	REG_L sp, PT_SP(sp)
+
+	li a7, SBI_EXT_SSE
+	li a6, SBI_SSE_EVENT_COMPLETE
+	ecall
+
+#ifdef CONFIG_SMP
+.Lfind_hart_id_slowpath:
+
+	/* Restore current task struct from __sbi_sse_entry_task */
+	li t1, ASM_NR_CPUS
+	/* Slowpath to find the CPU id associated to the hart id */
+	la t0, __cpuid_to_hartid_map
+
+.Lhart_id_loop:
+	REG_L t2, 0(t0)
+	beq t2, a6, .Lcpu_id_found
+
+	/* Increment pointer and CPU number */
+	addi t3, t3, 1
+	addi t0, t0, RISCV_SZPTR
+	bltu t3, t1, .Lhart_id_loop
+
+	/*
+	 * This should never happen since we expect the hart_id to match one
+	 * of our CPU, but better be safe than sorry
+	 */
+	la tp, init_task
+	la a0, sse_hart_id_panic_string
+	la t0, panic
+	jalr t0
+#endif
+
+SYM_CODE_END(handle_sse)
+
+SYM_DATA_START_LOCAL(sse_hart_id_panic_string)
+    .ascii "Unable to match hart_id with cpu\0"
+SYM_DATA_END(sse_hart_id_panic_string)
-- 
2.50.1 (Apple Git-155)

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v9 2/5] riscv: add support for SBI Supervisor Software Events extension
  2026-05-09 13:09 ` [PATCH v9 2/5] riscv: add support for SBI Supervisor Software Events extension Zhanpeng Zhang
@ 2026-05-09 13:53   ` sashiko-bot
  0 siblings, 0 replies; 11+ messages in thread
From: sashiko-bot @ 2026-05-09 13:53 UTC (permalink / raw)
  To: Zhanpeng Zhang; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 8 potential issue(s) to consider:
- [Critical] Uninitialized loop counter `t3` in the slow path causes out-of-bounds memory access and incorrect context tracking.
- [Critical] `__sbi_sse_entry_task` is uninitialized for the boot CPU, leading to a NULL pointer dereference.
- [High] `stack_pointer_to_alloc()` performs blind pointer arithmetic on `NULL`, causing a kernel crash during error cleanup.
- [High] Missing signal delivery interrupt simulation described in the commit message.
- [High] Data truncation of `interrupted_phys` corrupts physical memory on 32-bit systems with 64-bit physical addresses.
- [High] Missing TLB flush for vmap-backed Shadow Call Stack risks vmalloc faults in NMI context.
- [High] Kernel silently drops modifications to `a6` and `a7` registers, breaking kernel ABIs.
- [Medium] Truncation of `hart_id` to 32 bits and sign extension breaks the fast path for large hart IDs.
--

commit 984b54245d1919f79774c39971ea6d6896737561
Author: Clément Léger <cleger@rivosinc.com>
Subject: riscv: add support for SBI Supervisor Software Events extension

This commit introduces support for the SBI Supervisor Software Events (SSE)
extension on RISC-V. It allows the supervisor software to handle non-maskable
events dispatched by the SBI, tracking context via an additional per-cpu
array to safely handle interruptions.

The commit message states:
"simulate an interrupt exception by setting IE_SIE in CSR_IP to allow
delivery of signals to user task."

Looking at do_sse() and handle_sse(), there doesn't appear to be any logic
that checks if interrupts were enabled or writes to CSR_IP/CSR_SIP. Because
handle_sse() completes via an SBI call and returns directly to the
interrupted context, will signals queued during the handler (such as a
SIGBUS) fail to be delivered until an unrelated hardware interrupt occurs?

> diff --git a/arch/riscv/include/asm/sse.h b/arch/riscv/include/asm/sse.h
> index 0000000000000..d3ce8c2b52212
> --- /dev/null
> +++ b/arch/riscv/include/asm/sse.h
[ ... ]
> +struct sse_event_arch_data {
> +	void *stack;
> +	void *shadow_stack;
> +	unsigned long tmp;
> +	struct sse_event_interrupted_state interrupted;
> +	unsigned long interrupted_phys;

On 32-bit systems with 64-bit physical addresses (Sv32), virt_to_phys()
returns a 64-bit phys_addr_t. Does assigning it to an unsigned long truncate
the upper 32 bits, causing the SBI firmware to read the interrupted state
from the wrong physical address when passed to sbi_ecall() in do_sse()?

> +	u32 evt_id;
> +	unsigned int hart_id;

Since RISC-V hart IDs are 64-bit (unsigned long), will storing this as a
32-bit unsigned int cause issues?

In sbi_sse_entry.S, this is loaded using lw which sign-extends the value. If
a hart ID has bit 31 set, it will become a negative 64-bit value and never
match the true 64-bit hart ID passed by SBI in a6, unnecessarily forcing the
slow path.

> +	unsigned int cpu_id;
> +};
[ ... ]
> diff --git a/arch/riscv/kernel/sbi_sse.c b/arch/riscv/kernel/sbi_sse.c
> index 0000000000000..f30ae9ff77c97
> --- /dev/null
> +++ b/arch/riscv/kernel/sbi_sse.c
[ ... ]
> +DEFINE_PER_CPU(struct task_struct *, __sbi_sse_entry_task);

Is it possible for an unmaskable SSE event to fire before the first context
switch? Since __sbi_sse_entry_task defaults to NULL and is only updated
during a context switch, the boot CPU or secondary idle tasks might not have
this initialized. Could this lead to a NULL pointer dereference when NMI
context attempts to access current via tp?

> +
> +void __weak sse_handle_event(struct sse_event_arch_data *arch_evt, struct pt_regs *regs)
> +{
> +}
> +
> +void do_sse(struct sse_event_arch_data *arch_evt, struct pt_regs *regs)
> +{
> +	nmi_enter();
> +
> +	/* Retrieve missing GPRs from SBI */
> +	sbi_ecall(SBI_EXT_SSE, SBI_SSE_EVENT_ATTR_READ, arch_evt->evt_id,
> +		  SBI_SSE_ATTR_INTERRUPTED_A6,
> +		  (SBI_SSE_ATTR_INTERRUPTED_A7 - SBI_SSE_ATTR_INTERRUPTED_A6) + 1,
> +		  arch_evt->interrupted_phys, 0, 0);
> +
> +	memcpy(&regs->a6, &arch_evt->interrupted, sizeof(arch_evt->interrupted));
> +
> +	sse_handle_event(arch_evt, regs);

If a kernel subsystem like ptrace or a signal handler modifies regs->a6
or regs->a7 during sse_handle_event(), are those changes lost?

The SBI firmware restores the interrupted context's a6 and a7 from its
internal state, but it looks like we only read these attributes here and
never execute SBI_SSE_EVENT_ATTR_WRITE to sync any modifications back.

> +
> +	nmi_exit();
> +}
> +
> +static void *alloc_to_stack_pointer(void *alloc)
> +{
> +	return alloc ? alloc + SSE_STACK_SIZE : NULL;
> +}
> +
> +static void *stack_pointer_to_alloc(void *stack)
> +{
> +	return stack - SSE_STACK_SIZE;

If arch_sse_free_event() is called to clean up after a partial initialization
failure, arch_evt->stack could be NULL. Does subtracting SSE_STACK_SIZE
from NULL create an invalid pointer that could cause a crash when passed
to vfree() or kfree()?

> +}
[ ... ]
> +static void arch_sse_stack_cpu_sync(struct sse_event_arch_data *arch_evt)
> +{
> +	void *p_stack = arch_evt->stack;
> +	unsigned long stack = (unsigned long)stack_pointer_to_alloc(p_stack);
> +	unsigned long stack_end = stack + SSE_STACK_SIZE;
> +
> +	/*
> +	 * Flush the tlb to avoid taking any exception when accessing the
> +	 * vmapped stack inside the SSE handler
> +	 */
> +	if (sse_event_is_global(arch_evt->evt_id))
> +		flush_tlb_kernel_range(stack, stack_end);
> +	else
> +		local_flush_tlb_kernel_range(stack, stack_end);
> +}

This function flushes the TLB for the vmap-backed arch_evt->stack to avoid
taking a vmalloc exception inside the non-reentrant SSE handler.

However, when Shadow Call Stack is enabled, scs_alloc() also allocates a
vmap-backed stack. Does arch_evt->shadow_stack need a similar TLB flush to
prevent vmalloc faults when pushing return addresses?

[ ... ]
> diff --git a/arch/riscv/kernel/sbi_sse_entry.S b/arch/riscv/kernel/sbi_sse_entry.S
> index 0000000000000..59ec62607adcb
> --- /dev/null
> +++ b/arch/riscv/kernel/sbi_sse_entry.S
[ ... ]
> +#ifdef CONFIG_SMP
> +.Lfind_hart_id_slowpath:
> +
> +	/* Restore current task struct from __sbi_sse_entry_task */
> +	li t1, ASM_NR_CPUS
> +	/* Slowpath to find the CPU id associated to the hart id */
> +	la t0, __cpuid_to_hartid_map

Should t3 be initialized to 0 here before the loop begins?

If the fastpath check fails earlier, t3 retains its previous value from
arch_evt->cpu_id. When a match is found at index N during the loop, the
calculated CPU ID would be arch_evt->cpu_id + N instead of N. Could this
incorrect index lead to an out-of-bounds pointer being loaded into tp?

> +
> +.Lhart_id_loop:
> +	REG_L t2, 0(t0)
> +	beq t2, a6, .Lcpu_id_found
> +
> +	/* Increment pointer and CPU number */
> +	addi t3, t3, 1
> +	addi t0, t0, RISCV_SZPTR
> +	bltu t3, t1, .Lhart_id_loop

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/cover.1778331862.git.zhangzhanpeng.jasper@bytedance.com?part=2

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v9 3/5] drivers: firmware: add riscv SSE support
  2026-05-09 13:09 [PATCH v9 0/5] riscv: add SBI Supervisor Software Events support Zhanpeng Zhang
  2026-05-09 13:09 ` [PATCH v9 1/5] riscv: add SBI SSE extension definitions Zhanpeng Zhang
  2026-05-09 13:09 ` [PATCH v9 2/5] riscv: add support for SBI Supervisor Software Events extension Zhanpeng Zhang
@ 2026-05-09 13:10 ` Zhanpeng Zhang
  2026-05-09 14:38   ` sashiko-bot
  2026-05-13  7:49   ` Himanshu Chauhan
  2026-05-09 13:10 ` [PATCH v9 4/5] perf: RISC-V: add support for SSE event Zhanpeng Zhang
  2026-05-09 13:10 ` [PATCH v9 5/5] selftests/riscv: add SSE test module Zhanpeng Zhang
  4 siblings, 2 replies; 11+ messages in thread
From: Zhanpeng Zhang @ 2026-05-09 13:10 UTC (permalink / raw)
  To: linux-riscv
  Cc: linux-kernel, linux-perf-users, linux-kselftest, kvm, kvm-riscv,
	Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
	Conor Dooley, Clément Léger, Himanshu Chauhan,
	Will Deacon, Mark Rutland, Atish Patra, Anup Patel, Shuah Khan,
	Yunhui Cui, Zhu Yuan, Himanshu Chauhan, Zhanpeng Zhang,
	Conor Dooley

From: Clément Léger <cleger@rivosinc.com>

Add a driver-level interface for RISC-V SSE support.  The interface lets
Linux clients register handlers and enable, disable and complete SSE
events.  It will be used by PMU and GHES drivers.

Co-developed-by: Himanshu Chauhan <hchauhan@ventanamicro.com>
Signed-off-by: Himanshu Chauhan <hchauhan@ventanamicro.com>
Signed-off-by: Clément Léger <cleger@rivosinc.com>
Co-developed-by: Zhanpeng Zhang <zhangzhanpeng.jasper@bytedance.com>
Signed-off-by: Zhanpeng Zhang <zhangzhanpeng.jasper@bytedance.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
---
 MAINTAINERS                            |  16 +
 drivers/firmware/Kconfig               |   1 +
 drivers/firmware/Makefile              |   1 +
 drivers/firmware/riscv/Kconfig         |  15 +
 drivers/firmware/riscv/Makefile        |   3 +
 drivers/firmware/riscv/riscv_sbi_sse.c | 694 +++++++++++++++++++++++++
 include/linux/riscv_sbi_sse.h          |  57 ++
 7 files changed, 787 insertions(+)
 create mode 100644 drivers/firmware/riscv/Kconfig
 create mode 100644 drivers/firmware/riscv/Makefile
 create mode 100644 drivers/firmware/riscv/riscv_sbi_sse.c
 create mode 100644 include/linux/riscv_sbi_sse.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 2fb1c75afd16..c247acc56d1c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -22914,6 +22914,13 @@ T:	git git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux.git
 F:	Documentation/devicetree/bindings/iommu/riscv,iommu.yaml
 F:	drivers/iommu/riscv/
 
+RISC-V FIRMWARE DRIVERS
+M:	Conor Dooley <conor@kernel.org>
+L:	linux-riscv@lists.infradead.org
+S:	Maintained
+T:	git git://git.kernel.org/pub/scm/linux/kernel/git/conor/linux.git
+F:	drivers/firmware/riscv/
+
 RISC-V MICROCHIP SUPPORT
 M:	Conor Dooley <conor.dooley@microchip.com>
 M:	Daire McNamara <daire.mcnamara@microchip.com>
@@ -22999,6 +23006,15 @@ F:	arch/riscv/boot/dts/spacemit/
 N:	spacemit
 K:	spacemit
 
+RISC-V SSE DRIVER
+M:	Zhanpeng Zhang <zhangzhanpeng.jasper@bytedance.com>
+R:	Himanshu Chauhan <himanshu@thechauhan.dev>
+R:	Yunhui Cui <cuiyunhui@bytedance.com>
+L:	linux-riscv@lists.infradead.org
+S:	Maintained
+F:	drivers/firmware/riscv/riscv_sbi_sse.c
+F:	include/linux/riscv_sbi_sse.h
+
 RISC-V TENSTORRENT SoC SUPPORT
 M:	Drew Fustini <dfustini@oss.tenstorrent.com>
 M:	Joel Stanley <jms@oss.tenstorrent.com>
diff --git a/drivers/firmware/Kconfig b/drivers/firmware/Kconfig
index bbd2155d8483..1894df87b08e 100644
--- a/drivers/firmware/Kconfig
+++ b/drivers/firmware/Kconfig
@@ -294,6 +294,7 @@ source "drivers/firmware/meson/Kconfig"
 source "drivers/firmware/microchip/Kconfig"
 source "drivers/firmware/psci/Kconfig"
 source "drivers/firmware/qcom/Kconfig"
+source "drivers/firmware/riscv/Kconfig"
 source "drivers/firmware/samsung/Kconfig"
 source "drivers/firmware/smccc/Kconfig"
 source "drivers/firmware/tegra/Kconfig"
diff --git a/drivers/firmware/Makefile b/drivers/firmware/Makefile
index 4ddec2820c96..6cdd84570ea7 100644
--- a/drivers/firmware/Makefile
+++ b/drivers/firmware/Makefile
@@ -34,6 +34,7 @@ obj-y				+= efi/
 obj-y				+= imx/
 obj-y				+= psci/
 obj-y				+= qcom/
+obj-y				+= riscv/
 obj-y				+= samsung/
 obj-y				+= smccc/
 obj-y				+= tegra/
diff --git a/drivers/firmware/riscv/Kconfig b/drivers/firmware/riscv/Kconfig
new file mode 100644
index 000000000000..ed5b663ac5f9
--- /dev/null
+++ b/drivers/firmware/riscv/Kconfig
@@ -0,0 +1,15 @@
+# SPDX-License-Identifier: GPL-2.0-only
+menu "Risc-V Specific firmware drivers"
+depends on RISCV
+
+config RISCV_SBI_SSE
+	bool "Enable SBI Supervisor Software Events support"
+	depends on RISCV_SBI
+	default y
+	help
+	  The Supervisor Software Events support allows the SBI to deliver
+	  NMI-like notifications to the supervisor mode software. When enabled,
+	  this option provides support to register callbacks on specific SSE
+	  events.
+
+endmenu
diff --git a/drivers/firmware/riscv/Makefile b/drivers/firmware/riscv/Makefile
new file mode 100644
index 000000000000..c8795d4bbb2e
--- /dev/null
+++ b/drivers/firmware/riscv/Makefile
@@ -0,0 +1,3 @@
+# SPDX-License-Identifier: GPL-2.0
+
+obj-$(CONFIG_RISCV_SBI_SSE)		+= riscv_sbi_sse.o
diff --git a/drivers/firmware/riscv/riscv_sbi_sse.c b/drivers/firmware/riscv/riscv_sbi_sse.c
new file mode 100644
index 000000000000..830011ddd33f
--- /dev/null
+++ b/drivers/firmware/riscv/riscv_sbi_sse.c
@@ -0,0 +1,694 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (C) 2025 Rivos Inc.
+ */
+
+#define pr_fmt(fmt) "sse: " fmt
+
+#include <linux/cpu.h>
+#include <linux/cpuhotplug.h>
+#include <linux/cpu_pm.h>
+#include <linux/hardirq.h>
+#include <linux/list.h>
+#include <linux/percpu-defs.h>
+#include <linux/reboot.h>
+#include <linux/riscv_sbi_sse.h>
+#include <linux/slab.h>
+
+#include <asm/sbi.h>
+#include <asm/sse.h>
+
+struct sse_event {
+	struct list_head list;
+	u32 evt_id;
+	u32 priority;
+	sse_event_handler_fn *handler;
+	void *handler_arg;
+	/* Only valid for global events */
+	unsigned int cpu;
+
+	union {
+		struct sse_registered_event *global;
+		struct sse_registered_event __percpu *local;
+	};
+};
+
+static int sse_hp_state;
+static bool sse_available __ro_after_init;
+static DEFINE_SPINLOCK(events_list_lock);
+static LIST_HEAD(events);
+static DEFINE_MUTEX(sse_mutex);
+
+struct sse_registered_event {
+	struct sse_event_arch_data arch;
+	struct sse_event *event;
+	unsigned long attr;
+	bool is_enabled;
+};
+
+void sse_handle_event(struct sse_event_arch_data *arch_event,
+		      struct pt_regs *regs)
+{
+	int ret;
+	struct sse_registered_event *reg_evt =
+		container_of(arch_event, struct sse_registered_event, arch);
+	struct sse_event *evt = reg_evt->event;
+
+	ret = evt->handler(evt->evt_id, evt->handler_arg, regs);
+	if (ret)
+		pr_warn("event %x handler failed with error %d\n", evt->evt_id, ret);
+}
+
+static struct sse_event *sse_event_get(u32 evt)
+{
+	struct sse_event *event = NULL;
+
+	scoped_guard(spinlock, &events_list_lock) {
+		list_for_each_entry(event, &events, list) {
+			if (event->evt_id == evt)
+				return event;
+		}
+	}
+
+	return NULL;
+}
+
+static phys_addr_t sse_event_get_attr_phys(struct sse_registered_event *reg_evt)
+{
+	phys_addr_t phys;
+	void *addr = &reg_evt->attr;
+
+	if (sse_event_is_global(reg_evt->event->evt_id))
+		phys = virt_to_phys(addr);
+	else
+		phys = per_cpu_ptr_to_phys(addr);
+
+	return phys;
+}
+
+static struct sse_registered_event *sse_get_reg_evt(struct sse_event *event)
+{
+	if (sse_event_is_global(event->evt_id))
+		return event->global;
+	else
+		return per_cpu_ptr(event->local, smp_processor_id());
+}
+
+static int sse_sbi_event_func(struct sse_event *event, unsigned long func)
+{
+	struct sbiret ret;
+	u32 evt = event->evt_id;
+	struct sse_registered_event *reg_evt = sse_get_reg_evt(event);
+
+	ret = sbi_ecall(SBI_EXT_SSE, func, evt, 0, 0, 0, 0, 0);
+	if (ret.error) {
+		pr_warn("Failed to execute func %lx, event %x, error %ld\n",
+			func, evt, ret.error);
+		return sbi_err_map_linux_errno(ret.error);
+	}
+
+	if (func == SBI_SSE_EVENT_DISABLE)
+		reg_evt->is_enabled = false;
+	else if (func == SBI_SSE_EVENT_ENABLE)
+		reg_evt->is_enabled = true;
+
+	return 0;
+}
+
+int sse_event_disable_local(struct sse_event *event)
+{
+	return sse_sbi_event_func(event, SBI_SSE_EVENT_DISABLE);
+}
+EXPORT_SYMBOL_GPL(sse_event_disable_local);
+
+int sse_event_enable_local(struct sse_event *event)
+{
+	return sse_sbi_event_func(event, SBI_SSE_EVENT_ENABLE);
+}
+EXPORT_SYMBOL_GPL(sse_event_enable_local);
+
+static int sse_event_attr_get_no_lock(struct sse_registered_event *reg_evt,
+				      unsigned long attr_id, unsigned long *val)
+{
+	struct sbiret sret;
+	u32 evt = reg_evt->event->evt_id;
+	unsigned long phys;
+
+	phys = sse_event_get_attr_phys(reg_evt);
+
+	sret = sbi_ecall(SBI_EXT_SSE, SBI_SSE_EVENT_ATTR_READ, evt, attr_id, 1,
+			 phys, 0, 0);
+	if (sret.error) {
+		pr_debug("Failed to get event %x attr %lx, error %ld\n", evt,
+			 attr_id, sret.error);
+		return sbi_err_map_linux_errno(sret.error);
+	}
+
+	*val = reg_evt->attr;
+
+	return 0;
+}
+
+static int sse_event_attr_set_nolock(struct sse_registered_event *reg_evt,
+				     unsigned long attr_id, unsigned long val)
+{
+	struct sbiret sret;
+	u32 evt = reg_evt->event->evt_id;
+	unsigned long phys;
+
+	reg_evt->attr = val;
+	phys = sse_event_get_attr_phys(reg_evt);
+
+	sret = sbi_ecall(SBI_EXT_SSE, SBI_SSE_EVENT_ATTR_WRITE, evt, attr_id, 1,
+			 phys, 0, 0);
+	if (sret.error)
+		pr_debug("Failed to set event %x attr %lx, error %ld\n", evt,
+			 attr_id, sret.error);
+
+	return sbi_err_map_linux_errno(sret.error);
+}
+
+static void sse_global_event_update_cpu(struct sse_event *event,
+					unsigned int cpu)
+{
+	struct sse_registered_event *reg_evt = event->global;
+
+	event->cpu = cpu;
+	arch_sse_event_update_cpu(&reg_evt->arch, cpu);
+}
+
+static int sse_event_set_target_cpu_nolock(struct sse_event *event,
+					   unsigned int cpu)
+{
+	unsigned long hart_id = cpuid_to_hartid_map(cpu);
+	struct sse_registered_event *reg_evt = event->global;
+	u32 evt = event->evt_id;
+	bool was_enabled;
+	int ret;
+
+	if (!sse_event_is_global(evt))
+		return -EINVAL;
+
+	was_enabled = reg_evt->is_enabled;
+	if (was_enabled)
+		sse_event_disable_local(event);
+
+	ret = sse_event_attr_set_nolock(reg_evt, SBI_SSE_ATTR_PREFERRED_HART,
+					hart_id);
+	if (ret == 0)
+		sse_global_event_update_cpu(event, cpu);
+
+	if (was_enabled)
+		sse_event_enable_local(event);
+
+	return 0;
+}
+
+int sse_event_set_target_cpu(struct sse_event *event, unsigned int cpu)
+{
+	int ret;
+
+	scoped_guard(mutex, &sse_mutex) {
+		scoped_guard(cpus_read_lock) {
+			if (!cpu_online(cpu))
+				return -EINVAL;
+
+			ret = sse_event_set_target_cpu_nolock(event, cpu);
+		}
+	}
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(sse_event_set_target_cpu);
+
+static int sse_event_init_registered(unsigned int cpu,
+				     struct sse_registered_event *reg_evt,
+				     struct sse_event *event)
+{
+	reg_evt->event = event;
+
+	return arch_sse_init_event(&reg_evt->arch, event->evt_id, cpu);
+}
+
+static void sse_event_free_registered(struct sse_registered_event *reg_evt)
+{
+	arch_sse_free_event(&reg_evt->arch);
+}
+
+static int sse_event_alloc_global(struct sse_event *event)
+{
+	int err;
+	struct sse_registered_event *reg_evt;
+
+	reg_evt = kzalloc_obj(*reg_evt, GFP_KERNEL);
+	if (!reg_evt)
+		return -ENOMEM;
+
+	event->global = reg_evt;
+	err = sse_event_init_registered(smp_processor_id(), reg_evt, event);
+	if (err)
+		kfree(reg_evt);
+
+	return err;
+}
+
+static int sse_event_alloc_local(struct sse_event *event)
+{
+	int err;
+	unsigned int cpu, err_cpu;
+	struct sse_registered_event *reg_evt;
+	struct sse_registered_event __percpu *reg_evts;
+
+	reg_evts = alloc_percpu(struct sse_registered_event);
+	if (!reg_evts)
+		return -ENOMEM;
+
+	event->local = reg_evts;
+
+	for_each_possible_cpu(cpu) {
+		reg_evt = per_cpu_ptr(reg_evts, cpu);
+		err = sse_event_init_registered(cpu, reg_evt, event);
+		if (err) {
+			err_cpu = cpu;
+			goto err_free_per_cpu;
+		}
+	}
+
+	return 0;
+
+err_free_per_cpu:
+	for_each_possible_cpu(cpu) {
+		if (cpu == err_cpu)
+			break;
+		reg_evt = per_cpu_ptr(reg_evts, cpu);
+		sse_event_free_registered(reg_evt);
+	}
+
+	free_percpu(reg_evts);
+
+	return err;
+}
+
+static struct sse_event *sse_event_alloc(u32 evt, u32 priority,
+					 sse_event_handler_fn *handler,
+					 void *arg)
+{
+	int err;
+	struct sse_event *event;
+
+	event = kzalloc_obj(*event, GFP_KERNEL);
+	if (!event)
+		return ERR_PTR(-ENOMEM);
+
+	event->evt_id = evt;
+	event->priority = priority;
+	event->handler_arg = arg;
+	event->handler = handler;
+
+	if (sse_event_is_global(evt))
+		err = sse_event_alloc_global(event);
+	else
+		err = sse_event_alloc_local(event);
+
+	if (err) {
+		kfree(event);
+		return ERR_PTR(err);
+	}
+
+	return event;
+}
+
+static int sse_sbi_register_event(struct sse_event *event,
+				  struct sse_registered_event *reg_evt)
+{
+	int ret;
+
+	ret = sse_event_attr_set_nolock(reg_evt, SBI_SSE_ATTR_PRIO,
+					event->priority);
+	if (ret)
+		return ret;
+
+	return arch_sse_register_event(&reg_evt->arch);
+}
+
+static int sse_event_register_local(struct sse_event *event)
+{
+	int ret;
+	struct sse_registered_event *reg_evt;
+
+	reg_evt = per_cpu_ptr(event->local, smp_processor_id());
+	ret = sse_sbi_register_event(event, reg_evt);
+	if (ret)
+		pr_debug("Failed to register event %x: err %d\n", event->evt_id,
+			 ret);
+
+	return ret;
+}
+
+static int sse_sbi_unregister_event(struct sse_event *event)
+{
+	return sse_sbi_event_func(event, SBI_SSE_EVENT_UNREGISTER);
+}
+
+struct sse_per_cpu_evt {
+	struct sse_event *event;
+	unsigned long func;
+	cpumask_t error;
+};
+
+static void sse_event_per_cpu_func(void *info)
+{
+	int ret;
+	struct sse_per_cpu_evt *cpu_evt = info;
+
+	if (cpu_evt->func == SBI_SSE_EVENT_REGISTER)
+		ret = sse_event_register_local(cpu_evt->event);
+	else
+		ret = sse_sbi_event_func(cpu_evt->event, cpu_evt->func);
+
+	if (ret)
+		cpumask_set_cpu(smp_processor_id(), &cpu_evt->error);
+}
+
+static void sse_event_free(struct sse_event *event)
+{
+	unsigned int cpu;
+	struct sse_registered_event *reg_evt;
+
+	if (sse_event_is_global(event->evt_id)) {
+		sse_event_free_registered(event->global);
+		kfree(event->global);
+	} else {
+		for_each_possible_cpu(cpu) {
+			reg_evt = per_cpu_ptr(event->local, cpu);
+			sse_event_free_registered(reg_evt);
+		}
+		free_percpu(event->local);
+	}
+
+	kfree(event);
+}
+
+static int sse_on_each_cpu(struct sse_event *event, unsigned long func,
+			   unsigned long revert_func)
+{
+	struct sse_per_cpu_evt cpu_evt;
+
+	cpu_evt.event = event;
+	cpumask_clear(&cpu_evt.error);
+	cpu_evt.func = func;
+	on_each_cpu(sse_event_per_cpu_func, &cpu_evt, 1);
+	/*
+	 * If there are some error reported by CPUs, revert event state on the
+	 * other ones
+	 */
+	if (!cpumask_empty(&cpu_evt.error)) {
+		cpumask_t revert;
+
+		cpumask_andnot(&revert, cpu_online_mask, &cpu_evt.error);
+		cpu_evt.func = revert_func;
+		on_each_cpu_mask(&revert, sse_event_per_cpu_func, &cpu_evt, 1);
+
+		return -EIO;
+	}
+
+	return 0;
+}
+
+int sse_event_enable(struct sse_event *event)
+{
+	int ret = 0;
+
+	scoped_guard(mutex, &sse_mutex) {
+		scoped_guard(cpus_read_lock) {
+			if (sse_event_is_global(event->evt_id)) {
+				ret = sse_event_enable_local(event);
+			} else {
+				ret = sse_on_each_cpu(event,
+						      SBI_SSE_EVENT_ENABLE,
+						      SBI_SSE_EVENT_DISABLE);
+			}
+		}
+	}
+	return ret;
+}
+EXPORT_SYMBOL_GPL(sse_event_enable);
+
+static int sse_events_mask(void)
+{
+	struct sbiret ret;
+
+	ret = sbi_ecall(SBI_EXT_SSE, SBI_SSE_HART_MASK, 0, 0, 0, 0, 0, 0);
+
+	return sbi_err_map_linux_errno(ret.error);
+}
+
+static int sse_events_unmask(void)
+{
+	struct sbiret ret;
+
+	ret = sbi_ecall(SBI_EXT_SSE, SBI_SSE_HART_UNMASK, 0, 0, 0, 0, 0, 0);
+
+	return sbi_err_map_linux_errno(ret.error);
+}
+
+static void sse_event_disable_nolock(struct sse_event *event)
+{
+	struct sse_per_cpu_evt cpu_evt;
+
+	if (sse_event_is_global(event->evt_id)) {
+		sse_event_disable_local(event);
+	} else {
+		cpu_evt.event = event;
+		cpu_evt.func = SBI_SSE_EVENT_DISABLE;
+		on_each_cpu(sse_event_per_cpu_func, &cpu_evt, 1);
+	}
+}
+
+void sse_event_disable(struct sse_event *event)
+{
+	scoped_guard(mutex, &sse_mutex) {
+		scoped_guard(cpus_read_lock) {
+			sse_event_disable_nolock(event);
+		}
+	}
+}
+EXPORT_SYMBOL_GPL(sse_event_disable);
+
+struct sse_event *sse_event_register(u32 evt, u32 priority,
+				     sse_event_handler_fn *handler, void *arg)
+{
+	struct sse_event *event;
+	int cpu;
+	int ret = 0;
+
+	if (!sse_available)
+		return ERR_PTR(-EOPNOTSUPP);
+
+	guard(mutex)(&sse_mutex);
+	if (sse_event_get(evt))
+		return ERR_PTR(-EEXIST);
+
+	event = sse_event_alloc(evt, priority, handler, arg);
+	if (IS_ERR(event))
+		return event;
+
+	scoped_guard(cpus_read_lock) {
+		if (sse_event_is_global(evt)) {
+			unsigned long preferred_hart;
+
+			ret = sse_event_attr_get_no_lock(event->global,
+							 SBI_SSE_ATTR_PREFERRED_HART,
+							 &preferred_hart);
+			if (ret)
+				goto err_event_free;
+
+			cpu = riscv_hartid_to_cpuid(preferred_hart);
+			sse_global_event_update_cpu(event, cpu);
+
+			ret = sse_sbi_register_event(event, event->global);
+			if (ret)
+				goto err_event_free;
+
+		} else {
+			ret = sse_on_each_cpu(event, SBI_SSE_EVENT_REGISTER,
+					      SBI_SSE_EVENT_DISABLE);
+			if (ret)
+				goto err_event_free;
+		}
+	}
+
+	scoped_guard(spinlock, &events_list_lock)
+		list_add(&event->list, &events);
+
+	return event;
+
+err_event_free:
+	sse_event_free(event);
+
+	return ERR_PTR(ret);
+}
+EXPORT_SYMBOL_GPL(sse_event_register);
+
+static void sse_event_unregister_nolock(struct sse_event *event)
+{
+	struct sse_per_cpu_evt cpu_evt;
+
+	if (sse_event_is_global(event->evt_id)) {
+		sse_sbi_unregister_event(event);
+	} else {
+		cpu_evt.event = event;
+		cpu_evt.func = SBI_SSE_EVENT_UNREGISTER;
+		on_each_cpu(sse_event_per_cpu_func, &cpu_evt, 1);
+	}
+}
+
+void sse_event_unregister(struct sse_event *event)
+{
+	scoped_guard(mutex, &sse_mutex) {
+		scoped_guard(cpus_read_lock)
+			sse_event_unregister_nolock(event);
+
+		scoped_guard(spinlock, &events_list_lock)
+			list_del(&event->list);
+
+		sse_event_free(event);
+	}
+}
+EXPORT_SYMBOL_GPL(sse_event_unregister);
+
+static int sse_cpu_online(unsigned int cpu)
+{
+	struct sse_event *event;
+
+	scoped_guard(spinlock, &events_list_lock) {
+		list_for_each_entry(event, &events, list) {
+			if (sse_event_is_global(event->evt_id))
+				continue;
+
+			sse_event_register_local(event);
+			if (sse_get_reg_evt(event))
+				sse_event_enable_local(event);
+		}
+	}
+
+	/* Ready to handle events. Unmask SSE. */
+	return sse_events_unmask();
+}
+
+static int sse_cpu_teardown(unsigned int cpu)
+{
+	int ret = 0;
+	unsigned int next_cpu;
+	struct sse_event *event;
+	struct sse_registered_event *reg_evt;
+
+	/* Mask the sse events */
+	ret = sse_events_mask();
+	if (ret)
+		return ret;
+
+	scoped_guard(spinlock, &events_list_lock) {
+		list_for_each_entry(event, &events, list) {
+			/* Disable local event on current cpu */
+			if (!sse_event_is_global(event->evt_id)) {
+				reg_evt = sse_get_reg_evt(event);
+				if (reg_evt->is_enabled)
+					sse_event_disable_local(event);
+
+				sse_sbi_unregister_event(event);
+				continue;
+			}
+
+			if (event->cpu != smp_processor_id())
+				continue;
+
+			/* Update destination hart for global event */
+			next_cpu = cpumask_any_but(cpu_online_mask, cpu);
+			ret = sse_event_set_target_cpu_nolock(event, next_cpu);
+		}
+	}
+
+	return ret;
+}
+
+static int sse_pm_notifier(struct notifier_block *nb, unsigned long action,
+			   void *data)
+{
+	WARN_ON_ONCE(preemptible());
+
+	switch (action) {
+	case CPU_PM_ENTER:
+		sse_events_mask();
+		break;
+	case CPU_PM_EXIT:
+	case CPU_PM_ENTER_FAILED:
+		sse_events_unmask();
+		break;
+	default:
+		return NOTIFY_DONE;
+	}
+
+	return NOTIFY_OK;
+}
+
+static struct notifier_block sse_pm_nb = {
+	.notifier_call = sse_pm_notifier,
+};
+
+/*
+ * Mask all CPUs and unregister all events on panic, reboot or kexec.
+ */
+static int sse_reboot_notifier(struct notifier_block *nb, unsigned long action,
+			       void *data)
+{
+	cpuhp_remove_state(sse_hp_state);
+
+	return NOTIFY_OK;
+}
+
+static struct notifier_block sse_reboot_nb = {
+	.notifier_call = sse_reboot_notifier,
+};
+
+static int __init sse_init(void)
+{
+	int ret;
+
+	if (sbi_probe_extension(SBI_EXT_SSE) <= 0) {
+		pr_info("Missing SBI SSE extension\n");
+		return -EOPNOTSUPP;
+	}
+	pr_info("SBI SSE extension detected\n");
+
+	ret = cpu_pm_register_notifier(&sse_pm_nb);
+	if (ret) {
+		pr_warn("Failed to register CPU PM notifier...\n");
+		return ret;
+	}
+
+	ret = register_reboot_notifier(&sse_reboot_nb);
+	if (ret) {
+		pr_warn("Failed to register reboot notifier...\n");
+		goto remove_cpupm;
+	}
+
+	ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "riscv/sse:online",
+				sse_cpu_online, sse_cpu_teardown);
+	if (ret < 0)
+		goto remove_reboot;
+
+	sse_hp_state = ret;
+	sse_available = true;
+
+	return 0;
+
+remove_reboot:
+	unregister_reboot_notifier(&sse_reboot_nb);
+
+remove_cpupm:
+	cpu_pm_unregister_notifier(&sse_pm_nb);
+
+	return ret;
+}
+arch_initcall(sse_init);
diff --git a/include/linux/riscv_sbi_sse.h b/include/linux/riscv_sbi_sse.h
new file mode 100644
index 000000000000..84165cb5f2d0
--- /dev/null
+++ b/include/linux/riscv_sbi_sse.h
@@ -0,0 +1,57 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Copyright (C) 2025 Rivos Inc.
+ */
+
+#ifndef __LINUX_RISCV_SBI_SSE_H
+#define __LINUX_RISCV_SBI_SSE_H
+
+#include <linux/types.h>
+#include <linux/linkage.h>
+
+struct sse_event;
+struct pt_regs;
+
+typedef int (sse_event_handler_fn)(u32 event_num, void *arg,
+				   struct pt_regs *regs);
+
+#ifdef CONFIG_RISCV_SBI_SSE
+
+struct sse_event *sse_event_register(u32 event_num, u32 priority,
+				     sse_event_handler_fn *handler, void *arg);
+
+void sse_event_unregister(struct sse_event *evt);
+
+int sse_event_set_target_cpu(struct sse_event *sse_evt, unsigned int cpu);
+
+int sse_event_enable(struct sse_event *sse_evt);
+
+void sse_event_disable(struct sse_event *sse_evt);
+
+int sse_event_enable_local(struct sse_event *sse_evt);
+int sse_event_disable_local(struct sse_event *sse_evt);
+
+#else
+static inline struct sse_event *sse_event_register(u32 event_num, u32 priority,
+						   sse_event_handler_fn *handler,
+						   void *arg)
+{
+	return ERR_PTR(-EOPNOTSUPP);
+}
+
+static inline void sse_event_unregister(struct sse_event *evt) {}
+
+static inline int sse_event_set_target_cpu(struct sse_event *sse_evt,
+					   unsigned int cpu)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline int sse_event_enable(struct sse_event *sse_evt)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline void sse_event_disable(struct sse_event *sse_evt) {}
+#endif
+#endif /* __LINUX_RISCV_SBI_SSE_H */
-- 
2.50.1 (Apple Git-155)

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v9 3/5] drivers: firmware: add riscv SSE support
  2026-05-09 13:10 ` [PATCH v9 3/5] drivers: firmware: add riscv SSE support Zhanpeng Zhang
@ 2026-05-09 14:38   ` sashiko-bot
  2026-05-13  7:49   ` Himanshu Chauhan
  1 sibling, 0 replies; 11+ messages in thread
From: sashiko-bot @ 2026-05-09 14:38 UTC (permalink / raw)
  To: Zhanpeng Zhang; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 12 potential issue(s) to consider:
- [Critical] Missing braces around `scoped_guard(cpus_read_lock)` in `sse_event_unregister` releases lock prematurely, leading to a race condition and use-after-free.
- [High] Event registration rollback uses the wrong revert function, leaking firmware registrations and causing use-after-free.
- [High] Missing validation of `riscv_hartid_to_cpuid` return value leads to out-of-bounds array access.
- [High] `sse_cpu_teardown` causes out-of-bounds array access on single-CPU teardown and fails to unregister global events across reboots.
- [High] 64-bit physical addresses are silently truncated to 32 bits on RV32.
- [High] CPU hotplug unconditionally enables local events, ignoring driver-defined disabled states.
- [High] Early release of `cpus_read_lock` in `sse_event_register` causes missed event registration on hotplugged CPUs.
- [High] `sse_event_alloc_global` calls `smp_processor_id()` in a preemptible context.
- [High] Missing synchronization in `sse_event_unregister()` leads to use-after-free with concurrently executing hardware event handlers.
- [Medium] `sse_event_set_target_cpu_nolock` silently ignores attribute write failures.
- [Medium] Mixing `goto` error handling with `guard()` and `scoped_guard()` declarations violates subsystem cleanup guidelines.
- [Medium] The commit message claims clients can 'complete SSE events', but no such API exists.
--

commit 1f43f4083e7681521e63a564e1ff80519c1569f5
Author: Clément Léger <cleger@rivosinc.com>
Subject: drivers: firmware: add riscv SSE support

This commit adds a driver-level interface for RISC-V SSE support, allowing
Linux clients to register handlers and manage SSE events. It supports both
local and global events and handles CPU hotplug and PM transitions.

>     Add a driver-level interface for RISC-V SSE support.  The interface lets
>     Linux clients register handlers and enable, disable and complete SSE
>     events.  It will be used by PMU and GHES drivers.

The commit message states the interface lets clients complete SSE events.
However, there does not appear to be any API provided to complete an event.

Could the commit message be updated to reflect the actual implementation?

> diff --git a/drivers/firmware/riscv/riscv_sbi_sse.c b/drivers/firmware/riscv/riscv_sbi_sse.c
> new file mode 100644
> index 0000000000000..830011ddd33f7
> --- /dev/null
> +++ b/drivers/firmware/riscv/riscv_sbi_sse.c
> @@ -0,0 +1,694 @@
[ ... ]
> +static int sse_event_attr_get_no_lock(struct sse_registered_event *reg_evt,
> +				      unsigned long attr_id, unsigned long *val)
> +{
> +	struct sbiret sret;
> +	u32 evt = reg_evt->event->evt_id;
> +	unsigned long phys;
> +
> +	phys = sse_event_get_attr_phys(reg_evt);

Can this physical address be truncated on RV32 systems?

When CONFIG_ARCH_PHYS_ADDR_T_64BIT is enabled, physical addresses are 64 bits,
but unsigned long is 32 bits. Will the firmware receive a corrupted 32-bit
physical address since the ABI expects 64-bit values to be passed in two
registers on RV32?

[ ... ]
> +static int sse_event_attr_set_nolock(struct sse_registered_event *reg_evt,
> +				     unsigned long attr_id, unsigned long val)
> +{
> +	struct sbiret sret;
> +	u32 evt = reg_evt->event->evt_id;
> +	unsigned long phys;
> +
> +	reg_evt->attr = val;
> +	phys = sse_event_get_attr_phys(reg_evt);

Is this subject to the same RV32 physical address truncation mentioned above?

[ ... ]
> +static int sse_event_set_target_cpu_nolock(struct sse_event *event,
> +					   unsigned int cpu)
> +{
> +	unsigned long hart_id = cpuid_to_hartid_map(cpu);
> +	struct sse_registered_event *reg_evt = event->global;
> +	u32 evt = event->evt_id;
> +	bool was_enabled;
> +	int ret;
> +
> +	if (!sse_event_is_global(evt))
> +		return -EINVAL;
> +
> +	was_enabled = reg_evt->is_enabled;
> +	if (was_enabled)
> +		sse_event_disable_local(event);
> +
> +	ret = sse_event_attr_set_nolock(reg_evt, SBI_SSE_ATTR_PREFERRED_HART,
> +					hart_id);
> +	if (ret == 0)
> +		sse_global_event_update_cpu(event, cpu);
> +
> +	if (was_enabled)
> +		sse_event_enable_local(event);
> +
> +	return 0;
> +}

Should this function return the error code instead of unconditionally returning 0?

If the SBI call to set the preferred hart fails, the error is masked from the
caller, leaving the kernel thinking the migration succeeded while the firmware
state is out of sync.

[ ... ]
> +static int sse_event_alloc_global(struct sse_event *event)
> +{
> +	int err;
> +	struct sse_registered_event *reg_evt;
> +
> +	reg_evt = kzalloc_obj(*reg_evt, GFP_KERNEL);
> +	if (!reg_evt)
> +		return -ENOMEM;
> +
> +	event->global = reg_evt;
> +	err = sse_event_init_registered(smp_processor_id(), reg_evt, event);

Is it safe to call smp_processor_id() here?

sse_event_alloc_global() is invoked from sse_event_register(), which only
holds a mutex and runs with preemption enabled. Could this trigger a warning
if CONFIG_DEBUG_PREEMPT is enabled, or allow the thread to migrate
immediately after, yielding an invalid CPU ID?

[ ... ]
> +struct sse_event *sse_event_register(u32 evt, u32 priority,
> +				     sse_event_handler_fn *handler, void *arg)
> +{
> +	struct sse_event *event;
> +	int cpu;
> +	int ret = 0;
> +
> +	if (!sse_available)
> +		return ERR_PTR(-EOPNOTSUPP);
> +
> +	guard(mutex)(&sse_mutex);

This isn't a bug, but is it intended to mix goto-based error handling with
scope-based guard declarations in the same function?

The cleanup subsystem documentation recommends against this pattern as it can
create confusing ownership semantics.

> +	if (sse_event_get(evt))
> +		return ERR_PTR(-EEXIST);
> +
> +	event = sse_event_alloc(evt, priority, handler, arg);
> +	if (IS_ERR(event))
> +		return event;
> +
> +	scoped_guard(cpus_read_lock) {
> +		if (sse_event_is_global(evt)) {
> +			unsigned long preferred_hart;
> +
> +			ret = sse_event_attr_get_no_lock(event->global,
> +							 SBI_SSE_ATTR_PREFERRED_HART,
> +							 &preferred_hart);
> +			if (ret)
> +				goto err_event_free;
> +
> +			cpu = riscv_hartid_to_cpuid(preferred_hart);

Can riscv_hartid_to_cpuid() return -ENOENT here?

If the firmware returns an unknown or unmapped hart ID, it appears the
negative value is passed to sse_global_event_update_cpu(), which uses it to
index the logical hart array via cpuid_to_hartid_map(), potentially causing
an out-of-bounds memory read.

> +			sse_global_event_update_cpu(event, cpu);
> +
> +			ret = sse_sbi_register_event(event, event->global);
> +			if (ret)
> +				goto err_event_free;
> +
> +		} else {
> +			ret = sse_on_each_cpu(event, SBI_SSE_EVENT_REGISTER,
> +					      SBI_SSE_EVENT_DISABLE);

Should the revert function be SBI_SSE_EVENT_UNREGISTER instead of
SBI_SSE_EVENT_DISABLE?

Disabling an event merely masks it without unregistering it from the firmware.
If registration fails on any CPU, does this leave the firmware holding an
active registration pointing to freed memory after the error path frees the
event structures?

> +			if (ret)
> +				goto err_event_free;
> +		}
> +	}
> +
> +	scoped_guard(spinlock, &events_list_lock)
> +		list_add(&event->list, &events);

Because the cpus_read_lock is released before acquiring events_list_lock to
add the event to the global list, could a CPU hotplug operation occur in
this window?

If so, sse_cpu_online() would iterate over the list and miss the new event,
resulting in the event not being registered on the new CPU.

> +
> +	return event;
> +
> +err_event_free:
> +	sse_event_free(event);
> +
> +	return ERR_PTR(ret);
> +}
[ ... ]
> +void sse_event_unregister(struct sse_event *event)
> +{
> +	scoped_guard(mutex, &sse_mutex) {
> +		scoped_guard(cpus_read_lock)
> +			sse_event_unregister_nolock(event);

Is there a missing compound block around the cpus_read_lock guard?

Without braces, the lock is dropped before acquiring events_list_lock. Could a
concurrent CPU hotplug operation acquire the list lock, find the partially
unregistered event still in the list, and erroneously register and enable
it in the firmware for the new CPU?

> +
> +		scoped_guard(spinlock, &events_list_lock)
> +			list_del(&event->list);
> +
> +		sse_event_free(event);

Since SBI events are delivered as NMI-like notifications, could an event have
just fired and its handler be actively executing on another CPU?

Without synchronization like waiting for active handlers or RCU grace periods,
it seems sse_handle_event() might dereference evt->handler and
evt->handler_arg after the memory has been freed.

> +	}
> +}
[ ... ]
> +static int sse_cpu_online(unsigned int cpu)
> +{
> +	struct sse_event *event;
> +
> +	scoped_guard(spinlock, &events_list_lock) {
> +		list_for_each_entry(event, &events, list) {
> +			if (sse_event_is_global(event->evt_id))
> +				continue;
> +
> +			sse_event_register_local(event);
> +			if (sse_get_reg_evt(event))
> +				sse_event_enable_local(event);

Should this check reg_evt->is_enabled before unconditionally enabling the event?

If a driver registers an event but intentionally leaves it disabled, it looks
like it will unexpectedly become enabled on the newly hotplugged CPU.

> +		}
> +	}
> +
> +	/* Ready to handle events. Unmask SSE. */
> +	return sse_events_unmask();
> +}
> +
> +static int sse_cpu_teardown(unsigned int cpu)
> +{
> +	int ret = 0;
> +	unsigned int next_cpu;
> +	struct sse_event *event;
> +	struct sse_registered_event *reg_evt;
> +
> +	/* Mask the sse events */
> +	ret = sse_events_mask();
> +	if (ret)
> +		return ret;
> +
> +	scoped_guard(spinlock, &events_list_lock) {
> +		list_for_each_entry(event, &events, list) {
> +			/* Disable local event on current cpu */
> +			if (!sse_event_is_global(event->evt_id)) {
> +				reg_evt = sse_get_reg_evt(event);
> +				if (reg_evt->is_enabled)
> +					sse_event_disable_local(event);
> +
> +				sse_sbi_unregister_event(event);
> +				continue;
> +			}
> +
> +			if (event->cpu != smp_processor_id())
> +				continue;
> +
> +			/* Update destination hart for global event */
> +			next_cpu = cpumask_any_but(cpu_online_mask, cpu);
> +			ret = sse_event_set_target_cpu_nolock(event, next_cpu);

What happens if there are no other CPUs online during teardown?

If cpumask_any_but() returns a value greater than or equal to nr_cpu_ids,
passing it to sse_event_set_target_cpu_nolock() might result in an
out-of-bounds array read in cpuid_to_hartid_map().

Furthermore, by migrating rather than unregistering global events on reboot,
does this break kexec by leaving the event actively registered for the next
kernel?

> +		}
> +	}
> +
> +	return ret;
> +}

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/cover.1778331862.git.zhangzhanpeng.jasper@bytedance.com?part=3

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v9 3/5] drivers: firmware: add riscv SSE support
  2026-05-09 13:10 ` [PATCH v9 3/5] drivers: firmware: add riscv SSE support Zhanpeng Zhang
  2026-05-09 14:38   ` sashiko-bot
@ 2026-05-13  7:49   ` Himanshu Chauhan
  1 sibling, 0 replies; 11+ messages in thread
From: Himanshu Chauhan @ 2026-05-13  7:49 UTC (permalink / raw)
  To: Zhanpeng Zhang
  Cc: linux-riscv, linux-kernel, linux-perf-users, linux-kselftest, kvm,
	kvm-riscv, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Alexandre Ghiti, Conor Dooley, Clément Léger,
	Himanshu Chauhan, Will Deacon, Mark Rutland, Atish Patra,
	Anup Patel, Shuah Khan, Yunhui Cui, Zhu Yuan, Himanshu Chauhan,
	Conor Dooley

Hi Zhanpeng,

On Sat, May 9, 2026 at 6:41 PM Zhanpeng Zhang
<zhangzhanpeng.jasper@bytedance.com> wrote:
>
> From: Clément Léger <cleger@rivosinc.com>
>
> Add a driver-level interface for RISC-V SSE support.  The interface lets
> Linux clients register handlers and enable, disable and complete SSE
> events.  It will be used by PMU and GHES drivers.
>
> Co-developed-by: Himanshu Chauhan <hchauhan@ventanamicro.com>
> Signed-off-by: Himanshu Chauhan <hchauhan@ventanamicro.com>
> Signed-off-by: Clément Léger <cleger@rivosinc.com>
> Co-developed-by: Zhanpeng Zhang <zhangzhanpeng.jasper@bytedance.com>
> Signed-off-by: Zhanpeng Zhang <zhangzhanpeng.jasper@bytedance.com>
> Acked-by: Conor Dooley <conor.dooley@microchip.com>
> ---
>  MAINTAINERS                            |  16 +
>  drivers/firmware/Kconfig               |   1 +
>  drivers/firmware/Makefile              |   1 +
>  drivers/firmware/riscv/Kconfig         |  15 +
>  drivers/firmware/riscv/Makefile        |   3 +
>  drivers/firmware/riscv/riscv_sbi_sse.c | 694 +++++++++++++++++++++++++
>  include/linux/riscv_sbi_sse.h          |  57 ++
>  7 files changed, 787 insertions(+)
>  create mode 100644 drivers/firmware/riscv/Kconfig
>  create mode 100644 drivers/firmware/riscv/Makefile
>  create mode 100644 drivers/firmware/riscv/riscv_sbi_sse.c
>  create mode 100644 include/linux/riscv_sbi_sse.h
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 2fb1c75afd16..c247acc56d1c 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -22914,6 +22914,13 @@ T:     git git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux.git
>  F:     Documentation/devicetree/bindings/iommu/riscv,iommu.yaml
>  F:     drivers/iommu/riscv/
>
> +RISC-V FIRMWARE DRIVERS
> +M:     Conor Dooley <conor@kernel.org>
> +L:     linux-riscv@lists.infradead.org
> +S:     Maintained
> +T:     git git://git.kernel.org/pub/scm/linux/kernel/git/conor/linux.git
> +F:     drivers/firmware/riscv/
> +
>  RISC-V MICROCHIP SUPPORT
>  M:     Conor Dooley <conor.dooley@microchip.com>
>  M:     Daire McNamara <daire.mcnamara@microchip.com>
> @@ -22999,6 +23006,15 @@ F:     arch/riscv/boot/dts/spacemit/
>  N:     spacemit
>  K:     spacemit
>
> +RISC-V SSE DRIVER
> +M:     Zhanpeng Zhang <zhangzhanpeng.jasper@bytedance.com>
> +R:     Himanshu Chauhan <himanshu@thechauhan.dev>

If Clement is not interested in maintaining the SSE implementation,
please add me as the co-maintainer not just reviewer for SSE.

Regards
Himanshu

> +R:     Yunhui Cui <cuiyunhui@bytedance.com>
> +L:     linux-riscv@lists.infradead.org
> +S:     Maintained
> +F:     drivers/firmware/riscv/riscv_sbi_sse.c
> +F:     include/linux/riscv_sbi_sse.h
> +
>  RISC-V TENSTORRENT SoC SUPPORT
>  M:     Drew Fustini <dfustini@oss.tenstorrent.com>
>  M:     Joel Stanley <jms@oss.tenstorrent.com>
> diff --git a/drivers/firmware/Kconfig b/drivers/firmware/Kconfig
> index bbd2155d8483..1894df87b08e 100644
> --- a/drivers/firmware/Kconfig
> +++ b/drivers/firmware/Kconfig
> @@ -294,6 +294,7 @@ source "drivers/firmware/meson/Kconfig"
>  source "drivers/firmware/microchip/Kconfig"
>  source "drivers/firmware/psci/Kconfig"
>  source "drivers/firmware/qcom/Kconfig"
> +source "drivers/firmware/riscv/Kconfig"
>  source "drivers/firmware/samsung/Kconfig"
>  source "drivers/firmware/smccc/Kconfig"
>  source "drivers/firmware/tegra/Kconfig"
> diff --git a/drivers/firmware/Makefile b/drivers/firmware/Makefile
> index 4ddec2820c96..6cdd84570ea7 100644
> --- a/drivers/firmware/Makefile
> +++ b/drivers/firmware/Makefile
> @@ -34,6 +34,7 @@ obj-y                         += efi/
>  obj-y                          += imx/
>  obj-y                          += psci/
>  obj-y                          += qcom/
> +obj-y                          += riscv/
>  obj-y                          += samsung/
>  obj-y                          += smccc/
>  obj-y                          += tegra/
> diff --git a/drivers/firmware/riscv/Kconfig b/drivers/firmware/riscv/Kconfig
> new file mode 100644
> index 000000000000..ed5b663ac5f9
> --- /dev/null
> +++ b/drivers/firmware/riscv/Kconfig
> @@ -0,0 +1,15 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +menu "Risc-V Specific firmware drivers"
> +depends on RISCV
> +
> +config RISCV_SBI_SSE
> +       bool "Enable SBI Supervisor Software Events support"
> +       depends on RISCV_SBI
> +       default y
> +       help
> +         The Supervisor Software Events support allows the SBI to deliver
> +         NMI-like notifications to the supervisor mode software. When enabled,
> +         this option provides support to register callbacks on specific SSE
> +         events.
> +
> +endmenu
> diff --git a/drivers/firmware/riscv/Makefile b/drivers/firmware/riscv/Makefile
> new file mode 100644
> index 000000000000..c8795d4bbb2e
> --- /dev/null
> +++ b/drivers/firmware/riscv/Makefile
> @@ -0,0 +1,3 @@
> +# SPDX-License-Identifier: GPL-2.0
> +
> +obj-$(CONFIG_RISCV_SBI_SSE)            += riscv_sbi_sse.o
> diff --git a/drivers/firmware/riscv/riscv_sbi_sse.c b/drivers/firmware/riscv/riscv_sbi_sse.c
> new file mode 100644
> index 000000000000..830011ddd33f
> --- /dev/null
> +++ b/drivers/firmware/riscv/riscv_sbi_sse.c
> @@ -0,0 +1,694 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * Copyright (C) 2025 Rivos Inc.
> + */
> +
> +#define pr_fmt(fmt) "sse: " fmt
> +
> +#include <linux/cpu.h>
> +#include <linux/cpuhotplug.h>
> +#include <linux/cpu_pm.h>
> +#include <linux/hardirq.h>
> +#include <linux/list.h>
> +#include <linux/percpu-defs.h>
> +#include <linux/reboot.h>
> +#include <linux/riscv_sbi_sse.h>
> +#include <linux/slab.h>
> +
> +#include <asm/sbi.h>
> +#include <asm/sse.h>
> +
> +struct sse_event {
> +       struct list_head list;
> +       u32 evt_id;
> +       u32 priority;
> +       sse_event_handler_fn *handler;
> +       void *handler_arg;
> +       /* Only valid for global events */
> +       unsigned int cpu;
> +
> +       union {
> +               struct sse_registered_event *global;
> +               struct sse_registered_event __percpu *local;
> +       };
> +};
> +
> +static int sse_hp_state;
> +static bool sse_available __ro_after_init;
> +static DEFINE_SPINLOCK(events_list_lock);
> +static LIST_HEAD(events);
> +static DEFINE_MUTEX(sse_mutex);
> +
> +struct sse_registered_event {
> +       struct sse_event_arch_data arch;
> +       struct sse_event *event;
> +       unsigned long attr;
> +       bool is_enabled;
> +};
> +
> +void sse_handle_event(struct sse_event_arch_data *arch_event,
> +                     struct pt_regs *regs)
> +{
> +       int ret;
> +       struct sse_registered_event *reg_evt =
> +               container_of(arch_event, struct sse_registered_event, arch);
> +       struct sse_event *evt = reg_evt->event;
> +
> +       ret = evt->handler(evt->evt_id, evt->handler_arg, regs);
> +       if (ret)
> +               pr_warn("event %x handler failed with error %d\n", evt->evt_id, ret);
> +}
> +
> +static struct sse_event *sse_event_get(u32 evt)
> +{
> +       struct sse_event *event = NULL;
> +
> +       scoped_guard(spinlock, &events_list_lock) {
> +               list_for_each_entry(event, &events, list) {
> +                       if (event->evt_id == evt)
> +                               return event;
> +               }
> +       }
> +
> +       return NULL;
> +}
> +
> +static phys_addr_t sse_event_get_attr_phys(struct sse_registered_event *reg_evt)
> +{
> +       phys_addr_t phys;
> +       void *addr = &reg_evt->attr;
> +
> +       if (sse_event_is_global(reg_evt->event->evt_id))
> +               phys = virt_to_phys(addr);
> +       else
> +               phys = per_cpu_ptr_to_phys(addr);
> +
> +       return phys;
> +}
> +
> +static struct sse_registered_event *sse_get_reg_evt(struct sse_event *event)
> +{
> +       if (sse_event_is_global(event->evt_id))
> +               return event->global;
> +       else
> +               return per_cpu_ptr(event->local, smp_processor_id());
> +}
> +
> +static int sse_sbi_event_func(struct sse_event *event, unsigned long func)
> +{
> +       struct sbiret ret;
> +       u32 evt = event->evt_id;
> +       struct sse_registered_event *reg_evt = sse_get_reg_evt(event);
> +
> +       ret = sbi_ecall(SBI_EXT_SSE, func, evt, 0, 0, 0, 0, 0);
> +       if (ret.error) {
> +               pr_warn("Failed to execute func %lx, event %x, error %ld\n",
> +                       func, evt, ret.error);
> +               return sbi_err_map_linux_errno(ret.error);
> +       }
> +
> +       if (func == SBI_SSE_EVENT_DISABLE)
> +               reg_evt->is_enabled = false;
> +       else if (func == SBI_SSE_EVENT_ENABLE)
> +               reg_evt->is_enabled = true;
> +
> +       return 0;
> +}
> +
> +int sse_event_disable_local(struct sse_event *event)
> +{
> +       return sse_sbi_event_func(event, SBI_SSE_EVENT_DISABLE);
> +}
> +EXPORT_SYMBOL_GPL(sse_event_disable_local);
> +
> +int sse_event_enable_local(struct sse_event *event)
> +{
> +       return sse_sbi_event_func(event, SBI_SSE_EVENT_ENABLE);
> +}
> +EXPORT_SYMBOL_GPL(sse_event_enable_local);
> +
> +static int sse_event_attr_get_no_lock(struct sse_registered_event *reg_evt,
> +                                     unsigned long attr_id, unsigned long *val)
> +{
> +       struct sbiret sret;
> +       u32 evt = reg_evt->event->evt_id;
> +       unsigned long phys;
> +
> +       phys = sse_event_get_attr_phys(reg_evt);
> +
> +       sret = sbi_ecall(SBI_EXT_SSE, SBI_SSE_EVENT_ATTR_READ, evt, attr_id, 1,
> +                        phys, 0, 0);
> +       if (sret.error) {
> +               pr_debug("Failed to get event %x attr %lx, error %ld\n", evt,
> +                        attr_id, sret.error);
> +               return sbi_err_map_linux_errno(sret.error);
> +       }
> +
> +       *val = reg_evt->attr;
> +
> +       return 0;
> +}
> +
> +static int sse_event_attr_set_nolock(struct sse_registered_event *reg_evt,
> +                                    unsigned long attr_id, unsigned long val)
> +{
> +       struct sbiret sret;
> +       u32 evt = reg_evt->event->evt_id;
> +       unsigned long phys;
> +
> +       reg_evt->attr = val;
> +       phys = sse_event_get_attr_phys(reg_evt);
> +
> +       sret = sbi_ecall(SBI_EXT_SSE, SBI_SSE_EVENT_ATTR_WRITE, evt, attr_id, 1,
> +                        phys, 0, 0);
> +       if (sret.error)
> +               pr_debug("Failed to set event %x attr %lx, error %ld\n", evt,
> +                        attr_id, sret.error);
> +
> +       return sbi_err_map_linux_errno(sret.error);
> +}
> +
> +static void sse_global_event_update_cpu(struct sse_event *event,
> +                                       unsigned int cpu)
> +{
> +       struct sse_registered_event *reg_evt = event->global;
> +
> +       event->cpu = cpu;
> +       arch_sse_event_update_cpu(&reg_evt->arch, cpu);
> +}
> +
> +static int sse_event_set_target_cpu_nolock(struct sse_event *event,
> +                                          unsigned int cpu)
> +{
> +       unsigned long hart_id = cpuid_to_hartid_map(cpu);
> +       struct sse_registered_event *reg_evt = event->global;
> +       u32 evt = event->evt_id;
> +       bool was_enabled;
> +       int ret;
> +
> +       if (!sse_event_is_global(evt))
> +               return -EINVAL;
> +
> +       was_enabled = reg_evt->is_enabled;
> +       if (was_enabled)
> +               sse_event_disable_local(event);
> +
> +       ret = sse_event_attr_set_nolock(reg_evt, SBI_SSE_ATTR_PREFERRED_HART,
> +                                       hart_id);
> +       if (ret == 0)
> +               sse_global_event_update_cpu(event, cpu);
> +
> +       if (was_enabled)
> +               sse_event_enable_local(event);
> +
> +       return 0;
> +}
> +
> +int sse_event_set_target_cpu(struct sse_event *event, unsigned int cpu)
> +{
> +       int ret;
> +
> +       scoped_guard(mutex, &sse_mutex) {
> +               scoped_guard(cpus_read_lock) {
> +                       if (!cpu_online(cpu))
> +                               return -EINVAL;
> +
> +                       ret = sse_event_set_target_cpu_nolock(event, cpu);
> +               }
> +       }
> +
> +       return ret;
> +}
> +EXPORT_SYMBOL_GPL(sse_event_set_target_cpu);
> +
> +static int sse_event_init_registered(unsigned int cpu,
> +                                    struct sse_registered_event *reg_evt,
> +                                    struct sse_event *event)
> +{
> +       reg_evt->event = event;
> +
> +       return arch_sse_init_event(&reg_evt->arch, event->evt_id, cpu);
> +}
> +
> +static void sse_event_free_registered(struct sse_registered_event *reg_evt)
> +{
> +       arch_sse_free_event(&reg_evt->arch);
> +}
> +
> +static int sse_event_alloc_global(struct sse_event *event)
> +{
> +       int err;
> +       struct sse_registered_event *reg_evt;
> +
> +       reg_evt = kzalloc_obj(*reg_evt, GFP_KERNEL);
> +       if (!reg_evt)
> +               return -ENOMEM;
> +
> +       event->global = reg_evt;
> +       err = sse_event_init_registered(smp_processor_id(), reg_evt, event);
> +       if (err)
> +               kfree(reg_evt);
> +
> +       return err;
> +}
> +
> +static int sse_event_alloc_local(struct sse_event *event)
> +{
> +       int err;
> +       unsigned int cpu, err_cpu;
> +       struct sse_registered_event *reg_evt;
> +       struct sse_registered_event __percpu *reg_evts;
> +
> +       reg_evts = alloc_percpu(struct sse_registered_event);
> +       if (!reg_evts)
> +               return -ENOMEM;
> +
> +       event->local = reg_evts;
> +
> +       for_each_possible_cpu(cpu) {
> +               reg_evt = per_cpu_ptr(reg_evts, cpu);
> +               err = sse_event_init_registered(cpu, reg_evt, event);
> +               if (err) {
> +                       err_cpu = cpu;
> +                       goto err_free_per_cpu;
> +               }
> +       }
> +
> +       return 0;
> +
> +err_free_per_cpu:
> +       for_each_possible_cpu(cpu) {
> +               if (cpu == err_cpu)
> +                       break;
> +               reg_evt = per_cpu_ptr(reg_evts, cpu);
> +               sse_event_free_registered(reg_evt);
> +       }
> +
> +       free_percpu(reg_evts);
> +
> +       return err;
> +}
> +
> +static struct sse_event *sse_event_alloc(u32 evt, u32 priority,
> +                                        sse_event_handler_fn *handler,
> +                                        void *arg)
> +{
> +       int err;
> +       struct sse_event *event;
> +
> +       event = kzalloc_obj(*event, GFP_KERNEL);
> +       if (!event)
> +               return ERR_PTR(-ENOMEM);
> +
> +       event->evt_id = evt;
> +       event->priority = priority;
> +       event->handler_arg = arg;
> +       event->handler = handler;
> +
> +       if (sse_event_is_global(evt))
> +               err = sse_event_alloc_global(event);
> +       else
> +               err = sse_event_alloc_local(event);
> +
> +       if (err) {
> +               kfree(event);
> +               return ERR_PTR(err);
> +       }
> +
> +       return event;
> +}
> +
> +static int sse_sbi_register_event(struct sse_event *event,
> +                                 struct sse_registered_event *reg_evt)
> +{
> +       int ret;
> +
> +       ret = sse_event_attr_set_nolock(reg_evt, SBI_SSE_ATTR_PRIO,
> +                                       event->priority);
> +       if (ret)
> +               return ret;
> +
> +       return arch_sse_register_event(&reg_evt->arch);
> +}
> +
> +static int sse_event_register_local(struct sse_event *event)
> +{
> +       int ret;
> +       struct sse_registered_event *reg_evt;
> +
> +       reg_evt = per_cpu_ptr(event->local, smp_processor_id());
> +       ret = sse_sbi_register_event(event, reg_evt);
> +       if (ret)
> +               pr_debug("Failed to register event %x: err %d\n", event->evt_id,
> +                        ret);
> +
> +       return ret;
> +}
> +
> +static int sse_sbi_unregister_event(struct sse_event *event)
> +{
> +       return sse_sbi_event_func(event, SBI_SSE_EVENT_UNREGISTER);
> +}
> +
> +struct sse_per_cpu_evt {
> +       struct sse_event *event;
> +       unsigned long func;
> +       cpumask_t error;
> +};
> +
> +static void sse_event_per_cpu_func(void *info)
> +{
> +       int ret;
> +       struct sse_per_cpu_evt *cpu_evt = info;
> +
> +       if (cpu_evt->func == SBI_SSE_EVENT_REGISTER)
> +               ret = sse_event_register_local(cpu_evt->event);
> +       else
> +               ret = sse_sbi_event_func(cpu_evt->event, cpu_evt->func);
> +
> +       if (ret)
> +               cpumask_set_cpu(smp_processor_id(), &cpu_evt->error);
> +}
> +
> +static void sse_event_free(struct sse_event *event)
> +{
> +       unsigned int cpu;
> +       struct sse_registered_event *reg_evt;
> +
> +       if (sse_event_is_global(event->evt_id)) {
> +               sse_event_free_registered(event->global);
> +               kfree(event->global);
> +       } else {
> +               for_each_possible_cpu(cpu) {
> +                       reg_evt = per_cpu_ptr(event->local, cpu);
> +                       sse_event_free_registered(reg_evt);
> +               }
> +               free_percpu(event->local);
> +       }
> +
> +       kfree(event);
> +}
> +
> +static int sse_on_each_cpu(struct sse_event *event, unsigned long func,
> +                          unsigned long revert_func)
> +{
> +       struct sse_per_cpu_evt cpu_evt;
> +
> +       cpu_evt.event = event;
> +       cpumask_clear(&cpu_evt.error);
> +       cpu_evt.func = func;
> +       on_each_cpu(sse_event_per_cpu_func, &cpu_evt, 1);
> +       /*
> +        * If there are some error reported by CPUs, revert event state on the
> +        * other ones
> +        */
> +       if (!cpumask_empty(&cpu_evt.error)) {
> +               cpumask_t revert;
> +
> +               cpumask_andnot(&revert, cpu_online_mask, &cpu_evt.error);
> +               cpu_evt.func = revert_func;
> +               on_each_cpu_mask(&revert, sse_event_per_cpu_func, &cpu_evt, 1);
> +
> +               return -EIO;
> +       }
> +
> +       return 0;
> +}
> +
> +int sse_event_enable(struct sse_event *event)
> +{
> +       int ret = 0;
> +
> +       scoped_guard(mutex, &sse_mutex) {
> +               scoped_guard(cpus_read_lock) {
> +                       if (sse_event_is_global(event->evt_id)) {
> +                               ret = sse_event_enable_local(event);
> +                       } else {
> +                               ret = sse_on_each_cpu(event,
> +                                                     SBI_SSE_EVENT_ENABLE,
> +                                                     SBI_SSE_EVENT_DISABLE);
> +                       }
> +               }
> +       }
> +       return ret;
> +}
> +EXPORT_SYMBOL_GPL(sse_event_enable);
> +
> +static int sse_events_mask(void)
> +{
> +       struct sbiret ret;
> +
> +       ret = sbi_ecall(SBI_EXT_SSE, SBI_SSE_HART_MASK, 0, 0, 0, 0, 0, 0);
> +
> +       return sbi_err_map_linux_errno(ret.error);
> +}
> +
> +static int sse_events_unmask(void)
> +{
> +       struct sbiret ret;
> +
> +       ret = sbi_ecall(SBI_EXT_SSE, SBI_SSE_HART_UNMASK, 0, 0, 0, 0, 0, 0);
> +
> +       return sbi_err_map_linux_errno(ret.error);
> +}
> +
> +static void sse_event_disable_nolock(struct sse_event *event)
> +{
> +       struct sse_per_cpu_evt cpu_evt;
> +
> +       if (sse_event_is_global(event->evt_id)) {
> +               sse_event_disable_local(event);
> +       } else {
> +               cpu_evt.event = event;
> +               cpu_evt.func = SBI_SSE_EVENT_DISABLE;
> +               on_each_cpu(sse_event_per_cpu_func, &cpu_evt, 1);
> +       }
> +}
> +
> +void sse_event_disable(struct sse_event *event)
> +{
> +       scoped_guard(mutex, &sse_mutex) {
> +               scoped_guard(cpus_read_lock) {
> +                       sse_event_disable_nolock(event);
> +               }
> +       }
> +}
> +EXPORT_SYMBOL_GPL(sse_event_disable);
> +
> +struct sse_event *sse_event_register(u32 evt, u32 priority,
> +                                    sse_event_handler_fn *handler, void *arg)
> +{
> +       struct sse_event *event;
> +       int cpu;
> +       int ret = 0;
> +
> +       if (!sse_available)
> +               return ERR_PTR(-EOPNOTSUPP);
> +
> +       guard(mutex)(&sse_mutex);
> +       if (sse_event_get(evt))
> +               return ERR_PTR(-EEXIST);
> +
> +       event = sse_event_alloc(evt, priority, handler, arg);
> +       if (IS_ERR(event))
> +               return event;
> +
> +       scoped_guard(cpus_read_lock) {
> +               if (sse_event_is_global(evt)) {
> +                       unsigned long preferred_hart;
> +
> +                       ret = sse_event_attr_get_no_lock(event->global,
> +                                                        SBI_SSE_ATTR_PREFERRED_HART,
> +                                                        &preferred_hart);
> +                       if (ret)
> +                               goto err_event_free;
> +
> +                       cpu = riscv_hartid_to_cpuid(preferred_hart);
> +                       sse_global_event_update_cpu(event, cpu);
> +
> +                       ret = sse_sbi_register_event(event, event->global);
> +                       if (ret)
> +                               goto err_event_free;
> +
> +               } else {
> +                       ret = sse_on_each_cpu(event, SBI_SSE_EVENT_REGISTER,
> +                                             SBI_SSE_EVENT_DISABLE);
> +                       if (ret)
> +                               goto err_event_free;
> +               }
> +       }
> +
> +       scoped_guard(spinlock, &events_list_lock)
> +               list_add(&event->list, &events);
> +
> +       return event;
> +
> +err_event_free:
> +       sse_event_free(event);
> +
> +       return ERR_PTR(ret);
> +}
> +EXPORT_SYMBOL_GPL(sse_event_register);
> +
> +static void sse_event_unregister_nolock(struct sse_event *event)
> +{
> +       struct sse_per_cpu_evt cpu_evt;
> +
> +       if (sse_event_is_global(event->evt_id)) {
> +               sse_sbi_unregister_event(event);
> +       } else {
> +               cpu_evt.event = event;
> +               cpu_evt.func = SBI_SSE_EVENT_UNREGISTER;
> +               on_each_cpu(sse_event_per_cpu_func, &cpu_evt, 1);
> +       }
> +}
> +
> +void sse_event_unregister(struct sse_event *event)
> +{
> +       scoped_guard(mutex, &sse_mutex) {
> +               scoped_guard(cpus_read_lock)
> +                       sse_event_unregister_nolock(event);
> +
> +               scoped_guard(spinlock, &events_list_lock)
> +                       list_del(&event->list);
> +
> +               sse_event_free(event);
> +       }
> +}
> +EXPORT_SYMBOL_GPL(sse_event_unregister);
> +
> +static int sse_cpu_online(unsigned int cpu)
> +{
> +       struct sse_event *event;
> +
> +       scoped_guard(spinlock, &events_list_lock) {
> +               list_for_each_entry(event, &events, list) {
> +                       if (sse_event_is_global(event->evt_id))
> +                               continue;
> +
> +                       sse_event_register_local(event);
> +                       if (sse_get_reg_evt(event))
> +                               sse_event_enable_local(event);
> +               }
> +       }
> +
> +       /* Ready to handle events. Unmask SSE. */
> +       return sse_events_unmask();
> +}
> +
> +static int sse_cpu_teardown(unsigned int cpu)
> +{
> +       int ret = 0;
> +       unsigned int next_cpu;
> +       struct sse_event *event;
> +       struct sse_registered_event *reg_evt;
> +
> +       /* Mask the sse events */
> +       ret = sse_events_mask();
> +       if (ret)
> +               return ret;
> +
> +       scoped_guard(spinlock, &events_list_lock) {
> +               list_for_each_entry(event, &events, list) {
> +                       /* Disable local event on current cpu */
> +                       if (!sse_event_is_global(event->evt_id)) {
> +                               reg_evt = sse_get_reg_evt(event);
> +                               if (reg_evt->is_enabled)
> +                                       sse_event_disable_local(event);
> +
> +                               sse_sbi_unregister_event(event);
> +                               continue;
> +                       }
> +
> +                       if (event->cpu != smp_processor_id())
> +                               continue;
> +
> +                       /* Update destination hart for global event */
> +                       next_cpu = cpumask_any_but(cpu_online_mask, cpu);
> +                       ret = sse_event_set_target_cpu_nolock(event, next_cpu);
> +               }
> +       }
> +
> +       return ret;
> +}
> +
> +static int sse_pm_notifier(struct notifier_block *nb, unsigned long action,
> +                          void *data)
> +{
> +       WARN_ON_ONCE(preemptible());
> +
> +       switch (action) {
> +       case CPU_PM_ENTER:
> +               sse_events_mask();
> +               break;
> +       case CPU_PM_EXIT:
> +       case CPU_PM_ENTER_FAILED:
> +               sse_events_unmask();
> +               break;
> +       default:
> +               return NOTIFY_DONE;
> +       }
> +
> +       return NOTIFY_OK;
> +}
> +
> +static struct notifier_block sse_pm_nb = {
> +       .notifier_call = sse_pm_notifier,
> +};
> +
> +/*
> + * Mask all CPUs and unregister all events on panic, reboot or kexec.
> + */
> +static int sse_reboot_notifier(struct notifier_block *nb, unsigned long action,
> +                              void *data)
> +{
> +       cpuhp_remove_state(sse_hp_state);
> +
> +       return NOTIFY_OK;
> +}
> +
> +static struct notifier_block sse_reboot_nb = {
> +       .notifier_call = sse_reboot_notifier,
> +};
> +
> +static int __init sse_init(void)
> +{
> +       int ret;
> +
> +       if (sbi_probe_extension(SBI_EXT_SSE) <= 0) {
> +               pr_info("Missing SBI SSE extension\n");
> +               return -EOPNOTSUPP;
> +       }
> +       pr_info("SBI SSE extension detected\n");
> +
> +       ret = cpu_pm_register_notifier(&sse_pm_nb);
> +       if (ret) {
> +               pr_warn("Failed to register CPU PM notifier...\n");
> +               return ret;
> +       }
> +
> +       ret = register_reboot_notifier(&sse_reboot_nb);
> +       if (ret) {
> +               pr_warn("Failed to register reboot notifier...\n");
> +               goto remove_cpupm;
> +       }
> +
> +       ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "riscv/sse:online",
> +                               sse_cpu_online, sse_cpu_teardown);
> +       if (ret < 0)
> +               goto remove_reboot;
> +
> +       sse_hp_state = ret;
> +       sse_available = true;
> +
> +       return 0;
> +
> +remove_reboot:
> +       unregister_reboot_notifier(&sse_reboot_nb);
> +
> +remove_cpupm:
> +       cpu_pm_unregister_notifier(&sse_pm_nb);
> +
> +       return ret;
> +}
> +arch_initcall(sse_init);
> diff --git a/include/linux/riscv_sbi_sse.h b/include/linux/riscv_sbi_sse.h
> new file mode 100644
> index 000000000000..84165cb5f2d0
> --- /dev/null
> +++ b/include/linux/riscv_sbi_sse.h
> @@ -0,0 +1,57 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> +/*
> + * Copyright (C) 2025 Rivos Inc.
> + */
> +
> +#ifndef __LINUX_RISCV_SBI_SSE_H
> +#define __LINUX_RISCV_SBI_SSE_H
> +
> +#include <linux/types.h>
> +#include <linux/linkage.h>
> +
> +struct sse_event;
> +struct pt_regs;
> +
> +typedef int (sse_event_handler_fn)(u32 event_num, void *arg,
> +                                  struct pt_regs *regs);
> +
> +#ifdef CONFIG_RISCV_SBI_SSE
> +
> +struct sse_event *sse_event_register(u32 event_num, u32 priority,
> +                                    sse_event_handler_fn *handler, void *arg);
> +
> +void sse_event_unregister(struct sse_event *evt);
> +
> +int sse_event_set_target_cpu(struct sse_event *sse_evt, unsigned int cpu);
> +
> +int sse_event_enable(struct sse_event *sse_evt);
> +
> +void sse_event_disable(struct sse_event *sse_evt);
> +
> +int sse_event_enable_local(struct sse_event *sse_evt);
> +int sse_event_disable_local(struct sse_event *sse_evt);
> +
> +#else
> +static inline struct sse_event *sse_event_register(u32 event_num, u32 priority,
> +                                                  sse_event_handler_fn *handler,
> +                                                  void *arg)
> +{
> +       return ERR_PTR(-EOPNOTSUPP);
> +}
> +
> +static inline void sse_event_unregister(struct sse_event *evt) {}
> +
> +static inline int sse_event_set_target_cpu(struct sse_event *sse_evt,
> +                                          unsigned int cpu)
> +{
> +       return -EOPNOTSUPP;
> +}
> +
> +static inline int sse_event_enable(struct sse_event *sse_evt)
> +{
> +       return -EOPNOTSUPP;
> +}
> +
> +static inline void sse_event_disable(struct sse_event *sse_evt) {}
> +#endif
> +#endif /* __LINUX_RISCV_SBI_SSE_H */
> --
> 2.50.1 (Apple Git-155)
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v9 4/5] perf: RISC-V: add support for SSE event
  2026-05-09 13:09 [PATCH v9 0/5] riscv: add SBI Supervisor Software Events support Zhanpeng Zhang
                   ` (2 preceding siblings ...)
  2026-05-09 13:10 ` [PATCH v9 3/5] drivers: firmware: add riscv SSE support Zhanpeng Zhang
@ 2026-05-09 13:10 ` Zhanpeng Zhang
  2026-05-09 15:16   ` sashiko-bot
  2026-05-09 13:10 ` [PATCH v9 5/5] selftests/riscv: add SSE test module Zhanpeng Zhang
  4 siblings, 1 reply; 11+ messages in thread
From: Zhanpeng Zhang @ 2026-05-09 13:10 UTC (permalink / raw)
  To: linux-riscv
  Cc: linux-kernel, linux-perf-users, linux-kselftest, kvm, kvm-riscv,
	Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
	Conor Dooley, Clément Léger, Himanshu Chauhan,
	Will Deacon, Mark Rutland, Atish Patra, Anup Patel, Shuah Khan,
	Yunhui Cui, Zhu Yuan, Zhanpeng Zhang

From: Clément Léger <cleger@rivosinc.com>

In order to use SSE within PMU drivers, register an SSE handler for the
local PMU event. Reuse the existing overflow IRQ handler and pass
appropriate pt_regs. Add a config option RISCV_PMU_SBI_SSE to select event
delivery via SSE events.

When the SSE path is used, also honor the return value from
perf_event_overflow(). If perf core throttles or disables an event, do not
immediately restart the overflowed counters from the SSE handler.

Signed-off-by: Clément Léger <cleger@rivosinc.com>
Co-developed-by: Zhanpeng Zhang <zhangzhanpeng.jasper@bytedance.com>
Signed-off-by: Zhanpeng Zhang <zhangzhanpeng.jasper@bytedance.com>
---
 drivers/perf/Kconfig           | 10 +++++
 drivers/perf/riscv_pmu.c       | 23 +++++++++++
 drivers/perf/riscv_pmu_sbi.c   | 78 ++++++++++++++++++++++++++++++++-----
 include/linux/perf/riscv_pmu.h |  5 +++
 4 files changed, 104 insertions(+), 12 deletions(-)

diff --git a/drivers/perf/Kconfig b/drivers/perf/Kconfig
index ab90932fc2d0..b6c58475091c 100644
--- a/drivers/perf/Kconfig
+++ b/drivers/perf/Kconfig
@@ -105,6 +105,16 @@ config RISCV_PMU_SBI
 	  full perf feature support i.e. counter overflow, privilege mode
 	  filtering, counter configuration.
 
+config RISCV_PMU_SBI_SSE
+	depends on RISCV_PMU && RISCV_SBI_SSE
+	bool "RISC-V PMU SSE events"
+	default n
+	help
+	  Say y if you want to use SSE events to deliver PMU interrupts. This
+	  provides a way to profile the kernel at any level by using NMI-like
+	  SSE events. Since SSE events can be intrusive, this option allows
+	  selecting them only when needed.
+
 config STARFIVE_STARLINK_PMU
 	depends on ARCH_STARFIVE || COMPILE_TEST
 	depends on 64BIT
diff --git a/drivers/perf/riscv_pmu.c b/drivers/perf/riscv_pmu.c
index 8e3cd0f35336..a48e286d9394 100644
--- a/drivers/perf/riscv_pmu.c
+++ b/drivers/perf/riscv_pmu.c
@@ -13,6 +13,7 @@
 #include <linux/irqdesc.h>
 #include <linux/perf/riscv_pmu.h>
 #include <linux/printk.h>
+#include <linux/riscv_sbi_sse.h>
 #include <linux/smp.h>
 #include <linux/sched_clock.h>
 
@@ -254,6 +255,24 @@ void riscv_pmu_start(struct perf_event *event, int flags)
 	perf_event_update_userpage(event);
 }
 
+#ifdef CONFIG_RISCV_PMU_SBI_SSE
+static void riscv_pmu_disable(struct pmu *pmu)
+{
+	struct riscv_pmu *rvpmu = to_riscv_pmu(pmu);
+
+	if (rvpmu->sse_evt)
+		sse_event_disable_local(rvpmu->sse_evt);
+}
+
+static void riscv_pmu_enable(struct pmu *pmu)
+{
+	struct riscv_pmu *rvpmu = to_riscv_pmu(pmu);
+
+	if (rvpmu->sse_evt)
+		sse_event_enable_local(rvpmu->sse_evt);
+}
+#endif
+
 static int riscv_pmu_add(struct perf_event *event, int flags)
 {
 	struct riscv_pmu *rvpmu = to_riscv_pmu(event->pmu);
@@ -411,6 +430,10 @@ struct riscv_pmu *riscv_pmu_alloc(void)
 		.event_mapped	= riscv_pmu_event_mapped,
 		.event_unmapped	= riscv_pmu_event_unmapped,
 		.event_idx	= riscv_pmu_event_idx,
+#ifdef CONFIG_RISCV_PMU_SBI_SSE
+		.pmu_enable	= riscv_pmu_enable,
+		.pmu_disable	= riscv_pmu_disable,
+#endif
 		.add		= riscv_pmu_add,
 		.del		= riscv_pmu_del,
 		.start		= riscv_pmu_start,
diff --git a/drivers/perf/riscv_pmu_sbi.c b/drivers/perf/riscv_pmu_sbi.c
index 385af5e6e6d0..ac10ebd73c7c 100644
--- a/drivers/perf/riscv_pmu_sbi.c
+++ b/drivers/perf/riscv_pmu_sbi.c
@@ -17,6 +17,7 @@
 #include <linux/irqdomain.h>
 #include <linux/of_irq.h>
 #include <linux/of.h>
+#include <linux/riscv_sbi_sse.h>
 #include <linux/cpu_pm.h>
 #include <linux/sched/clock.h>
 #include <linux/soc/andes/irq.h>
@@ -1038,10 +1039,10 @@ static void pmu_sbi_start_overflow_mask(struct riscv_pmu *pmu,
 		pmu_sbi_start_ovf_ctrs_sbi(cpu_hw_evt, ctr_ovf_mask);
 }
 
-static irqreturn_t pmu_sbi_ovf_handler(int irq, void *dev)
+static irqreturn_t pmu_sbi_ovf_handler(struct cpu_hw_events *cpu_hw_evt,
+				       struct pt_regs *regs, bool from_sse)
 {
 	struct perf_sample_data data;
-	struct pt_regs *regs;
 	struct hw_perf_event *hw_evt;
 	union sbi_pmu_ctr_info *info;
 	int lidx, hidx, fidx;
@@ -1049,7 +1050,7 @@ static irqreturn_t pmu_sbi_ovf_handler(int irq, void *dev)
 	struct perf_event *event;
+	int ev_overflow = 0;
 	u64 overflow;
 	u64 overflowed_ctrs = 0;
-	struct cpu_hw_events *cpu_hw_evt = dev;
 	u64 start_clock = sched_clock();
 	struct riscv_pmu_snapshot_data *sdata = cpu_hw_evt->snapshot_addr;
 
@@ -1059,13 +1059,15 @@ static irqreturn_t pmu_sbi_ovf_handler(int irq, void *dev)
 	/* Firmware counter don't support overflow yet */
 	fidx = find_first_bit(cpu_hw_evt->used_hw_ctrs, RISCV_MAX_COUNTERS);
 	if (fidx == RISCV_MAX_COUNTERS) {
-		csr_clear(CSR_SIP, BIT(riscv_pmu_irq_num));
+		if (!from_sse)
+			csr_clear(CSR_SIP, BIT(riscv_pmu_irq_num));
 		return IRQ_NONE;
 	}
 
 	event = cpu_hw_evt->events[fidx];
 	if (!event) {
-		ALT_SBI_PMU_OVF_CLEAR_PENDING(riscv_pmu_irq_mask);
+		if (!from_sse)
+			ALT_SBI_PMU_OVF_CLEAR_PENDING(riscv_pmu_irq_mask);
 		return IRQ_NONE;
 	}
 
@@ -1080,16 +1082,16 @@ static irqreturn_t pmu_sbi_ovf_handler(int irq, void *dev)
 
 	/*
 	 * Overflow interrupt pending bit should only be cleared after stopping
-	 * all the counters to avoid any race condition.
+	 * all the counters to avoid any race condition. When using SSE,
+	 * interrupt is cleared when stopping counters.
 	 */
-	ALT_SBI_PMU_OVF_CLEAR_PENDING(riscv_pmu_irq_mask);
+	if (!from_sse)
+		ALT_SBI_PMU_OVF_CLEAR_PENDING(riscv_pmu_irq_mask);
 
 	/* No overflow bit is set */
 	if (!overflow)
 		return IRQ_NONE;
 
-	regs = get_irq_regs();
-
 	for_each_set_bit(lidx, cpu_hw_evt->used_hw_ctrs, RISCV_MAX_COUNTERS) {
 		struct perf_event *event = cpu_hw_evt->events[lidx];
 
@@ -1133,18 +1136,65 @@ static irqreturn_t pmu_sbi_ovf_handler(int irq, void *dev)
 			 * TODO: We will need to stop the guest counters once
 			 * virtualization support is added.
 			 */
-			perf_event_overflow(event, &data, regs);
+			ev_overflow |= perf_event_overflow(event, &data, regs);
 		}
 		/* Reset the state as we are going to start the counter after the loop */
 		hw_evt->state = 0;
 	}
 
-	pmu_sbi_start_overflow_mask(pmu, overflowed_ctrs);
+	if (!ev_overflow || !from_sse)
+		pmu_sbi_start_overflow_mask(pmu, overflowed_ctrs);
+
 	perf_sample_event_took(sched_clock() - start_clock);
 
 	return IRQ_HANDLED;
 }
 
+static irqreturn_t pmu_sbi_ovf_irq_handler(int irq, void *dev)
+{
+	return pmu_sbi_ovf_handler(dev, get_irq_regs(), false);
+}
+
+#ifdef CONFIG_RISCV_PMU_SBI_SSE
+static int pmu_sbi_ovf_sse_handler(u32 evt, void *arg, struct pt_regs *regs)
+{
+	struct cpu_hw_events __percpu *hw_events = arg;
+	struct cpu_hw_events *hw_event = raw_cpu_ptr(hw_events);
+
+	pmu_sbi_ovf_handler(hw_event, regs, true);
+
+	return 0;
+}
+
+static int pmu_sbi_setup_sse(struct riscv_pmu *pmu)
+{
+	int ret;
+	struct sse_event *evt;
+	struct cpu_hw_events __percpu *hw_events = pmu->hw_events;
+
+	evt = sse_event_register(SBI_SSE_EVENT_LOCAL_PMU_OVERFLOW, 0,
+				 pmu_sbi_ovf_sse_handler, hw_events);
+	if (IS_ERR(evt))
+		return PTR_ERR(evt);
+
+	ret = sse_event_enable(evt);
+	if (ret) {
+		sse_event_unregister(evt);
+		return ret;
+	}
+
+	pr_info("using SSE for PMU event delivery\n");
+	pmu->sse_evt = evt;
+
+	return ret;
+}
+#else
+static int pmu_sbi_setup_sse(struct riscv_pmu *pmu)
+{
+	return -EOPNOTSUPP;
+}
+#endif
+
 static int pmu_sbi_starting_cpu(unsigned int cpu, struct hlist_node *node)
 {
 	struct riscv_pmu *pmu = hlist_entry_safe(node, struct riscv_pmu, node);
@@ -1195,6 +1242,10 @@ static int pmu_sbi_setup_irqs(struct riscv_pmu *pmu, struct platform_device *pde
 	struct cpu_hw_events __percpu *hw_events = pmu->hw_events;
 	struct irq_domain *domain = NULL;
 
+	ret = pmu_sbi_setup_sse(pmu);
+	if (!ret)
+		return 0;
+
 	if (riscv_isa_extension_available(NULL, SSCOFPMF)) {
 		riscv_pmu_irq_num = RV_IRQ_PMU;
 		riscv_pmu_use_irq = true;
@@ -1229,7 +1280,7 @@ static int pmu_sbi_setup_irqs(struct riscv_pmu *pmu, struct platform_device *pde
 		return -ENODEV;
 	}
 
-	ret = request_percpu_irq(riscv_pmu_irq, pmu_sbi_ovf_handler, "riscv-pmu", hw_events);
+	ret = request_percpu_irq(riscv_pmu_irq, pmu_sbi_ovf_irq_handler, "riscv-pmu", hw_events);
 	if (ret) {
 		pr_err("registering percpu irq failed [%d]\n", ret);
 		return ret;
diff --git a/include/linux/perf/riscv_pmu.h b/include/linux/perf/riscv_pmu.h
index f82a28040594..08fdcf6baf4e 100644
--- a/include/linux/perf/riscv_pmu.h
+++ b/include/linux/perf/riscv_pmu.h
@@ -28,6 +28,8 @@
 
 #define RISCV_PMU_CONFIG1_GUEST_EVENTS 0x1
 
+struct sse_event;
+
 struct cpu_hw_events {
 	/* currently enabled events */
 	int			n_events;
@@ -54,6 +56,9 @@ struct riscv_pmu {
 	char		*name;
 
 	irqreturn_t	(*handle_irq)(int irq_num, void *dev);
+#ifdef CONFIG_RISCV_PMU_SBI_SSE
+	struct sse_event *sse_evt;
+#endif
 
 	unsigned long	cmask;
 	u64		(*ctr_read)(struct perf_event *event);
-- 
2.50.1 (Apple Git-155)

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v9 4/5] perf: RISC-V: add support for SSE event
  2026-05-09 13:10 ` [PATCH v9 4/5] perf: RISC-V: add support for SSE event Zhanpeng Zhang
@ 2026-05-09 15:16   ` sashiko-bot
  0 siblings, 0 replies; 11+ messages in thread
From: sashiko-bot @ 2026-05-09 15:16 UTC (permalink / raw)
  To: Zhanpeng Zhang; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:
- [High] Skipping `pmu_sbi_start_overflow_mask()` when an event is throttled leaves all active PMU counters permanently stopped in hardware.
- [High] Missing unregistration of the SSE event on PMU teardown or probe failure causes a memory leak and use-after-free.
- [Medium] Executing synchronous SBI ecalls in `pmu_enable` and `pmu_disable` introduces severe context switch latency.
- [Medium] The `RISCV_PMU_SBI_SSE` config lacks a required dependency on `RISCV_PMU_SBI`, violating Kconfig consistency.
--

commit 81e6d410a510416892cbd91339e3c8f19af5b1b2
Author: Clément Léger <cleger@rivosinc.com>

perf: RISC-V: add support for SSE event

This commit adds support for delivering local PMU interrupts via SSE
events. It registers an SSE handler for the local PMU event, honors
throttle returns from perf core, and adds a new RISCV_PMU_SBI_SSE
Kconfig option.

> diff --git a/drivers/perf/Kconfig b/drivers/perf/Kconfig
> index ab90932fc2d01..a92846bcffaca 100644
> --- a/drivers/perf/Kconfig
> +++ b/drivers/perf/Kconfig
> @@ -105,6 +105,16 @@ config RISCV_PMU_SBI
>  	  full perf feature support i.e. counter overflow, privilege mode
>  	  filtering, counter configuration.
>  
> +config RISCV_PMU_SBI_SSE
> +	depends on RISCV_PMU && RISCV_SBI_SSE

Should this depend on RISCV_PMU_SBI instead of RISCV_PMU? 

The implementation for pmu_sbi_setup_sse() is located in riscv_pmu_sbi.c,
which is only compiled if RISCV_PMU_SBI is enabled. If a user enables
RISCV_PMU and RISCV_SBI_SSE but leaves RISCV_PMU_SBI disabled, it seems
like this feature would be selectable but its functions would do nothing.

> +	bool "RISC-V PMU SSE events"

[ ... ]

> diff --git a/drivers/perf/riscv_pmu.c b/drivers/perf/riscv_pmu.c
> index 8e3cd0f35336f..a48e286d9394f 100644
> --- a/drivers/perf/riscv_pmu.c
> +++ b/drivers/perf/riscv_pmu.c
> @@ -254,6 +255,24 @@ void riscv_pmu_start(struct perf_event *event, int flags)
>  	perf_event_update_userpage(event);
>  }
>  
> +#ifdef CONFIG_RISCV_PMU_SBI_SSE
> +static void riscv_pmu_disable(struct pmu *pmu)
> +{
> +	struct riscv_pmu *rvpmu = to_riscv_pmu(pmu);
> +
> +	if (rvpmu->sse_evt)
> +		sse_event_disable_local(rvpmu->sse_evt);
> +}

Does mapping pmu_disable and pmu_enable to sse_event_disable_local and
sse_event_enable_local introduce significant context switch latency?

Since the perf core frequently invokes these callbacks during task
context switches, performing a synchronous SBI ecall to trap into the
machine-mode firmware on every switch might introduce a massive
performance regression.

> +
> +static void riscv_pmu_enable(struct pmu *pmu)
> +{
> +	struct riscv_pmu *rvpmu = to_riscv_pmu(pmu);
> +
> +	if (rvpmu->sse_evt)
> +		sse_event_enable_local(rvpmu->sse_evt);
> +}
> +#endif

[ ... ]

> diff --git a/drivers/perf/riscv_pmu_sbi.c b/drivers/perf/riscv_pmu_sbi.c
> index 385af5e6e6d0d..e6f5a619a2c08 100644
> --- a/drivers/perf/riscv_pmu_sbi.c
> +++ b/drivers/perf/riscv_pmu_sbi.c
> @@ -1133,18 +1136,65 @@ static irqreturn_t pmu_sbi_ovf_handler(int irq, void *dev)
>  			 * TODO: We will need to stop the guest counters once
>  			 * virtualization support is added.
>  			 */
> -			perf_event_overflow(event, &data, regs);
> +			ev_overflow |= perf_event_overflow(event, &data, regs);
>  		}
>  		/* Reset the state as we are going to start the counter after the loop */
>  		hw_evt->state = 0;
>  	}
>  
> -	pmu_sbi_start_overflow_mask(pmu, overflowed_ctrs);
> +	if (!ev_overflow || !from_sse)
> +		pmu_sbi_start_overflow_mask(pmu, overflowed_ctrs);

When perf_event_overflow() throttles an event and returns a non-zero value,
ev_overflow is set to 1. For SSE events, won't this cause the condition to
evaluate to false and completely skip calling pmu_sbi_start_overflow_mask()?

Since pmu_sbi_stop_hw_ctrs() is called at the beginning of the handler,
it seems like skipping this would leave all active hardware counters
permanently stopped in hardware, with the perf core unaware because the
event state was just reset inside the loop.

> +
>  	perf_sample_event_took(sched_clock() - start_clock);
>  
>  	return IRQ_HANDLED;
>  }

[ ... ]

> +#ifdef CONFIG_RISCV_PMU_SBI_SSE
> +static int pmu_sbi_ovf_sse_handler(u32 evt, void *arg, struct pt_regs *regs)
> +{
> +	struct cpu_hw_events __percpu *hw_events = arg;
> +	struct cpu_hw_events *hw_event = raw_cpu_ptr(hw_events);
> +
> +	pmu_sbi_ovf_handler(hw_event, regs, true);
> +
> +	return 0;
> +}
> +
> +static int pmu_sbi_setup_sse(struct riscv_pmu *pmu)
> +{
> +	int ret;
> +	struct sse_event *evt;
> +	struct cpu_hw_events __percpu *hw_events = pmu->hw_events;
> +
> +	evt = sse_event_register(SBI_SSE_EVENT_LOCAL_PMU_OVERFLOW, 0,
> +				 pmu_sbi_ovf_sse_handler, hw_events);

Is there a corresponding sse_event_unregister() call missing in the
PMU teardown or error cleanup paths?

If the PMU device probe fails after registration, or if the driver is
subsequently unbound, won't the registered SSE event remain active in the
firmware? A subsequent hardware overflow might trigger the handler and
dereference the freed hw_events pointer, causing a use-after-free.

> +	if (IS_ERR(evt))
> +		return PTR_ERR(evt);

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/cover.1778331862.git.zhangzhanpeng.jasper@bytedance.com?part=4

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v9 5/5] selftests/riscv: add SSE test module
  2026-05-09 13:09 [PATCH v9 0/5] riscv: add SBI Supervisor Software Events support Zhanpeng Zhang
                   ` (3 preceding siblings ...)
  2026-05-09 13:10 ` [PATCH v9 4/5] perf: RISC-V: add support for SSE event Zhanpeng Zhang
@ 2026-05-09 13:10 ` Zhanpeng Zhang
  2026-05-09 15:57   ` sashiko-bot
  4 siblings, 1 reply; 11+ messages in thread
From: Zhanpeng Zhang @ 2026-05-09 13:10 UTC (permalink / raw)
  To: linux-riscv
  Cc: linux-kernel, linux-perf-users, linux-kselftest, kvm, kvm-riscv,
	Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
	Conor Dooley, Clément Léger, Himanshu Chauhan,
	Will Deacon, Mark Rutland, Atish Patra, Anup Patel, Shuah Khan,
	Yunhui Cui, Zhu Yuan, Zhanpeng Zhang

From: Clément Léger <cleger@rivosinc.com>

This module, once loaded, will execute a series of tests using the SSE
framework. The provided script will check for any error reported by the
test module.

Add stress={0,1,2} modes so the same module can cover repeated handler
entry, SBI calls from a handler, PMU activity and re-injection.  Also make
the tests tolerate events that are already registered, and keep the
attribute read buffer lifetime local to the SBI call.

Signed-off-by: Clément Léger <cleger@rivosinc.com>
Co-developed-by: Zhanpeng Zhang <zhangzhanpeng.jasper@bytedance.com>
Signed-off-by: Zhanpeng Zhang <zhangzhanpeng.jasper@bytedance.com>
---
 MAINTAINERS                                   |   1 +
 tools/testing/selftests/riscv/Makefile        |   2 +-
 tools/testing/selftests/riscv/sse/Makefile    |   5 +
 .../selftests/riscv/sse/module/Makefile       |  16 +
 .../riscv/sse/module/riscv_sse_test.c         | 769 ++++++++++++++++++
 .../selftests/riscv/sse/run_sse_test.sh       |  48 ++
 6 files changed, 840 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/riscv/sse/Makefile
 create mode 100644 tools/testing/selftests/riscv/sse/module/Makefile
 create mode 100644 tools/testing/selftests/riscv/sse/module/riscv_sse_test.c
 create mode 100644 tools/testing/selftests/riscv/sse/run_sse_test.sh

diff --git a/MAINTAINERS b/MAINTAINERS
index c247acc56d1c..e4bb9b880c26 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -22902,6 +22902,7 @@ C:	irc://irc.libera.chat/riscv
 P:	Documentation/arch/riscv/patch-acceptance.rst
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux.git
 F:	arch/riscv/
+F:	tools/testing/selftests/riscv/
 N:	riscv
 K:	riscv
 
diff --git a/tools/testing/selftests/riscv/Makefile b/tools/testing/selftests/riscv/Makefile
index 5671b4405a12..43c7c8f97676 100644
--- a/tools/testing/selftests/riscv/Makefile
+++ b/tools/testing/selftests/riscv/Makefile
@@ -5,7 +5,7 @@
 ARCH ?= $(shell uname -m 2>/dev/null || echo not)
 
 ifneq (,$(filter $(ARCH),riscv))
-RISCV_SUBTARGETS ?= abi hwprobe mm sigreturn vector cfi
+RISCV_SUBTARGETS ?= abi hwprobe mm sigreturn vector cfi sse
 else
 RISCV_SUBTARGETS :=
 endif
diff --git a/tools/testing/selftests/riscv/sse/Makefile b/tools/testing/selftests/riscv/sse/Makefile
new file mode 100644
index 000000000000..67eaee06f213
--- /dev/null
+++ b/tools/testing/selftests/riscv/sse/Makefile
@@ -0,0 +1,5 @@
+TEST_GEN_MODS_DIR := module
+
+TEST_FILES := run_sse_test.sh
+
+include ../../lib.mk
diff --git a/tools/testing/selftests/riscv/sse/module/Makefile b/tools/testing/selftests/riscv/sse/module/Makefile
new file mode 100644
index 000000000000..02018f083456
--- /dev/null
+++ b/tools/testing/selftests/riscv/sse/module/Makefile
@@ -0,0 +1,16 @@
+TESTMODS_DIR := $(realpath $(dir $(abspath $(lastword $(MAKEFILE_LIST)))))
+KDIR ?= /lib/modules/$(shell uname -r)/build
+
+obj-m += riscv_sse_test.o
+
+# Ensure that KDIR exists, otherwise skip the compilation
+modules:
+ifneq ("$(wildcard $(KDIR))", "")
+	$(Q)$(MAKE) -C $(KDIR) modules KBUILD_EXTMOD=$(TESTMODS_DIR)
+endif
+
+# Ensure that KDIR exists, otherwise skip the clean target
+clean:
+ifneq ("$(wildcard $(KDIR))", "")
+	$(Q)$(MAKE) -C $(KDIR) clean KBUILD_EXTMOD=$(TESTMODS_DIR)
+endif
diff --git a/tools/testing/selftests/riscv/sse/module/riscv_sse_test.c b/tools/testing/selftests/riscv/sse/module/riscv_sse_test.c
new file mode 100644
index 000000000000..c0efdfc5467c
--- /dev/null
+++ b/tools/testing/selftests/riscv/sse/module/riscv_sse_test.c
@@ -0,0 +1,769 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (C) 2025 Rivos Inc.
+ */
+
+#define pr_fmt(fmt) "riscv_sse_test: " fmt
+
+#include <linux/array_size.h>
+#include <linux/atomic.h>
+#include <linux/completion.h>
+#include <linux/delay.h>
+#include <linux/err.h>
+#include <linux/hrtimer.h>
+#include <linux/io.h>
+#include <linux/jiffies.h>
+#include <linux/kthread.h>
+#include <linux/module.h>
+#include <linux/riscv_sbi_sse.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/smp.h>
+
+#include <asm/sbi.h>
+#include <asm/sse.h>
+
+#define RUN_LOOP_COUNT		1000
+#define SSE_FAILED_PREFIX	"FAILED: "
+#define STRESS_DURATION_MS	3000
+#define STRESS_INJECT_NS	10000
+#define STRESS_REINJECT_DEPTH	10
+#define sse_err(...)		pr_err(SSE_FAILED_PREFIX __VA_ARGS__)
+
+enum sse_stress_mode {
+	SSE_STRESS_OFF,
+	SSE_STRESS_AFTER_SMOKE,
+	SSE_STRESS_ONLY,
+};
+
+static int stress;
+module_param(stress, int, 0444);
+MODULE_PARM_DESC(stress, "Stress mode: 0=off, 1=after smoke, 2=stress only");
+
+struct sse_event_desc {
+	u32 evt_id;
+	const char *name;
+	bool can_inject;
+};
+
+static struct sse_event_desc sse_event_descs[] = {
+	{
+		.evt_id = SBI_SSE_EVENT_LOCAL_HIGH_PRIO_RAS,
+		.name = "local_high_prio_ras",
+	},
+	{
+		.evt_id = SBI_SSE_EVENT_LOCAL_DOUBLE_TRAP,
+		.name = "local_double_trap",
+	},
+	{
+		.evt_id = SBI_SSE_EVENT_GLOBAL_HIGH_PRIO_RAS,
+		.name = "global_high_prio_ras",
+	},
+	{
+		.evt_id = SBI_SSE_EVENT_LOCAL_PMU_OVERFLOW,
+		.name = "local_pmu_overflow",
+	},
+	{
+		.evt_id = SBI_SSE_EVENT_LOCAL_LOW_PRIO_RAS,
+		.name = "local_low_prio_ras",
+	},
+	{
+		.evt_id = SBI_SSE_EVENT_GLOBAL_LOW_PRIO_RAS,
+		.name = "global_low_prio_ras",
+	},
+	{
+		.evt_id = SBI_SSE_EVENT_LOCAL_SOFTWARE_INJECTED,
+		.name = "local_software_injected",
+	},
+	{
+		.evt_id = SBI_SSE_EVENT_GLOBAL_SOFTWARE_INJECTED,
+		.name = "global_software_injected",
+	}
+};
+
+static struct sse_event_desc *sse_get_evt_desc(u32 evt)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(sse_event_descs); i++) {
+		if (sse_event_descs[i].evt_id == evt)
+			return &sse_event_descs[i];
+	}
+
+	return NULL;
+}
+
+static const char *sse_evt_name(u32 evt)
+{
+	struct sse_event_desc *desc = sse_get_evt_desc(evt);
+
+	return desc ? desc->name : NULL;
+}
+
+static bool sse_test_can_inject_event(u32 evt)
+{
+	struct sse_event_desc *desc = sse_get_evt_desc(evt);
+
+	return desc ? desc->can_inject : false;
+}
+
+static struct sbiret sbi_sse_ecall(int fid, unsigned long arg0, unsigned long arg1)
+{
+	return sbi_ecall(SBI_EXT_SSE, fid, arg0, arg1, 0, 0, 0, 0);
+}
+
+static int sse_event_attr_get(u32 evt, unsigned long attr_id,
+			      unsigned long *val)
+{
+	struct sbiret sret;
+	unsigned long *attr_buf, phys;
+
+	attr_buf = kmalloc_obj(*attr_buf, GFP_KERNEL);
+	if (!attr_buf)
+		return -ENOMEM;
+
+	phys = virt_to_phys(attr_buf);
+
+	sret = sbi_ecall(SBI_EXT_SSE, SBI_SSE_EVENT_ATTR_READ, evt, attr_id, 1,
+			 phys, 0, 0);
+	if (sret.error) {
+		kfree(attr_buf);
+		return sbi_err_map_linux_errno(sret.error);
+	}
+
+	*val = *attr_buf;
+	kfree(attr_buf);
+
+	return 0;
+}
+
+static int sse_test_signal(u32 evt, unsigned int cpu)
+{
+	unsigned int hart_id = cpuid_to_hartid_map(cpu);
+	struct sbiret ret;
+
+	ret = sbi_sse_ecall(SBI_SSE_EVENT_INJECT, evt, hart_id);
+	if (ret.error) {
+		sse_err("Failed to signal event %x, error %ld\n", evt, ret.error);
+		return sbi_err_map_linux_errno(ret.error);
+	}
+
+	return 0;
+}
+
+static int sse_test_inject_event(struct sse_event *event, u32 evt, unsigned int cpu)
+{
+	int res;
+	unsigned long status;
+
+	if (sse_event_is_global(evt)) {
+		/*
+		 * Due to the fact the completion might happen faster than
+		 * the call to SBI_SSE_COMPLETE in the handler, if the event was
+		 * running on another CPU, we need to wait for the event status
+		 * to be !RUNNING.
+		 */
+		do {
+			res = sse_event_attr_get(evt, SBI_SSE_ATTR_STATUS, &status);
+			if (res) {
+				sse_err("Failed to get status for evt %x, error %d\n", evt, res);
+				return res;
+			}
+			status = status & SBI_SSE_ATTR_STATUS_STATE_MASK;
+		} while (status == SBI_SSE_STATE_RUNNING);
+
+		res = sse_event_set_target_cpu(event, cpu);
+		if (res) {
+			sse_err("Failed to set cpu for evt %x, error %d\n", evt, res);
+			return res;
+		}
+	}
+
+	return sse_test_signal(evt, cpu);
+}
+
+struct fast_test_arg {
+	u32 evt;
+	int cpu;
+	bool completion;
+};
+
+static int sse_test_handler(u32 evt, void *arg, struct pt_regs *regs)
+{
+	int ret = 0;
+	struct fast_test_arg *targ = arg;
+	u32 test_evt = READ_ONCE(targ->evt);
+	int cpu = READ_ONCE(targ->cpu);
+
+	if (evt != test_evt) {
+		sse_err("Received SSE event id %x instead of %x\n", test_evt, evt);
+		ret = -EINVAL;
+	}
+
+	if (cpu != smp_processor_id()) {
+		sse_err("Received SSE event %d on CPU %d instead of %d\n", evt, smp_processor_id(),
+			cpu);
+		ret = -EINVAL;
+	}
+
+	WRITE_ONCE(targ->completion, true);
+
+	return ret;
+}
+
+static void sse_run_fast_test(struct fast_test_arg *test_arg, struct sse_event *event, u32 evt)
+{
+	unsigned long timeout;
+	int ret, cpu;
+
+	for_each_online_cpu(cpu) {
+		WRITE_ONCE(test_arg->completion, false);
+		WRITE_ONCE(test_arg->cpu, cpu);
+		/* Test arg is used on another CPU */
+		smp_wmb();
+
+		ret = sse_test_inject_event(event, evt, cpu);
+		if (ret) {
+			sse_err("event %s injection failed, err %d\n", sse_evt_name(evt), ret);
+			return;
+		}
+
+		timeout = jiffies + HZ / 100;
+		/* We can not use <linux/completion.h> since they are not NMI safe */
+		while (!READ_ONCE(test_arg->completion) &&
+		       time_before(jiffies, timeout)) {
+			cpu_relax();
+		}
+		if (!time_before(jiffies, timeout)) {
+			sse_err("Failed to wait for event %s completion on CPU %d\n",
+				sse_evt_name(evt), cpu);
+			return;
+		}
+	}
+}
+
+static void sse_test_injection_fast(void)
+{
+	int i, ret = 0, j;
+	u32 evt;
+	struct fast_test_arg test_arg;
+	struct sse_event *event;
+
+	pr_info("Starting SSE test (fast)\n");
+
+	for (i = 0; i < ARRAY_SIZE(sse_event_descs); i++) {
+		evt = sse_event_descs[i].evt_id;
+		WRITE_ONCE(test_arg.evt, evt);
+
+		if (!sse_event_descs[i].can_inject)
+			continue;
+
+		event = sse_event_register(evt, 0, sse_test_handler,
+					   (void *)&test_arg);
+		if (IS_ERR(event)) {
+			if (PTR_ERR(event) == -EEXIST) {
+				pr_info("Event %s already registered, skipping\n",
+					sse_evt_name(evt));
+				continue;
+			}
+			sse_err("Failed to register event %s, err %ld\n", sse_evt_name(evt),
+				PTR_ERR(event));
+			continue;
+		}
+
+		ret = sse_event_enable(event);
+		if (ret) {
+			sse_err("Failed to enable event %s, err %d\n", sse_evt_name(evt), ret);
+			goto err_unregister;
+		}
+
+		pr_info("Starting testing event %s\n", sse_evt_name(evt));
+
+		for (j = 0; j < RUN_LOOP_COUNT; j++)
+			sse_run_fast_test(&test_arg, event, evt);
+
+		pr_info("Finished testing event %s\n", sse_evt_name(evt));
+
+		sse_event_disable(event);
+err_unregister:
+		sse_event_unregister(event);
+	}
+	pr_info("Finished SSE test (fast)\n");
+}
+
+struct priority_test_arg {
+	unsigned long evt;
+	struct sse_event *event;
+	bool called;
+	u32 prio;
+	struct priority_test_arg *next_evt_arg;
+	void (*check_func)(struct priority_test_arg *arg);
+};
+
+static int sse_hi_priority_test_handler(u32 evt, void *arg,
+					struct pt_regs *regs)
+{
+	struct priority_test_arg *targ = arg;
+	struct priority_test_arg *next = READ_ONCE(targ->next_evt_arg);
+
+	WRITE_ONCE(targ->called, 1);
+
+	if (next) {
+		sse_test_signal(next->evt, smp_processor_id());
+		if (!READ_ONCE(next->called)) {
+			sse_err("Higher priority event %s was not handled %s\n",
+				sse_evt_name(next->evt), sse_evt_name(evt));
+		}
+	}
+
+	return 0;
+}
+
+static int sse_low_priority_test_handler(u32 evt, void *arg, struct pt_regs *regs)
+{
+	struct priority_test_arg *targ = arg;
+	struct priority_test_arg *next = READ_ONCE(targ->next_evt_arg);
+
+	WRITE_ONCE(targ->called, 1);
+
+	if (next) {
+		sse_test_signal(next->evt, smp_processor_id());
+		if (READ_ONCE(next->called)) {
+			sse_err("Lower priority event %s was handle before %s\n",
+				sse_evt_name(next->evt), sse_evt_name(evt));
+		}
+	}
+
+	return 0;
+}
+
+static void sse_test_injection_priority_arg(struct priority_test_arg *args, unsigned int args_size,
+					    sse_event_handler_fn handler, const char *test_name)
+{
+	unsigned int i;
+	int ret;
+	struct sse_event *event;
+	struct priority_test_arg *arg, *first_arg = NULL, *prev_arg = NULL;
+
+	pr_info("Starting SSE priority test (%s)\n", test_name);
+	for (i = 0; i < args_size; i++) {
+		arg = &args[i];
+
+		if (!sse_test_can_inject_event(arg->evt))
+			continue;
+
+		WRITE_ONCE(arg->called, false);
+		WRITE_ONCE(arg->next_evt_arg, NULL);
+		WRITE_ONCE(arg->event, NULL);
+
+		event = sse_event_register(arg->evt, arg->prio, handler, (void *)arg);
+		if (IS_ERR(event)) {
+			if (PTR_ERR(event) == -EEXIST) {
+				pr_info("Event %s already registered, skipping\n",
+					sse_evt_name(arg->evt));
+				continue;
+			}
+			sse_err("Failed to register event %s, err %ld\n", sse_evt_name(arg->evt),
+				PTR_ERR(event));
+			goto release_events;
+		}
+		arg->event = event;
+
+		if (sse_event_is_global(arg->evt)) {
+			/* Target event at current CPU */
+			ret = sse_event_set_target_cpu(event, smp_processor_id());
+			if (ret) {
+				sse_err("Failed to set event %s target CPU, err %d\n",
+					sse_evt_name(arg->evt), ret);
+				goto release_events;
+			}
+		}
+
+		ret = sse_event_enable(event);
+		if (ret) {
+			sse_err("Failed to enable event %s, err %d\n", sse_evt_name(arg->evt), ret);
+			goto release_events;
+		}
+
+		if (prev_arg)
+			WRITE_ONCE(prev_arg->next_evt_arg, arg);
+
+		prev_arg = arg;
+
+		if (!first_arg)
+			first_arg = arg;
+	}
+
+	if (!first_arg) {
+		pr_info("No injectable event available for %s priority test\n",
+			test_name);
+		return;
+	}
+
+	/* Inject first event, handler should trigger the others in chain. */
+	ret = sse_test_inject_event(first_arg->event, first_arg->evt, smp_processor_id());
+	if (ret) {
+		sse_err("SSE event %s injection failed\n", sse_evt_name(first_arg->evt));
+		goto release_events;
+	}
+
+	/*
+	 * Event are injected directly on the current CPU after calling sse_test_inject_event()
+	 * so that execution is preempted right away, no need to wait for timeout.
+	 */
+	arg = first_arg;
+	while (arg) {
+		if (!READ_ONCE(arg->called)) {
+			sse_err("Event %s handler was not called\n",
+				sse_evt_name(arg->evt));
+			ret = -EINVAL;
+		}
+
+		event = arg->event;
+		arg = READ_ONCE(arg->next_evt_arg);
+	}
+
+release_events:
+
+	arg = first_arg;
+	while (arg) {
+		event = arg->event;
+		if (!event)
+			break;
+
+		sse_event_disable(event);
+		sse_event_unregister(event);
+		arg = READ_ONCE(arg->next_evt_arg);
+	}
+
+	pr_info("Finished SSE priority test (%s)\n", test_name);
+}
+
+static void sse_test_injection_priority(void)
+{
+	struct priority_test_arg default_hi_prio_args[] = {
+		{ .evt = SBI_SSE_EVENT_GLOBAL_SOFTWARE_INJECTED },
+		{ .evt = SBI_SSE_EVENT_LOCAL_SOFTWARE_INJECTED },
+		{ .evt = SBI_SSE_EVENT_GLOBAL_LOW_PRIO_RAS },
+		{ .evt = SBI_SSE_EVENT_LOCAL_LOW_PRIO_RAS },
+		{ .evt = SBI_SSE_EVENT_LOCAL_PMU_OVERFLOW },
+		{ .evt = SBI_SSE_EVENT_GLOBAL_HIGH_PRIO_RAS },
+		{ .evt = SBI_SSE_EVENT_LOCAL_DOUBLE_TRAP },
+		{ .evt = SBI_SSE_EVENT_LOCAL_HIGH_PRIO_RAS },
+	};
+
+	struct priority_test_arg default_low_prio_args[] = {
+		{ .evt = SBI_SSE_EVENT_LOCAL_HIGH_PRIO_RAS },
+		{ .evt = SBI_SSE_EVENT_LOCAL_DOUBLE_TRAP },
+		{ .evt = SBI_SSE_EVENT_GLOBAL_HIGH_PRIO_RAS },
+		{ .evt = SBI_SSE_EVENT_LOCAL_PMU_OVERFLOW },
+		{ .evt = SBI_SSE_EVENT_LOCAL_LOW_PRIO_RAS },
+		{ .evt = SBI_SSE_EVENT_GLOBAL_LOW_PRIO_RAS },
+		{ .evt = SBI_SSE_EVENT_LOCAL_SOFTWARE_INJECTED },
+		{ .evt = SBI_SSE_EVENT_GLOBAL_SOFTWARE_INJECTED },
+
+	};
+	struct priority_test_arg set_prio_args[] = {
+		{ .evt = SBI_SSE_EVENT_GLOBAL_SOFTWARE_INJECTED, .prio = 5 },
+		{ .evt = SBI_SSE_EVENT_LOCAL_SOFTWARE_INJECTED, .prio = 10 },
+		{ .evt = SBI_SSE_EVENT_GLOBAL_LOW_PRIO_RAS, .prio = 15 },
+		{ .evt = SBI_SSE_EVENT_LOCAL_LOW_PRIO_RAS, .prio = 20 },
+		{ .evt = SBI_SSE_EVENT_LOCAL_PMU_OVERFLOW, .prio = 25 },
+		{ .evt = SBI_SSE_EVENT_GLOBAL_HIGH_PRIO_RAS, .prio = 30 },
+		{ .evt = SBI_SSE_EVENT_LOCAL_DOUBLE_TRAP, .prio = 35 },
+		{ .evt = SBI_SSE_EVENT_LOCAL_HIGH_PRIO_RAS, .prio = 40 },
+	};
+
+	struct priority_test_arg same_prio_args[] = {
+		{ .evt = SBI_SSE_EVENT_LOCAL_PMU_OVERFLOW, .prio = 0 },
+		{ .evt = SBI_SSE_EVENT_LOCAL_HIGH_PRIO_RAS, .prio = 10 },
+		{ .evt = SBI_SSE_EVENT_LOCAL_SOFTWARE_INJECTED, .prio = 10 },
+		{ .evt = SBI_SSE_EVENT_GLOBAL_SOFTWARE_INJECTED, .prio = 10 },
+		{ .evt = SBI_SSE_EVENT_GLOBAL_HIGH_PRIO_RAS, .prio = 20 },
+	};
+
+	sse_test_injection_priority_arg(default_hi_prio_args, ARRAY_SIZE(default_hi_prio_args),
+					sse_hi_priority_test_handler, "high");
+
+	sse_test_injection_priority_arg(default_low_prio_args, ARRAY_SIZE(default_low_prio_args),
+					sse_low_priority_test_handler, "low");
+
+	sse_test_injection_priority_arg(set_prio_args, ARRAY_SIZE(set_prio_args),
+					sse_low_priority_test_handler, "set");
+
+	sse_test_injection_priority_arg(same_prio_args, ARRAY_SIZE(same_prio_args),
+					sse_low_priority_test_handler, "same_prio_args");
+}
+
+static bool sse_get_inject_status(u32 evt)
+{
+	int ret;
+	unsigned long val;
+
+	/* Check if injection is supported */
+	ret = sse_event_attr_get(evt, SBI_SSE_ATTR_STATUS, &val);
+	if (ret)
+		return false;
+
+	return !!(val & BIT(SBI_SSE_ATTR_STATUS_INJECT_OFFSET));
+}
+
+static void sse_init_events(void)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(sse_event_descs); i++) {
+		struct sse_event_desc *desc = &sse_event_descs[i];
+
+		desc->can_inject = sse_get_inject_status(desc->evt_id);
+		if (!desc->can_inject)
+			pr_info("Can not inject event %s, tests using this event will be skipped\n",
+				desc->name);
+	}
+}
+
+struct stress_test_ctx {
+	struct sse_event *event;
+	struct hrtimer timer;
+	struct hrtimer stop_timer;
+	struct task_struct *monitor_task;
+	wait_queue_head_t wait_q;
+	atomic_t inject_count;
+	atomic_t handler_count;
+	u32 evt_id;
+	int layer;
+	bool running;
+	bool test_done;
+};
+
+static struct stress_test_ctx stress_ctx;
+static DEFINE_PER_CPU(int, stress_reinject_cpu_depth);
+
+static int stress_handler_empty(u32 evt, void *arg, struct pt_regs *regs)
+{
+	struct stress_test_ctx *ctx = arg;
+
+	atomic_inc(&ctx->handler_count);
+
+	return 0;
+}
+
+static int stress_handler_ecall(u32 evt, void *arg, struct pt_regs *regs)
+{
+	struct stress_test_ctx *ctx = arg;
+
+	sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_NUM_COUNTERS, 0, 0, 0, 0, 0, 0);
+	atomic_inc(&ctx->handler_count);
+
+	return 0;
+}
+
+static int stress_handler_pmu(u32 evt, void *arg, struct pt_regs *regs)
+{
+	struct stress_test_ctx *ctx = arg;
+
+	sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_STOP, 3, 1, 0, 0, 0, 0);
+	sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_START, 3, 1, 0, 0, 0, 0);
+	atomic_inc(&ctx->handler_count);
+
+	return 0;
+}
+
+static int stress_handler_reinject(u32 evt, void *arg, struct pt_regs *regs)
+{
+	struct stress_test_ctx *ctx = arg;
+	int *depth = this_cpu_ptr(&stress_reinject_cpu_depth);
+
+	(*depth)++;
+	if (*depth < STRESS_REINJECT_DEPTH)
+		sse_test_signal(evt, smp_processor_id());
+	else
+		*depth = 0;
+
+	atomic_inc(&ctx->handler_count);
+
+	return 0;
+}
+
+static sse_event_handler_fn *stress_handlers[] = {
+	stress_handler_empty,
+	stress_handler_ecall,
+	stress_handler_pmu,
+	stress_handler_reinject,
+};
+
+static const char * const stress_layer_names[] = {
+	"empty handler",
+	"SBI ecall in handler",
+	"PMU stop/start in handler",
+	"self re-inject",
+};
+
+static enum hrtimer_restart stress_timer_callback(struct hrtimer *timer)
+{
+	struct stress_test_ctx *ctx = container_of(timer, struct stress_test_ctx, timer);
+
+	if (!READ_ONCE(ctx->running))
+		return HRTIMER_NORESTART;
+
+	sse_test_signal(ctx->evt_id, smp_processor_id());
+	atomic_inc(&ctx->inject_count);
+	hrtimer_forward_now(timer, ns_to_ktime(STRESS_INJECT_NS));
+
+	return HRTIMER_RESTART;
+}
+
+static enum hrtimer_restart stress_stop_timer_callback(struct hrtimer *timer)
+{
+	struct stress_test_ctx *ctx;
+
+	ctx = container_of(timer, struct stress_test_ctx, stop_timer);
+	WRITE_ONCE(ctx->test_done, true);
+	wake_up(&ctx->wait_q);
+
+	return HRTIMER_NORESTART;
+}
+
+static int stress_monitor_thread(void *data)
+{
+	struct stress_test_ctx *ctx = data;
+	unsigned long last_inject = 0, last_handler = 0;
+
+	while (!kthread_should_stop() && READ_ONCE(ctx->running)) {
+		unsigned long inject = atomic_read(&ctx->inject_count);
+		unsigned long handler = atomic_read(&ctx->handler_count);
+
+		pr_info("stress layer %d: inject=%lu (+%lu), handler=%lu (+%lu)\n",
+			ctx->layer, inject, inject - last_inject,
+			handler, handler - last_handler);
+
+		last_inject = inject;
+		last_handler = handler;
+
+		schedule_timeout_interruptible(HZ);
+	}
+
+	return 0;
+}
+
+static int sse_stress_test_layer(int layer)
+{
+	struct sse_event *event;
+	int ret;
+
+	if (layer < 0 || layer >= ARRAY_SIZE(stress_handlers))
+		return -EINVAL;
+
+	pr_info("Starting SSE stress layer %d (%s)\n",
+		layer, stress_layer_names[layer]);
+
+	memset(&stress_ctx, 0, sizeof(stress_ctx));
+	stress_ctx.evt_id = SBI_SSE_EVENT_LOCAL_SOFTWARE_INJECTED;
+	stress_ctx.layer = layer;
+	WRITE_ONCE(stress_ctx.running, true);
+	atomic_set(&stress_ctx.inject_count, 0);
+	atomic_set(&stress_ctx.handler_count, 0);
+	init_waitqueue_head(&stress_ctx.wait_q);
+
+	event = sse_event_register(stress_ctx.evt_id, 0,
+				   stress_handlers[layer], &stress_ctx);
+	if (IS_ERR(event)) {
+		sse_err("Failed to register stress event, err %ld\n",
+			PTR_ERR(event));
+		return PTR_ERR(event);
+	}
+
+	stress_ctx.event = event;
+
+	ret = sse_event_enable(event);
+	if (ret) {
+		sse_err("Failed to enable stress event, err %d\n", ret);
+		goto err_unregister;
+	}
+
+	stress_ctx.monitor_task = kthread_run(stress_monitor_thread,
+					      &stress_ctx, "sse_stress_mon");
+	if (IS_ERR(stress_ctx.monitor_task)) {
+		ret = PTR_ERR(stress_ctx.monitor_task);
+		sse_err("Failed to create stress monitor thread, err %d\n", ret);
+		goto err_disable;
+	}
+
+	hrtimer_setup(&stress_ctx.timer, stress_timer_callback,
+		      CLOCK_MONOTONIC, HRTIMER_MODE_PINNED);
+	hrtimer_start(&stress_ctx.timer, ns_to_ktime(STRESS_INJECT_NS),
+		      HRTIMER_MODE_REL_PINNED);
+
+	hrtimer_setup(&stress_ctx.stop_timer, stress_stop_timer_callback,
+		      CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+	hrtimer_start(&stress_ctx.stop_timer, ms_to_ktime(STRESS_DURATION_MS),
+		      HRTIMER_MODE_REL);
+
+	wait_event(stress_ctx.wait_q, READ_ONCE(stress_ctx.test_done));
+
+	WRITE_ONCE(stress_ctx.running, false);
+	hrtimer_cancel(&stress_ctx.timer);
+	hrtimer_cancel(&stress_ctx.stop_timer);
+	kthread_stop(stress_ctx.monitor_task);
+
+	pr_info("Finished SSE stress layer %d (%s): inject=%d, handler=%d\n",
+		layer, stress_layer_names[layer],
+		atomic_read(&stress_ctx.inject_count),
+		atomic_read(&stress_ctx.handler_count));
+
+err_disable:
+	sse_event_disable(event);
+err_unregister:
+	sse_event_unregister(event);
+
+	return ret;
+}
+
+static void sse_stress_test_all_layers(void)
+{
+	int i, ret;
+
+	pr_info("Starting SSE stress tests: duration=%d ms, interval=%d ns\n",
+		STRESS_DURATION_MS, STRESS_INJECT_NS);
+
+	for (i = 0; i < ARRAY_SIZE(stress_handlers); i++) {
+		ret = sse_stress_test_layer(i);
+		if (ret)
+			sse_err("Stress layer %d failed, err %d\n", i, ret);
+
+		msleep(100);
+	}
+
+	pr_info("Finished SSE stress tests\n");
+}
+
+static int __init sse_test_init(void)
+{
+	if (stress < SSE_STRESS_OFF || stress > SSE_STRESS_ONLY) {
+		sse_err("Invalid stress mode %d\n", stress);
+		return -EINVAL;
+	}
+
+	sse_init_events();
+
+	if (stress != SSE_STRESS_ONLY) {
+		sse_test_injection_fast();
+		sse_test_injection_priority();
+	}
+
+	if (stress != SSE_STRESS_OFF)
+		sse_stress_test_all_layers();
+
+	return 0;
+}
+
+static void __exit sse_test_exit(void)
+{
+}
+
+module_init(sse_test_init);
+module_exit(sse_test_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Clément Léger <cleger@rivosinc.com>");
+MODULE_DESCRIPTION("Test module for SSE");
diff --git a/tools/testing/selftests/riscv/sse/run_sse_test.sh b/tools/testing/selftests/riscv/sse/run_sse_test.sh
new file mode 100644
index 000000000000..c8eb1ce51682
--- /dev/null
+++ b/tools/testing/selftests/riscv/sse/run_sse_test.sh
@@ -0,0 +1,48 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright (C) 2025 Rivos Inc.
+
+MODULE_NAME=riscv_sse_test
+DRIVER="./module/${MODULE_NAME}.ko"
+
+check_test_failed_prefix() {
+	if dmesg | grep -q "${MODULE_NAME}: FAILED:";then
+		echo "${MODULE_NAME} failed, please check dmesg"
+		exit 1
+	fi
+}
+
+# Kselftest framework requirement - SKIP code is 4.
+ksft_skip=4
+
+check_test_requirements()
+{
+	uid=$(id -u)
+	if [ $uid -ne 0 ]; then
+		echo "$0: Must be run as root"
+		exit $ksft_skip
+	fi
+
+	if ! which insmod > /dev/null 2>&1; then
+		echo "$0: You need insmod installed"
+		exit $ksft_skip
+	fi
+
+	if [ ! -f $DRIVER ]; then
+		echo "$0: You need to compile ${MODULE_NAME} module"
+		exit $ksft_skip
+	fi
+}
+
+check_test_requirements
+
+if ! insmod $DRIVER "$@" > /dev/null 2>&1; then
+	echo "${MODULE_NAME}: failed to load, please check dmesg"
+	exit 1
+fi
+
+rmmod $MODULE_NAME
+check_test_failed_prefix
+
+exit 0
-- 
2.50.1 (Apple Git-155)

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v9 5/5] selftests/riscv: add SSE test module
  2026-05-09 13:10 ` [PATCH v9 5/5] selftests/riscv: add SSE test module Zhanpeng Zhang
@ 2026-05-09 15:57   ` sashiko-bot
  0 siblings, 0 replies; 11+ messages in thread
From: sashiko-bot @ 2026-05-09 15:57 UTC (permalink / raw)
  To: Zhanpeng Zhang; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 7 potential issue(s) to consider:
- [High] Event resource leak in error path leads to Use-After-Free of stack memory.
- [High] Use-After-Free of `task_struct` due to unsynchronized `kthread_stop()`.
- [High] Physical address truncation on RV32 leads to hypervisor memory corruption.
- [Medium] The executable test script `run_sse_test.sh` is incorrectly assigned to `TEST_FILES` instead of `TEST_PROGS`.
- [Medium] Time-Of-Check to Time-Of-Use (TOCTOU) race in timeout evaluation causes false test failures.
- [Medium] Using `smp_processor_id()` in a preemptible context triggers kernel warnings and migration races.
- [Medium] Infinite busy-wait loop without a timeout risks test soft-lockups.
--

commit 044a5142115aaf579bd893af3ea7d69bca6b7dfa
Author: Clément Léger <cleger@rivosinc.com>

selftests/riscv: add SSE test module

This module, once loaded, will execute a series of tests using the SSE
framework. The provided script will check for any error reported by the
test module.

> diff --git a/tools/testing/selftests/riscv/sse/Makefile b/tools/testing/selftests/riscv/sse/Makefile
> new file mode 100644
> --- /dev/null
> +++ b/tools/testing/selftests/riscv/sse/Makefile
> @@ -0,0 +1,5 @@
> +TEST_GEN_MODS_DIR := module
> +
> +TEST_FILES := run_sse_test.sh

Should run_sse_test.sh be in TEST_PROGS instead of TEST_FILES?

According to the selftests documentation, executable test scripts intended
to be run directly should be placed in TEST_PROGS, whereas TEST_FILES is
for auxiliary supporting files. Placing it in TEST_FILES might cause it to
be skipped during automated execution.

> diff --git a/tools/testing/selftests/riscv/sse/module/riscv_sse_test.c b/tools/testing/selftests/riscv/sse/module/riscv_sse_test.c
> new file mode 100644
> --- /dev/null
> +++ b/tools/testing/selftests/riscv/sse/module/riscv_sse_test.c

[ ... ]

> +static int sse_event_attr_get(u32 evt, unsigned long attr_id,
> +			      unsigned long *val)
> +{
> +	struct sbiret sret;
> +	unsigned long *attr_buf, phys;
> +
> +	attr_buf = kmalloc_obj(*attr_buf, GFP_KERNEL);
> +	if (!attr_buf)
> +		return -ENOMEM;
> +
> +	phys = virt_to_phys(attr_buf);

Will this truncate the physical address on RV32 systems with high memory?

virt_to_phys() returns a phys_addr_t, which can be 64-bit on RV32 (Sv32).
Since phys is an unsigned long, the upper 32 bits might be silently
truncated.

> +
> +	sret = sbi_ecall(SBI_EXT_SSE, SBI_SSE_EVENT_ATTR_READ, evt, attr_id, 1,
> +			 phys, 0, 0);

If the buffer resides above the 4GB boundary, could this corrupted physical
address cause the SBI firmware to write the attribute value to the wrong
memory location?

[ ... ]

> +static int sse_test_inject_event(struct sse_event *event, u32 evt, unsigned int cpu)
> +{
> +	int res;
> +	unsigned long status;
> +
> +	if (sse_event_is_global(evt)) {
> +		/*
> +		 * Due to the fact the completion might happen faster than
> +		 * the call to SBI_SSE_COMPLETE in the handler, if the event was
> +		 * running on another CPU, we need to wait for the event status
> +		 * to be !RUNNING.
> +		 */
> +		do {
> +			res = sse_event_attr_get(evt, SBI_SSE_ATTR_STATUS, &status);
> +			if (res) {
> +				sse_err("Failed to get status for evt %x, error %d\n", evt, res);
> +				return res;
> +			}
> +			status = status & SBI_SSE_ATTR_STATUS_STATE_MASK;
> +		} while (status == SBI_SSE_STATE_RUNNING);

Could this loop spin indefinitely if the SBI firmware encounters a bug or
takes an excessive amount of time?

Adding a timeout mechanism might be safer, especially since
sse_event_attr_get() allocates memory with GFP_KERNEL on each iteration.

[ ... ]

> +static void sse_run_fast_test(struct fast_test_arg *test_arg, struct sse_event *event, u32 evt)
> +{
> +	unsigned long timeout;
> +	int ret, cpu;
> +
> +	for_each_online_cpu(cpu) {
> +		WRITE_ONCE(test_arg->completion, false);
> +		WRITE_ONCE(test_arg->cpu, cpu);
> +		/* Test arg is used on another CPU */
> +		smp_wmb();
> +
> +		ret = sse_test_inject_event(event, evt, cpu);
> +		if (ret) {
> +			sse_err("event %s injection failed, err %d\n", sse_evt_name(evt), ret);
> +			return;
> +		}
> +
> +		timeout = jiffies + HZ / 100;
> +		/* We can not use <linux/completion.h> since they are not NMI safe */
> +		while (!READ_ONCE(test_arg->completion) &&
> +		       time_before(jiffies, timeout)) {
> +			cpu_relax();
> +		}
> +		if (!time_before(jiffies, timeout)) {

Can this lead to a false test failure?

If the SSE event successfully completes at the exact same jiffy that the
timeout expires, the loop will exit safely because test_arg->completion is
true. However, the time_before() check right after will evaluate to false,
causing the code to incorrectly report a timeout. Checking
if (!READ_ONCE(test_arg->completion)) directly might be more robust.

> +			sse_err("Failed to wait for event %s completion on CPU %d\n",
> +				sse_evt_name(evt), cpu);
> +			return;
> +		}
> +	}
> +}

[ ... ]

> +static void sse_test_injection_priority_arg(struct priority_test_arg *args, unsigned int args_size,
> +					    sse_event_handler_fn handler, const char *test_name)
> +{
> +	unsigned int i;
> +	int ret;
> +	struct sse_event *event;
> +	struct priority_test_arg *arg, *first_arg = NULL, *prev_arg = NULL;
> +
> +	pr_info("Starting SSE priority test (%s)\n", test_name);
> +	for (i = 0; i < args_size; i++) {
> +		arg = &args[i];
> +
> +		if (!sse_test_can_inject_event(arg->evt))
> +			continue;
> +
> +		WRITE_ONCE(arg->called, false);
> +		WRITE_ONCE(arg->next_evt_arg, NULL);
> +		WRITE_ONCE(arg->event, NULL);
> +
> +		event = sse_event_register(arg->evt, arg->prio, handler, (void *)arg);
> +		if (IS_ERR(event)) {
> +			if (PTR_ERR(event) == -EEXIST) {
> +				pr_info("Event %s already registered, skipping\n",
> +					sse_evt_name(arg->evt));
> +				continue;
> +			}
> +			sse_err("Failed to register event %s, err %ld\n", sse_evt_name(arg->evt),
> +				PTR_ERR(event));
> +			goto release_events;
> +		}
> +		arg->event = event;
> +
> +		if (sse_event_is_global(arg->evt)) {
> +			/* Target event at current CPU */
> +			ret = sse_event_set_target_cpu(event, smp_processor_id());

Is it safe to call smp_processor_id() here?

This function executes during module initialization, which is a preemptible
context. Calling smp_processor_id() without disabling preemption will
trigger a warning. Additionally, if the task migrates between CPUs during
this loop, global events might be bound to different CPUs, which might
violate the assumption that all events in the chain run on the exact same
CPU.

> +			if (ret) {
> +				sse_err("Failed to set event %s target CPU, err %d\n",
> +					sse_evt_name(arg->evt), ret);
> +				goto release_events;
> +			}
> +		}
> +
> +		ret = sse_event_enable(event);
> +		if (ret) {
> +			sse_err("Failed to enable event %s, err %d\n", sse_evt_name(arg->evt), ret);
> +			goto release_events;
> +		}
> +
> +		if (prev_arg)
> +			WRITE_ONCE(prev_arg->next_evt_arg, arg);
> +
> +		prev_arg = arg;
> +
> +		if (!first_arg)
> +			first_arg = arg;
> +	}
> +
> +	if (!first_arg) {
> +		pr_info("No injectable event available for %s priority test\n",
> +			test_name);
> +		return;
> +	}
> +
> +	/* Inject first event, handler should trigger the others in chain. */
> +	ret = sse_test_inject_event(first_arg->event, first_arg->evt, smp_processor_id());
> +	if (ret) {
> +		sse_err("SSE event %s injection failed\n", sse_evt_name(first_arg->evt));
> +		goto release_events;
> +	}
> +
> +	/*
> +	 * Event are injected directly on the current CPU after calling sse_test_inject_event()
> +	 * so that execution is preempted right away, no need to wait for timeout.
> +	 */
> +	arg = first_arg;
> +	while (arg) {
> +		if (!READ_ONCE(arg->called)) {
> +			sse_err("Event %s handler was not called\n",
> +				sse_evt_name(arg->evt));
> +			ret = -EINVAL;
> +		}
> +
> +		event = arg->event;
> +		arg = READ_ONCE(arg->next_evt_arg);
> +	}
> +
> +release_events:
> +
> +	arg = first_arg;
> +	while (arg) {
> +		event = arg->event;
> +		if (!event)
> +			break;
> +
> +		sse_event_disable(event);
> +		sse_event_unregister(event);
> +		arg = READ_ONCE(arg->next_evt_arg);
> +	}

Could this cleanup loop leak an event in the error path?

If sse_event_set_target_cpu() or sse_event_enable() fails, the code jumps
to release_events. However, the current event hasn't been linked into the
next_evt_arg chain yet (since prev_arg is updated after). This causes the
failing event to be skipped during cleanup.

If the firmware triggers this leaked event later, could it result in a
use-after-free? The handler context (arg) points to a local array on the
stack (e.g., default_hi_prio_args), so executing the handler after the
function returns would access destroyed stack memory.

> +
> +	pr_info("Finished SSE priority test (%s)\n", test_name);
> +}

[ ... ]

> +static int sse_stress_test_layer(int layer)
> +{
> +	struct sse_event *event;
> +	int ret;
> +
> +	if (layer < 0 || layer >= ARRAY_SIZE(stress_handlers))
> +		return -EINVAL;
> +
> +	pr_info("Starting SSE stress layer %d (%s)\n",
> +		layer, stress_layer_names[layer]);
> +
> +	memset(&stress_ctx, 0, sizeof(stress_ctx));
> +	stress_ctx.evt_id = SBI_SSE_EVENT_LOCAL_SOFTWARE_INJECTED;
> +	stress_ctx.layer = layer;
> +	WRITE_ONCE(stress_ctx.running, true);
> +	atomic_set(&stress_ctx.inject_count, 0);
> +	atomic_set(&stress_ctx.handler_count, 0);
> +	init_waitqueue_head(&stress_ctx.wait_q);
> +
> +	event = sse_event_register(stress_ctx.evt_id, 0,
> +				   stress_handlers[layer], &stress_ctx);
> +	if (IS_ERR(event)) {
> +		sse_err("Failed to register stress event, err %ld\n",
> +			PTR_ERR(event));
> +		return PTR_ERR(event);
> +	}
> +
> +	stress_ctx.event = event;
> +
> +	ret = sse_event_enable(event);
> +	if (ret) {
> +		sse_err("Failed to enable stress event, err %d\n", ret);
> +		goto err_unregister;
> +	}
> +
> +	stress_ctx.monitor_task = kthread_run(stress_monitor_thread,
> +					      &stress_ctx, "sse_stress_mon");
> +	if (IS_ERR(stress_ctx.monitor_task)) {
> +		ret = PTR_ERR(stress_ctx.monitor_task);
> +		sse_err("Failed to create stress monitor thread, err %d\n", ret);
> +		goto err_disable;
> +	}
> +
> +	hrtimer_setup(&stress_ctx.timer, stress_timer_callback,
> +		      CLOCK_MONOTONIC, HRTIMER_MODE_PINNED);
> +	hrtimer_start(&stress_ctx.timer, ns_to_ktime(STRESS_INJECT_NS),
> +		      HRTIMER_MODE_REL_PINNED);
> +
> +	hrtimer_setup(&stress_ctx.stop_timer, stress_stop_timer_callback,
> +		      CLOCK_MONOTONIC, HRTIMER_MODE_REL);
> +	hrtimer_start(&stress_ctx.stop_timer, ms_to_ktime(STRESS_DURATION_MS),
> +		      HRTIMER_MODE_REL);
> +
> +	wait_event(stress_ctx.wait_q, READ_ONCE(stress_ctx.test_done));
> +
> +	WRITE_ONCE(stress_ctx.running, false);
> +	hrtimer_cancel(&stress_ctx.timer);
> +	hrtimer_cancel(&stress_ctx.stop_timer);
> +	kthread_stop(stress_ctx.monitor_task);

Can setting stress_ctx.running to false before calling kthread_stop() lead
to a use-after-free?

If stress_monitor_thread() wakes up and observes running == false, it will
exit its loop and terminate naturally, dropping its reference to the
task_struct. Since the caller didn't take an extra reference via
get_task_struct(), the task_struct might be freed before kthread_stop() is
called.

> +
> +	pr_info("Finished SSE stress layer %d (%s): inject=%d, handler=%d\n",
> +		layer, stress_layer_names[layer],
> +		atomic_read(&stress_ctx.inject_count),
> +		atomic_read(&stress_ctx.handler_count));
> +
> +err_disable:
> +	sse_event_disable(event);
> +err_unregister:
> +	sse_event_unregister(event);
> +
> +	return ret;
> +}

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/cover.1778331862.git.zhangzhanpeng.jasper@bytedance.com?part=5

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2026-05-13  7:49 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-09 13:09 [PATCH v9 0/5] riscv: add SBI Supervisor Software Events support Zhanpeng Zhang
2026-05-09 13:09 ` [PATCH v9 1/5] riscv: add SBI SSE extension definitions Zhanpeng Zhang
2026-05-09 13:09 ` [PATCH v9 2/5] riscv: add support for SBI Supervisor Software Events extension Zhanpeng Zhang
2026-05-09 13:53   ` sashiko-bot
2026-05-09 13:10 ` [PATCH v9 3/5] drivers: firmware: add riscv SSE support Zhanpeng Zhang
2026-05-09 14:38   ` sashiko-bot
2026-05-13  7:49   ` Himanshu Chauhan
2026-05-09 13:10 ` [PATCH v9 4/5] perf: RISC-V: add support for SSE event Zhanpeng Zhang
2026-05-09 15:16   ` sashiko-bot
2026-05-09 13:10 ` [PATCH v9 5/5] selftests/riscv: add SSE test module Zhanpeng Zhang
2026-05-09 15:57   ` sashiko-bot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox