* [PATCH v3 0/4] riscv: add support for SBI Supervisor Software Events
@ 2024-12-06 16:30 Clément Léger
2024-12-06 16:30 ` [PATCH v3 1/4] riscv: add SBI SSE extension definitions Clément Léger
` (3 more replies)
0 siblings, 4 replies; 22+ messages in thread
From: Clément Léger @ 2024-12-06 16:30 UTC (permalink / raw)
To: Paul Walmsley, Palmer Dabbelt, linux-riscv, linux-kernel,
linux-arm-kernel
Cc: Clément Léger, Himanshu Chauhan, Anup Patel, Xu Lu,
Atish Patra
The SBI Supervisor Software Events (SSE) extension provides a mechanism
to inject software events from an SBI implementation into supervisor
software such that they preempt all other supervisor-level traps and
interrupts. This extension is introduced by the SBI v3.0
specification [1].
Various events are defined and can be sent asynchronously to supervisor
software by the SBI (RAS, PMU, DEBUG, asynchronous page fault), as well
as platform-specific events. Events can be either local (per-hart) or
global. Events can be nested on top of each other based on priority and
can interrupt the kernel at any time.
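For reference, the local/global distinction is encoded directly in the
event ID (bit 15); a minimal check, mirroring the helper added later in
this series, looks like this:
  /* Global events carry SBI_SSE_EVENT_GLOBAL (bit 15) in their ID. */
  static bool event_is_global(u32 evt)
  {
          return !!(evt & SBI_SSE_EVENT_GLOBAL);
  }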
The first patch adds the SSE definitions. The second one adds support
for SSE at the arch level (entry code and stack allocation), the third
one at the driver level. Finally, the last patch adds support for SSE
events in the SBI PMU driver. Additional testing for that part is highly
welcome since there are a lot of possible paths that need to be
exercised.
Among the specific points that need to be handled is interruption at
any point of kernel execution, and more specifically at the beginning
of exception handling. Since the exception entry implementation uses
the SCRATCH CSR both to hold the current task struct and as a temporary
register to switch the stack and save registers, it is difficult to
reliably get the current task struct if we are interrupted at that
specific moment (i.e. it might contain 0, the task pointer or tp). A
fixup-like mechanism is not possible due to the nested nature of SSE,
which makes it really hard to obtain the original interruption site. In
order to retrieve the task reliably, add an additional __sse_entry_task
per-cpu array which stores the current task. Ideally, we would need to
modify the way the current task is retrieved/stored in exception
handling so that it does not depend on the place where the kernel is
interrupted.
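For clarity, the lookup performed by the SSE entry code is roughly
equivalent to the following C sketch (illustration only, the actual
implementation is done in assembly in patch 2):
  /* Recover 'current' from the hart ID the SBI passes to the handler in a6. */
  static struct task_struct *sse_current(unsigned long hart_id)
  {
          int cpu = riscv_hartid_to_cpuid(hart_id);
          return per_cpu(__sse_entry_task, cpu);
  }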
Contrary to pseudo NMI [2], SSE does not modify the way interrupts are
handled and does not add any overhead to existing code. Moreover, it
provides "true" NMI-like interrupts which can interrupt the kernel at
any time (even in exception handling). This is particularly crucial for
RAS errors, which need to be handled as fast as possible to avoid fault
propagation.
OpenSBI SSE support is already upstream.
Link: https://github.com/riscv-non-isa/riscv-sbi-doc/releases/download/vv3.0-rc2/riscv-sbi.pdf [1]
---
Changes in v3:
- Split arch/driver support
- Fix potential register failure reporting
- Set a few pr_err as pr_debug
- Allow CONFIG_RISCV_SSE to be disabled
- Fix build without CONFIG_RISCV_SSE
- Remove fixup-like mechanism and use a per-cpu array
- Fixed SSCRATCH being corrupted when interrupting the kernel in early
exception path.
- Split SSE assembly from entry.S
- Add Himanshu's SSE mask/unmask and runtime PM support.
- Disable user memory access/floating point/vector in SSE handler
- Rebased on master
v2: https://lore.kernel.org/linux-riscv/20240112111720.2975069-1-cleger@rivosinc.com/
Changes in v2:
- Implemented specification v2
- Fix various error handling cases
- Added shadow stack support
v1: https://lore.kernel.org/linux-riscv/20231026143122.279437-1-cleger@rivosinc.com/
Clément Léger (4):
riscv: add SBI SSE extension definitions
riscv: add support for SBI Supervisor Software Events extension
drivers: firmware: add riscv SSE support
perf: RISC-V: add support for SSE event
MAINTAINERS | 14 +
arch/riscv/include/asm/asm.h | 14 +-
arch/riscv/include/asm/sbi.h | 62 +++
arch/riscv/include/asm/scs.h | 7 +
arch/riscv/include/asm/sse.h | 38 ++
arch/riscv/include/asm/switch_to.h | 14 +
arch/riscv/include/asm/thread_info.h | 1 +
arch/riscv/kernel/Makefile | 1 +
arch/riscv/kernel/asm-offsets.c | 12 +
arch/riscv/kernel/sse.c | 134 ++++++
arch/riscv/kernel/sse_entry.S | 171 +++++++
drivers/firmware/Kconfig | 1 +
drivers/firmware/Makefile | 1 +
drivers/firmware/riscv/Kconfig | 15 +
drivers/firmware/riscv/Makefile | 3 +
drivers/firmware/riscv/riscv_sse.c | 691 +++++++++++++++++++++++++++
drivers/perf/riscv_pmu_sbi.c | 51 +-
include/linux/riscv_sse.h | 56 +++
18 files changed, 1273 insertions(+), 13 deletions(-)
create mode 100644 arch/riscv/include/asm/sse.h
create mode 100644 arch/riscv/kernel/sse.c
create mode 100644 arch/riscv/kernel/sse_entry.S
create mode 100644 drivers/firmware/riscv/Kconfig
create mode 100644 drivers/firmware/riscv/Makefile
create mode 100644 drivers/firmware/riscv/riscv_sse.c
create mode 100644 include/linux/riscv_sse.h
--
2.45.2
* [PATCH v3 1/4] riscv: add SBI SSE extension definitions
2024-12-06 16:30 [PATCH v3 0/4] riscv: add support for SBI Supervisor Software Events Clément Léger
@ 2024-12-06 16:30 ` Clément Léger
2024-12-06 16:30 ` [PATCH v3 2/4] riscv: add support for SBI Supervisor Software Events extension Clément Léger
` (2 subsequent siblings)
3 siblings, 0 replies; 22+ messages in thread
From: Clément Léger @ 2024-12-06 16:30 UTC (permalink / raw)
To: Paul Walmsley, Palmer Dabbelt, linux-riscv, linux-kernel,
linux-arm-kernel
Cc: Clément Léger, Himanshu Chauhan, Anup Patel, Xu Lu,
Atish Patra
Add the definitions needed for the SBI Supervisor Software Events (SSE)
extension [1]. This extension enables the SBI to inject events into
supervisor software, much like ARM SDEI.
[1] https://lists.riscv.org/g/tech-prs/message/515
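As an illustration of how these definitions are meant to be used (the
actual callers come in the following patches; evt_id, handler_pc and
handler_arg are placeholders), registering an event boils down to a
plain ecall:
  struct sbiret ret;
  ret = sbi_ecall(SBI_EXT_SSE, SBI_SSE_EVENT_REGISTER, evt_id,
                  (unsigned long)handler_pc, (unsigned long)handler_arg,
                  0, 0, 0);
  if (ret.error)
          return sbi_err_map_linux_errno(ret.error);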
Signed-off-by: Clément Léger <cleger@rivosinc.com>
---
arch/riscv/include/asm/sbi.h | 62 ++++++++++++++++++++++++++++++++++++
1 file changed, 62 insertions(+)
diff --git a/arch/riscv/include/asm/sbi.h b/arch/riscv/include/asm/sbi.h
index 6c82318065cf..032dde350d40 100644
--- a/arch/riscv/include/asm/sbi.h
+++ b/arch/riscv/include/asm/sbi.h
@@ -35,6 +35,7 @@ enum sbi_ext_id {
SBI_EXT_DBCN = 0x4442434E,
SBI_EXT_STA = 0x535441,
SBI_EXT_NACL = 0x4E41434C,
+ SBI_EXT_SSE = 0x535345,
/* Experimentals extensions must lie within this range */
SBI_EXT_EXPERIMENTAL_START = 0x08000000,
@@ -401,6 +402,63 @@ enum sbi_ext_nacl_feature {
#define SBI_NACL_SHMEM_SRET_X(__i) ((__riscv_xlen / 8) * (__i))
#define SBI_NACL_SHMEM_SRET_X_LAST 31
+enum sbi_ext_sse_fid {
+ SBI_SSE_EVENT_ATTR_READ = 0,
+ SBI_SSE_EVENT_ATTR_WRITE,
+ SBI_SSE_EVENT_REGISTER,
+ SBI_SSE_EVENT_UNREGISTER,
+ SBI_SSE_EVENT_ENABLE,
+ SBI_SSE_EVENT_DISABLE,
+ SBI_SSE_EVENT_COMPLETE,
+ SBI_SSE_EVENT_SIGNAL,
+ SBI_SSE_EVENT_HART_UNMASK,
+ SBI_SSE_EVENT_HART_MASK,
+};
+
+enum sbi_sse_state {
+ SBI_SSE_STATE_UNUSED = 0,
+ SBI_SSE_STATE_REGISTERED = 1,
+ SBI_SSE_STATE_ENABLED = 2,
+ SBI_SSE_STATE_RUNNING = 3,
+};
+
+/* SBI SSE Event Attributes. */
+enum sbi_sse_attr_id {
+ SBI_SSE_ATTR_STATUS = 0x00000000,
+ SBI_SSE_ATTR_PRIO = 0x00000001,
+ SBI_SSE_ATTR_CONFIG = 0x00000002,
+ SBI_SSE_ATTR_PREFERRED_HART = 0x00000003,
+ SBI_SSE_ATTR_ENTRY_PC = 0x00000004,
+ SBI_SSE_ATTR_ENTRY_ARG = 0x00000005,
+ SBI_SSE_ATTR_INTERRUPTED_SEPC = 0x00000006,
+ SBI_SSE_ATTR_INTERRUPTED_FLAGS = 0x00000007,
+ SBI_SSE_ATTR_INTERRUPTED_A6 = 0x00000008,
+ SBI_SSE_ATTR_INTERRUPTED_A7 = 0x00000009,
+
+ SBI_SSE_ATTR_MAX = 0x0000000A
+};
+
+#define SBI_SSE_ATTR_STATUS_STATE_OFFSET 0
+#define SBI_SSE_ATTR_STATUS_STATE_MASK 0x3
+#define SBI_SSE_ATTR_STATUS_PENDING_OFFSET 2
+#define SBI_SSE_ATTR_STATUS_INJECT_OFFSET 3
+
+#define SBI_SSE_ATTR_CONFIG_ONESHOT (1 << 0)
+
+#define SBI_SSE_ATTR_INTERRUPTED_FLAGS_SSTATUS_SPP (1 << 0)
+#define SBI_SSE_ATTR_INTERRUPTED_FLAGS_SSTATUS_SPIE (1 << 1)
+#define SBI_SSE_ATTR_INTERRUPTED_FLAGS_HSTATUS_SPV (1 << 2)
+#define SBI_SSE_ATTR_INTERRUPTED_FLAGS_HSTATUS_SPVP (1 << 3)
+
+#define SBI_SSE_EVENT_LOCAL_RAS 0x00000000
+#define SBI_SSE_EVENT_GLOBAL_RAS 0x00008000
+#define SBI_SSE_EVENT_LOCAL_PMU 0x00010000
+#define SBI_SSE_EVENT_LOCAL_SOFTWARE 0xffff0000
+#define SBI_SSE_EVENT_GLOBAL_SOFTWARE 0xffff8000
+
+#define SBI_SSE_EVENT_PLATFORM (1 << 14)
+#define SBI_SSE_EVENT_GLOBAL (1 << 15)
+
/* SBI spec version fields */
#define SBI_SPEC_VERSION_DEFAULT 0x1
#define SBI_SPEC_VERSION_MAJOR_SHIFT 24
@@ -418,6 +476,8 @@ enum sbi_ext_nacl_feature {
#define SBI_ERR_ALREADY_STARTED -7
#define SBI_ERR_ALREADY_STOPPED -8
#define SBI_ERR_NO_SHMEM -9
+#define SBI_ERR_INVALID_STATE -10
+#define SBI_ERR_BAD_RANGE -11
extern unsigned long sbi_spec_version;
struct sbiret {
@@ -504,6 +564,8 @@ static inline int sbi_err_map_linux_errno(int err)
case SBI_ERR_DENIED:
return -EPERM;
case SBI_ERR_INVALID_PARAM:
+ case SBI_ERR_BAD_RANGE:
+ case SBI_ERR_INVALID_STATE:
return -EINVAL;
case SBI_ERR_INVALID_ADDRESS:
return -EFAULT;
--
2.45.2
* [PATCH v3 2/4] riscv: add support for SBI Supervisor Software Events extension
2024-12-06 16:30 [PATCH v3 0/4] riscv: add support for SBI Supervisor Software Events Clément Léger
2024-12-06 16:30 ` [PATCH v3 1/4] riscv: add SBI SSE extension definitions Clément Léger
@ 2024-12-06 16:30 ` Clément Léger
2024-12-10 4:51 ` Himanshu Chauhan
` (2 more replies)
2024-12-06 16:30 ` [PATCH v3 3/4] drivers: firmware: add riscv SSE support Clément Léger
2024-12-06 16:31 ` [PATCH v3 4/4] perf: RISC-V: add support for SSE event Clément Léger
3 siblings, 3 replies; 22+ messages in thread
From: Clément Léger @ 2024-12-06 16:30 UTC (permalink / raw)
To: Paul Walmsley, Palmer Dabbelt, linux-riscv, linux-kernel,
linux-arm-kernel
Cc: Clément Léger, Himanshu Chauhan, Anup Patel, Xu Lu,
Atish Patra
The SBI SSE extension allows supervisor software to be notified by the
SBI of specific events that are not maskable. The context switch is
handled partially by the firmware, which saves registers a6 and a7. When
entering the kernel, we can rely on these two registers to set up the
stack and save all the registers.
Since SSE events can be delivered to the kernel at any time (including
during exception handling), we need a way to locate the current task for
context tracking. On RISC-V, it is stored in the SSCRATCH CSR when in
user space or in tp when in kernel space (in which case SSCRATCH is
zero). But at the beginning of exception handling, SSCRATCH is used to
swap tp and check the origin of the exception. If interrupted at that
point, there is no way to reliably know where the current task_struct is
located. Even checking the interruption location won't work, as SSE
events can be nested on top of each other, so the original interruption
site might be lost at some point. In order to retrieve it reliably,
store the current task in an additional __sse_entry_task per-cpu array.
This array is then used to retrieve the current task based on the
hart ID that is passed to the SSE event handler in a6.
That being said, the way the current task struct is stored should
probably be reworked to find a more reliable alternative.
Since each event (and each CPU for local events) has its own context
and events can preempt each other, allocate a stack (and a shadow stack
if needed) for each of them (and for each CPU for local events).
When completing the event, if we were coming from the kernel with
interrupts disabled, simply return there. If coming from userspace or
from the kernel with interrupts enabled, simulate an interrupt exception
by setting IE_SIE in CSR_IP to allow delivery of signals to the user
task. This can happen, for instance, when a RAS event has been generated
by a user application and a SIGBUS has been sent to a task.
Signed-off-by: Clément Léger <cleger@rivosinc.com>
---
arch/riscv/include/asm/asm.h | 14 ++-
arch/riscv/include/asm/scs.h | 7 ++
arch/riscv/include/asm/sse.h | 38 ++++++
arch/riscv/include/asm/switch_to.h | 14 +++
arch/riscv/include/asm/thread_info.h | 1 +
arch/riscv/kernel/Makefile | 1 +
arch/riscv/kernel/asm-offsets.c | 12 ++
arch/riscv/kernel/sse.c | 134 +++++++++++++++++++++
arch/riscv/kernel/sse_entry.S | 171 +++++++++++++++++++++++++++
9 files changed, 389 insertions(+), 3 deletions(-)
create mode 100644 arch/riscv/include/asm/sse.h
create mode 100644 arch/riscv/kernel/sse.c
create mode 100644 arch/riscv/kernel/sse_entry.S
diff --git a/arch/riscv/include/asm/asm.h b/arch/riscv/include/asm/asm.h
index 776354895b81..de8427c58f02 100644
--- a/arch/riscv/include/asm/asm.h
+++ b/arch/riscv/include/asm/asm.h
@@ -89,16 +89,24 @@
#define PER_CPU_OFFSET_SHIFT 3
#endif
-.macro asm_per_cpu dst sym tmp
- REG_L \tmp, TASK_TI_CPU_NUM(tp)
- slli \tmp, \tmp, PER_CPU_OFFSET_SHIFT
+.macro asm_per_cpu_with_cpu dst sym tmp cpu
+ slli \tmp, \cpu, PER_CPU_OFFSET_SHIFT
la \dst, __per_cpu_offset
add \dst, \dst, \tmp
REG_L \tmp, 0(\dst)
la \dst, \sym
add \dst, \dst, \tmp
.endm
+
+.macro asm_per_cpu dst sym tmp
+ REG_L \tmp, TASK_TI_CPU_NUM(tp)
+ asm_per_cpu_with_cpu \dst \sym \tmp \tmp
+.endm
#else /* CONFIG_SMP */
+.macro asm_per_cpu_with_cpu dst sym tmp cpu
+ la \dst, \sym
+.endm
+
.macro asm_per_cpu dst sym tmp
la \dst, \sym
.endm
diff --git a/arch/riscv/include/asm/scs.h b/arch/riscv/include/asm/scs.h
index 0e45db78b24b..62344daad73d 100644
--- a/arch/riscv/include/asm/scs.h
+++ b/arch/riscv/include/asm/scs.h
@@ -18,6 +18,11 @@
load_per_cpu gp, irq_shadow_call_stack_ptr, \tmp
.endm
+/* Load the SSE event shadow call stack to gp. */
+.macro scs_load_sse_stack reg_evt
+ REG_L gp, SSE_REG_EVT_SHADOW_STACK(\reg_evt)
+.endm
+
/* Load task_scs_sp(current) to gp. */
.macro scs_load_current
REG_L gp, TASK_TI_SCS_SP(tp)
@@ -41,6 +46,8 @@
.endm
.macro scs_load_irq_stack tmp
.endm
+.macro scs_load_sse_stack reg_evt
+.endm
.macro scs_load_current
.endm
.macro scs_load_current_if_task_changed prev
diff --git a/arch/riscv/include/asm/sse.h b/arch/riscv/include/asm/sse.h
new file mode 100644
index 000000000000..431a19d4cd9c
--- /dev/null
+++ b/arch/riscv/include/asm/sse.h
@@ -0,0 +1,38 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2024 Rivos Inc.
+ */
+#ifndef __ASM_SSE_H
+#define __ASM_SSE_H
+
+#ifdef CONFIG_RISCV_SSE
+
+struct sse_event_interrupted_state {
+ unsigned long a6;
+ unsigned long a7;
+};
+
+struct sse_event_arch_data {
+ void *stack;
+ void *shadow_stack;
+ unsigned long tmp;
+ struct sse_event_interrupted_state interrupted;
+ unsigned long interrupted_state_phys;
+ u32 evt_id;
+};
+
+struct sse_registered_event;
+int arch_sse_init_event(struct sse_event_arch_data *arch_evt, u32 evt_id,
+ int cpu);
+void arch_sse_free_event(struct sse_event_arch_data *arch_evt);
+int arch_sse_register_event(struct sse_event_arch_data *arch_evt);
+
+void sse_handle_event(struct sse_event_arch_data *arch_evt,
+ struct pt_regs *regs);
+asmlinkage void handle_sse(void);
+asmlinkage void do_sse(struct sse_event_arch_data *arch_evt,
+ struct pt_regs *reg);
+
+#endif
+
+#endif
diff --git a/arch/riscv/include/asm/switch_to.h b/arch/riscv/include/asm/switch_to.h
index 94e33216b2d9..e166fabe04ab 100644
--- a/arch/riscv/include/asm/switch_to.h
+++ b/arch/riscv/include/asm/switch_to.h
@@ -88,6 +88,19 @@ static inline void __switch_to_envcfg(struct task_struct *next)
:: "r" (next->thread.envcfg) : "memory");
}
+#ifdef CONFIG_RISCV_SSE
+DECLARE_PER_CPU(struct task_struct *, __sse_entry_task);
+
+static inline void __switch_sse_entry_task(struct task_struct *next)
+{
+ __this_cpu_write(__sse_entry_task, next);
+}
+#else
+static inline void __switch_sse_entry_task(struct task_struct *next)
+{
+}
+#endif
+
extern struct task_struct *__switch_to(struct task_struct *,
struct task_struct *);
@@ -122,6 +135,7 @@ do { \
if (switch_to_should_flush_icache(__next)) \
local_flush_icache_all(); \
__switch_to_envcfg(__next); \
+ __switch_sse_entry_task(__next); \
((last) = __switch_to(__prev, __next)); \
} while (0)
diff --git a/arch/riscv/include/asm/thread_info.h b/arch/riscv/include/asm/thread_info.h
index f5916a70879a..28e9805e61fc 100644
--- a/arch/riscv/include/asm/thread_info.h
+++ b/arch/riscv/include/asm/thread_info.h
@@ -36,6 +36,7 @@
#define OVERFLOW_STACK_SIZE SZ_4K
#define IRQ_STACK_SIZE THREAD_SIZE
+#define SSE_STACK_SIZE THREAD_SIZE
#ifndef __ASSEMBLY__
diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
index 063d1faf5a53..1e8fb83b1162 100644
--- a/arch/riscv/kernel/Makefile
+++ b/arch/riscv/kernel/Makefile
@@ -99,6 +99,7 @@ obj-$(CONFIG_DYNAMIC_FTRACE) += mcount-dyn.o
obj-$(CONFIG_PERF_EVENTS) += perf_callchain.o
obj-$(CONFIG_HAVE_PERF_REGS) += perf_regs.o
obj-$(CONFIG_RISCV_SBI) += sbi.o sbi_ecall.o
+obj-$(CONFIG_RISCV_SSE) += sse.o sse_entry.o
ifeq ($(CONFIG_RISCV_SBI), y)
obj-$(CONFIG_SMP) += sbi-ipi.o
obj-$(CONFIG_SMP) += cpu_ops_sbi.o
diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
index e89455a6a0e5..60590a3d9519 100644
--- a/arch/riscv/kernel/asm-offsets.c
+++ b/arch/riscv/kernel/asm-offsets.c
@@ -14,6 +14,8 @@
#include <asm/ptrace.h>
#include <asm/cpu_ops_sbi.h>
#include <asm/stacktrace.h>
+#include <asm/sbi.h>
+#include <asm/sse.h>
#include <asm/suspend.h>
void asm_offsets(void);
@@ -511,4 +513,14 @@ void asm_offsets(void)
DEFINE(FREGS_A6, offsetof(struct __arch_ftrace_regs, a6));
DEFINE(FREGS_A7, offsetof(struct __arch_ftrace_regs, a7));
#endif
+
+#ifdef CONFIG_RISCV_SSE
+ OFFSET(SSE_REG_EVT_STACK, sse_event_arch_data, stack);
+ OFFSET(SSE_REG_EVT_SHADOW_STACK, sse_event_arch_data, shadow_stack);
+ OFFSET(SSE_REG_EVT_TMP, sse_event_arch_data, tmp);
+
+ DEFINE(SBI_EXT_SSE, SBI_EXT_SSE);
+ DEFINE(SBI_SSE_EVENT_COMPLETE, SBI_SSE_EVENT_COMPLETE);
+ DEFINE(NR_CPUS, NR_CPUS);
+#endif
}
diff --git a/arch/riscv/kernel/sse.c b/arch/riscv/kernel/sse.c
new file mode 100644
index 000000000000..b48ae69dad8d
--- /dev/null
+++ b/arch/riscv/kernel/sse.c
@@ -0,0 +1,134 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (C) 2024 Rivos Inc.
+ */
+#include <linux/nmi.h>
+#include <linux/scs.h>
+#include <linux/bitfield.h>
+#include <linux/riscv_sse.h>
+#include <linux/percpu-defs.h>
+
+#include <asm/asm-prototypes.h>
+#include <asm/switch_to.h>
+#include <asm/irq_stack.h>
+#include <asm/sbi.h>
+#include <asm/sse.h>
+
+DEFINE_PER_CPU(struct task_struct *, __sse_entry_task);
+
+void __weak sse_handle_event(struct sse_event_arch_data *arch_evt, struct pt_regs *regs)
+{
+}
+
+void do_sse(struct sse_event_arch_data *arch_evt, struct pt_regs *regs)
+{
+ nmi_enter();
+
+ /* Retrieve missing GPRs from SBI */
+ sbi_ecall(SBI_EXT_SSE, SBI_SSE_EVENT_ATTR_READ, arch_evt->evt_id,
+ SBI_SSE_ATTR_INTERRUPTED_A6,
+ (SBI_SSE_ATTR_INTERRUPTED_A7 - SBI_SSE_ATTR_INTERRUPTED_A6) + 1,
+ arch_evt->interrupted_state_phys, 0, 0);
+
+ memcpy(&regs->a6, &arch_evt->interrupted, sizeof(arch_evt->interrupted));
+
+ sse_handle_event(arch_evt, regs);
+
+ /*
+ * The SSE delivery path does not use the "standard" exception path and
+ * thus does not process any pending signal/softirqs. Some drivers might
+ * enqueue pending work that needs to be handled as soon as possible.
+ * For that purpose, set the software interrupt pending bit which will
+ * be serviced once interrupts are reenabled
+ */
+ csr_set(CSR_IP, IE_SIE);
+
+ nmi_exit();
+}
+
+#ifdef CONFIG_VMAP_STACK
+static unsigned long *sse_stack_alloc(unsigned int cpu, unsigned int size)
+{
+ return arch_alloc_vmap_stack(size, cpu_to_node(cpu));
+}
+
+static void sse_stack_free(unsigned long *stack)
+{
+ vfree(stack);
+}
+#else /* CONFIG_VMAP_STACK */
+
+static unsigned long *sse_stack_alloc(unsigned int cpu, unsigned int size)
+{
+ return kmalloc(size, GFP_KERNEL);
+}
+
+static void sse_stack_free(unsigned long *stack)
+{
+ kfree(stack);
+}
+
+#endif /* CONFIG_VMAP_STACK */
+
+static int sse_init_scs(int cpu, struct sse_event_arch_data *arch_evt)
+{
+ void *stack;
+
+ if (!scs_is_enabled())
+ return 0;
+
+ stack = scs_alloc(cpu_to_node(cpu));
+ if (!stack)
+ return 1;
+
+ arch_evt->shadow_stack = stack;
+
+ return 0;
+}
+
+int arch_sse_init_event(struct sse_event_arch_data *arch_evt, u32 evt_id, int cpu)
+{
+ void *stack;
+
+ arch_evt->evt_id = evt_id;
+ stack = sse_stack_alloc(cpu, SSE_STACK_SIZE);
+ if (!stack)
+ return -ENOMEM;
+
+ arch_evt->stack = stack + SSE_STACK_SIZE;
+
+ if (sse_init_scs(cpu, arch_evt))
+ goto free_stack;
+
+ if (is_kernel_percpu_address((unsigned long)&arch_evt->interrupted)) {
+ arch_evt->interrupted_state_phys =
+ per_cpu_ptr_to_phys(&arch_evt->interrupted);
+ } else {
+ arch_evt->interrupted_state_phys =
+ virt_to_phys(&arch_evt->interrupted);
+ }
+
+ return 0;
+
+free_stack:
+ sse_stack_free(arch_evt->stack - SSE_STACK_SIZE);
+
+ return -ENOMEM;
+}
+
+void arch_sse_free_event(struct sse_event_arch_data *arch_evt)
+{
+ scs_free(arch_evt->shadow_stack);
+ sse_stack_free(arch_evt->stack - SSE_STACK_SIZE);
+}
+
+int arch_sse_register_event(struct sse_event_arch_data *arch_evt)
+{
+ struct sbiret sret;
+
+ sret = sbi_ecall(SBI_EXT_SSE, SBI_SSE_EVENT_REGISTER, arch_evt->evt_id,
+ (unsigned long) handle_sse, (unsigned long) arch_evt,
+ 0, 0, 0);
+
+ return sbi_err_map_linux_errno(sret.error);
+}
diff --git a/arch/riscv/kernel/sse_entry.S b/arch/riscv/kernel/sse_entry.S
new file mode 100644
index 000000000000..0b2f890edd89
--- /dev/null
+++ b/arch/riscv/kernel/sse_entry.S
@@ -0,0 +1,171 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2024 Rivos Inc.
+ */
+
+#include <linux/init.h>
+#include <linux/linkage.h>
+
+#include <asm/asm.h>
+#include <asm/csr.h>
+#include <asm/scs.h>
+
+/* When entering handle_sse, the following registers are set:
+ * a6: contains the hartid
+ * a7: contains the struct sse_event_arch_data pointer
+ */
+SYM_CODE_START(handle_sse)
+ /* Save stack temporarily */
+ REG_S sp, SSE_REG_EVT_TMP(a7)
+ /* Set entry stack */
+ REG_L sp, SSE_REG_EVT_STACK(a7)
+
+ addi sp, sp, -(PT_SIZE_ON_STACK)
+ REG_S ra, PT_RA(sp)
+ REG_S s0, PT_S0(sp)
+ REG_S s1, PT_S1(sp)
+ REG_S s2, PT_S2(sp)
+ REG_S s3, PT_S3(sp)
+ REG_S s4, PT_S4(sp)
+ REG_S s5, PT_S5(sp)
+ REG_S s6, PT_S6(sp)
+ REG_S s7, PT_S7(sp)
+ REG_S s8, PT_S8(sp)
+ REG_S s9, PT_S9(sp)
+ REG_S s10, PT_S10(sp)
+ REG_S s11, PT_S11(sp)
+ REG_S tp, PT_TP(sp)
+ REG_S t0, PT_T0(sp)
+ REG_S t1, PT_T1(sp)
+ REG_S t2, PT_T2(sp)
+ REG_S t3, PT_T3(sp)
+ REG_S t4, PT_T4(sp)
+ REG_S t5, PT_T5(sp)
+ REG_S t6, PT_T6(sp)
+ REG_S gp, PT_GP(sp)
+ REG_S a0, PT_A0(sp)
+ REG_S a1, PT_A1(sp)
+ REG_S a2, PT_A2(sp)
+ REG_S a3, PT_A3(sp)
+ REG_S a4, PT_A4(sp)
+ REG_S a5, PT_A5(sp)
+
+ /* Retrieve entry sp */
+ REG_L a4, SSE_REG_EVT_TMP(a7)
+ /* Save CSRs */
+ csrr a0, CSR_EPC
+ csrr a1, CSR_SSTATUS
+ csrr a2, CSR_STVAL
+ csrr a3, CSR_SCAUSE
+
+ REG_S a0, PT_EPC(sp)
+ REG_S a1, PT_STATUS(sp)
+ REG_S a2, PT_BADADDR(sp)
+ REG_S a3, PT_CAUSE(sp)
+ REG_S a4, PT_SP(sp)
+
+ /* Disable user memory access and floating/vector computing */
+ li t0, SR_SUM | SR_FS_VS
+ csrc CSR_STATUS, t0
+
+ load_global_pointer
+ scs_load_sse_stack a7
+
+ /* Restore current task struct from __sse_entry_task */
+ li t1, NR_CPUS
+ move t3, zero
+
+#ifdef CONFIG_SMP
+ /* Find the CPU id associated to the hart id */
+ la t0, __cpuid_to_hartid_map
+.Lhart_id_loop:
+ REG_L t2, 0(t0)
+ beq t2, a6, .Lcpu_id_found
+
+ /* Increment pointer and CPU number */
+ addi t3, t3, 1
+ addi t0, t0, RISCV_SZPTR
+ bltu t3, t1, .Lhart_id_loop
+
+ /*
+ * This should never happen since we expect the hart_id to match one
+ * of our CPU, but better be safe than sorry
+ */
+ la tp, init_task
+ la a0, sse_hart_id_panic_string
+ la t0, panic
+ jalr t0
+
+.Lcpu_id_found:
+#endif
+ asm_per_cpu_with_cpu t2 __sse_entry_task t1 t3
+ REG_L tp, 0(t2)
+
+ move a1, sp /* pt_regs on stack */
+ /* Kernel was interrupted, create stack frame */
+ beqz s1, .Lcall_do_sse
+
+.Lcall_do_sse:
+ /*
+ * Save sscratch for restoration since we might have interrupted the
+ * kernel in early exception path and thus, we don't know the content of
+ * sscratch.
+ */
+ csrr s4, CSR_SSCRATCH
+ /* In-kernel scratch is 0 */
+ csrw CSR_SCRATCH, x0
+
+ move a0, a7
+
+ call do_sse
+
+ csrw CSR_SSCRATCH, s4
+
+ REG_L a0, PT_EPC(sp)
+ REG_L a1, PT_STATUS(sp)
+ REG_L a2, PT_BADADDR(sp)
+ REG_L a3, PT_CAUSE(sp)
+ csrw CSR_EPC, a0
+ csrw CSR_SSTATUS, a1
+ csrw CSR_STVAL, a2
+ csrw CSR_SCAUSE, a3
+
+ REG_L ra, PT_RA(sp)
+ REG_L s0, PT_S0(sp)
+ REG_L s1, PT_S1(sp)
+ REG_L s2, PT_S2(sp)
+ REG_L s3, PT_S3(sp)
+ REG_L s4, PT_S4(sp)
+ REG_L s5, PT_S5(sp)
+ REG_L s6, PT_S6(sp)
+ REG_L s7, PT_S7(sp)
+ REG_L s8, PT_S8(sp)
+ REG_L s9, PT_S9(sp)
+ REG_L s10, PT_S10(sp)
+ REG_L s11, PT_S11(sp)
+ REG_L tp, PT_TP(sp)
+ REG_L t0, PT_T0(sp)
+ REG_L t1, PT_T1(sp)
+ REG_L t2, PT_T2(sp)
+ REG_L t3, PT_T3(sp)
+ REG_L t4, PT_T4(sp)
+ REG_L t5, PT_T5(sp)
+ REG_L t6, PT_T6(sp)
+ REG_L gp, PT_GP(sp)
+ REG_L a0, PT_A0(sp)
+ REG_L a1, PT_A1(sp)
+ REG_L a2, PT_A2(sp)
+ REG_L a3, PT_A3(sp)
+ REG_L a4, PT_A4(sp)
+ REG_L a5, PT_A5(sp)
+
+ REG_L sp, PT_SP(sp)
+
+ li a7, SBI_EXT_SSE
+ li a6, SBI_SSE_EVENT_COMPLETE
+ ecall
+
+SYM_CODE_END(handle_sse)
+
+sse_hart_id_panic_string:
+ .ascii "Unable to match hart_id with cpu\0"
--
2.45.2
* [PATCH v3 3/4] drivers: firmware: add riscv SSE support
2024-12-06 16:30 [PATCH v3 0/4] riscv: add support for SBI Supervisor Software Events Clément Léger
2024-12-06 16:30 ` [PATCH v3 1/4] riscv: add SBI SSE extension definitions Clément Léger
2024-12-06 16:30 ` [PATCH v3 2/4] riscv: add support for SBI Supervisor Software Events extension Clément Léger
@ 2024-12-06 16:30 ` Clément Léger
2024-12-13 5:03 ` Himanshu Chauhan
2025-01-16 13:58 ` Conor Dooley
2024-12-06 16:31 ` [PATCH v3 4/4] perf: RISC-V: add support for SSE event Clément Léger
3 siblings, 2 replies; 22+ messages in thread
From: Clément Léger @ 2024-12-06 16:30 UTC (permalink / raw)
To: Paul Walmsley, Palmer Dabbelt, linux-riscv, linux-kernel,
linux-arm-kernel
Cc: Clément Léger, Himanshu Chauhan, Anup Patel, Xu Lu,
Atish Patra
Add a driver-level interface to use the RISC-V SSE arch support. This
interface allows registering SSE handlers and receiving events. It will
be used by the PMU and GHES drivers.
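For reference, a typical consumer of this interface is expected to look
roughly as follows (hypothetical names, error handling trimmed):
  static int my_sse_handler(u32 evt, void *arg, struct pt_regs *regs)
  {
          /* Runs in an NMI-like context: keep the work minimal. */
          return 0;
  }
  ...
  event = sse_event_register(SBI_SSE_EVENT_LOCAL_PMU, 0, my_sse_handler, NULL);
  if (!IS_ERR(event) && sse_event_enable(event))
          sse_event_unregister(event);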
Signed-off-by: Himanshu Chauhan <hchauhan@ventanamicro.com>
Co-developed-by: Himanshu Chauhan <hchauhan@ventanamicro.com>
Signed-off-by: Clément Léger <cleger@rivosinc.com>
---
MAINTAINERS | 14 +
drivers/firmware/Kconfig | 1 +
drivers/firmware/Makefile | 1 +
drivers/firmware/riscv/Kconfig | 15 +
drivers/firmware/riscv/Makefile | 3 +
drivers/firmware/riscv/riscv_sse.c | 691 +++++++++++++++++++++++++++++
include/linux/riscv_sse.h | 56 +++
7 files changed, 781 insertions(+)
create mode 100644 drivers/firmware/riscv/Kconfig
create mode 100644 drivers/firmware/riscv/Makefile
create mode 100644 drivers/firmware/riscv/riscv_sse.c
create mode 100644 include/linux/riscv_sse.h
diff --git a/MAINTAINERS b/MAINTAINERS
index 686109008d8e..a3ddde7fe9fb 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -20125,6 +20125,13 @@ T: git git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux.git
F: Documentation/devicetree/bindings/iommu/riscv,iommu.yaml
F: drivers/iommu/riscv/
+RISC-V FIRMWARE DRIVERS
+M: Conor Dooley <conor@kernel.org>
+L: linux-riscv@lists.infradead.org
+S: Maintained
+T: git git://git.kernel.org/pub/scm/linux/kernel/git/conor/linux.git
+F: drivers/firmware/riscv/*
+
RISC-V MICROCHIP FPGA SUPPORT
M: Conor Dooley <conor.dooley@microchip.com>
M: Daire McNamara <daire.mcnamara@microchip.com>
@@ -20177,6 +20184,13 @@ F: drivers/perf/riscv_pmu.c
F: drivers/perf/riscv_pmu_legacy.c
F: drivers/perf/riscv_pmu_sbi.c
+RISC-V SSE DRIVER
+M: Clément Léger <cleger@rivosinc.com>
+L: linux-riscv@lists.infradead.org
+S: Maintained
+F: drivers/firmware/riscv/riscv_sse.c
+F: include/linux/riscv_sse.h
+
RISC-V THEAD SoC SUPPORT
M: Drew Fustini <drew@pdp7.com>
M: Guo Ren <guoren@kernel.org>
diff --git a/drivers/firmware/Kconfig b/drivers/firmware/Kconfig
index 71d8b26c4103..9e996a1fd511 100644
--- a/drivers/firmware/Kconfig
+++ b/drivers/firmware/Kconfig
@@ -267,6 +267,7 @@ source "drivers/firmware/meson/Kconfig"
source "drivers/firmware/microchip/Kconfig"
source "drivers/firmware/psci/Kconfig"
source "drivers/firmware/qcom/Kconfig"
+source "drivers/firmware/riscv/Kconfig"
source "drivers/firmware/smccc/Kconfig"
source "drivers/firmware/tegra/Kconfig"
source "drivers/firmware/xilinx/Kconfig"
diff --git a/drivers/firmware/Makefile b/drivers/firmware/Makefile
index 7a8d486e718f..c0f5009949a8 100644
--- a/drivers/firmware/Makefile
+++ b/drivers/firmware/Makefile
@@ -33,6 +33,7 @@ obj-y += efi/
obj-y += imx/
obj-y += psci/
obj-y += qcom/
+obj-y += riscv/
obj-y += smccc/
obj-y += tegra/
obj-y += xilinx/
diff --git a/drivers/firmware/riscv/Kconfig b/drivers/firmware/riscv/Kconfig
new file mode 100644
index 000000000000..8056ed3262d9
--- /dev/null
+++ b/drivers/firmware/riscv/Kconfig
@@ -0,0 +1,15 @@
+# SPDX-License-Identifier: GPL-2.0-only
+menu "Risc-V Specific firmware drivers"
+depends on RISCV
+
+config RISCV_SSE
+ bool "Enable SBI Supervisor Software Events support"
+ depends on RISCV_SBI
+ default y
+ help
+ The Supervisor Software Events support allows the SBI to deliver
+ NMI-like notifications to supervisor mode software. When enabled,
+ this option provides support for registering callbacks on specific
+ SSE events.
+
+endmenu
diff --git a/drivers/firmware/riscv/Makefile b/drivers/firmware/riscv/Makefile
new file mode 100644
index 000000000000..4ccfcbbc28ea
--- /dev/null
+++ b/drivers/firmware/riscv/Makefile
@@ -0,0 +1,3 @@
+# SPDX-License-Identifier: GPL-2.0
+
+obj-$(CONFIG_RISCV_SSE) += riscv_sse.o
diff --git a/drivers/firmware/riscv/riscv_sse.c b/drivers/firmware/riscv/riscv_sse.c
new file mode 100644
index 000000000000..c165e32cc9a5
--- /dev/null
+++ b/drivers/firmware/riscv/riscv_sse.c
@@ -0,0 +1,691 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2024 Rivos Inc.
+ */
+
+#define pr_fmt(fmt) "sse: " fmt
+
+#include <linux/cpu.h>
+#include <linux/cpuhotplug.h>
+#include <linux/cpu_pm.h>
+#include <linux/hardirq.h>
+#include <linux/list.h>
+#include <linux/percpu-defs.h>
+#include <linux/reboot.h>
+#include <linux/riscv_sse.h>
+#include <linux/slab.h>
+
+#include <asm/sbi.h>
+#include <asm/sse.h>
+
+struct sse_event {
+ struct list_head list;
+ u32 evt;
+ u32 priority;
+ sse_event_handler *handler;
+ void *handler_arg;
+ bool is_enabled;
+ /* Only valid for global events */
+ unsigned int cpu;
+
+ union {
+ struct sse_registered_event *global;
+ struct sse_registered_event __percpu *local;
+ };
+};
+
+static int sse_hp_state;
+static bool sse_available;
+static DEFINE_SPINLOCK(events_list_lock);
+static LIST_HEAD(events);
+static DEFINE_MUTEX(sse_mutex);
+
+struct sse_registered_event {
+ struct sse_event_arch_data arch;
+ struct sse_event *evt;
+ unsigned long attr_buf;
+};
+
+void sse_handle_event(struct sse_event_arch_data *arch_event,
+ struct pt_regs *regs)
+{
+ int ret;
+ struct sse_registered_event *reg_evt =
+ container_of(arch_event, struct sse_registered_event, arch);
+ struct sse_event *evt = reg_evt->evt;
+
+ ret = evt->handler(evt->evt, evt->handler_arg, regs);
+ if (ret)
+ pr_warn("event %x handler failed with error %d\n", evt->evt,
+ ret);
+}
+
+static bool sse_event_is_global(u32 evt)
+{
+ return !!(evt & SBI_SSE_EVENT_GLOBAL);
+}
+
+static
+struct sse_event *sse_event_get(u32 evt)
+{
+ struct sse_event *tmp;
+
+ scoped_guard(spinlock, &events_list_lock) {
+ list_for_each_entry(tmp, &events, list) {
+ if (tmp->evt == evt) {
+ return tmp;
+ }
+ }
+ }
+
+ return NULL;
+}
+
+static phys_addr_t sse_event_get_phys(struct sse_registered_event *reg_evt,
+ void *addr)
+{
+ phys_addr_t phys;
+
+ if (sse_event_is_global(reg_evt->evt->evt))
+ phys = virt_to_phys(addr);
+ else
+ phys = per_cpu_ptr_to_phys(addr);
+
+ return phys;
+}
+
+static int sse_sbi_event_func(struct sse_event *event, unsigned long func)
+{
+ struct sbiret ret;
+ u32 evt = event->evt;
+
+ ret = sbi_ecall(SBI_EXT_SSE, func, evt, 0, 0, 0, 0, 0);
+ if (ret.error)
+ pr_debug("Failed to execute func %lx, event %x, error %ld\n",
+ func, evt, ret.error);
+
+ return sbi_err_map_linux_errno(ret.error);
+}
+
+static int sse_sbi_disable_event(struct sse_event *event)
+{
+ return sse_sbi_event_func(event, SBI_SSE_EVENT_DISABLE);
+}
+
+static int sse_sbi_enable_event(struct sse_event *event)
+{
+ return sse_sbi_event_func(event, SBI_SSE_EVENT_ENABLE);
+}
+
+static int sse_event_attr_get_no_lock(struct sse_registered_event *reg_evt,
+ unsigned long attr_id, unsigned long *val)
+{
+ struct sbiret sret;
+ u32 evt = reg_evt->evt->evt;
+ unsigned long phys;
+
+ phys = sse_event_get_phys(reg_evt, &reg_evt->attr_buf);
+
+ sret = sbi_ecall(SBI_EXT_SSE, SBI_SSE_EVENT_ATTR_READ, evt,
+ attr_id, 1, phys, 0, 0);
+ if (sret.error) {
+ pr_debug("Failed to get event %x attr %lx, error %ld\n", evt,
+ attr_id, sret.error);
+ return sbi_err_map_linux_errno(sret.error);
+ }
+
+ *val = reg_evt->attr_buf;
+
+ return 0;
+}
+
+static int sse_event_attr_set_nolock(struct sse_registered_event *reg_evt,
+ unsigned long attr_id, unsigned long val)
+{
+ struct sbiret sret;
+ u32 evt = reg_evt->evt->evt;
+ unsigned long phys;
+
+ reg_evt->attr_buf = val;
+ phys = sse_event_get_phys(reg_evt, &reg_evt->attr_buf);
+
+ sret = sbi_ecall(SBI_EXT_SSE, SBI_SSE_EVENT_ATTR_WRITE, evt,
+ attr_id, 1, phys, 0, 0);
+ if (sret.error && sret.error != SBI_ERR_INVALID_STATE) {
+ pr_debug("Failed to set event %x attr %lx, error %ld\n", evt,
+ attr_id, sret.error);
+ return sbi_err_map_linux_errno(sret.error);
+ }
+
+ return 0;
+}
+
+static int sse_event_set_target_cpu_nolock(struct sse_event *event,
+ unsigned int cpu)
+{
+ unsigned int hart_id = cpuid_to_hartid_map(cpu);
+ struct sse_registered_event *reg_evt = event->global;
+ u32 evt = event->evt;
+ bool was_enabled;
+ int ret;
+
+ if (!sse_event_is_global(evt))
+ return -EINVAL;
+
+ was_enabled = event->is_enabled;
+ if (was_enabled)
+ sse_sbi_disable_event(event);
+ do {
+ ret = sse_event_attr_set_nolock(reg_evt,
+ SBI_SSE_ATTR_PREFERRED_HART,
+ hart_id);
+ } while (ret == -EINVAL);
+
+ if (ret == 0)
+ event->cpu = cpu;
+
+ if (was_enabled)
+ sse_sbi_enable_event(event);
+
+ return ret;
+}
+
+int sse_event_set_target_cpu(struct sse_event *event, unsigned int cpu)
+{
+ int ret;
+
+ scoped_guard(mutex, &sse_mutex) {
+ cpus_read_lock();
+
+ if (!cpu_online(cpu))
+ ret = -EINVAL;
+ else
+ ret = sse_event_set_target_cpu_nolock(event, cpu);
+
+ cpus_read_unlock();
+ }
+
+ return ret;
+}
+
+static int sse_event_init_registered(unsigned int cpu,
+ struct sse_registered_event *reg_evt,
+ struct sse_event *event)
+{
+ reg_evt->evt = event;
+ arch_sse_init_event(&reg_evt->arch, event->evt, cpu);
+
+ return 0;
+}
+
+static void sse_event_free_registered(struct sse_registered_event *reg_evt)
+{
+ arch_sse_free_event(&reg_evt->arch);
+}
+
+static int sse_event_alloc_global(struct sse_event *event)
+{
+ int err;
+ struct sse_registered_event *reg_evt;
+
+ reg_evt = kzalloc(sizeof(*reg_evt), GFP_KERNEL);
+ if (!reg_evt)
+ return -ENOMEM;
+
+ event->global = reg_evt;
+ err = sse_event_init_registered(smp_processor_id(), reg_evt,
+ event);
+ if (err)
+ kfree(reg_evt);
+
+ return err;
+}
+
+static int sse_event_alloc_local(struct sse_event *event)
+{
+ int err;
+ unsigned int cpu, err_cpu;
+ struct sse_registered_event *reg_evt;
+ struct sse_registered_event __percpu *reg_evts;
+
+ reg_evts = alloc_percpu(struct sse_registered_event);
+ if (!reg_evts)
+ return -ENOMEM;
+
+ event->local = reg_evts;
+
+ for_each_possible_cpu(cpu) {
+ reg_evt = per_cpu_ptr(reg_evts, cpu);
+ err = sse_event_init_registered(cpu, reg_evt, event);
+ if (err) {
+ err_cpu = cpu;
+ goto err_free_per_cpu;
+ }
+ }
+
+ return 0;
+
+err_free_per_cpu:
+ for_each_possible_cpu(cpu) {
+ if (cpu == err_cpu)
+ break;
+ reg_evt = per_cpu_ptr(reg_evts, cpu);
+ sse_event_free_registered(reg_evt);
+ }
+
+ free_percpu(reg_evts);
+
+ return err;
+}
+
+static struct sse_event *sse_event_alloc(u32 evt,
+ u32 priority,
+ sse_event_handler *handler, void *arg)
+{
+ int err;
+ struct sse_event *event;
+
+ event = kzalloc(sizeof(*event), GFP_KERNEL);
+ if (!event)
+ return ERR_PTR(-ENOMEM);
+
+ event->evt = evt;
+ event->priority = priority;
+ event->handler_arg = arg;
+ event->handler = handler;
+
+ if (sse_event_is_global(evt)) {
+ err = sse_event_alloc_global(event);
+ if (err)
+ goto err_alloc_reg_evt;
+ } else {
+ err = sse_event_alloc_local(event);
+ if (err)
+ goto err_alloc_reg_evt;
+ }
+
+ return event;
+
+err_alloc_reg_evt:
+ kfree(event);
+
+ return ERR_PTR(err);
+}
+
+static int sse_sbi_register_event(struct sse_event *event,
+ struct sse_registered_event *reg_evt)
+{
+ int ret;
+
+ ret = sse_event_attr_set_nolock(reg_evt, SBI_SSE_ATTR_PRIO,
+ event->priority);
+ if (ret)
+ return ret;
+
+ return arch_sse_register_event(&reg_evt->arch);
+}
+
+static int sse_event_register_local(struct sse_event *event)
+{
+ int ret;
+ struct sse_registered_event *reg_evt = per_cpu_ptr(event->local,
+ smp_processor_id());
+
+ ret = sse_sbi_register_event(event, reg_evt);
+ if (ret)
+ pr_debug("Failed to register event %x: err %d\n", event->evt,
+ ret);
+
+ return ret;
+}
+
+
+static int sse_sbi_unregister_event(struct sse_event *event)
+{
+ return sse_sbi_event_func(event, SBI_SSE_EVENT_UNREGISTER);
+}
+
+struct sse_per_cpu_evt {
+ struct sse_event *event;
+ unsigned long func;
+ atomic_t error;
+};
+
+static void sse_event_per_cpu_func(void *info)
+{
+ int ret;
+ struct sse_per_cpu_evt *cpu_evt = info;
+
+ if (cpu_evt->func == SBI_SSE_EVENT_REGISTER)
+ ret = sse_event_register_local(cpu_evt->event);
+ else
+ ret = sse_sbi_event_func(cpu_evt->event, cpu_evt->func);
+
+ if (ret)
+ atomic_set(&cpu_evt->error, ret);
+}
+
+static void sse_event_free(struct sse_event *event)
+{
+ unsigned int cpu;
+ struct sse_registered_event *reg_evt;
+
+ if (sse_event_is_global(event->evt)) {
+ sse_event_free_registered(event->global);
+ kfree(event->global);
+ } else {
+ for_each_possible_cpu(cpu) {
+ reg_evt = per_cpu_ptr(event->local, cpu);
+ sse_event_free_registered(reg_evt);
+ }
+ free_percpu(event->local);
+ }
+
+ kfree(event);
+}
+
+int sse_event_enable(struct sse_event *event)
+{
+ int ret = 0;
+ struct sse_per_cpu_evt cpu_evt;
+
+ scoped_guard(mutex, &sse_mutex) {
+ cpus_read_lock();
+ if (sse_event_is_global(event->evt)) {
+ ret = sse_sbi_enable_event(event);
+ } else {
+ cpu_evt.event = event;
+ atomic_set(&cpu_evt.error, 0);
+ cpu_evt.func = SBI_SSE_EVENT_ENABLE;
+ on_each_cpu(sse_event_per_cpu_func, &cpu_evt, 1);
+ ret = atomic_read(&cpu_evt.error);
+ if (ret) {
+ cpu_evt.func = SBI_SSE_EVENT_DISABLE;
+ on_each_cpu(sse_event_per_cpu_func, &cpu_evt,
+ 1);
+ }
+ }
+ cpus_read_unlock();
+
+ if (ret == 0)
+ event->is_enabled = true;
+ }
+
+ return ret;
+}
+
+static void sse_events_mask(void)
+{
+ sbi_ecall(SBI_EXT_SSE, SBI_SSE_EVENT_HART_MASK, 0, 0, 0, 0, 0, 0);
+}
+
+static void sse_events_unmask(void)
+{
+ sbi_ecall(SBI_EXT_SSE, SBI_SSE_EVENT_HART_UNMASK, 0, 0, 0, 0, 0, 0);
+}
+
+static void sse_event_disable_nolock(struct sse_event *event)
+{
+ struct sse_per_cpu_evt cpu_evt;
+
+ if (sse_event_is_global(event->evt)) {
+ sse_sbi_disable_event(event);
+ } else {
+ cpu_evt.event = event;
+ cpu_evt.func = SBI_SSE_EVENT_DISABLE;
+ on_each_cpu(sse_event_per_cpu_func, &cpu_evt, 1);
+ }
+}
+
+void sse_event_disable(struct sse_event *event)
+{
+ scoped_guard(mutex, &sse_mutex) {
+ cpus_read_lock();
+ sse_event_disable_nolock(event);
+ event->is_enabled = false;
+ cpus_read_unlock();
+ }
+}
+
+struct sse_event *sse_event_register(u32 evt, u32 priority,
+ sse_event_handler *handler, void *arg)
+{
+ struct sse_per_cpu_evt cpu_evt;
+ struct sse_event *event;
+ int ret = 0;
+
+ if (!sse_available)
+ return ERR_PTR(-EOPNOTSUPP);
+
+ mutex_lock(&sse_mutex);
+ if (sse_event_get(evt)) {
+ pr_debug("Event %x already registered\n", evt);
+ ret = -EEXIST;
+ goto out_unlock;
+ }
+
+ event = sse_event_alloc(evt, priority, handler, arg);
+ if (IS_ERR(event)) {
+ ret = PTR_ERR(event);
+ goto out_unlock;
+ }
+
+ cpus_read_lock();
+ if (sse_event_is_global(evt)) {
+ unsigned long preferred_hart;
+
+ ret = sse_event_attr_get_no_lock(event->global,
+ SBI_SSE_ATTR_PREFERRED_HART,
+ &preferred_hart);
+ if (ret)
+ goto err_event_free;
+ event->cpu = riscv_hartid_to_cpuid(preferred_hart);
+
+ ret = sse_sbi_register_event(event, event->global);
+ if (ret)
+ goto err_event_free;
+
+ } else {
+ cpu_evt.event = event;
+ atomic_set(&cpu_evt.error, 0);
+ cpu_evt.func = SBI_SSE_EVENT_REGISTER;
+ on_each_cpu(sse_event_per_cpu_func, &cpu_evt, 1);
+ ret = atomic_read(&cpu_evt.error);
+ if (ret) {
+ cpu_evt.func = SBI_SSE_EVENT_UNREGISTER;
+ on_each_cpu(sse_event_per_cpu_func, &cpu_evt, 1);
+ goto err_event_free;
+ }
+ }
+ cpus_read_unlock();
+
+ scoped_guard(spinlock, &events_list_lock)
+ list_add(&event->list, &events);
+
+ mutex_unlock(&sse_mutex);
+
+ return event;
+
+err_event_free:
+ cpus_read_unlock();
+ sse_event_free(event);
+out_unlock:
+ mutex_unlock(&sse_mutex);
+
+ return ERR_PTR(ret);
+}
+
+static void sse_event_unregister_nolock(struct sse_event *event)
+{
+ struct sse_per_cpu_evt cpu_evt;
+
+ if (sse_event_is_global(event->evt)) {
+ sse_sbi_unregister_event(event);
+ } else {
+ cpu_evt.event = event;
+ cpu_evt.func = SBI_SSE_EVENT_UNREGISTER;
+ on_each_cpu(sse_event_per_cpu_func, &cpu_evt, 1);
+ }
+}
+
+void sse_event_unregister(struct sse_event *event)
+{
+ scoped_guard(mutex, &sse_mutex) {
+ cpus_read_lock();
+ sse_event_unregister_nolock(event);
+ cpus_read_unlock();
+
+ scoped_guard(spinlock, &events_list_lock)
+ list_del(&event->list);
+
+ sse_event_free(event);
+ }
+}
+
+static int sse_cpu_online(unsigned int cpu)
+{
+ struct sse_event *sse_evt;
+
+ scoped_guard(spinlock, &events_list_lock) {
+ list_for_each_entry(sse_evt, &events, list) {
+ if (sse_event_is_global(sse_evt->evt))
+ continue;
+
+ sse_event_register_local(sse_evt);
+ if (sse_evt->is_enabled)
+ sse_sbi_enable_event(sse_evt);
+ }
+ }
+
+ /* Ready to handle events. Unmask SSE. */
+ sse_events_unmask();
+
+ return 0;
+}
+
+static int sse_cpu_teardown(unsigned int cpu)
+{
+ unsigned int next_cpu;
+ struct sse_event *sse_evt;
+
+ /* Mask the sse events */
+ sse_events_mask();
+
+ scoped_guard(spinlock, &events_list_lock) {
+ list_for_each_entry(sse_evt, &events, list) {
+ if (!sse_event_is_global(sse_evt->evt)) {
+
+ if (sse_evt->is_enabled)
+ sse_sbi_disable_event(sse_evt);
+
+ sse_sbi_unregister_event(sse_evt);
+ continue;
+ }
+
+ if (sse_evt->cpu != smp_processor_id())
+ continue;
+
+ /* Update destination hart for global event */
+ next_cpu = cpumask_any_but(cpu_online_mask, cpu);
+ sse_event_set_target_cpu_nolock(sse_evt, next_cpu);
+ }
+ }
+
+ return 0;
+}
+
+static void sse_reset(void)
+{
+ struct sse_event *event = NULL;
+
+ list_for_each_entry(event, &events, list) {
+ sse_event_disable_nolock(event);
+ sse_event_unregister_nolock(event);
+ }
+}
+
+static int sse_pm_notifier(struct notifier_block *nb, unsigned long action,
+ void *data)
+{
+ WARN_ON_ONCE(preemptible());
+
+ switch (action) {
+ case CPU_PM_ENTER:
+ sse_events_mask();
+ break;
+ case CPU_PM_EXIT:
+ case CPU_PM_ENTER_FAILED:
+ sse_events_unmask();
+ break;
+ default:
+ return NOTIFY_DONE;
+ }
+
+ return NOTIFY_OK;
+}
+
+static struct notifier_block sse_pm_nb = {
+ .notifier_call = sse_pm_notifier,
+};
+
+/*
+ * Mask all CPUs and unregister all events on panic, reboot or kexec.
+ */
+static int sse_reboot_notifier(struct notifier_block *nb, unsigned long action,
+ void *data)
+{
+ cpuhp_remove_state(sse_hp_state);
+
+ sse_reset();
+
+ return NOTIFY_OK;
+}
+
+static struct notifier_block sse_reboot_nb = {
+ .notifier_call = sse_reboot_notifier,
+};
+
+static int __init sse_init(void)
+{
+ int cpu, ret;
+
+ if (sbi_probe_extension(SBI_EXT_SSE) <= 0) {
+ pr_err("Missing SBI SSE extension\n");
+ return -EOPNOTSUPP;
+ }
+ pr_info("SBI SSE extension detected\n");
+
+ for_each_possible_cpu(cpu)
+ INIT_LIST_HEAD(&events);
+
+ ret = cpu_pm_register_notifier(&sse_pm_nb);
+ if (ret) {
+ pr_warn("Failed to register CPU PM notifier...\n");
+ return ret;
+ }
+
+ ret = register_reboot_notifier(&sse_reboot_nb);
+ if (ret) {
+ pr_warn("Failed to register reboot notifier...\n");
+ goto remove_cpupm;
+ }
+
+ ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "riscv/sse:online",
+ sse_cpu_online, sse_cpu_teardown);
+ if (ret < 0)
+ goto remove_reboot;
+
+ sse_hp_state = ret;
+ sse_available = true;
+
+ return 0;
+
+remove_reboot:
+ unregister_reboot_notifier(&sse_reboot_nb);
+
+remove_cpupm:
+ cpu_pm_unregister_notifier(&sse_pm_nb);
+
+ return ret;
+}
+arch_initcall(sse_init);
diff --git a/include/linux/riscv_sse.h b/include/linux/riscv_sse.h
new file mode 100644
index 000000000000..c73184074b8c
--- /dev/null
+++ b/include/linux/riscv_sse.h
@@ -0,0 +1,56 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2024 Rivos Inc.
+ */
+
+#ifndef __LINUX_RISCV_SSE_H
+#define __LINUX_RISCV_SSE_H
+
+#include <linux/types.h>
+#include <linux/linkage.h>
+
+struct sse_event;
+struct pt_regs;
+
+typedef int (sse_event_handler)(u32 event_num, void *arg, struct pt_regs *regs);
+
+#ifdef CONFIG_RISCV_SSE
+
+struct sse_event *sse_event_register(u32 event_num, u32 priority,
+ sse_event_handler *handler, void *arg);
+
+void sse_event_unregister(struct sse_event *evt);
+
+int sse_event_set_target_cpu(struct sse_event *sse_evt, unsigned int cpu);
+
+int sse_event_enable(struct sse_event *sse_evt);
+
+void sse_event_disable(struct sse_event *sse_evt);
+
+#else
+static inline struct sse_event *sse_event_register(u32 event_num, u32 priority,
+ sse_event_handler *handler,
+ void *arg)
+{
+ return ERR_PTR(-EOPNOTSUPP);
+}
+
+static inline void sse_event_unregister(struct sse_event *evt) {}
+
+static inline int sse_event_set_target_cpu(struct sse_event *sse_evt,
+ unsigned int cpu)
+{
+ return -EOPNOTSUPP;
+}
+
+static inline int sse_event_enable(struct sse_event *sse_evt)
+{
+ return -EOPNOTSUPP;
+}
+
+static inline void sse_event_disable(struct sse_event *sse_evt) {}
+
+
+#endif
+
+#endif /* __LINUX_RISCV_SSE_H */
--
2.45.2
* [PATCH v3 4/4] perf: RISC-V: add support for SSE event
2024-12-06 16:30 [PATCH v3 0/4] riscv: add support for SBI Supervisor Software Events Clément Léger
` (2 preceding siblings ...)
2024-12-06 16:30 ` [PATCH v3 3/4] drivers: firmware: add riscv SSE support Clément Léger
@ 2024-12-06 16:31 ` Clément Léger
3 siblings, 0 replies; 22+ messages in thread
From: Clément Léger @ 2024-12-06 16:31 UTC (permalink / raw)
To: Paul Walmsley, Palmer Dabbelt, linux-riscv, linux-kernel,
linux-arm-kernel
Cc: Clément Léger, Himanshu Chauhan, Anup Patel, Xu Lu,
Atish Patra
In order to use SSE within PMU drivers, register an SSE handler for the
local PMU event. Reuse the existing overflow IRQ handler and pass the
appropriate pt_regs.
Signed-off-by: Clément Léger <cleger@rivosinc.com>
---
drivers/perf/riscv_pmu_sbi.c | 51 +++++++++++++++++++++++++++++-------
1 file changed, 41 insertions(+), 10 deletions(-)
diff --git a/drivers/perf/riscv_pmu_sbi.c b/drivers/perf/riscv_pmu_sbi.c
index 1aa303f76cc7..bd7ab15483db 100644
--- a/drivers/perf/riscv_pmu_sbi.c
+++ b/drivers/perf/riscv_pmu_sbi.c
@@ -17,6 +17,7 @@
#include <linux/irqdomain.h>
#include <linux/of_irq.h>
#include <linux/of.h>
+#include <linux/riscv_sse.h>
#include <linux/cpu_pm.h>
#include <linux/sched/clock.h>
#include <linux/soc/andes/irq.h>
@@ -946,10 +947,10 @@ static void pmu_sbi_start_overflow_mask(struct riscv_pmu *pmu,
pmu_sbi_start_ovf_ctrs_sbi(cpu_hw_evt, ctr_ovf_mask);
}
-static irqreturn_t pmu_sbi_ovf_handler(int irq, void *dev)
+static irqreturn_t pmu_sbi_ovf_handler(struct cpu_hw_events *cpu_hw_evt,
+ struct pt_regs *regs, bool from_sse)
{
struct perf_sample_data data;
- struct pt_regs *regs;
struct hw_perf_event *hw_evt;
union sbi_pmu_ctr_info *info;
int lidx, hidx, fidx;
@@ -957,7 +958,6 @@ static irqreturn_t pmu_sbi_ovf_handler(int irq, void *dev)
struct perf_event *event;
u64 overflow;
u64 overflowed_ctrs = 0;
- struct cpu_hw_events *cpu_hw_evt = dev;
u64 start_clock = sched_clock();
struct riscv_pmu_snapshot_data *sdata = cpu_hw_evt->snapshot_addr;
@@ -967,13 +967,15 @@ static irqreturn_t pmu_sbi_ovf_handler(int irq, void *dev)
/* Firmware counter don't support overflow yet */
fidx = find_first_bit(cpu_hw_evt->used_hw_ctrs, RISCV_MAX_COUNTERS);
if (fidx == RISCV_MAX_COUNTERS) {
- csr_clear(CSR_SIP, BIT(riscv_pmu_irq_num));
+ if (!from_sse)
+ csr_clear(CSR_SIP, BIT(riscv_pmu_irq_num));
return IRQ_NONE;
}
event = cpu_hw_evt->events[fidx];
if (!event) {
- ALT_SBI_PMU_OVF_CLEAR_PENDING(riscv_pmu_irq_mask);
+ if (!from_sse)
+ ALT_SBI_PMU_OVF_CLEAR_PENDING(riscv_pmu_irq_mask);
return IRQ_NONE;
}
@@ -988,16 +990,16 @@ static irqreturn_t pmu_sbi_ovf_handler(int irq, void *dev)
/*
* Overflow interrupt pending bit should only be cleared after stopping
- * all the counters to avoid any race condition.
+ * all the counters to avoid any race condition. When using SSE,
+ * the interrupt is cleared when stopping the counters.
*/
- ALT_SBI_PMU_OVF_CLEAR_PENDING(riscv_pmu_irq_mask);
+ if (!from_sse)
+ ALT_SBI_PMU_OVF_CLEAR_PENDING(riscv_pmu_irq_mask);
/* No overflow bit is set */
if (!overflow)
return IRQ_NONE;
- regs = get_irq_regs();
-
for_each_set_bit(lidx, cpu_hw_evt->used_hw_ctrs, RISCV_MAX_COUNTERS) {
struct perf_event *event = cpu_hw_evt->events[lidx];
@@ -1053,6 +1055,22 @@ static irqreturn_t pmu_sbi_ovf_handler(int irq, void *dev)
return IRQ_HANDLED;
}
+static irqreturn_t pmu_sbi_ovf_irq_handler(int irq, void *dev)
+{
+ return pmu_sbi_ovf_handler(dev, get_irq_regs(), false);
+}
+
+static int pmu_sbi_ovf_sse_handler(uint32_t evt, void *arg,
+ struct pt_regs *regs)
+{
+ struct cpu_hw_events __percpu *hw_events = arg;
+ struct cpu_hw_events *hw_event = raw_cpu_ptr(hw_events);
+
+ pmu_sbi_ovf_handler(hw_event, regs, true);
+
+ return 0;
+}
+
static int pmu_sbi_starting_cpu(unsigned int cpu, struct hlist_node *node)
{
struct riscv_pmu *pmu = hlist_entry_safe(node, struct riscv_pmu, node);
@@ -1100,9 +1118,22 @@ static int pmu_sbi_dying_cpu(unsigned int cpu, struct hlist_node *node)
static int pmu_sbi_setup_irqs(struct riscv_pmu *pmu, struct platform_device *pdev)
{
int ret;
+ struct sse_event *evt;
struct cpu_hw_events __percpu *hw_events = pmu->hw_events;
struct irq_domain *domain = NULL;
+ evt = sse_event_register(SBI_SSE_EVENT_LOCAL_PMU, 0,
+ pmu_sbi_ovf_sse_handler, hw_events);
+ if (!IS_ERR(evt)) {
+ ret = sse_event_enable(evt);
+ if (!ret) {
+ pr_info("using SSE for PMU event delivery\n");
+ return 0;
+ }
+
+ sse_event_unregister(evt);
+ }
+
if (riscv_isa_extension_available(NULL, SSCOFPMF)) {
riscv_pmu_irq_num = RV_IRQ_PMU;
riscv_pmu_use_irq = true;
@@ -1137,7 +1168,7 @@ static int pmu_sbi_setup_irqs(struct riscv_pmu *pmu, struct platform_device *pde
return -ENODEV;
}
- ret = request_percpu_irq(riscv_pmu_irq, pmu_sbi_ovf_handler, "riscv-pmu", hw_events);
+ ret = request_percpu_irq(riscv_pmu_irq, pmu_sbi_ovf_irq_handler, "riscv-pmu", hw_events);
if (ret) {
pr_err("registering percpu irq failed [%d]\n", ret);
return ret;
--
2.45.2
* Re: [PATCH v3 2/4] riscv: add support for SBI Supervisor Software Events extension
2024-12-06 16:30 ` [PATCH v3 2/4] riscv: add support for SBI Supervisor Software Events extension Clément Léger
@ 2024-12-10 4:51 ` Himanshu Chauhan
2025-01-22 12:15 ` Alexandre Ghiti
2025-03-19 17:08 ` Andrew Jones
2 siblings, 0 replies; 22+ messages in thread
From: Himanshu Chauhan @ 2024-12-10 4:51 UTC (permalink / raw)
To: Clement Leger
Cc: Paul Walmsley, Palmer Dabbelt, linux-riscv, linux-kernel,
linux-arm-kernel, Anup Patel, Xu Lu, Atish Patra
Hi Clement,
> On 6 Dec 2024, at 22:00, Clément Léger <cleger@rivosinc.com> wrote:
>
> The SBI SSE extension allows supervisor software to be notified by the
> SBI of specific events that are not maskable. The context switch is
> handled partially by the firmware, which saves registers a6 and a7. When
> entering the kernel, we can rely on these two registers to set up the
> stack and save all the registers.
>
> Since SSE events can be delivered to the kernel at any time (including
> during exception handling), we need a way to locate the current task for
> context tracking. On RISC-V, it is stored in the SSCRATCH CSR when in
> user space or in tp when in kernel space (in which case SSCRATCH is
> zero). But at the beginning of exception handling, SSCRATCH is used to
> swap tp and check the origin of the exception. If interrupted at that
> point, there is no way to reliably know where the current task_struct is
> located. Even checking the interruption location won't work, as SSE
> events can be nested on top of each other, so the original interruption
> site might be lost at some point. In order to retrieve it reliably,
> store the current task in an additional __sse_entry_task per-cpu array.
> This array is then used to retrieve the current task based on the
> hart ID that is passed to the SSE event handler in a6.
>
> That being said, the way the current task struct is stored should
> probably be reworked to find a more reliable alternative.
>
> Since each event (and each CPU for local events) has its own context
> and events can preempt each other, allocate a stack (and a shadow stack
> if needed) for each of them (and for each CPU for local events).
>
> When completing the event, if we were coming from the kernel with
> interrupts disabled, simply return there. If coming from userspace or
> from the kernel with interrupts enabled, simulate an interrupt exception
> by setting IE_SIE in CSR_IP to allow delivery of signals to the user
> task. This can happen, for instance, when a RAS event has been generated
> by a user application and a SIGBUS has been sent to a task.
>
> Signed-off-by: Clément Léger <cleger@rivosinc.com>
> ---
> arch/riscv/include/asm/asm.h | 14 ++-
> arch/riscv/include/asm/scs.h | 7 ++
> arch/riscv/include/asm/sse.h | 38 ++++++
> arch/riscv/include/asm/switch_to.h | 14 +++
> arch/riscv/include/asm/thread_info.h | 1 +
> arch/riscv/kernel/Makefile | 1 +
> arch/riscv/kernel/asm-offsets.c | 12 ++
> arch/riscv/kernel/sse.c | 134 +++++++++++++++++++++
> arch/riscv/kernel/sse_entry.S | 171 +++++++++++++++++++++++++++
> 9 files changed, 389 insertions(+), 3 deletions(-)
> create mode 100644 arch/riscv/include/asm/sse.h
> create mode 100644 arch/riscv/kernel/sse.c
> create mode 100644 arch/riscv/kernel/sse_entry.S
>
> diff --git a/arch/riscv/include/asm/asm.h b/arch/riscv/include/asm/asm.h
> index 776354895b81..de8427c58f02 100644
> --- a/arch/riscv/include/asm/asm.h
> +++ b/arch/riscv/include/asm/asm.h
> @@ -89,16 +89,24 @@
> #define PER_CPU_OFFSET_SHIFT 3
> #endif
>
> -.macro asm_per_cpu dst sym tmp
> - REG_L \tmp, TASK_TI_CPU_NUM(tp)
> - slli \tmp, \tmp, PER_CPU_OFFSET_SHIFT
> +.macro asm_per_cpu_with_cpu dst sym tmp cpu
> + slli \tmp, \cpu, PER_CPU_OFFSET_SHIFT
> la \dst, __per_cpu_offset
> add \dst, \dst, \tmp
> REG_L \tmp, 0(\dst)
> la \dst, \sym
> add \dst, \dst, \tmp
> .endm
> +
> +.macro asm_per_cpu dst sym tmp
> + REG_L \tmp, TASK_TI_CPU_NUM(tp)
> + asm_per_cpu_with_cpu \dst \sym \tmp \tmp
> +.endm
> #else /* CONFIG_SMP */
> +.macro asm_per_cpu_with_cpu dst sym tmp cpu
> + la \dst, \sym
> +.endm
> +
> .macro asm_per_cpu dst sym tmp
> la \dst, \sym
> .endm
> diff --git a/arch/riscv/include/asm/scs.h b/arch/riscv/include/asm/scs.h
> index 0e45db78b24b..62344daad73d 100644
> --- a/arch/riscv/include/asm/scs.h
> +++ b/arch/riscv/include/asm/scs.h
> @@ -18,6 +18,11 @@
> load_per_cpu gp, irq_shadow_call_stack_ptr, \tmp
> .endm
>
> +/* Load the per-CPU IRQ shadow call stack to gp. */
> +.macro scs_load_sse_stack reg_evt
> + REG_L gp, SSE_REG_EVT_SHADOW_STACK(\reg_evt)
> +.endm
> +
> /* Load task_scs_sp(current) to gp. */
> .macro scs_load_current
> REG_L gp, TASK_TI_SCS_SP(tp)
> @@ -41,6 +46,8 @@
> .endm
> .macro scs_load_irq_stack tmp
> .endm
> +.macro scs_load_sse_stack reg_evt
> +.endm
> .macro scs_load_current
> .endm
> .macro scs_load_current_if_task_changed prev
> diff --git a/arch/riscv/include/asm/sse.h b/arch/riscv/include/asm/sse.h
> new file mode 100644
> index 000000000000..431a19d4cd9c
> --- /dev/null
> +++ b/arch/riscv/include/asm/sse.h
> @@ -0,0 +1,38 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (C) 2024 Rivos Inc.
> + */
> +#ifndef __ASM_SSE_H
> +#define __ASM_SSE_H
> +
> +#ifdef CONFIG_RISCV_SSE
> +
> +struct sse_event_interrupted_state {
> + unsigned long a6;
> + unsigned long a7;
> +};
> +
> +struct sse_event_arch_data {
> + void *stack;
> + void *shadow_stack;
> + unsigned long tmp;
> + struct sse_event_interrupted_state interrupted;
> + unsigned long interrupted_state_phys;
> + u32 evt_id;
> +};
> +
> +struct sse_registered_event;
> +int arch_sse_init_event(struct sse_event_arch_data *arch_evt, u32 evt_id,
> + int cpu);
> +void arch_sse_free_event(struct sse_event_arch_data *arch_evt);
> +int arch_sse_register_event(struct sse_event_arch_data *arch_evt);
> +
> +void sse_handle_event(struct sse_event_arch_data *arch_evt,
> + struct pt_regs *regs);
> +asmlinkage void handle_sse(void);
> +asmlinkage void do_sse(struct sse_event_arch_data *arch_evt,
> + struct pt_regs *reg);
> +
> +#endif
> +
> +#endif
> diff --git a/arch/riscv/include/asm/switch_to.h b/arch/riscv/include/asm/switch_to.h
> index 94e33216b2d9..e166fabe04ab 100644
> --- a/arch/riscv/include/asm/switch_to.h
> +++ b/arch/riscv/include/asm/switch_to.h
> @@ -88,6 +88,19 @@ static inline void __switch_to_envcfg(struct task_struct *next)
> :: "r" (next->thread.envcfg) : "memory");
> }
>
> +#ifdef CONFIG_RISCV_SSE
> +DECLARE_PER_CPU(struct task_struct *, __sse_entry_task);
> +
> +static inline void __switch_sse_entry_task(struct task_struct *next)
> +{
> + __this_cpu_write(__sse_entry_task, next);
> +}
> +#else
> +static inline void __switch_sse_entry_task(struct task_struct *next)
> +{
> +}
> +#endif
> +
> extern struct task_struct *__switch_to(struct task_struct *,
> struct task_struct *);
>
> @@ -122,6 +135,7 @@ do { \
> if (switch_to_should_flush_icache(__next)) \
> local_flush_icache_all(); \
> __switch_to_envcfg(__next); \
> + __switch_sse_entry_task(__next); \
> ((last) = __switch_to(__prev, __next)); \
> } while (0)
>
> diff --git a/arch/riscv/include/asm/thread_info.h b/arch/riscv/include/asm/thread_info.h
> index f5916a70879a..28e9805e61fc 100644
> --- a/arch/riscv/include/asm/thread_info.h
> +++ b/arch/riscv/include/asm/thread_info.h
> @@ -36,6 +36,7 @@
> #define OVERFLOW_STACK_SIZE SZ_4K
>
> #define IRQ_STACK_SIZE THREAD_SIZE
> +#define SSE_STACK_SIZE THREAD_SIZE
>
> #ifndef __ASSEMBLY__
>
> diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
> index 063d1faf5a53..1e8fb83b1162 100644
> --- a/arch/riscv/kernel/Makefile
> +++ b/arch/riscv/kernel/Makefile
> @@ -99,6 +99,7 @@ obj-$(CONFIG_DYNAMIC_FTRACE) += mcount-dyn.o
> obj-$(CONFIG_PERF_EVENTS) += perf_callchain.o
> obj-$(CONFIG_HAVE_PERF_REGS) += perf_regs.o
> obj-$(CONFIG_RISCV_SBI) += sbi.o sbi_ecall.o
> +obj-$(CONFIG_RISCV_SSE) += sse.o sse_entry.o
> ifeq ($(CONFIG_RISCV_SBI), y)
> obj-$(CONFIG_SMP) += sbi-ipi.o
> obj-$(CONFIG_SMP) += cpu_ops_sbi.o
> diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
> index e89455a6a0e5..60590a3d9519 100644
> --- a/arch/riscv/kernel/asm-offsets.c
> +++ b/arch/riscv/kernel/asm-offsets.c
> @@ -14,6 +14,8 @@
> #include <asm/ptrace.h>
> #include <asm/cpu_ops_sbi.h>
> #include <asm/stacktrace.h>
> +#include <asm/sbi.h>
> +#include <asm/sse.h>
> #include <asm/suspend.h>
>
> void asm_offsets(void);
> @@ -511,4 +513,14 @@ void asm_offsets(void)
> DEFINE(FREGS_A6, offsetof(struct __arch_ftrace_regs, a6));
> DEFINE(FREGS_A7, offsetof(struct __arch_ftrace_regs, a7));
> #endif
> +
> +#ifdef CONFIG_RISCV_SSE
> + OFFSET(SSE_REG_EVT_STACK, sse_event_arch_data, stack);
> + OFFSET(SSE_REG_EVT_SHADOW_STACK, sse_event_arch_data, shadow_stack);
> + OFFSET(SSE_REG_EVT_TMP, sse_event_arch_data, tmp);
> +
> + DEFINE(SBI_EXT_SSE, SBI_EXT_SSE);
> + DEFINE(SBI_SSE_EVENT_COMPLETE, SBI_SSE_EVENT_COMPLETE);
> + DEFINE(NR_CPUS, NR_CPUS);
> +#endif
> }
> diff --git a/arch/riscv/kernel/sse.c b/arch/riscv/kernel/sse.c
> new file mode 100644
> index 000000000000..b48ae69dad8d
> --- /dev/null
> +++ b/arch/riscv/kernel/sse.c
> @@ -0,0 +1,134 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * Copyright (C) 2024 Rivos Inc.
> + */
> +#include <linux/nmi.h>
> +#include <linux/scs.h>
> +#include <linux/bitfield.h>
> +#include <linux/riscv_sse.h>
> +#include <linux/percpu-defs.h>
> +
> +#include <asm/asm-prototypes.h>
> +#include <asm/switch_to.h>
> +#include <asm/irq_stack.h>
> +#include <asm/sbi.h>
> +#include <asm/sse.h>
> +
> +DEFINE_PER_CPU(struct task_struct *, __sse_entry_task);
> +
> +void __weak sse_handle_event(struct sse_event_arch_data *arch_evt, struct pt_regs *regs)
> +{
> +}
> +
> +void do_sse(struct sse_event_arch_data *arch_evt, struct pt_regs *regs)
> +{
> + nmi_enter();
> +
> + /* Retrieve missing GPRs from SBI */
> + sbi_ecall(SBI_EXT_SSE, SBI_SSE_EVENT_ATTR_READ, arch_evt->evt_id,
> + SBI_SSE_ATTR_INTERRUPTED_A6,
> + (SBI_SSE_ATTR_INTERRUPTED_A7 - SBI_SSE_ATTR_INTERRUPTED_A6) + 1,
> + arch_evt->interrupted_state_phys, 0, 0);
> +
> + memcpy(®s->a6, &arch_evt->interrupted, sizeof(arch_evt->interrupted));
> +
> + sse_handle_event(arch_evt, regs);
> +
> + /*
> +	 * The SSE delivery path does not use the "standard" exception path and
> + * thus does not process any pending signal/softirqs. Some drivers might
> + * enqueue pending work that needs to be handled as soon as possible.
> + * For that purpose, set the software interrupt pending bit which will
> + * be serviced once interrupts are reenabled
> + */
> + csr_set(CSR_IP, IE_SIE);
> +
> + nmi_exit();
> +}
> +
> +#ifdef CONFIG_VMAP_STACK
> +static unsigned long *sse_stack_alloc(unsigned int cpu, unsigned int size)
> +{
> + return arch_alloc_vmap_stack(size, cpu_to_node(cpu));
> +}
> +
> +static void sse_stack_free(unsigned long *stack)
> +{
> + vfree(stack);
> +}
> +#else /* CONFIG_VMAP_STACK */
> +
> +static unsigned long *sse_stack_alloc(unsigned int cpu, unsigned int size)
> +{
> + return kmalloc(size, GFP_KERNEL);
> +}
> +
> +static void sse_stack_free(unsigned long *stack)
> +{
> + kfree(stack);
> +}
> +
> +#endif /* CONFIG_VMAP_STACK */
> +
> +static int sse_init_scs(int cpu, struct sse_event_arch_data *arch_evt)
> +{
> + void *stack;
> +
> + if (!scs_is_enabled())
> + return 0;
> +
> + stack = scs_alloc(cpu_to_node(cpu));
> + if (!stack)
> + return 1;
> +
> + arch_evt->shadow_stack = stack;
> +
> + return 0;
> +}
> +
> +int arch_sse_init_event(struct sse_event_arch_data *arch_evt, u32 evt_id, int cpu)
> +{
> + void *stack;
> +
> + arch_evt->evt_id = evt_id;
> + stack = sse_stack_alloc(cpu, SSE_STACK_SIZE);
> + if (!stack)
> + return -ENOMEM;
> +
> + arch_evt->stack = stack + SSE_STACK_SIZE;
> +
> + if (sse_init_scs(cpu, arch_evt))
> + goto free_stack;
> +
> + if (is_kernel_percpu_address((unsigned long)&arch_evt->interrupted)) {
> + arch_evt->interrupted_state_phys =
> + per_cpu_ptr_to_phys(&arch_evt->interrupted);
> + } else {
> + arch_evt->interrupted_state_phys =
> + virt_to_phys(&arch_evt->interrupted);
> + }
> +
> + return 0;
> +
> +free_stack:
> + sse_stack_free(arch_evt->stack - SSE_STACK_SIZE);
> +
> + return -ENOMEM;
> +}
> +
> +void arch_sse_free_event(struct sse_event_arch_data *arch_evt)
> +{
> + scs_free(arch_evt->shadow_stack);
> + sse_stack_free(arch_evt->stack - SSE_STACK_SIZE);
> +}
> +
> +int arch_sse_register_event(struct sse_event_arch_data *arch_evt)
> +{
> + struct sbiret sret;
> +
> + sret = sbi_ecall(SBI_EXT_SSE, SBI_SSE_EVENT_REGISTER, arch_evt->evt_id,
> + (unsigned long) handle_sse, (unsigned long) arch_evt,
> + 0, 0, 0);
> +
> + return sbi_err_map_linux_errno(sret.error);
> +}
> diff --git a/arch/riscv/kernel/sse_entry.S b/arch/riscv/kernel/sse_entry.S
> new file mode 100644
> index 000000000000..0b2f890edd89
> --- /dev/null
> +++ b/arch/riscv/kernel/sse_entry.S
> @@ -0,0 +1,171 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (C) 2024 Rivos Inc.
> + */
> +
> +#include <linux/init.h>
> +#include <linux/linkage.h>
> +
> +#include <asm/asm.h>
> +#include <asm/csr.h>
> +#include <asm/scs.h>
> +
> +/* When entering handle_sse, the following registers are set:
> + * a6: contains the hartid
> + * a6: contains struct sse_registered_event pointer
Please fix this comment.
Regards
Himanshu
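(Judging from the code below, where a7 is used with the SSE_REG_EVT_* offsets
and passed as the first argument to do_sse(), the second line is presumably
meant to read:

 * a7: contains the struct sse_event_arch_data pointer

while a6 carries the hart id.)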
> + */
> +SYM_CODE_START(handle_sse)
> + /* Save stack temporarily */
> + REG_S sp, SSE_REG_EVT_TMP(a7)
> + /* Set entry stack */
> + REG_L sp, SSE_REG_EVT_STACK(a7)
> +
> + addi sp, sp, -(PT_SIZE_ON_STACK)
> + REG_S ra, PT_RA(sp)
> + REG_S s0, PT_S0(sp)
> + REG_S s1, PT_S1(sp)
> + REG_S s2, PT_S2(sp)
> + REG_S s3, PT_S3(sp)
> + REG_S s4, PT_S4(sp)
> + REG_S s5, PT_S5(sp)
> + REG_S s6, PT_S6(sp)
> + REG_S s7, PT_S7(sp)
> + REG_S s8, PT_S8(sp)
> + REG_S s9, PT_S9(sp)
> + REG_S s10, PT_S10(sp)
> + REG_S s11, PT_S11(sp)
> + REG_S tp, PT_TP(sp)
> + REG_S t0, PT_T0(sp)
> + REG_S t1, PT_T1(sp)
> + REG_S t2, PT_T2(sp)
> + REG_S t3, PT_T3(sp)
> + REG_S t4, PT_T4(sp)
> + REG_S t5, PT_T5(sp)
> + REG_S t6, PT_T6(sp)
> + REG_S gp, PT_GP(sp)
> + REG_S a0, PT_A0(sp)
> + REG_S a1, PT_A1(sp)
> + REG_S a2, PT_A2(sp)
> + REG_S a3, PT_A3(sp)
> + REG_S a4, PT_A4(sp)
> + REG_S a5, PT_A5(sp)
> +
> + /* Retrieve entry sp */
> + REG_L a4, SSE_REG_EVT_TMP(a7)
> + /* Save CSRs */
> + csrr a0, CSR_EPC
> + csrr a1, CSR_SSTATUS
> + csrr a2, CSR_STVAL
> + csrr a3, CSR_SCAUSE
> +
> + REG_S a0, PT_EPC(sp)
> + REG_S a1, PT_STATUS(sp)
> + REG_S a2, PT_BADADDR(sp)
> + REG_S a3, PT_CAUSE(sp)
> + REG_S a4, PT_SP(sp)
> +
> + /* Disable user memory access and floating/vector computing */
> + li t0, SR_SUM | SR_FS_VS
> + csrc CSR_STATUS, t0
> +
> + load_global_pointer
> + scs_load_sse_stack a7
> +
> + /* Restore current task struct from __sse_entry_task */
> + li t1, NR_CPUS
> + move t3, zero
> +
> +#ifdef CONFIG_SMP
> + /* Find the CPU id associated to the hart id */
> + la t0, __cpuid_to_hartid_map
> +.Lhart_id_loop:
> + REG_L t2, 0(t0)
> + beq t2, a6, .Lcpu_id_found
> +
> + /* Increment pointer and CPU number */
> + addi t3, t3, 1
> + addi t0, t0, RISCV_SZPTR
> + bltu t3, t1, .Lhart_id_loop
> +
> + /*
> + * This should never happen since we expect the hart_id to match one
> +	 * of our CPUs, but better safe than sorry.
> + */
> + la tp, init_task
> + la a0, sse_hart_id_panic_string
> + la t0, panic
> + jalr t0
> +
> +.Lcpu_id_found:
> +#endif
> + asm_per_cpu_with_cpu t2 __sse_entry_task t1 t3
> + REG_L tp, 0(t2)
> +
> + move a1, sp /* pt_regs on stack */
> + /* Kernel was interrupted, create stack frame */
> + beqz s1, .Lcall_do_sse
> +
> +.Lcall_do_sse:
> + /*
> + * Save sscratch for restoration since we might have interrupted the
> +	 * kernel in the early exception path and thus don't know the content of
> + * sscratch.
> + */
> + csrr s4, CSR_SSCRATCH
> + /* In-kernel scratch is 0 */
> + csrw CSR_SCRATCH, x0
> +
> + move a0, a7
> +
> + call do_sse
> +
> + csrw CSR_SSCRATCH, s4
> +
> + REG_L a0, PT_EPC(sp)
> + REG_L a1, PT_STATUS(sp)
> + REG_L a2, PT_BADADDR(sp)
> + REG_L a3, PT_CAUSE(sp)
> + csrw CSR_EPC, a0
> + csrw CSR_SSTATUS, a1
> + csrw CSR_STVAL, a2
> + csrw CSR_SCAUSE, a3
> +
> + REG_L ra, PT_RA(sp)
> + REG_L s0, PT_S0(sp)
> + REG_L s1, PT_S1(sp)
> + REG_L s2, PT_S2(sp)
> + REG_L s3, PT_S3(sp)
> + REG_L s4, PT_S4(sp)
> + REG_L s5, PT_S5(sp)
> + REG_L s6, PT_S6(sp)
> + REG_L s7, PT_S7(sp)
> + REG_L s8, PT_S8(sp)
> + REG_L s9, PT_S9(sp)
> + REG_L s10, PT_S10(sp)
> + REG_L s11, PT_S11(sp)
> + REG_L tp, PT_TP(sp)
> + REG_L t0, PT_T0(sp)
> + REG_L t1, PT_T1(sp)
> + REG_L t2, PT_T2(sp)
> + REG_L t3, PT_T3(sp)
> + REG_L t4, PT_T4(sp)
> + REG_L t5, PT_T5(sp)
> + REG_L t6, PT_T6(sp)
> + REG_L gp, PT_GP(sp)
> + REG_L a0, PT_A0(sp)
> + REG_L a1, PT_A1(sp)
> + REG_L a2, PT_A2(sp)
> + REG_L a3, PT_A3(sp)
> + REG_L a4, PT_A4(sp)
> + REG_L a5, PT_A5(sp)
> +
> + REG_L sp, PT_SP(sp)
> +
> + li a7, SBI_EXT_SSE
> + li a6, SBI_SSE_EVENT_COMPLETE
> + ecall
> +
> +SYM_CODE_END(handle_sse)
> +
> +sse_hart_id_panic_string:
> + .ascii "Unable to match hart_id with cpu\0"
> --
> 2.45.2
>
* Re: [PATCH v3 3/4] drivers: firmware: add riscv SSE support
2024-12-06 16:30 ` [PATCH v3 3/4] drivers: firmware: add riscv SSE support Clément Léger
@ 2024-12-13 5:03 ` Himanshu Chauhan
2024-12-13 8:33 ` Clément Léger
2025-01-16 13:58 ` Conor Dooley
1 sibling, 1 reply; 22+ messages in thread
From: Himanshu Chauhan @ 2024-12-13 5:03 UTC (permalink / raw)
To: Clément Léger
Cc: Paul Walmsley, Palmer Dabbelt, linux-riscv, linux-kernel,
linux-arm-kernel, Anup Patel, Xu Lu, Atish Patra
Hi Clement,
On Fri, Dec 06, 2024 at 05:30:59PM +0100, Clément Léger wrote:
> Add a driver-level interface to use the RISC-V SSE arch support. This
> interface allows registering SSE handlers and receiving SSE events. It will
> be used by the PMU and GHES drivers.
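As a rough usage sketch (not part of this patch), a client driver would use
the interface below roughly as follows; the event number and priority here
are placeholders for illustration:

static int my_sse_handler(u32 evt, void *arg, struct pt_regs *regs)
{
	/* Runs in an NMI-like context: keep the work minimal */
	return 0;
}

static int my_driver_init(void)
{
	struct sse_event *evt;

	/* MY_SSE_EVT_NUM and priority 0 are placeholders */
	evt = sse_event_register(MY_SSE_EVT_NUM, 0, my_sse_handler, NULL);
	if (IS_ERR(evt))
		return PTR_ERR(evt);

	return sse_event_enable(evt);
}
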
>
> Signed-off-by: Himanshu Chauhan <hchauhan@ventanamicro.com>
> Co-developed-by: Himanshu Chauhan <hchauhan@ventanamicro.com>
> Signed-off-by: Clément Léger <cleger@rivosinc.com>
> ---
> MAINTAINERS | 14 +
> drivers/firmware/Kconfig | 1 +
> drivers/firmware/Makefile | 1 +
> drivers/firmware/riscv/Kconfig | 15 +
> drivers/firmware/riscv/Makefile | 3 +
> drivers/firmware/riscv/riscv_sse.c | 691 +++++++++++++++++++++++++++++
> include/linux/riscv_sse.h | 56 +++
> 7 files changed, 781 insertions(+)
> create mode 100644 drivers/firmware/riscv/Kconfig
> create mode 100644 drivers/firmware/riscv/Makefile
> create mode 100644 drivers/firmware/riscv/riscv_sse.c
> create mode 100644 include/linux/riscv_sse.h
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 686109008d8e..a3ddde7fe9fb 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -20125,6 +20125,13 @@ T: git git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux.git
> F: Documentation/devicetree/bindings/iommu/riscv,iommu.yaml
> F: drivers/iommu/riscv/
>
> +RISC-V FIRMWARE DRIVERS
> +M: Conor Dooley <conor@kernel.org>
> +L: linux-riscv@lists.infradead.org
> +S: Maintained
> +T: git git://git.kernel.org/pub/scm/linux/kernel/git/conor/linux.git
> +F: drivers/firmware/riscv/*
> +
> RISC-V MICROCHIP FPGA SUPPORT
> M: Conor Dooley <conor.dooley@microchip.com>
> M: Daire McNamara <daire.mcnamara@microchip.com>
> @@ -20177,6 +20184,13 @@ F: drivers/perf/riscv_pmu.c
> F: drivers/perf/riscv_pmu_legacy.c
> F: drivers/perf/riscv_pmu_sbi.c
>
> +RISC-V SSE DRIVER
> +M: Clément Léger <cleger@rivosinc.com>
> +L: linux-riscv@lists.infradead.org
> +S: Maintained
> +F: drivers/firmware/riscv/riscv_sse.c
> +F: include/linux/riscv_sse.h
> +
I request you to add me as a reviewer to these SSE files.
Himanshu Chauhan <himanshu@thechauhan.dev>
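For reference, that would presumably amount to one extra R: (reviewer) line in
the new MAINTAINERS entry, e.g.:

RISC-V SSE DRIVER
M:	Clément Léger <cleger@rivosinc.com>
R:	Himanshu Chauhan <himanshu@thechauhan.dev>
L:	linux-riscv@lists.infradead.org
S:	Maintained
F:	drivers/firmware/riscv/riscv_sse.c
F:	include/linux/riscv_sse.h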
Thanks
Regards
Himanshu
> RISC-V THEAD SoC SUPPORT
> M: Drew Fustini <drew@pdp7.com>
> M: Guo Ren <guoren@kernel.org>
> diff --git a/drivers/firmware/Kconfig b/drivers/firmware/Kconfig
> index 71d8b26c4103..9e996a1fd511 100644
> --- a/drivers/firmware/Kconfig
> +++ b/drivers/firmware/Kconfig
> @@ -267,6 +267,7 @@ source "drivers/firmware/meson/Kconfig"
> source "drivers/firmware/microchip/Kconfig"
> source "drivers/firmware/psci/Kconfig"
> source "drivers/firmware/qcom/Kconfig"
> +source "drivers/firmware/riscv/Kconfig"
> source "drivers/firmware/smccc/Kconfig"
> source "drivers/firmware/tegra/Kconfig"
> source "drivers/firmware/xilinx/Kconfig"
> diff --git a/drivers/firmware/Makefile b/drivers/firmware/Makefile
> index 7a8d486e718f..c0f5009949a8 100644
> --- a/drivers/firmware/Makefile
> +++ b/drivers/firmware/Makefile
> @@ -33,6 +33,7 @@ obj-y += efi/
> obj-y += imx/
> obj-y += psci/
> obj-y += qcom/
> +obj-y += riscv/
> obj-y += smccc/
> obj-y += tegra/
> obj-y += xilinx/
> diff --git a/drivers/firmware/riscv/Kconfig b/drivers/firmware/riscv/Kconfig
> new file mode 100644
> index 000000000000..8056ed3262d9
> --- /dev/null
> +++ b/drivers/firmware/riscv/Kconfig
> @@ -0,0 +1,15 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +menu "RISC-V specific firmware drivers"
> +depends on RISCV
> +
> +config RISCV_SSE
> + bool "Enable SBI Supervisor Software Events support"
> + depends on RISCV_SBI
> + default y
> + help
> +	  The Supervisor Software Events support allows the SBI to deliver
> +	  NMI-like notifications to supervisor mode software. When enabled,
> +	  this option provides support for registering callbacks on specific
> +	  SSE events.
> +
> +endmenu
> diff --git a/drivers/firmware/riscv/Makefile b/drivers/firmware/riscv/Makefile
> new file mode 100644
> index 000000000000..4ccfcbbc28ea
> --- /dev/null
> +++ b/drivers/firmware/riscv/Makefile
> @@ -0,0 +1,3 @@
> +# SPDX-License-Identifier: GPL-2.0
> +
> +obj-$(CONFIG_RISCV_SSE) += riscv_sse.o
> diff --git a/drivers/firmware/riscv/riscv_sse.c b/drivers/firmware/riscv/riscv_sse.c
> new file mode 100644
> index 000000000000..c165e32cc9a5
> --- /dev/null
> +++ b/drivers/firmware/riscv/riscv_sse.c
> @@ -0,0 +1,691 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (C) 2024 Rivos Inc.
> + */
> +
> +#define pr_fmt(fmt) "sse: " fmt
> +
> +#include <linux/cpu.h>
> +#include <linux/cpuhotplug.h>
> +#include <linux/cpu_pm.h>
> +#include <linux/hardirq.h>
> +#include <linux/list.h>
> +#include <linux/percpu-defs.h>
> +#include <linux/reboot.h>
> +#include <linux/riscv_sse.h>
> +#include <linux/slab.h>
> +
> +#include <asm/sbi.h>
> +#include <asm/sse.h>
> +
> +struct sse_event {
> + struct list_head list;
> + u32 evt;
> + u32 priority;
> + sse_event_handler *handler;
> + void *handler_arg;
> + bool is_enabled;
> + /* Only valid for global events */
> + unsigned int cpu;
> +
> + union {
> + struct sse_registered_event *global;
> + struct sse_registered_event __percpu *local;
> + };
> +};
> +
> +static int sse_hp_state;
> +static bool sse_available;
> +static DEFINE_SPINLOCK(events_list_lock);
> +static LIST_HEAD(events);
> +static DEFINE_MUTEX(sse_mutex);
> +
> +struct sse_registered_event {
> + struct sse_event_arch_data arch;
> + struct sse_event *evt;
> + unsigned long attr_buf;
> +};
> +
> +void sse_handle_event(struct sse_event_arch_data *arch_event,
> + struct pt_regs *regs)
> +{
> + int ret;
> + struct sse_registered_event *reg_evt =
> + container_of(arch_event, struct sse_registered_event, arch);
> + struct sse_event *evt = reg_evt->evt;
> +
> + ret = evt->handler(evt->evt, evt->handler_arg, regs);
> + if (ret)
> + pr_warn("event %x handler failed with error %d\n", evt->evt,
> + ret);
> +}
> +
> +static bool sse_event_is_global(u32 evt)
> +{
> + return !!(evt & SBI_SSE_EVENT_GLOBAL);
> +}
> +
> +static
> +struct sse_event *sse_event_get(u32 evt)
> +{
> + struct sse_event *sse_evt = NULL, *tmp;
> +
> + scoped_guard(spinlock, &events_list_lock) {
> + list_for_each_entry(tmp, &events, list) {
> + if (tmp->evt == evt) {
> +				return tmp;
> + }
> + }
> + }
> +
> + return NULL;
> +}
> +
> +static phys_addr_t sse_event_get_phys(struct sse_registered_event *reg_evt,
> + void *addr)
> +{
> + phys_addr_t phys;
> +
> + if (sse_event_is_global(reg_evt->evt->evt))
> + phys = virt_to_phys(addr);
> + else
> + phys = per_cpu_ptr_to_phys(addr);
> +
> + return phys;
> +}
> +
> +static int sse_sbi_event_func(struct sse_event *event, unsigned long func)
> +{
> + struct sbiret ret;
> + u32 evt = event->evt;
> +
> + ret = sbi_ecall(SBI_EXT_SSE, func, evt, 0, 0, 0, 0, 0);
> + if (ret.error)
> + pr_debug("Failed to execute func %lx, event %x, error %ld\n",
> + func, evt, ret.error);
> +
> + return sbi_err_map_linux_errno(ret.error);
> +}
> +
> +static int sse_sbi_disable_event(struct sse_event *event)
> +{
> + return sse_sbi_event_func(event, SBI_SSE_EVENT_DISABLE);
> +}
> +
> +static int sse_sbi_enable_event(struct sse_event *event)
> +{
> + return sse_sbi_event_func(event, SBI_SSE_EVENT_ENABLE);
> +}
> +
> +static int sse_event_attr_get_no_lock(struct sse_registered_event *reg_evt,
> + unsigned long attr_id, unsigned long *val)
> +{
> + struct sbiret sret;
> + u32 evt = reg_evt->evt->evt;
> + unsigned long phys;
> +
> + phys = sse_event_get_phys(reg_evt, ®_evt->attr_buf);
> +
> + sret = sbi_ecall(SBI_EXT_SSE, SBI_SSE_EVENT_ATTR_READ, evt,
> + attr_id, 1, phys, 0, 0);
> + if (sret.error) {
> + pr_debug("Failed to get event %x attr %lx, error %ld\n", evt,
> + attr_id, sret.error);
> + return sbi_err_map_linux_errno(sret.error);
> + }
> +
> + *val = reg_evt->attr_buf;
> +
> + return 0;
> +}
> +
> +static int sse_event_attr_set_nolock(struct sse_registered_event *reg_evt,
> + unsigned long attr_id, unsigned long val)
> +{
> + struct sbiret sret;
> + u32 evt = reg_evt->evt->evt;
> + unsigned long phys;
> +
> + reg_evt->attr_buf = val;
> + phys = sse_event_get_phys(reg_evt, ®_evt->attr_buf);
> +
> + sret = sbi_ecall(SBI_EXT_SSE, SBI_SSE_EVENT_ATTR_WRITE, evt,
> + attr_id, 1, phys, 0, 0);
> + if (sret.error && sret.error != SBI_ERR_INVALID_STATE) {
> + pr_debug("Failed to set event %x attr %lx, error %ld\n", evt,
> + attr_id, sret.error);
> + return sbi_err_map_linux_errno(sret.error);
> + }
> +
> + return 0;
> +}
> +
> +static int sse_event_set_target_cpu_nolock(struct sse_event *event,
> + unsigned int cpu)
> +{
> + unsigned int hart_id = cpuid_to_hartid_map(cpu);
> + struct sse_registered_event *reg_evt = event->global;
> + u32 evt = event->evt;
> + bool was_enabled;
> + int ret;
> +
> + if (!sse_event_is_global(evt))
> + return -EINVAL;
> +
> + was_enabled = event->is_enabled;
> + if (was_enabled)
> + sse_sbi_disable_event(event);
> + do {
> + ret = sse_event_attr_set_nolock(reg_evt,
> + SBI_SSE_ATTR_PREFERRED_HART,
> + hart_id);
> + } while (ret == -EINVAL);
> +
> + if (ret == 0)
> + event->cpu = cpu;
> +
> + if (was_enabled)
> + sse_sbi_enable_event(event);
> +
> + return 0;
> +}
> +
> +int sse_event_set_target_cpu(struct sse_event *event, unsigned int cpu)
> +{
> + int ret;
> +
> + scoped_guard(mutex, &sse_mutex) {
> + cpus_read_lock();
> +
> + if (!cpu_online(cpu))
> + return -EINVAL;
> +
> + ret = sse_event_set_target_cpu_nolock(event, cpu);
> +
> + cpus_read_unlock();
> + }
> +
> + return ret;
> +}
> +
> +static int sse_event_init_registered(unsigned int cpu,
> + struct sse_registered_event *reg_evt,
> + struct sse_event *event)
> +{
> + reg_evt->evt = event;
> + arch_sse_init_event(®_evt->arch, event->evt, cpu);
> +
> + return 0;
> +}
> +
> +static void sse_event_free_registered(struct sse_registered_event *reg_evt)
> +{
> + arch_sse_free_event(®_evt->arch);
> +}
> +
> +static int sse_event_alloc_global(struct sse_event *event)
> +{
> + int err;
> + struct sse_registered_event *reg_evt;
> +
> + reg_evt = kzalloc(sizeof(*reg_evt), GFP_KERNEL);
> + if (!reg_evt)
> + return -ENOMEM;
> +
> + event->global = reg_evt;
> + err = sse_event_init_registered(smp_processor_id(), reg_evt,
> + event);
> + if (err)
> + kfree(reg_evt);
> +
> + return err;
> +}
> +
> +static int sse_event_alloc_local(struct sse_event *event)
> +{
> + int err;
> + unsigned int cpu, err_cpu;
> + struct sse_registered_event *reg_evt;
> + struct sse_registered_event __percpu *reg_evts;
> +
> + reg_evts = alloc_percpu(struct sse_registered_event);
> + if (!reg_evts)
> + return -ENOMEM;
> +
> + event->local = reg_evts;
> +
> + for_each_possible_cpu(cpu) {
> + reg_evt = per_cpu_ptr(reg_evts, cpu);
> + err = sse_event_init_registered(cpu, reg_evt, event);
> + if (err) {
> + err_cpu = cpu;
> + goto err_free_per_cpu;
> + }
> + }
> +
> + return 0;
> +
> +err_free_per_cpu:
> + for_each_possible_cpu(cpu) {
> + if (cpu == err_cpu)
> + break;
> + reg_evt = per_cpu_ptr(reg_evts, cpu);
> + sse_event_free_registered(reg_evt);
> + }
> +
> + free_percpu(reg_evts);
> +
> + return err;
> +}
> +
> +static struct sse_event *sse_event_alloc(u32 evt,
> + u32 priority,
> + sse_event_handler *handler, void *arg)
> +{
> + int err;
> + struct sse_event *event;
> +
> + event = kzalloc(sizeof(*event), GFP_KERNEL);
> + if (!event)
> + return ERR_PTR(-ENOMEM);
> +
> + event->evt = evt;
> + event->priority = priority;
> + event->handler_arg = arg;
> + event->handler = handler;
> +
> + if (sse_event_is_global(evt)) {
> + err = sse_event_alloc_global(event);
> + if (err)
> + goto err_alloc_reg_evt;
> + } else {
> + err = sse_event_alloc_local(event);
> + if (err)
> + goto err_alloc_reg_evt;
> + }
> +
> + return event;
> +
> +err_alloc_reg_evt:
> + kfree(event);
> +
> + return ERR_PTR(err);
> +}
> +
> +static int sse_sbi_register_event(struct sse_event *event,
> + struct sse_registered_event *reg_evt)
> +{
> + int ret;
> +
> + ret = sse_event_attr_set_nolock(reg_evt, SBI_SSE_ATTR_PRIO,
> + event->priority);
> + if (ret)
> + return ret;
> +
> + return arch_sse_register_event(®_evt->arch);
> +}
> +
> +static int sse_event_register_local(struct sse_event *event)
> +{
> + int ret;
> + struct sse_registered_event *reg_evt = per_cpu_ptr(event->local,
> + smp_processor_id());
> +
> + ret = sse_sbi_register_event(event, reg_evt);
> + if (ret)
> + pr_debug("Failed to register event %x: err %d\n", event->evt,
> + ret);
> +
> + return ret;
> +}
> +
> +
> +static int sse_sbi_unregister_event(struct sse_event *event)
> +{
> + return sse_sbi_event_func(event, SBI_SSE_EVENT_UNREGISTER);
> +}
> +
> +struct sse_per_cpu_evt {
> + struct sse_event *event;
> + unsigned long func;
> + atomic_t error;
> +};
> +
> +static void sse_event_per_cpu_func(void *info)
> +{
> + int ret;
> + struct sse_per_cpu_evt *cpu_evt = info;
> +
> + if (cpu_evt->func == SBI_SSE_EVENT_REGISTER)
> + ret = sse_event_register_local(cpu_evt->event);
> + else
> + ret = sse_sbi_event_func(cpu_evt->event, cpu_evt->func);
> +
> + if (ret)
> + atomic_set(&cpu_evt->error, ret);
> +}
> +
> +static void sse_event_free(struct sse_event *event)
> +{
> + unsigned int cpu;
> + struct sse_registered_event *reg_evt;
> +
> + if (sse_event_is_global(event->evt)) {
> + sse_event_free_registered(event->global);
> + kfree(event->global);
> + } else {
> + for_each_possible_cpu(cpu) {
> + reg_evt = per_cpu_ptr(event->local, cpu);
> + sse_event_free_registered(reg_evt);
> + }
> + free_percpu(event->local);
> + }
> +
> + kfree(event);
> +}
> +
> +int sse_event_enable(struct sse_event *event)
> +{
> + int ret = 0;
> + struct sse_per_cpu_evt cpu_evt;
> +
> + scoped_guard(mutex, &sse_mutex) {
> + cpus_read_lock();
> + if (sse_event_is_global(event->evt)) {
> + ret = sse_sbi_enable_event(event);
> + } else {
> + cpu_evt.event = event;
> + atomic_set(&cpu_evt.error, 0);
> + cpu_evt.func = SBI_SSE_EVENT_ENABLE;
> + on_each_cpu(sse_event_per_cpu_func, &cpu_evt, 1);
> + ret = atomic_read(&cpu_evt.error);
> + if (ret) {
> + cpu_evt.func = SBI_SSE_EVENT_DISABLE;
> + on_each_cpu(sse_event_per_cpu_func, &cpu_evt,
> + 1);
> + }
> + }
> + cpus_read_unlock();
> +
> + if (ret == 0)
> + event->is_enabled = true;
> + }
> +
> + return ret;
> +}
> +
> +static void sse_events_mask(void)
> +{
> + sbi_ecall(SBI_EXT_SSE, SBI_SSE_EVENT_HART_MASK, 0, 0, 0, 0, 0, 0);
> +}
> +
> +static void sse_events_unmask(void)
> +{
> + sbi_ecall(SBI_EXT_SSE, SBI_SSE_EVENT_HART_UNMASK, 0, 0, 0, 0, 0, 0);
> +}
> +
> +static void sse_event_disable_nolock(struct sse_event *event)
> +{
> + struct sse_per_cpu_evt cpu_evt;
> +
> + if (sse_event_is_global(event->evt)) {
> + sse_sbi_disable_event(event);
> + } else {
> + cpu_evt.event = event;
> + cpu_evt.func = SBI_SSE_EVENT_DISABLE;
> + on_each_cpu(sse_event_per_cpu_func, &cpu_evt, 1);
> + }
> +}
> +
> +void sse_event_disable(struct sse_event *event)
> +{
> + scoped_guard(mutex, &sse_mutex) {
> + cpus_read_lock();
> + sse_event_disable_nolock(event);
> + event->is_enabled = false;
> + cpus_read_unlock();
> + }
> +}
> +
> +struct sse_event *sse_event_register(u32 evt, u32 priority,
> + sse_event_handler *handler, void *arg)
> +{
> + struct sse_per_cpu_evt cpu_evt;
> + struct sse_event *event;
> + int ret = 0;
> +
> + if (!sse_available)
> + return ERR_PTR(-EOPNOTSUPP);
> +
> + mutex_lock(&sse_mutex);
> + if (sse_event_get(evt)) {
> + pr_debug("Event %x already registered\n", evt);
> + ret = -EEXIST;
> + goto out_unlock;
> + }
> +
> + event = sse_event_alloc(evt, priority, handler, arg);
> + if (IS_ERR(event)) {
> + ret = PTR_ERR(event);
> + goto out_unlock;
> + }
> +
> + cpus_read_lock();
> + if (sse_event_is_global(evt)) {
> + unsigned long preferred_hart;
> +
> + ret = sse_event_attr_get_no_lock(event->global,
> + SBI_SSE_ATTR_PREFERRED_HART,
> + &preferred_hart);
> + if (ret)
> + goto err_event_free;
> + event->cpu = riscv_hartid_to_cpuid(preferred_hart);
> +
> + ret = sse_sbi_register_event(event, event->global);
> + if (ret)
> + goto err_event_free;
> +
> + } else {
> + cpu_evt.event = event;
> + atomic_set(&cpu_evt.error, 0);
> + cpu_evt.func = SBI_SSE_EVENT_REGISTER;
> + on_each_cpu(sse_event_per_cpu_func, &cpu_evt, 1);
> + ret = atomic_read(&cpu_evt.error);
> + if (ret) {
> + cpu_evt.func = SBI_SSE_EVENT_UNREGISTER;
> + on_each_cpu(sse_event_per_cpu_func, &cpu_evt, 1);
> + goto err_event_free;
> + }
> + }
> + cpus_read_unlock();
> +
> + scoped_guard(spinlock, &events_list_lock)
> + list_add(&event->list, &events);
> +
> + mutex_unlock(&sse_mutex);
> +
> + return event;
> +
> +err_event_free:
> + cpus_read_unlock();
> + sse_event_free(event);
> +out_unlock:
> + mutex_unlock(&sse_mutex);
> +
> + return ERR_PTR(ret);
> +}
> +
> +static void sse_event_unregister_nolock(struct sse_event *event)
> +{
> + struct sse_per_cpu_evt cpu_evt;
> +
> + if (sse_event_is_global(event->evt)) {
> + sse_sbi_unregister_event(event);
> + } else {
> + cpu_evt.event = event;
> + cpu_evt.func = SBI_SSE_EVENT_UNREGISTER;
> + on_each_cpu(sse_event_per_cpu_func, &cpu_evt, 1);
> + }
> +}
> +
> +void sse_event_unregister(struct sse_event *event)
> +{
> + scoped_guard(mutex, &sse_mutex) {
> + cpus_read_lock();
> + sse_event_unregister_nolock(event);
> + cpus_read_unlock();
> +
> + scoped_guard(spinlock, &events_list_lock)
> + list_del(&event->list);
> +
> + sse_event_free(event);
> + }
> +}
> +
> +static int sse_cpu_online(unsigned int cpu)
> +{
> + struct sse_event *sse_evt;
> +
> + scoped_guard(spinlock, &events_list_lock) {
> + list_for_each_entry(sse_evt, &events, list) {
> + if (sse_event_is_global(sse_evt->evt))
> + continue;
> +
> + sse_event_register_local(sse_evt);
> + if (sse_evt->is_enabled)
> + sse_sbi_enable_event(sse_evt);
> + }
> + }
> +
> + /* Ready to handle events. Unmask SSE. */
> + sse_events_unmask();
> +
> + return 0;
> +}
> +
> +static int sse_cpu_teardown(unsigned int cpu)
> +{
> + unsigned int next_cpu;
> + struct sse_event *sse_evt;
> +
> + /* Mask the sse events */
> + sse_events_mask();
> +
> + scoped_guard(spinlock, &events_list_lock) {
> + list_for_each_entry(sse_evt, &events, list) {
> + if (!sse_event_is_global(sse_evt->evt)) {
> +
> + if (sse_evt->is_enabled)
> + sse_sbi_disable_event(sse_evt);
> +
> + sse_sbi_unregister_event(sse_evt);
> + continue;
> + }
> +
> + if (sse_evt->cpu != smp_processor_id())
> + continue;
> +
> + /* Update destination hart for global event */
> + next_cpu = cpumask_any_but(cpu_online_mask, cpu);
> + sse_event_set_target_cpu_nolock(sse_evt, next_cpu);
> + }
> + }
> +
> + return 0;
> +}
> +
> +static void sse_reset(void)
> +{
> + struct sse_event *event = NULL;
> +
> + list_for_each_entry(event, &events, list) {
> + sse_event_disable_nolock(event);
> + sse_event_unregister_nolock(event);
> + }
> +}
> +
> +static int sse_pm_notifier(struct notifier_block *nb, unsigned long action,
> + void *data)
> +{
> + WARN_ON_ONCE(preemptible());
> +
> + switch (action) {
> + case CPU_PM_ENTER:
> + sse_events_mask();
> + break;
> + case CPU_PM_EXIT:
> + case CPU_PM_ENTER_FAILED:
> + sse_events_unmask();
> + break;
> + default:
> + return NOTIFY_DONE;
> + }
> +
> + return NOTIFY_OK;
> +}
> +
> +static struct notifier_block sse_pm_nb = {
> + .notifier_call = sse_pm_notifier,
> +};
> +
> +/*
> + * Mask all CPUs and unregister all events on panic, reboot or kexec.
> + */
> +static int sse_reboot_notifier(struct notifier_block *nb, unsigned long action,
> + void *data)
> +{
> + cpuhp_remove_state(sse_hp_state);
> +
> + sse_reset();
> +
> + return NOTIFY_OK;
> +}
> +
> +static struct notifier_block sse_reboot_nb = {
> + .notifier_call = sse_reboot_notifier,
> +};
> +
> +static int __init sse_init(void)
> +{
> + int cpu, ret;
> +
> + if (sbi_probe_extension(SBI_EXT_SSE) <= 0) {
> + pr_err("Missing SBI SSE extension\n");
> + return -EOPNOTSUPP;
> + }
> + pr_info("SBI SSE extension detected\n");
> +
> + for_each_possible_cpu(cpu)
> + INIT_LIST_HEAD(&events);
> +
> + ret = cpu_pm_register_notifier(&sse_pm_nb);
> + if (ret) {
> + pr_warn("Failed to register CPU PM notifier...\n");
> + return ret;
> + }
> +
> + ret = register_reboot_notifier(&sse_reboot_nb);
> + if (ret) {
> + pr_warn("Failed to register reboot notifier...\n");
> + goto remove_cpupm;
> + }
> +
> + ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "riscv/sse:online",
> + sse_cpu_online, sse_cpu_teardown);
> + if (ret < 0)
> + goto remove_reboot;
> +
> + sse_hp_state = ret;
> + sse_available = true;
> +
> + return 0;
> +
> +remove_reboot:
> + unregister_reboot_notifier(&sse_reboot_nb);
> +
> +remove_cpupm:
> + cpu_pm_unregister_notifier(&sse_pm_nb);
> +
> + return ret;
> +}
> +arch_initcall(sse_init);
> diff --git a/include/linux/riscv_sse.h b/include/linux/riscv_sse.h
> new file mode 100644
> index 000000000000..c73184074b8c
> --- /dev/null
> +++ b/include/linux/riscv_sse.h
> @@ -0,0 +1,56 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Copyright (C) 2024 Rivos Inc.
> + */
> +
> +#ifndef __LINUX_RISCV_SSE_H
> +#define __LINUX_RISCV_SSE_H
> +
> +#include <linux/types.h>
> +#include <linux/linkage.h>
> +
> +struct sse_event;
> +struct pt_regs;
> +
> +typedef int (sse_event_handler)(u32 event_num, void *arg, struct pt_regs *regs);
> +
> +#ifdef CONFIG_RISCV_SSE
> +
> +struct sse_event *sse_event_register(u32 event_num, u32 priority,
> + sse_event_handler *handler, void *arg);
> +
> +void sse_event_unregister(struct sse_event *evt);
> +
> +int sse_event_set_target_cpu(struct sse_event *sse_evt, unsigned int cpu);
> +
> +int sse_event_enable(struct sse_event *sse_evt);
> +
> +void sse_event_disable(struct sse_event *sse_evt);
> +
> +#else
> +static inline struct sse_event *sse_event_register(u32 event_num, u32 priority,
> + sse_event_handler *handler,
> + void *arg)
> +{
> + return ERR_PTR(-EOPNOTSUPP);
> +}
> +
> +static inline void sse_event_unregister(struct sse_event *evt) {}
> +
> +static inline int sse_event_set_target_cpu(struct sse_event *sse_evt,
> + unsigned int cpu)
> +{
> + return -EOPNOTSUPP;
> +}
> +
> +static inline int sse_event_enable(struct sse_event *sse_evt)
> +{
> + return -EOPNOTSUPP;
> +}
> +
> +static inline void sse_event_disable(struct sse_event *sse_evt) {}
> +
> +
> +#endif
> +
> +#endif /* __LINUX_RISCV_SSE_H */
> --
> 2.45.2
>
* Re: [PATCH v3 3/4] drivers: firmware: add riscv SSE support
2024-12-13 5:03 ` Himanshu Chauhan
@ 2024-12-13 8:33 ` Clément Léger
0 siblings, 0 replies; 22+ messages in thread
From: Clément Léger @ 2024-12-13 8:33 UTC (permalink / raw)
To: Himanshu Chauhan
Cc: Paul Walmsley, Palmer Dabbelt, linux-riscv, linux-kernel,
linux-arm-kernel, Anup Patel, Xu Lu, Atish Patra
On 13/12/2024 06:03, Himanshu Chauhan wrote:
> Hi Clement,
>
> On Fri, Dec 06, 2024 at 05:30:59PM +0100, Clément Léger wrote:
>> Add a driver-level interface to use the RISC-V SSE arch support. This
>> interface allows registering SSE handlers and receiving SSE events. It will
>> be used by the PMU and GHES drivers.
>>
>> Signed-off-by: Himanshu Chauhan <hchauhan@ventanamicro.com>
>> Co-developed-by: Himanshu Chauhan <hchauhan@ventanamicro.com>
>> Signed-off-by: Clément Léger <cleger@rivosinc.com>
>> ---
>> MAINTAINERS | 14 +
>> drivers/firmware/Kconfig | 1 +
>> drivers/firmware/Makefile | 1 +
>> drivers/firmware/riscv/Kconfig | 15 +
>> drivers/firmware/riscv/Makefile | 3 +
>> drivers/firmware/riscv/riscv_sse.c | 691 +++++++++++++++++++++++++++++
>> include/linux/riscv_sse.h | 56 +++
>> 7 files changed, 781 insertions(+)
>> create mode 100644 drivers/firmware/riscv/Kconfig
>> create mode 100644 drivers/firmware/riscv/Makefile
>> create mode 100644 drivers/firmware/riscv/riscv_sse.c
>> create mode 100644 include/linux/riscv_sse.h
>>
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> index 686109008d8e..a3ddde7fe9fb 100644
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -20125,6 +20125,13 @@ T: git git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux.git
>> F: Documentation/devicetree/bindings/iommu/riscv,iommu.yaml
>> F: drivers/iommu/riscv/
>>
>> +RISC-V FIRMWARE DRIVERS
>> +M: Conor Dooley <conor@kernel.org>
>> +L: linux-riscv@lists.infradead.org
>> +S: Maintained
>> +T: git git://git.kernel.org/pub/scm/linux/kernel/git/conor/linux.git
>> +F: drivers/firmware/riscv/*
>> +
>> RISC-V MICROCHIP FPGA SUPPORT
>> M: Conor Dooley <conor.dooley@microchip.com>
>> M: Daire McNamara <daire.mcnamara@microchip.com>
>> @@ -20177,6 +20184,13 @@ F: drivers/perf/riscv_pmu.c
>> F: drivers/perf/riscv_pmu_legacy.c
>> F: drivers/perf/riscv_pmu_sbi.c
>>
>> +RISC-V SSE DRIVER
>> +M: Clément Léger <cleger@rivosinc.com>
>> +L: linux-riscv@lists.infradead.org
>> +S: Maintained
>> +F: drivers/firmware/riscv/riscv_sse.c
>> +F: include/linux/riscv_sse.h
>> +
>
> I request you to add me as a reviewer to these SSE files.
> Himanshu Chauhan <himanshu@thechauhan.dev>
Oh yes, sure!
Thanks,
Clément
>
> Thanks
> Regards
> Himanshu
>
>> RISC-V THEAD SoC SUPPORT
>> M: Drew Fustini <drew@pdp7.com>
>> M: Guo Ren <guoren@kernel.org>
>> diff --git a/drivers/firmware/Kconfig b/drivers/firmware/Kconfig
>> index 71d8b26c4103..9e996a1fd511 100644
>> --- a/drivers/firmware/Kconfig
>> +++ b/drivers/firmware/Kconfig
>> @@ -267,6 +267,7 @@ source "drivers/firmware/meson/Kconfig"
>> source "drivers/firmware/microchip/Kconfig"
>> source "drivers/firmware/psci/Kconfig"
>> source "drivers/firmware/qcom/Kconfig"
>> +source "drivers/firmware/riscv/Kconfig"
>> source "drivers/firmware/smccc/Kconfig"
>> source "drivers/firmware/tegra/Kconfig"
>> source "drivers/firmware/xilinx/Kconfig"
>> diff --git a/drivers/firmware/Makefile b/drivers/firmware/Makefile
>> index 7a8d486e718f..c0f5009949a8 100644
>> --- a/drivers/firmware/Makefile
>> +++ b/drivers/firmware/Makefile
>> @@ -33,6 +33,7 @@ obj-y += efi/
>> obj-y += imx/
>> obj-y += psci/
>> obj-y += qcom/
>> +obj-y += riscv/
>> obj-y += smccc/
>> obj-y += tegra/
>> obj-y += xilinx/
>> diff --git a/drivers/firmware/riscv/Kconfig b/drivers/firmware/riscv/Kconfig
>> new file mode 100644
>> index 000000000000..8056ed3262d9
>> --- /dev/null
>> +++ b/drivers/firmware/riscv/Kconfig
>> @@ -0,0 +1,15 @@
>> +# SPDX-License-Identifier: GPL-2.0-only
>> +menu "RISC-V specific firmware drivers"
>> +depends on RISCV
>> +
>> +config RISCV_SSE
>> + bool "Enable SBI Supervisor Software Events support"
>> + depends on RISCV_SBI
>> + default y
>> + help
>> +	  The Supervisor Software Events support allows the SBI to deliver
>> +	  NMI-like notifications to supervisor mode software. When enabled,
>> +	  this option provides support for registering callbacks on specific
>> +	  SSE events.
>> +
>> +endmenu
>> diff --git a/drivers/firmware/riscv/Makefile b/drivers/firmware/riscv/Makefile
>> new file mode 100644
>> index 000000000000..4ccfcbbc28ea
>> --- /dev/null
>> +++ b/drivers/firmware/riscv/Makefile
>> @@ -0,0 +1,3 @@
>> +# SPDX-License-Identifier: GPL-2.0
>> +
>> +obj-$(CONFIG_RISCV_SSE) += riscv_sse.o
>> diff --git a/drivers/firmware/riscv/riscv_sse.c b/drivers/firmware/riscv/riscv_sse.c
>> new file mode 100644
>> index 000000000000..c165e32cc9a5
>> --- /dev/null
>> +++ b/drivers/firmware/riscv/riscv_sse.c
>> @@ -0,0 +1,691 @@
>> +// SPDX-License-Identifier: GPL-2.0-only
>> +/*
>> + * Copyright (C) 2024 Rivos Inc.
>> + */
>> +
>> +#define pr_fmt(fmt) "sse: " fmt
>> +
>> +#include <linux/cpu.h>
>> +#include <linux/cpuhotplug.h>
>> +#include <linux/cpu_pm.h>
>> +#include <linux/hardirq.h>
>> +#include <linux/list.h>
>> +#include <linux/percpu-defs.h>
>> +#include <linux/reboot.h>
>> +#include <linux/riscv_sse.h>
>> +#include <linux/slab.h>
>> +
>> +#include <asm/sbi.h>
>> +#include <asm/sse.h>
>> +
>> +struct sse_event {
>> + struct list_head list;
>> + u32 evt;
>> + u32 priority;
>> + sse_event_handler *handler;
>> + void *handler_arg;
>> + bool is_enabled;
>> + /* Only valid for global events */
>> + unsigned int cpu;
>> +
>> + union {
>> + struct sse_registered_event *global;
>> + struct sse_registered_event __percpu *local;
>> + };
>> +};
>> +
>> +static int sse_hp_state;
>> +static bool sse_available;
>> +static DEFINE_SPINLOCK(events_list_lock);
>> +static LIST_HEAD(events);
>> +static DEFINE_MUTEX(sse_mutex);
>> +
>> +struct sse_registered_event {
>> + struct sse_event_arch_data arch;
>> + struct sse_event *evt;
>> + unsigned long attr_buf;
>> +};
>> +
>> +void sse_handle_event(struct sse_event_arch_data *arch_event,
>> + struct pt_regs *regs)
>> +{
>> + int ret;
>> + struct sse_registered_event *reg_evt =
>> + container_of(arch_event, struct sse_registered_event, arch);
>> + struct sse_event *evt = reg_evt->evt;
>> +
>> + ret = evt->handler(evt->evt, evt->handler_arg, regs);
>> + if (ret)
>> + pr_warn("event %x handler failed with error %d\n", evt->evt,
>> + ret);
>> +}
>> +
>> +static bool sse_event_is_global(u32 evt)
>> +{
>> + return !!(evt & SBI_SSE_EVENT_GLOBAL);
>> +}
>> +
>> +static
>> +struct sse_event *sse_event_get(u32 evt)
>> +{
>> + struct sse_event *sse_evt = NULL, *tmp;
>> +
>> + scoped_guard(spinlock, &events_list_lock) {
>> + list_for_each_entry(tmp, &events, list) {
>> + if (tmp->evt == evt) {
>> +				return tmp;
>> + }
>> + }
>> + }
>> +
>> + return NULL;
>> +}
>> +
>> +static phys_addr_t sse_event_get_phys(struct sse_registered_event *reg_evt,
>> + void *addr)
>> +{
>> + phys_addr_t phys;
>> +
>> + if (sse_event_is_global(reg_evt->evt->evt))
>> + phys = virt_to_phys(addr);
>> + else
>> + phys = per_cpu_ptr_to_phys(addr);
>> +
>> + return phys;
>> +}
>> +
>> +static int sse_sbi_event_func(struct sse_event *event, unsigned long func)
>> +{
>> + struct sbiret ret;
>> + u32 evt = event->evt;
>> +
>> + ret = sbi_ecall(SBI_EXT_SSE, func, evt, 0, 0, 0, 0, 0);
>> + if (ret.error)
>> + pr_debug("Failed to execute func %lx, event %x, error %ld\n",
>> + func, evt, ret.error);
>> +
>> + return sbi_err_map_linux_errno(ret.error);
>> +}
>> +
>> +static int sse_sbi_disable_event(struct sse_event *event)
>> +{
>> + return sse_sbi_event_func(event, SBI_SSE_EVENT_DISABLE);
>> +}
>> +
>> +static int sse_sbi_enable_event(struct sse_event *event)
>> +{
>> + return sse_sbi_event_func(event, SBI_SSE_EVENT_ENABLE);
>> +}
>> +
>> +static int sse_event_attr_get_no_lock(struct sse_registered_event *reg_evt,
>> + unsigned long attr_id, unsigned long *val)
>> +{
>> + struct sbiret sret;
>> + u32 evt = reg_evt->evt->evt;
>> + unsigned long phys;
>> +
>> + phys = sse_event_get_phys(reg_evt, ®_evt->attr_buf);
>> +
>> + sret = sbi_ecall(SBI_EXT_SSE, SBI_SSE_EVENT_ATTR_READ, evt,
>> + attr_id, 1, phys, 0, 0);
>> + if (sret.error) {
>> + pr_debug("Failed to get event %x attr %lx, error %ld\n", evt,
>> + attr_id, sret.error);
>> + return sbi_err_map_linux_errno(sret.error);
>> + }
>> +
>> + *val = reg_evt->attr_buf;
>> +
>> + return 0;
>> +}
>> +
>> +static int sse_event_attr_set_nolock(struct sse_registered_event *reg_evt,
>> + unsigned long attr_id, unsigned long val)
>> +{
>> + struct sbiret sret;
>> + u32 evt = reg_evt->evt->evt;
>> + unsigned long phys;
>> +
>> + reg_evt->attr_buf = val;
>> + phys = sse_event_get_phys(reg_evt, ®_evt->attr_buf);
>> +
>> + sret = sbi_ecall(SBI_EXT_SSE, SBI_SSE_EVENT_ATTR_WRITE, evt,
>> + attr_id, 1, phys, 0, 0);
>> + if (sret.error && sret.error != SBI_ERR_INVALID_STATE) {
>> + pr_debug("Failed to set event %x attr %lx, error %ld\n", evt,
>> + attr_id, sret.error);
>> + return sbi_err_map_linux_errno(sret.error);
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +static int sse_event_set_target_cpu_nolock(struct sse_event *event,
>> + unsigned int cpu)
>> +{
>> + unsigned int hart_id = cpuid_to_hartid_map(cpu);
>> + struct sse_registered_event *reg_evt = event->global;
>> + u32 evt = event->evt;
>> + bool was_enabled;
>> + int ret;
>> +
>> + if (!sse_event_is_global(evt))
>> + return -EINVAL;
>> +
>> + was_enabled = event->is_enabled;
>> + if (was_enabled)
>> + sse_sbi_disable_event(event);
>> + do {
>> + ret = sse_event_attr_set_nolock(reg_evt,
>> + SBI_SSE_ATTR_PREFERRED_HART,
>> + hart_id);
>> + } while (ret == -EINVAL);
>> +
>> + if (ret == 0)
>> + event->cpu = cpu;
>> +
>> + if (was_enabled)
>> + sse_sbi_enable_event(event);
>> +
>> + return 0;
>> +}
>> +
>> +int sse_event_set_target_cpu(struct sse_event *event, unsigned int cpu)
>> +{
>> + int ret;
>> +
>> + scoped_guard(mutex, &sse_mutex) {
>> + cpus_read_lock();
>> +
>> + if (!cpu_online(cpu))
>> + return -EINVAL;
>> +
>> + ret = sse_event_set_target_cpu_nolock(event, cpu);
>> +
>> + cpus_read_unlock();
>> + }
>> +
>> + return ret;
>> +}
>> +
>> +static int sse_event_init_registered(unsigned int cpu,
>> + struct sse_registered_event *reg_evt,
>> + struct sse_event *event)
>> +{
>> + reg_evt->evt = event;
>> + arch_sse_init_event(®_evt->arch, event->evt, cpu);
>> +
>> + return 0;
>> +}
>> +
>> +static void sse_event_free_registered(struct sse_registered_event *reg_evt)
>> +{
>> + arch_sse_free_event(®_evt->arch);
>> +}
>> +
>> +static int sse_event_alloc_global(struct sse_event *event)
>> +{
>> + int err;
>> + struct sse_registered_event *reg_evt;
>> +
>> + reg_evt = kzalloc(sizeof(*reg_evt), GFP_KERNEL);
>> + if (!reg_evt)
>> + return -ENOMEM;
>> +
>> + event->global = reg_evt;
>> + err = sse_event_init_registered(smp_processor_id(), reg_evt,
>> + event);
>> + if (err)
>> + kfree(reg_evt);
>> +
>> + return err;
>> +}
>> +
>> +static int sse_event_alloc_local(struct sse_event *event)
>> +{
>> + int err;
>> + unsigned int cpu, err_cpu;
>> + struct sse_registered_event *reg_evt;
>> + struct sse_registered_event __percpu *reg_evts;
>> +
>> + reg_evts = alloc_percpu(struct sse_registered_event);
>> + if (!reg_evts)
>> + return -ENOMEM;
>> +
>> + event->local = reg_evts;
>> +
>> + for_each_possible_cpu(cpu) {
>> + reg_evt = per_cpu_ptr(reg_evts, cpu);
>> + err = sse_event_init_registered(cpu, reg_evt, event);
>> + if (err) {
>> + err_cpu = cpu;
>> + goto err_free_per_cpu;
>> + }
>> + }
>> +
>> + return 0;
>> +
>> +err_free_per_cpu:
>> + for_each_possible_cpu(cpu) {
>> + if (cpu == err_cpu)
>> + break;
>> + reg_evt = per_cpu_ptr(reg_evts, cpu);
>> + sse_event_free_registered(reg_evt);
>> + }
>> +
>> + free_percpu(reg_evts);
>> +
>> + return err;
>> +}
>> +
>> +static struct sse_event *sse_event_alloc(u32 evt,
>> + u32 priority,
>> + sse_event_handler *handler, void *arg)
>> +{
>> + int err;
>> + struct sse_event *event;
>> +
>> + event = kzalloc(sizeof(*event), GFP_KERNEL);
>> + if (!event)
>> + return ERR_PTR(-ENOMEM);
>> +
>> + event->evt = evt;
>> + event->priority = priority;
>> + event->handler_arg = arg;
>> + event->handler = handler;
>> +
>> + if (sse_event_is_global(evt)) {
>> + err = sse_event_alloc_global(event);
>> + if (err)
>> + goto err_alloc_reg_evt;
>> + } else {
>> + err = sse_event_alloc_local(event);
>> + if (err)
>> + goto err_alloc_reg_evt;
>> + }
>> +
>> + return event;
>> +
>> +err_alloc_reg_evt:
>> + kfree(event);
>> +
>> + return ERR_PTR(err);
>> +}
>> +
>> +static int sse_sbi_register_event(struct sse_event *event,
>> + struct sse_registered_event *reg_evt)
>> +{
>> + int ret;
>> +
>> + ret = sse_event_attr_set_nolock(reg_evt, SBI_SSE_ATTR_PRIO,
>> + event->priority);
>> + if (ret)
>> + return ret;
>> +
>> + return arch_sse_register_event(®_evt->arch);
>> +}
>> +
>> +static int sse_event_register_local(struct sse_event *event)
>> +{
>> + int ret;
>> + struct sse_registered_event *reg_evt = per_cpu_ptr(event->local,
>> + smp_processor_id());
>> +
>> + ret = sse_sbi_register_event(event, reg_evt);
>> + if (ret)
>> + pr_debug("Failed to register event %x: err %d\n", event->evt,
>> + ret);
>> +
>> + return ret;
>> +}
>> +
>> +
>> +static int sse_sbi_unregister_event(struct sse_event *event)
>> +{
>> + return sse_sbi_event_func(event, SBI_SSE_EVENT_UNREGISTER);
>> +}
>> +
>> +struct sse_per_cpu_evt {
>> + struct sse_event *event;
>> + unsigned long func;
>> + atomic_t error;
>> +};
>> +
>> +static void sse_event_per_cpu_func(void *info)
>> +{
>> + int ret;
>> + struct sse_per_cpu_evt *cpu_evt = info;
>> +
>> + if (cpu_evt->func == SBI_SSE_EVENT_REGISTER)
>> + ret = sse_event_register_local(cpu_evt->event);
>> + else
>> + ret = sse_sbi_event_func(cpu_evt->event, cpu_evt->func);
>> +
>> + if (ret)
>> + atomic_set(&cpu_evt->error, ret);
>> +}
>> +
>> +static void sse_event_free(struct sse_event *event)
>> +{
>> + unsigned int cpu;
>> + struct sse_registered_event *reg_evt;
>> +
>> + if (sse_event_is_global(event->evt)) {
>> + sse_event_free_registered(event->global);
>> + kfree(event->global);
>> + } else {
>> + for_each_possible_cpu(cpu) {
>> + reg_evt = per_cpu_ptr(event->local, cpu);
>> + sse_event_free_registered(reg_evt);
>> + }
>> + free_percpu(event->local);
>> + }
>> +
>> + kfree(event);
>> +}
>> +
>> +int sse_event_enable(struct sse_event *event)
>> +{
>> + int ret = 0;
>> + struct sse_per_cpu_evt cpu_evt;
>> +
>> + scoped_guard(mutex, &sse_mutex) {
>> + cpus_read_lock();
>> + if (sse_event_is_global(event->evt)) {
>> + ret = sse_sbi_enable_event(event);
>> + } else {
>> + cpu_evt.event = event;
>> + atomic_set(&cpu_evt.error, 0);
>> + cpu_evt.func = SBI_SSE_EVENT_ENABLE;
>> + on_each_cpu(sse_event_per_cpu_func, &cpu_evt, 1);
>> + ret = atomic_read(&cpu_evt.error);
>> + if (ret) {
>> + cpu_evt.func = SBI_SSE_EVENT_DISABLE;
>> + on_each_cpu(sse_event_per_cpu_func, &cpu_evt,
>> + 1);
>> + }
>> + }
>> + cpus_read_unlock();
>> +
>> + if (ret == 0)
>> + event->is_enabled = true;
>> + }
>> +
>> + return ret;
>> +}
>> +
>> +static void sse_events_mask(void)
>> +{
>> + sbi_ecall(SBI_EXT_SSE, SBI_SSE_EVENT_HART_MASK, 0, 0, 0, 0, 0, 0);
>> +}
>> +
>> +static void sse_events_unmask(void)
>> +{
>> + sbi_ecall(SBI_EXT_SSE, SBI_SSE_EVENT_HART_UNMASK, 0, 0, 0, 0, 0, 0);
>> +}
>> +
>> +static void sse_event_disable_nolock(struct sse_event *event)
>> +{
>> + struct sse_per_cpu_evt cpu_evt;
>> +
>> + if (sse_event_is_global(event->evt)) {
>> + sse_sbi_disable_event(event);
>> + } else {
>> + cpu_evt.event = event;
>> + cpu_evt.func = SBI_SSE_EVENT_DISABLE;
>> + on_each_cpu(sse_event_per_cpu_func, &cpu_evt, 1);
>> + }
>> +}
>> +
>> +void sse_event_disable(struct sse_event *event)
>> +{
>> + scoped_guard(mutex, &sse_mutex) {
>> + cpus_read_lock();
>> + sse_event_disable_nolock(event);
>> + event->is_enabled = false;
>> + cpus_read_unlock();
>> + }
>> +}
>> +
>> +struct sse_event *sse_event_register(u32 evt, u32 priority,
>> + sse_event_handler *handler, void *arg)
>> +{
>> + struct sse_per_cpu_evt cpu_evt;
>> + struct sse_event *event;
>> + int ret = 0;
>> +
>> + if (!sse_available)
>> + return ERR_PTR(-EOPNOTSUPP);
>> +
>> + mutex_lock(&sse_mutex);
>> + if (sse_event_get(evt)) {
>> + pr_debug("Event %x already registered\n", evt);
>> + ret = -EEXIST;
>> + goto out_unlock;
>> + }
>> +
>> + event = sse_event_alloc(evt, priority, handler, arg);
>> + if (IS_ERR(event)) {
>> + ret = PTR_ERR(event);
>> + goto out_unlock;
>> + }
>> +
>> + cpus_read_lock();
>> + if (sse_event_is_global(evt)) {
>> + unsigned long preferred_hart;
>> +
>> + ret = sse_event_attr_get_no_lock(event->global,
>> + SBI_SSE_ATTR_PREFERRED_HART,
>> + &preferred_hart);
>> + if (ret)
>> + goto err_event_free;
>> + event->cpu = riscv_hartid_to_cpuid(preferred_hart);
>> +
>> + ret = sse_sbi_register_event(event, event->global);
>> + if (ret)
>> + goto err_event_free;
>> +
>> + } else {
>> + cpu_evt.event = event;
>> + atomic_set(&cpu_evt.error, 0);
>> + cpu_evt.func = SBI_SSE_EVENT_REGISTER;
>> + on_each_cpu(sse_event_per_cpu_func, &cpu_evt, 1);
>> + ret = atomic_read(&cpu_evt.error);
>> + if (ret) {
>> + cpu_evt.func = SBI_SSE_EVENT_UNREGISTER;
>> + on_each_cpu(sse_event_per_cpu_func, &cpu_evt, 1);
>> + goto err_event_free;
>> + }
>> + }
>> + cpus_read_unlock();
>> +
>> + scoped_guard(spinlock, &events_list_lock)
>> + list_add(&event->list, &events);
>> +
>> + mutex_unlock(&sse_mutex);
>> +
>> + return event;
>> +
>> +err_event_free:
>> + cpus_read_unlock();
>> + sse_event_free(event);
>> +out_unlock:
>> + mutex_unlock(&sse_mutex);
>> +
>> + return ERR_PTR(ret);
>> +}
>> +
>> +static void sse_event_unregister_nolock(struct sse_event *event)
>> +{
>> + struct sse_per_cpu_evt cpu_evt;
>> +
>> + if (sse_event_is_global(event->evt)) {
>> + sse_sbi_unregister_event(event);
>> + } else {
>> + cpu_evt.event = event;
>> + cpu_evt.func = SBI_SSE_EVENT_UNREGISTER;
>> + on_each_cpu(sse_event_per_cpu_func, &cpu_evt, 1);
>> + }
>> +}
>> +
>> +void sse_event_unregister(struct sse_event *event)
>> +{
>> + scoped_guard(mutex, &sse_mutex) {
>> + cpus_read_lock();
>> + sse_event_unregister_nolock(event);
>> + cpus_read_unlock();
>> +
>> + scoped_guard(spinlock, &events_list_lock)
>> + list_del(&event->list);
>> +
>> + sse_event_free(event);
>> + }
>> +}
>> +
>> +static int sse_cpu_online(unsigned int cpu)
>> +{
>> + struct sse_event *sse_evt;
>> +
>> + scoped_guard(spinlock, &events_list_lock) {
>> + list_for_each_entry(sse_evt, &events, list) {
>> + if (sse_event_is_global(sse_evt->evt))
>> + continue;
>> +
>> + sse_event_register_local(sse_evt);
>> + if (sse_evt->is_enabled)
>> + sse_sbi_enable_event(sse_evt);
>> + }
>> + }
>> +
>> + /* Ready to handle events. Unmask SSE. */
>> + sse_events_unmask();
>> +
>> + return 0;
>> +}
>> +
>> +static int sse_cpu_teardown(unsigned int cpu)
>> +{
>> + unsigned int next_cpu;
>> + struct sse_event *sse_evt;
>> +
>> + /* Mask the sse events */
>> + sse_events_mask();
>> +
>> + scoped_guard(spinlock, &events_list_lock) {
>> + list_for_each_entry(sse_evt, &events, list) {
>> + if (!sse_event_is_global(sse_evt->evt)) {
>> +
>> + if (sse_evt->is_enabled)
>> + sse_sbi_disable_event(sse_evt);
>> +
>> + sse_sbi_unregister_event(sse_evt);
>> + continue;
>> + }
>> +
>> + if (sse_evt->cpu != smp_processor_id())
>> + continue;
>> +
>> + /* Update destination hart for global event */
>> + next_cpu = cpumask_any_but(cpu_online_mask, cpu);
>> + sse_event_set_target_cpu_nolock(sse_evt, next_cpu);
>> + }
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +static void sse_reset(void)
>> +{
>> + struct sse_event *event = NULL;
>> +
>> + list_for_each_entry(event, &events, list) {
>> + sse_event_disable_nolock(event);
>> + sse_event_unregister_nolock(event);
>> + }
>> +}
>> +
>> +static int sse_pm_notifier(struct notifier_block *nb, unsigned long action,
>> + void *data)
>> +{
>> + WARN_ON_ONCE(preemptible());
>> +
>> + switch (action) {
>> + case CPU_PM_ENTER:
>> + sse_events_mask();
>> + break;
>> + case CPU_PM_EXIT:
>> + case CPU_PM_ENTER_FAILED:
>> + sse_events_unmask();
>> + break;
>> + default:
>> + return NOTIFY_DONE;
>> + }
>> +
>> + return NOTIFY_OK;
>> +}
>> +
>> +static struct notifier_block sse_pm_nb = {
>> + .notifier_call = sse_pm_notifier,
>> +};
>> +
>> +/*
>> + * Mask all CPUs and unregister all events on panic, reboot or kexec.
>> + */
>> +static int sse_reboot_notifier(struct notifier_block *nb, unsigned long action,
>> + void *data)
>> +{
>> + cpuhp_remove_state(sse_hp_state);
>> +
>> + sse_reset();
>> +
>> + return NOTIFY_OK;
>> +}
>> +
>> +static struct notifier_block sse_reboot_nb = {
>> + .notifier_call = sse_reboot_notifier,
>> +};
>> +
>> +static int __init sse_init(void)
>> +{
>> + int cpu, ret;
>> +
>> + if (sbi_probe_extension(SBI_EXT_SSE) <= 0) {
>> + pr_err("Missing SBI SSE extension\n");
>> + return -EOPNOTSUPP;
>> + }
>> + pr_info("SBI SSE extension detected\n");
>> +
>> + for_each_possible_cpu(cpu)
>> + INIT_LIST_HEAD(&events);
>> +
>> + ret = cpu_pm_register_notifier(&sse_pm_nb);
>> + if (ret) {
>> + pr_warn("Failed to register CPU PM notifier...\n");
>> + return ret;
>> + }
>> +
>> + ret = register_reboot_notifier(&sse_reboot_nb);
>> + if (ret) {
>> + pr_warn("Failed to register reboot notifier...\n");
>> + goto remove_cpupm;
>> + }
>> +
>> + ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "riscv/sse:online",
>> + sse_cpu_online, sse_cpu_teardown);
>> + if (ret < 0)
>> + goto remove_reboot;
>> +
>> + sse_hp_state = ret;
>> + sse_available = true;
>> +
>> + return 0;
>> +
>> +remove_reboot:
>> + unregister_reboot_notifier(&sse_reboot_nb);
>> +
>> +remove_cpupm:
>> + cpu_pm_unregister_notifier(&sse_pm_nb);
>> +
>> + return ret;
>> +}
>> +arch_initcall(sse_init);
>> diff --git a/include/linux/riscv_sse.h b/include/linux/riscv_sse.h
>> new file mode 100644
>> index 000000000000..c73184074b8c
>> --- /dev/null
>> +++ b/include/linux/riscv_sse.h
>> @@ -0,0 +1,56 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +/*
>> + * Copyright (C) 2024 Rivos Inc.
>> + */
>> +
>> +#ifndef __LINUX_RISCV_SSE_H
>> +#define __LINUX_RISCV_SSE_H
>> +
>> +#include <linux/types.h>
>> +#include <linux/linkage.h>
>> +
>> +struct sse_event;
>> +struct pt_regs;
>> +
>> +typedef int (sse_event_handler)(u32 event_num, void *arg, struct pt_regs *regs);
>> +
>> +#ifdef CONFIG_RISCV_SSE
>> +
>> +struct sse_event *sse_event_register(u32 event_num, u32 priority,
>> + sse_event_handler *handler, void *arg);
>> +
>> +void sse_event_unregister(struct sse_event *evt);
>> +
>> +int sse_event_set_target_cpu(struct sse_event *sse_evt, unsigned int cpu);
>> +
>> +int sse_event_enable(struct sse_event *sse_evt);
>> +
>> +void sse_event_disable(struct sse_event *sse_evt);
>> +
>> +#else
>> +static inline struct sse_event *sse_event_register(u32 event_num, u32 priority,
>> + sse_event_handler *handler,
>> + void *arg)
>> +{
>> + return ERR_PTR(-EOPNOTSUPP);
>> +}
>> +
>> +static inline void sse_event_unregister(struct sse_event *evt) {}
>> +
>> +static inline int sse_event_set_target_cpu(struct sse_event *sse_evt,
>> + unsigned int cpu)
>> +{
>> + return -EOPNOTSUPP;
>> +}
>> +
>> +static inline int sse_event_enable(struct sse_event *sse_evt)
>> +{
>> + return -EOPNOTSUPP;
>> +}
>> +
>> +static inline void sse_event_disable(struct sse_event *sse_evt) {}
>> +
>> +
>> +#endif
>> +
>> +#endif /* __LINUX_RISCV_SSE_H */
>> --
>> 2.45.2
>>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v3 3/4] drivers: firmware: add riscv SSE support
2024-12-06 16:30 ` [PATCH v3 3/4] drivers: firmware: add riscv SSE support Clément Léger
2024-12-13 5:03 ` Himanshu Chauhan
@ 2025-01-16 13:58 ` Conor Dooley
2025-01-23 10:52 ` Clément Léger
1 sibling, 1 reply; 22+ messages in thread
From: Conor Dooley @ 2025-01-16 13:58 UTC (permalink / raw)
To: Clément Léger
Cc: Paul Walmsley, Palmer Dabbelt, linux-riscv, linux-kernel,
linux-arm-kernel, Himanshu Chauhan, Anup Patel, Xu Lu,
Atish Patra
[-- Attachment #1: Type: text/plain, Size: 13064 bytes --]
On Fri, Dec 06, 2024 at 05:30:59PM +0100, Clément Léger wrote:
> Add a driver-level interface to use the RISC-V SSE arch support. This interface
> allows registering SSE handlers and receiving events. It will be used by
> the PMU and GHES drivers.
>
> Signed-off-by: Himanshu Chauhan <hchauhan@ventanamicro.com>
> Co-developed-by: Himanshu Chauhan <hchauhan@ventanamicro.com>
> Signed-off-by: Clément Léger <cleger@rivosinc.com>
> ---
> MAINTAINERS | 14 +
> drivers/firmware/Kconfig | 1 +
> drivers/firmware/Makefile | 1 +
> drivers/firmware/riscv/Kconfig | 15 +
> drivers/firmware/riscv/Makefile | 3 +
> drivers/firmware/riscv/riscv_sse.c | 691 +++++++++++++++++++++++++++++
> include/linux/riscv_sse.h | 56 +++
> 7 files changed, 781 insertions(+)
> create mode 100644 drivers/firmware/riscv/Kconfig
> create mode 100644 drivers/firmware/riscv/Makefile
> create mode 100644 drivers/firmware/riscv/riscv_sse.c
> create mode 100644 include/linux/riscv_sse.h
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 686109008d8e..a3ddde7fe9fb 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -20125,6 +20125,13 @@ T: git git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux.git
> F: Documentation/devicetree/bindings/iommu/riscv,iommu.yaml
> F: drivers/iommu/riscv/
>
> +RISC-V FIRMWARE DRIVERS
> +M: Conor Dooley <conor@kernel.org>
> +L: linux-riscv@lists.infradead.org
> +S: Maintained
> +T: git git://git.kernel.org/pub/scm/linux/kernel/git/conor/linux.git
> +F: drivers/firmware/riscv/*
Acked-by: Conor Dooley <conor.dooley@microchip.com>
(got some, mostly minor, comments below)
> diff --git a/drivers/firmware/riscv/Makefile b/drivers/firmware/riscv/Makefile
> new file mode 100644
> index 000000000000..4ccfcbbc28ea
> --- /dev/null
> +++ b/drivers/firmware/riscv/Makefile
> @@ -0,0 +1,3 @@
> +# SPDX-License-Identifier: GPL-2.0
> +
> +obj-$(CONFIG_RISCV_SSE) += riscv_sse.o
> diff --git a/drivers/firmware/riscv/riscv_sse.c b/drivers/firmware/riscv/riscv_sse.c
> new file mode 100644
> index 000000000000..c165e32cc9a5
> --- /dev/null
> +++ b/drivers/firmware/riscv/riscv_sse.c
> @@ -0,0 +1,691 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (C) 2024 Rivos Inc.
> + */
> +
> +#define pr_fmt(fmt) "sse: " fmt
> +
> +#include <linux/cpu.h>
> +#include <linux/cpuhotplug.h>
> +#include <linux/cpu_pm.h>
> +#include <linux/hardirq.h>
> +#include <linux/list.h>
> +#include <linux/percpu-defs.h>
> +#include <linux/reboot.h>
> +#include <linux/riscv_sse.h>
> +#include <linux/slab.h>
> +
> +#include <asm/sbi.h>
> +#include <asm/sse.h>
> +
> +struct sse_event {
> + struct list_head list;
> + u32 evt;
> + u32 priority;
> + sse_event_handler *handler;
> + void *handler_arg;
> + bool is_enabled;
> + /* Only valid for global events */
> + unsigned int cpu;
> +
> + union {
> + struct sse_registered_event *global;
> + struct sse_registered_event __percpu *local;
> + };
> +};
> +
> +static int sse_hp_state;
> +static bool sse_available;
> +static DEFINE_SPINLOCK(events_list_lock);
> +static LIST_HEAD(events);
> +static DEFINE_MUTEX(sse_mutex);
> +
> +struct sse_registered_event {
> + struct sse_event_arch_data arch;
> + struct sse_event *evt;
> + unsigned long attr_buf;
> +};
> +
> +void sse_handle_event(struct sse_event_arch_data *arch_event,
> + struct pt_regs *regs)
> +{
> + int ret;
> + struct sse_registered_event *reg_evt =
> + container_of(arch_event, struct sse_registered_event, arch);
> + struct sse_event *evt = reg_evt->evt;
> +
> + ret = evt->handler(evt->evt, evt->handler_arg, regs);
Is it possible to get here with a null handler? Or will !registered
events not lead to the handler getting called?
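If it can, maybe a cheap guard along these lines would be enough (untested
sketch, only needed if a NULL handler is actually reachable here):

	if (WARN_ON_ONCE(!evt || !evt->handler))
		return;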
> + if (ret)
> + pr_warn("event %x handler failed with error %d\n", evt->evt,
> + ret);
> +}
> +
> +static bool sse_event_is_global(u32 evt)
> +{
> + return !!(evt & SBI_SSE_EVENT_GLOBAL);
> +}
> +
> +static
> +struct sse_event *sse_event_get(u32 evt)
nit: Could you shift this into one line?
> +{
> + struct sse_event *sse_evt = NULL, *tmp;
> +
> + scoped_guard(spinlock, &events_list_lock) {
> + list_for_each_entry(tmp, &events, list) {
> + if (tmp->evt == evt) {
> +				return tmp;
> + }
> + }
> + }
> +
> + return NULL;
> +}
> +
> +static phys_addr_t sse_event_get_phys(struct sse_registered_event *reg_evt,
> + void *addr)
> +{
> + phys_addr_t phys;
> +
> + if (sse_event_is_global(reg_evt->evt->evt))
> + phys = virt_to_phys(addr);
> + else
> + phys = per_cpu_ptr_to_phys(addr);
> +
> + return phys;
> +}
> +
> +static int sse_sbi_event_func(struct sse_event *event, unsigned long func)
> +{
> + struct sbiret ret;
> + u32 evt = event->evt;
> +
> + ret = sbi_ecall(SBI_EXT_SSE, func, evt, 0, 0, 0, 0, 0);
> + if (ret.error)
> + pr_debug("Failed to execute func %lx, event %x, error %ld\n",
> + func, evt, ret.error);
Why's this only at a debug level?
> +
> + return sbi_err_map_linux_errno(ret.error);
> +}
> +
> +static int sse_sbi_disable_event(struct sse_event *event)
> +{
> + return sse_sbi_event_func(event, SBI_SSE_EVENT_DISABLE);
> +}
> +
> +static int sse_sbi_enable_event(struct sse_event *event)
> +{
> + return sse_sbi_event_func(event, SBI_SSE_EVENT_ENABLE);
> +}
> +
> +static int sse_event_attr_get_no_lock(struct sse_registered_event *reg_evt,
> + unsigned long attr_id, unsigned long *val)
> +{
> + struct sbiret sret;
> + u32 evt = reg_evt->evt->evt;
> + unsigned long phys;
> +
> +	phys = sse_event_get_phys(reg_evt, &reg_evt->attr_buf);
> +
> + sret = sbi_ecall(SBI_EXT_SSE, SBI_SSE_EVENT_ATTR_READ, evt,
> + attr_id, 1, phys, 0, 0);
> + if (sret.error) {
> + pr_debug("Failed to get event %x attr %lx, error %ld\n", evt,
> + attr_id, sret.error);
> + return sbi_err_map_linux_errno(sret.error);
> + }
> +
> + *val = reg_evt->attr_buf;
> +
> + return 0;
> +}
> +
> +static int sse_event_attr_set_nolock(struct sse_registered_event *reg_evt,
> + unsigned long attr_id, unsigned long val)
> +{
> + struct sbiret sret;
> + u32 evt = reg_evt->evt->evt;
> + unsigned long phys;
> +
> + reg_evt->attr_buf = val;
> +	phys = sse_event_get_phys(reg_evt, &reg_evt->attr_buf);
> +
> + sret = sbi_ecall(SBI_EXT_SSE, SBI_SSE_EVENT_ATTR_WRITE, evt,
> + attr_id, 1, phys, 0, 0);
> + if (sret.error && sret.error != SBI_ERR_INVALID_STATE) {
Why's the invalid state error not treated as an error?
> + pr_debug("Failed to set event %x attr %lx, error %ld\n", evt,
> + attr_id, sret.error);
> + return sbi_err_map_linux_errno(sret.error);
> + }
> +
> + return 0;
> +}
> +
> +static int sse_event_set_target_cpu_nolock(struct sse_event *event,
> + unsigned int cpu)
> +{
> + unsigned int hart_id = cpuid_to_hartid_map(cpu);
> + struct sse_registered_event *reg_evt = event->global;
> + u32 evt = event->evt;
> + bool was_enabled;
> + int ret;
> +
> + if (!sse_event_is_global(evt))
> + return -EINVAL;
> +
> + was_enabled = event->is_enabled;
> + if (was_enabled)
> + sse_sbi_disable_event(event);
> + do {
> + ret = sse_event_attr_set_nolock(reg_evt,
> + SBI_SSE_ATTR_PREFERRED_HART,
> + hart_id);
> + } while (ret == -EINVAL);
> +
> + if (ret == 0)
> + event->cpu = cpu;
> +
> + if (was_enabled)
> + sse_sbi_enable_event(event);
> +
> + return 0;
> +}
> +
> +int sse_event_set_target_cpu(struct sse_event *event, unsigned int cpu)
> +{
> + int ret;
> +
> + scoped_guard(mutex, &sse_mutex) {
> + cpus_read_lock();
> +
> + if (!cpu_online(cpu))
> + return -EINVAL;
> +
> + ret = sse_event_set_target_cpu_nolock(event, cpu);
> +
> + cpus_read_unlock();
> + }
> +
> + return ret;
> +}
> +
> +static int sse_event_init_registered(unsigned int cpu,
> + struct sse_registered_event *reg_evt,
> + struct sse_event *event)
> +{
> + reg_evt->evt = event;
> +	arch_sse_init_event(&reg_evt->arch, event->evt, cpu);
> +
> + return 0;
> +}
> +
> +static void sse_event_free_registered(struct sse_registered_event *reg_evt)
> +{
> +	arch_sse_free_event(&reg_evt->arch);
> +}
> +
> +static int sse_event_alloc_global(struct sse_event *event)
> +{
> + int err;
> + struct sse_registered_event *reg_evt;
> +
> + reg_evt = kzalloc(sizeof(*reg_evt), GFP_KERNEL);
> + if (!reg_evt)
> + return -ENOMEM;
> +
> + event->global = reg_evt;
> + err = sse_event_init_registered(smp_processor_id(), reg_evt,
> + event);
> + if (err)
> + kfree(reg_evt);
> +
> + return err;
> +}
> +
> +static int sse_event_alloc_local(struct sse_event *event)
> +{
> + int err;
> + unsigned int cpu, err_cpu;
> + struct sse_registered_event *reg_evt;
> + struct sse_registered_event __percpu *reg_evts;
> +
> + reg_evts = alloc_percpu(struct sse_registered_event);
> + if (!reg_evts)
> + return -ENOMEM;
> +
> + event->local = reg_evts;
> +
> + for_each_possible_cpu(cpu) {
> + reg_evt = per_cpu_ptr(reg_evts, cpu);
> + err = sse_event_init_registered(cpu, reg_evt, event);
> + if (err) {
> + err_cpu = cpu;
> + goto err_free_per_cpu;
> + }
> + }
> +
> + return 0;
> +
> +err_free_per_cpu:
> + for_each_possible_cpu(cpu) {
> + if (cpu == err_cpu)
> + break;
> + reg_evt = per_cpu_ptr(reg_evts, cpu);
> + sse_event_free_registered(reg_evt);
> + }
> +
> + free_percpu(reg_evts);
> +
> + return err;
> +}
> +
> +static struct sse_event *sse_event_alloc(u32 evt,
> + u32 priority,
> + sse_event_handler *handler, void *arg)
> +{
> + int err;
> + struct sse_event *event;
> +
> + event = kzalloc(sizeof(*event), GFP_KERNEL);
> + if (!event)
> + return ERR_PTR(-ENOMEM);
> +
> + event->evt = evt;
> + event->priority = priority;
> + event->handler_arg = arg;
> + event->handler = handler;
> +
> + if (sse_event_is_global(evt)) {
> + err = sse_event_alloc_global(event);
> + if (err)
> + goto err_alloc_reg_evt;
> + } else {
> + err = sse_event_alloc_local(event);
> + if (err)
> + goto err_alloc_reg_evt;
> + }
> +
> + return event;
> +
> +err_alloc_reg_evt:
> + kfree(event);
> +
> + return ERR_PTR(err);
> +}
> +
> +static int sse_sbi_register_event(struct sse_event *event,
> + struct sse_registered_event *reg_evt)
> +{
> + int ret;
> +
> + ret = sse_event_attr_set_nolock(reg_evt, SBI_SSE_ATTR_PRIO,
> + event->priority);
> + if (ret)
> + return ret;
> +
> +	return arch_sse_register_event(&reg_evt->arch);
> +}
> +
> +static int sse_event_register_local(struct sse_event *event)
> +{
> + int ret;
> + struct sse_registered_event *reg_evt = per_cpu_ptr(event->local,
> + smp_processor_id());
> +
> + ret = sse_sbi_register_event(event, reg_evt);
> + if (ret)
> + pr_debug("Failed to register event %x: err %d\n", event->evt,
> + ret);
Same here I guess, why's a registration failure only a debug print?
> +
> + return ret;
> +}
> +
> +
> +static int sse_sbi_unregister_event(struct sse_event *event)
> +{
> + return sse_sbi_event_func(event, SBI_SSE_EVENT_UNREGISTER);
> +}
> +
> +struct sse_per_cpu_evt {
> + struct sse_event *event;
> + unsigned long func;
> + atomic_t error;
> +};
> +
> +static void sse_event_per_cpu_func(void *info)
> +{
> + int ret;
> + struct sse_per_cpu_evt *cpu_evt = info;
> +
> + if (cpu_evt->func == SBI_SSE_EVENT_REGISTER)
> + ret = sse_event_register_local(cpu_evt->event);
> + else
> + ret = sse_sbi_event_func(cpu_evt->event, cpu_evt->func);
> +
> + if (ret)
> + atomic_set(&cpu_evt->error, ret);
> +}
> +
> +static void sse_event_free(struct sse_event *event)
> +{
> + unsigned int cpu;
> + struct sse_registered_event *reg_evt;
> +
> + if (sse_event_is_global(event->evt)) {
> + sse_event_free_registered(event->global);
> + kfree(event->global);
> + } else {
> + for_each_possible_cpu(cpu) {
> + reg_evt = per_cpu_ptr(event->local, cpu);
> + sse_event_free_registered(reg_evt);
> + }
> + free_percpu(event->local);
> + }
> +
> + kfree(event);
> +}
> +
> +int sse_event_enable(struct sse_event *event)
> +{
> + int ret = 0;
> + struct sse_per_cpu_evt cpu_evt;
> +
> + scoped_guard(mutex, &sse_mutex) {
> + cpus_read_lock();
> + if (sse_event_is_global(event->evt)) {
> + ret = sse_sbi_enable_event(event);
> + } else {
> + cpu_evt.event = event;
> + atomic_set(&cpu_evt.error, 0);
> + cpu_evt.func = SBI_SSE_EVENT_ENABLE;
> + on_each_cpu(sse_event_per_cpu_func, &cpu_evt, 1);
> + ret = atomic_read(&cpu_evt.error);
> + if (ret) {
> + cpu_evt.func = SBI_SSE_EVENT_DISABLE;
> + on_each_cpu(sse_event_per_cpu_func, &cpu_evt,
> + 1);
nit: this should fit on one line, no?
> + }
> + }
> + cpus_read_unlock();
> +
> + if (ret == 0)
> + event->is_enabled = true;
> + }
> +
> + return ret;
> +}
> 2.45.2
>
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v3 2/4] riscv: add support for SBI Supervisor Software Events extension
2024-12-06 16:30 ` [PATCH v3 2/4] riscv: add support for SBI Supervisor Software Events extension Clément Léger
2024-12-10 4:51 ` Himanshu Chauhan
@ 2025-01-22 12:15 ` Alexandre Ghiti
2025-01-22 12:23 ` Alexandre Ghiti
2025-01-23 8:39 ` Clément Léger
2025-03-19 17:08 ` Andrew Jones
2 siblings, 2 replies; 22+ messages in thread
From: Alexandre Ghiti @ 2025-01-22 12:15 UTC (permalink / raw)
To: Clément Léger, Paul Walmsley, Palmer Dabbelt,
linux-riscv, linux-kernel, linux-arm-kernel
Cc: Himanshu Chauhan, Anup Patel, Xu Lu, Atish Patra
Hi Clément,
On 06/12/2024 17:30, Clément Léger wrote:
> The SBI SSE extension allows the supervisor software to be notified by
> the SBI of specific events that are not maskable. The context switch is
> handled partially by the firmware which will save registers a6 and a7.
> When entering kernel we can rely on these 2 registers to setup the stack
> and save all the registers.
>
> Since SSE events can be delivered at any time to the kernel (including
> during exception handling, we need a way to locate the current_task for
> context tracking. On RISC-V, it is sotred in scratch when in user space
> or tp when in kernel space (in which case SSCRATCH is zero). But at a
> at the beginning of exception handling, SSCRATCH is used to swap tp and
> check the origin of the exception. If interrupted at that point, then,
> there is no way to reliably know were is located the current
> task_struct. Even checking the interruption location won't work as SSE
> event can be nested on top of each other so the original interruption
> site might be lost at some point. In order to retrieve it reliably,
> store the current task in an additionnal __sse_entry_task per_cpu array.
> This array is then used to retrieve the current task based on the
> hart ID that is passed to the SSE event handler in a6.
>
> That being said, the way the current task struct is stored should
> probably be reworked to find a better reliable alternative.
>
> Since each events (and each CPU for local events) have their own
> context and can preempt each other, allocate a stack (and a shadow stack
> if needed for each of them (and for each cpu for local events).
>
> When completing the event, if we were coming from kernel with interrupts
> disabled, simply return there. If coming from userspace or kernel with
> interrupts enabled, simulate an interrupt exception by setting IE_SIE in
> CSR_IP to allow delivery of signals to user task. For instance this can
> happen, when a RAS event has been generated by a user application and a
> SIGBUS has been sent to a task.
Nit: there are some typos in the commit log and missing ')'.
>
> Signed-off-by: Clément Léger <cleger@rivosinc.com>
> ---
> arch/riscv/include/asm/asm.h | 14 ++-
> arch/riscv/include/asm/scs.h | 7 ++
> arch/riscv/include/asm/sse.h | 38 ++++++
> arch/riscv/include/asm/switch_to.h | 14 +++
> arch/riscv/include/asm/thread_info.h | 1 +
> arch/riscv/kernel/Makefile | 1 +
> arch/riscv/kernel/asm-offsets.c | 12 ++
> arch/riscv/kernel/sse.c | 134 +++++++++++++++++++++
> arch/riscv/kernel/sse_entry.S | 171 +++++++++++++++++++++++++++
> 9 files changed, 389 insertions(+), 3 deletions(-)
> create mode 100644 arch/riscv/include/asm/sse.h
> create mode 100644 arch/riscv/kernel/sse.c
> create mode 100644 arch/riscv/kernel/sse_entry.S
>
> diff --git a/arch/riscv/include/asm/asm.h b/arch/riscv/include/asm/asm.h
> index 776354895b81..de8427c58f02 100644
> --- a/arch/riscv/include/asm/asm.h
> +++ b/arch/riscv/include/asm/asm.h
> @@ -89,16 +89,24 @@
> #define PER_CPU_OFFSET_SHIFT 3
> #endif
>
> -.macro asm_per_cpu dst sym tmp
> - REG_L \tmp, TASK_TI_CPU_NUM(tp)
> - slli \tmp, \tmp, PER_CPU_OFFSET_SHIFT
> +.macro asm_per_cpu_with_cpu dst sym tmp cpu
> + slli \tmp, \cpu, PER_CPU_OFFSET_SHIFT
> la \dst, __per_cpu_offset
> add \dst, \dst, \tmp
> REG_L \tmp, 0(\dst)
> la \dst, \sym
> add \dst, \dst, \tmp
> .endm
> +
> +.macro asm_per_cpu dst sym tmp
> + REG_L \tmp, TASK_TI_CPU_NUM(tp)
> + asm_per_cpu_with_cpu \dst \sym \tmp \tmp
> +.endm
> #else /* CONFIG_SMP */
> +.macro asm_per_cpu_with_cpu dst sym tmp cpu
> + la \dst, \sym
> +.endm
> +
> .macro asm_per_cpu dst sym tmp
> la \dst, \sym
> .endm
> diff --git a/arch/riscv/include/asm/scs.h b/arch/riscv/include/asm/scs.h
> index 0e45db78b24b..62344daad73d 100644
> --- a/arch/riscv/include/asm/scs.h
> +++ b/arch/riscv/include/asm/scs.h
> @@ -18,6 +18,11 @@
> load_per_cpu gp, irq_shadow_call_stack_ptr, \tmp
> .endm
>
> +/* Load the per-CPU IRQ shadow call stack to gp. */
> +.macro scs_load_sse_stack reg_evt
> + REG_L gp, SSE_REG_EVT_SHADOW_STACK(\reg_evt)
> +.endm
> +
> /* Load task_scs_sp(current) to gp. */
> .macro scs_load_current
> REG_L gp, TASK_TI_SCS_SP(tp)
> @@ -41,6 +46,8 @@
> .endm
> .macro scs_load_irq_stack tmp
> .endm
> +.macro scs_load_sse_stack reg_evt
> +.endm
> .macro scs_load_current
> .endm
> .macro scs_load_current_if_task_changed prev
> diff --git a/arch/riscv/include/asm/sse.h b/arch/riscv/include/asm/sse.h
> new file mode 100644
> index 000000000000..431a19d4cd9c
> --- /dev/null
> +++ b/arch/riscv/include/asm/sse.h
> @@ -0,0 +1,38 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (C) 2024 Rivos Inc.
> + */
> +#ifndef __ASM_SSE_H
> +#define __ASM_SSE_H
> +
> +#ifdef CONFIG_RISCV_SSE
> +
> +struct sse_event_interrupted_state {
> + unsigned long a6;
> + unsigned long a7;
> +};
> +
> +struct sse_event_arch_data {
> + void *stack;
> + void *shadow_stack;
> + unsigned long tmp;
> + struct sse_event_interrupted_state interrupted;
> + unsigned long interrupted_state_phys;
> + u32 evt_id;
> +};
> +
> +struct sse_registered_event;
> +int arch_sse_init_event(struct sse_event_arch_data *arch_evt, u32 evt_id,
> + int cpu);
> +void arch_sse_free_event(struct sse_event_arch_data *arch_evt);
> +int arch_sse_register_event(struct sse_event_arch_data *arch_evt);
> +
> +void sse_handle_event(struct sse_event_arch_data *arch_evt,
> + struct pt_regs *regs);
> +asmlinkage void handle_sse(void);
> +asmlinkage void do_sse(struct sse_event_arch_data *arch_evt,
> + struct pt_regs *reg);
> +
> +#endif
> +
> +#endif
> diff --git a/arch/riscv/include/asm/switch_to.h b/arch/riscv/include/asm/switch_to.h
> index 94e33216b2d9..e166fabe04ab 100644
> --- a/arch/riscv/include/asm/switch_to.h
> +++ b/arch/riscv/include/asm/switch_to.h
> @@ -88,6 +88,19 @@ static inline void __switch_to_envcfg(struct task_struct *next)
> :: "r" (next->thread.envcfg) : "memory");
> }
>
> +#ifdef CONFIG_RISCV_SSE
> +DECLARE_PER_CPU(struct task_struct *, __sse_entry_task);
> +
> +static inline void __switch_sse_entry_task(struct task_struct *next)
> +{
> + __this_cpu_write(__sse_entry_task, next);
> +}
> +#else
> +static inline void __switch_sse_entry_task(struct task_struct *next)
> +{
> +}
> +#endif
> +
> extern struct task_struct *__switch_to(struct task_struct *,
> struct task_struct *);
>
> @@ -122,6 +135,7 @@ do { \
> if (switch_to_should_flush_icache(__next)) \
> local_flush_icache_all(); \
> __switch_to_envcfg(__next); \
> + __switch_sse_entry_task(__next); \
> ((last) = __switch_to(__prev, __next)); \
> } while (0)
>
> diff --git a/arch/riscv/include/asm/thread_info.h b/arch/riscv/include/asm/thread_info.h
> index f5916a70879a..28e9805e61fc 100644
> --- a/arch/riscv/include/asm/thread_info.h
> +++ b/arch/riscv/include/asm/thread_info.h
> @@ -36,6 +36,7 @@
> #define OVERFLOW_STACK_SIZE SZ_4K
>
> #define IRQ_STACK_SIZE THREAD_SIZE
> +#define SSE_STACK_SIZE THREAD_SIZE
>
> #ifndef __ASSEMBLY__
>
> diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
> index 063d1faf5a53..1e8fb83b1162 100644
> --- a/arch/riscv/kernel/Makefile
> +++ b/arch/riscv/kernel/Makefile
> @@ -99,6 +99,7 @@ obj-$(CONFIG_DYNAMIC_FTRACE) += mcount-dyn.o
> obj-$(CONFIG_PERF_EVENTS) += perf_callchain.o
> obj-$(CONFIG_HAVE_PERF_REGS) += perf_regs.o
> obj-$(CONFIG_RISCV_SBI) += sbi.o sbi_ecall.o
> +obj-$(CONFIG_RISCV_SSE) += sse.o sse_entry.o
> ifeq ($(CONFIG_RISCV_SBI), y)
> obj-$(CONFIG_SMP) += sbi-ipi.o
> obj-$(CONFIG_SMP) += cpu_ops_sbi.o
> diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
> index e89455a6a0e5..60590a3d9519 100644
> --- a/arch/riscv/kernel/asm-offsets.c
> +++ b/arch/riscv/kernel/asm-offsets.c
> @@ -14,6 +14,8 @@
> #include <asm/ptrace.h>
> #include <asm/cpu_ops_sbi.h>
> #include <asm/stacktrace.h>
> +#include <asm/sbi.h>
> +#include <asm/sse.h>
> #include <asm/suspend.h>
>
> void asm_offsets(void);
> @@ -511,4 +513,14 @@ void asm_offsets(void)
> DEFINE(FREGS_A6, offsetof(struct __arch_ftrace_regs, a6));
> DEFINE(FREGS_A7, offsetof(struct __arch_ftrace_regs, a7));
> #endif
> +
> +#ifdef CONFIG_RISCV_SSE
> + OFFSET(SSE_REG_EVT_STACK, sse_event_arch_data, stack);
> + OFFSET(SSE_REG_EVT_SHADOW_STACK, sse_event_arch_data, shadow_stack);
> + OFFSET(SSE_REG_EVT_TMP, sse_event_arch_data, tmp);
> +
> + DEFINE(SBI_EXT_SSE, SBI_EXT_SSE);
> + DEFINE(SBI_SSE_EVENT_COMPLETE, SBI_SSE_EVENT_COMPLETE);
> + DEFINE(NR_CPUS, NR_CPUS);
> +#endif
> }
> diff --git a/arch/riscv/kernel/sse.c b/arch/riscv/kernel/sse.c
> new file mode 100644
> index 000000000000..b48ae69dad8d
> --- /dev/null
> +++ b/arch/riscv/kernel/sse.c
> @@ -0,0 +1,134 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * Copyright (C) 2024 Rivos Inc.
> + */
> +#include <linux/nmi.h>
> +#include <linux/scs.h>
> +#include <linux/bitfield.h>
> +#include <linux/riscv_sse.h>
> +#include <linux/percpu-defs.h>
> +
> +#include <asm/asm-prototypes.h>
> +#include <asm/switch_to.h>
> +#include <asm/irq_stack.h>
> +#include <asm/sbi.h>
> +#include <asm/sse.h>
> +
> +DEFINE_PER_CPU(struct task_struct *, __sse_entry_task);
> +
> +void __weak sse_handle_event(struct sse_event_arch_data *arch_evt, struct pt_regs *regs)
> +{
> +}
> +
> +void do_sse(struct sse_event_arch_data *arch_evt, struct pt_regs *regs)
> +{
> + nmi_enter();
> +
> + /* Retrieve missing GPRs from SBI */
> + sbi_ecall(SBI_EXT_SSE, SBI_SSE_EVENT_ATTR_READ, arch_evt->evt_id,
> + SBI_SSE_ATTR_INTERRUPTED_A6,
> + (SBI_SSE_ATTR_INTERRUPTED_A7 - SBI_SSE_ATTR_INTERRUPTED_A6) + 1,
> + arch_evt->interrupted_state_phys, 0, 0);
> +
> +	memcpy(&regs->a6, &arch_evt->interrupted, sizeof(arch_evt->interrupted));
> +
> + sse_handle_event(arch_evt, regs);
> +
> + /*
> +	 * The SSE delivery path does not use the "standard" exception path and
> + * thus does not process any pending signal/softirqs. Some drivers might
> + * enqueue pending work that needs to be handled as soon as possible.
> + * For that purpose, set the software interrupt pending bit which will
> + * be serviced once interrupts are reenabled
> + */
> + csr_set(CSR_IP, IE_SIE);
This looks a bit hackish and underperforming to trigger an IRQ on each
SSE event; why is it necessary? I understand that we may want to service
signals right away, for example in case of an uncorrectable memory error
in order to send a SIGBUS to the process before it goes on, but why
should we care about softirqs here?
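FWIW, one alternative I could imagine (untested sketch, assuming irq_work is
actually usable from this NMI-like context on riscv, and with made-up names)
would be to let the handlers that really need deferred processing queue an
irq_work themselves instead of unconditionally setting SIP for every event:

	#include <linux/irq_work.h>

	/* runs in interrupt context once interrupts are re-enabled */
	static void my_sse_deferred_fn(struct irq_work *work)
	{
		/* process whatever the SSE handler recorded */
	}

	static DEFINE_IRQ_WORK(my_sse_deferred, my_sse_deferred_fn);

	static int my_sse_handler(u32 evt, void *arg, struct pt_regs *regs)
	{
		/* record the error while still in the SSE handler ... */
		irq_work_queue(&my_sse_deferred);	/* NMI-safe */
		return 0;
	}

That would keep the common path free of the extra software interrupt.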
> +
> + nmi_exit();
> +}
> +
> +#ifdef CONFIG_VMAP_STACK
> +static unsigned long *sse_stack_alloc(unsigned int cpu, unsigned int size)
> +{
> + return arch_alloc_vmap_stack(size, cpu_to_node(cpu));
> +}
> +
> +static void sse_stack_free(unsigned long *stack)
> +{
> + vfree(stack);
> +}
> +#else /* CONFIG_VMAP_STACK */
> +
> +static unsigned long *sse_stack_alloc(unsigned int cpu, unsigned int size)
> +{
> + return kmalloc(size, GFP_KERNEL);
> +}
> +
> +static void sse_stack_free(unsigned long *stack)
> +{
> + kfree(stack);
> +}
> +
> +#endif /* CONFIG_VMAP_STACK */
Can't we use kvmalloc() here to avoid the #ifdef? Or is there a real
benefit of using vmalloced stacks?
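Something along these lines (untested sketch) would drop the #ifdef, at the
cost of losing the guard pages and THREAD_ALIGN alignment that
arch_alloc_vmap_stack() provides, if that matters for these stacks:

	static unsigned long *sse_stack_alloc(unsigned int cpu, unsigned int size)
	{
		/* kvmalloc_node() falls back to vmalloc() for larger sizes */
		return kvmalloc_node(size, GFP_KERNEL, cpu_to_node(cpu));
	}

	static void sse_stack_free(unsigned long *stack)
	{
		kvfree(stack);
	}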
> +
> +static int sse_init_scs(int cpu, struct sse_event_arch_data *arch_evt)
> +{
> + void *stack;
> +
> + if (!scs_is_enabled())
> + return 0;
> +
> + stack = scs_alloc(cpu_to_node(cpu));
> + if (!stack)
> + return 1;
Nit: return -ENOMEM
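i.e. (the caller only checks for non-zero, so this should be a drop-in):

	stack = scs_alloc(cpu_to_node(cpu));
	if (!stack)
		return -ENOMEM;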
> +
> + arch_evt->shadow_stack = stack;
> +
> + return 0;
> +}
> +
> +int arch_sse_init_event(struct sse_event_arch_data *arch_evt, u32 evt_id, int cpu)
> +{
> + void *stack;
> +
> + arch_evt->evt_id = evt_id;
> + stack = sse_stack_alloc(cpu, SSE_STACK_SIZE);
> + if (!stack)
> + return -ENOMEM;
> +
> + arch_evt->stack = stack + SSE_STACK_SIZE;
> +
> + if (sse_init_scs(cpu, arch_evt))
> + goto free_stack;
> +
> + if (is_kernel_percpu_address((unsigned long)&arch_evt->interrupted)) {
> + arch_evt->interrupted_state_phys =
> + per_cpu_ptr_to_phys(&arch_evt->interrupted);
> + } else {
> + arch_evt->interrupted_state_phys =
> + virt_to_phys(&arch_evt->interrupted);
> + }
> +
> + return 0;
> +
> +free_stack:
> + sse_stack_free(arch_evt->stack - SSE_STACK_SIZE);
> +
> + return -ENOMEM;
> +}
> +
> +void arch_sse_free_event(struct sse_event_arch_data *arch_evt)
> +{
> + scs_free(arch_evt->shadow_stack);
> + sse_stack_free(arch_evt->stack - SSE_STACK_SIZE);
> +}
> +
> +int arch_sse_register_event(struct sse_event_arch_data *arch_evt)
> +{
> + struct sbiret sret;
> +
> + sret = sbi_ecall(SBI_EXT_SSE, SBI_SSE_EVENT_REGISTER, arch_evt->evt_id,
> + (unsigned long) handle_sse, (unsigned long) arch_evt,
> + 0, 0, 0);
> +
> + return sbi_err_map_linux_errno(sret.error);
> +}
> diff --git a/arch/riscv/kernel/sse_entry.S b/arch/riscv/kernel/sse_entry.S
> new file mode 100644
> index 000000000000..0b2f890edd89
> --- /dev/null
> +++ b/arch/riscv/kernel/sse_entry.S
> @@ -0,0 +1,171 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (C) 2024 Rivos Inc.
> + */
> +
> +#include <linux/init.h>
> +#include <linux/linkage.h>
> +
> +#include <asm/asm.h>
> +#include <asm/csr.h>
> +#include <asm/scs.h>
> +
> +/* When entering handle_sse, the following registers are set:
> + * a6: contains the hartid
> + * a7: contains struct sse_registered_event pointer
> + */
> +SYM_CODE_START(handle_sse)
> + /* Save stack temporarily */
> + REG_S sp, SSE_REG_EVT_TMP(a7)
> + /* Set entry stack */
> + REG_L sp, SSE_REG_EVT_STACK(a7)
> +
> + addi sp, sp, -(PT_SIZE_ON_STACK)
> + REG_S ra, PT_RA(sp)
> + REG_S s0, PT_S0(sp)
> + REG_S s1, PT_S1(sp)
> + REG_S s2, PT_S2(sp)
> + REG_S s3, PT_S3(sp)
> + REG_S s4, PT_S4(sp)
> + REG_S s5, PT_S5(sp)
> + REG_S s6, PT_S6(sp)
> + REG_S s7, PT_S7(sp)
> + REG_S s8, PT_S8(sp)
> + REG_S s9, PT_S9(sp)
> + REG_S s10, PT_S10(sp)
> + REG_S s11, PT_S11(sp)
> + REG_S tp, PT_TP(sp)
> + REG_S t0, PT_T0(sp)
> + REG_S t1, PT_T1(sp)
> + REG_S t2, PT_T2(sp)
> + REG_S t3, PT_T3(sp)
> + REG_S t4, PT_T4(sp)
> + REG_S t5, PT_T5(sp)
> + REG_S t6, PT_T6(sp)
> + REG_S gp, PT_GP(sp)
> + REG_S a0, PT_A0(sp)
> + REG_S a1, PT_A1(sp)
> + REG_S a2, PT_A2(sp)
> + REG_S a3, PT_A3(sp)
> + REG_S a4, PT_A4(sp)
> + REG_S a5, PT_A5(sp)
> +
> + /* Retrieve entry sp */
> + REG_L a4, SSE_REG_EVT_TMP(a7)
> + /* Save CSRs */
> + csrr a0, CSR_EPC
> + csrr a1, CSR_SSTATUS
> + csrr a2, CSR_STVAL
> + csrr a3, CSR_SCAUSE
> +
> + REG_S a0, PT_EPC(sp)
> + REG_S a1, PT_STATUS(sp)
> + REG_S a2, PT_BADADDR(sp)
> + REG_S a3, PT_CAUSE(sp)
> + REG_S a4, PT_SP(sp)
> +
> + /* Disable user memory access and floating/vector computing */
> + li t0, SR_SUM | SR_FS_VS
> + csrc CSR_STATUS, t0
> +
> + load_global_pointer
> + scs_load_sse_stack a7
> +
> + /* Restore current task struct from __sse_entry_task */
> + li t1, NR_CPUS
> + move t3, zero
> +
> +#ifdef CONFIG_SMP
> + /* Find the CPU id associated to the hart id */
> + la t0, __cpuid_to_hartid_map
> +.Lhart_id_loop:
> + REG_L t2, 0(t0)
> + beq t2, a6, .Lcpu_id_found
> +
> + /* Increment pointer and CPU number */
> + addi t3, t3, 1
> + addi t0, t0, RISCV_SZPTR
> + bltu t3, t1, .Lhart_id_loop
> +
> + /*
> + * This should never happen since we expect the hart_id to match one
> + * of our CPU, but better be safe than sorry
> + */
> + la tp, init_task
> + la a0, sse_hart_id_panic_string
> + la t0, panic
> + jalr t0
> +
> +.Lcpu_id_found:
> +#endif
> + asm_per_cpu_with_cpu t2 __sse_entry_task t1 t3
> + REG_L tp, 0(t2)
> +
> + move a1, sp /* pt_regs on stack */
> + /* Kernel was interrupted, create stack frame */
> + beqz s1, .Lcall_do_sse
I don't understand this since in any case we will go to .Lcall_do_sse
right? And I don't see where s1 is initialized.
> +
> +.Lcall_do_sse:
> + /*
> + * Save sscratch for restoration since we might have interrupted the
> + * kernel in early exception path and thus, we don't know the content of
> + * sscratch.
> + */
> + csrr s4, CSR_SSCRATCH
> + /* In-kernel scratch is 0 */
> + csrw CSR_SCRATCH, x0
> +
> + move a0, a7
> +
> + call do_sse
> +
> + csrw CSR_SSCRATCH, s4
> +
> + REG_L a0, PT_EPC(sp)
> + REG_L a1, PT_STATUS(sp)
> + REG_L a2, PT_BADADDR(sp)
> + REG_L a3, PT_CAUSE(sp)
> + csrw CSR_EPC, a0
> + csrw CSR_SSTATUS, a1
> + csrw CSR_STVAL, a2
> + csrw CSR_SCAUSE, a3
> +
> + REG_L ra, PT_RA(sp)
> + REG_L s0, PT_S0(sp)
> + REG_L s1, PT_S1(sp)
> + REG_L s2, PT_S2(sp)
> + REG_L s3, PT_S3(sp)
> + REG_L s4, PT_S4(sp)
> + REG_L s5, PT_S5(sp)
> + REG_L s6, PT_S6(sp)
> + REG_L s7, PT_S7(sp)
> + REG_L s8, PT_S8(sp)
> + REG_L s9, PT_S9(sp)
> + REG_L s10, PT_S10(sp)
> + REG_L s11, PT_S11(sp)
> + REG_L tp, PT_TP(sp)
> + REG_L t0, PT_T0(sp)
> + REG_L t1, PT_T1(sp)
> + REG_L t2, PT_T2(sp)
> + REG_L t3, PT_T3(sp)
> + REG_L t4, PT_T4(sp)
> + REG_L t5, PT_T5(sp)
> + REG_L t6, PT_T6(sp)
> + REG_L gp, PT_GP(sp)
> + REG_L a0, PT_A0(sp)
> + REG_L a1, PT_A1(sp)
> + REG_L a2, PT_A2(sp)
> + REG_L a3, PT_A3(sp)
> + REG_L a4, PT_A4(sp)
> + REG_L a5, PT_A5(sp)
> +
> + REG_L sp, PT_SP(sp)
> +
> + li a7, SBI_EXT_SSE
> + li a6, SBI_SSE_EVENT_COMPLETE
> + ecall
> +
> +SYM_CODE_END(handle_sse)
> +
> +sse_hart_id_panic_string:
> + .ascii "Unable to match hart_id with cpu\0"
Thanks,
Alex
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v3 2/4] riscv: add support for SBI Supervisor Software Events extension
2025-01-22 12:15 ` Alexandre Ghiti
@ 2025-01-22 12:23 ` Alexandre Ghiti
2025-01-23 8:41 ` Clément Léger
2025-01-23 8:39 ` Clément Léger
1 sibling, 1 reply; 22+ messages in thread
From: Alexandre Ghiti @ 2025-01-22 12:23 UTC (permalink / raw)
To: Clément Léger, Paul Walmsley, Palmer Dabbelt,
linux-riscv, linux-kernel, linux-arm-kernel
Cc: Himanshu Chauhan, Anup Patel, Xu Lu, Atish Patra
BTW, shouldn't we "detect" the SSE extension like we do for other SBI
extensions (I don't know if we do that for all of them though)? Not that
it seems needed but maybe as a way to visualize that SBI supports it?
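Something like what the driver already does in its initcall, but done once at
SBI probe time, would be enough to advertise it; rough sketch (the flag name
is made up, not part of this series):

	/* e.g. in sbi_init(), next to the other extension probes */
	if (sbi_probe_extension(SBI_EXT_SSE) > 0) {
		pr_info("SBI SSE extension detected\n");
		sbi_sse_available = true;	/* hypothetical flag */
	}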
Thanks,
Alex
On 22/01/2025 13:15, Alexandre Ghiti wrote:
> Hi Clément,
>
> On 06/12/2024 17:30, Clément Léger wrote:
>> The SBI SSE extension allows the supervisor software to be notified by
>> the SBI of specific events that are not maskable. The context switch is
>> handled partially by the firmware which will save registers a6 and a7.
>> When entering kernel we can rely on these 2 registers to setup the stack
>> and save all the registers.
>>
>> Since SSE events can be delivered at any time to the kernel (including
>> during exception handling, we need a way to locate the current_task for
>> context tracking. On RISC-V, it is sotred in scratch when in user space
>> or tp when in kernel space (in which case SSCRATCH is zero). But at a
>> at the beginning of exception handling, SSCRATCH is used to swap tp and
>> check the origin of the exception. If interrupted at that point, then,
>> there is no way to reliably know were is located the current
>> task_struct. Even checking the interruption location won't work as SSE
>> event can be nested on top of each other so the original interruption
>> site might be lost at some point. In order to retrieve it reliably,
>> store the current task in an additionnal __sse_entry_task per_cpu array.
>> This array is then used to retrieve the current task based on the
>> hart ID that is passed to the SSE event handler in a6.
>>
>> That being said, the way the current task struct is stored should
>> probably be reworked to find a better reliable alternative.
>>
>> Since each events (and each CPU for local events) have their own
>> context and can preempt each other, allocate a stack (and a shadow stack
>> if needed for each of them (and for each cpu for local events).
>>
>> When completing the event, if we were coming from kernel with interrupts
>> disabled, simply return there. If coming from userspace or kernel with
>> interrupts enabled, simulate an interrupt exception by setting IE_SIE in
>> CSR_IP to allow delivery of signals to user task. For instance this can
>> happen, when a RAS event has been generated by a user application and a
>> SIGBUS has been sent to a task.
>
>
> Nit: there are some typos in the commit log and missing ')'.
>
>
>>
>> Signed-off-by: Clément Léger <cleger@rivosinc.com>
>> ---
>> arch/riscv/include/asm/asm.h | 14 ++-
>> arch/riscv/include/asm/scs.h | 7 ++
>> arch/riscv/include/asm/sse.h | 38 ++++++
>> arch/riscv/include/asm/switch_to.h | 14 +++
>> arch/riscv/include/asm/thread_info.h | 1 +
>> arch/riscv/kernel/Makefile | 1 +
>> arch/riscv/kernel/asm-offsets.c | 12 ++
>> arch/riscv/kernel/sse.c | 134 +++++++++++++++++++++
>> arch/riscv/kernel/sse_entry.S | 171 +++++++++++++++++++++++++++
>> 9 files changed, 389 insertions(+), 3 deletions(-)
>> create mode 100644 arch/riscv/include/asm/sse.h
>> create mode 100644 arch/riscv/kernel/sse.c
>> create mode 100644 arch/riscv/kernel/sse_entry.S
>>
>> diff --git a/arch/riscv/include/asm/asm.h b/arch/riscv/include/asm/asm.h
>> index 776354895b81..de8427c58f02 100644
>> --- a/arch/riscv/include/asm/asm.h
>> +++ b/arch/riscv/include/asm/asm.h
>> @@ -89,16 +89,24 @@
>> #define PER_CPU_OFFSET_SHIFT 3
>> #endif
>> -.macro asm_per_cpu dst sym tmp
>> - REG_L \tmp, TASK_TI_CPU_NUM(tp)
>> - slli \tmp, \tmp, PER_CPU_OFFSET_SHIFT
>> +.macro asm_per_cpu_with_cpu dst sym tmp cpu
>> + slli \tmp, \cpu, PER_CPU_OFFSET_SHIFT
>> la \dst, __per_cpu_offset
>> add \dst, \dst, \tmp
>> REG_L \tmp, 0(\dst)
>> la \dst, \sym
>> add \dst, \dst, \tmp
>> .endm
>> +
>> +.macro asm_per_cpu dst sym tmp
>> + REG_L \tmp, TASK_TI_CPU_NUM(tp)
>> + asm_per_cpu_with_cpu \dst \sym \tmp \tmp
>> +.endm
>> #else /* CONFIG_SMP */
>> +.macro asm_per_cpu_with_cpu dst sym tmp cpu
>> + la \dst, \sym
>> +.endm
>> +
>> .macro asm_per_cpu dst sym tmp
>> la \dst, \sym
>> .endm
>> diff --git a/arch/riscv/include/asm/scs.h b/arch/riscv/include/asm/scs.h
>> index 0e45db78b24b..62344daad73d 100644
>> --- a/arch/riscv/include/asm/scs.h
>> +++ b/arch/riscv/include/asm/scs.h
>> @@ -18,6 +18,11 @@
>> load_per_cpu gp, irq_shadow_call_stack_ptr, \tmp
>> .endm
>> +/* Load the per-CPU IRQ shadow call stack to gp. */
>> +.macro scs_load_sse_stack reg_evt
>> + REG_L gp, SSE_REG_EVT_SHADOW_STACK(\reg_evt)
>> +.endm
>> +
>> /* Load task_scs_sp(current) to gp. */
>> .macro scs_load_current
>> REG_L gp, TASK_TI_SCS_SP(tp)
>> @@ -41,6 +46,8 @@
>> .endm
>> .macro scs_load_irq_stack tmp
>> .endm
>> +.macro scs_load_sse_stack reg_evt
>> +.endm
>> .macro scs_load_current
>> .endm
>> .macro scs_load_current_if_task_changed prev
>> diff --git a/arch/riscv/include/asm/sse.h b/arch/riscv/include/asm/sse.h
>> new file mode 100644
>> index 000000000000..431a19d4cd9c
>> --- /dev/null
>> +++ b/arch/riscv/include/asm/sse.h
>> @@ -0,0 +1,38 @@
>> +/* SPDX-License-Identifier: GPL-2.0-only */
>> +/*
>> + * Copyright (C) 2024 Rivos Inc.
>> + */
>> +#ifndef __ASM_SSE_H
>> +#define __ASM_SSE_H
>> +
>> +#ifdef CONFIG_RISCV_SSE
>> +
>> +struct sse_event_interrupted_state {
>> + unsigned long a6;
>> + unsigned long a7;
>> +};
>> +
>> +struct sse_event_arch_data {
>> + void *stack;
>> + void *shadow_stack;
>> + unsigned long tmp;
>> + struct sse_event_interrupted_state interrupted;
>> + unsigned long interrupted_state_phys;
>> + u32 evt_id;
>> +};
>> +
>> +struct sse_registered_event;
>> +int arch_sse_init_event(struct sse_event_arch_data *arch_evt, u32
>> evt_id,
>> + int cpu);
>> +void arch_sse_free_event(struct sse_event_arch_data *arch_evt);
>> +int arch_sse_register_event(struct sse_event_arch_data *arch_evt);
>> +
>> +void sse_handle_event(struct sse_event_arch_data *arch_evt,
>> + struct pt_regs *regs);
>> +asmlinkage void handle_sse(void);
>> +asmlinkage void do_sse(struct sse_event_arch_data *arch_evt,
>> + struct pt_regs *reg);
>> +
>> +#endif
>> +
>> +#endif
>> diff --git a/arch/riscv/include/asm/switch_to.h
>> b/arch/riscv/include/asm/switch_to.h
>> index 94e33216b2d9..e166fabe04ab 100644
>> --- a/arch/riscv/include/asm/switch_to.h
>> +++ b/arch/riscv/include/asm/switch_to.h
>> @@ -88,6 +88,19 @@ static inline void __switch_to_envcfg(struct
>> task_struct *next)
>> :: "r" (next->thread.envcfg) : "memory");
>> }
>> +#ifdef CONFIG_RISCV_SSE
>> +DECLARE_PER_CPU(struct task_struct *, __sse_entry_task);
>> +
>> +static inline void __switch_sse_entry_task(struct task_struct *next)
>> +{
>> + __this_cpu_write(__sse_entry_task, next);
>> +}
>> +#else
>> +static inline void __switch_sse_entry_task(struct task_struct *next)
>> +{
>> +}
>> +#endif
>> +
>> extern struct task_struct *__switch_to(struct task_struct *,
>> struct task_struct *);
>> @@ -122,6 +135,7 @@ do { \
>> if (switch_to_should_flush_icache(__next)) \
>> local_flush_icache_all(); \
>> __switch_to_envcfg(__next); \
>> + __switch_sse_entry_task(__next); \
>> ((last) = __switch_to(__prev, __next)); \
>> } while (0)
>> diff --git a/arch/riscv/include/asm/thread_info.h
>> b/arch/riscv/include/asm/thread_info.h
>> index f5916a70879a..28e9805e61fc 100644
>> --- a/arch/riscv/include/asm/thread_info.h
>> +++ b/arch/riscv/include/asm/thread_info.h
>> @@ -36,6 +36,7 @@
>> #define OVERFLOW_STACK_SIZE SZ_4K
>> #define IRQ_STACK_SIZE THREAD_SIZE
>> +#define SSE_STACK_SIZE THREAD_SIZE
>> #ifndef __ASSEMBLY__
>> diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
>> index 063d1faf5a53..1e8fb83b1162 100644
>> --- a/arch/riscv/kernel/Makefile
>> +++ b/arch/riscv/kernel/Makefile
>> @@ -99,6 +99,7 @@ obj-$(CONFIG_DYNAMIC_FTRACE) += mcount-dyn.o
>> obj-$(CONFIG_PERF_EVENTS) += perf_callchain.o
>> obj-$(CONFIG_HAVE_PERF_REGS) += perf_regs.o
>> obj-$(CONFIG_RISCV_SBI) += sbi.o sbi_ecall.o
>> +obj-$(CONFIG_RISCV_SSE) += sse.o sse_entry.o
>> ifeq ($(CONFIG_RISCV_SBI), y)
>> obj-$(CONFIG_SMP) += sbi-ipi.o
>> obj-$(CONFIG_SMP) += cpu_ops_sbi.o
>> diff --git a/arch/riscv/kernel/asm-offsets.c
>> b/arch/riscv/kernel/asm-offsets.c
>> index e89455a6a0e5..60590a3d9519 100644
>> --- a/arch/riscv/kernel/asm-offsets.c
>> +++ b/arch/riscv/kernel/asm-offsets.c
>> @@ -14,6 +14,8 @@
>> #include <asm/ptrace.h>
>> #include <asm/cpu_ops_sbi.h>
>> #include <asm/stacktrace.h>
>> +#include <asm/sbi.h>
>> +#include <asm/sse.h>
>> #include <asm/suspend.h>
>> void asm_offsets(void);
>> @@ -511,4 +513,14 @@ void asm_offsets(void)
>> DEFINE(FREGS_A6, offsetof(struct __arch_ftrace_regs, a6));
>> DEFINE(FREGS_A7, offsetof(struct __arch_ftrace_regs, a7));
>> #endif
>> +
>> +#ifdef CONFIG_RISCV_SSE
>> + OFFSET(SSE_REG_EVT_STACK, sse_event_arch_data, stack);
>> + OFFSET(SSE_REG_EVT_SHADOW_STACK, sse_event_arch_data,
>> shadow_stack);
>> + OFFSET(SSE_REG_EVT_TMP, sse_event_arch_data, tmp);
>> +
>> + DEFINE(SBI_EXT_SSE, SBI_EXT_SSE);
>> + DEFINE(SBI_SSE_EVENT_COMPLETE, SBI_SSE_EVENT_COMPLETE);
>> + DEFINE(NR_CPUS, NR_CPUS);
>> +#endif
>> }
>> diff --git a/arch/riscv/kernel/sse.c b/arch/riscv/kernel/sse.c
>> new file mode 100644
>> index 000000000000..b48ae69dad8d
>> --- /dev/null
>> +++ b/arch/riscv/kernel/sse.c
>> @@ -0,0 +1,134 @@
>> +// SPDX-License-Identifier: GPL-2.0-or-later
>> +/*
>> + * Copyright (C) 2024 Rivos Inc.
>> + */
>> +#include <linux/nmi.h>
>> +#include <linux/scs.h>
>> +#include <linux/bitfield.h>
>> +#include <linux/riscv_sse.h>
>> +#include <linux/percpu-defs.h>
>> +
>> +#include <asm/asm-prototypes.h>
>> +#include <asm/switch_to.h>
>> +#include <asm/irq_stack.h>
>> +#include <asm/sbi.h>
>> +#include <asm/sse.h>
>> +
>> +DEFINE_PER_CPU(struct task_struct *, __sse_entry_task);
>> +
>> +void __weak sse_handle_event(struct sse_event_arch_data *arch_evt,
>> struct pt_regs *regs)
>> +{
>> +}
>> +
>> +void do_sse(struct sse_event_arch_data *arch_evt, struct pt_regs *regs)
>> +{
>> + nmi_enter();
>> +
>> + /* Retrieve missing GPRs from SBI */
>> + sbi_ecall(SBI_EXT_SSE, SBI_SSE_EVENT_ATTR_READ, arch_evt->evt_id,
>> + SBI_SSE_ATTR_INTERRUPTED_A6,
>> + (SBI_SSE_ATTR_INTERRUPTED_A7 -
>> SBI_SSE_ATTR_INTERRUPTED_A6) + 1,
>> + arch_evt->interrupted_state_phys, 0, 0);
>> +
>> +     memcpy(&regs->a6, &arch_evt->interrupted,
>> sizeof(arch_evt->interrupted));
>> +
>> + sse_handle_event(arch_evt, regs);
>> +
>> + /*
>> +     * The SSE delivery path does not use the "standard" exception
>> path and
>> + * thus does not process any pending signal/softirqs. Some
>> drivers might
>> + * enqueue pending work that needs to be handled as soon as
>> possible.
>> + * For that purpose, set the software interrupt pending bit
>> which will
>> + * be serviced once interrupts are reenabled
>> + */
>> + csr_set(CSR_IP, IE_SIE);
>
>
> This looks a bit hackish and underperforming to trigger an IRQ on
> each SSE event; why is it necessary? I understand that we may want to
> service signals right away, for example in case of an uncorrectable
> memory error in order to send a SIGBUS to the process before it goes
> on, but why should we care about softirqs here?
>
>
>> +
>> + nmi_exit();
>> +}
>> +
>> +#ifdef CONFIG_VMAP_STACK
>> +static unsigned long *sse_stack_alloc(unsigned int cpu, unsigned int
>> size)
>> +{
>> + return arch_alloc_vmap_stack(size, cpu_to_node(cpu));
>> +}
>> +
>> +static void sse_stack_free(unsigned long *stack)
>> +{
>> + vfree(stack);
>> +}
>> +#else /* CONFIG_VMAP_STACK */
>> +
>> +static unsigned long *sse_stack_alloc(unsigned int cpu, unsigned int
>> size)
>> +{
>> + return kmalloc(size, GFP_KERNEL);
>> +}
>> +
>> +static void sse_stack_free(unsigned long *stack)
>> +{
>> + kfree(stack);
>> +}
>> +
>> +#endif /* CONFIG_VMAP_STACK */
>
>
> Can't we use kvmalloc() here to avoid the #ifdef? Or is there a real
> benefit of using vmalloced stacks?
>
>
>> +
>> +static int sse_init_scs(int cpu, struct sse_event_arch_data *arch_evt)
>> +{
>> + void *stack;
>> +
>> + if (!scs_is_enabled())
>> + return 0;
>> +
>> + stack = scs_alloc(cpu_to_node(cpu));
>> + if (!stack)
>> + return 1;
>
>
> Nit: return -ENOMEM
>
>
>> +
>> + arch_evt->shadow_stack = stack;
>> +
>> + return 0;
>> +}
>> +
>> +int arch_sse_init_event(struct sse_event_arch_data *arch_evt, u32
>> evt_id, int cpu)
>> +{
>> + void *stack;
>> +
>> + arch_evt->evt_id = evt_id;
>> + stack = sse_stack_alloc(cpu, SSE_STACK_SIZE);
>> + if (!stack)
>> + return -ENOMEM;
>> +
>> + arch_evt->stack = stack + SSE_STACK_SIZE;
>> +
>> + if (sse_init_scs(cpu, arch_evt))
>> + goto free_stack;
>> +
>> + if (is_kernel_percpu_address((unsigned
>> long)&arch_evt->interrupted)) {
>> + arch_evt->interrupted_state_phys =
>> + per_cpu_ptr_to_phys(&arch_evt->interrupted);
>> + } else {
>> + arch_evt->interrupted_state_phys =
>> + virt_to_phys(&arch_evt->interrupted);
>> + }
>> +
>> + return 0;
>> +
>> +free_stack:
>> + sse_stack_free(arch_evt->stack - SSE_STACK_SIZE);
>> +
>> + return -ENOMEM;
>> +}
>> +
>> +void arch_sse_free_event(struct sse_event_arch_data *arch_evt)
>> +{
>> + scs_free(arch_evt->shadow_stack);
>> + sse_stack_free(arch_evt->stack - SSE_STACK_SIZE);
>> +}
>> +
>> +int arch_sse_register_event(struct sse_event_arch_data *arch_evt)
>> +{
>> + struct sbiret sret;
>> +
>> + sret = sbi_ecall(SBI_EXT_SSE, SBI_SSE_EVENT_REGISTER,
>> arch_evt->evt_id,
>> + (unsigned long) handle_sse, (unsigned long) arch_evt,
>> + 0, 0, 0);
>> +
>> + return sbi_err_map_linux_errno(sret.error);
>> +}
>> diff --git a/arch/riscv/kernel/sse_entry.S
>> b/arch/riscv/kernel/sse_entry.S
>> new file mode 100644
>> index 000000000000..0b2f890edd89
>> --- /dev/null
>> +++ b/arch/riscv/kernel/sse_entry.S
>> @@ -0,0 +1,171 @@
>> +/* SPDX-License-Identifier: GPL-2.0-only */
>> +/*
>> + * Copyright (C) 2024 Rivos Inc.
>> + */
>> +
>> +#include <linux/init.h>
>> +#include <linux/linkage.h>
>> +
>> +#include <asm/asm.h>
>> +#include <asm/csr.h>
>> +#include <asm/scs.h>
>> +
>> +/* When entering handle_sse, the following registers are set:
>> + * a6: contains the hartid
>> + * a7: contains struct sse_registered_event pointer
>> + */
>> +SYM_CODE_START(handle_sse)
>> + /* Save stack temporarily */
>> + REG_S sp, SSE_REG_EVT_TMP(a7)
>> + /* Set entry stack */
>> + REG_L sp, SSE_REG_EVT_STACK(a7)
>> +
>> + addi sp, sp, -(PT_SIZE_ON_STACK)
>> + REG_S ra, PT_RA(sp)
>> + REG_S s0, PT_S0(sp)
>> + REG_S s1, PT_S1(sp)
>> + REG_S s2, PT_S2(sp)
>> + REG_S s3, PT_S3(sp)
>> + REG_S s4, PT_S4(sp)
>> + REG_S s5, PT_S5(sp)
>> + REG_S s6, PT_S6(sp)
>> + REG_S s7, PT_S7(sp)
>> + REG_S s8, PT_S8(sp)
>> + REG_S s9, PT_S9(sp)
>> + REG_S s10, PT_S10(sp)
>> + REG_S s11, PT_S11(sp)
>> + REG_S tp, PT_TP(sp)
>> + REG_S t0, PT_T0(sp)
>> + REG_S t1, PT_T1(sp)
>> + REG_S t2, PT_T2(sp)
>> + REG_S t3, PT_T3(sp)
>> + REG_S t4, PT_T4(sp)
>> + REG_S t5, PT_T5(sp)
>> + REG_S t6, PT_T6(sp)
>> + REG_S gp, PT_GP(sp)
>> + REG_S a0, PT_A0(sp)
>> + REG_S a1, PT_A1(sp)
>> + REG_S a2, PT_A2(sp)
>> + REG_S a3, PT_A3(sp)
>> + REG_S a4, PT_A4(sp)
>> + REG_S a5, PT_A5(sp)
>> +
>> + /* Retrieve entry sp */
>> + REG_L a4, SSE_REG_EVT_TMP(a7)
>> + /* Save CSRs */
>> + csrr a0, CSR_EPC
>> + csrr a1, CSR_SSTATUS
>> + csrr a2, CSR_STVAL
>> + csrr a3, CSR_SCAUSE
>> +
>> + REG_S a0, PT_EPC(sp)
>> + REG_S a1, PT_STATUS(sp)
>> + REG_S a2, PT_BADADDR(sp)
>> + REG_S a3, PT_CAUSE(sp)
>> + REG_S a4, PT_SP(sp)
>> +
>> + /* Disable user memory access and floating/vector computing */
>> + li t0, SR_SUM | SR_FS_VS
>> + csrc CSR_STATUS, t0
>> +
>> + load_global_pointer
>> + scs_load_sse_stack a7
>> +
>> + /* Restore current task struct from __sse_entry_task */
>> + li t1, NR_CPUS
>> + move t3, zero
>> +
>> +#ifdef CONFIG_SMP
>> + /* Find the CPU id associated to the hart id */
>> + la t0, __cpuid_to_hartid_map
>> +.Lhart_id_loop:
>> + REG_L t2, 0(t0)
>> + beq t2, a6, .Lcpu_id_found
>> +
>> + /* Increment pointer and CPU number */
>> + addi t3, t3, 1
>> + addi t0, t0, RISCV_SZPTR
>> + bltu t3, t1, .Lhart_id_loop
>> +
>> + /*
>> + * This should never happen since we expect the hart_id to match
>> one
>> + * of our CPU, but better be safe than sorry
>> + */
>> + la tp, init_task
>> + la a0, sse_hart_id_panic_string
>> + la t0, panic
>> + jalr t0
>> +
>> +.Lcpu_id_found:
>> +#endif
>> + asm_per_cpu_with_cpu t2 __sse_entry_task t1 t3
>> + REG_L tp, 0(t2)
>> +
>> + move a1, sp /* pt_regs on stack */
>> + /* Kernel was interrupted, create stack frame */
>> + beqz s1, .Lcall_do_sse
>
>
> I don't understand this since in any case we will go to .Lcall_do_sse
> right? And I don't see where s1 is initialized.
>
>
>> +
>> +.Lcall_do_sse:
>> + /*
>> + * Save sscratch for restoration since we might have interrupted
>> the
>> + * kernel in early exception path and thus, we don't know the
>> content of
>> + * sscratch.
>> + */
>> + csrr s4, CSR_SSCRATCH
>> + /* In-kernel scratch is 0 */
>> + csrw CSR_SCRATCH, x0
>> +
>> + move a0, a7
>> +
>> + call do_sse
>> +
>> + csrw CSR_SSCRATCH, s4
>> +
>> + REG_L a0, PT_EPC(sp)
>> + REG_L a1, PT_STATUS(sp)
>> + REG_L a2, PT_BADADDR(sp)
>> + REG_L a3, PT_CAUSE(sp)
>> + csrw CSR_EPC, a0
>> + csrw CSR_SSTATUS, a1
>> + csrw CSR_STVAL, a2
>> + csrw CSR_SCAUSE, a3
>> +
>> + REG_L ra, PT_RA(sp)
>> + REG_L s0, PT_S0(sp)
>> + REG_L s1, PT_S1(sp)
>> + REG_L s2, PT_S2(sp)
>> + REG_L s3, PT_S3(sp)
>> + REG_L s4, PT_S4(sp)
>> + REG_L s5, PT_S5(sp)
>> + REG_L s6, PT_S6(sp)
>> + REG_L s7, PT_S7(sp)
>> + REG_L s8, PT_S8(sp)
>> + REG_L s9, PT_S9(sp)
>> + REG_L s10, PT_S10(sp)
>> + REG_L s11, PT_S11(sp)
>> + REG_L tp, PT_TP(sp)
>> + REG_L t0, PT_T0(sp)
>> + REG_L t1, PT_T1(sp)
>> + REG_L t2, PT_T2(sp)
>> + REG_L t3, PT_T3(sp)
>> + REG_L t4, PT_T4(sp)
>> + REG_L t5, PT_T5(sp)
>> + REG_L t6, PT_T6(sp)
>> + REG_L gp, PT_GP(sp)
>> + REG_L a0, PT_A0(sp)
>> + REG_L a1, PT_A1(sp)
>> + REG_L a2, PT_A2(sp)
>> + REG_L a3, PT_A3(sp)
>> + REG_L a4, PT_A4(sp)
>> + REG_L a5, PT_A5(sp)
>> +
>> + REG_L sp, PT_SP(sp)
>> +
>> + li a7, SBI_EXT_SSE
>> + li a6, SBI_SSE_EVENT_COMPLETE
>> + ecall
>> +
>> +SYM_CODE_END(handle_sse)
>> +
>> +sse_hart_id_panic_string:
>> + .ascii "Unable to match hart_id with cpu\0"
>
>
> Thanks,
>
> Alex
>
>
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v3 2/4] riscv: add support for SBI Supervisor Software Events extension
2025-01-22 12:15 ` Alexandre Ghiti
2025-01-22 12:23 ` Alexandre Ghiti
@ 2025-01-23 8:39 ` Clément Léger
2025-01-27 8:09 ` Alexandre Ghiti
1 sibling, 1 reply; 22+ messages in thread
From: Clément Léger @ 2025-01-23 8:39 UTC (permalink / raw)
To: Alexandre Ghiti, Paul Walmsley, Palmer Dabbelt, linux-riscv,
linux-kernel, linux-arm-kernel
Cc: Himanshu Chauhan, Anup Patel, Xu Lu, Atish Patra
On 22/01/2025 13:15, Alexandre Ghiti wrote:
> Hi Clément,
>
> On 06/12/2024 17:30, Clément Léger wrote:
>> The SBI SSE extension allows the supervisor software to be notified by
>> the SBI of specific events that are not maskable. The context switch is
>> handled partially by the firmware which will save registers a6 and a7.
>> When entering kernel we can rely on these 2 registers to setup the stack
>> and save all the registers.
>>
>> Since SSE events can be delivered at any time to the kernel (including
>> during exception handling, we need a way to locate the current_task for
>> context tracking. On RISC-V, it is sotred in scratch when in user space
>> or tp when in kernel space (in which case SSCRATCH is zero). But at a
>> at the beginning of exception handling, SSCRATCH is used to swap tp and
>> check the origin of the exception. If interrupted at that point, then,
>> there is no way to reliably know were is located the current
>> task_struct. Even checking the interruption location won't work as SSE
>> event can be nested on top of each other so the original interruption
>> site might be lost at some point. In order to retrieve it reliably,
>> store the current task in an additionnal __sse_entry_task per_cpu array.
>> This array is then used to retrieve the current task based on the
>> hart ID that is passed to the SSE event handler in a6.
>>
>> That being said, the way the current task struct is stored should
>> probably be reworked to find a better reliable alternative.
>>
>> Since each events (and each CPU for local events) have their own
>> context and can preempt each other, allocate a stack (and a shadow stack
>> if needed for each of them (and for each cpu for local events).
>>
>> When completing the event, if we were coming from kernel with interrupts
>> disabled, simply return there. If coming from userspace or kernel with
>> interrupts enabled, simulate an interrupt exception by setting IE_SIE in
>> CSR_IP to allow delivery of signals to user task. For instance this can
>> happen, when a RAS event has been generated by a user application and a
>> SIGBUS has been sent to a task.
>
>
> Nit: there are some typos in the commit log and missing ')'.
Acked, I'll spellcheck that.
>
>
>>
>> Signed-off-by: Clément Léger <cleger@rivosinc.com>
>> ---
>> arch/riscv/include/asm/asm.h | 14 ++-
>> arch/riscv/include/asm/scs.h | 7 ++
>> arch/riscv/include/asm/sse.h | 38 ++++++
>> arch/riscv/include/asm/switch_to.h | 14 +++
>> arch/riscv/include/asm/thread_info.h | 1 +
>> arch/riscv/kernel/Makefile | 1 +
>> arch/riscv/kernel/asm-offsets.c | 12 ++
>> arch/riscv/kernel/sse.c | 134 +++++++++++++++++++++
>> arch/riscv/kernel/sse_entry.S | 171 +++++++++++++++++++++++++++
>> 9 files changed, 389 insertions(+), 3 deletions(-)
>> create mode 100644 arch/riscv/include/asm/sse.h
>> create mode 100644 arch/riscv/kernel/sse.c
>> create mode 100644 arch/riscv/kernel/sse_entry.S
>>
>> diff --git a/arch/riscv/include/asm/asm.h b/arch/riscv/include/asm/asm.h
>> index 776354895b81..de8427c58f02 100644
>> --- a/arch/riscv/include/asm/asm.h
>> +++ b/arch/riscv/include/asm/asm.h
>> @@ -89,16 +89,24 @@
>> #define PER_CPU_OFFSET_SHIFT 3
>> #endif
>> -.macro asm_per_cpu dst sym tmp
>> - REG_L \tmp, TASK_TI_CPU_NUM(tp)
>> - slli \tmp, \tmp, PER_CPU_OFFSET_SHIFT
>> +.macro asm_per_cpu_with_cpu dst sym tmp cpu
>> + slli \tmp, \cpu, PER_CPU_OFFSET_SHIFT
>> la \dst, __per_cpu_offset
>> add \dst, \dst, \tmp
>> REG_L \tmp, 0(\dst)
>> la \dst, \sym
>> add \dst, \dst, \tmp
>> .endm
>> +
>> +.macro asm_per_cpu dst sym tmp
>> + REG_L \tmp, TASK_TI_CPU_NUM(tp)
>> + asm_per_cpu_with_cpu \dst \sym \tmp \tmp
>> +.endm
>> #else /* CONFIG_SMP */
>> +.macro asm_per_cpu_with_cpu dst sym tmp cpu
>> + la \dst, \sym
>> +.endm
>> +
>> .macro asm_per_cpu dst sym tmp
>> la \dst, \sym
>> .endm
>> diff --git a/arch/riscv/include/asm/scs.h b/arch/riscv/include/asm/scs.h
>> index 0e45db78b24b..62344daad73d 100644
>> --- a/arch/riscv/include/asm/scs.h
>> +++ b/arch/riscv/include/asm/scs.h
>> @@ -18,6 +18,11 @@
>> load_per_cpu gp, irq_shadow_call_stack_ptr, \tmp
>> .endm
>> +/* Load the per-CPU IRQ shadow call stack to gp. */
>> +.macro scs_load_sse_stack reg_evt
>> + REG_L gp, SSE_REG_EVT_SHADOW_STACK(\reg_evt)
>> +.endm
>> +
>> /* Load task_scs_sp(current) to gp. */
>> .macro scs_load_current
>> REG_L gp, TASK_TI_SCS_SP(tp)
>> @@ -41,6 +46,8 @@
>> .endm
>> .macro scs_load_irq_stack tmp
>> .endm
>> +.macro scs_load_sse_stack reg_evt
>> +.endm
>> .macro scs_load_current
>> .endm
>> .macro scs_load_current_if_task_changed prev
>> diff --git a/arch/riscv/include/asm/sse.h b/arch/riscv/include/asm/sse.h
>> new file mode 100644
>> index 000000000000..431a19d4cd9c
>> --- /dev/null
>> +++ b/arch/riscv/include/asm/sse.h
>> @@ -0,0 +1,38 @@
>> +/* SPDX-License-Identifier: GPL-2.0-only */
>> +/*
>> + * Copyright (C) 2024 Rivos Inc.
>> + */
>> +#ifndef __ASM_SSE_H
>> +#define __ASM_SSE_H
>> +
>> +#ifdef CONFIG_RISCV_SSE
>> +
>> +struct sse_event_interrupted_state {
>> + unsigned long a6;
>> + unsigned long a7;
>> +};
>> +
>> +struct sse_event_arch_data {
>> + void *stack;
>> + void *shadow_stack;
>> + unsigned long tmp;
>> + struct sse_event_interrupted_state interrupted;
>> + unsigned long interrupted_state_phys;
>> + u32 evt_id;
>> +};
>> +
>> +struct sse_registered_event;
>> +int arch_sse_init_event(struct sse_event_arch_data *arch_evt, u32
>> evt_id,
>> + int cpu);
>> +void arch_sse_free_event(struct sse_event_arch_data *arch_evt);
>> +int arch_sse_register_event(struct sse_event_arch_data *arch_evt);
>> +
>> +void sse_handle_event(struct sse_event_arch_data *arch_evt,
>> + struct pt_regs *regs);
>> +asmlinkage void handle_sse(void);
>> +asmlinkage void do_sse(struct sse_event_arch_data *arch_evt,
>> + struct pt_regs *reg);
>> +
>> +#endif
>> +
>> +#endif
>> diff --git a/arch/riscv/include/asm/switch_to.h b/arch/riscv/include/
>> asm/switch_to.h
>> index 94e33216b2d9..e166fabe04ab 100644
>> --- a/arch/riscv/include/asm/switch_to.h
>> +++ b/arch/riscv/include/asm/switch_to.h
>> @@ -88,6 +88,19 @@ static inline void __switch_to_envcfg(struct
>> task_struct *next)
>> :: "r" (next->thread.envcfg) : "memory");
>> }
>> +#ifdef CONFIG_RISCV_SSE
>> +DECLARE_PER_CPU(struct task_struct *, __sse_entry_task);
>> +
>> +static inline void __switch_sse_entry_task(struct task_struct *next)
>> +{
>> + __this_cpu_write(__sse_entry_task, next);
>> +}
>> +#else
>> +static inline void __switch_sse_entry_task(struct task_struct *next)
>> +{
>> +}
>> +#endif
>> +
>> extern struct task_struct *__switch_to(struct task_struct *,
>> struct task_struct *);
>> @@ -122,6 +135,7 @@ do { \
>> if (switch_to_should_flush_icache(__next)) \
>> local_flush_icache_all(); \
>> __switch_to_envcfg(__next); \
>> + __switch_sse_entry_task(__next); \
>> ((last) = __switch_to(__prev, __next)); \
>> } while (0)
>> diff --git a/arch/riscv/include/asm/thread_info.h b/arch/riscv/
>> include/asm/thread_info.h
>> index f5916a70879a..28e9805e61fc 100644
>> --- a/arch/riscv/include/asm/thread_info.h
>> +++ b/arch/riscv/include/asm/thread_info.h
>> @@ -36,6 +36,7 @@
>> #define OVERFLOW_STACK_SIZE SZ_4K
>> #define IRQ_STACK_SIZE THREAD_SIZE
>> +#define SSE_STACK_SIZE THREAD_SIZE
>> #ifndef __ASSEMBLY__
>> diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
>> index 063d1faf5a53..1e8fb83b1162 100644
>> --- a/arch/riscv/kernel/Makefile
>> +++ b/arch/riscv/kernel/Makefile
>> @@ -99,6 +99,7 @@ obj-$(CONFIG_DYNAMIC_FTRACE) += mcount-dyn.o
>> obj-$(CONFIG_PERF_EVENTS) += perf_callchain.o
>> obj-$(CONFIG_HAVE_PERF_REGS) += perf_regs.o
>> obj-$(CONFIG_RISCV_SBI) += sbi.o sbi_ecall.o
>> +obj-$(CONFIG_RISCV_SSE) += sse.o sse_entry.o
>> ifeq ($(CONFIG_RISCV_SBI), y)
>> obj-$(CONFIG_SMP) += sbi-ipi.o
>> obj-$(CONFIG_SMP) += cpu_ops_sbi.o
>> diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-
>> offsets.c
>> index e89455a6a0e5..60590a3d9519 100644
>> --- a/arch/riscv/kernel/asm-offsets.c
>> +++ b/arch/riscv/kernel/asm-offsets.c
>> @@ -14,6 +14,8 @@
>> #include <asm/ptrace.h>
>> #include <asm/cpu_ops_sbi.h>
>> #include <asm/stacktrace.h>
>> +#include <asm/sbi.h>
>> +#include <asm/sse.h>
>> #include <asm/suspend.h>
>> void asm_offsets(void);
>> @@ -511,4 +513,14 @@ void asm_offsets(void)
>> DEFINE(FREGS_A6, offsetof(struct __arch_ftrace_regs, a6));
>> DEFINE(FREGS_A7, offsetof(struct __arch_ftrace_regs, a7));
>> #endif
>> +
>> +#ifdef CONFIG_RISCV_SSE
>> + OFFSET(SSE_REG_EVT_STACK, sse_event_arch_data, stack);
>> + OFFSET(SSE_REG_EVT_SHADOW_STACK, sse_event_arch_data, shadow_stack);
>> + OFFSET(SSE_REG_EVT_TMP, sse_event_arch_data, tmp);
>> +
>> + DEFINE(SBI_EXT_SSE, SBI_EXT_SSE);
>> + DEFINE(SBI_SSE_EVENT_COMPLETE, SBI_SSE_EVENT_COMPLETE);
>> + DEFINE(NR_CPUS, NR_CPUS);
>> +#endif
>> }
>> diff --git a/arch/riscv/kernel/sse.c b/arch/riscv/kernel/sse.c
>> new file mode 100644
>> index 000000000000..b48ae69dad8d
>> --- /dev/null
>> +++ b/arch/riscv/kernel/sse.c
>> @@ -0,0 +1,134 @@
>> +// SPDX-License-Identifier: GPL-2.0-or-later
>> +/*
>> + * Copyright (C) 2024 Rivos Inc.
>> + */
>> +#include <linux/nmi.h>
>> +#include <linux/scs.h>
>> +#include <linux/bitfield.h>
>> +#include <linux/riscv_sse.h>
>> +#include <linux/percpu-defs.h>
>> +
>> +#include <asm/asm-prototypes.h>
>> +#include <asm/switch_to.h>
>> +#include <asm/irq_stack.h>
>> +#include <asm/sbi.h>
>> +#include <asm/sse.h>
>> +
>> +DEFINE_PER_CPU(struct task_struct *, __sse_entry_task);
>> +
>> +void __weak sse_handle_event(struct sse_event_arch_data *arch_evt,
>> struct pt_regs *regs)
>> +{
>> +}
>> +
>> +void do_sse(struct sse_event_arch_data *arch_evt, struct pt_regs *regs)
>> +{
>> + nmi_enter();
>> +
>> + /* Retrieve missing GPRs from SBI */
>> + sbi_ecall(SBI_EXT_SSE, SBI_SSE_EVENT_ATTR_READ, arch_evt->evt_id,
>> + SBI_SSE_ATTR_INTERRUPTED_A6,
>> + (SBI_SSE_ATTR_INTERRUPTED_A7 - SBI_SSE_ATTR_INTERRUPTED_A6) + 1,
>> + arch_evt->interrupted_state_phys, 0, 0);
>> +
>> + memcpy(&regs->a6, &arch_evt->interrupted, sizeof(arch_evt->interrupted));
>> +
>> + sse_handle_event(arch_evt, regs);
>> +
>> + /*
>> + * The SSE delivery path does not uses the "standard" exception
>> path and
>> + * thus does not process any pending signal/softirqs. Some
>> drivers might
>> + * enqueue pending work that needs to be handled as soon as
>> possible.
>> + * For that purpose, set the software interrupt pending bit which
>> will
>> + * be serviced once interrupts are reenabled
>> + */
>> + csr_set(CSR_IP, IE_SIE);
>
>
> This looks a bit hackish and under performant to trigger an IRQ at each
> SSE event, why is it necessary? I understand that we may want to service
> signals right away, for example in case of a uncorrectable memory error
> in order to send a SIGBUS to the process before it goes on, but why
> should we care about softirqs here?
Hi Alex,
SSE events are run in an NMI context. Basically, nothing is executed in
this context except signaling that there is work to do. For instance,
the GHES handler (currently in a Ventana branch) just enqueues some work
to be done in a workqueue. The same goes for the PMU: it just enqueues
some work in case of an NMI.
While it might not be strictly necessary for the PMU, it is for the GHES
handler. Not doing so would allow the user application to continue its
execution until the next IRQ even though an error was reported. Late
signal handling could be really problematic. That would be even worse
for the kernel.
ARM SDEI does the same, except for a single case that I can add (ie,
when the kernel was interrupted with interrupts disabled, in which case
there is no need to trigger softirqs since they will be handled when
returning from it).
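To make the deferral pattern concrete, here is a minimal sketch of what such
a driver-side handler can look like (the names are illustrative only, not
taken from this series): the NMI-context handler only queues NMI-safe work,
and the IE_SIE bit set in do_sse() ensures the deferred work is picked up as
soon as interrupts are re-enabled.

#include <linux/irq_work.h>
#include <linux/workqueue.h>
#include <linux/riscv_sse.h>

/* Runs later in process context, where the full kernel API is available. */
static void example_process_event(struct work_struct *work)
{
	/* e.g. decode an error record and send SIGBUS to the affected task */
}
static DECLARE_WORK(example_work, example_process_event);

/* Runs once interrupts are re-enabled after the SSE completion. */
static void example_irq_work_fn(struct irq_work *w)
{
	schedule_work(&example_work);
}
static DEFINE_IRQ_WORK(example_irq_work, example_irq_work_fn);

/* NMI-like context: only NMI-safe operations are allowed here. */
static int example_sse_handler(u32 evt, void *arg, struct pt_regs *regs)
{
	irq_work_queue(&example_irq_work);

	return 0;
}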
>
>
>> +
>> + nmi_exit();
>> +}
>> +
>> +#ifdef CONFIG_VMAP_STACK
>> +static unsigned long *sse_stack_alloc(unsigned int cpu, unsigned int
>> size)
>> +{
>> + return arch_alloc_vmap_stack(size, cpu_to_node(cpu));
>> +}
>> +
>> +static void sse_stack_free(unsigned long *stack)
>> +{
>> + vfree(stack);
>> +}
>> +#else /* CONFIG_VMAP_STACK */
>> +
>> +static unsigned long *sse_stack_alloc(unsigned int cpu, unsigned int
>> size)
>> +{
>> + return kmalloc(size, GFP_KERNEL);
>> +}
>> +
>> +static void sse_stack_free(unsigned long *stack)
>> +{
>> + kfree(stack);
>> +}
>> +
>> +#endif /* CONFIG_VMAP_STACK */
>
>
> Can't we use kvmalloc() here to avoid the #ifdef? Or is there a real
> benefit of using vmalloced stacks?
I believe the goal is not the same. Using CONFIG_VMAP_STACK allows the
kernel exception handling to catch any stack overflow when entering the
kernel, and thus vmalloc is required to allocate twice the page size
(overflow is detected when sp is located in the upper half of the
allocated vmalloc stack). So basically, these are two distinct purposes.
AFAIU, kvmalloc allows falling back to vmalloc if kmalloc fails. This is
not what we are looking for here since our allocation size is always
quite small and known (STACK_SIZE basically).
But I might be missing something.
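For reference, a minimal sketch of what the kvmalloc()-based alternative
discussed above would look like (hypothetical helpers, not part of this
series). It would indeed drop the #ifdef, but a kmalloc-backed stack sits in
the linear mapping with no guard page, so an overflow would silently corrupt
neighbouring memory instead of faulting like a VMAP_STACK-style stack:

#include <linux/mm.h>
#include <linux/slab.h>
#include <linux/topology.h>

static unsigned long *sse_stack_alloc_kv(unsigned int cpu, unsigned int size)
{
	/* Falls back to vmalloc on kmalloc failure, but with no guard page. */
	return kvmalloc_node(size, GFP_KERNEL, cpu_to_node(cpu));
}

static void sse_stack_free_kv(unsigned long *stack)
{
	kvfree(stack);
}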
>
>
>> +
>> +static int sse_init_scs(int cpu, struct sse_event_arch_data *arch_evt)
>> +{
>> + void *stack;
>> +
>> + if (!scs_is_enabled())
>> + return 0;
>> +
>> + stack = scs_alloc(cpu_to_node(cpu));
>> + if (!stack)
>> + return 1;
>
>
> Nit: return -ENOMEM
That's better indeed.
>
>
>> +
>> + arch_evt->shadow_stack = stack;
>> +
>> + return 0;
>> +}
>> +
>> +int arch_sse_init_event(struct sse_event_arch_data *arch_evt, u32
>> evt_id, int cpu)
>> +{
>> + void *stack;
>> +
>> + arch_evt->evt_id = evt_id;
>> + stack = sse_stack_alloc(cpu, SSE_STACK_SIZE);
>> + if (!stack)
>> + return -ENOMEM;
>> +
>> + arch_evt->stack = stack + SSE_STACK_SIZE;
>> +
>> + if (sse_init_scs(cpu, arch_evt))
>> + goto free_stack;
>> +
>> + if (is_kernel_percpu_address((unsigned long)&arch_evt->interrupted)) {
>> + arch_evt->interrupted_state_phys =
>> + per_cpu_ptr_to_phys(&arch_evt->interrupted);
>> + } else {
>> + arch_evt->interrupted_state_phys =
>> + virt_to_phys(&arch_evt->interrupted);
>> + }
>> +
>> + return 0;
>> +
>> +free_stack:
>> + sse_stack_free(arch_evt->stack - SSE_STACK_SIZE);
>> +
>> + return -ENOMEM;
>> +}
>> +
>> +void arch_sse_free_event(struct sse_event_arch_data *arch_evt)
>> +{
>> + scs_free(arch_evt->shadow_stack);
>> + sse_stack_free(arch_evt->stack - SSE_STACK_SIZE);
>> +}
>> +
>> +int arch_sse_register_event(struct sse_event_arch_data *arch_evt)
>> +{
>> + struct sbiret sret;
>> +
>> + sret = sbi_ecall(SBI_EXT_SSE, SBI_SSE_EVENT_REGISTER, arch_evt->evt_id,
>> + (unsigned long) handle_sse, (unsigned long) arch_evt,
>> + 0, 0, 0);
>> +
>> + return sbi_err_map_linux_errno(sret.error);
>> +}
>> diff --git a/arch/riscv/kernel/sse_entry.S b/arch/riscv/kernel/
>> sse_entry.S
>> new file mode 100644
>> index 000000000000..0b2f890edd89
>> --- /dev/null
>> +++ b/arch/riscv/kernel/sse_entry.S
>> @@ -0,0 +1,171 @@
>> +/* SPDX-License-Identifier: GPL-2.0-only */
>> +/*
>> + * Copyright (C) 2024 Rivos Inc.
>> + */
>> +
>> +#include <linux/init.h>
>> +#include <linux/linkage.h>
>> +
>> +#include <asm/asm.h>
>> +#include <asm/csr.h>
>> +#include <asm/scs.h>
>> +
>> +/* When entering handle_sse, the following registers are set:
>> + * a6: contains the hartid
>> + * a7: contains struct sse_registered_event pointer
>> + */
>> +SYM_CODE_START(handle_sse)
>> + /* Save stack temporarily */
>> + REG_S sp, SSE_REG_EVT_TMP(a7)
>> + /* Set entry stack */
>> + REG_L sp, SSE_REG_EVT_STACK(a7)
>> +
>> + addi sp, sp, -(PT_SIZE_ON_STACK)
>> + REG_S ra, PT_RA(sp)
>> + REG_S s0, PT_S0(sp)
>> + REG_S s1, PT_S1(sp)
>> + REG_S s2, PT_S2(sp)
>> + REG_S s3, PT_S3(sp)
>> + REG_S s4, PT_S4(sp)
>> + REG_S s5, PT_S5(sp)
>> + REG_S s6, PT_S6(sp)
>> + REG_S s7, PT_S7(sp)
>> + REG_S s8, PT_S8(sp)
>> + REG_S s9, PT_S9(sp)
>> + REG_S s10, PT_S10(sp)
>> + REG_S s11, PT_S11(sp)
>> + REG_S tp, PT_TP(sp)
>> + REG_S t0, PT_T0(sp)
>> + REG_S t1, PT_T1(sp)
>> + REG_S t2, PT_T2(sp)
>> + REG_S t3, PT_T3(sp)
>> + REG_S t4, PT_T4(sp)
>> + REG_S t5, PT_T5(sp)
>> + REG_S t6, PT_T6(sp)
>> + REG_S gp, PT_GP(sp)
>> + REG_S a0, PT_A0(sp)
>> + REG_S a1, PT_A1(sp)
>> + REG_S a2, PT_A2(sp)
>> + REG_S a3, PT_A3(sp)
>> + REG_S a4, PT_A4(sp)
>> + REG_S a5, PT_A5(sp)
>> +
>> + /* Retrieve entry sp */
>> + REG_L a4, SSE_REG_EVT_TMP(a7)
>> + /* Save CSRs */
>> + csrr a0, CSR_EPC
>> + csrr a1, CSR_SSTATUS
>> + csrr a2, CSR_STVAL
>> + csrr a3, CSR_SCAUSE
>> +
>> + REG_S a0, PT_EPC(sp)
>> + REG_S a1, PT_STATUS(sp)
>> + REG_S a2, PT_BADADDR(sp)
>> + REG_S a3, PT_CAUSE(sp)
>> + REG_S a4, PT_SP(sp)
>> +
>> + /* Disable user memory access and floating/vector computing */
>> + li t0, SR_SUM | SR_FS_VS
>> + csrc CSR_STATUS, t0
>> +
>> + load_global_pointer
>> + scs_load_sse_stack a7
>> +
>> + /* Restore current task struct from __sse_entry_task */
>> + li t1, NR_CPUS
>> + move t3, zero
>> +
>> +#ifdef CONFIG_SMP
>> + /* Find the CPU id associated to the hart id */
>> + la t0, __cpuid_to_hartid_map
>> +.Lhart_id_loop:
>> + REG_L t2, 0(t0)
>> + beq t2, a6, .Lcpu_id_found
>> +
>> + /* Increment pointer and CPU number */
>> + addi t3, t3, 1
>> + addi t0, t0, RISCV_SZPTR
>> + bltu t3, t1, .Lhart_id_loop
>> +
>> + /*
>> + * This should never happen since we expect the hart_id to match one
>> + * of our CPU, but better be safe than sorry
>> + */
>> + la tp, init_task
>> + la a0, sse_hart_id_panic_string
>> + la t0, panic
>> + jalr t0
>> +
>> +.Lcpu_id_found:
>> +#endif
>> + asm_per_cpu_with_cpu t2 __sse_entry_task t1 t3
>> + REG_L tp, 0(t2)
>> +
>> + move a1, sp /* pt_regs on stack */
>> + /* Kernel was interrupted, create stack frame */
>> + beqz s1, .Lcall_do_sse
>
>
> I don't understand this since in any case we will go to .Lcall_do_sse
> right? And I don't see where s1 is initialized.
Yeah indeed, that's a leftover from some stack frame creation that I
forgot to remove. I'll remove that!
Thanks for the review.
Clément
>
>
>> +
>> +.Lcall_do_sse:
>> + /*
>> + * Save sscratch for restoration since we might have interrupted the
>> + * kernel in early exception path and thus, we don't know the
>> content of
>> + * sscratch.
>> + */
>> + csrr s4, CSR_SSCRATCH
>> + /* In-kernel scratch is 0 */
>> + csrw CSR_SCRATCH, x0
>> +
>> + move a0, a7
>> +
>> + call do_sse
>> +
>> + csrw CSR_SSCRATCH, s4
>> +
>> + REG_L a0, PT_EPC(sp)
>> + REG_L a1, PT_STATUS(sp)
>> + REG_L a2, PT_BADADDR(sp)
>> + REG_L a3, PT_CAUSE(sp)
>> + csrw CSR_EPC, a0
>> + csrw CSR_SSTATUS, a1
>> + csrw CSR_STVAL, a2
>> + csrw CSR_SCAUSE, a3
>> +
>> + REG_L ra, PT_RA(sp)
>> + REG_L s0, PT_S0(sp)
>> + REG_L s1, PT_S1(sp)
>> + REG_L s2, PT_S2(sp)
>> + REG_L s3, PT_S3(sp)
>> + REG_L s4, PT_S4(sp)
>> + REG_L s5, PT_S5(sp)
>> + REG_L s6, PT_S6(sp)
>> + REG_L s7, PT_S7(sp)
>> + REG_L s8, PT_S8(sp)
>> + REG_L s9, PT_S9(sp)
>> + REG_L s10, PT_S10(sp)
>> + REG_L s11, PT_S11(sp)
>> + REG_L tp, PT_TP(sp)
>> + REG_L t0, PT_T0(sp)
>> + REG_L t1, PT_T1(sp)
>> + REG_L t2, PT_T2(sp)
>> + REG_L t3, PT_T3(sp)
>> + REG_L t4, PT_T4(sp)
>> + REG_L t5, PT_T5(sp)
>> + REG_L t6, PT_T6(sp)
>> + REG_L gp, PT_GP(sp)
>> + REG_L a0, PT_A0(sp)
>> + REG_L a1, PT_A1(sp)
>> + REG_L a2, PT_A2(sp)
>> + REG_L a3, PT_A3(sp)
>> + REG_L a4, PT_A4(sp)
>> + REG_L a5, PT_A5(sp)
>> +
>> + REG_L sp, PT_SP(sp)
>> +
>> + li a7, SBI_EXT_SSE
>> + li a6, SBI_SSE_EVENT_COMPLETE
>> + ecall
>> +
>> +SYM_CODE_END(handle_sse)
>> +
>> +sse_hart_id_panic_string:
>> + .ascii "Unable to match hart_id with cpu\0"
>
>
> Thanks,
>
> Alex
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v3 2/4] riscv: add support for SBI Supervisor Software Events extension
2025-01-22 12:23 ` Alexandre Ghiti
@ 2025-01-23 8:41 ` Clément Léger
0 siblings, 0 replies; 22+ messages in thread
From: Clément Léger @ 2025-01-23 8:41 UTC (permalink / raw)
To: Alexandre Ghiti, Paul Walmsley, Palmer Dabbelt, linux-riscv,
linux-kernel, linux-arm-kernel
Cc: Himanshu Chauhan, Anup Patel, Xu Lu, Atish Patra
On 22/01/2025 13:23, Alexandre Ghiti wrote:
> BTW, shouldn't we "detect" the SSE extension like we do for other SBI
> extensions (I don't know if we do that for all of them though)? Not that
> it seems needed but maybe as a way to visualize that SBI supports it?
This part is done in the drivers/firmware driver. This patch is
basically the arch support for SSE (ie stack setup, registers, entry)
and does nothing on its own. The drivers/firmware part handles all the
upper-level logic to register/enable/etc. the events and checks for the
availability of the SSE extension.
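As a rough sketch of where that availability check lives (illustrative only;
the exact code in drivers/firmware/riscv/riscv_sse.c may differ), the
firmware driver's init path bails out early when the SBI implementation does
not report the extension:

#include <linux/init.h>
#include <linux/printk.h>
#include <asm/sbi.h>

static int __init sse_init(void)
{
	if (!sbi_probe_extension(SBI_EXT_SSE)) {
		pr_err("Missing SBI SSE extension\n");
		return -EOPNOTSUPP;
	}

	/* ...allocate event tracking, register CPU hotplug callbacks, etc... */

	return 0;
}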
Thanks,
Clément
>
> Thanks,
>
> Alex
>
> On 22/01/2025 13:15, Alexandre Ghiti wrote:
>> Hi Clément,
>>
>> On 06/12/2024 17:30, Clément Léger wrote:
>>> The SBI SSE extension allows the supervisor software to be notified by
>>> the SBI of specific events that are not maskable. The context switch is
>>> handled partially by the firmware which will save registers a6 and a7.
>>> When entering kernel we can rely on these 2 registers to setup the stack
>>> and save all the registers.
>>>
>>> Since SSE events can be delivered at any time to the kernel (including
>>> during exception handling, we need a way to locate the current_task for
>>> context tracking. On RISC-V, it is sotred in scratch when in user space
>>> or tp when in kernel space (in which case SSCRATCH is zero). But at a
>>> at the beginning of exception handling, SSCRATCH is used to swap tp and
>>> check the origin of the exception. If interrupted at that point, then,
>>> there is no way to reliably know were is located the current
>>> task_struct. Even checking the interruption location won't work as SSE
>>> event can be nested on top of each other so the original interruption
>>> site might be lost at some point. In order to retrieve it reliably,
>>> store the current task in an additionnal __sse_entry_task per_cpu array.
>>> This array is then used to retrieve the current task based on the
>>> hart ID that is passed to the SSE event handler in a6.
>>>
>>> That being said, the way the current task struct is stored should
>>> probably be reworked to find a better reliable alternative.
>>>
>>> Since each events (and each CPU for local events) have their own
>>> context and can preempt each other, allocate a stack (and a shadow stack
>>> if needed for each of them (and for each cpu for local events).
>>>
>>> When completing the event, if we were coming from kernel with interrupts
>>> disabled, simply return there. If coming from userspace or kernel with
>>> interrupts enabled, simulate an interrupt exception by setting IE_SIE in
>>> CSR_IP to allow delivery of signals to user task. For instance this can
>>> happen, when a RAS event has been generated by a user application and a
>>> SIGBUS has been sent to a task.
>>
>>
>> Nit: there are some typos in the commit log and missing ')'.
>>
>>
>>>
>>> Signed-off-by: Clément Léger <cleger@rivosinc.com>
>>> ---
>>> arch/riscv/include/asm/asm.h | 14 ++-
>>> arch/riscv/include/asm/scs.h | 7 ++
>>> arch/riscv/include/asm/sse.h | 38 ++++++
>>> arch/riscv/include/asm/switch_to.h | 14 +++
>>> arch/riscv/include/asm/thread_info.h | 1 +
>>> arch/riscv/kernel/Makefile | 1 +
>>> arch/riscv/kernel/asm-offsets.c | 12 ++
>>> arch/riscv/kernel/sse.c | 134 +++++++++++++++++++++
>>> arch/riscv/kernel/sse_entry.S | 171 +++++++++++++++++++++++++++
>>> 9 files changed, 389 insertions(+), 3 deletions(-)
>>> create mode 100644 arch/riscv/include/asm/sse.h
>>> create mode 100644 arch/riscv/kernel/sse.c
>>> create mode 100644 arch/riscv/kernel/sse_entry.S
>>>
>>> diff --git a/arch/riscv/include/asm/asm.h b/arch/riscv/include/asm/asm.h
>>> index 776354895b81..de8427c58f02 100644
>>> --- a/arch/riscv/include/asm/asm.h
>>> +++ b/arch/riscv/include/asm/asm.h
>>> @@ -89,16 +89,24 @@
>>> #define PER_CPU_OFFSET_SHIFT 3
>>> #endif
>>> -.macro asm_per_cpu dst sym tmp
>>> - REG_L \tmp, TASK_TI_CPU_NUM(tp)
>>> - slli \tmp, \tmp, PER_CPU_OFFSET_SHIFT
>>> +.macro asm_per_cpu_with_cpu dst sym tmp cpu
>>> + slli \tmp, \cpu, PER_CPU_OFFSET_SHIFT
>>> la \dst, __per_cpu_offset
>>> add \dst, \dst, \tmp
>>> REG_L \tmp, 0(\dst)
>>> la \dst, \sym
>>> add \dst, \dst, \tmp
>>> .endm
>>> +
>>> +.macro asm_per_cpu dst sym tmp
>>> + REG_L \tmp, TASK_TI_CPU_NUM(tp)
>>> + asm_per_cpu_with_cpu \dst \sym \tmp \tmp
>>> +.endm
>>> #else /* CONFIG_SMP */
>>> +.macro asm_per_cpu_with_cpu dst sym tmp cpu
>>> + la \dst, \sym
>>> +.endm
>>> +
>>> .macro asm_per_cpu dst sym tmp
>>> la \dst, \sym
>>> .endm
>>> diff --git a/arch/riscv/include/asm/scs.h b/arch/riscv/include/asm/scs.h
>>> index 0e45db78b24b..62344daad73d 100644
>>> --- a/arch/riscv/include/asm/scs.h
>>> +++ b/arch/riscv/include/asm/scs.h
>>> @@ -18,6 +18,11 @@
>>> load_per_cpu gp, irq_shadow_call_stack_ptr, \tmp
>>> .endm
>>> +/* Load the per-CPU IRQ shadow call stack to gp. */
>>> +.macro scs_load_sse_stack reg_evt
>>> + REG_L gp, SSE_REG_EVT_SHADOW_STACK(\reg_evt)
>>> +.endm
>>> +
>>> /* Load task_scs_sp(current) to gp. */
>>> .macro scs_load_current
>>> REG_L gp, TASK_TI_SCS_SP(tp)
>>> @@ -41,6 +46,8 @@
>>> .endm
>>> .macro scs_load_irq_stack tmp
>>> .endm
>>> +.macro scs_load_sse_stack reg_evt
>>> +.endm
>>> .macro scs_load_current
>>> .endm
>>> .macro scs_load_current_if_task_changed prev
>>> diff --git a/arch/riscv/include/asm/sse.h b/arch/riscv/include/asm/sse.h
>>> new file mode 100644
>>> index 000000000000..431a19d4cd9c
>>> --- /dev/null
>>> +++ b/arch/riscv/include/asm/sse.h
>>> @@ -0,0 +1,38 @@
>>> +/* SPDX-License-Identifier: GPL-2.0-only */
>>> +/*
>>> + * Copyright (C) 2024 Rivos Inc.
>>> + */
>>> +#ifndef __ASM_SSE_H
>>> +#define __ASM_SSE_H
>>> +
>>> +#ifdef CONFIG_RISCV_SSE
>>> +
>>> +struct sse_event_interrupted_state {
>>> + unsigned long a6;
>>> + unsigned long a7;
>>> +};
>>> +
>>> +struct sse_event_arch_data {
>>> + void *stack;
>>> + void *shadow_stack;
>>> + unsigned long tmp;
>>> + struct sse_event_interrupted_state interrupted;
>>> + unsigned long interrupted_state_phys;
>>> + u32 evt_id;
>>> +};
>>> +
>>> +struct sse_registered_event;
>>> +int arch_sse_init_event(struct sse_event_arch_data *arch_evt, u32
>>> evt_id,
>>> + int cpu);
>>> +void arch_sse_free_event(struct sse_event_arch_data *arch_evt);
>>> +int arch_sse_register_event(struct sse_event_arch_data *arch_evt);
>>> +
>>> +void sse_handle_event(struct sse_event_arch_data *arch_evt,
>>> + struct pt_regs *regs);
>>> +asmlinkage void handle_sse(void);
>>> +asmlinkage void do_sse(struct sse_event_arch_data *arch_evt,
>>> + struct pt_regs *reg);
>>> +
>>> +#endif
>>> +
>>> +#endif
>>> diff --git a/arch/riscv/include/asm/switch_to.h b/arch/riscv/include/
>>> asm/switch_to.h
>>> index 94e33216b2d9..e166fabe04ab 100644
>>> --- a/arch/riscv/include/asm/switch_to.h
>>> +++ b/arch/riscv/include/asm/switch_to.h
>>> @@ -88,6 +88,19 @@ static inline void __switch_to_envcfg(struct
>>> task_struct *next)
>>> :: "r" (next->thread.envcfg) : "memory");
>>> }
>>> +#ifdef CONFIG_RISCV_SSE
>>> +DECLARE_PER_CPU(struct task_struct *, __sse_entry_task);
>>> +
>>> +static inline void __switch_sse_entry_task(struct task_struct *next)
>>> +{
>>> + __this_cpu_write(__sse_entry_task, next);
>>> +}
>>> +#else
>>> +static inline void __switch_sse_entry_task(struct task_struct *next)
>>> +{
>>> +}
>>> +#endif
>>> +
>>> extern struct task_struct *__switch_to(struct task_struct *,
>>> struct task_struct *);
>>> @@ -122,6 +135,7 @@ do { \
>>> if (switch_to_should_flush_icache(__next)) \
>>> local_flush_icache_all(); \
>>> __switch_to_envcfg(__next); \
>>> + __switch_sse_entry_task(__next); \
>>> ((last) = __switch_to(__prev, __next)); \
>>> } while (0)
>>> diff --git a/arch/riscv/include/asm/thread_info.h b/arch/riscv/
>>> include/asm/thread_info.h
>>> index f5916a70879a..28e9805e61fc 100644
>>> --- a/arch/riscv/include/asm/thread_info.h
>>> +++ b/arch/riscv/include/asm/thread_info.h
>>> @@ -36,6 +36,7 @@
>>> #define OVERFLOW_STACK_SIZE SZ_4K
>>> #define IRQ_STACK_SIZE THREAD_SIZE
>>> +#define SSE_STACK_SIZE THREAD_SIZE
>>> #ifndef __ASSEMBLY__
>>> diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
>>> index 063d1faf5a53..1e8fb83b1162 100644
>>> --- a/arch/riscv/kernel/Makefile
>>> +++ b/arch/riscv/kernel/Makefile
>>> @@ -99,6 +99,7 @@ obj-$(CONFIG_DYNAMIC_FTRACE) += mcount-dyn.o
>>> obj-$(CONFIG_PERF_EVENTS) += perf_callchain.o
>>> obj-$(CONFIG_HAVE_PERF_REGS) += perf_regs.o
>>> obj-$(CONFIG_RISCV_SBI) += sbi.o sbi_ecall.o
>>> +obj-$(CONFIG_RISCV_SSE) += sse.o sse_entry.o
>>> ifeq ($(CONFIG_RISCV_SBI), y)
>>> obj-$(CONFIG_SMP) += sbi-ipi.o
>>> obj-$(CONFIG_SMP) += cpu_ops_sbi.o
>>> diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-
>>> offsets.c
>>> index e89455a6a0e5..60590a3d9519 100644
>>> --- a/arch/riscv/kernel/asm-offsets.c
>>> +++ b/arch/riscv/kernel/asm-offsets.c
>>> @@ -14,6 +14,8 @@
>>> #include <asm/ptrace.h>
>>> #include <asm/cpu_ops_sbi.h>
>>> #include <asm/stacktrace.h>
>>> +#include <asm/sbi.h>
>>> +#include <asm/sse.h>
>>> #include <asm/suspend.h>
>>> void asm_offsets(void);
>>> @@ -511,4 +513,14 @@ void asm_offsets(void)
>>> DEFINE(FREGS_A6, offsetof(struct __arch_ftrace_regs, a6));
>>> DEFINE(FREGS_A7, offsetof(struct __arch_ftrace_regs, a7));
>>> #endif
>>> +
>>> +#ifdef CONFIG_RISCV_SSE
>>> + OFFSET(SSE_REG_EVT_STACK, sse_event_arch_data, stack);
>>> + OFFSET(SSE_REG_EVT_SHADOW_STACK, sse_event_arch_data,
>>> shadow_stack);
>>> + OFFSET(SSE_REG_EVT_TMP, sse_event_arch_data, tmp);
>>> +
>>> + DEFINE(SBI_EXT_SSE, SBI_EXT_SSE);
>>> + DEFINE(SBI_SSE_EVENT_COMPLETE, SBI_SSE_EVENT_COMPLETE);
>>> + DEFINE(NR_CPUS, NR_CPUS);
>>> +#endif
>>> }
>>> diff --git a/arch/riscv/kernel/sse.c b/arch/riscv/kernel/sse.c
>>> new file mode 100644
>>> index 000000000000..b48ae69dad8d
>>> --- /dev/null
>>> +++ b/arch/riscv/kernel/sse.c
>>> @@ -0,0 +1,134 @@
>>> +// SPDX-License-Identifier: GPL-2.0-or-later
>>> +/*
>>> + * Copyright (C) 2024 Rivos Inc.
>>> + */
>>> +#include <linux/nmi.h>
>>> +#include <linux/scs.h>
>>> +#include <linux/bitfield.h>
>>> +#include <linux/riscv_sse.h>
>>> +#include <linux/percpu-defs.h>
>>> +
>>> +#include <asm/asm-prototypes.h>
>>> +#include <asm/switch_to.h>
>>> +#include <asm/irq_stack.h>
>>> +#include <asm/sbi.h>
>>> +#include <asm/sse.h>
>>> +
>>> +DEFINE_PER_CPU(struct task_struct *, __sse_entry_task);
>>> +
>>> +void __weak sse_handle_event(struct sse_event_arch_data *arch_evt,
>>> struct pt_regs *regs)
>>> +{
>>> +}
>>> +
>>> +void do_sse(struct sse_event_arch_data *arch_evt, struct pt_regs *regs)
>>> +{
>>> + nmi_enter();
>>> +
>>> + /* Retrieve missing GPRs from SBI */
>>> + sbi_ecall(SBI_EXT_SSE, SBI_SSE_EVENT_ATTR_READ, arch_evt->evt_id,
>>> + SBI_SSE_ATTR_INTERRUPTED_A6,
>>> + (SBI_SSE_ATTR_INTERRUPTED_A7 -
>>> SBI_SSE_ATTR_INTERRUPTED_A6) + 1,
>>> + arch_evt->interrupted_state_phys, 0, 0);
>>> +
>>> + memcpy(&regs->a6, &arch_evt->interrupted, sizeof(arch_evt->interrupted));
>>> +
>>> + sse_handle_event(arch_evt, regs);
>>> +
>>> + /*
>>> + * The SSE delivery path does not uses the "standard" exception
>>> path and
>>> + * thus does not process any pending signal/softirqs. Some
>>> drivers might
>>> + * enqueue pending work that needs to be handled as soon as
>>> possible.
>>> + * For that purpose, set the software interrupt pending bit
>>> which will
>>> + * be serviced once interrupts are reenabled
>>> + */
>>> + csr_set(CSR_IP, IE_SIE);
>>
>>
>> This looks a bit hackish and under performant to trigger an IRQ at
>> each SSE event, why is it necessary? I understand that we may want to
>> service signals right away, for example in case of a uncorrectable
>> memory error in order to send a SIGBUS to the process before it goes
>> on, but why should we care about softirqs here?
>>
>>
>>> +
>>> + nmi_exit();
>>> +}
>>> +
>>> +#ifdef CONFIG_VMAP_STACK
>>> +static unsigned long *sse_stack_alloc(unsigned int cpu, unsigned int
>>> size)
>>> +{
>>> + return arch_alloc_vmap_stack(size, cpu_to_node(cpu));
>>> +}
>>> +
>>> +static void sse_stack_free(unsigned long *stack)
>>> +{
>>> + vfree(stack);
>>> +}
>>> +#else /* CONFIG_VMAP_STACK */
>>> +
>>> +static unsigned long *sse_stack_alloc(unsigned int cpu, unsigned int
>>> size)
>>> +{
>>> + return kmalloc(size, GFP_KERNEL);
>>> +}
>>> +
>>> +static void sse_stack_free(unsigned long *stack)
>>> +{
>>> + kfree(stack);
>>> +}
>>> +
>>> +#endif /* CONFIG_VMAP_STACK */
>>
>>
>> Can't we use kvmalloc() here to avoid the #ifdef? Or is there a real
>> benefit of using vmalloced stacks?
>>
>>
>>> +
>>> +static int sse_init_scs(int cpu, struct sse_event_arch_data *arch_evt)
>>> +{
>>> + void *stack;
>>> +
>>> + if (!scs_is_enabled())
>>> + return 0;
>>> +
>>> + stack = scs_alloc(cpu_to_node(cpu));
>>> + if (!stack)
>>> + return 1;
>>
>>
>> Nit: return -ENOMEM
>>
>>
>>> +
>>> + arch_evt->shadow_stack = stack;
>>> +
>>> + return 0;
>>> +}
>>> +
>>> +int arch_sse_init_event(struct sse_event_arch_data *arch_evt, u32
>>> evt_id, int cpu)
>>> +{
>>> + void *stack;
>>> +
>>> + arch_evt->evt_id = evt_id;
>>> + stack = sse_stack_alloc(cpu, SSE_STACK_SIZE);
>>> + if (!stack)
>>> + return -ENOMEM;
>>> +
>>> + arch_evt->stack = stack + SSE_STACK_SIZE;
>>> +
>>> + if (sse_init_scs(cpu, arch_evt))
>>> + goto free_stack;
>>> +
>>> + if (is_kernel_percpu_address((unsigned long)&arch_evt-
>>> >interrupted)) {
>>> + arch_evt->interrupted_state_phys =
>>> + per_cpu_ptr_to_phys(&arch_evt->interrupted);
>>> + } else {
>>> + arch_evt->interrupted_state_phys =
>>> + virt_to_phys(&arch_evt->interrupted);
>>> + }
>>> +
>>> + return 0;
>>> +
>>> +free_stack:
>>> + sse_stack_free(arch_evt->stack - SSE_STACK_SIZE);
>>> +
>>> + return -ENOMEM;
>>> +}
>>> +
>>> +void arch_sse_free_event(struct sse_event_arch_data *arch_evt)
>>> +{
>>> + scs_free(arch_evt->shadow_stack);
>>> + sse_stack_free(arch_evt->stack - SSE_STACK_SIZE);
>>> +}
>>> +
>>> +int arch_sse_register_event(struct sse_event_arch_data *arch_evt)
>>> +{
>>> + struct sbiret sret;
>>> +
>>> + sret = sbi_ecall(SBI_EXT_SSE, SBI_SSE_EVENT_REGISTER, arch_evt-
>>> >evt_id,
>>> + (unsigned long) handle_sse, (unsigned long) arch_evt,
>>> + 0, 0, 0);
>>> +
>>> + return sbi_err_map_linux_errno(sret.error);
>>> +}
>>> diff --git a/arch/riscv/kernel/sse_entry.S b/arch/riscv/kernel/
>>> sse_entry.S
>>> new file mode 100644
>>> index 000000000000..0b2f890edd89
>>> --- /dev/null
>>> +++ b/arch/riscv/kernel/sse_entry.S
>>> @@ -0,0 +1,171 @@
>>> +/* SPDX-License-Identifier: GPL-2.0-only */
>>> +/*
>>> + * Copyright (C) 2024 Rivos Inc.
>>> + */
>>> +
>>> +#include <linux/init.h>
>>> +#include <linux/linkage.h>
>>> +
>>> +#include <asm/asm.h>
>>> +#include <asm/csr.h>
>>> +#include <asm/scs.h>
>>> +
>>> +/* When entering handle_sse, the following registers are set:
>>> + * a6: contains the hartid
>>> + * a7: contains struct sse_registered_event pointer
>>> + */
>>> +SYM_CODE_START(handle_sse)
>>> + /* Save stack temporarily */
>>> + REG_S sp, SSE_REG_EVT_TMP(a7)
>>> + /* Set entry stack */
>>> + REG_L sp, SSE_REG_EVT_STACK(a7)
>>> +
>>> + addi sp, sp, -(PT_SIZE_ON_STACK)
>>> + REG_S ra, PT_RA(sp)
>>> + REG_S s0, PT_S0(sp)
>>> + REG_S s1, PT_S1(sp)
>>> + REG_S s2, PT_S2(sp)
>>> + REG_S s3, PT_S3(sp)
>>> + REG_S s4, PT_S4(sp)
>>> + REG_S s5, PT_S5(sp)
>>> + REG_S s6, PT_S6(sp)
>>> + REG_S s7, PT_S7(sp)
>>> + REG_S s8, PT_S8(sp)
>>> + REG_S s9, PT_S9(sp)
>>> + REG_S s10, PT_S10(sp)
>>> + REG_S s11, PT_S11(sp)
>>> + REG_S tp, PT_TP(sp)
>>> + REG_S t0, PT_T0(sp)
>>> + REG_S t1, PT_T1(sp)
>>> + REG_S t2, PT_T2(sp)
>>> + REG_S t3, PT_T3(sp)
>>> + REG_S t4, PT_T4(sp)
>>> + REG_S t5, PT_T5(sp)
>>> + REG_S t6, PT_T6(sp)
>>> + REG_S gp, PT_GP(sp)
>>> + REG_S a0, PT_A0(sp)
>>> + REG_S a1, PT_A1(sp)
>>> + REG_S a2, PT_A2(sp)
>>> + REG_S a3, PT_A3(sp)
>>> + REG_S a4, PT_A4(sp)
>>> + REG_S a5, PT_A5(sp)
>>> +
>>> + /* Retrieve entry sp */
>>> + REG_L a4, SSE_REG_EVT_TMP(a7)
>>> + /* Save CSRs */
>>> + csrr a0, CSR_EPC
>>> + csrr a1, CSR_SSTATUS
>>> + csrr a2, CSR_STVAL
>>> + csrr a3, CSR_SCAUSE
>>> +
>>> + REG_S a0, PT_EPC(sp)
>>> + REG_S a1, PT_STATUS(sp)
>>> + REG_S a2, PT_BADADDR(sp)
>>> + REG_S a3, PT_CAUSE(sp)
>>> + REG_S a4, PT_SP(sp)
>>> +
>>> + /* Disable user memory access and floating/vector computing */
>>> + li t0, SR_SUM | SR_FS_VS
>>> + csrc CSR_STATUS, t0
>>> +
>>> + load_global_pointer
>>> + scs_load_sse_stack a7
>>> +
>>> + /* Restore current task struct from __sse_entry_task */
>>> + li t1, NR_CPUS
>>> + move t3, zero
>>> +
>>> +#ifdef CONFIG_SMP
>>> + /* Find the CPU id associated to the hart id */
>>> + la t0, __cpuid_to_hartid_map
>>> +.Lhart_id_loop:
>>> + REG_L t2, 0(t0)
>>> + beq t2, a6, .Lcpu_id_found
>>> +
>>> + /* Increment pointer and CPU number */
>>> + addi t3, t3, 1
>>> + addi t0, t0, RISCV_SZPTR
>>> + bltu t3, t1, .Lhart_id_loop
>>> +
>>> + /*
>>> + * This should never happen since we expect the hart_id to match
>>> one
>>> + * of our CPU, but better be safe than sorry
>>> + */
>>> + la tp, init_task
>>> + la a0, sse_hart_id_panic_string
>>> + la t0, panic
>>> + jalr t0
>>> +
>>> +.Lcpu_id_found:
>>> +#endif
>>> + asm_per_cpu_with_cpu t2 __sse_entry_task t1 t3
>>> + REG_L tp, 0(t2)
>>> +
>>> + move a1, sp /* pt_regs on stack */
>>> + /* Kernel was interrupted, create stack frame */
>>> + beqz s1, .Lcall_do_sse
>>
>>
>> I don't understand this since in any case we will go to .Lcall_do_sse
>> right? And I don't see where s1 is initialized.
>>
>>
>>> +
>>> +.Lcall_do_sse:
>>> + /*
>>> + * Save sscratch for restoration since we might have interrupted
>>> the
>>> + * kernel in early exception path and thus, we don't know the
>>> content of
>>> + * sscratch.
>>> + */
>>> + csrr s4, CSR_SSCRATCH
>>> + /* In-kernel scratch is 0 */
>>> + csrw CSR_SCRATCH, x0
>>> +
>>> + move a0, a7
>>> +
>>> + call do_sse
>>> +
>>> + csrw CSR_SSCRATCH, s4
>>> +
>>> + REG_L a0, PT_EPC(sp)
>>> + REG_L a1, PT_STATUS(sp)
>>> + REG_L a2, PT_BADADDR(sp)
>>> + REG_L a3, PT_CAUSE(sp)
>>> + csrw CSR_EPC, a0
>>> + csrw CSR_SSTATUS, a1
>>> + csrw CSR_STVAL, a2
>>> + csrw CSR_SCAUSE, a3
>>> +
>>> + REG_L ra, PT_RA(sp)
>>> + REG_L s0, PT_S0(sp)
>>> + REG_L s1, PT_S1(sp)
>>> + REG_L s2, PT_S2(sp)
>>> + REG_L s3, PT_S3(sp)
>>> + REG_L s4, PT_S4(sp)
>>> + REG_L s5, PT_S5(sp)
>>> + REG_L s6, PT_S6(sp)
>>> + REG_L s7, PT_S7(sp)
>>> + REG_L s8, PT_S8(sp)
>>> + REG_L s9, PT_S9(sp)
>>> + REG_L s10, PT_S10(sp)
>>> + REG_L s11, PT_S11(sp)
>>> + REG_L tp, PT_TP(sp)
>>> + REG_L t0, PT_T0(sp)
>>> + REG_L t1, PT_T1(sp)
>>> + REG_L t2, PT_T2(sp)
>>> + REG_L t3, PT_T3(sp)
>>> + REG_L t4, PT_T4(sp)
>>> + REG_L t5, PT_T5(sp)
>>> + REG_L t6, PT_T6(sp)
>>> + REG_L gp, PT_GP(sp)
>>> + REG_L a0, PT_A0(sp)
>>> + REG_L a1, PT_A1(sp)
>>> + REG_L a2, PT_A2(sp)
>>> + REG_L a3, PT_A3(sp)
>>> + REG_L a4, PT_A4(sp)
>>> + REG_L a5, PT_A5(sp)
>>> +
>>> + REG_L sp, PT_SP(sp)
>>> +
>>> + li a7, SBI_EXT_SSE
>>> + li a6, SBI_SSE_EVENT_COMPLETE
>>> + ecall
>>> +
>>> +SYM_CODE_END(handle_sse)
>>> +
>>> +sse_hart_id_panic_string:
>>> + .ascii "Unable to match hart_id with cpu\0"
>>
>>
>> Thanks,
>>
>> Alex
>>
>>
>> _______________________________________________
>> linux-riscv mailing list
>> linux-riscv@lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/linux-riscv
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v3 3/4] drivers: firmware: add riscv SSE support
2025-01-16 13:58 ` Conor Dooley
@ 2025-01-23 10:52 ` Clément Léger
2025-01-24 14:15 ` Conor Dooley
0 siblings, 1 reply; 22+ messages in thread
From: Clément Léger @ 2025-01-23 10:52 UTC (permalink / raw)
To: Conor Dooley
Cc: Paul Walmsley, Palmer Dabbelt, linux-riscv, linux-kernel,
linux-arm-kernel, Himanshu Chauhan, Anup Patel, Xu Lu,
Atish Patra
On 16/01/2025 14:58, Conor Dooley wrote:
> On Fri, Dec 06, 2024 at 05:30:59PM +0100, Clément Léger wrote:
>> Add driver level interface to use RISC-V SSE arch support. This interface
>> allows registering SSE handlers, and receive them. This will be used by
>> PMU and GHES driver.
>>
>> Signed-off-by: Himanshu Chauhan <hchauhan@ventanamicro.com>
>> Co-developed-by: Himanshu Chauhan <hchauhan@ventanamicro.com>
>> Signed-off-by: Clément Léger <cleger@rivosinc.com>
>> ---
>> MAINTAINERS | 14 +
>> drivers/firmware/Kconfig | 1 +
>> drivers/firmware/Makefile | 1 +
>> drivers/firmware/riscv/Kconfig | 15 +
>> drivers/firmware/riscv/Makefile | 3 +
>> drivers/firmware/riscv/riscv_sse.c | 691 +++++++++++++++++++++++++++++
>> include/linux/riscv_sse.h | 56 +++
>> 7 files changed, 781 insertions(+)
>> create mode 100644 drivers/firmware/riscv/Kconfig
>> create mode 100644 drivers/firmware/riscv/Makefile
>> create mode 100644 drivers/firmware/riscv/riscv_sse.c
>> create mode 100644 include/linux/riscv_sse.h
>>
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> index 686109008d8e..a3ddde7fe9fb 100644
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -20125,6 +20125,13 @@ T: git git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux.git
>> F: Documentation/devicetree/bindings/iommu/riscv,iommu.yaml
>> F: drivers/iommu/riscv/
>>
>> +RISC-V FIRMWARE DRIVERS
>> +M: Conor Dooley <conor@kernel.org>
>> +L: linux-riscv@lists.infradead.org
>> +S: Maintained
>> +T: git git://git.kernel.org/pub/scm/linux/kernel/git/conor/linux.git
>> +F: drivers/firmware/riscv/*
>
> Acked-by: Conor Dooley <conor.dooley@microchip.com>
>
> (got some, mostly minor, comments below)
>
>> diff --git a/drivers/firmware/riscv/Makefile b/drivers/firmware/riscv/Makefile
>> new file mode 100644
>> index 000000000000..4ccfcbbc28ea
>> --- /dev/null
>> +++ b/drivers/firmware/riscv/Makefile
>> @@ -0,0 +1,3 @@
>> +# SPDX-License-Identifier: GPL-2.0
>> +
>> +obj-$(CONFIG_RISCV_SSE) += riscv_sse.o
>> diff --git a/drivers/firmware/riscv/riscv_sse.c b/drivers/firmware/riscv/riscv_sse.c
>> new file mode 100644
>> index 000000000000..c165e32cc9a5
>> --- /dev/null
>> +++ b/drivers/firmware/riscv/riscv_sse.c
>> @@ -0,0 +1,691 @@
>> +// SPDX-License-Identifier: GPL-2.0-only
>> +/*
>> + * Copyright (C) 2024 Rivos Inc.
>> + */
>> +
>> +#define pr_fmt(fmt) "sse: " fmt
>> +
>> +#include <linux/cpu.h>
>> +#include <linux/cpuhotplug.h>
>> +#include <linux/cpu_pm.h>
>> +#include <linux/hardirq.h>
>> +#include <linux/list.h>
>> +#include <linux/percpu-defs.h>
>> +#include <linux/reboot.h>
>> +#include <linux/riscv_sse.h>
>> +#include <linux/slab.h>
>> +
>> +#include <asm/sbi.h>
>> +#include <asm/sse.h>
>> +
>> +struct sse_event {
>> + struct list_head list;
>> + u32 evt;
>> + u32 priority;
>> + sse_event_handler *handler;
>> + void *handler_arg;
>> + bool is_enabled;
>> + /* Only valid for global events */
>> + unsigned int cpu;
>> +
>> + union {
>> + struct sse_registered_event *global;
>> + struct sse_registered_event __percpu *local;
>> + };
>> +};
>> +
>> +static int sse_hp_state;
>> +static bool sse_available;
>> +static DEFINE_SPINLOCK(events_list_lock);
>> +static LIST_HEAD(events);
>> +static DEFINE_MUTEX(sse_mutex);
>> +
>> +struct sse_registered_event {
>> + struct sse_event_arch_data arch;
>> + struct sse_event *evt;
>> + unsigned long attr_buf;
>> +};
>> +
>> +void sse_handle_event(struct sse_event_arch_data *arch_event,
>> + struct pt_regs *regs)
>> +{
>> + int ret;
>> + struct sse_registered_event *reg_evt =
>> + container_of(arch_event, struct sse_registered_event, arch);
>> + struct sse_event *evt = reg_evt->evt;
>> +
>> + ret = evt->handler(evt->evt, evt->handler_arg, regs);
>
> Is it possible to get here with a null handler? Or will !registered
> events not lead to the handler getting called?
Hi Conor,
Basically yes: if we receive an event, it means it was registered and
enabled. Since we associate a handler with an event when registering it,
the handler cannot be NULL.
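In other words, the registration path is what enforces this invariant. A
purely hypothetical, defensive variant of the dispatch path quoted above
would look like the following, but the series relies on registration always
providing a handler instead:

void sse_handle_event(struct sse_event_arch_data *arch_event,
		      struct pt_regs *regs)
{
	struct sse_registered_event *reg_evt =
		container_of(arch_event, struct sse_registered_event, arch);
	struct sse_event *evt = reg_evt->evt;
	int ret;

	/* Defensive only: cannot happen if registration succeeded. */
	if (WARN_ON_ONCE(!evt || !evt->handler))
		return;

	ret = evt->handler(evt->evt, evt->handler_arg, regs);
	if (ret)
		pr_warn("event %x handler failed with error %d\n",
			evt->evt, ret);
}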
>
>> + if (ret)
>> + pr_warn("event %x handler failed with error %d\n", evt->evt,
>> + ret);
>> +}
>> +
>> +static bool sse_event_is_global(u32 evt)
>> +{
>> + return !!(evt & SBI_SSE_EVENT_GLOBAL);
>> +}
>> +
>> +static
>> +struct sse_event *sse_event_get(u32 evt)
>
> nit: Could you shift this into one line?
Yeah sure.
>
>> +{
>> + struct sse_event *sse_evt = NULL, *tmp;
>> +
>> + scoped_guard(spinlock, &events_list_lock) {
>> + list_for_each_entry(tmp, &events, list) {
>> + if (tmp->evt == evt) {
>> + return tmp;
>> + }
>> + }
>> + }
>> +
>> + return NULL;
>> +}
>> +
>> +static phys_addr_t sse_event_get_phys(struct sse_registered_event *reg_evt,
>> + void *addr)
>> +{
>> + phys_addr_t phys;
>> +
>> + if (sse_event_is_global(reg_evt->evt->evt))
>> + phys = virt_to_phys(addr);
>> + else
>> + phys = per_cpu_ptr_to_phys(addr);
>> +
>> + return phys;
>> +}
>> +
>> +static int sse_sbi_event_func(struct sse_event *event, unsigned long func)
>> +{
>> + struct sbiret ret;
>> + u32 evt = event->evt;
>> +
>> + ret = sbi_ecall(SBI_EXT_SSE, func, evt, 0, 0, 0, 0, 0);
>> + if (ret.error)
>> + pr_debug("Failed to execute func %lx, event %x, error %ld\n",
>> + func, evt, ret.error);
>
> Why's this only at a debug level?
That's only really meaningful for debugging; this error is often
reported to the upper level and ends up at the final caller. I don't
think we should be too verbose in such drivers, but rather propagate
the error. If one wants to debug, then just enable DEBUG.
But that's only my opinion; if you'd prefer all pr_debug to be either
removed or changed to pr_err(), I'll do it.
>
>> +
>> + return sbi_err_map_linux_errno(ret.error);
>> +}
>> +
>> +static int sse_sbi_disable_event(struct sse_event *event)
>> +{
>> + return sse_sbi_event_func(event, SBI_SSE_EVENT_DISABLE);
>> +}
>> +
>> +static int sse_sbi_enable_event(struct sse_event *event)
>> +{
>> + return sse_sbi_event_func(event, SBI_SSE_EVENT_ENABLE);
>> +}
>> +
>> +static int sse_event_attr_get_no_lock(struct sse_registered_event *reg_evt,
>> + unsigned long attr_id, unsigned long *val)
>> +{
>> + struct sbiret sret;
>> + u32 evt = reg_evt->evt->evt;
>> + unsigned long phys;
>> +
>> + phys = sse_event_get_phys(reg_evt, &reg_evt->attr_buf);
>> +
>> + sret = sbi_ecall(SBI_EXT_SSE, SBI_SSE_EVENT_ATTR_READ, evt,
>> + attr_id, 1, phys, 0, 0);
>> + if (sret.error) {
>> + pr_debug("Failed to get event %x attr %lx, error %ld\n", evt,
>> + attr_id, sret.error);
>> + return sbi_err_map_linux_errno(sret.error);
>> + }
>> +
>> + *val = reg_evt->attr_buf;
>> +
>> + return 0;
>> +}
>> +
>> +static int sse_event_attr_set_nolock(struct sse_registered_event *reg_evt,
>> + unsigned long attr_id, unsigned long val)
>> +{
>> + struct sbiret sret;
>> + u32 evt = reg_evt->evt->evt;
>> + unsigned long phys;
>> +
>> + reg_evt->attr_buf = val;
>> + phys = sse_event_get_phys(reg_evt, &reg_evt->attr_buf);
>> +
>> + sret = sbi_ecall(SBI_EXT_SSE, SBI_SSE_EVENT_ATTR_WRITE, evt,
>> + attr_id, 1, phys, 0, 0);
>> + if (sret.error && sret.error != SBI_ERR_INVALID_STATE) {
>
> Why's the invalid state error not treated as an error?
Nice catch. That's a leftover of a previous implementation. That needs
to be removed.
>
>> + pr_debug("Failed to set event %x attr %lx, error %ld\n", evt,
>> + attr_id, sret.error);
>> + return sbi_err_map_linux_errno(sret.error);
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +static int sse_event_set_target_cpu_nolock(struct sse_event *event,
>> + unsigned int cpu)
>> +{
>> + unsigned int hart_id = cpuid_to_hartid_map(cpu);
>> + struct sse_registered_event *reg_evt = event->global;
>> + u32 evt = event->evt;
>> + bool was_enabled;
>> + int ret;
>> +
>> + if (!sse_event_is_global(evt))
>> + return -EINVAL;
>> +
>> + was_enabled = event->is_enabled;
>> + if (was_enabled)
>> + sse_sbi_disable_event(event);
>> + do {
>> + ret = sse_event_attr_set_nolock(reg_evt,
>> + SBI_SSE_ATTR_PREFERRED_HART,
>> + hart_id);
>> + } while (ret == -EINVAL);
>> +
>> + if (ret == 0)
>> + event->cpu = cpu;
>> +
>> + if (was_enabled)
>> + sse_sbi_enable_event(event);
>> +
>> + return 0;
>> +}
>> +
>> +int sse_event_set_target_cpu(struct sse_event *event, unsigned int cpu)
>> +{
>> + int ret;
>> +
>> + scoped_guard(mutex, &sse_mutex) {
>> + cpus_read_lock();
>> +
>> + if (!cpu_online(cpu))
>> + return -EINVAL;
>> +
>> + ret = sse_event_set_target_cpu_nolock(event, cpu);
>> +
>> + cpus_read_unlock();
>> + }
>> +
>> + return ret;
>> +}
>> +
>> +static int sse_event_init_registered(unsigned int cpu,
>> + struct sse_registered_event *reg_evt,
>> + struct sse_event *event)
>> +{
>> + reg_evt->evt = event;
>> + arch_sse_init_event(&reg_evt->arch, event->evt, cpu);
>> +
>> + return 0;
>> +}
>> +
>> +static void sse_event_free_registered(struct sse_registered_event *reg_evt)
>> +{
>> + arch_sse_free_event(&reg_evt->arch);
>> +}
>> +
>> +static int sse_event_alloc_global(struct sse_event *event)
>> +{
>> + int err;
>> + struct sse_registered_event *reg_evt;
>> +
>> + reg_evt = kzalloc(sizeof(*reg_evt), GFP_KERNEL);
>> + if (!reg_evt)
>> + return -ENOMEM;
>> +
>> + event->global = reg_evt;
>> + err = sse_event_init_registered(smp_processor_id(), reg_evt,
>> + event);
>> + if (err)
>> + kfree(reg_evt);
>> +
>> + return err;
>> +}
>> +
>> +static int sse_event_alloc_local(struct sse_event *event)
>> +{
>> + int err;
>> + unsigned int cpu, err_cpu;
>> + struct sse_registered_event *reg_evt;
>> + struct sse_registered_event __percpu *reg_evts;
>> +
>> + reg_evts = alloc_percpu(struct sse_registered_event);
>> + if (!reg_evts)
>> + return -ENOMEM;
>> +
>> + event->local = reg_evts;
>> +
>> + for_each_possible_cpu(cpu) {
>> + reg_evt = per_cpu_ptr(reg_evts, cpu);
>> + err = sse_event_init_registered(cpu, reg_evt, event);
>> + if (err) {
>> + err_cpu = cpu;
>> + goto err_free_per_cpu;
>> + }
>> + }
>> +
>> + return 0;
>> +
>> +err_free_per_cpu:
>> + for_each_possible_cpu(cpu) {
>> + if (cpu == err_cpu)
>> + break;
>> + reg_evt = per_cpu_ptr(reg_evts, cpu);
>> + sse_event_free_registered(reg_evt);
>> + }
>> +
>> + free_percpu(reg_evts);
>> +
>> + return err;
>> +}
>> +
>> +static struct sse_event *sse_event_alloc(u32 evt,
>> + u32 priority,
>> + sse_event_handler *handler, void *arg)
>> +{
>> + int err;
>> + struct sse_event *event;
>> +
>> + event = kzalloc(sizeof(*event), GFP_KERNEL);
>> + if (!event)
>> + return ERR_PTR(-ENOMEM);
>> +
>> + event->evt = evt;
>> + event->priority = priority;
>> + event->handler_arg = arg;
>> + event->handler = handler;
>> +
>> + if (sse_event_is_global(evt)) {
>> + err = sse_event_alloc_global(event);
>> + if (err)
>> + goto err_alloc_reg_evt;
>> + } else {
>> + err = sse_event_alloc_local(event);
>> + if (err)
>> + goto err_alloc_reg_evt;
>> + }
>> +
>> + return event;
>> +
>> +err_alloc_reg_evt:
>> + kfree(event);
>> +
>> + return ERR_PTR(err);
>> +}
>> +
>> +static int sse_sbi_register_event(struct sse_event *event,
>> + struct sse_registered_event *reg_evt)
>> +{
>> + int ret;
>> +
>> + ret = sse_event_attr_set_nolock(reg_evt, SBI_SSE_ATTR_PRIO,
>> + event->priority);
>> + if (ret)
>> + return ret;
>> +
>> + return arch_sse_register_event(&reg_evt->arch);
>> +}
>> +
>> +static int sse_event_register_local(struct sse_event *event)
>> +{
>> + int ret;
>> + struct sse_registered_event *reg_evt = per_cpu_ptr(event->local,
>> + smp_processor_id());
>> +
>> + ret = sse_sbi_register_event(event, reg_evt);
>> + if (ret)
>> + pr_debug("Failed to register event %x: err %d\n", event->evt,
>> + ret);
>
> Same here I guess, why's a registration failure only a debug print?
Same reason as before: this should be used for debug purposes.
>
>> +
>> + return ret;
>> +}
>> +
>> +
>> +static int sse_sbi_unregister_event(struct sse_event *event)
>> +{
>> + return sse_sbi_event_func(event, SBI_SSE_EVENT_UNREGISTER);
>> +}
>> +
>> +struct sse_per_cpu_evt {
>> + struct sse_event *event;
>> + unsigned long func;
>> + atomic_t error;
>> +};
>> +
>> +static void sse_event_per_cpu_func(void *info)
>> +{
>> + int ret;
>> + struct sse_per_cpu_evt *cpu_evt = info;
>> +
>> + if (cpu_evt->func == SBI_SSE_EVENT_REGISTER)
>> + ret = sse_event_register_local(cpu_evt->event);
>> + else
>> + ret = sse_sbi_event_func(cpu_evt->event, cpu_evt->func);
>> +
>> + if (ret)
>> + atomic_set(&cpu_evt->error, ret);
>> +}
>> +
>> +static void sse_event_free(struct sse_event *event)
>> +{
>> + unsigned int cpu;
>> + struct sse_registered_event *reg_evt;
>> +
>> + if (sse_event_is_global(event->evt)) {
>> + sse_event_free_registered(event->global);
>> + kfree(event->global);
>> + } else {
>> + for_each_possible_cpu(cpu) {
>> + reg_evt = per_cpu_ptr(event->local, cpu);
>> + sse_event_free_registered(reg_evt);
>> + }
>> + free_percpu(event->local);
>> + }
>> +
>> + kfree(event);
>> +}
>> +
>> +int sse_event_enable(struct sse_event *event)
>> +{
>> + int ret = 0;
>> + struct sse_per_cpu_evt cpu_evt;
>> +
>> + scoped_guard(mutex, &sse_mutex) {
>> + cpus_read_lock();
>> + if (sse_event_is_global(event->evt)) {
>> + ret = sse_sbi_enable_event(event);
>> + } else {
>> + cpu_evt.event = event;
>> + atomic_set(&cpu_evt.error, 0);
>> + cpu_evt.func = SBI_SSE_EVENT_ENABLE;
>> + on_each_cpu(sse_event_per_cpu_func, &cpu_evt, 1);
>> + ret = atomic_read(&cpu_evt.error);
>> + if (ret) {
>> + cpu_evt.func = SBI_SSE_EVENT_DISABLE;
>> + on_each_cpu(sse_event_per_cpu_func, &cpu_evt,
>> + 1);
>
> nit: this should fit on one line, no?
the trailing ; is above 80 characters. But if you are ok with 100 char,
I can go for it.
Thanks !
Clément
>
>> + }
>> + }
>> + cpus_read_unlock();
>> +
>> + if (ret == 0)
>> + event->is_enabled = true;
>> + }
>> +
>> + return ret;
>> +}
>
>> 2.45.2
>>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v3 3/4] drivers: firmware: add riscv SSE support
2025-01-23 10:52 ` Clément Léger
@ 2025-01-24 14:15 ` Conor Dooley
0 siblings, 0 replies; 22+ messages in thread
From: Conor Dooley @ 2025-01-24 14:15 UTC (permalink / raw)
To: Clément Léger
Cc: Paul Walmsley, Palmer Dabbelt, linux-riscv, linux-kernel,
linux-arm-kernel, Himanshu Chauhan, Anup Patel, Xu Lu,
Atish Patra
[-- Attachment #1: Type: text/plain, Size: 1944 bytes --]
On Thu, Jan 23, 2025 at 11:52:35AM +0100, Clément Léger wrote:
> On 16/01/2025 14:58, Conor Dooley wrote:
> >> +static int sse_sbi_event_func(struct sse_event *event, unsigned long func)
> >> +{
> >> + struct sbiret ret;
> >> + u32 evt = event->evt;
> >> +
> >> + ret = sbi_ecall(SBI_EXT_SSE, func, evt, 0, 0, 0, 0, 0);
> >> + if (ret.error)
> >> + pr_debug("Failed to execute func %lx, event %x, error %ld\n",
> >> + func, evt, ret.error);
> >
> > Why's this only at a debug level?
>
> That's only really meaningful for debugging; this error is often
> reported to the upper level and ends up at the final caller. I don't
> think we should be too verbose in such drivers, but rather propagate
> the error. If one wants to debug, then just enable DEBUG.
>
> But that's only my opinion; if you'd prefer all pr_debug to be either
> removed or changed to pr_err(), I'll do it.
Nah, you can leave it as is, you know better than I about how helpful it
would be as an error.
> >> +int sse_event_enable(struct sse_event *event)
> >> +{
> >> + int ret = 0;
> >> + struct sse_per_cpu_evt cpu_evt;
> >> +
> >> + scoped_guard(mutex, &sse_mutex) {
> >> + cpus_read_lock();
> >> + if (sse_event_is_global(event->evt)) {
> >> + ret = sse_sbi_enable_event(event);
> >> + } else {
> >> + cpu_evt.event = event;
> >> + atomic_set(&cpu_evt.error, 0);
> >> + cpu_evt.func = SBI_SSE_EVENT_ENABLE;
> >> + on_each_cpu(sse_event_per_cpu_func, &cpu_evt, 1);
> >> + ret = atomic_read(&cpu_evt.error);
> >> + if (ret) {
> >> + cpu_evt.func = SBI_SSE_EVENT_DISABLE;
> >> + on_each_cpu(sse_event_per_cpu_func, &cpu_evt,
> >> + 1);
> >
> > nit: this should fit on one line, no?
>
> the trailing ; is above 80 characters. But if you are ok with 100 char,
> I can go for it.
The slight improvement in readability trumps the slight increase over 80
characters every time for me.
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v3 2/4] riscv: add support for SBI Supervisor Software Events extension
2025-01-23 8:39 ` Clément Léger
@ 2025-01-27 8:09 ` Alexandre Ghiti
2025-01-28 8:10 ` Clément Léger
0 siblings, 1 reply; 22+ messages in thread
From: Alexandre Ghiti @ 2025-01-27 8:09 UTC (permalink / raw)
To: Clément Léger, Paul Walmsley, Palmer Dabbelt,
linux-riscv, linux-kernel, linux-arm-kernel
Cc: Himanshu Chauhan, Anup Patel, Xu Lu, Atish Patra
Hi Clément,
On 23/01/2025 09:39, Clément Léger wrote:
>
> On 22/01/2025 13:15, Alexandre Ghiti wrote:
>> Hi Clément,
>>
>> On 06/12/2024 17:30, Clément Léger wrote:
>>> The SBI SSE extension allows the supervisor software to be notified by
>>> the SBI of specific events that are not maskable. The context switch is
>>> handled partially by the firmware which will save registers a6 and a7.
>>> When entering kernel we can rely on these 2 registers to setup the stack
>>> and save all the registers.
>>>
>>> Since SSE events can be delivered at any time to the kernel (including
>>> during exception handling, we need a way to locate the current_task for
>>> context tracking. On RISC-V, it is sotred in scratch when in user space
>>> or tp when in kernel space (in which case SSCRATCH is zero). But at a
>>> at the beginning of exception handling, SSCRATCH is used to swap tp and
>>> check the origin of the exception. If interrupted at that point, then,
>>> there is no way to reliably know were is located the current
>>> task_struct. Even checking the interruption location won't work as SSE
>>> event can be nested on top of each other so the original interruption
>>> site might be lost at some point. In order to retrieve it reliably,
>>> store the current task in an additionnal __sse_entry_task per_cpu array.
>>> This array is then used to retrieve the current task based on the
>>> hart ID that is passed to the SSE event handler in a6.
>>>
>>> That being said, the way the current task struct is stored should
>>> probably be reworked to find a better reliable alternative.
>>>
>>> Since each events (and each CPU for local events) have their own
>>> context and can preempt each other, allocate a stack (and a shadow stack
>>> if needed for each of them (and for each cpu for local events).
>>>
>>> When completing the event, if we were coming from kernel with interrupts
>>> disabled, simply return there. If coming from userspace or kernel with
>>> interrupts enabled, simulate an interrupt exception by setting IE_SIE in
>>> CSR_IP to allow delivery of signals to user task. For instance this can
>>> happen, when a RAS event has been generated by a user application and a
>>> SIGBUS has been sent to a task.
>>
>> Nit: there are some typos in the commit log and missing ')'.
> Acked, I'll spellcheck that.
>
>>
>>> Signed-off-by: Clément Léger <cleger@rivosinc.com>
>>> ---
>>> arch/riscv/include/asm/asm.h | 14 ++-
>>> arch/riscv/include/asm/scs.h | 7 ++
>>> arch/riscv/include/asm/sse.h | 38 ++++++
>>> arch/riscv/include/asm/switch_to.h | 14 +++
>>> arch/riscv/include/asm/thread_info.h | 1 +
>>> arch/riscv/kernel/Makefile | 1 +
>>> arch/riscv/kernel/asm-offsets.c | 12 ++
>>> arch/riscv/kernel/sse.c | 134 +++++++++++++++++++++
>>> arch/riscv/kernel/sse_entry.S | 171 +++++++++++++++++++++++++++
>>> 9 files changed, 389 insertions(+), 3 deletions(-)
>>> create mode 100644 arch/riscv/include/asm/sse.h
>>> create mode 100644 arch/riscv/kernel/sse.c
>>> create mode 100644 arch/riscv/kernel/sse_entry.S
>>>
>>> diff --git a/arch/riscv/include/asm/asm.h b/arch/riscv/include/asm/asm.h
>>> index 776354895b81..de8427c58f02 100644
>>> --- a/arch/riscv/include/asm/asm.h
>>> +++ b/arch/riscv/include/asm/asm.h
>>> @@ -89,16 +89,24 @@
>>> #define PER_CPU_OFFSET_SHIFT 3
>>> #endif
>>> -.macro asm_per_cpu dst sym tmp
>>> - REG_L \tmp, TASK_TI_CPU_NUM(tp)
>>> - slli \tmp, \tmp, PER_CPU_OFFSET_SHIFT
>>> +.macro asm_per_cpu_with_cpu dst sym tmp cpu
>>> + slli \tmp, \cpu, PER_CPU_OFFSET_SHIFT
>>> la \dst, __per_cpu_offset
>>> add \dst, \dst, \tmp
>>> REG_L \tmp, 0(\dst)
>>> la \dst, \sym
>>> add \dst, \dst, \tmp
>>> .endm
>>> +
>>> +.macro asm_per_cpu dst sym tmp
>>> + REG_L \tmp, TASK_TI_CPU_NUM(tp)
>>> + asm_per_cpu_with_cpu \dst \sym \tmp \tmp
>>> +.endm
>>> #else /* CONFIG_SMP */
>>> +.macro asm_per_cpu_with_cpu dst sym tmp cpu
>>> + la \dst, \sym
>>> +.endm
>>> +
>>> .macro asm_per_cpu dst sym tmp
>>> la \dst, \sym
>>> .endm
>>> diff --git a/arch/riscv/include/asm/scs.h b/arch/riscv/include/asm/scs.h
>>> index 0e45db78b24b..62344daad73d 100644
>>> --- a/arch/riscv/include/asm/scs.h
>>> +++ b/arch/riscv/include/asm/scs.h
>>> @@ -18,6 +18,11 @@
>>> load_per_cpu gp, irq_shadow_call_stack_ptr, \tmp
>>> .endm
>>> +/* Load the per-CPU IRQ shadow call stack to gp. */
>>> +.macro scs_load_sse_stack reg_evt
>>> + REG_L gp, SSE_REG_EVT_SHADOW_STACK(\reg_evt)
>>> +.endm
>>> +
>>> /* Load task_scs_sp(current) to gp. */
>>> .macro scs_load_current
>>> REG_L gp, TASK_TI_SCS_SP(tp)
>>> @@ -41,6 +46,8 @@
>>> .endm
>>> .macro scs_load_irq_stack tmp
>>> .endm
>>> +.macro scs_load_sse_stack reg_evt
>>> +.endm
>>> .macro scs_load_current
>>> .endm
>>> .macro scs_load_current_if_task_changed prev
>>> diff --git a/arch/riscv/include/asm/sse.h b/arch/riscv/include/asm/sse.h
>>> new file mode 100644
>>> index 000000000000..431a19d4cd9c
>>> --- /dev/null
>>> +++ b/arch/riscv/include/asm/sse.h
>>> @@ -0,0 +1,38 @@
>>> +/* SPDX-License-Identifier: GPL-2.0-only */
>>> +/*
>>> + * Copyright (C) 2024 Rivos Inc.
>>> + */
>>> +#ifndef __ASM_SSE_H
>>> +#define __ASM_SSE_H
>>> +
>>> +#ifdef CONFIG_RISCV_SSE
>>> +
>>> +struct sse_event_interrupted_state {
>>> + unsigned long a6;
>>> + unsigned long a7;
>>> +};
>>> +
>>> +struct sse_event_arch_data {
>>> + void *stack;
>>> + void *shadow_stack;
>>> + unsigned long tmp;
>>> + struct sse_event_interrupted_state interrupted;
>>> + unsigned long interrupted_state_phys;
>>> + u32 evt_id;
>>> +};
>>> +
>>> +struct sse_registered_event;
>>> +int arch_sse_init_event(struct sse_event_arch_data *arch_evt, u32
>>> evt_id,
>>> + int cpu);
>>> +void arch_sse_free_event(struct sse_event_arch_data *arch_evt);
>>> +int arch_sse_register_event(struct sse_event_arch_data *arch_evt);
>>> +
>>> +void sse_handle_event(struct sse_event_arch_data *arch_evt,
>>> + struct pt_regs *regs);
>>> +asmlinkage void handle_sse(void);
>>> +asmlinkage void do_sse(struct sse_event_arch_data *arch_evt,
>>> + struct pt_regs *reg);
>>> +
>>> +#endif
>>> +
>>> +#endif
>>> diff --git a/arch/riscv/include/asm/switch_to.h b/arch/riscv/include/
>>> asm/switch_to.h
>>> index 94e33216b2d9..e166fabe04ab 100644
>>> --- a/arch/riscv/include/asm/switch_to.h
>>> +++ b/arch/riscv/include/asm/switch_to.h
>>> @@ -88,6 +88,19 @@ static inline void __switch_to_envcfg(struct
>>> task_struct *next)
>>> :: "r" (next->thread.envcfg) : "memory");
>>> }
>>> +#ifdef CONFIG_RISCV_SSE
>>> +DECLARE_PER_CPU(struct task_struct *, __sse_entry_task);
>>> +
>>> +static inline void __switch_sse_entry_task(struct task_struct *next)
>>> +{
>>> + __this_cpu_write(__sse_entry_task, next);
>>> +}
>>> +#else
>>> +static inline void __switch_sse_entry_task(struct task_struct *next)
>>> +{
>>> +}
>>> +#endif
>>> +
>>> extern struct task_struct *__switch_to(struct task_struct *,
>>> struct task_struct *);
>>> @@ -122,6 +135,7 @@ do { \
>>> if (switch_to_should_flush_icache(__next)) \
>>> local_flush_icache_all(); \
>>> __switch_to_envcfg(__next); \
>>> + __switch_sse_entry_task(__next); \
>>> ((last) = __switch_to(__prev, __next)); \
>>> } while (0)
>>> diff --git a/arch/riscv/include/asm/thread_info.h b/arch/riscv/
>>> include/asm/thread_info.h
>>> index f5916a70879a..28e9805e61fc 100644
>>> --- a/arch/riscv/include/asm/thread_info.h
>>> +++ b/arch/riscv/include/asm/thread_info.h
>>> @@ -36,6 +36,7 @@
>>> #define OVERFLOW_STACK_SIZE SZ_4K
>>> #define IRQ_STACK_SIZE THREAD_SIZE
>>> +#define SSE_STACK_SIZE THREAD_SIZE
>>> #ifndef __ASSEMBLY__
>>> diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
>>> index 063d1faf5a53..1e8fb83b1162 100644
>>> --- a/arch/riscv/kernel/Makefile
>>> +++ b/arch/riscv/kernel/Makefile
>>> @@ -99,6 +99,7 @@ obj-$(CONFIG_DYNAMIC_FTRACE) += mcount-dyn.o
>>> obj-$(CONFIG_PERF_EVENTS) += perf_callchain.o
>>> obj-$(CONFIG_HAVE_PERF_REGS) += perf_regs.o
>>> obj-$(CONFIG_RISCV_SBI) += sbi.o sbi_ecall.o
>>> +obj-$(CONFIG_RISCV_SSE) += sse.o sse_entry.o
>>> ifeq ($(CONFIG_RISCV_SBI), y)
>>> obj-$(CONFIG_SMP) += sbi-ipi.o
>>> obj-$(CONFIG_SMP) += cpu_ops_sbi.o
>>> diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
>>> index e89455a6a0e5..60590a3d9519 100644
>>> --- a/arch/riscv/kernel/asm-offsets.c
>>> +++ b/arch/riscv/kernel/asm-offsets.c
>>> @@ -14,6 +14,8 @@
>>> #include <asm/ptrace.h>
>>> #include <asm/cpu_ops_sbi.h>
>>> #include <asm/stacktrace.h>
>>> +#include <asm/sbi.h>
>>> +#include <asm/sse.h>
>>> #include <asm/suspend.h>
>>> void asm_offsets(void);
>>> @@ -511,4 +513,14 @@ void asm_offsets(void)
>>> DEFINE(FREGS_A6, offsetof(struct __arch_ftrace_regs, a6));
>>> DEFINE(FREGS_A7, offsetof(struct __arch_ftrace_regs, a7));
>>> #endif
>>> +
>>> +#ifdef CONFIG_RISCV_SSE
>>> + OFFSET(SSE_REG_EVT_STACK, sse_event_arch_data, stack);
>>> + OFFSET(SSE_REG_EVT_SHADOW_STACK, sse_event_arch_data, shadow_stack);
>>> + OFFSET(SSE_REG_EVT_TMP, sse_event_arch_data, tmp);
>>> +
>>> + DEFINE(SBI_EXT_SSE, SBI_EXT_SSE);
>>> + DEFINE(SBI_SSE_EVENT_COMPLETE, SBI_SSE_EVENT_COMPLETE);
>>> + DEFINE(NR_CPUS, NR_CPUS);
>>> +#endif
>>> }
>>> diff --git a/arch/riscv/kernel/sse.c b/arch/riscv/kernel/sse.c
>>> new file mode 100644
>>> index 000000000000..b48ae69dad8d
>>> --- /dev/null
>>> +++ b/arch/riscv/kernel/sse.c
>>> @@ -0,0 +1,134 @@
>>> +// SPDX-License-Identifier: GPL-2.0-or-later
>>> +/*
>>> + * Copyright (C) 2024 Rivos Inc.
>>> + */
>>> +#include <linux/nmi.h>
>>> +#include <linux/scs.h>
>>> +#include <linux/bitfield.h>
>>> +#include <linux/riscv_sse.h>
>>> +#include <linux/percpu-defs.h>
>>> +
>>> +#include <asm/asm-prototypes.h>
>>> +#include <asm/switch_to.h>
>>> +#include <asm/irq_stack.h>
>>> +#include <asm/sbi.h>
>>> +#include <asm/sse.h>
>>> +
>>> +DEFINE_PER_CPU(struct task_struct *, __sse_entry_task);
>>> +
>>> +void __weak sse_handle_event(struct sse_event_arch_data *arch_evt, struct pt_regs *regs)
>>> +{
>>> +}
>>> +
>>> +void do_sse(struct sse_event_arch_data *arch_evt, struct pt_regs *regs)
>>> +{
>>> + nmi_enter();
>>> +
>>> + /* Retrieve missing GPRs from SBI */
>>> + sbi_ecall(SBI_EXT_SSE, SBI_SSE_EVENT_ATTR_READ, arch_evt->evt_id,
>>> + SBI_SSE_ATTR_INTERRUPTED_A6,
>>> + (SBI_SSE_ATTR_INTERRUPTED_A7 - SBI_SSE_ATTR_INTERRUPTED_A6) + 1,
>>> + arch_evt->interrupted_state_phys, 0, 0);
>>> +
>>> + memcpy(&regs->a6, &arch_evt->interrupted, sizeof(arch_evt->interrupted));
>>> +
>>> + sse_handle_event(arch_evt, regs);
>>> +
>>> + /*
>>> + * The SSE delivery path does not use the "standard" exception path and
>>> + * thus does not process any pending signals/softirqs. Some drivers might
>>> + * enqueue pending work that needs to be handled as soon as possible.
>>> + * For that purpose, set the software interrupt pending bit, which will
>>> + * be serviced once interrupts are reenabled.
>>> + */
>>> + csr_set(CSR_IP, IE_SIE);
>>
>> This looks a bit hackish and underperformant to trigger an IRQ at each
>> SSE event; why is it necessary? I understand that we may want to service
>> signals right away, for example in case of an uncorrectable memory error,
>> in order to send a SIGBUS to the process before it goes on, but why
>> should we care about softirqs here?
> Hi Alex,
>
> SSE events are run in an NMI context. Basically, nothing is executed in
> this context, except signaling that there is work to do. For instance,
> the GHES handler (currently in a ventana branch) just enqueues some work
> to be done in a workqueue. The same goes for the PMU: it just enqueues
> some work in case of an NMI.
>
> While it might not be strictly necessary for the PMU, it is for the GHES
> handler. Not doing so would allow the user application to continue its
> execution until the next IRQ even though an error was reported. Late
> signal handling could be really problematic. That would be even worse
> for the kernel.
>
> ARM SDEI does the same, except for a single case that I can add (ie,
> when interrupting a kernel with interrupts disabled, there is no need to
> trigger softirqs since they will be handled when returning from it).
Ok got it, thanks.
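
For reference, a rough sketch of the deferral pattern described above
(hypothetical names; only the sse_event_handler() signature comes from this
series, and irq_work is just one way to do the deferral):

  #include <linux/irq_work.h>
  #include <linux/printk.h>

  static void my_sse_deferred(struct irq_work *work)
  {
          /* Runs later, from a regular interrupts-enabled context. */
          pr_info("handling deferred SSE work\n");
  }

  static DEFINE_IRQ_WORK(my_sse_work, my_sse_deferred);

  static int my_sse_handler(u32 event_num, void *arg, struct pt_regs *regs)
  {
          /* NMI context: only mark that there is work to do. */
          irq_work_queue(&my_sse_work);
          return 0;
  }
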
>
>>
>>> +
>>> + nmi_exit();
>>> +}
>>> +
>>> +#ifdef CONFIG_VMAP_STACK
>>> +static unsigned long *sse_stack_alloc(unsigned int cpu, unsigned int size)
>>> +{
>>> + return arch_alloc_vmap_stack(size, cpu_to_node(cpu));
>>> +}
>>> +
>>> +static void sse_stack_free(unsigned long *stack)
>>> +{
>>> + vfree(stack);
>>> +}
>>> +#else /* CONFIG_VMAP_STACK */
>>> +
>>> +static unsigned long *sse_stack_alloc(unsigned int cpu, unsigned int size)
>>> +{
>>> + return kmalloc(size, GFP_KERNEL);
>>> +}
>>> +
>>> +static void sse_stack_free(unsigned long *stack)
>>> +{
>>> + kfree(stack);
>>> +}
>>> +
>>> +#endif /* CONFIG_VMAP_STACK */
>>
>> Can't we use kvmalloc() here to avoid the #ifdef? Or is there a real
>> benefit of using vmalloced stacks?
> I believe the goal is not the same. Using CONFIG_VMAP_STACK allows the
> kernel exception handling to catch any stack overflow when entering the
> kernel, and thus using vmalloc is required to allocate twice the page
> size (overflow is when sp is located in the upper half of the allocated
> vmalloc stack). So basically, these are two distinct purposes.
>
> AFAIU, kvmalloc allows falling back to vmalloc if kmalloc fails. This is
> not what we are looking for here since our allocation size is always
> quite small and known (STACK_SIZE basically).
>
> But I might be missing something.
arch_alloc_vmap_stack() only vmallocs the stack and does not implement
any stack overflow mechanism, so I'm still unsure we need the define.
Thanks,
Alex
>
>>
>>> +
>>> +static int sse_init_scs(int cpu, struct sse_event_arch_data *arch_evt)
>>> +{
>>> + void *stack;
>>> +
>>> + if (!scs_is_enabled())
>>> + return 0;
>>> +
>>> + stack = scs_alloc(cpu_to_node(cpu));
>>> + if (!stack)
>>> + return 1;
>>
>> Nit: return -ENOMEM
> That's better indeed.
>
>>
>>> +
>>> + arch_evt->shadow_stack = stack;
>>> +
>>> + return 0;
>>> +}
>>> +
>>> +int arch_sse_init_event(struct sse_event_arch_data *arch_evt, u32 evt_id, int cpu)
>>> +{
>>> + void *stack;
>>> +
>>> + arch_evt->evt_id = evt_id;
>>> + stack = sse_stack_alloc(cpu, SSE_STACK_SIZE);
>>> + if (!stack)
>>> + return -ENOMEM;
>>> +
>>> + arch_evt->stack = stack + SSE_STACK_SIZE;
>>> +
>>> + if (sse_init_scs(cpu, arch_evt))
>>> + goto free_stack;
>>> +
>>> + if (is_kernel_percpu_address((unsigned long)&arch_evt->interrupted)) {
>>> + arch_evt->interrupted_state_phys =
>>> + per_cpu_ptr_to_phys(&arch_evt->interrupted);
>>> + } else {
>>> + arch_evt->interrupted_state_phys =
>>> + virt_to_phys(&arch_evt->interrupted);
>>> + }
>>> +
>>> + return 0;
>>> +
>>> +free_stack:
>>> + sse_stack_free(arch_evt->stack - SSE_STACK_SIZE);
>>> +
>>> + return -ENOMEM;
>>> +}
>>> +
>>> +void arch_sse_free_event(struct sse_event_arch_data *arch_evt)
>>> +{
>>> + scs_free(arch_evt->shadow_stack);
>>> + sse_stack_free(arch_evt->stack - SSE_STACK_SIZE);
>>> +}
>>> +
>>> +int arch_sse_register_event(struct sse_event_arch_data *arch_evt)
>>> +{
>>> + struct sbiret sret;
>>> +
>>> + sret = sbi_ecall(SBI_EXT_SSE, SBI_SSE_EVENT_REGISTER, arch_evt->evt_id,
>>> + (unsigned long) handle_sse, (unsigned long) arch_evt,
>>> + 0, 0, 0);
>>> +
>>> + return sbi_err_map_linux_errno(sret.error);
>>> +}
>>> diff --git a/arch/riscv/kernel/sse_entry.S b/arch/riscv/kernel/sse_entry.S
>>> new file mode 100644
>>> index 000000000000..0b2f890edd89
>>> --- /dev/null
>>> +++ b/arch/riscv/kernel/sse_entry.S
>>> @@ -0,0 +1,171 @@
>>> +/* SPDX-License-Identifier: GPL-2.0-only */
>>> +/*
>>> + * Copyright (C) 2024 Rivos Inc.
>>> + */
>>> +
>>> +#include <linux/init.h>
>>> +#include <linux/linkage.h>
>>> +
>>> +#include <asm/asm.h>
>>> +#include <asm/csr.h>
>>> +#include <asm/scs.h>
>>> +
>>> +/* When entering handle_sse, the following registers are set:
>>> + * a6: contains the hartid
>>> + * a7: contains struct sse_registered_event pointer
>>> + */
>>> +SYM_CODE_START(handle_sse)
>>> + /* Save stack temporarily */
>>> + REG_S sp, SSE_REG_EVT_TMP(a7)
>>> + /* Set entry stack */
>>> + REG_L sp, SSE_REG_EVT_STACK(a7)
>>> +
>>> + addi sp, sp, -(PT_SIZE_ON_STACK)
>>> + REG_S ra, PT_RA(sp)
>>> + REG_S s0, PT_S0(sp)
>>> + REG_S s1, PT_S1(sp)
>>> + REG_S s2, PT_S2(sp)
>>> + REG_S s3, PT_S3(sp)
>>> + REG_S s4, PT_S4(sp)
>>> + REG_S s5, PT_S5(sp)
>>> + REG_S s6, PT_S6(sp)
>>> + REG_S s7, PT_S7(sp)
>>> + REG_S s8, PT_S8(sp)
>>> + REG_S s9, PT_S9(sp)
>>> + REG_S s10, PT_S10(sp)
>>> + REG_S s11, PT_S11(sp)
>>> + REG_S tp, PT_TP(sp)
>>> + REG_S t0, PT_T0(sp)
>>> + REG_S t1, PT_T1(sp)
>>> + REG_S t2, PT_T2(sp)
>>> + REG_S t3, PT_T3(sp)
>>> + REG_S t4, PT_T4(sp)
>>> + REG_S t5, PT_T5(sp)
>>> + REG_S t6, PT_T6(sp)
>>> + REG_S gp, PT_GP(sp)
>>> + REG_S a0, PT_A0(sp)
>>> + REG_S a1, PT_A1(sp)
>>> + REG_S a2, PT_A2(sp)
>>> + REG_S a3, PT_A3(sp)
>>> + REG_S a4, PT_A4(sp)
>>> + REG_S a5, PT_A5(sp)
>>> +
>>> + /* Retrieve entry sp */
>>> + REG_L a4, SSE_REG_EVT_TMP(a7)
>>> + /* Save CSRs */
>>> + csrr a0, CSR_EPC
>>> + csrr a1, CSR_SSTATUS
>>> + csrr a2, CSR_STVAL
>>> + csrr a3, CSR_SCAUSE
>>> +
>>> + REG_S a0, PT_EPC(sp)
>>> + REG_S a1, PT_STATUS(sp)
>>> + REG_S a2, PT_BADADDR(sp)
>>> + REG_S a3, PT_CAUSE(sp)
>>> + REG_S a4, PT_SP(sp)
>>> +
>>> + /* Disable user memory access and floating/vector computing */
>>> + li t0, SR_SUM | SR_FS_VS
>>> + csrc CSR_STATUS, t0
>>> +
>>> + load_global_pointer
>>> + scs_load_sse_stack a7
>>> +
>>> + /* Restore current task struct from __sse_entry_task */
>>> + li t1, NR_CPUS
>>> + move t3, zero
>>> +
>>> +#ifdef CONFIG_SMP
>>> + /* Find the CPU id associated to the hart id */
>>> + la t0, __cpuid_to_hartid_map
>>> +.Lhart_id_loop:
>>> + REG_L t2, 0(t0)
>>> + beq t2, a6, .Lcpu_id_found
>>> +
>>> + /* Increment pointer and CPU number */
>>> + addi t3, t3, 1
>>> + addi t0, t0, RISCV_SZPTR
>>> + bltu t3, t1, .Lhart_id_loop
>>> +
>>> + /*
>>> + * This should never happen since we expect the hart_id to match one
>>> + * of our CPUs, but better be safe than sorry
>>> + */
>>> + la tp, init_task
>>> + la a0, sse_hart_id_panic_string
>>> + la t0, panic
>>> + jalr t0
>>> +
>>> +.Lcpu_id_found:
>>> +#endif
>>> + asm_per_cpu_with_cpu t2 __sse_entry_task t1 t3
>>> + REG_L tp, 0(t2)
>>> +
>>> + move a1, sp /* pt_regs on stack */
>>> + /* Kernel was interrupted, create stack frame */
>>> + beqz s1, .Lcall_do_sse
>>
>> I don't understand this since in any case we will go to .Lcall_do_sse
>> right? And I don't see where s1 is initialized.
> Yeah indeed, that's a leftover of some stack frame creation that I
> forgot to remove. I'll remove that!
>
> Thanks for the review.
>
> Clément
>
>>
>>> +
>>> +.Lcall_do_sse:
>>> + /*
>>> + * Save sscratch for restoration since we might have interrupted the
>>> + * kernel in early exception path and thus, we don't know the content of
>>> + * sscratch.
>>> + */
>>> + csrr s4, CSR_SSCRATCH
>>> + /* In-kernel scratch is 0 */
>>> + csrw CSR_SCRATCH, x0
>>> +
>>> + move a0, a7
>>> +
>>> + call do_sse
>>> +
>>> + csrw CSR_SSCRATCH, s4
>>> +
>>> + REG_L a0, PT_EPC(sp)
>>> + REG_L a1, PT_STATUS(sp)
>>> + REG_L a2, PT_BADADDR(sp)
>>> + REG_L a3, PT_CAUSE(sp)
>>> + csrw CSR_EPC, a0
>>> + csrw CSR_SSTATUS, a1
>>> + csrw CSR_STVAL, a2
>>> + csrw CSR_SCAUSE, a3
>>> +
>>> + REG_L ra, PT_RA(sp)
>>> + REG_L s0, PT_S0(sp)
>>> + REG_L s1, PT_S1(sp)
>>> + REG_L s2, PT_S2(sp)
>>> + REG_L s3, PT_S3(sp)
>>> + REG_L s4, PT_S4(sp)
>>> + REG_L s5, PT_S5(sp)
>>> + REG_L s6, PT_S6(sp)
>>> + REG_L s7, PT_S7(sp)
>>> + REG_L s8, PT_S8(sp)
>>> + REG_L s9, PT_S9(sp)
>>> + REG_L s10, PT_S10(sp)
>>> + REG_L s11, PT_S11(sp)
>>> + REG_L tp, PT_TP(sp)
>>> + REG_L t0, PT_T0(sp)
>>> + REG_L t1, PT_T1(sp)
>>> + REG_L t2, PT_T2(sp)
>>> + REG_L t3, PT_T3(sp)
>>> + REG_L t4, PT_T4(sp)
>>> + REG_L t5, PT_T5(sp)
>>> + REG_L t6, PT_T6(sp)
>>> + REG_L gp, PT_GP(sp)
>>> + REG_L a0, PT_A0(sp)
>>> + REG_L a1, PT_A1(sp)
>>> + REG_L a2, PT_A2(sp)
>>> + REG_L a3, PT_A3(sp)
>>> + REG_L a4, PT_A4(sp)
>>> + REG_L a5, PT_A5(sp)
>>> +
>>> + REG_L sp, PT_SP(sp)
>>> +
>>> + li a7, SBI_EXT_SSE
>>> + li a6, SBI_SSE_EVENT_COMPLETE
>>> + ecall
>>> +
>>> +SYM_CODE_END(handle_sse)
>>> +
>>> +sse_hart_id_panic_string:
>>> + .ascii "Unable to match hart_id with cpu\0"
>>
>> Thanks,
>>
>> Alex
>>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v3 2/4] riscv: add support for SBI Supervisor Software Events extension
2025-01-27 8:09 ` Alexandre Ghiti
@ 2025-01-28 8:10 ` Clément Léger
2025-01-30 10:01 ` Alexandre Ghiti
0 siblings, 1 reply; 22+ messages in thread
From: Clément Léger @ 2025-01-28 8:10 UTC (permalink / raw)
To: Alexandre Ghiti, Paul Walmsley, Palmer Dabbelt, linux-riscv,
linux-kernel, linux-arm-kernel
Cc: Himanshu Chauhan, Anup Patel, Xu Lu, Atish Patra
On 27/01/2025 09:09, Alexandre Ghiti wrote:
>> I believe the goal is not the same. Using CONFIG_VMAP_STACK allows the
>> kernel exception handling to catch any stack overflow when entering the
>> kernel and thus using vmalloc is required to allocate twice the page
>> size (overflow is when sp is located in the upper half of the allocated
>> vmalloc stack. So basically, this is two distinct purposes.
>>
>> AFAIU, kvmalloc allows to fallback to vmalloc if kmalloc fails. This is
>> not what we are looking for here since our allocation size is always
>> quite small and known (STACK_SIZE basically).
>>
>> But I might be missing something.
>
>
> arch_alloc_vmap_stack() only vmalloc the stack and does not implement
> any stack overflow mechanism, so I'm still unsure we need the define.
Hi Alex,
So actually, the stack overflow check itself is done in the exception
entry. It checks whether the stack pointer has passed into the upper part
of the vmalloc allocation (see entry.S:122). In this allocation, the stack
size is actually doubled:
#ifdef CONFIG_VMAP_STACK
#define THREAD_ALIGN (2 * THREAD_SIZE)
#else
#define THREAD_ALIGN THREAD_SIZE
#endif
So even though it does nothing special by itself, it centralizes the
allocation size/method. And since the size is larger, using vmalloc makes
sense I guess. The same mechanism is used to allocate the irq stack as well.
Thanks,
Clément
>
> Thanks,
>
> Alex
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v3 2/4] riscv: add support for SBI Supervisor Software Events extension
2025-01-28 8:10 ` Clément Léger
@ 2025-01-30 10:01 ` Alexandre Ghiti
0 siblings, 0 replies; 22+ messages in thread
From: Alexandre Ghiti @ 2025-01-30 10:01 UTC (permalink / raw)
To: Clément Léger, Paul Walmsley, Palmer Dabbelt,
linux-riscv, linux-kernel, linux-arm-kernel
Cc: Himanshu Chauhan, Anup Patel, Xu Lu, Atish Patra
Hi Clément,
On 28/01/2025 09:10, Clément Léger wrote:
>
> On 27/01/2025 09:09, Alexandre Ghiti wrote:
>>> I believe the goal is not the same. Using CONFIG_VMAP_STACK allows the
>>> kernel exception handling to catch any stack overflow when entering the
>>> kernel and thus using vmalloc is required to allocate twice the page
>>> size (overflow is when sp is located in the upper half of the allocated
>>> vmalloc stack. So basically, this is two distinct purposes.
>>>
>>> AFAIU, kvmalloc allows to fallback to vmalloc if kmalloc fails. This is
>>> not what we are looking for here since our allocation size is always
>>> quite small and known (STACK_SIZE basically).
>>>
>>> But I might be missing something.
>>
>> arch_alloc_vmap_stack() only vmalloc the stack and does not implement
>> any stack overflow mechanism, so I'm still unsure we need the define.
> Hi Alex,
>
> So actually, the stack overflow check itself is done in the exception
> entry. It check if the stack pointer did passed in the upper part of the
> vmalloc allocation (see entry.S:122). In this allocation, the stack size
> is actually * 2:
>
> #ifdef CONFIG_VMAP_STACK
> #define THREAD_ALIGN (2 * THREAD_SIZE)
> #else
> #define THREAD_ALIGN THREAD_SIZE
> #endif
>
> So even though it does nothing special by itself, it centralize the
> allocation size/method. And size the size is larger, using vamlloc makes
> sense I guess. The same mechanism is used to allocate irq stack as well.
You're right, it makes sense! Nit: we can avoid the ifdef by using
IS_ENABLED() but do as you prefer.
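
For what it's worth, a sketch of how that could look (assuming
arch_alloc_vmap_stack(), vfree() and kfree() remain visible to the compiler
when CONFIG_VMAP_STACK is off; otherwise the #ifdef has to stay):

  static unsigned long *sse_stack_alloc(unsigned int cpu, unsigned int size)
  {
          if (IS_ENABLED(CONFIG_VMAP_STACK))
                  return arch_alloc_vmap_stack(size, cpu_to_node(cpu));

          return kmalloc(size, GFP_KERNEL);
  }

  static void sse_stack_free(unsigned long *stack)
  {
          if (IS_ENABLED(CONFIG_VMAP_STACK))
                  vfree(stack);
          else
                  kfree(stack);
  }
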
Thanks for the explanation,
Alex
>
> Thanks,
>
> Clément
>
>> Thanks,
>>
>> Alex
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v3 2/4] riscv: add support for SBI Supervisor Software Events extension
2024-12-06 16:30 ` [PATCH v3 2/4] riscv: add support for SBI Supervisor Software Events extension Clément Léger
2024-12-10 4:51 ` Himanshu Chauhan
2025-01-22 12:15 ` Alexandre Ghiti
@ 2025-03-19 17:08 ` Andrew Jones
2025-03-20 8:16 ` Clément Léger
2 siblings, 1 reply; 22+ messages in thread
From: Andrew Jones @ 2025-03-19 17:08 UTC (permalink / raw)
To: Clément Léger
Cc: Paul Walmsley, Palmer Dabbelt, linux-riscv, linux-kernel,
linux-arm-kernel, Himanshu Chauhan, Anup Patel, Xu Lu,
Atish Patra
On Fri, Dec 06, 2024 at 05:30:58PM +0100, Clément Léger wrote:
...
> +int arch_sse_init_event(struct sse_event_arch_data *arch_evt, u32 evt_id, int cpu)
> +{
> + void *stack;
> +
> + arch_evt->evt_id = evt_id;
> + stack = sse_stack_alloc(cpu, SSE_STACK_SIZE);
> + if (!stack)
> + return -ENOMEM;
> +
> + arch_evt->stack = stack + SSE_STACK_SIZE;
> +
> + if (sse_init_scs(cpu, arch_evt))
> + goto free_stack;
> +
> + if (is_kernel_percpu_address((unsigned long)&arch_evt->interrupted)) {
> + arch_evt->interrupted_state_phys =
> + per_cpu_ptr_to_phys(&arch_evt->interrupted);
> + } else {
> + arch_evt->interrupted_state_phys =
> + virt_to_phys(&arch_evt->interrupted);
> + }
> +
> + return 0;
Hi Clément,
Testing SSE support with tools/testing/selftests/kvm/riscv/sbi_pmu_test
led to an opensbi sbi_trap_error because the output_phys_lo address passed
to sbi_sse_read_attrs() wasn't a physical address. The reason is that
is_kernel_percpu_address() can only be used on static percpu addresses,
but local sse events get their percpu addresses with alloc_percpu(), so
is_kernel_percpu_address() was returning false even for local events. I
made the following changes to get things working.
Thanks,
drew
diff --git a/arch/riscv/kernel/sse.c b/arch/riscv/kernel/sse.c
index b48ae69dad8d..f46893946086 100644
--- a/arch/riscv/kernel/sse.c
+++ b/arch/riscv/kernel/sse.c
@@ -100,12 +100,12 @@ int arch_sse_init_event(struct sse_event_arch_data *arch_evt, u32 evt_id, int cp
if (sse_init_scs(cpu, arch_evt))
goto free_stack;
- if (is_kernel_percpu_address((unsigned long)&arch_evt->interrupted)) {
+ if (sse_event_is_global(evt_id)) {
arch_evt->interrupted_state_phys =
- per_cpu_ptr_to_phys(&arch_evt->interrupted);
+ virt_to_phys(&arch_evt->interrupted);
} else {
arch_evt->interrupted_state_phys =
- virt_to_phys(&arch_evt->interrupted);
+ per_cpu_ptr_to_phys(&arch_evt->interrupted);
}
return 0;
diff --git a/drivers/firmware/riscv/riscv_sse.c b/drivers/firmware/riscv/riscv_sse.c
index 511db9ad7a9e..fef375046f75 100644
--- a/drivers/firmware/riscv/riscv_sse.c
+++ b/drivers/firmware/riscv/riscv_sse.c
@@ -62,11 +62,6 @@ void sse_handle_event(struct sse_event_arch_data *arch_event,
ret);
}
-static bool sse_event_is_global(u32 evt)
-{
- return !!(evt & SBI_SSE_EVENT_GLOBAL);
-}
-
static
struct sse_event *sse_event_get(u32 evt)
{
diff --git a/include/linux/riscv_sse.h b/include/linux/riscv_sse.h
index 16700677f1e8..06b757b036b0 100644
--- a/include/linux/riscv_sse.h
+++ b/include/linux/riscv_sse.h
@@ -8,6 +8,7 @@
#include <linux/types.h>
#include <linux/linkage.h>
+#include <asm/sbi.h>
struct sse_event;
struct pt_regs;
@@ -16,6 +17,11 @@ struct ghes;
typedef int (sse_event_handler)(u32 event_num, void *arg, struct pt_regs *regs);
+static inline bool sse_event_is_global(u32 evt)
+{
+ return !!(evt & SBI_SSE_EVENT_GLOBAL);
+}
+
#ifdef CONFIG_RISCV_SSE
struct sse_event *sse_event_register(u32 event_num, u32 priority,
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH v3 2/4] riscv: add support for SBI Supervisor Software Events extension
2025-03-19 17:08 ` Andrew Jones
@ 2025-03-20 8:16 ` Clément Léger
2025-03-20 11:52 ` Andrew Jones
0 siblings, 1 reply; 22+ messages in thread
From: Clément Léger @ 2025-03-20 8:16 UTC (permalink / raw)
To: Andrew Jones
Cc: Paul Walmsley, Palmer Dabbelt, linux-riscv, linux-kernel,
linux-arm-kernel, Himanshu Chauhan, Anup Patel, Xu Lu,
Atish Patra
On 19/03/2025 18:08, Andrew Jones wrote:
> On Fri, Dec 06, 2024 at 05:30:58PM +0100, Clément Léger wrote:
> ...
>> +int arch_sse_init_event(struct sse_event_arch_data *arch_evt, u32 evt_id, int cpu)
>> +{
>> + void *stack;
>> +
>> + arch_evt->evt_id = evt_id;
>> + stack = sse_stack_alloc(cpu, SSE_STACK_SIZE);
>> + if (!stack)
>> + return -ENOMEM;
>> +
>> + arch_evt->stack = stack + SSE_STACK_SIZE;
>> +
>> + if (sse_init_scs(cpu, arch_evt))
>> + goto free_stack;
>> +
>> + if (is_kernel_percpu_address((unsigned long)&arch_evt->interrupted)) {
>> + arch_evt->interrupted_state_phys =
>> + per_cpu_ptr_to_phys(&arch_evt->interrupted);
>> + } else {
>> + arch_evt->interrupted_state_phys =
>> + virt_to_phys(&arch_evt->interrupted);
>> + }
>> +
>> + return 0;
>
> Hi Clément,
>
> Testing SSE support with tools/testing/selftests/kvm/riscv/sbi_pmu_test
> led to an opensbi sbi_trap_error because the output_phys_lo address passed
> to sbi_sse_read_attrs() wasn't a physical address. The reason is that
> is_kernel_percpu_address() can only be used on static percpu addresses,
> but local sse events get their percpu addresses with alloc_percpu(), so
> is_kernel_percpu_address() was returning false even for local events. I
> made the following changes to get things working.
Hi Andrew,
Did something change recently? Because I tested that when it was sent
(PMU + some kernel internal testsuite) and didn't see that. Anyway, I'll
respin it with your changes as well.
Thanks !
Clément
>
> Thanks,
> drew
>
> diff --git a/arch/riscv/kernel/sse.c b/arch/riscv/kernel/sse.c
> index b48ae69dad8d..f46893946086 100644
> --- a/arch/riscv/kernel/sse.c
> +++ b/arch/riscv/kernel/sse.c
> @@ -100,12 +100,12 @@ int arch_sse_init_event(struct sse_event_arch_data *arch_evt, u32 evt_id, int cp
> if (sse_init_scs(cpu, arch_evt))
> goto free_stack;
>
> - if (is_kernel_percpu_address((unsigned long)&arch_evt->interrupted)) {
> + if (sse_event_is_global(evt_id)) {
> arch_evt->interrupted_state_phys =
> - per_cpu_ptr_to_phys(&arch_evt->interrupted);
> + virt_to_phys(&arch_evt->interrupted);
> } else {
> arch_evt->interrupted_state_phys =
> - virt_to_phys(&arch_evt->interrupted);
> + per_cpu_ptr_to_phys(&arch_evt->interrupted);
> }
>
> return 0;
> diff --git a/drivers/firmware/riscv/riscv_sse.c b/drivers/firmware/riscv/riscv_sse.c
> index 511db9ad7a9e..fef375046f75 100644
> --- a/drivers/firmware/riscv/riscv_sse.c
> +++ b/drivers/firmware/riscv/riscv_sse.c
> @@ -62,11 +62,6 @@ void sse_handle_event(struct sse_event_arch_data *arch_event,
> ret);
> }
>
> -static bool sse_event_is_global(u32 evt)
> -{
> - return !!(evt & SBI_SSE_EVENT_GLOBAL);
> -}
> -
> static
> struct sse_event *sse_event_get(u32 evt)
> {
> diff --git a/include/linux/riscv_sse.h b/include/linux/riscv_sse.h
> index 16700677f1e8..06b757b036b0 100644
> --- a/include/linux/riscv_sse.h
> +++ b/include/linux/riscv_sse.h
> @@ -8,6 +8,7 @@
>
> #include <linux/types.h>
> #include <linux/linkage.h>
> +#include <asm/sbi.h>
>
> struct sse_event;
> struct pt_regs;
> @@ -16,6 +17,11 @@ struct ghes;
>
> typedef int (sse_event_handler)(u32 event_num, void *arg, struct pt_regs *regs);
>
> +static inline bool sse_event_is_global(u32 evt)
> +{
> + return !!(evt & SBI_SSE_EVENT_GLOBAL);
> +}
> +
> #ifdef CONFIG_RISCV_SSE
>
> struct sse_event *sse_event_register(u32 event_num, u32 priority,
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v3 2/4] riscv: add support for SBI Supervisor Software Events extension
2025-03-20 8:16 ` Clément Léger
@ 2025-03-20 11:52 ` Andrew Jones
2025-03-20 12:26 ` Clément Léger
0 siblings, 1 reply; 22+ messages in thread
From: Andrew Jones @ 2025-03-20 11:52 UTC (permalink / raw)
To: Clément Léger
Cc: Paul Walmsley, Palmer Dabbelt, linux-riscv, linux-kernel,
linux-arm-kernel, Himanshu Chauhan, Anup Patel, Xu Lu,
Atish Patra
On Thu, Mar 20, 2025 at 09:16:07AM +0100, Clément Léger wrote:
>
>
> On 19/03/2025 18:08, Andrew Jones wrote:
> > On Fri, Dec 06, 2024 at 05:30:58PM +0100, Clément Léger wrote:
> > ...
> >> +int arch_sse_init_event(struct sse_event_arch_data *arch_evt, u32 evt_id, int cpu)
> >> +{
> >> + void *stack;
> >> +
> >> + arch_evt->evt_id = evt_id;
> >> + stack = sse_stack_alloc(cpu, SSE_STACK_SIZE);
> >> + if (!stack)
> >> + return -ENOMEM;
> >> +
> >> + arch_evt->stack = stack + SSE_STACK_SIZE;
> >> +
> >> + if (sse_init_scs(cpu, arch_evt))
> >> + goto free_stack;
> >> +
> >> + if (is_kernel_percpu_address((unsigned long)&arch_evt->interrupted)) {
> >> + arch_evt->interrupted_state_phys =
> >> + per_cpu_ptr_to_phys(&arch_evt->interrupted);
> >> + } else {
> >> + arch_evt->interrupted_state_phys =
> >> + virt_to_phys(&arch_evt->interrupted);
> >> + }
> >> +
> >> + return 0;
> >
> > Hi Clément,
> >
> > Testing SSE support with tools/testing/selftests/kvm/riscv/sbi_pmu_test
> > led to an opensbi sbi_trap_error because the output_phys_lo address passed
> > to sbi_sse_read_attrs() wasn't a physical address. The reason is that
> > is_kernel_percpu_address() can only be used on static percpu addresses,
> > but local sse events get their percpu addresses with alloc_percpu(), so
> > is_kernel_percpu_address() was returning false even for local events. I
> > made the following changes to get things working.
>
> Hi Andrew,
>
> Did something changed recently ? Because I tested that when it was send
> (PMU + some kernel internal testsuite) and didn't saw that. Anyway, I'll
> respin it with your changes as well.
It depends on the kernel config. Configs that don't have many
alloc_percpu() calls prior to the one made by sse can work, because,
iiuc, alloc_percpu() will get its allocation from the percpu allocator's
first chunk until that chunk fills up. The first chunk is shared with
the static allocations.
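
A small illustration of the pitfall (hypothetical example; behaviour as
described above, where dynamic per-CPU memory may come from a chunk other
than the first one):

  #include <linux/percpu.h>
  #include <linux/printk.h>

  static DEFINE_PER_CPU(int, static_counter);

  static void percpu_addr_check_example(void)
  {
          int __percpu *dyn_counter = alloc_percpu(int);

          if (!dyn_counter)
                  return;

          /* Static per-CPU addresses are always recognised: prints 1. */
          pr_info("static: %d\n", is_kernel_percpu_address(
                          (unsigned long)per_cpu_ptr(&static_counter, 0)));

          /*
           * Dynamic per-CPU addresses may not be recognised once the
           * allocator's first chunk is full, which is why the fix keys
           * off sse_event_is_global() instead.
           */
          pr_info("dynamic: %d\n", is_kernel_percpu_address(
                          (unsigned long)per_cpu_ptr(dyn_counter, 0)));

          free_percpu(dyn_counter);
  }
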
Thanks,
drew
>
> Thanks !
>
> Clément
>
> >
> > Thanks,
> > drew
> >
> > diff --git a/arch/riscv/kernel/sse.c b/arch/riscv/kernel/sse.c
> > index b48ae69dad8d..f46893946086 100644
> > --- a/arch/riscv/kernel/sse.c
> > +++ b/arch/riscv/kernel/sse.c
> > @@ -100,12 +100,12 @@ int arch_sse_init_event(struct sse_event_arch_data *arch_evt, u32 evt_id, int cp
> > if (sse_init_scs(cpu, arch_evt))
> > goto free_stack;
> >
> > - if (is_kernel_percpu_address((unsigned long)&arch_evt->interrupted)) {
> > + if (sse_event_is_global(evt_id)) {
> > arch_evt->interrupted_state_phys =
> > - per_cpu_ptr_to_phys(&arch_evt->interrupted);
> > + virt_to_phys(&arch_evt->interrupted);
> > } else {
> > arch_evt->interrupted_state_phys =
> > - virt_to_phys(&arch_evt->interrupted);
> > + per_cpu_ptr_to_phys(&arch_evt->interrupted);
> > }
> >
> > return 0;
> > diff --git a/drivers/firmware/riscv/riscv_sse.c b/drivers/firmware/riscv/riscv_sse.c
> > index 511db9ad7a9e..fef375046f75 100644
> > --- a/drivers/firmware/riscv/riscv_sse.c
> > +++ b/drivers/firmware/riscv/riscv_sse.c
> > @@ -62,11 +62,6 @@ void sse_handle_event(struct sse_event_arch_data *arch_event,
> > ret);
> > }
> >
> > -static bool sse_event_is_global(u32 evt)
> > -{
> > - return !!(evt & SBI_SSE_EVENT_GLOBAL);
> > -}
> > -
> > static
> > struct sse_event *sse_event_get(u32 evt)
> > {
> > diff --git a/include/linux/riscv_sse.h b/include/linux/riscv_sse.h
> > index 16700677f1e8..06b757b036b0 100644
> > --- a/include/linux/riscv_sse.h
> > +++ b/include/linux/riscv_sse.h
> > @@ -8,6 +8,7 @@
> >
> > #include <linux/types.h>
> > #include <linux/linkage.h>
> > +#include <asm/sbi.h>
> >
> > struct sse_event;
> > struct pt_regs;
> > @@ -16,6 +17,11 @@ struct ghes;
> >
> > typedef int (sse_event_handler)(u32 event_num, void *arg, struct pt_regs *regs);
> >
> > +static inline bool sse_event_is_global(u32 evt)
> > +{
> > + return !!(evt & SBI_SSE_EVENT_GLOBAL);
> > +}
> > +
> > #ifdef CONFIG_RISCV_SSE
> >
> > struct sse_event *sse_event_register(u32 event_num, u32 priority,
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v3 2/4] riscv: add support for SBI Supervisor Software Events extension
2025-03-20 11:52 ` Andrew Jones
@ 2025-03-20 12:26 ` Clément Léger
0 siblings, 0 replies; 22+ messages in thread
From: Clément Léger @ 2025-03-20 12:26 UTC (permalink / raw)
To: Andrew Jones
Cc: Paul Walmsley, Palmer Dabbelt, linux-riscv, linux-kernel,
linux-arm-kernel, Himanshu Chauhan, Anup Patel, Xu Lu,
Atish Patra
On 20/03/2025 12:52, Andrew Jones wrote:
> On Thu, Mar 20, 2025 at 09:16:07AM +0100, Clément Léger wrote:
>>
>>
>> On 19/03/2025 18:08, Andrew Jones wrote:
>>> On Fri, Dec 06, 2024 at 05:30:58PM +0100, Clément Léger wrote:
>>> ...
>>>> +int arch_sse_init_event(struct sse_event_arch_data *arch_evt, u32 evt_id, int cpu)
>>>> +{
>>>> + void *stack;
>>>> +
>>>> + arch_evt->evt_id = evt_id;
>>>> + stack = sse_stack_alloc(cpu, SSE_STACK_SIZE);
>>>> + if (!stack)
>>>> + return -ENOMEM;
>>>> +
>>>> + arch_evt->stack = stack + SSE_STACK_SIZE;
>>>> +
>>>> + if (sse_init_scs(cpu, arch_evt))
>>>> + goto free_stack;
>>>> +
>>>> + if (is_kernel_percpu_address((unsigned long)&arch_evt->interrupted)) {
>>>> + arch_evt->interrupted_state_phys =
>>>> + per_cpu_ptr_to_phys(&arch_evt->interrupted);
>>>> + } else {
>>>> + arch_evt->interrupted_state_phys =
>>>> + virt_to_phys(&arch_evt->interrupted);
>>>> + }
>>>> +
>>>> + return 0;
>>>
>>> Hi Clément,
>>>
>>> Testing SSE support with tools/testing/selftests/kvm/riscv/sbi_pmu_test
>>> led to an opensbi sbi_trap_error because the output_phys_lo address passed
>>> to sbi_sse_read_attrs() wasn't a physical address. The reason is that
>>> is_kernel_percpu_address() can only be used on static percpu addresses,
>>> but local sse events get their percpu addresses with alloc_percpu(), so
>>> is_kernel_percpu_address() was returning false even for local events. I
>>> made the following changes to get things working.
>>
>> Hi Andrew,
>>
>> Did something changed recently ? Because I tested that when it was send
>> (PMU + some kernel internal testsuite) and didn't saw that. Anyway, I'll
>> respin it with your changes as well.
>
> It depends on the kernel config. Configs that don't have many
> alloc_percpu() calls prior to the one made by sse can work, because,
> iiuc, alloc_percpu() will get its allocation from the percpu allocator's
> first chunk until that chunck fills up. The first chunck is shared with
> the static allocations.
Makes sense! Thanks, I'll look at it.
>
> Thanks,
> drew
>
>>
>> Thanks !
>>
>> Clément
>>
>>>
>>> Thanks,
>>> drew
>>>
>>> diff --git a/arch/riscv/kernel/sse.c b/arch/riscv/kernel/sse.c
>>> index b48ae69dad8d..f46893946086 100644
>>> --- a/arch/riscv/kernel/sse.c
>>> +++ b/arch/riscv/kernel/sse.c
>>> @@ -100,12 +100,12 @@ int arch_sse_init_event(struct sse_event_arch_data *arch_evt, u32 evt_id, int cp
>>> if (sse_init_scs(cpu, arch_evt))
>>> goto free_stack;
>>>
>>> - if (is_kernel_percpu_address((unsigned long)&arch_evt->interrupted)) {
>>> + if (sse_event_is_global(evt_id)) {
>>> arch_evt->interrupted_state_phys =
>>> - per_cpu_ptr_to_phys(&arch_evt->interrupted);
>>> + virt_to_phys(&arch_evt->interrupted);
>>> } else {
>>> arch_evt->interrupted_state_phys =
>>> - virt_to_phys(&arch_evt->interrupted);
>>> + per_cpu_ptr_to_phys(&arch_evt->interrupted);
>>> }
>>>
>>> return 0;
>>> diff --git a/drivers/firmware/riscv/riscv_sse.c b/drivers/firmware/riscv/riscv_sse.c
>>> index 511db9ad7a9e..fef375046f75 100644
>>> --- a/drivers/firmware/riscv/riscv_sse.c
>>> +++ b/drivers/firmware/riscv/riscv_sse.c
>>> @@ -62,11 +62,6 @@ void sse_handle_event(struct sse_event_arch_data *arch_event,
>>> ret);
>>> }
>>>
>>> -static bool sse_event_is_global(u32 evt)
>>> -{
>>> - return !!(evt & SBI_SSE_EVENT_GLOBAL);
>>> -}
>>> -
>>> static
>>> struct sse_event *sse_event_get(u32 evt)
>>> {
>>> diff --git a/include/linux/riscv_sse.h b/include/linux/riscv_sse.h
>>> index 16700677f1e8..06b757b036b0 100644
>>> --- a/include/linux/riscv_sse.h
>>> +++ b/include/linux/riscv_sse.h
>>> @@ -8,6 +8,7 @@
>>>
>>> #include <linux/types.h>
>>> #include <linux/linkage.h>
>>> +#include <asm/sbi.h>
>>>
>>> struct sse_event;
>>> struct pt_regs;
>>> @@ -16,6 +17,11 @@ struct ghes;
>>>
>>> typedef int (sse_event_handler)(u32 event_num, void *arg, struct pt_regs *regs);
>>>
>>> +static inline bool sse_event_is_global(u32 evt)
>>> +{
>>> + return !!(evt & SBI_SSE_EVENT_GLOBAL);
>>> +}
>>> +
>>> #ifdef CONFIG_RISCV_SSE
>>>
>>> struct sse_event *sse_event_register(u32 event_num, u32 priority,
>>
^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread [~2025-03-20 12:28 UTC | newest]
Thread overview: 22+ messages
2024-12-06 16:30 [PATCH v3 0/4] riscv: add support for SBI Supervisor Software Events Clément Léger
2024-12-06 16:30 ` [PATCH v3 1/4] riscv: add SBI SSE extension definitions Clément Léger
2024-12-06 16:30 ` [PATCH v3 2/4] riscv: add support for SBI Supervisor Software Events extension Clément Léger
2024-12-10 4:51 ` Himanshu Chauhan
2025-01-22 12:15 ` Alexandre Ghiti
2025-01-22 12:23 ` Alexandre Ghiti
2025-01-23 8:41 ` Clément Léger
2025-01-23 8:39 ` Clément Léger
2025-01-27 8:09 ` Alexandre Ghiti
2025-01-28 8:10 ` Clément Léger
2025-01-30 10:01 ` Alexandre Ghiti
2025-03-19 17:08 ` Andrew Jones
2025-03-20 8:16 ` Clément Léger
2025-03-20 11:52 ` Andrew Jones
2025-03-20 12:26 ` Clément Léger
2024-12-06 16:30 ` [PATCH v3 3/4] drivers: firmware: add riscv SSE support Clément Léger
2024-12-13 5:03 ` Himanshu Chauhan
2024-12-13 8:33 ` Clément Léger
2025-01-16 13:58 ` Conor Dooley
2025-01-23 10:52 ` Clément Léger
2025-01-24 14:15 ` Conor Dooley
2024-12-06 16:31 ` [PATCH v3 4/4] perf: RISC-V: add support for SSE event Clément Léger