linux-kernel.vger.kernel.org archive mirror
* [PATCH v6 0/5] riscv: uaccess: optimisations
@ 2025-04-10  7:05 Cyril Bur
  2025-04-10  7:05 ` [PATCH v6 1/5] riscv: save the SR_SUM status over switches Cyril Bur
                   ` (5 more replies)
  0 siblings, 6 replies; 32+ messages in thread
From: Cyril Bur @ 2025-04-10  7:05 UTC (permalink / raw)
  To: palmer, aou, paul.walmsley, charlie, jrtc27, ben.dooks, alex
  Cc: linux-riscv, linux-kernel, jszhang

This series optimises riscv uaccess by allowing the use of
user_access_begin() and user_access_end(), which permit grouping user
accesses together and avoid paying the CSR write penalty for each
individual access.

The error path can also be optimised using asm goto, which patches 3 and 4
achieve. This speeds up jumping to the error labels by removing the need
for an intermediate error variable within the uaccess macros.
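
As an illustration, the intended call pattern looks roughly like this
(a sketch only; 'u', 'a' and 'b' are placeholders, and the helpers are
the ones introduced in patch 2):

	if (!user_access_begin(u, sizeof(*u)))
		return -EFAULT;
	/* SR_SUM stays set across both stores; no CSR write per access */
	unsafe_put_user(a, &u->field_a, Efault);
	unsafe_put_user(b, &u->field_b, Efault);
	user_access_end();
	return 0;
Efault:
	user_access_end();
	return -EFAULT;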

I did read the discussion this series generated. It isn't clear to me
which direction to take the patches, if any.

V2:
I've taken on this series as there hasn't been any response from Jisheng.
No significant changes other than build fixes.
- Fixed build breakage in patch 3 caused by not using the 'goto' keyword.
- Fixed build breakage in patch 4 on 32-bit caused by __ptr not being
  declared in the macro.

V3:
Significant commit message rewrites.
 - Corrected the justification for patch 2
 - Better explained/justified patches 3 and 4
Minor code changes for legibility and more comments

V4:
Fixed checkpatch errors
Added an unsafe_copy_from_user()
Added patch from Ben Dooks to save SR_SUM bit on switch

V5:
Fixed mistakes in adding unsafe_copy_from_user()
 - Sorry about the noise

V6:
Reworded patch 1 commit message
Patch 1 no longer clears SR_SUM, just saves/restores

Ben Dooks (1):
  riscv: save the SR_SUM status over switches

Jisheng Zhang (4):
  riscv: implement user_access_begin() and families
  riscv: uaccess: use input constraints for ptr of __put_user()
  riscv: uaccess: use 'asm goto' for put_user()
  riscv: uaccess: use 'asm_goto_output' for get_user()

 arch/riscv/include/asm/processor.h |   1 +
 arch/riscv/include/asm/uaccess.h   | 218 ++++++++++++++++++++++-------
 arch/riscv/kernel/asm-offsets.c    |   5 +
 arch/riscv/kernel/entry.S          |   8 ++
 4 files changed, 179 insertions(+), 53 deletions(-)

-- 
2.34.1



* [PATCH v6 1/5] riscv: save the SR_SUM status over switches
  2025-04-10  7:05 [PATCH v6 0/5] riscv: uaccess: optimisations Cyril Bur
@ 2025-04-10  7:05 ` Cyril Bur
  2025-04-22 10:22   ` Alexandre Ghiti
  2025-04-22 23:01   ` Deepak Gupta
  2025-04-10  7:05 ` [PATCH v6 2/5] riscv: implement user_access_begin() and families Cyril Bur
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 32+ messages in thread
From: Cyril Bur @ 2025-04-10  7:05 UTC (permalink / raw)
  To: palmer, aou, paul.walmsley, charlie, jrtc27, ben.dooks, alex
  Cc: linux-riscv, linux-kernel, jszhang, syzbot+e74b94fe601ab9552d69

From: Ben Dooks <ben.dooks@codethink.co.uk>

When threads/tasks are switched we need to ensure the outgoing task's
SR_SUM state is saved and the incoming task's previously saved SR_SUM
state is restored.

The issue was seen under heavy load especially with the syz-stress tool
running, with crashes as follows in schedule_tail:

Unable to handle kernel access to user memory without uaccess routines
at virtual address 000000002749f0d0
Oops [#1]
Modules linked in:
CPU: 1 PID: 4875 Comm: syz-executor.0 Not tainted
5.12.0-rc2-syzkaller-00467-g0d7588ab9ef9 #0
Hardware name: riscv-virtio,qemu (DT)
epc : schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
 ra : task_pid_vnr include/linux/sched.h:1421 [inline]
 ra : schedule_tail+0x70/0xb2 kernel/sched/core.c:4264
epc : ffffffe00008c8b0 ra : ffffffe00008c8ae sp : ffffffe025d17ec0
 gp : ffffffe005d25378 tp : ffffffe00f0d0000 t0 : 0000000000000000
 t1 : 0000000000000001 t2 : 00000000000f4240 s0 : ffffffe025d17ee0
 s1 : 000000002749f0d0 a0 : 000000000000002a a1 : 0000000000000003
 a2 : 1ffffffc0cfac500 a3 : ffffffe0000c80cc a4 : 5ae9db91c19bbe00
 a5 : 0000000000000000 a6 : 0000000000f00000 a7 : ffffffe000082eba
 s2 : 0000000000040000 s3 : ffffffe00eef96c0 s4 : ffffffe022c77fe0
 s5 : 0000000000004000 s6 : ffffffe067d74e00 s7 : ffffffe067d74850
 s8 : ffffffe067d73e18 s9 : ffffffe067d74e00 s10: ffffffe00eef96e8
 s11: 000000ae6cdf8368 t3 : 5ae9db91c19bbe00 t4 : ffffffc4043cafb2
 t5 : ffffffc4043cafba t6 : 0000000000040000
status: 0000000000000120 badaddr: 000000002749f0d0 cause:
000000000000000f
Call Trace:
[<ffffffe00008c8b0>] schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
[<ffffffe000005570>] ret_from_exception+0x0/0x14
Dumping ftrace buffer:
   (ftrace buffer empty)
---[ end trace b5f8f9231dc87dda ]---

The issue comes from the put_user() in schedule_tail
(kernel/sched/core.c) doing the following:

asmlinkage __visible void schedule_tail(struct task_struct *prev)
{
...
        if (current->set_child_tid)
                put_user(task_pid_vnr(current), current->set_child_tid);
...
}

the put_user() macro causes the code sequence to come out as follows:

1:	__enable_user_access()
2:	reg = task_pid_vnr(current);
3:	*current->set_child_tid = reg;
4:	__disable_user_access()

The problem is that the argument may be a call to a function that can
sleep, which could clear SR_SUM and cause the panic above. This was
fixed by evaluating the argument of the put_user() macro outside the
user-enabled section in commit 285a76bb2cf5 ("riscv: evaluate
put_user() arg before enabling user access").

In order for riscv to take advantage of unsafe_get/put_XXX() macros and
to avoid the same issue we had with put_user() and sleeping functions we
must ensure code flow can go through switch_to() from within a region of
code with SR_SUM enabled and come back with SR_SUM still enabled. This
patch addresses the problem allowing future work to enable full use of
unsafe_get/put_XXX() macros without needing to take a CSR bit flip cost
on every access. Make switch_to() save and restore SR_SUM.
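
To illustrate the kind of flow this change must keep working (purely a
sketch, not code from this series; 'uptr' and 'v' are placeholders):

	if (!user_access_begin(uptr, sizeof(*uptr)))	/* sets SR_SUM */
		return -EFAULT;
	unsafe_get_user(v, uptr, Efault);
	cond_resched();			/* may schedule(): __switch_to() runs and
					 * another task executes for a while */
	unsafe_put_user(v + 1, uptr, Efault);	/* relies on SR_SUM still set */
	user_access_end();			/* clears SR_SUM */
	return 0;
Efault:
	user_access_end();
	return -EFAULT;

Without __switch_to() saving and restoring the status CSR, the task
would resume after the reschedule with whatever SR_SUM value the
previous task left behind, and the second access would oops just like
the trace above.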

Reported-by: syzbot+e74b94fe601ab9552d69@syzkaller.appspotmail.com
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Signed-off-by: Cyril Bur <cyrilbur@tenstorrent.com>
---
 arch/riscv/include/asm/processor.h | 1 +
 arch/riscv/kernel/asm-offsets.c    | 5 +++++
 arch/riscv/kernel/entry.S          | 8 ++++++++
 3 files changed, 14 insertions(+)

diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h
index 5f56eb9d114a..58fd11c89fe9 100644
--- a/arch/riscv/include/asm/processor.h
+++ b/arch/riscv/include/asm/processor.h
@@ -103,6 +103,7 @@ struct thread_struct {
 	struct __riscv_d_ext_state fstate;
 	unsigned long bad_cause;
 	unsigned long envcfg;
+	unsigned long status;
 	u32 riscv_v_flags;
 	u32 vstate_ctrl;
 	struct __riscv_v_ext_state vstate;
diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
index 16490755304e..969c65b1fe41 100644
--- a/arch/riscv/kernel/asm-offsets.c
+++ b/arch/riscv/kernel/asm-offsets.c
@@ -34,6 +34,7 @@ void asm_offsets(void)
 	OFFSET(TASK_THREAD_S9, task_struct, thread.s[9]);
 	OFFSET(TASK_THREAD_S10, task_struct, thread.s[10]);
 	OFFSET(TASK_THREAD_S11, task_struct, thread.s[11]);
+	OFFSET(TASK_THREAD_STATUS, task_struct, thread.status);
 
 	OFFSET(TASK_TI_CPU, task_struct, thread_info.cpu);
 	OFFSET(TASK_TI_PREEMPT_COUNT, task_struct, thread_info.preempt_count);
@@ -346,6 +347,10 @@ void asm_offsets(void)
 		  offsetof(struct task_struct, thread.s[11])
 		- offsetof(struct task_struct, thread.ra)
 	);
+	DEFINE(TASK_THREAD_STATUS_RA,
+		  offsetof(struct task_struct, thread.status)
+		- offsetof(struct task_struct, thread.ra)
+	);
 
 	DEFINE(TASK_THREAD_F0_F0,
 		  offsetof(struct task_struct, thread.fstate.f[0])
diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
index 33a5a9f2a0d4..00bd0de9faa2 100644
--- a/arch/riscv/kernel/entry.S
+++ b/arch/riscv/kernel/entry.S
@@ -397,9 +397,17 @@ SYM_FUNC_START(__switch_to)
 	REG_S s9,  TASK_THREAD_S9_RA(a3)
 	REG_S s10, TASK_THREAD_S10_RA(a3)
 	REG_S s11, TASK_THREAD_S11_RA(a3)
+
+	/* save the user space access flag */
+	li    s0, SR_SUM
+	csrr  s1, CSR_STATUS
+	REG_S s1, TASK_THREAD_STATUS_RA(a3)
+
 	/* Save the kernel shadow call stack pointer */
 	scs_save_current
 	/* Restore context from next->thread */
+	REG_L s0,  TASK_THREAD_STATUS_RA(a4)
+	csrs  CSR_STATUS, s0
 	REG_L ra,  TASK_THREAD_RA_RA(a4)
 	REG_L sp,  TASK_THREAD_SP_RA(a4)
 	REG_L s0,  TASK_THREAD_S0_RA(a4)
-- 
2.34.1



* [PATCH v6 2/5] riscv: implement user_access_begin() and families
  2025-04-10  7:05 [PATCH v6 0/5] riscv: uaccess: optimisations Cyril Bur
  2025-04-10  7:05 ` [PATCH v6 1/5] riscv: save the SR_SUM status over switches Cyril Bur
@ 2025-04-10  7:05 ` Cyril Bur
  2025-04-22 10:26   ` Alexandre Ghiti
  2025-04-10  7:05 ` [PATCH v6 3/5] riscv: uaccess: use input constraints for ptr of __put_user() Cyril Bur
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 32+ messages in thread
From: Cyril Bur @ 2025-04-10  7:05 UTC (permalink / raw)
  To: palmer, aou, paul.walmsley, charlie, jrtc27, ben.dooks, alex
  Cc: linux-riscv, linux-kernel, jszhang

From: Jisheng Zhang <jszhang@kernel.org>

Currently, when a function like strncpy_from_user() is called,
the userspace access protection is disabled and re-enabled
around every word read.

By implementing user_access_begin() and its family of helpers, the
protection is disabled once at the beginning of the copy and re-enabled
once at the end.

The __inttype macro is borrowed from the x86 implementation.
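
A copy loop can then look roughly like this (a sketch only; 'to',
'from' and 'n' are placeholders, not code from this patch):

	if (!user_access_begin(from, n))
		return -EFAULT;
	while (n >= sizeof(u32)) {
		/* a faulting access ends up at Efault; SR_SUM stays set */
		unsafe_get_user(*(u32 *)to, (const u32 __user *)from, Efault);
		to += sizeof(u32);
		from += sizeof(u32);
		n -= sizeof(u32);
	}
	user_access_end();	/* one enable/disable pair for the whole copy */
	return 0;
Efault:
	user_access_end();
	return -EFAULT;

This is essentially the pattern the unsafe_copy_to/from_user() helpers
below expand to (they additionally handle u64/u16/u8 sized chunks).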

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Signed-off-by: Cyril Bur <cyrilbur@tenstorrent.com>
---
 arch/riscv/include/asm/uaccess.h | 76 ++++++++++++++++++++++++++++++++
 1 file changed, 76 insertions(+)

diff --git a/arch/riscv/include/asm/uaccess.h b/arch/riscv/include/asm/uaccess.h
index fee56b0c8058..c9a461467bf4 100644
--- a/arch/riscv/include/asm/uaccess.h
+++ b/arch/riscv/include/asm/uaccess.h
@@ -61,6 +61,19 @@ static inline unsigned long __untagged_addr_remote(struct mm_struct *mm, unsigne
 #define __disable_user_access()							\
 	__asm__ __volatile__ ("csrc sstatus, %0" : : "r" (SR_SUM) : "memory")
 
+/*
+ * This is the smallest unsigned integer type that can fit a value
+ * (up to 'long long')
+ */
+#define __inttype(x) __typeof__(		\
+	__typefits(x, char,			\
+	  __typefits(x, short,			\
+	    __typefits(x, int,			\
+	      __typefits(x, long, 0ULL)))))
+
+#define __typefits(x, type, not) \
+	__builtin_choose_expr(sizeof(x) <= sizeof(type), (unsigned type)0, not)
+
 /*
  * The exception table consists of pairs of addresses: the first is the
  * address of an instruction that is allowed to fault, and the second is
@@ -368,6 +381,69 @@ do {									\
 		goto err_label;						\
 } while (0)
 
+static __must_check __always_inline bool user_access_begin(const void __user *ptr, size_t len)
+{
+	if (unlikely(!access_ok(ptr, len)))
+		return 0;
+	__enable_user_access();
+	return 1;
+}
+#define user_access_begin user_access_begin
+#define user_access_end __disable_user_access
+
+static inline unsigned long user_access_save(void) { return 0UL; }
+static inline void user_access_restore(unsigned long enabled) { }
+
+/*
+ * We want the unsafe accessors to always be inlined and use
+ * the error labels - thus the macro games.
+ */
+#define unsafe_put_user(x, ptr, label)	do {				\
+	long __err = 0;							\
+	__put_user_nocheck(x, (ptr), __err);				\
+	if (__err)							\
+		goto label;						\
+} while (0)
+
+#define unsafe_get_user(x, ptr, label)	do {				\
+	long __err = 0;							\
+	__inttype(*(ptr)) __gu_val;					\
+	__get_user_nocheck(__gu_val, (ptr), __err);			\
+	(x) = (__force __typeof__(*(ptr)))__gu_val;			\
+	if (__err)							\
+		goto label;						\
+} while (0)
+
+#define unsafe_copy_loop(dst, src, len, type, op, label)		\
+	while (len >= sizeof(type)) {					\
+		op(*(type *)(src), (type __user *)(dst), label);	\
+		dst += sizeof(type);					\
+		src += sizeof(type);					\
+		len -= sizeof(type);					\
+	}
+
+#define unsafe_copy_to_user(_dst, _src, _len, label)			\
+do {									\
+	char __user *__ucu_dst = (_dst);				\
+	const char *__ucu_src = (_src);					\
+	size_t __ucu_len = (_len);					\
+	unsafe_copy_loop(__ucu_dst, __ucu_src, __ucu_len, u64, unsafe_put_user, label);	\
+	unsafe_copy_loop(__ucu_dst, __ucu_src, __ucu_len, u32, unsafe_put_user, label);	\
+	unsafe_copy_loop(__ucu_dst, __ucu_src, __ucu_len, u16, unsafe_put_user, label);	\
+	unsafe_copy_loop(__ucu_dst, __ucu_src, __ucu_len, u8, unsafe_put_user, label);	\
+} while (0)
+
+#define unsafe_copy_from_user(_dst, _src, _len, label)			\
+do {									\
+	char *__ucu_dst = (_dst);					\
+	const char __user *__ucu_src = (_src);				\
+	size_t __ucu_len = (_len);					\
+	unsafe_copy_loop(__ucu_src, __ucu_dst, __ucu_len, u64, unsafe_get_user, label);	\
+	unsafe_copy_loop(__ucu_src, __ucu_dst, __ucu_len, u32, unsafe_get_user, label);	\
+	unsafe_copy_loop(__ucu_src, __ucu_dst, __ucu_len, u16, unsafe_get_user, label);	\
+	unsafe_copy_loop(__ucu_src, __ucu_dst, __ucu_len, u8, unsafe_get_user, label);	\
+} while (0)
+
 #else /* CONFIG_MMU */
 #include <asm-generic/uaccess.h>
 #endif /* CONFIG_MMU */
-- 
2.34.1



* [PATCH v6 3/5] riscv: uaccess: use input constraints for ptr of __put_user()
  2025-04-10  7:05 [PATCH v6 0/5] riscv: uaccess: optimisations Cyril Bur
  2025-04-10  7:05 ` [PATCH v6 1/5] riscv: save the SR_SUM status over switches Cyril Bur
  2025-04-10  7:05 ` [PATCH v6 2/5] riscv: implement user_access_begin() and families Cyril Bur
@ 2025-04-10  7:05 ` Cyril Bur
  2025-04-22 12:10   ` Alexandre Ghiti
  2025-04-10  7:05 ` [PATCH v6 4/5] riscv: uaccess: use 'asm goto' for put_user() Cyril Bur
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 32+ messages in thread
From: Cyril Bur @ 2025-04-10  7:05 UTC (permalink / raw)
  To: palmer, aou, paul.walmsley, charlie, jrtc27, ben.dooks, alex
  Cc: linux-riscv, linux-kernel, jszhang

From: Jisheng Zhang <jszhang@kernel.org>

Putting ptr in the inputs as opposed to the outputs may seem incorrect,
but this is done for a few reasons:
- Not having it in the output permits the use of asm goto in a
  subsequent patch. There are bugs in gcc [1] which would otherwise
  prevent it.
- Since the output memory is userspace there isn't any real benefit from
  telling the compiler about the memory clobber.
- x86, arm and powerpc all use this technique.

Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113921 # 1

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
[Cyril Bur: Rewritten commit message]
Signed-off-by: Cyril Bur <cyrilbur@tenstorrent.com>
---
 arch/riscv/include/asm/uaccess.h | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/riscv/include/asm/uaccess.h b/arch/riscv/include/asm/uaccess.h
index c9a461467bf4..da36057847f0 100644
--- a/arch/riscv/include/asm/uaccess.h
+++ b/arch/riscv/include/asm/uaccess.h
@@ -219,11 +219,11 @@ do {								\
 	__typeof__(*(ptr)) __x = x;				\
 	__asm__ __volatile__ (					\
 		"1:\n"						\
-		"	" insn " %z2, %1\n"			\
+		"	" insn " %z1, %2\n"			\
 		"2:\n"						\
 		_ASM_EXTABLE_UACCESS_ERR(1b, 2b, %0)		\
-		: "+r" (err), "=m" (*(ptr))			\
-		: "rJ" (__x));					\
+		: "+r" (err)					\
+		: "rJ" (__x), "m"(*(ptr)));			\
 } while (0)
 
 #ifdef CONFIG_64BIT
@@ -236,16 +236,16 @@ do {								\
 	u64 __x = (__typeof__((x)-(x)))(x);			\
 	__asm__ __volatile__ (					\
 		"1:\n"						\
-		"	sw %z3, %1\n"				\
+		"	sw %z1, %3\n"				\
 		"2:\n"						\
-		"	sw %z4, %2\n"				\
+		"	sw %z2, %4\n"				\
 		"3:\n"						\
 		_ASM_EXTABLE_UACCESS_ERR(1b, 3b, %0)		\
 		_ASM_EXTABLE_UACCESS_ERR(2b, 3b, %0)		\
-		: "+r" (err),					\
-			"=m" (__ptr[__LSW]),			\
-			"=m" (__ptr[__MSW])			\
-		: "rJ" (__x), "rJ" (__x >> 32));		\
+		: "+r" (err)					\
+		: "rJ" (__x), "rJ" (__x >> 32),			\
+			"m" (__ptr[__LSW]),			\
+			"m" (__ptr[__MSW]));			\
 } while (0)
 #endif /* CONFIG_64BIT */
 
-- 
2.34.1



* [PATCH v6 4/5] riscv: uaccess: use 'asm goto' for put_user()
  2025-04-10  7:05 [PATCH v6 0/5] riscv: uaccess: optimisations Cyril Bur
                   ` (2 preceding siblings ...)
  2025-04-10  7:05 ` [PATCH v6 3/5] riscv: uaccess: use input constraints for ptr of __put_user() Cyril Bur
@ 2025-04-10  7:05 ` Cyril Bur
  2025-04-22 10:36   ` Alexandre Ghiti
  2025-04-10  7:05 ` [PATCH v6 5/5] riscv: uaccess: use 'asm_goto_output' for get_user() Cyril Bur
  2025-05-09 17:30 ` [PATCH v6 0/5] riscv: uaccess: optimisations patchwork-bot+linux-riscv
  5 siblings, 1 reply; 32+ messages in thread
From: Cyril Bur @ 2025-04-10  7:05 UTC (permalink / raw)
  To: palmer, aou, paul.walmsley, charlie, jrtc27, ben.dooks, alex
  Cc: linux-riscv, linux-kernel, jszhang

From: Jisheng Zhang <jszhang@kernel.org>

With 'asm goto' we don't need to test an error value; on a fault the
exception handling jumps straight to the error label.

Because there are no output operands which could trigger gcc bugs [1],
the use of the asm_goto_output() macro is not necessary here. Not using
asm_goto_output() is desirable as the generated asm is cleaner.

Use of the volatile keyword is redundant as per gcc 14.2.0 manual section
6.48.2.7 Goto Labels:
> Also note that an asm goto statement is always implicitly considered
  volatile.
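
In rough terms the resulting construct is an 'asm goto' with inputs
only, whose exception fixup target is a C label (a sketch of the
pattern, not the exact macro below):

	asm goto("1:	sw	%z0, %1\n"
		 _ASM_EXTABLE(1b, %l2)
		 : /* no outputs */
		 : "rJ" (x), "m" (*ptr)
		 : /* no clobbers */
		 : efault);	/* a faulting store resumes at efault: */
	return 0;
efault:
	return -EFAULT;

On success execution simply falls through; no error variable is ever
tested.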

Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113921 # 1

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
[Cyril Bur: Rewritten commit message]
Signed-off-by: Cyril Bur <cyrilbur@tenstorrent.com>
---
 arch/riscv/include/asm/uaccess.h | 71 +++++++++++++++-----------------
 1 file changed, 33 insertions(+), 38 deletions(-)

diff --git a/arch/riscv/include/asm/uaccess.h b/arch/riscv/include/asm/uaccess.h
index da36057847f0..719c9179a751 100644
--- a/arch/riscv/include/asm/uaccess.h
+++ b/arch/riscv/include/asm/uaccess.h
@@ -214,61 +214,66 @@ do {								\
 		((x) = (__force __typeof__(x))0, -EFAULT);	\
 })
 
-#define __put_user_asm(insn, x, ptr, err)			\
+#define __put_user_asm(insn, x, ptr, label)			\
 do {								\
 	__typeof__(*(ptr)) __x = x;				\
-	__asm__ __volatile__ (					\
+	asm goto(						\
 		"1:\n"						\
-		"	" insn " %z1, %2\n"			\
-		"2:\n"						\
-		_ASM_EXTABLE_UACCESS_ERR(1b, 2b, %0)		\
-		: "+r" (err)					\
-		: "rJ" (__x), "m"(*(ptr)));			\
+		"	" insn " %z0, %1\n"			\
+		_ASM_EXTABLE(1b, %l2)				\
+		: : "rJ" (__x), "m"(*(ptr)) : : label);		\
 } while (0)
 
 #ifdef CONFIG_64BIT
-#define __put_user_8(x, ptr, err) \
-	__put_user_asm("sd", x, ptr, err)
+#define __put_user_8(x, ptr, label) \
+	__put_user_asm("sd", x, ptr, label)
 #else /* !CONFIG_64BIT */
-#define __put_user_8(x, ptr, err)				\
+#define __put_user_8(x, ptr, label)				\
 do {								\
 	u32 __user *__ptr = (u32 __user *)(ptr);		\
 	u64 __x = (__typeof__((x)-(x)))(x);			\
-	__asm__ __volatile__ (					\
+	asm goto(						\
 		"1:\n"						\
-		"	sw %z1, %3\n"				\
+		"	sw %z0, %2\n"				\
 		"2:\n"						\
-		"	sw %z2, %4\n"				\
-		"3:\n"						\
-		_ASM_EXTABLE_UACCESS_ERR(1b, 3b, %0)		\
-		_ASM_EXTABLE_UACCESS_ERR(2b, 3b, %0)		\
-		: "+r" (err)					\
-		: "rJ" (__x), "rJ" (__x >> 32),			\
+		"	sw %z1, %3\n"				\
+		_ASM_EXTABLE(1b, %l4)				\
+		_ASM_EXTABLE(2b, %l4)				\
+		: : "rJ" (__x), "rJ" (__x >> 32),		\
 			"m" (__ptr[__LSW]),			\
-			"m" (__ptr[__MSW]));			\
+			"m" (__ptr[__MSW]) : : label);		\
 } while (0)
 #endif /* CONFIG_64BIT */
 
-#define __put_user_nocheck(x, __gu_ptr, __pu_err)					\
+#define __put_user_nocheck(x, __gu_ptr, label)			\
 do {								\
 	switch (sizeof(*__gu_ptr)) {				\
 	case 1:							\
-		__put_user_asm("sb", (x), __gu_ptr, __pu_err);	\
+		__put_user_asm("sb", (x), __gu_ptr, label);	\
 		break;						\
 	case 2:							\
-		__put_user_asm("sh", (x), __gu_ptr, __pu_err);	\
+		__put_user_asm("sh", (x), __gu_ptr, label);	\
 		break;						\
 	case 4:							\
-		__put_user_asm("sw", (x), __gu_ptr, __pu_err);	\
+		__put_user_asm("sw", (x), __gu_ptr, label);	\
 		break;						\
 	case 8:							\
-		__put_user_8((x), __gu_ptr, __pu_err);	\
+		__put_user_8((x), __gu_ptr, label);		\
 		break;						\
 	default:						\
 		BUILD_BUG();					\
 	}							\
 } while (0)
 
+#define __put_user_error(x, ptr, err)				\
+do {								\
+	__label__ err_label;					\
+	__put_user_nocheck(x, ptr, err_label);			\
+	break;							\
+err_label:							\
+	(err) = -EFAULT;					\
+} while (0)
+
 /**
  * __put_user: - Write a simple value into user space, with less checking.
  * @x:   Value to copy to user space.
@@ -299,7 +304,7 @@ do {								\
 	__chk_user_ptr(__gu_ptr);				\
 								\
 	__enable_user_access();					\
-	__put_user_nocheck(__val, __gu_ptr, __pu_err);		\
+	__put_user_error(__val, __gu_ptr, __pu_err);		\
 	__disable_user_access();				\
 								\
 	__pu_err;						\
@@ -373,13 +378,7 @@ do {									\
 } while (0)
 
 #define __put_kernel_nofault(dst, src, type, err_label)			\
-do {									\
-	long __kr_err = 0;						\
-									\
-	__put_user_nocheck(*((type *)(src)), (type *)(dst), __kr_err);	\
-	if (unlikely(__kr_err))						\
-		goto err_label;						\
-} while (0)
+	__put_user_nocheck(*((type *)(src)), (type *)(dst), err_label)
 
 static __must_check __always_inline bool user_access_begin(const void __user *ptr, size_t len)
 {
@@ -398,12 +397,8 @@ static inline void user_access_restore(unsigned long enabled) { }
  * We want the unsafe accessors to always be inlined and use
  * the error labels - thus the macro games.
  */
-#define unsafe_put_user(x, ptr, label)	do {				\
-	long __err = 0;							\
-	__put_user_nocheck(x, (ptr), __err);				\
-	if (__err)							\
-		goto label;						\
-} while (0)
+#define unsafe_put_user(x, ptr, label)					\
+	__put_user_nocheck(x, (ptr), label)
 
 #define unsafe_get_user(x, ptr, label)	do {				\
 	long __err = 0;							\
-- 
2.34.1



* [PATCH v6 5/5] riscv: uaccess: use 'asm_goto_output' for get_user()
  2025-04-10  7:05 [PATCH v6 0/5] riscv: uaccess: optimisations Cyril Bur
                   ` (3 preceding siblings ...)
  2025-04-10  7:05 ` [PATCH v6 4/5] riscv: uaccess: use 'asm goto' for put_user() Cyril Bur
@ 2025-04-10  7:05 ` Cyril Bur
  2025-04-22 12:19   ` Alexandre Ghiti
  2025-05-09 17:30 ` [PATCH v6 0/5] riscv: uaccess: optimisations patchwork-bot+linux-riscv
  5 siblings, 1 reply; 32+ messages in thread
From: Cyril Bur @ 2025-04-10  7:05 UTC (permalink / raw)
  To: palmer, aou, paul.walmsley, charlie, jrtc27, ben.dooks, alex
  Cc: linux-riscv, linux-kernel, jszhang

From: Jisheng Zhang <jszhang@kernel.org>

With 'asm goto' we don't need to test an error value; on a fault the
exception handling jumps straight to the error label.

Unlike put_user(), get_user() has output operands, so it must work
around GCC bugs [1] affecting 'asm goto' statements with outputs.
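
When the compiler can handle output operands in 'asm goto'
(CONFIG_CC_HAS_ASM_GOTO_OUTPUT), the load takes roughly this shape,
where 'efault' is a local error label (a sketch of the pattern, not the
exact macro below):

	asm_goto_output("1:	lw	%0, %1\n"
			_ASM_EXTABLE_UACCESS_ERR(1b, %l2, %0)
			: "=&r" (val)
			: "m" (*ptr)
			: : efault);

Otherwise a fallback keeps an error variable and does an explicit goto,
as in the !CONFIG_CC_HAS_ASM_GOTO_OUTPUT branches below.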

Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113921 # 1

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
[Cyril Bur: Rewritten commit message]
Signed-off-by: Cyril Bur <cyrilbur@tenstorrent.com>
---
 arch/riscv/include/asm/uaccess.h | 95 +++++++++++++++++++++++---------
 1 file changed, 68 insertions(+), 27 deletions(-)

diff --git a/arch/riscv/include/asm/uaccess.h b/arch/riscv/include/asm/uaccess.h
index 719c9179a751..87d01168f80a 100644
--- a/arch/riscv/include/asm/uaccess.h
+++ b/arch/riscv/include/asm/uaccess.h
@@ -96,27 +96,58 @@ static inline unsigned long __untagged_addr_remote(struct mm_struct *mm, unsigne
  * call.
  */
 
-#define __get_user_asm(insn, x, ptr, err)			\
+#ifdef CONFIG_CC_HAS_ASM_GOTO_OUTPUT
+#define __get_user_asm(insn, x, ptr, label)			\
+	asm_goto_output(					\
+		"1:\n"						\
+		"	" insn " %0, %1\n"			\
+		_ASM_EXTABLE_UACCESS_ERR(1b, %l2, %0)		\
+		: "=&r" (x)					\
+		: "m" (*(ptr)) : : label)
+#else /* !CONFIG_CC_HAS_ASM_GOTO_OUTPUT */
+#define __get_user_asm(insn, x, ptr, label)			\
 do {								\
-	__typeof__(x) __x;					\
+	long __gua_err = 0;					\
 	__asm__ __volatile__ (					\
 		"1:\n"						\
 		"	" insn " %1, %2\n"			\
 		"2:\n"						\
 		_ASM_EXTABLE_UACCESS_ERR_ZERO(1b, 2b, %0, %1)	\
-		: "+r" (err), "=&r" (__x)			\
+		: "+r" (__gua_err), "=&r" (x)			\
 		: "m" (*(ptr)));				\
-	(x) = __x;						\
+	if (__gua_err)						\
+		goto label;					\
 } while (0)
+#endif /* CONFIG_CC_HAS_ASM_GOTO_OUTPUT */
 
 #ifdef CONFIG_64BIT
-#define __get_user_8(x, ptr, err) \
-	__get_user_asm("ld", x, ptr, err)
+#define __get_user_8(x, ptr, label) \
+	__get_user_asm("ld", x, ptr, label)
 #else /* !CONFIG_64BIT */
-#define __get_user_8(x, ptr, err)				\
+
+#ifdef CONFIG_CC_HAS_ASM_GOTO_OUTPUT
+#define __get_user_8(x, ptr, label)				\
+	u32 __user *__ptr = (u32 __user *)(ptr);		\
+	u32 __lo, __hi;						\
+	asm_goto_output(					\
+		"1:\n"						\
+		"	lw %0, %2\n"				\
+		"2:\n"						\
+		"	lw %1, %3\n"				\
+		_ASM_EXTABLE_UACCESS_ERR(1b, %l4, %0)		\
+		_ASM_EXTABLE_UACCESS_ERR(2b, %l4, %0)		\
+		: "=&r" (__lo), "=r" (__hi)			\
+		: "m" (__ptr[__LSW]), "m" (__ptr[__MSW])	\
+		: : label);                                     \
+	(x) = (__typeof__(x))((__typeof__((x) - (x)))(		\
+		(((u64)__hi << 32) | __lo)));			\
+
+#else /* !CONFIG_CC_HAS_ASM_GOTO_OUTPUT */
+#define __get_user_8(x, ptr, label)				\
 do {								\
 	u32 __user *__ptr = (u32 __user *)(ptr);		\
 	u32 __lo, __hi;						\
+	long __gu8_err = 0;					\
 	__asm__ __volatile__ (					\
 		"1:\n"						\
 		"	lw %1, %3\n"				\
@@ -125,35 +156,51 @@ do {								\
 		"3:\n"						\
 		_ASM_EXTABLE_UACCESS_ERR_ZERO(1b, 3b, %0, %1)	\
 		_ASM_EXTABLE_UACCESS_ERR_ZERO(2b, 3b, %0, %1)	\
-		: "+r" (err), "=&r" (__lo), "=r" (__hi)		\
+		: "+r" (__gu8_err), "=&r" (__lo), "=r" (__hi)	\
 		: "m" (__ptr[__LSW]), "m" (__ptr[__MSW]));	\
-	if (err)						\
+	if (__gu8_err) {					\
 		__hi = 0;					\
-	(x) = (__typeof__(x))((__typeof__((x)-(x)))(		\
+		goto label;					\
+	}							\
+	(x) = (__typeof__(x))((__typeof__((x) - (x)))(		\
 		(((u64)__hi << 32) | __lo)));			\
 } while (0)
+#endif /* CONFIG_CC_HAS_ASM_GOTO_OUTPUT */
+
 #endif /* CONFIG_64BIT */
 
-#define __get_user_nocheck(x, __gu_ptr, __gu_err)		\
+#define __get_user_nocheck(x, __gu_ptr, label)			\
 do {								\
 	switch (sizeof(*__gu_ptr)) {				\
 	case 1:							\
-		__get_user_asm("lb", (x), __gu_ptr, __gu_err);	\
+		__get_user_asm("lb", (x), __gu_ptr, label);	\
 		break;						\
 	case 2:							\
-		__get_user_asm("lh", (x), __gu_ptr, __gu_err);	\
+		__get_user_asm("lh", (x), __gu_ptr, label);	\
 		break;						\
 	case 4:							\
-		__get_user_asm("lw", (x), __gu_ptr, __gu_err);	\
+		__get_user_asm("lw", (x), __gu_ptr, label);	\
 		break;						\
 	case 8:							\
-		__get_user_8((x), __gu_ptr, __gu_err);	\
+		__get_user_8((x), __gu_ptr, label);		\
 		break;						\
 	default:						\
 		BUILD_BUG();					\
 	}							\
 } while (0)
 
+#define __get_user_error(x, ptr, err)					\
+do {									\
+	__label__ __gu_failed;						\
+									\
+	__get_user_nocheck(x, ptr, __gu_failed);			\
+		err = 0;						\
+		break;							\
+__gu_failed:								\
+		x = 0;							\
+		err = -EFAULT;						\
+} while (0)
+
 /**
  * __get_user: - Get a simple variable from user space, with less checking.
  * @x:   Variable to store result.
@@ -178,13 +225,16 @@ do {								\
 ({								\
 	const __typeof__(*(ptr)) __user *__gu_ptr = untagged_addr(ptr); \
 	long __gu_err = 0;					\
+	__typeof__(x) __gu_val;					\
 								\
 	__chk_user_ptr(__gu_ptr);				\
 								\
 	__enable_user_access();					\
-	__get_user_nocheck(x, __gu_ptr, __gu_err);		\
+	__get_user_error(__gu_val, __gu_ptr, __gu_err);		\
 	__disable_user_access();				\
 								\
+	(x) = __gu_val;						\
+								\
 	__gu_err;						\
 })
 
@@ -369,13 +419,7 @@ unsigned long __must_check clear_user(void __user *to, unsigned long n)
 }
 
 #define __get_kernel_nofault(dst, src, type, err_label)			\
-do {									\
-	long __kr_err = 0;						\
-									\
-	__get_user_nocheck(*((type *)(dst)), (type *)(src), __kr_err);	\
-	if (unlikely(__kr_err))						\
-		goto err_label;						\
-} while (0)
+	__get_user_nocheck(*((type *)(dst)), (type *)(src), err_label)
 
 #define __put_kernel_nofault(dst, src, type, err_label)			\
 	__put_user_nocheck(*((type *)(src)), (type *)(dst), err_label)
@@ -401,12 +445,9 @@ static inline void user_access_restore(unsigned long enabled) { }
 	__put_user_nocheck(x, (ptr), label)
 
 #define unsafe_get_user(x, ptr, label)	do {				\
-	long __err = 0;							\
 	__inttype(*(ptr)) __gu_val;					\
-	__get_user_nocheck(__gu_val, (ptr), __err);			\
+	__get_user_nocheck(__gu_val, (ptr), label);			\
 	(x) = (__force __typeof__(*(ptr)))__gu_val;			\
-	if (__err)							\
-		goto label;						\
 } while (0)
 
 #define unsafe_copy_loop(dst, src, len, type, op, label)		\
-- 
2.34.1



* Re: [PATCH v6 1/5] riscv: save the SR_SUM status over switches
  2025-04-10  7:05 ` [PATCH v6 1/5] riscv: save the SR_SUM status over switches Cyril Bur
@ 2025-04-22 10:22   ` Alexandre Ghiti
  2025-05-21  8:26     ` Ben Dooks
  2025-04-22 23:01   ` Deepak Gupta
  1 sibling, 1 reply; 32+ messages in thread
From: Alexandre Ghiti @ 2025-04-22 10:22 UTC (permalink / raw)
  To: Cyril Bur, palmer, aou, paul.walmsley, charlie, jrtc27, ben.dooks
  Cc: linux-riscv, linux-kernel, jszhang, syzbot+e74b94fe601ab9552d69

Hi Cyril,

On 10/04/2025 09:05, Cyril Bur wrote:
> From: Ben Dooks <ben.dooks@codethink.co.uk>
>
> When threads/tasks are switched we need to ensure the outgoing task's
> SR_SUM state is saved and the incoming task's previously saved SR_SUM
> state is restored.
>
> The issue was seen under heavy load especially with the syz-stress tool
> running, with crashes as follows in schedule_tail:
>
> Unable to handle kernel access to user memory without uaccess routines
> at virtual address 000000002749f0d0
> Oops [#1]
> Modules linked in:
> CPU: 1 PID: 4875 Comm: syz-executor.0 Not tainted
> 5.12.0-rc2-syzkaller-00467-g0d7588ab9ef9 #0
> Hardware name: riscv-virtio,qemu (DT)
> epc : schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
>   ra : task_pid_vnr include/linux/sched.h:1421 [inline]
>   ra : schedule_tail+0x70/0xb2 kernel/sched/core.c:4264
> epc : ffffffe00008c8b0 ra : ffffffe00008c8ae sp : ffffffe025d17ec0
>   gp : ffffffe005d25378 tp : ffffffe00f0d0000 t0 : 0000000000000000
>   t1 : 0000000000000001 t2 : 00000000000f4240 s0 : ffffffe025d17ee0
>   s1 : 000000002749f0d0 a0 : 000000000000002a a1 : 0000000000000003
>   a2 : 1ffffffc0cfac500 a3 : ffffffe0000c80cc a4 : 5ae9db91c19bbe00
>   a5 : 0000000000000000 a6 : 0000000000f00000 a7 : ffffffe000082eba
>   s2 : 0000000000040000 s3 : ffffffe00eef96c0 s4 : ffffffe022c77fe0
>   s5 : 0000000000004000 s6 : ffffffe067d74e00 s7 : ffffffe067d74850
>   s8 : ffffffe067d73e18 s9 : ffffffe067d74e00 s10: ffffffe00eef96e8
>   s11: 000000ae6cdf8368 t3 : 5ae9db91c19bbe00 t4 : ffffffc4043cafb2
>   t5 : ffffffc4043cafba t6 : 0000000000040000
> status: 0000000000000120 badaddr: 000000002749f0d0 cause:
> 000000000000000f
> Call Trace:
> [<ffffffe00008c8b0>] schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
> [<ffffffe000005570>] ret_from_exception+0x0/0x14
> Dumping ftrace buffer:
>     (ftrace buffer empty)
> ---[ end trace b5f8f9231dc87dda ]---
>
> The issue comes from the put_user() in schedule_tail
> (kernel/sched/core.c) doing the following:
>
> asmlinkage __visible void schedule_tail(struct task_struct *prev)
> {
> ...
>          if (current->set_child_tid)
>                  put_user(task_pid_vnr(current), current->set_child_tid);
> ...
> }
>
> the put_user() macro causes the code sequence to come out as follows:
>
> 1:	__enable_user_access()
> 2:	reg = task_pid_vnr(current);
> 3:	*current->set_child_tid = reg;
> 4:	__disable_user_access()
>
> The problem is that the argument may be a call to a function that can
> sleep, which could clear SR_SUM and cause the panic above. This was
> fixed by evaluating the argument of the put_user() macro outside the
> user-enabled section in commit 285a76bb2cf5 ("riscv: evaluate
> put_user() arg before enabling user access").
>
> In order for riscv to take advantage of unsafe_get/put_XXX() macros and
> to avoid the same issue we had with put_user() and sleeping functions we
> must ensure code flow can go through switch_to() from within a region of
> code with SR_SUM enabled and come back with SR_SUM still enabled. This
> patch addresses the problem allowing future work to enable full use of
> unsafe_get/put_XXX() macros without needing to take a CSR bit flip cost
> on every access. Make switch_to() save and restore SR_SUM.
>
> Reported-by: syzbot+e74b94fe601ab9552d69@syzkaller.appspotmail.com
> Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
> Signed-off-by: Cyril Bur <cyrilbur@tenstorrent.com>
> ---
>   arch/riscv/include/asm/processor.h | 1 +
>   arch/riscv/kernel/asm-offsets.c    | 5 +++++
>   arch/riscv/kernel/entry.S          | 8 ++++++++
>   3 files changed, 14 insertions(+)
>
> diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h
> index 5f56eb9d114a..58fd11c89fe9 100644
> --- a/arch/riscv/include/asm/processor.h
> +++ b/arch/riscv/include/asm/processor.h
> @@ -103,6 +103,7 @@ struct thread_struct {
>   	struct __riscv_d_ext_state fstate;
>   	unsigned long bad_cause;
>   	unsigned long envcfg;
> +	unsigned long status;
>   	u32 riscv_v_flags;
>   	u32 vstate_ctrl;
>   	struct __riscv_v_ext_state vstate;
> diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
> index 16490755304e..969c65b1fe41 100644
> --- a/arch/riscv/kernel/asm-offsets.c
> +++ b/arch/riscv/kernel/asm-offsets.c
> @@ -34,6 +34,7 @@ void asm_offsets(void)
>   	OFFSET(TASK_THREAD_S9, task_struct, thread.s[9]);
>   	OFFSET(TASK_THREAD_S10, task_struct, thread.s[10]);
>   	OFFSET(TASK_THREAD_S11, task_struct, thread.s[11]);
> +	OFFSET(TASK_THREAD_STATUS, task_struct, thread.status);
>   
>   	OFFSET(TASK_TI_CPU, task_struct, thread_info.cpu);
>   	OFFSET(TASK_TI_PREEMPT_COUNT, task_struct, thread_info.preempt_count);
> @@ -346,6 +347,10 @@ void asm_offsets(void)
>   		  offsetof(struct task_struct, thread.s[11])
>   		- offsetof(struct task_struct, thread.ra)
>   	);
> +	DEFINE(TASK_THREAD_STATUS_RA,
> +		  offsetof(struct task_struct, thread.status)
> +		- offsetof(struct task_struct, thread.ra)
> +	);
>   
>   	DEFINE(TASK_THREAD_F0_F0,
>   		  offsetof(struct task_struct, thread.fstate.f[0])
> diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
> index 33a5a9f2a0d4..00bd0de9faa2 100644
> --- a/arch/riscv/kernel/entry.S
> +++ b/arch/riscv/kernel/entry.S
> @@ -397,9 +397,17 @@ SYM_FUNC_START(__switch_to)
>   	REG_S s9,  TASK_THREAD_S9_RA(a3)
>   	REG_S s10, TASK_THREAD_S10_RA(a3)
>   	REG_S s11, TASK_THREAD_S11_RA(a3)
> +
> +	/* save the user space access flag */
> +	li    s0, SR_SUM


This is not needed anymore ^ but I'll remove it when merging your patchset.


> +	csrr  s1, CSR_STATUS
> +	REG_S s1, TASK_THREAD_STATUS_RA(a3)
> +
>   	/* Save the kernel shadow call stack pointer */
>   	scs_save_current
>   	/* Restore context from next->thread */
> +	REG_L s0,  TASK_THREAD_STATUS_RA(a4)
> +	csrs  CSR_STATUS, s0
>   	REG_L ra,  TASK_THREAD_RA_RA(a4)
>   	REG_L sp,  TASK_THREAD_SP_RA(a4)
>   	REG_L s0,  TASK_THREAD_S0_RA(a4)

Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>

Thanks for the multiple revisions!

Alex



* Re: [PATCH v6 2/5] riscv: implement user_access_begin() and families
  2025-04-10  7:05 ` [PATCH v6 2/5] riscv: implement user_access_begin() and families Cyril Bur
@ 2025-04-22 10:26   ` Alexandre Ghiti
  0 siblings, 0 replies; 32+ messages in thread
From: Alexandre Ghiti @ 2025-04-22 10:26 UTC (permalink / raw)
  To: Cyril Bur, palmer, aou, paul.walmsley, charlie, jrtc27, ben.dooks
  Cc: linux-riscv, linux-kernel, jszhang

On 10/04/2025 09:05, Cyril Bur wrote:
> From: Jisheng Zhang <jszhang@kernel.org>
>
> Currently, when a function like strncpy_from_user() is called,
> the userspace access protection is disabled and re-enabled
> around every word read.
>
> By implementing user_access_begin() and its family of helpers, the
> protection is disabled once at the beginning of the copy and re-enabled
> once at the end.
>
> The __inttype macro is borrowed from the x86 implementation.
>
> Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
> Signed-off-by: Cyril Bur <cyrilbur@tenstorrent.com>
> ---
>   arch/riscv/include/asm/uaccess.h | 76 ++++++++++++++++++++++++++++++++
>   1 file changed, 76 insertions(+)
>
> diff --git a/arch/riscv/include/asm/uaccess.h b/arch/riscv/include/asm/uaccess.h
> index fee56b0c8058..c9a461467bf4 100644
> --- a/arch/riscv/include/asm/uaccess.h
> +++ b/arch/riscv/include/asm/uaccess.h
> @@ -61,6 +61,19 @@ static inline unsigned long __untagged_addr_remote(struct mm_struct *mm, unsigne
>   #define __disable_user_access()							\
>   	__asm__ __volatile__ ("csrc sstatus, %0" : : "r" (SR_SUM) : "memory")
>   
> +/*
> + * This is the smallest unsigned integer type that can fit a value
> + * (up to 'long long')
> + */
> +#define __inttype(x) __typeof__(		\
> +	__typefits(x, char,			\
> +	  __typefits(x, short,			\
> +	    __typefits(x, int,			\
> +	      __typefits(x, long, 0ULL)))))
> +
> +#define __typefits(x, type, not) \
> +	__builtin_choose_expr(sizeof(x) <= sizeof(type), (unsigned type)0, not)
> +
>   /*
>    * The exception table consists of pairs of addresses: the first is the
>    * address of an instruction that is allowed to fault, and the second is
> @@ -368,6 +381,69 @@ do {									\
>   		goto err_label;						\
>   } while (0)
>   
> +static __must_check __always_inline bool user_access_begin(const void __user *ptr, size_t len)
> +{
> +	if (unlikely(!access_ok(ptr, len)))
> +		return 0;
> +	__enable_user_access();
> +	return 1;
> +}
> +#define user_access_begin user_access_begin
> +#define user_access_end __disable_user_access
> +
> +static inline unsigned long user_access_save(void) { return 0UL; }
> +static inline void user_access_restore(unsigned long enabled) { }
> +
> +/*
> + * We want the unsafe accessors to always be inlined and use
> + * the error labels - thus the macro games.
> + */
> +#define unsafe_put_user(x, ptr, label)	do {				\
> +	long __err = 0;							\
> +	__put_user_nocheck(x, (ptr), __err);				\
> +	if (__err)							\
> +		goto label;						\
> +} while (0)
> +
> +#define unsafe_get_user(x, ptr, label)	do {				\
> +	long __err = 0;							\
> +	__inttype(*(ptr)) __gu_val;					\
> +	__get_user_nocheck(__gu_val, (ptr), __err);			\
> +	(x) = (__force __typeof__(*(ptr)))__gu_val;			\
> +	if (__err)							\
> +		goto label;						\
> +} while (0)
> +
> +#define unsafe_copy_loop(dst, src, len, type, op, label)		\
> +	while (len >= sizeof(type)) {					\
> +		op(*(type *)(src), (type __user *)(dst), label);	\
> +		dst += sizeof(type);					\
> +		src += sizeof(type);					\
> +		len -= sizeof(type);					\
> +	}
> +
> +#define unsafe_copy_to_user(_dst, _src, _len, label)			\
> +do {									\
> +	char __user *__ucu_dst = (_dst);				\
> +	const char *__ucu_src = (_src);					\
> +	size_t __ucu_len = (_len);					\
> +	unsafe_copy_loop(__ucu_dst, __ucu_src, __ucu_len, u64, unsafe_put_user, label);	\
> +	unsafe_copy_loop(__ucu_dst, __ucu_src, __ucu_len, u32, unsafe_put_user, label);	\
> +	unsafe_copy_loop(__ucu_dst, __ucu_src, __ucu_len, u16, unsafe_put_user, label);	\
> +	unsafe_copy_loop(__ucu_dst, __ucu_src, __ucu_len, u8, unsafe_put_user, label);	\
> +} while (0)
> +
> +#define unsafe_copy_from_user(_dst, _src, _len, label)			\
> +do {									\
> +	char *__ucu_dst = (_dst);					\
> +	const char __user *__ucu_src = (_src);				\
> +	size_t __ucu_len = (_len);					\
> +	unsafe_copy_loop(__ucu_src, __ucu_dst, __ucu_len, u64, unsafe_get_user, label);	\
> +	unsafe_copy_loop(__ucu_src, __ucu_dst, __ucu_len, u32, unsafe_get_user, label);	\
> +	unsafe_copy_loop(__ucu_src, __ucu_dst, __ucu_len, u16, unsafe_get_user, label);	\
> +	unsafe_copy_loop(__ucu_src, __ucu_dst, __ucu_len, u8, unsafe_get_user, label);	\
> +} while (0)
> +
>   #else /* CONFIG_MMU */
>   #include <asm-generic/uaccess.h>
>   #endif /* CONFIG_MMU */


Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>

Thanks,

Alex



* Re: [PATCH v6 4/5] riscv: uaccess: use 'asm goto' for put_user()
  2025-04-10  7:05 ` [PATCH v6 4/5] riscv: uaccess: use 'asm goto' for put_user() Cyril Bur
@ 2025-04-22 10:36   ` Alexandre Ghiti
  0 siblings, 0 replies; 32+ messages in thread
From: Alexandre Ghiti @ 2025-04-22 10:36 UTC (permalink / raw)
  To: Cyril Bur, palmer, aou, paul.walmsley, charlie, jrtc27, ben.dooks
  Cc: linux-riscv, linux-kernel, jszhang


On 10/04/2025 09:05, Cyril Bur wrote:
> From: Jisheng Zhang <jszhang@kernel.org>
>
> With 'asm goto' we don't need to test an error value; on a fault the
> exception handling jumps straight to the error label.
>
> Because there are no output operands which could trigger gcc bugs [1],
> the use of the asm_goto_output() macro is not necessary here. Not using
> asm_goto_output() is desirable as the generated asm is cleaner.
>
> Use of the volatile keyword is redundant as per gcc 14.2.0 manual section
> 6.48.2.7 Goto Labels:
>> Also note that an asm goto statement is always implicitly considered
>    volatile.
>
> Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113921 # 1
>
> Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
> [Cyril Bur: Rewritten commit message]
> Signed-off-by: Cyril Bur <cyrilbur@tenstorrent.com>
> ---
>   arch/riscv/include/asm/uaccess.h | 71 +++++++++++++++-----------------
>   1 file changed, 33 insertions(+), 38 deletions(-)
>
> diff --git a/arch/riscv/include/asm/uaccess.h b/arch/riscv/include/asm/uaccess.h
> index da36057847f0..719c9179a751 100644
> --- a/arch/riscv/include/asm/uaccess.h
> +++ b/arch/riscv/include/asm/uaccess.h
> @@ -214,61 +214,66 @@ do {								\
>   		((x) = (__force __typeof__(x))0, -EFAULT);	\
>   })
>   
> -#define __put_user_asm(insn, x, ptr, err)			\
> +#define __put_user_asm(insn, x, ptr, label)			\
>   do {								\
>   	__typeof__(*(ptr)) __x = x;				\
> -	__asm__ __volatile__ (					\
> +	asm goto(						\
>   		"1:\n"						\
> -		"	" insn " %z1, %2\n"			\
> -		"2:\n"						\
> -		_ASM_EXTABLE_UACCESS_ERR(1b, 2b, %0)		\
> -		: "+r" (err)					\
> -		: "rJ" (__x), "m"(*(ptr)));			\
> +		"	" insn " %z0, %1\n"			\
> +		_ASM_EXTABLE(1b, %l2)				\
> +		: : "rJ" (__x), "m"(*(ptr)) : : label);		\
>   } while (0)
>   
>   #ifdef CONFIG_64BIT
> -#define __put_user_8(x, ptr, err) \
> -	__put_user_asm("sd", x, ptr, err)
> +#define __put_user_8(x, ptr, label) \
> +	__put_user_asm("sd", x, ptr, label)
>   #else /* !CONFIG_64BIT */
> -#define __put_user_8(x, ptr, err)				\
> +#define __put_user_8(x, ptr, label)				\
>   do {								\
>   	u32 __user *__ptr = (u32 __user *)(ptr);		\
>   	u64 __x = (__typeof__((x)-(x)))(x);			\
> -	__asm__ __volatile__ (					\
> +	asm goto(						\
>   		"1:\n"						\
> -		"	sw %z1, %3\n"				\
> +		"	sw %z0, %2\n"				\
>   		"2:\n"						\
> -		"	sw %z2, %4\n"				\
> -		"3:\n"						\
> -		_ASM_EXTABLE_UACCESS_ERR(1b, 3b, %0)		\
> -		_ASM_EXTABLE_UACCESS_ERR(2b, 3b, %0)		\
> -		: "+r" (err)					\
> -		: "rJ" (__x), "rJ" (__x >> 32),			\
> +		"	sw %z1, %3\n"				\
> +		_ASM_EXTABLE(1b, %l4)				\
> +		_ASM_EXTABLE(2b, %l4)				\
> +		: : "rJ" (__x), "rJ" (__x >> 32),		\
>   			"m" (__ptr[__LSW]),			\
> -			"m" (__ptr[__MSW]));			\
> +			"m" (__ptr[__MSW]) : : label);		\
>   } while (0)
>   #endif /* CONFIG_64BIT */
>   
> -#define __put_user_nocheck(x, __gu_ptr, __pu_err)					\
> +#define __put_user_nocheck(x, __gu_ptr, label)			\
>   do {								\
>   	switch (sizeof(*__gu_ptr)) {				\
>   	case 1:							\
> -		__put_user_asm("sb", (x), __gu_ptr, __pu_err);	\
> +		__put_user_asm("sb", (x), __gu_ptr, label);	\
>   		break;						\
>   	case 2:							\
> -		__put_user_asm("sh", (x), __gu_ptr, __pu_err);	\
> +		__put_user_asm("sh", (x), __gu_ptr, label);	\
>   		break;						\
>   	case 4:							\
> -		__put_user_asm("sw", (x), __gu_ptr, __pu_err);	\
> +		__put_user_asm("sw", (x), __gu_ptr, label);	\
>   		break;						\
>   	case 8:							\
> -		__put_user_8((x), __gu_ptr, __pu_err);	\
> +		__put_user_8((x), __gu_ptr, label);		\
>   		break;						\
>   	default:						\
>   		BUILD_BUG();					\
>   	}							\
>   } while (0)
>   
> +#define __put_user_error(x, ptr, err)				\
> +do {								\
> +	__label__ err_label;					\
> +	__put_user_nocheck(x, ptr, err_label);			\
> +	break;							\
> +err_label:							\
> +	(err) = -EFAULT;					\
> +} while (0)
> +
>   /**
>    * __put_user: - Write a simple value into user space, with less checking.
>    * @x:   Value to copy to user space.
> @@ -299,7 +304,7 @@ do {								\
>   	__chk_user_ptr(__gu_ptr);				\
>   								\
>   	__enable_user_access();					\
> -	__put_user_nocheck(__val, __gu_ptr, __pu_err);		\
> +	__put_user_error(__val, __gu_ptr, __pu_err);		\
>   	__disable_user_access();				\
>   								\
>   	__pu_err;						\
> @@ -373,13 +378,7 @@ do {									\
>   } while (0)
>   
>   #define __put_kernel_nofault(dst, src, type, err_label)			\
> -do {									\
> -	long __kr_err = 0;						\
> -									\
> -	__put_user_nocheck(*((type *)(src)), (type *)(dst), __kr_err);	\
> -	if (unlikely(__kr_err))						\
> -		goto err_label;						\
> -} while (0)
> +	__put_user_nocheck(*((type *)(src)), (type *)(dst), err_label)
>   
>   static __must_check __always_inline bool user_access_begin(const void __user *ptr, size_t len)
>   {
> @@ -398,12 +397,8 @@ static inline void user_access_restore(unsigned long enabled) { }
>    * We want the unsafe accessors to always be inlined and use
>    * the error labels - thus the macro games.
>    */
> -#define unsafe_put_user(x, ptr, label)	do {				\
> -	long __err = 0;							\
> -	__put_user_nocheck(x, (ptr), __err);				\
> -	if (__err)							\
> -		goto label;						\
> -} while (0)
> +#define unsafe_put_user(x, ptr, label)					\
> +	__put_user_nocheck(x, (ptr), label)
>   
>   #define unsafe_get_user(x, ptr, label)	do {				\
>   	long __err = 0;							\


Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>

Thanks,

Alex



* Re: [PATCH v6 3/5] riscv: uaccess: use input constraints for ptr of __put_user()
  2025-04-10  7:05 ` [PATCH v6 3/5] riscv: uaccess: use input constraints for ptr of __put_user() Cyril Bur
@ 2025-04-22 12:10   ` Alexandre Ghiti
  0 siblings, 0 replies; 32+ messages in thread
From: Alexandre Ghiti @ 2025-04-22 12:10 UTC (permalink / raw)
  To: Cyril Bur, palmer, aou, paul.walmsley, charlie, jrtc27, ben.dooks
  Cc: linux-riscv, linux-kernel, jszhang


On 10/04/2025 09:05, Cyril Bur wrote:
> From: Jisheng Zhang <jszhang@kernel.org>
>
> Putting ptr in the inputs as opposed to the outputs may seem incorrect,
> but this is done for a few reasons:
> - Not having it in the output permits the use of asm goto in a
>    subsequent patch. There are bugs in gcc [1] which would otherwise
>    prevent it.
> - Since the output memory is userspace there isn't any real benefit from
>    telling the compiler about the memory clobber.
> - x86, arm and powerpc all use this technique.
>
> Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113921 # 1
>
> Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
> [Cyril Bur: Rewritten commit message]
> Signed-off-by: Cyril Bur <cyrilbur@tenstorrent.com>
> ---
>   arch/riscv/include/asm/uaccess.h | 18 +++++++++---------
>   1 file changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/arch/riscv/include/asm/uaccess.h b/arch/riscv/include/asm/uaccess.h
> index c9a461467bf4..da36057847f0 100644
> --- a/arch/riscv/include/asm/uaccess.h
> +++ b/arch/riscv/include/asm/uaccess.h
> @@ -219,11 +219,11 @@ do {								\
>   	__typeof__(*(ptr)) __x = x;				\
>   	__asm__ __volatile__ (					\
>   		"1:\n"						\
> -		"	" insn " %z2, %1\n"			\
> +		"	" insn " %z1, %2\n"			\
>   		"2:\n"						\
>   		_ASM_EXTABLE_UACCESS_ERR(1b, 2b, %0)		\
> -		: "+r" (err), "=m" (*(ptr))			\
> -		: "rJ" (__x));					\
> +		: "+r" (err)					\
> +		: "rJ" (__x), "m"(*(ptr)));			\
>   } while (0)
>   
>   #ifdef CONFIG_64BIT
> @@ -236,16 +236,16 @@ do {								\
>   	u64 __x = (__typeof__((x)-(x)))(x);			\
>   	__asm__ __volatile__ (					\
>   		"1:\n"						\
> -		"	sw %z3, %1\n"				\
> +		"	sw %z1, %3\n"				\
>   		"2:\n"						\
> -		"	sw %z4, %2\n"				\
> +		"	sw %z2, %4\n"				\
>   		"3:\n"						\
>   		_ASM_EXTABLE_UACCESS_ERR(1b, 3b, %0)		\
>   		_ASM_EXTABLE_UACCESS_ERR(2b, 3b, %0)		\
> -		: "+r" (err),					\
> -			"=m" (__ptr[__LSW]),			\
> -			"=m" (__ptr[__MSW])			\
> -		: "rJ" (__x), "rJ" (__x >> 32));		\
> +		: "+r" (err)					\
> +		: "rJ" (__x), "rJ" (__x >> 32),			\
> +			"m" (__ptr[__LSW]),			\
> +			"m" (__ptr[__MSW]));			\
>   } while (0)
>   #endif /* CONFIG_64BIT */
>   


Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>

Thanks,

Alex



* Re: [PATCH v6 5/5] riscv: uaccess: use 'asm_goto_output' for get_user()
  2025-04-10  7:05 ` [PATCH v6 5/5] riscv: uaccess: use 'asm_goto_output' for get_user() Cyril Bur
@ 2025-04-22 12:19   ` Alexandre Ghiti
  0 siblings, 0 replies; 32+ messages in thread
From: Alexandre Ghiti @ 2025-04-22 12:19 UTC (permalink / raw)
  To: Cyril Bur, palmer, aou, paul.walmsley, charlie, jrtc27, ben.dooks
  Cc: linux-riscv, linux-kernel, jszhang

On 10/04/2025 09:05, Cyril Bur wrote:
> From: Jisheng Zhang <jszhang@kernel.org>
>
> With 'asm goto' we don't need to test an error value; on a fault the
> exception handling jumps straight to the error label.
>
> Unlike put_user(), get_user() has output operands, so it must work
> around GCC bugs [1] affecting 'asm goto' statements with outputs.
>
> Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113921 # 1
>
> Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
> [Cyril Bur: Rewritten commit message]
> Signed-off-by: Cyril Bur <cyrilbur@tenstorrent.com>
> ---
>   arch/riscv/include/asm/uaccess.h | 95 +++++++++++++++++++++++---------
>   1 file changed, 68 insertions(+), 27 deletions(-)
>
> diff --git a/arch/riscv/include/asm/uaccess.h b/arch/riscv/include/asm/uaccess.h
> index 719c9179a751..87d01168f80a 100644
> --- a/arch/riscv/include/asm/uaccess.h
> +++ b/arch/riscv/include/asm/uaccess.h
> @@ -96,27 +96,58 @@ static inline unsigned long __untagged_addr_remote(struct mm_struct *mm, unsigne
>    * call.
>    */
>   
> -#define __get_user_asm(insn, x, ptr, err)			\
> +#ifdef CONFIG_CC_HAS_ASM_GOTO_OUTPUT
> +#define __get_user_asm(insn, x, ptr, label)			\
> +	asm_goto_output(					\
> +		"1:\n"						\
> +		"	" insn " %0, %1\n"			\
> +		_ASM_EXTABLE_UACCESS_ERR(1b, %l2, %0)		\
> +		: "=&r" (x)					\
> +		: "m" (*(ptr)) : : label)
> +#else /* !CONFIG_CC_HAS_ASM_GOTO_OUTPUT */
> +#define __get_user_asm(insn, x, ptr, label)			\
>   do {								\
> -	__typeof__(x) __x;					\
> +	long __gua_err = 0;					\
>   	__asm__ __volatile__ (					\
>   		"1:\n"						\
>   		"	" insn " %1, %2\n"			\
>   		"2:\n"						\
>   		_ASM_EXTABLE_UACCESS_ERR_ZERO(1b, 2b, %0, %1)	\
> -		: "+r" (err), "=&r" (__x)			\
> +		: "+r" (__gua_err), "=&r" (x)			\
>   		: "m" (*(ptr)));				\
> -	(x) = __x;						\
> +	if (__gua_err)						\
> +		goto label;					\
>   } while (0)
> +#endif /* CONFIG_CC_HAS_ASM_GOTO_OUTPUT */
>   
>   #ifdef CONFIG_64BIT
> -#define __get_user_8(x, ptr, err) \
> -	__get_user_asm("ld", x, ptr, err)
> +#define __get_user_8(x, ptr, label) \
> +	__get_user_asm("ld", x, ptr, label)
>   #else /* !CONFIG_64BIT */
> -#define __get_user_8(x, ptr, err)				\
> +
> +#ifdef CONFIG_CC_HAS_ASM_GOTO_OUTPUT
> +#define __get_user_8(x, ptr, label)				\
> +	u32 __user *__ptr = (u32 __user *)(ptr);		\
> +	u32 __lo, __hi;						\
> +	asm_goto_output(					\
> +		"1:\n"						\
> +		"	lw %0, %2\n"				\
> +		"2:\n"						\
> +		"	lw %1, %3\n"				\
> +		_ASM_EXTABLE_UACCESS_ERR(1b, %l4, %0)		\
> +		_ASM_EXTABLE_UACCESS_ERR(2b, %l4, %0)		\
> +		: "=&r" (__lo), "=r" (__hi)			\
> +		: "m" (__ptr[__LSW]), "m" (__ptr[__MSW])	\
> +		: : label);                                     \
> +	(x) = (__typeof__(x))((__typeof__((x) - (x)))(		\
> +		(((u64)__hi << 32) | __lo)));			\
> +
> +#else /* !CONFIG_CC_HAS_ASM_GOTO_OUTPUT */
> +#define __get_user_8(x, ptr, label)				\
>   do {								\
>   	u32 __user *__ptr = (u32 __user *)(ptr);		\
>   	u32 __lo, __hi;						\
> +	long __gu8_err = 0;					\
>   	__asm__ __volatile__ (					\
>   		"1:\n"						\
>   		"	lw %1, %3\n"				\
> @@ -125,35 +156,51 @@ do {								\
>   		"3:\n"						\
>   		_ASM_EXTABLE_UACCESS_ERR_ZERO(1b, 3b, %0, %1)	\
>   		_ASM_EXTABLE_UACCESS_ERR_ZERO(2b, 3b, %0, %1)	\
> -		: "+r" (err), "=&r" (__lo), "=r" (__hi)		\
> +		: "+r" (__gu8_err), "=&r" (__lo), "=r" (__hi)	\
>   		: "m" (__ptr[__LSW]), "m" (__ptr[__MSW]));	\
> -	if (err)						\
> +	if (__gu8_err) {					\
>   		__hi = 0;					\
> -	(x) = (__typeof__(x))((__typeof__((x)-(x)))(		\
> +		goto label;					\
> +	}							\
> +	(x) = (__typeof__(x))((__typeof__((x) - (x)))(		\
>   		(((u64)__hi << 32) | __lo)));			\
>   } while (0)
> +#endif /* CONFIG_CC_HAS_ASM_GOTO_OUTPUT */
> +
>   #endif /* CONFIG_64BIT */
>   
> -#define __get_user_nocheck(x, __gu_ptr, __gu_err)		\
> +#define __get_user_nocheck(x, __gu_ptr, label)			\
>   do {								\
>   	switch (sizeof(*__gu_ptr)) {				\
>   	case 1:							\
> -		__get_user_asm("lb", (x), __gu_ptr, __gu_err);	\
> +		__get_user_asm("lb", (x), __gu_ptr, label);	\
>   		break;						\
>   	case 2:							\
> -		__get_user_asm("lh", (x), __gu_ptr, __gu_err);	\
> +		__get_user_asm("lh", (x), __gu_ptr, label);	\
>   		break;						\
>   	case 4:							\
> -		__get_user_asm("lw", (x), __gu_ptr, __gu_err);	\
> +		__get_user_asm("lw", (x), __gu_ptr, label);	\
>   		break;						\
>   	case 8:							\
> -		__get_user_8((x), __gu_ptr, __gu_err);	\
> +		__get_user_8((x), __gu_ptr, label);		\
>   		break;						\
>   	default:						\
>   		BUILD_BUG();					\
>   	}							\
>   } while (0)
>   
> +#define __get_user_error(x, ptr, err)					\
> +do {									\
> +	__label__ __gu_failed;						\
> +									\
> +	__get_user_nocheck(x, ptr, __gu_failed);			\
> +		err = 0;						\
> +		break;							\
> +__gu_failed:								\
> +		x = 0;							\
> +		err = -EFAULT;						\
> +} while (0)
> +
>   /**
>    * __get_user: - Get a simple variable from user space, with less checking.
>    * @x:   Variable to store result.
> @@ -178,13 +225,16 @@ do {								\
>   ({								\
>   	const __typeof__(*(ptr)) __user *__gu_ptr = untagged_addr(ptr); \
>   	long __gu_err = 0;					\
> +	__typeof__(x) __gu_val;					\
>   								\
>   	__chk_user_ptr(__gu_ptr);				\
>   								\
>   	__enable_user_access();					\
> -	__get_user_nocheck(x, __gu_ptr, __gu_err);		\
> +	__get_user_error(__gu_val, __gu_ptr, __gu_err);		\
>   	__disable_user_access();				\
>   								\
> +	(x) = __gu_val;						\
> +								\
>   	__gu_err;						\
>   })
>   
> @@ -369,13 +419,7 @@ unsigned long __must_check clear_user(void __user *to, unsigned long n)
>   }
>   
>   #define __get_kernel_nofault(dst, src, type, err_label)			\
> -do {									\
> -	long __kr_err = 0;						\
> -									\
> -	__get_user_nocheck(*((type *)(dst)), (type *)(src), __kr_err);	\
> -	if (unlikely(__kr_err))						\
> -		goto err_label;						\
> -} while (0)
> +	__get_user_nocheck(*((type *)(dst)), (type *)(src), err_label)
>   
>   #define __put_kernel_nofault(dst, src, type, err_label)			\
>   	__put_user_nocheck(*((type *)(src)), (type *)(dst), err_label)
> @@ -401,12 +445,9 @@ static inline void user_access_restore(unsigned long enabled) { }
>   	__put_user_nocheck(x, (ptr), label)
>   
>   #define unsafe_get_user(x, ptr, label)	do {				\
> -	long __err = 0;							\
>   	__inttype(*(ptr)) __gu_val;					\
> -	__get_user_nocheck(__gu_val, (ptr), __err);			\
> +	__get_user_nocheck(__gu_val, (ptr), label);			\
>   	(x) = (__force __typeof__(*(ptr)))__gu_val;			\
> -	if (__err)							\
> -		goto label;						\
>   } while (0)
>   
>   #define unsafe_copy_loop(dst, src, len, type, op, label)		\


Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>

Thanks,

Alex



* Re: [PATCH v6 1/5] riscv: save the SR_SUM status over switches
  2025-04-10  7:05 ` [PATCH v6 1/5] riscv: save the SR_SUM status over switches Cyril Bur
  2025-04-22 10:22   ` Alexandre Ghiti
@ 2025-04-22 23:01   ` Deepak Gupta
  2025-04-23  6:44     ` Alexandre Ghiti
  2025-05-20 16:49     ` Deepak Gupta
  1 sibling, 2 replies; 32+ messages in thread
From: Deepak Gupta @ 2025-04-22 23:01 UTC (permalink / raw)
  To: Cyril Bur
  Cc: palmer, aou, paul.walmsley, charlie, jrtc27, ben.dooks, alex,
	linux-riscv, linux-kernel, jszhang, syzbot+e74b94fe601ab9552d69

On Thu, Apr 10, 2025 at 07:05:22AM +0000, Cyril Bur wrote:
>From: Ben Dooks <ben.dooks@codethink.co.uk>
>
>When threads/tasks are switched we need to ensure the outgoing task's
>SR_SUM state is saved and the incoming task's previously saved SR_SUM
>state is restored.
>
>The issue was seen under heavy load especially with the syz-stress tool
>running, with crashes as follows in schedule_tail:
>
>Unable to handle kernel access to user memory without uaccess routines
>at virtual address 000000002749f0d0
>Oops [#1]
>Modules linked in:
>CPU: 1 PID: 4875 Comm: syz-executor.0 Not tainted
>5.12.0-rc2-syzkaller-00467-g0d7588ab9ef9 #0
>Hardware name: riscv-virtio,qemu (DT)
>epc : schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
> ra : task_pid_vnr include/linux/sched.h:1421 [inline]
> ra : schedule_tail+0x70/0xb2 kernel/sched/core.c:4264
>epc : ffffffe00008c8b0 ra : ffffffe00008c8ae sp : ffffffe025d17ec0
> gp : ffffffe005d25378 tp : ffffffe00f0d0000 t0 : 0000000000000000
> t1 : 0000000000000001 t2 : 00000000000f4240 s0 : ffffffe025d17ee0
> s1 : 000000002749f0d0 a0 : 000000000000002a a1 : 0000000000000003
> a2 : 1ffffffc0cfac500 a3 : ffffffe0000c80cc a4 : 5ae9db91c19bbe00
> a5 : 0000000000000000 a6 : 0000000000f00000 a7 : ffffffe000082eba
> s2 : 0000000000040000 s3 : ffffffe00eef96c0 s4 : ffffffe022c77fe0
> s5 : 0000000000004000 s6 : ffffffe067d74e00 s7 : ffffffe067d74850
> s8 : ffffffe067d73e18 s9 : ffffffe067d74e00 s10: ffffffe00eef96e8
> s11: 000000ae6cdf8368 t3 : 5ae9db91c19bbe00 t4 : ffffffc4043cafb2
> t5 : ffffffc4043cafba t6 : 0000000000040000
>status: 0000000000000120 badaddr: 000000002749f0d0 cause:
>000000000000000f
>Call Trace:
>[<ffffffe00008c8b0>] schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
>[<ffffffe000005570>] ret_from_exception+0x0/0x14
>Dumping ftrace buffer:
>   (ftrace buffer empty)
>---[ end trace b5f8f9231dc87dda ]---
>
>The issue comes from the put_user() in schedule_tail
>(kernel/sched/core.c) doing the following:
>
>asmlinkage __visible void schedule_tail(struct task_struct *prev)
>{
>...
>        if (current->set_child_tid)
>                put_user(task_pid_vnr(current), current->set_child_tid);
>...
>}
>
>the put_user() macro causes the code sequence to come out as follows:
>
>1:	__enable_user_access()
>2:	reg = task_pid_vnr(current);
>3:	*current->set_child_tid = reg;
>4:	__disable_user_access()
>
>The problem is that we may have a sleeping function as argument which
>could clear SR_SUM causing the panic above. This was fixed by
>evaluating the argument of the put_user() macro outside the user-enabled
>section in commit 285a76bb2cf5 ("riscv: evaluate put_user() arg before
>enabling user access")"
>
>In order for riscv to take advantage of unsafe_get/put_XXX() macros and
>to avoid the same issue we had with put_user() and sleeping functions we
>must ensure code flow can go through switch_to() from within a region of
>code with SR_SUM enabled and come back with SR_SUM still enabled. This
>patch addresses the problem allowing future work to enable full use of
>unsafe_get/put_XXX() macros without needing to take a CSR bit flip cost
>on every access. Make switch_to() save and restore SR_SUM.
>
>Reported-by: syzbot+e74b94fe601ab9552d69@syzkaller.appspotmail.com
>Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
>Signed-off-by: Cyril Bur <cyrilbur@tenstorrent.com>
>---
> arch/riscv/include/asm/processor.h | 1 +
> arch/riscv/kernel/asm-offsets.c    | 5 +++++
> arch/riscv/kernel/entry.S          | 8 ++++++++
> 3 files changed, 14 insertions(+)
>
>diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h
>index 5f56eb9d114a..58fd11c89fe9 100644
>--- a/arch/riscv/include/asm/processor.h
>+++ b/arch/riscv/include/asm/processor.h
>@@ -103,6 +103,7 @@ struct thread_struct {
> 	struct __riscv_d_ext_state fstate;
> 	unsigned long bad_cause;
> 	unsigned long envcfg;
>+	unsigned long status;
> 	u32 riscv_v_flags;
> 	u32 vstate_ctrl;
> 	struct __riscv_v_ext_state vstate;
>diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
>index 16490755304e..969c65b1fe41 100644
>--- a/arch/riscv/kernel/asm-offsets.c
>+++ b/arch/riscv/kernel/asm-offsets.c
>@@ -34,6 +34,7 @@ void asm_offsets(void)
> 	OFFSET(TASK_THREAD_S9, task_struct, thread.s[9]);
> 	OFFSET(TASK_THREAD_S10, task_struct, thread.s[10]);
> 	OFFSET(TASK_THREAD_S11, task_struct, thread.s[11]);
>+	OFFSET(TASK_THREAD_STATUS, task_struct, thread.status);
>
> 	OFFSET(TASK_TI_CPU, task_struct, thread_info.cpu);
> 	OFFSET(TASK_TI_PREEMPT_COUNT, task_struct, thread_info.preempt_count);
>@@ -346,6 +347,10 @@ void asm_offsets(void)
> 		  offsetof(struct task_struct, thread.s[11])
> 		- offsetof(struct task_struct, thread.ra)
> 	);
>+	DEFINE(TASK_THREAD_STATUS_RA,
>+		  offsetof(struct task_struct, thread.status)
>+		- offsetof(struct task_struct, thread.ra)
>+	);
>
> 	DEFINE(TASK_THREAD_F0_F0,
> 		  offsetof(struct task_struct, thread.fstate.f[0])
>diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
>index 33a5a9f2a0d4..00bd0de9faa2 100644
>--- a/arch/riscv/kernel/entry.S
>+++ b/arch/riscv/kernel/entry.S
>@@ -397,9 +397,17 @@ SYM_FUNC_START(__switch_to)
> 	REG_S s9,  TASK_THREAD_S9_RA(a3)
> 	REG_S s10, TASK_THREAD_S10_RA(a3)
> 	REG_S s11, TASK_THREAD_S11_RA(a3)
>+
>+	/* save the user space access flag */
>+	li    s0, SR_SUM
>+	csrr  s1, CSR_STATUS
>+	REG_S s1, TASK_THREAD_STATUS_RA(a3)
>+
> 	/* Save the kernel shadow call stack pointer */
> 	scs_save_current
> 	/* Restore context from next->thread */
>+	REG_L s0,  TASK_THREAD_STATUS_RA(a4)
>+	csrs  CSR_STATUS, s0
> 	REG_L ra,  TASK_THREAD_RA_RA(a4)
> 	REG_L sp,  TASK_THREAD_SP_RA(a4)
> 	REG_L s0,  TASK_THREAD_S0_RA(a4)

Reviewed-by: Deepak Gupta <debug@rivosinc.com>

Note to alex ghiti,

If this goes in before cfi changes, I might have to re-work some of the
changes with respect to zicfilp handling. zicfilp introduces `elp` state
in `sstatus`.

>-- 
>2.34.1
>
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 1/5] riscv: save the SR_SUM status over switches
  2025-04-22 23:01   ` Deepak Gupta
@ 2025-04-23  6:44     ` Alexandre Ghiti
  2025-05-20 16:49     ` Deepak Gupta
  1 sibling, 0 replies; 32+ messages in thread
From: Alexandre Ghiti @ 2025-04-23  6:44 UTC (permalink / raw)
  To: Deepak Gupta, Cyril Bur
  Cc: palmer, aou, paul.walmsley, charlie, jrtc27, ben.dooks,
	linux-riscv, linux-kernel, jszhang, syzbot+e74b94fe601ab9552d69

Hi Deepak,

On 23/04/2025 01:01, Deepak Gupta wrote:
> On Thu, Apr 10, 2025 at 07:05:22AM +0000, Cyril Bur wrote:
>> From: Ben Dooks <ben.dooks@codethink.co.uk>
>>
>> When threads/tasks are switched we need to ensure the old execution's
>> SR_SUM state is saved and the new thread has the old SR_SUM state
>> restored.
>>
>> The issue was seen under heavy load especially with the syz-stress tool
>> running, with crashes as follows in schedule_tail:
>>
>> Unable to handle kernel access to user memory without uaccess routines
>> at virtual address 000000002749f0d0
>> Oops [#1]
>> Modules linked in:
>> CPU: 1 PID: 4875 Comm: syz-executor.0 Not tainted
>> 5.12.0-rc2-syzkaller-00467-g0d7588ab9ef9 #0
>> Hardware name: riscv-virtio,qemu (DT)
>> epc : schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
>> ra : task_pid_vnr include/linux/sched.h:1421 [inline]
>> ra : schedule_tail+0x70/0xb2 kernel/sched/core.c:4264
>> epc : ffffffe00008c8b0 ra : ffffffe00008c8ae sp : ffffffe025d17ec0
>> gp : ffffffe005d25378 tp : ffffffe00f0d0000 t0 : 0000000000000000
>> t1 : 0000000000000001 t2 : 00000000000f4240 s0 : ffffffe025d17ee0
>> s1 : 000000002749f0d0 a0 : 000000000000002a a1 : 0000000000000003
>> a2 : 1ffffffc0cfac500 a3 : ffffffe0000c80cc a4 : 5ae9db91c19bbe00
>> a5 : 0000000000000000 a6 : 0000000000f00000 a7 : ffffffe000082eba
>> s2 : 0000000000040000 s3 : ffffffe00eef96c0 s4 : ffffffe022c77fe0
>> s5 : 0000000000004000 s6 : ffffffe067d74e00 s7 : ffffffe067d74850
>> s8 : ffffffe067d73e18 s9 : ffffffe067d74e00 s10: ffffffe00eef96e8
>> s11: 000000ae6cdf8368 t3 : 5ae9db91c19bbe00 t4 : ffffffc4043cafb2
>> t5 : ffffffc4043cafba t6 : 0000000000040000
>> status: 0000000000000120 badaddr: 000000002749f0d0 cause:
>> 000000000000000f
>> Call Trace:
>> [<ffffffe00008c8b0>] schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
>> [<ffffffe000005570>] ret_from_exception+0x0/0x14
>> Dumping ftrace buffer:
>>   (ftrace buffer empty)
>> ---[ end trace b5f8f9231dc87dda ]---
>>
>> The issue comes from the put_user() in schedule_tail
>> (kernel/sched/core.c) doing the following:
>>
>> asmlinkage __visible void schedule_tail(struct task_struct *prev)
>> {
>> ...
>>        if (current->set_child_tid)
>>                put_user(task_pid_vnr(current), current->set_child_tid);
>> ...
>> }
>>
>> the put_user() macro causes the code sequence to come out as follows:
>>
>> 1:    __enable_user_access()
>> 2:    reg = task_pid_vnr(current);
>> 3:    *current->set_child_tid = reg;
>> 4:    __disable_user_access()
>>
>> The problem is that we may have a sleeping function as argument which
>> could clear SR_SUM causing the panic above. This was fixed by
>> evaluating the argument of the put_user() macro outside the user-enabled
>> section in commit 285a76bb2cf5 ("riscv: evaluate put_user() arg before
>> enabling user access")"
>>
>> In order for riscv to take advantage of unsafe_get/put_XXX() macros and
>> to avoid the same issue we had with put_user() and sleeping functions we
>> must ensure code flow can go through switch_to() from within a region of
>> code with SR_SUM enabled and come back with SR_SUM still enabled. This
>> patch addresses the problem allowing future work to enable full use of
>> unsafe_get/put_XXX() macros without needing to take a CSR bit flip cost
>> on every access. Make switch_to() save and restore SR_SUM.
>>
>> Reported-by: syzbot+e74b94fe601ab9552d69@syzkaller.appspotmail.com
>> Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
>> Signed-off-by: Cyril Bur <cyrilbur@tenstorrent.com>
>> ---
>> arch/riscv/include/asm/processor.h | 1 +
>> arch/riscv/kernel/asm-offsets.c    | 5 +++++
>> arch/riscv/kernel/entry.S          | 8 ++++++++
>> 3 files changed, 14 insertions(+)
>>
>> diff --git a/arch/riscv/include/asm/processor.h 
>> b/arch/riscv/include/asm/processor.h
>> index 5f56eb9d114a..58fd11c89fe9 100644
>> --- a/arch/riscv/include/asm/processor.h
>> +++ b/arch/riscv/include/asm/processor.h
>> @@ -103,6 +103,7 @@ struct thread_struct {
>>     struct __riscv_d_ext_state fstate;
>>     unsigned long bad_cause;
>>     unsigned long envcfg;
>> +    unsigned long status;
>>     u32 riscv_v_flags;
>>     u32 vstate_ctrl;
>>     struct __riscv_v_ext_state vstate;
>> diff --git a/arch/riscv/kernel/asm-offsets.c 
>> b/arch/riscv/kernel/asm-offsets.c
>> index 16490755304e..969c65b1fe41 100644
>> --- a/arch/riscv/kernel/asm-offsets.c
>> +++ b/arch/riscv/kernel/asm-offsets.c
>> @@ -34,6 +34,7 @@ void asm_offsets(void)
>>     OFFSET(TASK_THREAD_S9, task_struct, thread.s[9]);
>>     OFFSET(TASK_THREAD_S10, task_struct, thread.s[10]);
>>     OFFSET(TASK_THREAD_S11, task_struct, thread.s[11]);
>> +    OFFSET(TASK_THREAD_STATUS, task_struct, thread.status);
>>
>>     OFFSET(TASK_TI_CPU, task_struct, thread_info.cpu);
>>     OFFSET(TASK_TI_PREEMPT_COUNT, task_struct, 
>> thread_info.preempt_count);
>> @@ -346,6 +347,10 @@ void asm_offsets(void)
>>           offsetof(struct task_struct, thread.s[11])
>>         - offsetof(struct task_struct, thread.ra)
>>     );
>> +    DEFINE(TASK_THREAD_STATUS_RA,
>> +          offsetof(struct task_struct, thread.status)
>> +        - offsetof(struct task_struct, thread.ra)
>> +    );
>>
>>     DEFINE(TASK_THREAD_F0_F0,
>>           offsetof(struct task_struct, thread.fstate.f[0])
>> diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
>> index 33a5a9f2a0d4..00bd0de9faa2 100644
>> --- a/arch/riscv/kernel/entry.S
>> +++ b/arch/riscv/kernel/entry.S
>> @@ -397,9 +397,17 @@ SYM_FUNC_START(__switch_to)
>>     REG_S s9,  TASK_THREAD_S9_RA(a3)
>>     REG_S s10, TASK_THREAD_S10_RA(a3)
>>     REG_S s11, TASK_THREAD_S11_RA(a3)
>> +
>> +    /* save the user space access flag */
>> +    li    s0, SR_SUM
>> +    csrr  s1, CSR_STATUS
>> +    REG_S s1, TASK_THREAD_STATUS_RA(a3)
>> +
>>     /* Save the kernel shadow call stack pointer */
>>     scs_save_current
>>     /* Restore context from next->thread */
>> +    REG_L s0,  TASK_THREAD_STATUS_RA(a4)
>> +    csrs  CSR_STATUS, s0
>>     REG_L ra,  TASK_THREAD_RA_RA(a4)
>>     REG_L sp,  TASK_THREAD_SP_RA(a4)
>>     REG_L s0,  TASK_THREAD_S0_RA(a4)
>
> Reviewed-by: Deepak Gupta <debug@rivosinc.com>
>
> Note to alex ghiti,
>
> If this goes in before cfi changes, I might have to re-work some of the
> changes with respect to zicfilp handling. zicfilp introduces `elp` state
> in `sstatus`.


This patchset is in my for-next branch; CFI depends on SBI v3.0, so we 
can't know for sure it will get merged in 6.16.

So I advise you to rebase on top of this patchset :)

Thanks,

Alex


>
>> -- 
>> 2.34.1
>>
>>
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 0/5] riscv: uaccess: optimisations
  2025-04-10  7:05 [PATCH v6 0/5] riscv: uaccess: optimisations Cyril Bur
                   ` (4 preceding siblings ...)
  2025-04-10  7:05 ` [PATCH v6 5/5] riscv: uaccess: use 'asm_goto_output' for get_user() Cyril Bur
@ 2025-05-09 17:30 ` patchwork-bot+linux-riscv
  5 siblings, 0 replies; 32+ messages in thread
From: patchwork-bot+linux-riscv @ 2025-05-09 17:30 UTC (permalink / raw)
  To: Cyril Bur
  Cc: linux-riscv, palmer, aou, paul.walmsley, charlie, jrtc27,
	ben.dooks, alex, linux-kernel, jszhang

Hello:

This series was applied to riscv/linux.git (for-next)
by Palmer Dabbelt <palmer@rivosinc.com>:

On Thu, 10 Apr 2025 07:05:21 +0000 you wrote:
> This series tries to optimize riscv uaccess by allowing the use of
> user_access_begin() and user_access_end() which permits grouping user accesses
> and avoiding the CSR write penalty for each access.
> 
> The error path can also be optimised using asm goto which patches 3 and 4
> achieve. This will speed up jumping to labels by avoiding the need of an
> intermediary error type variable within the uaccess macros
> 
> [...]

Here is the summary with links:
  - [v6,1/5] riscv: save the SR_SUM status over switches
    https://git.kernel.org/riscv/c/788aa64c01f1
  - [v6,2/5] riscv: implement user_access_begin() and families
    https://git.kernel.org/riscv/c/19500c6dbc5c
  - [v6,3/5] riscv: uaccess: use input constraints for ptr of __put_user()
    https://git.kernel.org/riscv/c/62135bf660b2
  - [v6,4/5] riscv: uaccess: use 'asm goto' for put_user()
    https://git.kernel.org/riscv/c/cdf647e81714
  - [v6,5/5] riscv: uaccess: use 'asm_goto_output' for get_user()
    https://git.kernel.org/riscv/c/f6bff7827a48

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 1/5] riscv: save the SR_SUM status over switches
  2025-04-22 23:01   ` Deepak Gupta
  2025-04-23  6:44     ` Alexandre Ghiti
@ 2025-05-20 16:49     ` Deepak Gupta
  2025-05-22  6:23       ` Ben Dooks
  1 sibling, 1 reply; 32+ messages in thread
From: Deepak Gupta @ 2025-05-20 16:49 UTC (permalink / raw)
  To: Cyril Bur
  Cc: palmer, aou, paul.walmsley, charlie, jrtc27, ben.dooks, alex,
	linux-riscv, linux-kernel, jszhang, syzbot+e74b94fe601ab9552d69

I did give this patch my RB and had planned to come back to it to see
if it impacts cfi related patches. Thanks to alex for bringing it to my
attention again. As it stands today, it doesn't impact cfi related
changes but I've some concerns.

Overall I do agree we should reduce the number of SSTATUS accesses.

A couple of questions on introducing the new `sstatus` field (inline):

On Tue, Apr 22, 2025 at 04:01:35PM -0700, Deepak Gupta wrote:
>On Thu, Apr 10, 2025 at 07:05:22AM +0000, Cyril Bur wrote:
>>From: Ben Dooks <ben.dooks@codethink.co.uk>
>>
>>When threads/tasks are switched we need to ensure the old execution's
>>SR_SUM state is saved and the new thread has the old SR_SUM state
>>restored.
>>
>>The issue was seen under heavy load especially with the syz-stress tool
>>running, with crashes as follows in schedule_tail:
>>
>>Unable to handle kernel access to user memory without uaccess routines
>>at virtual address 000000002749f0d0
>>Oops [#1]
>>Modules linked in:
>>CPU: 1 PID: 4875 Comm: syz-executor.0 Not tainted
>>5.12.0-rc2-syzkaller-00467-g0d7588ab9ef9 #0
>>Hardware name: riscv-virtio,qemu (DT)
>>epc : schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
>>ra : task_pid_vnr include/linux/sched.h:1421 [inline]
>>ra : schedule_tail+0x70/0xb2 kernel/sched/core.c:4264
>>epc : ffffffe00008c8b0 ra : ffffffe00008c8ae sp : ffffffe025d17ec0
>>gp : ffffffe005d25378 tp : ffffffe00f0d0000 t0 : 0000000000000000
>>t1 : 0000000000000001 t2 : 00000000000f4240 s0 : ffffffe025d17ee0
>>s1 : 000000002749f0d0 a0 : 000000000000002a a1 : 0000000000000003
>>a2 : 1ffffffc0cfac500 a3 : ffffffe0000c80cc a4 : 5ae9db91c19bbe00
>>a5 : 0000000000000000 a6 : 0000000000f00000 a7 : ffffffe000082eba
>>s2 : 0000000000040000 s3 : ffffffe00eef96c0 s4 : ffffffe022c77fe0
>>s5 : 0000000000004000 s6 : ffffffe067d74e00 s7 : ffffffe067d74850
>>s8 : ffffffe067d73e18 s9 : ffffffe067d74e00 s10: ffffffe00eef96e8
>>s11: 000000ae6cdf8368 t3 : 5ae9db91c19bbe00 t4 : ffffffc4043cafb2
>>t5 : ffffffc4043cafba t6 : 0000000000040000
>>status: 0000000000000120 badaddr: 000000002749f0d0 cause:
>>000000000000000f
>>Call Trace:
>>[<ffffffe00008c8b0>] schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
>>[<ffffffe000005570>] ret_from_exception+0x0/0x14
>>Dumping ftrace buffer:
>>  (ftrace buffer empty)
>>---[ end trace b5f8f9231dc87dda ]---
>>
>>The issue comes from the put_user() in schedule_tail
>>(kernel/sched/core.c) doing the following:
>>
>>asmlinkage __visible void schedule_tail(struct task_struct *prev)
>>{
>>...
>>       if (current->set_child_tid)
>>               put_user(task_pid_vnr(current), current->set_child_tid);
>>...
>>}
>>
>>the put_user() macro causes the code sequence to come out as follows:
>>
>>1:	__enable_user_access()
>>2:	reg = task_pid_vnr(current);
>>3:	*current->set_child_tid = reg;
>>4:	__disable_user_access()
>>
>>The problem is that we may have a sleeping function as argument which
>>could clear SR_SUM causing the panic above. This was fixed by
>>evaluating the argument of the put_user() macro outside the user-enabled
>>section in commit 285a76bb2cf5 ("riscv: evaluate put_user() arg before
>>enabling user access")"
>>
>>In order for riscv to take advantage of unsafe_get/put_XXX() macros and
>>to avoid the same issue we had with put_user() and sleeping functions we
>>must ensure code flow can go through switch_to() from within a region of
>>code with SR_SUM enabled and come back with SR_SUM still enabled. This
>>patch addresses the problem allowing future work to enable full use of
>>unsafe_get/put_XXX() macros without needing to take a CSR bit flip cost
>>on every access. Make switch_to() save and restore SR_SUM.
>>
>>Reported-by: syzbot+e74b94fe601ab9552d69@syzkaller.appspotmail.com
>>Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
>>Signed-off-by: Cyril Bur <cyrilbur@tenstorrent.com>
>>---
>>arch/riscv/include/asm/processor.h | 1 +
>>arch/riscv/kernel/asm-offsets.c    | 5 +++++
>>arch/riscv/kernel/entry.S          | 8 ++++++++
>>3 files changed, 14 insertions(+)
>>
>>diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h
>>index 5f56eb9d114a..58fd11c89fe9 100644
>>--- a/arch/riscv/include/asm/processor.h
>>+++ b/arch/riscv/include/asm/processor.h
>>@@ -103,6 +103,7 @@ struct thread_struct {
>>	struct __riscv_d_ext_state fstate;
>>	unsigned long bad_cause;
>>	unsigned long envcfg;
>>+	unsigned long status;

Do we really need a new member field in `thread_struct`? We already have
`sstatus` in `pt_regs`, which reflects the overall execution environment
for the current thread. This gets saved and restored on trap entry and exit.

If we put `status` in `thread_struct` it creates ambiguity, from a future
maintainability perspective, about which `status` to save to and pick from
as new fields get introduced to this CSR.

Why can't we access the current trap frame's `sstatus` image in `__switch_to`
to save and restore?

Let me know if I am missing something obvious here. If there is a complication
I am missing and we do end up using this member field, I would rename it to
something like `status_kernel` to reflect that, so that future changes are
cognizant of the fact that we have split `status`: one for the kernel execution
env per thread and one for controlling the user execution env per thread.
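
To make the suggestion concrete, what I have in mind is roughly the below,
written as C only for readability (__switch_to is assembly, and this glosses
over kernel threads and whether touching the saved user status image is
actually acceptable; prev/next are the outgoing/incoming tasks):

	/* sketch: track SUM in the trap frame instead of thread.status */
	struct pt_regs *prev_regs = task_pt_regs(prev);
	struct pt_regs *next_regs = task_pt_regs(next);

	/* save the outgoing task's SUM state */
	prev_regs->status &= ~SR_SUM;
	prev_regs->status |= csr_read(CSR_STATUS) & SR_SUM;

	/* restore the incoming task's SUM state */
	if (next_regs->status & SR_SUM)
		csr_set(CSR_STATUS, SR_SUM);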


>>	u32 riscv_v_flags;
>>	u32 vstate_ctrl;
>>	struct __riscv_v_ext_state vstate;
>>diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
>>index 16490755304e..969c65b1fe41 100644
>>--- a/arch/riscv/kernel/asm-offsets.c
>>+++ b/arch/riscv/kernel/asm-offsets.c
>>@@ -34,6 +34,7 @@ void asm_offsets(void)
>>	OFFSET(TASK_THREAD_S9, task_struct, thread.s[9]);
>>	OFFSET(TASK_THREAD_S10, task_struct, thread.s[10]);
>>	OFFSET(TASK_THREAD_S11, task_struct, thread.s[11]);

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 1/5] riscv: save the SR_SUM status over switches
  2025-04-22 10:22   ` Alexandre Ghiti
@ 2025-05-21  8:26     ` Ben Dooks
  2025-05-21 13:38       ` Samuel Holland
  0 siblings, 1 reply; 32+ messages in thread
From: Ben Dooks @ 2025-05-21  8:26 UTC (permalink / raw)
  To: Alexandre Ghiti, Cyril Bur, palmer, aou, paul.walmsley, charlie,
	jrtc27
  Cc: linux-riscv, linux-kernel, jszhang, syzbot+e74b94fe601ab9552d69

On 22/04/2025 11:22, Alexandre Ghiti wrote:
> Hi Cyril,
> 
> On 10/04/2025 09:05, Cyril Bur wrote:
>> From: Ben Dooks <ben.dooks@codethink.co.uk>
>>
>> When threads/tasks are switched we need to ensure the old execution's
>> SR_SUM state is saved and the new thread has the old SR_SUM state
>> restored.
>>
>> The issue was seen under heavy load especially with the syz-stress tool
>> running, with crashes as follows in schedule_tail:
>>
>> Unable to handle kernel access to user memory without uaccess routines
>> at virtual address 000000002749f0d0
>> Oops [#1]
>> Modules linked in:
>> CPU: 1 PID: 4875 Comm: syz-executor.0 Not tainted
>> 5.12.0-rc2-syzkaller-00467-g0d7588ab9ef9 #0
>> Hardware name: riscv-virtio,qemu (DT)
>> epc : schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
>>   ra : task_pid_vnr include/linux/sched.h:1421 [inline]
>>   ra : schedule_tail+0x70/0xb2 kernel/sched/core.c:4264
>> epc : ffffffe00008c8b0 ra : ffffffe00008c8ae sp : ffffffe025d17ec0
>>   gp : ffffffe005d25378 tp : ffffffe00f0d0000 t0 : 0000000000000000
>>   t1 : 0000000000000001 t2 : 00000000000f4240 s0 : ffffffe025d17ee0
>>   s1 : 000000002749f0d0 a0 : 000000000000002a a1 : 0000000000000003
>>   a2 : 1ffffffc0cfac500 a3 : ffffffe0000c80cc a4 : 5ae9db91c19bbe00
>>   a5 : 0000000000000000 a6 : 0000000000f00000 a7 : ffffffe000082eba
>>   s2 : 0000000000040000 s3 : ffffffe00eef96c0 s4 : ffffffe022c77fe0
>>   s5 : 0000000000004000 s6 : ffffffe067d74e00 s7 : ffffffe067d74850
>>   s8 : ffffffe067d73e18 s9 : ffffffe067d74e00 s10: ffffffe00eef96e8
>>   s11: 000000ae6cdf8368 t3 : 5ae9db91c19bbe00 t4 : ffffffc4043cafb2
>>   t5 : ffffffc4043cafba t6 : 0000000000040000
>> status: 0000000000000120 badaddr: 000000002749f0d0 cause:
>> 000000000000000f
>> Call Trace:
>> [<ffffffe00008c8b0>] schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
>> [<ffffffe000005570>] ret_from_exception+0x0/0x14
>> Dumping ftrace buffer:
>>     (ftrace buffer empty)
>> ---[ end trace b5f8f9231dc87dda ]---
>>
>> The issue comes from the put_user() in schedule_tail
>> (kernel/sched/core.c) doing the following:
>>
>> asmlinkage __visible void schedule_tail(struct task_struct *prev)
>> {
>> ...
>>          if (current->set_child_tid)
>>                  put_user(task_pid_vnr(current), current->set_child_tid);
>> ...
>> }
>>
>> the put_user() macro causes the code sequence to come out as follows:
>>
>> 1:    __enable_user_access()
>> 2:    reg = task_pid_vnr(current);
>> 3:    *current->set_child_tid = reg;
>> 4:    __disable_user_access()
>>
>> The problem is that we may have a sleeping function as argument which
>> could clear SR_SUM causing the panic above. This was fixed by
>> evaluating the argument of the put_user() macro outside the user-enabled
>> section in commit 285a76bb2cf5 ("riscv: evaluate put_user() arg before
>> enabling user access")"
>>
>> In order for riscv to take advantage of unsafe_get/put_XXX() macros and
>> to avoid the same issue we had with put_user() and sleeping functions we
>> must ensure code flow can go through switch_to() from within a region of
>> code with SR_SUM enabled and come back with SR_SUM still enabled. This
>> patch addresses the problem allowing future work to enable full use of
>> unsafe_get/put_XXX() macros without needing to take a CSR bit flip cost
>> on every access. Make switch_to() save and restore SR_SUM.
>>
>> Reported-by: syzbot+e74b94fe601ab9552d69@syzkaller.appspotmail.com
>> Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
>> Signed-off-by: Cyril Bur <cyrilbur@tenstorrent.com>
>> ---
>>   arch/riscv/include/asm/processor.h | 1 +
>>   arch/riscv/kernel/asm-offsets.c    | 5 +++++
>>   arch/riscv/kernel/entry.S          | 8 ++++++++
>>   3 files changed, 14 insertions(+)
>>
>> diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/ 
>> asm/processor.h
>> index 5f56eb9d114a..58fd11c89fe9 100644
>> --- a/arch/riscv/include/asm/processor.h
>> +++ b/arch/riscv/include/asm/processor.h
>> @@ -103,6 +103,7 @@ struct thread_struct {
>>       struct __riscv_d_ext_state fstate;
>>       unsigned long bad_cause;
>>       unsigned long envcfg;
>> +    unsigned long status;
>>       u32 riscv_v_flags;
>>       u32 vstate_ctrl;
>>       struct __riscv_v_ext_state vstate;
>> diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm- 
>> offsets.c
>> index 16490755304e..969c65b1fe41 100644
>> --- a/arch/riscv/kernel/asm-offsets.c
>> +++ b/arch/riscv/kernel/asm-offsets.c
>> @@ -34,6 +34,7 @@ void asm_offsets(void)
>>       OFFSET(TASK_THREAD_S9, task_struct, thread.s[9]);
>>       OFFSET(TASK_THREAD_S10, task_struct, thread.s[10]);
>>       OFFSET(TASK_THREAD_S11, task_struct, thread.s[11]);
>> +    OFFSET(TASK_THREAD_STATUS, task_struct, thread.status);
>>       OFFSET(TASK_TI_CPU, task_struct, thread_info.cpu);
>>       OFFSET(TASK_TI_PREEMPT_COUNT, task_struct, 
>> thread_info.preempt_count);
>> @@ -346,6 +347,10 @@ void asm_offsets(void)
>>             offsetof(struct task_struct, thread.s[11])
>>           - offsetof(struct task_struct, thread.ra)
>>       );
>> +    DEFINE(TASK_THREAD_STATUS_RA,
>> +          offsetof(struct task_struct, thread.status)
>> +        - offsetof(struct task_struct, thread.ra)
>> +    );
>>       DEFINE(TASK_THREAD_F0_F0,
>>             offsetof(struct task_struct, thread.fstate.f[0])
>> diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
>> index 33a5a9f2a0d4..00bd0de9faa2 100644
>> --- a/arch/riscv/kernel/entry.S
>> +++ b/arch/riscv/kernel/entry.S
>> @@ -397,9 +397,17 @@ SYM_FUNC_START(__switch_to)
>>       REG_S s9,  TASK_THREAD_S9_RA(a3)
>>       REG_S s10, TASK_THREAD_S10_RA(a3)
>>       REG_S s11, TASK_THREAD_S11_RA(a3)
>> +
>> +    /* save the user space access flag */
>> +    li    s0, SR_SUM
> 
> 
> This is not needed anymore ^ but I'll remove it when merging your patchset.
> 

Could you be more specific about what "this" is?

If we don't save/restore the SR_SUM bit I think our old friend
the sched_tail bug will just return.


>> +    csrr  s1, CSR_STATUS
>> +    REG_S s1, TASK_THREAD_STATUS_RA(a3)
>> +
>>       /* Save the kernel shadow call stack pointer */
>>       scs_save_current
>>       /* Restore context from next->thread */
>> +    REG_L s0,  TASK_THREAD_STATUS_RA(a4)
>> +    csrs  CSR_STATUS, s0
>>       REG_L ra,  TASK_THREAD_RA_RA(a4)
>>       REG_L sp,  TASK_THREAD_SP_RA(a4)
>>       REG_L s0,  TASK_THREAD_S0_RA(a4)
> 
> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
> 
> Thanks for the multiple revisions!
> 
> Alex
> 
> 
> 


-- 
Ben Dooks				http://www.codethink.co.uk/
Senior Engineer				Codethink - Providing Genius

https://www.codethink.co.uk/privacy.html

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 1/5] riscv: save the SR_SUM status over switches
  2025-05-21  8:26     ` Ben Dooks
@ 2025-05-21 13:38       ` Samuel Holland
  2025-05-21 14:30         ` Alexandre Ghiti
  0 siblings, 1 reply; 32+ messages in thread
From: Samuel Holland @ 2025-05-21 13:38 UTC (permalink / raw)
  To: Ben Dooks, Alexandre Ghiti, palmer
  Cc: linux-riscv, linux-kernel, jszhang, syzbot+e74b94fe601ab9552d69,
	Cyril Bur, aou, paul.walmsley, charlie, jrtc27

Hi Alex, Ben,

On 2025-05-21 3:26 AM, Ben Dooks wrote:
> On 22/04/2025 11:22, Alexandre Ghiti wrote:
>> Hi Cyril,
>>
>> On 10/04/2025 09:05, Cyril Bur wrote:
>>> From: Ben Dooks <ben.dooks@codethink.co.uk>
>>>
>>> When threads/tasks are switched we need to ensure the old execution's
>>> SR_SUM state is saved and the new thread has the old SR_SUM state
>>> restored.
>>>
>>> The issue was seen under heavy load especially with the syz-stress tool
>>> running, with crashes as follows in schedule_tail:
>>>
>>> Unable to handle kernel access to user memory without uaccess routines
>>> at virtual address 000000002749f0d0
>>> Oops [#1]
>>> Modules linked in:
>>> CPU: 1 PID: 4875 Comm: syz-executor.0 Not tainted
>>> 5.12.0-rc2-syzkaller-00467-g0d7588ab9ef9 #0
>>> Hardware name: riscv-virtio,qemu (DT)
>>> epc : schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
>>>   ra : task_pid_vnr include/linux/sched.h:1421 [inline]
>>>   ra : schedule_tail+0x70/0xb2 kernel/sched/core.c:4264
>>> epc : ffffffe00008c8b0 ra : ffffffe00008c8ae sp : ffffffe025d17ec0
>>>   gp : ffffffe005d25378 tp : ffffffe00f0d0000 t0 : 0000000000000000
>>>   t1 : 0000000000000001 t2 : 00000000000f4240 s0 : ffffffe025d17ee0
>>>   s1 : 000000002749f0d0 a0 : 000000000000002a a1 : 0000000000000003
>>>   a2 : 1ffffffc0cfac500 a3 : ffffffe0000c80cc a4 : 5ae9db91c19bbe00
>>>   a5 : 0000000000000000 a6 : 0000000000f00000 a7 : ffffffe000082eba
>>>   s2 : 0000000000040000 s3 : ffffffe00eef96c0 s4 : ffffffe022c77fe0
>>>   s5 : 0000000000004000 s6 : ffffffe067d74e00 s7 : ffffffe067d74850
>>>   s8 : ffffffe067d73e18 s9 : ffffffe067d74e00 s10: ffffffe00eef96e8
>>>   s11: 000000ae6cdf8368 t3 : 5ae9db91c19bbe00 t4 : ffffffc4043cafb2
>>>   t5 : ffffffc4043cafba t6 : 0000000000040000
>>> status: 0000000000000120 badaddr: 000000002749f0d0 cause:
>>> 000000000000000f
>>> Call Trace:
>>> [<ffffffe00008c8b0>] schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
>>> [<ffffffe000005570>] ret_from_exception+0x0/0x14
>>> Dumping ftrace buffer:
>>>     (ftrace buffer empty)
>>> ---[ end trace b5f8f9231dc87dda ]---
>>>
>>> The issue comes from the put_user() in schedule_tail
>>> (kernel/sched/core.c) doing the following:
>>>
>>> asmlinkage __visible void schedule_tail(struct task_struct *prev)
>>> {
>>> ...
>>>          if (current->set_child_tid)
>>>                  put_user(task_pid_vnr(current), current->set_child_tid);
>>> ...
>>> }
>>>
>>> the put_user() macro causes the code sequence to come out as follows:
>>>
>>> 1:    __enable_user_access()
>>> 2:    reg = task_pid_vnr(current);
>>> 3:    *current->set_child_tid = reg;
>>> 4:    __disable_user_access()
>>>
>>> The problem is that we may have a sleeping function as argument which
>>> could clear SR_SUM causing the panic above. This was fixed by
>>> evaluating the argument of the put_user() macro outside the user-enabled
>>> section in commit 285a76bb2cf5 ("riscv: evaluate put_user() arg before
>>> enabling user access")"
>>>
>>> In order for riscv to take advantage of unsafe_get/put_XXX() macros and
>>> to avoid the same issue we had with put_user() and sleeping functions we
>>> must ensure code flow can go through switch_to() from within a region of
>>> code with SR_SUM enabled and come back with SR_SUM still enabled. This
>>> patch addresses the problem allowing future work to enable full use of
>>> unsafe_get/put_XXX() macros without needing to take a CSR bit flip cost
>>> on every access. Make switch_to() save and restore SR_SUM.
>>>
>>> Reported-by: syzbot+e74b94fe601ab9552d69@syzkaller.appspotmail.com
>>> Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
>>> Signed-off-by: Cyril Bur <cyrilbur@tenstorrent.com>
>>> ---
>>>   arch/riscv/include/asm/processor.h | 1 +
>>>   arch/riscv/kernel/asm-offsets.c    | 5 +++++
>>>   arch/riscv/kernel/entry.S          | 8 ++++++++
>>>   3 files changed, 14 insertions(+)
>>>
>>> diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/ asm/
>>> processor.h
>>> index 5f56eb9d114a..58fd11c89fe9 100644
>>> --- a/arch/riscv/include/asm/processor.h
>>> +++ b/arch/riscv/include/asm/processor.h
>>> @@ -103,6 +103,7 @@ struct thread_struct {
>>>       struct __riscv_d_ext_state fstate;
>>>       unsigned long bad_cause;
>>>       unsigned long envcfg;
>>> +    unsigned long status;
>>>       u32 riscv_v_flags;
>>>       u32 vstate_ctrl;
>>>       struct __riscv_v_ext_state vstate;
>>> diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm- offsets.c
>>> index 16490755304e..969c65b1fe41 100644
>>> --- a/arch/riscv/kernel/asm-offsets.c
>>> +++ b/arch/riscv/kernel/asm-offsets.c
>>> @@ -34,6 +34,7 @@ void asm_offsets(void)
>>>       OFFSET(TASK_THREAD_S9, task_struct, thread.s[9]);
>>>       OFFSET(TASK_THREAD_S10, task_struct, thread.s[10]);
>>>       OFFSET(TASK_THREAD_S11, task_struct, thread.s[11]);
>>> +    OFFSET(TASK_THREAD_STATUS, task_struct, thread.status);
>>>       OFFSET(TASK_TI_CPU, task_struct, thread_info.cpu);
>>>       OFFSET(TASK_TI_PREEMPT_COUNT, task_struct, thread_info.preempt_count);
>>> @@ -346,6 +347,10 @@ void asm_offsets(void)
>>>             offsetof(struct task_struct, thread.s[11])
>>>           - offsetof(struct task_struct, thread.ra)
>>>       );
>>> +    DEFINE(TASK_THREAD_STATUS_RA,
>>> +          offsetof(struct task_struct, thread.status)
>>> +        - offsetof(struct task_struct, thread.ra)
>>> +    );
>>>       DEFINE(TASK_THREAD_F0_F0,
>>>             offsetof(struct task_struct, thread.fstate.f[0])
>>> diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
>>> index 33a5a9f2a0d4..00bd0de9faa2 100644
>>> --- a/arch/riscv/kernel/entry.S
>>> +++ b/arch/riscv/kernel/entry.S
>>> @@ -397,9 +397,17 @@ SYM_FUNC_START(__switch_to)
>>>       REG_S s9,  TASK_THREAD_S9_RA(a3)
>>>       REG_S s10, TASK_THREAD_S10_RA(a3)
>>>       REG_S s11, TASK_THREAD_S11_RA(a3)
>>> +
>>> +    /* save the user space access flag */
>>> +    li    s0, SR_SUM
>>
>>
>> This is not needed anymore ^ but I'll remove it when merging your patchset.
>>
> 
> Could you be more specific about what "this" is?
> 
> If we don't save/restore the SR_SUM bit I think our old friend
> the sched_tail bug will just return.

I think Alex is saying the `li` instruction above is not needed because s0 is
unused. But instead I think there is an `and` instruction missing here. The
patch as merged ORs the entirety of the old sstatus with the new sstatus, not
just the SUM bit, which seems extremely dangerous.
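
Something along these lines (untested sketch) would carry only the SUM bit
across; s0/s1 are usable as scratch here because both get reloaded from
next->thread just below:

	/* restore only the SUM bit from the saved status */
	REG_L s1, TASK_THREAD_STATUS_RA(a4)
	li    s0, SR_SUM
	and   s1, s1, s0
	csrs  CSR_STATUS, s1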

Regards,
Samuel

>>> +    csrr  s1, CSR_STATUS
>>> +    REG_S s1, TASK_THREAD_STATUS_RA(a3)
>>> +
>>>       /* Save the kernel shadow call stack pointer */
>>>       scs_save_current
>>>       /* Restore context from next->thread */
>>> +    REG_L s0,  TASK_THREAD_STATUS_RA(a4)
>>> +    csrs  CSR_STATUS, s0
>>>       REG_L ra,  TASK_THREAD_RA_RA(a4)
>>>       REG_L sp,  TASK_THREAD_SP_RA(a4)
>>>       REG_L s0,  TASK_THREAD_S0_RA(a4)
>>
>> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
>>
>> Thanks for the multiple revisions!
>>
>> Alex
>>
>>
>>
> 
> 


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 1/5] riscv: save the SR_SUM status over switches
  2025-05-21 13:38       ` Samuel Holland
@ 2025-05-21 14:30         ` Alexandre Ghiti
  2025-05-21 14:45           ` Cyril Bur
                             ` (2 more replies)
  0 siblings, 3 replies; 32+ messages in thread
From: Alexandre Ghiti @ 2025-05-21 14:30 UTC (permalink / raw)
  To: Samuel Holland, Ben Dooks, palmer
  Cc: linux-riscv, linux-kernel, jszhang, syzbot+e74b94fe601ab9552d69,
	Cyril Bur, aou, paul.walmsley, charlie, jrtc27

Hi Samuel,

On 5/21/25 15:38, Samuel Holland wrote:
> Hi Alex, Ben,
>
> On 2025-05-21 3:26 AM, Ben Dooks wrote:
>> On 22/04/2025 11:22, Alexandre Ghiti wrote:
>>> Hi Cyril,
>>>
>>> On 10/04/2025 09:05, Cyril Bur wrote:
>>>> From: Ben Dooks <ben.dooks@codethink.co.uk>
>>>>
>>>> When threads/tasks are switched we need to ensure the old execution's
>>>> SR_SUM state is saved and the new thread has the old SR_SUM state
>>>> restored.
>>>>
>>>> The issue was seen under heavy load especially with the syz-stress tool
>>>> running, with crashes as follows in schedule_tail:
>>>>
>>>> Unable to handle kernel access to user memory without uaccess routines
>>>> at virtual address 000000002749f0d0
>>>> Oops [#1]
>>>> Modules linked in:
>>>> CPU: 1 PID: 4875 Comm: syz-executor.0 Not tainted
>>>> 5.12.0-rc2-syzkaller-00467-g0d7588ab9ef9 #0
>>>> Hardware name: riscv-virtio,qemu (DT)
>>>> epc : schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
>>>>    ra : task_pid_vnr include/linux/sched.h:1421 [inline]
>>>>    ra : schedule_tail+0x70/0xb2 kernel/sched/core.c:4264
>>>> epc : ffffffe00008c8b0 ra : ffffffe00008c8ae sp : ffffffe025d17ec0
>>>>    gp : ffffffe005d25378 tp : ffffffe00f0d0000 t0 : 0000000000000000
>>>>    t1 : 0000000000000001 t2 : 00000000000f4240 s0 : ffffffe025d17ee0
>>>>    s1 : 000000002749f0d0 a0 : 000000000000002a a1 : 0000000000000003
>>>>    a2 : 1ffffffc0cfac500 a3 : ffffffe0000c80cc a4 : 5ae9db91c19bbe00
>>>>    a5 : 0000000000000000 a6 : 0000000000f00000 a7 : ffffffe000082eba
>>>>    s2 : 0000000000040000 s3 : ffffffe00eef96c0 s4 : ffffffe022c77fe0
>>>>    s5 : 0000000000004000 s6 : ffffffe067d74e00 s7 : ffffffe067d74850
>>>>    s8 : ffffffe067d73e18 s9 : ffffffe067d74e00 s10: ffffffe00eef96e8
>>>>    s11: 000000ae6cdf8368 t3 : 5ae9db91c19bbe00 t4 : ffffffc4043cafb2
>>>>    t5 : ffffffc4043cafba t6 : 0000000000040000
>>>> status: 0000000000000120 badaddr: 000000002749f0d0 cause:
>>>> 000000000000000f
>>>> Call Trace:
>>>> [<ffffffe00008c8b0>] schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
>>>> [<ffffffe000005570>] ret_from_exception+0x0/0x14
>>>> Dumping ftrace buffer:
>>>>      (ftrace buffer empty)
>>>> ---[ end trace b5f8f9231dc87dda ]---
>>>>
>>>> The issue comes from the put_user() in schedule_tail
>>>> (kernel/sched/core.c) doing the following:
>>>>
>>>> asmlinkage __visible void schedule_tail(struct task_struct *prev)
>>>> {
>>>> ...
>>>>           if (current->set_child_tid)
>>>>                   put_user(task_pid_vnr(current), current->set_child_tid);
>>>> ...
>>>> }
>>>>
>>>> the put_user() macro causes the code sequence to come out as follows:
>>>>
>>>> 1:    __enable_user_access()
>>>> 2:    reg = task_pid_vnr(current);
>>>> 3:    *current->set_child_tid = reg;
>>>> 4:    __disable_user_access()
>>>>
>>>> The problem is that we may have a sleeping function as argument which
>>>> could clear SR_SUM causing the panic above. This was fixed by
>>>> evaluating the argument of the put_user() macro outside the user-enabled
>>>> section in commit 285a76bb2cf5 ("riscv: evaluate put_user() arg before
>>>> enabling user access")"
>>>>
>>>> In order for riscv to take advantage of unsafe_get/put_XXX() macros and
>>>> to avoid the same issue we had with put_user() and sleeping functions we
>>>> must ensure code flow can go through switch_to() from within a region of
>>>> code with SR_SUM enabled and come back with SR_SUM still enabled. This
>>>> patch addresses the problem allowing future work to enable full use of
>>>> unsafe_get/put_XXX() macros without needing to take a CSR bit flip cost
>>>> on every access. Make switch_to() save and restore SR_SUM.
>>>>
>>>> Reported-by: syzbot+e74b94fe601ab9552d69@syzkaller.appspotmail.com
>>>> Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
>>>> Signed-off-by: Cyril Bur <cyrilbur@tenstorrent.com>
>>>> ---
>>>>    arch/riscv/include/asm/processor.h | 1 +
>>>>    arch/riscv/kernel/asm-offsets.c    | 5 +++++
>>>>    arch/riscv/kernel/entry.S          | 8 ++++++++
>>>>    3 files changed, 14 insertions(+)
>>>>
>>>> diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/ asm/
>>>> processor.h
>>>> index 5f56eb9d114a..58fd11c89fe9 100644
>>>> --- a/arch/riscv/include/asm/processor.h
>>>> +++ b/arch/riscv/include/asm/processor.h
>>>> @@ -103,6 +103,7 @@ struct thread_struct {
>>>>        struct __riscv_d_ext_state fstate;
>>>>        unsigned long bad_cause;
>>>>        unsigned long envcfg;
>>>> +    unsigned long status;
>>>>        u32 riscv_v_flags;
>>>>        u32 vstate_ctrl;
>>>>        struct __riscv_v_ext_state vstate;
>>>> diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm- offsets.c
>>>> index 16490755304e..969c65b1fe41 100644
>>>> --- a/arch/riscv/kernel/asm-offsets.c
>>>> +++ b/arch/riscv/kernel/asm-offsets.c
>>>> @@ -34,6 +34,7 @@ void asm_offsets(void)
>>>>        OFFSET(TASK_THREAD_S9, task_struct, thread.s[9]);
>>>>        OFFSET(TASK_THREAD_S10, task_struct, thread.s[10]);
>>>>        OFFSET(TASK_THREAD_S11, task_struct, thread.s[11]);
>>>> +    OFFSET(TASK_THREAD_STATUS, task_struct, thread.status);
>>>>        OFFSET(TASK_TI_CPU, task_struct, thread_info.cpu);
>>>>        OFFSET(TASK_TI_PREEMPT_COUNT, task_struct, thread_info.preempt_count);
>>>> @@ -346,6 +347,10 @@ void asm_offsets(void)
>>>>              offsetof(struct task_struct, thread.s[11])
>>>>            - offsetof(struct task_struct, thread.ra)
>>>>        );
>>>> +    DEFINE(TASK_THREAD_STATUS_RA,
>>>> +          offsetof(struct task_struct, thread.status)
>>>> +        - offsetof(struct task_struct, thread.ra)
>>>> +    );
>>>>        DEFINE(TASK_THREAD_F0_F0,
>>>>              offsetof(struct task_struct, thread.fstate.f[0])
>>>> diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
>>>> index 33a5a9f2a0d4..00bd0de9faa2 100644
>>>> --- a/arch/riscv/kernel/entry.S
>>>> +++ b/arch/riscv/kernel/entry.S
>>>> @@ -397,9 +397,17 @@ SYM_FUNC_START(__switch_to)
>>>>        REG_S s9,  TASK_THREAD_S9_RA(a3)
>>>>        REG_S s10, TASK_THREAD_S10_RA(a3)
>>>>        REG_S s11, TASK_THREAD_S11_RA(a3)
>>>> +
>>>> +    /* save the user space access flag */
>>>> +    li    s0, SR_SUM
>>>
>>> This is not needed anymore ^ but I'll remove it when merging your patchset.
>>>
>> Could you be more specific about what "this" is?
>>
>> If we don't save/restore the SR_SUM bit I think our old friend
>> the sched_tail bug will just return.
> I think Alex is saying the `li` instruction above is not needed because s0 is
> unused. But instead I think there is an `and` instruction missing here. The
> patch as merged ORs the entirety of the old sstatus with the new sstatus, not
> just the SUM bit, which seems extremely dangerous.


I should have checked the definition of csrs: I thought it would write 
the CSR, but you're right, it ORs into the current CSR value, which isn't 
good at all.

@Cyril Can you send a patch for that, which also removes the `li` 
instruction that I forgot to remove :) I think we can even ask Palmer to 
squash those fixes directly into the patch.

Let me know if you can't do it and I'll do.

Thanks Samuel for noticing,

Alex


>
> Regards,
> Samuel
>
>>>> +    csrr  s1, CSR_STATUS
>>>> +    REG_S s1, TASK_THREAD_STATUS_RA(a3)
>>>> +
>>>>        /* Save the kernel shadow call stack pointer */
>>>>        scs_save_current
>>>>        /* Restore context from next->thread */
>>>> +    REG_L s0,  TASK_THREAD_STATUS_RA(a4)
>>>> +    csrs  CSR_STATUS, s0
>>>>        REG_L ra,  TASK_THREAD_RA_RA(a4)
>>>>        REG_L sp,  TASK_THREAD_SP_RA(a4)
>>>>        REG_L s0,  TASK_THREAD_S0_RA(a4)
>>> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
>>>
>>> Thanks for the multiple revisions!
>>>
>>> Alex
>>>
>>>
>>> _______________________________________________
>>> linux-riscv mailing list
>>> linux-riscv@lists.infradead.org
>>> http://lists.infradead.org/mailman/listinfo/linux-riscv
>>>
>>
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 1/5] riscv: save the SR_SUM status over switches
  2025-05-21 14:30         ` Alexandre Ghiti
@ 2025-05-21 14:45           ` Cyril Bur
  2025-05-22 16:15           ` [EXT] " Cyril Bur
  2025-05-22 17:40           ` Andy Chiu
  2 siblings, 0 replies; 32+ messages in thread
From: Cyril Bur @ 2025-05-21 14:45 UTC (permalink / raw)
  To: Alexandre Ghiti, Samuel Holland, Ben Dooks, palmer
  Cc: linux-riscv, linux-kernel, jszhang, syzbot+e74b94fe601ab9552d69,
	aou, paul.walmsley, charlie, jrtc27

Hi Alex,

On 21/5/2025 12:30 am, Alexandre Ghiti wrote:
> Hi Samuel,
> 
> On 5/21/25 15:38, Samuel Holland wrote:
>> Hi Alex, Ben,
>>
>> On 2025-05-21 3:26 AM, Ben Dooks wrote:
>>> On 22/04/2025 11:22, Alexandre Ghiti wrote:
>>>> Hi Cyril,
>>>>
>>>> On 10/04/2025 09:05, Cyril Bur wrote:
>>>>> From: Ben Dooks <ben.dooks@codethink.co.uk>
>>>>>
>>>>> When threads/tasks are switched we need to ensure the old execution's
>>>>> SR_SUM state is saved and the new thread has the old SR_SUM state
>>>>> restored.
>>>>>
>>>>> The issue was seen under heavy load especially with the syz-stress 
>>>>> tool
>>>>> running, with crashes as follows in schedule_tail:
>>>>>
>>>>> Unable to handle kernel access to user memory without uaccess routines
>>>>> at virtual address 000000002749f0d0
>>>>> Oops [#1]
>>>>> Modules linked in:
>>>>> CPU: 1 PID: 4875 Comm: syz-executor.0 Not tainted
>>>>> 5.12.0-rc2-syzkaller-00467-g0d7588ab9ef9 #0
>>>>> Hardware name: riscv-virtio,qemu (DT)
>>>>> epc : schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
>>>>>    ra : task_pid_vnr include/linux/sched.h:1421 [inline]
>>>>>    ra : schedule_tail+0x70/0xb2 kernel/sched/core.c:4264
>>>>> epc : ffffffe00008c8b0 ra : ffffffe00008c8ae sp : ffffffe025d17ec0
>>>>>    gp : ffffffe005d25378 tp : ffffffe00f0d0000 t0 : 0000000000000000
>>>>>    t1 : 0000000000000001 t2 : 00000000000f4240 s0 : ffffffe025d17ee0
>>>>>    s1 : 000000002749f0d0 a0 : 000000000000002a a1 : 0000000000000003
>>>>>    a2 : 1ffffffc0cfac500 a3 : ffffffe0000c80cc a4 : 5ae9db91c19bbe00
>>>>>    a5 : 0000000000000000 a6 : 0000000000f00000 a7 : ffffffe000082eba
>>>>>    s2 : 0000000000040000 s3 : ffffffe00eef96c0 s4 : ffffffe022c77fe0
>>>>>    s5 : 0000000000004000 s6 : ffffffe067d74e00 s7 : ffffffe067d74850
>>>>>    s8 : ffffffe067d73e18 s9 : ffffffe067d74e00 s10: ffffffe00eef96e8
>>>>>    s11: 000000ae6cdf8368 t3 : 5ae9db91c19bbe00 t4 : ffffffc4043cafb2
>>>>>    t5 : ffffffc4043cafba t6 : 0000000000040000
>>>>> status: 0000000000000120 badaddr: 000000002749f0d0 cause:
>>>>> 000000000000000f
>>>>> Call Trace:
>>>>> [<ffffffe00008c8b0>] schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
>>>>> [<ffffffe000005570>] ret_from_exception+0x0/0x14
>>>>> Dumping ftrace buffer:
>>>>>      (ftrace buffer empty)
>>>>> ---[ end trace b5f8f9231dc87dda ]---
>>>>>
>>>>> The issue comes from the put_user() in schedule_tail
>>>>> (kernel/sched/core.c) doing the following:
>>>>>
>>>>> asmlinkage __visible void schedule_tail(struct task_struct *prev)
>>>>> {
>>>>> ...
>>>>>           if (current->set_child_tid)
>>>>>                   put_user(task_pid_vnr(current), current- 
>>>>> >set_child_tid);
>>>>> ...
>>>>> }
>>>>>
>>>>> the put_user() macro causes the code sequence to come out as follows:
>>>>>
>>>>> 1:    __enable_user_access()
>>>>> 2:    reg = task_pid_vnr(current);
>>>>> 3:    *current->set_child_tid = reg;
>>>>> 4:    __disable_user_access()
>>>>>
>>>>> The problem is that we may have a sleeping function as argument which
>>>>> could clear SR_SUM causing the panic above. This was fixed by
>>>>> evaluating the argument of the put_user() macro outside the user- 
>>>>> enabled
>>>>> section in commit 285a76bb2cf5 ("riscv: evaluate put_user() arg before
>>>>> enabling user access")"
>>>>>
>>>>> In order for riscv to take advantage of unsafe_get/put_XXX() macros 
>>>>> and
>>>>> to avoid the same issue we had with put_user() and sleeping 
>>>>> functions we
>>>>> must ensure code flow can go through switch_to() from within a 
>>>>> region of
>>>>> code with SR_SUM enabled and come back with SR_SUM still enabled. This
>>>>> patch addresses the problem allowing future work to enable full use of
>>>>> unsafe_get/put_XXX() macros without needing to take a CSR bit flip 
>>>>> cost
>>>>> on every access. Make switch_to() save and restore SR_SUM.
>>>>>
>>>>> Reported-by: syzbot+e74b94fe601ab9552d69@syzkaller.appspotmail.com
>>>>> Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
>>>>> Signed-off-by: Cyril Bur <cyrilbur@tenstorrent.com>
>>>>> ---
>>>>>    arch/riscv/include/asm/processor.h | 1 +
>>>>>    arch/riscv/kernel/asm-offsets.c    | 5 +++++
>>>>>    arch/riscv/kernel/entry.S          | 8 ++++++++
>>>>>    3 files changed, 14 insertions(+)
>>>>>
>>>>> diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/ 
>>>>> include/ asm/
>>>>> processor.h
>>>>> index 5f56eb9d114a..58fd11c89fe9 100644
>>>>> --- a/arch/riscv/include/asm/processor.h
>>>>> +++ b/arch/riscv/include/asm/processor.h
>>>>> @@ -103,6 +103,7 @@ struct thread_struct {
>>>>>        struct __riscv_d_ext_state fstate;
>>>>>        unsigned long bad_cause;
>>>>>        unsigned long envcfg;
>>>>> +    unsigned long status;
>>>>>        u32 riscv_v_flags;
>>>>>        u32 vstate_ctrl;
>>>>>        struct __riscv_v_ext_state vstate;
>>>>> diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/ 
>>>>> asm- offsets.c
>>>>> index 16490755304e..969c65b1fe41 100644
>>>>> --- a/arch/riscv/kernel/asm-offsets.c
>>>>> +++ b/arch/riscv/kernel/asm-offsets.c
>>>>> @@ -34,6 +34,7 @@ void asm_offsets(void)
>>>>>        OFFSET(TASK_THREAD_S9, task_struct, thread.s[9]);
>>>>>        OFFSET(TASK_THREAD_S10, task_struct, thread.s[10]);
>>>>>        OFFSET(TASK_THREAD_S11, task_struct, thread.s[11]);
>>>>> +    OFFSET(TASK_THREAD_STATUS, task_struct, thread.status);
>>>>>        OFFSET(TASK_TI_CPU, task_struct, thread_info.cpu);
>>>>>        OFFSET(TASK_TI_PREEMPT_COUNT, task_struct, 
>>>>> thread_info.preempt_count);
>>>>> @@ -346,6 +347,10 @@ void asm_offsets(void)
>>>>>              offsetof(struct task_struct, thread.s[11])
>>>>>            - offsetof(struct task_struct, thread.ra)
>>>>>        );
>>>>> +    DEFINE(TASK_THREAD_STATUS_RA,
>>>>> +          offsetof(struct task_struct, thread.status)
>>>>> +        - offsetof(struct task_struct, thread.ra)
>>>>> +    );
>>>>>        DEFINE(TASK_THREAD_F0_F0,
>>>>>              offsetof(struct task_struct, thread.fstate.f[0])
>>>>> diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
>>>>> index 33a5a9f2a0d4..00bd0de9faa2 100644
>>>>> --- a/arch/riscv/kernel/entry.S
>>>>> +++ b/arch/riscv/kernel/entry.S
>>>>> @@ -397,9 +397,17 @@ SYM_FUNC_START(__switch_to)
>>>>>        REG_S s9,  TASK_THREAD_S9_RA(a3)
>>>>>        REG_S s10, TASK_THREAD_S10_RA(a3)
>>>>>        REG_S s11, TASK_THREAD_S11_RA(a3)
>>>>> +
>>>>> +    /* save the user space access flag */
>>>>> +    li    s0, SR_SUM
>>>>
>>>> This is not needed anymore ^ but I'll remove it when merging your 
>>>> patchset.
>>>>
>>> Could you be more specific about what "this" is?
>>>
>>> If we don't save/restore the SR_SUM bit I think our old friend
>>> the sched_tail bug will just return.
>> I think Alex is saying the `li` instruction above is not needed 
>> because s0 is
>> unused. But instead I think there is an `and` instruction missing 
>> here. The
>> patch as merged ORs the entirety of the old sstatus with the new 
>> sstatus, not
>> just the SUM bit, which seems extremely dangerous.
> 
> 
> I should have checked the definition of csrs, I thought it would write 
> the csr, but you're right it ORs with the current csr value which isn't 
> good at all.
> 
> @Cyril Can you send a patch for that? Which also removes the `li` 
> instruction that I forgot to remove :) I think we can even ask Palmer to 
> squash those fixes directly into the patch.

Yes can do, I'll whip something up.

Cyril
> 
> Let me know if you can't do it and I'll do.
> 
> Thanks Samuel for noticing,
> 
> Alex
> 
> 
>>
>> Regards,
>> Samuel
>>
>>>>> +    csrr  s1, CSR_STATUS
>>>>> +    REG_S s1, TASK_THREAD_STATUS_RA(a3)
>>>>> +
>>>>>        /* Save the kernel shadow call stack pointer */
>>>>>        scs_save_current
>>>>>        /* Restore context from next->thread */
>>>>> +    REG_L s0,  TASK_THREAD_STATUS_RA(a4)
>>>>> +    csrs  CSR_STATUS, s0
>>>>>        REG_L ra,  TASK_THREAD_RA_RA(a4)
>>>>>        REG_L sp,  TASK_THREAD_SP_RA(a4)
>>>>>        REG_L s0,  TASK_THREAD_S0_RA(a4)
>>>> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
>>>>
>>>> Thanks for the multiple revisions!
>>>>
>>>> Alex
>>>>
>>>>
>>>>
>>>
>>


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 1/5] riscv: save the SR_SUM status over switches
  2025-05-20 16:49     ` Deepak Gupta
@ 2025-05-22  6:23       ` Ben Dooks
  2025-05-22 14:49         ` Deepak Gupta
  0 siblings, 1 reply; 32+ messages in thread
From: Ben Dooks @ 2025-05-22  6:23 UTC (permalink / raw)
  To: Deepak Gupta, Cyril Bur
  Cc: palmer, aou, paul.walmsley, charlie, jrtc27, alex, linux-riscv,
	linux-kernel, jszhang, syzbot+e74b94fe601ab9552d69

On 20/05/2025 17:49, Deepak Gupta wrote:
> I did give this patch my RB and had planned to come back to it to see
> if it impacts cfi related patches. Thanks to alex for bringing it to my
> attention again. As it stands today, it doesn't impact cfi related
> changes but I've some concerns.
> 
> Overall I do agree we should reduce number of SSTATUS accesses.
> 
> Couple of questions on introducing new `sstatus` field (inline)
> 
> On Tue, Apr 22, 2025 at 04:01:35PM -0700, Deepak Gupta wrote:
>> On Thu, Apr 10, 2025 at 07:05:22AM +0000, Cyril Bur wrote:
>>> From: Ben Dooks <ben.dooks@codethink.co.uk>
>>>
>>> When threads/tasks are switched we need to ensure the old execution's
>>> SR_SUM state is saved and the new thread has the old SR_SUM state
>>> restored.
>>>
>>> The issue was seen under heavy load especially with the syz-stress tool
>>> running, with crashes as follows in schedule_tail:
>>>
>>> Unable to handle kernel access to user memory without uaccess routines
>>> at virtual address 000000002749f0d0
>>> Oops [#1]
>>> Modules linked in:
>>> CPU: 1 PID: 4875 Comm: syz-executor.0 Not tainted
>>> 5.12.0-rc2-syzkaller-00467-g0d7588ab9ef9 #0
>>> Hardware name: riscv-virtio,qemu (DT)
>>> epc : schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
>>> ra : task_pid_vnr include/linux/sched.h:1421 [inline]
>>> ra : schedule_tail+0x70/0xb2 kernel/sched/core.c:4264
>>> epc : ffffffe00008c8b0 ra : ffffffe00008c8ae sp : ffffffe025d17ec0
>>> gp : ffffffe005d25378 tp : ffffffe00f0d0000 t0 : 0000000000000000
>>> t1 : 0000000000000001 t2 : 00000000000f4240 s0 : ffffffe025d17ee0
>>> s1 : 000000002749f0d0 a0 : 000000000000002a a1 : 0000000000000003
>>> a2 : 1ffffffc0cfac500 a3 : ffffffe0000c80cc a4 : 5ae9db91c19bbe00
>>> a5 : 0000000000000000 a6 : 0000000000f00000 a7 : ffffffe000082eba
>>> s2 : 0000000000040000 s3 : ffffffe00eef96c0 s4 : ffffffe022c77fe0
>>> s5 : 0000000000004000 s6 : ffffffe067d74e00 s7 : ffffffe067d74850
>>> s8 : ffffffe067d73e18 s9 : ffffffe067d74e00 s10: ffffffe00eef96e8
>>> s11: 000000ae6cdf8368 t3 : 5ae9db91c19bbe00 t4 : ffffffc4043cafb2
>>> t5 : ffffffc4043cafba t6 : 0000000000040000
>>> status: 0000000000000120 badaddr: 000000002749f0d0 cause:
>>> 000000000000000f
>>> Call Trace:
>>> [<ffffffe00008c8b0>] schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
>>> [<ffffffe000005570>] ret_from_exception+0x0/0x14
>>> Dumping ftrace buffer:
>>>  (ftrace buffer empty)
>>> ---[ end trace b5f8f9231dc87dda ]---
>>>
>>> The issue comes from the put_user() in schedule_tail
>>> (kernel/sched/core.c) doing the following:
>>>
>>> asmlinkage __visible void schedule_tail(struct task_struct *prev)
>>> {
>>> ...
>>>       if (current->set_child_tid)
>>>               put_user(task_pid_vnr(current), current->set_child_tid);
>>> ...
>>> }
>>>
>>> the put_user() macro causes the code sequence to come out as follows:
>>>
>>> 1:    __enable_user_access()
>>> 2:    reg = task_pid_vnr(current);
>>> 3:    *current->set_child_tid = reg;
>>> 4:    __disable_user_access()
>>>
>>> The problem is that we may have a sleeping function as argument which
>>> could clear SR_SUM causing the panic above. This was fixed by
>>> evaluating the argument of the put_user() macro outside the user-enabled
>>> section in commit 285a76bb2cf5 ("riscv: evaluate put_user() arg before
>>> enabling user access")"
>>>
>>> In order for riscv to take advantage of unsafe_get/put_XXX() macros and
>>> to avoid the same issue we had with put_user() and sleeping functions we
>>> must ensure code flow can go through switch_to() from within a region of
>>> code with SR_SUM enabled and come back with SR_SUM still enabled. This
>>> patch addresses the problem allowing future work to enable full use of
>>> unsafe_get/put_XXX() macros without needing to take a CSR bit flip cost
>>> on every access. Make switch_to() save and restore SR_SUM.
>>>
>>> Reported-by: syzbot+e74b94fe601ab9552d69@syzkaller.appspotmail.com
>>> Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
>>> Signed-off-by: Cyril Bur <cyrilbur@tenstorrent.com>
>>> ---
>>> arch/riscv/include/asm/processor.h | 1 +
>>> arch/riscv/kernel/asm-offsets.c    | 5 +++++
>>> arch/riscv/kernel/entry.S          | 8 ++++++++
>>> 3 files changed, 14 insertions(+)
>>>
>>> diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/ 
>>> asm/processor.h
>>> index 5f56eb9d114a..58fd11c89fe9 100644
>>> --- a/arch/riscv/include/asm/processor.h
>>> +++ b/arch/riscv/include/asm/processor.h
>>> @@ -103,6 +103,7 @@ struct thread_struct {
>>>     struct __riscv_d_ext_state fstate;
>>>     unsigned long bad_cause;
>>>     unsigned long envcfg;
>>> +    unsigned long status;
> 
> Do we really need a new member field in `thread_struct`. We already have
> `sstatus` in `pt_regs` which reflects overall execution environment 
> situation
> for current thread. This gets saved and restored on trap entry and exit.
> 
> If we put `status` in `thread_struct` it creates ambiguity in terms of 
> which
> `status` to save to and pick from for future maintainability purposes
> as the
> fields get introduced to this CSR.
> 
> Why can't we access current trap frame's `sstatus` image in 
> `__switch_to` to
> save and restore?
> 
> Let me know if I am missing something obvious here. If there is a 
> complication,
> I am missing here and we do end up using this member field, I would 
> rename it
> to something like `status_kernel` to reflect that. So that future 
> changes are
> cognizant of the fact that we have split `status`. One for kernel 
> execution env
> per thread and one for controlling user execution env per thread.

This is so long ago now I cannot remember if there was any sstatus in
the pt_regs field, and if kernel threads have the same context as their
userland parts.

Does anyone else have any comment on this?

> 
>>>     u32 riscv_v_flags;
>>>     u32 vstate_ctrl;
>>>     struct __riscv_v_ext_state vstate;
>>> diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm- 
>>> offsets.c
>>> index 16490755304e..969c65b1fe41 100644
>>> --- a/arch/riscv/kernel/asm-offsets.c
>>> +++ b/arch/riscv/kernel/asm-offsets.c
>>> @@ -34,6 +34,7 @@ void asm_offsets(void)
>>>     OFFSET(TASK_THREAD_S9, task_struct, thread.s[9]);
>>>     OFFSET(TASK_THREAD_S10, task_struct, thread.s[10]);
>>>     OFFSET(TASK_THREAD_S11, task_struct, thread.s[11]);
> 
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv
> 


-- 
Ben Dooks				http://www.codethink.co.uk/
Senior Engineer				Codethink - Providing Genius

https://www.codethink.co.uk/privacy.html

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 1/5] riscv: save the SR_SUM status over switches
  2025-05-22  6:23       ` Ben Dooks
@ 2025-05-22 14:49         ` Deepak Gupta
  2025-05-22 17:42           ` Andy Chiu
  0 siblings, 1 reply; 32+ messages in thread
From: Deepak Gupta @ 2025-05-22 14:49 UTC (permalink / raw)
  To: Ben Dooks
  Cc: Cyril Bur, palmer, aou, paul.walmsley, charlie, jrtc27, alex,
	linux-riscv, linux-kernel, jszhang, syzbot+e74b94fe601ab9552d69

On Thu, May 22, 2025 at 07:23:32AM +0100, Ben Dooks wrote:
>On 20/05/2025 17:49, Deepak Gupta wrote:
>>I did give this patch my RB and had planned to come back to it to see
>>if it impacts cfi related patches. Thanks to alex for bringing it to my
>>attention again. As it stands today, it doesn't impact cfi related
>>changes but I've some concerns.
>>
>>Overall I do agree we should reduce number of SSTATUS accesses.
>>
>>Couple of questions on introducing new `sstatus` field (inline)
>>
>>On Tue, Apr 22, 2025 at 04:01:35PM -0700, Deepak Gupta wrote:
>>>On Thu, Apr 10, 2025 at 07:05:22AM +0000, Cyril Bur wrote:
>>>>From: Ben Dooks <ben.dooks@codethink.co.uk>
>>>>
>>>>When threads/tasks are switched we need to ensure the old execution's
>>>>SR_SUM state is saved and the new thread has the old SR_SUM state
>>>>restored.
>>>>
>>>>The issue was seen under heavy load especially with the syz-stress tool
>>>>running, with crashes as follows in schedule_tail:
>>>>
>>>>Unable to handle kernel access to user memory without uaccess routines
>>>>at virtual address 000000002749f0d0
>>>>Oops [#1]
>>>>Modules linked in:
>>>>CPU: 1 PID: 4875 Comm: syz-executor.0 Not tainted
>>>>5.12.0-rc2-syzkaller-00467-g0d7588ab9ef9 #0
>>>>Hardware name: riscv-virtio,qemu (DT)
>>>>epc : schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
>>>>ra : task_pid_vnr include/linux/sched.h:1421 [inline]
>>>>ra : schedule_tail+0x70/0xb2 kernel/sched/core.c:4264
>>>>epc : ffffffe00008c8b0 ra : ffffffe00008c8ae sp : ffffffe025d17ec0
>>>>gp : ffffffe005d25378 tp : ffffffe00f0d0000 t0 : 0000000000000000
>>>>t1 : 0000000000000001 t2 : 00000000000f4240 s0 : ffffffe025d17ee0
>>>>s1 : 000000002749f0d0 a0 : 000000000000002a a1 : 0000000000000003
>>>>a2 : 1ffffffc0cfac500 a3 : ffffffe0000c80cc a4 : 5ae9db91c19bbe00
>>>>a5 : 0000000000000000 a6 : 0000000000f00000 a7 : ffffffe000082eba
>>>>s2 : 0000000000040000 s3 : ffffffe00eef96c0 s4 : ffffffe022c77fe0
>>>>s5 : 0000000000004000 s6 : ffffffe067d74e00 s7 : ffffffe067d74850
>>>>s8 : ffffffe067d73e18 s9 : ffffffe067d74e00 s10: ffffffe00eef96e8
>>>>s11: 000000ae6cdf8368 t3 : 5ae9db91c19bbe00 t4 : ffffffc4043cafb2
>>>>t5 : ffffffc4043cafba t6 : 0000000000040000
>>>>status: 0000000000000120 badaddr: 000000002749f0d0 cause:
>>>>000000000000000f
>>>>Call Trace:
>>>>[<ffffffe00008c8b0>] schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
>>>>[<ffffffe000005570>] ret_from_exception+0x0/0x14
>>>>Dumping ftrace buffer:
>>>> (ftrace buffer empty)
>>>>---[ end trace b5f8f9231dc87dda ]---
>>>>
>>>>The issue comes from the put_user() in schedule_tail
>>>>(kernel/sched/core.c) doing the following:
>>>>
>>>>asmlinkage __visible void schedule_tail(struct task_struct *prev)
>>>>{
>>>>...
>>>>      if (current->set_child_tid)
>>>>              put_user(task_pid_vnr(current), current->set_child_tid);
>>>>...
>>>>}
>>>>
>>>>the put_user() macro causes the code sequence to come out as follows:
>>>>
>>>>1:    __enable_user_access()
>>>>2:    reg = task_pid_vnr(current);
>>>>3:    *current->set_child_tid = reg;
>>>>4:    __disable_user_access()
>>>>
>>>>The problem is that we may have a sleeping function as argument which
>>>>could clear SR_SUM causing the panic above. This was fixed by
>>>>evaluating the argument of the put_user() macro outside the user-enabled
>>>>section in commit 285a76bb2cf5 ("riscv: evaluate put_user() arg before
>>>>enabling user access")"
>>>>
>>>>In order for riscv to take advantage of unsafe_get/put_XXX() macros and
>>>>to avoid the same issue we had with put_user() and sleeping functions we
>>>>must ensure code flow can go through switch_to() from within a region of
>>>>code with SR_SUM enabled and come back with SR_SUM still enabled. This
>>>>patch addresses the problem allowing future work to enable full use of
>>>>unsafe_get/put_XXX() macros without needing to take a CSR bit flip cost
>>>>on every access. Make switch_to() save and restore SR_SUM.
>>>>
>>>>Reported-by: syzbot+e74b94fe601ab9552d69@syzkaller.appspotmail.com
>>>>Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
>>>>Signed-off-by: Cyril Bur <cyrilbur@tenstorrent.com>
>>>>---
>>>>arch/riscv/include/asm/processor.h | 1 +
>>>>arch/riscv/kernel/asm-offsets.c    | 5 +++++
>>>>arch/riscv/kernel/entry.S          | 8 ++++++++
>>>>3 files changed, 14 insertions(+)
>>>>
>>>>diff --git a/arch/riscv/include/asm/processor.h 
>>>>b/arch/riscv/include/ asm/processor.h
>>>>index 5f56eb9d114a..58fd11c89fe9 100644
>>>>--- a/arch/riscv/include/asm/processor.h
>>>>+++ b/arch/riscv/include/asm/processor.h
>>>>@@ -103,6 +103,7 @@ struct thread_struct {
>>>>    struct __riscv_d_ext_state fstate;
>>>>    unsigned long bad_cause;
>>>>    unsigned long envcfg;
>>>>+    unsigned long status;
>>
>>Do we really need a new member field in `thread_struct`. We already have
>>`sstatus` in `pt_regs` which reflects overall execution environment 
>>situation
>>for current thread. This gets saved and restored on trap entry and exit.
>>
>>If we put `status` in `thread_struct` it creates ambiguity in terms 
>>of which
>>`status` to save to and pick from for future maintainability
>>purposes as the
>>fields get introduced to this CSR.
>>
>>Why can't we access current trap frame's `sstatus` image in 
>>`__switch_to` to
>>save and restore?
>>
>>Let me know if I am missing something obvious here. If there is a 
>>complication,
>>I am missing here and we do end up using this member field, I would 
>>rename it
>>to something like `status_kernel` to reflect that. So that future 
>>changes are
>>cognizant of the fact that we have split `status`. One for kernel 
>>execution env
>>per thread and one for controlling user execution env per thread.
>
>This is so long ago now I cannot remember if there was any sstatus in
>the pt_regs field, 

The FS/VS bits encode the status of floating point and vector on a per-thread
basis. So `status` has been part of `pt_regs` for quite a while.

> and if kernel threads have the same context as their
>userland parts.

I didn't mean kernel threads. What I meant was the kernel execution environment
per-thread. A userland thread does spend some time in the kernel, and the kernel
does things on its behalf. One of those things is touching user memory, and that
requires mucking with this CSR. So what I meant was: are we splitting `status`
on a per-thread basis for the time spent in user mode and in the kernel?

Getting back to the original question:
As I said, each thread spends some time in user mode and some in the kernel.
`status` in `pt_regs` is saved on trap entry and restored on trap exit. In a
sense, the `status` field in `pt_regs` reflects the execution status of the
thread on a per-trap basis. Introducing `status` in `thread_struct` creates
confusion (if not today, certainly in the future) about which `status` to pick
from when we are doing save/restore.

So my first question was: why not use `status` in `pt_regs`? It is as granular
as it can get (it is available per thread context, per trap).
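
In code terms, this is roughly what I was asking about (just a sketch, helper
name made up; whether the trap frame is actually the right thing to consult at
context switch time is of course part of the question):

/*
 * Sketch: read a task's SUM state from its trap frame instead of adding a
 * new field to thread_struct. Relies only on the existing task_pt_regs()
 * helper and the `status` member of struct pt_regs.
 */
static inline unsigned long task_saved_sum(struct task_struct *tsk)
{
	return task_pt_regs(tsk)->status & SR_SUM;
}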


I did ask Alex as well. I'll ping him again.

>
>Does anyone else have any comment on this?
>
>>
>>>>    u32 riscv_v_flags;
>>>>    u32 vstate_ctrl;
>>>>    struct __riscv_v_ext_state vstate;
>>>>diff --git a/arch/riscv/kernel/asm-offsets.c 
>>>>b/arch/riscv/kernel/asm- offsets.c
>>>>index 16490755304e..969c65b1fe41 100644
>>>>--- a/arch/riscv/kernel/asm-offsets.c
>>>>+++ b/arch/riscv/kernel/asm-offsets.c
>>>>@@ -34,6 +34,7 @@ void asm_offsets(void)
>>>>    OFFSET(TASK_THREAD_S9, task_struct, thread.s[9]);
>>>>    OFFSET(TASK_THREAD_S10, task_struct, thread.s[10]);
>>>>    OFFSET(TASK_THREAD_S11, task_struct, thread.s[11]);
>>
>>_______________________________________________
>>linux-riscv mailing list
>>linux-riscv@lists.infradead.org
>>http://lists.infradead.org/mailman/listinfo/linux-riscv
>>
>
>
>-- 
>Ben Dooks				http://www.codethink.co.uk/
>Senior Engineer				Codethink - Providing Genius
>
>https://www.codethink.co.uk/privacy.html

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [EXT] Re: [PATCH v6 1/5] riscv: save the SR_SUM status over switches
  2025-05-21 14:30         ` Alexandre Ghiti
  2025-05-21 14:45           ` Cyril Bur
@ 2025-05-22 16:15           ` Cyril Bur
  2025-05-22 17:40           ` Andy Chiu
  2 siblings, 0 replies; 32+ messages in thread
From: Cyril Bur @ 2025-05-22 16:15 UTC (permalink / raw)
  To: Alexandre Ghiti, Samuel Holland, Ben Dooks, palmer
  Cc: linux-riscv, linux-kernel, jszhang, syzbot+e74b94fe601ab9552d69,
	aou, paul.walmsley, charlie, jrtc27

Hi all,

On 21/5/2025 12:30 am, Alexandre Ghiti wrote:
> Hi Samuel,
> 
> On 5/21/25 15:38, Samuel Holland wrote:
>> Hi Alex, Ben,
>>
>> On 2025-05-21 3:26 AM, Ben Dooks wrote:
>>> On 22/04/2025 11:22, Alexandre Ghiti wrote:
>>>> Hi Cyril,
>>>>
>>>> On 10/04/2025 09:05, Cyril Bur wrote:
>>>>> From: Ben Dooks <ben.dooks@codethink.co.uk>
>>>>>
>>>>> When threads/tasks are switched we need to ensure the old execution's
>>>>> SR_SUM state is saved and the new thread has the old SR_SUM state
>>>>> restored.
>>>>>
>>>>> The issue was seen under heavy load especially with the syz-stress 
>>>>> tool
>>>>> running, with crashes as follows in schedule_tail:
>>>>>
>>>>> Unable to handle kernel access to user memory without uaccess routines
>>>>> at virtual address 000000002749f0d0
>>>>> Oops [#1]
>>>>> Modules linked in:
>>>>> CPU: 1 PID: 4875 Comm: syz-executor.0 Not tainted
>>>>> 5.12.0-rc2-syzkaller-00467-g0d7588ab9ef9 #0
>>>>> Hardware name: riscv-virtio,qemu (DT)
>>>>> epc : schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
>>>>>    ra : task_pid_vnr include/linux/sched.h:1421 [inline]
>>>>>    ra : schedule_tail+0x70/0xb2 kernel/sched/core.c:4264
>>>>> epc : ffffffe00008c8b0 ra : ffffffe00008c8ae sp : ffffffe025d17ec0
>>>>>    gp : ffffffe005d25378 tp : ffffffe00f0d0000 t0 : 0000000000000000
>>>>>    t1 : 0000000000000001 t2 : 00000000000f4240 s0 : ffffffe025d17ee0
>>>>>    s1 : 000000002749f0d0 a0 : 000000000000002a a1 : 0000000000000003
>>>>>    a2 : 1ffffffc0cfac500 a3 : ffffffe0000c80cc a4 : 5ae9db91c19bbe00
>>>>>    a5 : 0000000000000000 a6 : 0000000000f00000 a7 : ffffffe000082eba
>>>>>    s2 : 0000000000040000 s3 : ffffffe00eef96c0 s4 : ffffffe022c77fe0
>>>>>    s5 : 0000000000004000 s6 : ffffffe067d74e00 s7 : ffffffe067d74850
>>>>>    s8 : ffffffe067d73e18 s9 : ffffffe067d74e00 s10: ffffffe00eef96e8
>>>>>    s11: 000000ae6cdf8368 t3 : 5ae9db91c19bbe00 t4 : ffffffc4043cafb2
>>>>>    t5 : ffffffc4043cafba t6 : 0000000000040000
>>>>> status: 0000000000000120 badaddr: 000000002749f0d0 cause:
>>>>> 000000000000000f
>>>>> Call Trace:
>>>>> [<ffffffe00008c8b0>] schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
>>>>> [<ffffffe000005570>] ret_from_exception+0x0/0x14
>>>>> Dumping ftrace buffer:
>>>>>      (ftrace buffer empty)
>>>>> ---[ end trace b5f8f9231dc87dda ]---
>>>>>
>>>>> The issue comes from the put_user() in schedule_tail
>>>>> (kernel/sched/core.c) doing the following:
>>>>>
>>>>> asmlinkage __visible void schedule_tail(struct task_struct *prev)
>>>>> {
>>>>> ...
>>>>>           if (current->set_child_tid)
>>>>>                   put_user(task_pid_vnr(current), current- 
>>>>> >set_child_tid);
>>>>> ...
>>>>> }
>>>>>
>>>>> the put_user() macro causes the code sequence to come out as follows:
>>>>>
>>>>> 1:    __enable_user_access()
>>>>> 2:    reg = task_pid_vnr(current);
>>>>> 3:    *current->set_child_tid = reg;
>>>>> 4:    __disable_user_access()
>>>>>
>>>>> The problem is that we may have a sleeping function as argument which
>>>>> could clear SR_SUM causing the panic above. This was fixed by
>>>>> evaluating the argument of the put_user() macro outside the user- 
>>>>> enabled
>>>>> section in commit 285a76bb2cf5 ("riscv: evaluate put_user() arg before
>>>>> enabling user access")"
>>>>>
>>>>> In order for riscv to take advantage of unsafe_get/put_XXX() macros 
>>>>> and
>>>>> to avoid the same issue we had with put_user() and sleeping 
>>>>> functions we
>>>>> must ensure code flow can go through switch_to() from within a 
>>>>> region of
>>>>> code with SR_SUM enabled and come back with SR_SUM still enabled. This
>>>>> patch addresses the problem allowing future work to enable full use of
>>>>> unsafe_get/put_XXX() macros without needing to take a CSR bit flip 
>>>>> cost
>>>>> on every access. Make switch_to() save and restore SR_SUM.
>>>>>
>>>>> Reported-by: syzbot+e74b94fe601ab9552d69@syzkaller.appspotmail.com
>>>>> Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
>>>>> Signed-off-by: Cyril Bur <cyrilbur@tenstorrent.com>
>>>>> ---
>>>>>    arch/riscv/include/asm/processor.h | 1 +
>>>>>    arch/riscv/kernel/asm-offsets.c    | 5 +++++
>>>>>    arch/riscv/kernel/entry.S          | 8 ++++++++
>>>>>    3 files changed, 14 insertions(+)
>>>>>
>>>>> diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/ 
>>>>> include/ asm/
>>>>> processor.h
>>>>> index 5f56eb9d114a..58fd11c89fe9 100644
>>>>> --- a/arch/riscv/include/asm/processor.h
>>>>> +++ b/arch/riscv/include/asm/processor.h
>>>>> @@ -103,6 +103,7 @@ struct thread_struct {
>>>>>        struct __riscv_d_ext_state fstate;
>>>>>        unsigned long bad_cause;
>>>>>        unsigned long envcfg;
>>>>> +    unsigned long status;
>>>>>        u32 riscv_v_flags;
>>>>>        u32 vstate_ctrl;
>>>>>        struct __riscv_v_ext_state vstate;
>>>>> diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/ 
>>>>> asm- offsets.c
>>>>> index 16490755304e..969c65b1fe41 100644
>>>>> --- a/arch/riscv/kernel/asm-offsets.c
>>>>> +++ b/arch/riscv/kernel/asm-offsets.c
>>>>> @@ -34,6 +34,7 @@ void asm_offsets(void)
>>>>>        OFFSET(TASK_THREAD_S9, task_struct, thread.s[9]);
>>>>>        OFFSET(TASK_THREAD_S10, task_struct, thread.s[10]);
>>>>>        OFFSET(TASK_THREAD_S11, task_struct, thread.s[11]);
>>>>> +    OFFSET(TASK_THREAD_STATUS, task_struct, thread.status);
>>>>>        OFFSET(TASK_TI_CPU, task_struct, thread_info.cpu);
>>>>>        OFFSET(TASK_TI_PREEMPT_COUNT, task_struct, 
>>>>> thread_info.preempt_count);
>>>>> @@ -346,6 +347,10 @@ void asm_offsets(void)
>>>>>              offsetof(struct task_struct, thread.s[11])
>>>>>            - offsetof(struct task_struct, thread.ra)
>>>>>        );
>>>>> +    DEFINE(TASK_THREAD_STATUS_RA,
>>>>> +          offsetof(struct task_struct, thread.status)
>>>>> +        - offsetof(struct task_struct, thread.ra)
>>>>> +    );
>>>>>        DEFINE(TASK_THREAD_F0_F0,
>>>>>              offsetof(struct task_struct, thread.fstate.f[0])
>>>>> diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
>>>>> index 33a5a9f2a0d4..00bd0de9faa2 100644
>>>>> --- a/arch/riscv/kernel/entry.S
>>>>> +++ b/arch/riscv/kernel/entry.S
>>>>> @@ -397,9 +397,17 @@ SYM_FUNC_START(__switch_to)
>>>>>        REG_S s9,  TASK_THREAD_S9_RA(a3)
>>>>>        REG_S s10, TASK_THREAD_S10_RA(a3)
>>>>>        REG_S s11, TASK_THREAD_S11_RA(a3)
>>>>> +
>>>>> +    /* save the user space access flag */
>>>>> +    li    s0, SR_SUM
>>>>
>>>> This is not needed anymore ^ but I'll remove it when merging your 
>>>> patchset.
>>>>
>>> Could you be more specific about what "this" is?
>>>
>>> If we don't save/restore the SR_SUM bit I think our old friend
>>> the sched_tail bug will just return.
>> I think Alex is saying the `li` instruction above is not needed 
>> because s0 is
>> unused. But instead I think there is an `and` instruction missing 
>> here. The
>> patch as merged ORs the entirety of the old sstatus with the new 
>> sstatus, not
>> just the SUM bit, which seems extremely dangerous.
> 
> 
> I should have checked the definition of csrs, I thought it would write 
> the csr, but you're right it ORs with the current csr value which isn't 
> good at all.
> 
> @Cyril Can you send a patch for that? Which also removes the `li` 
> instruction that I forgot to remove :) I think we can even ask Palmer to 
> squash those fixes directly into the patch.

So I've sent a patch. In writing it, I think Ben was correct to have the 
original patch clear the SUM bit. The way we have it now, if the SUM bit 
is ever set, we don't clear it when swapping to the new thread. The 
condition is unlikely, but if you extrapolate far enough, in theory we
could end up running with the SUM bit effectively permanently on.

Should I resend with also clearing the SUM bit in between?
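
To make the question concrete, the semantics I have in mind look roughly like
the C sketch below (function name made up, the real logic stays in assembly in
__switch_to; it only relies on SR_SUM, the csr_* helpers and the thread.status
field added by this patch):

/*
 * Save the outgoing task's SUM bit, make sure SUM is clear, then set it
 * again only if the incoming task had it set. This way a set SUM bit can
 * never leak from one task into another.
 */
static inline void switch_sr_sum(struct task_struct *prev,
				 struct task_struct *next)
{
	prev->thread.status = csr_read(CSR_STATUS) & SR_SUM;
	csr_clear(CSR_STATUS, SR_SUM);
	csr_set(CSR_STATUS, next->thread.status);
}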

Cyril
> 
> Let me know if you can't do it and I'll do.
> 
> Thanks Samuel for noticing,
> 
> Alex
> 
> 
>>
>> Regards,
>> Samuel
>>
>>>>> +    csrr  s1, CSR_STATUS
>>>>> +    REG_S s1, TASK_THREAD_STATUS_RA(a3)
>>>>> +
>>>>>        /* Save the kernel shadow call stack pointer */
>>>>>        scs_save_current
>>>>>        /* Restore context from next->thread */
>>>>> +    REG_L s0,  TASK_THREAD_STATUS_RA(a4)
>>>>> +    csrs  CSR_STATUS, s0
>>>>>        REG_L ra,  TASK_THREAD_RA_RA(a4)
>>>>>        REG_L sp,  TASK_THREAD_SP_RA(a4)
>>>>>        REG_L s0,  TASK_THREAD_S0_RA(a4)
>>>> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
>>>>
>>>> Thanks for the multiple revisions!
>>>>
>>>> Alex
>>>>
>>>>
>>>> _______________________________________________
>>>> linux-riscv mailing list
>>>> linux-riscv@lists.infradead.org
>>>> http://lists.infradead.org/mailman/listinfo/linux-riscv
>>>>
>>>
>>
>> _______________________________________________
>> linux-riscv mailing list
>> linux-riscv@lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/linux-riscv


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 1/5] riscv: save the SR_SUM status over switches
  2025-05-21 14:30         ` Alexandre Ghiti
  2025-05-21 14:45           ` Cyril Bur
  2025-05-22 16:15           ` [EXT] " Cyril Bur
@ 2025-05-22 17:40           ` Andy Chiu
  2025-05-22 20:03             ` Ben Dooks
  2 siblings, 1 reply; 32+ messages in thread
From: Andy Chiu @ 2025-05-22 17:40 UTC (permalink / raw)
  To: Alexandre Ghiti
  Cc: Samuel Holland, Ben Dooks, palmer, linux-riscv, linux-kernel,
	jszhang, syzbot+e74b94fe601ab9552d69, Cyril Bur, aou,
	paul.walmsley, charlie, jrtc27

Hi Samuel and Alex,

On Wed, May 21, 2025 at 10:35 PM Alexandre Ghiti <alex@ghiti.fr> wrote:
>
> Hi Samuel,
>
> On 5/21/25 15:38, Samuel Holland wrote:
> > Hi Alex, Ben,
> >
> > On 2025-05-21 3:26 AM, Ben Dooks wrote:
> >> On 22/04/2025 11:22, Alexandre Ghiti wrote:
> >>> Hi Cyril,
> >>>
> >>> On 10/04/2025 09:05, Cyril Bur wrote:
> >>>> From: Ben Dooks <ben.dooks@codethink.co.uk>
> >>>>
> >>>> When threads/tasks are switched we need to ensure the old execution's
> >>>> SR_SUM state is saved and the new thread has the old SR_SUM state
> >>>> restored.
> >>>>
> >>>> The issue was seen under heavy load especially with the syz-stress tool
> >>>> running, with crashes as follows in schedule_tail:
> >>>>
> >>>> Unable to handle kernel access to user memory without uaccess routines
> >>>> at virtual address 000000002749f0d0
> >>>> Oops [#1]
> >>>> Modules linked in:
> >>>> CPU: 1 PID: 4875 Comm: syz-executor.0 Not tainted
> >>>> 5.12.0-rc2-syzkaller-00467-g0d7588ab9ef9 #0
> >>>> Hardware name: riscv-virtio,qemu (DT)
> >>>> epc : schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
> >>>>    ra : task_pid_vnr include/linux/sched.h:1421 [inline]
> >>>>    ra : schedule_tail+0x70/0xb2 kernel/sched/core.c:4264
> >>>> epc : ffffffe00008c8b0 ra : ffffffe00008c8ae sp : ffffffe025d17ec0
> >>>>    gp : ffffffe005d25378 tp : ffffffe00f0d0000 t0 : 0000000000000000
> >>>>    t1 : 0000000000000001 t2 : 00000000000f4240 s0 : ffffffe025d17ee0
> >>>>    s1 : 000000002749f0d0 a0 : 000000000000002a a1 : 0000000000000003
> >>>>    a2 : 1ffffffc0cfac500 a3 : ffffffe0000c80cc a4 : 5ae9db91c19bbe00
> >>>>    a5 : 0000000000000000 a6 : 0000000000f00000 a7 : ffffffe000082eba
> >>>>    s2 : 0000000000040000 s3 : ffffffe00eef96c0 s4 : ffffffe022c77fe0
> >>>>    s5 : 0000000000004000 s6 : ffffffe067d74e00 s7 : ffffffe067d74850
> >>>>    s8 : ffffffe067d73e18 s9 : ffffffe067d74e00 s10: ffffffe00eef96e8
> >>>>    s11: 000000ae6cdf8368 t3 : 5ae9db91c19bbe00 t4 : ffffffc4043cafb2
> >>>>    t5 : ffffffc4043cafba t6 : 0000000000040000
> >>>> status: 0000000000000120 badaddr: 000000002749f0d0 cause:
> >>>> 000000000000000f
> >>>> Call Trace:
> >>>> [<ffffffe00008c8b0>] schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
> >>>> [<ffffffe000005570>] ret_from_exception+0x0/0x14
> >>>> Dumping ftrace buffer:
> >>>>      (ftrace buffer empty)
> >>>> ---[ end trace b5f8f9231dc87dda ]---
> >>>>
> >>>> The issue comes from the put_user() in schedule_tail
> >>>> (kernel/sched/core.c) doing the following:
> >>>>
> >>>> asmlinkage __visible void schedule_tail(struct task_struct *prev)
> >>>> {
> >>>> ...
> >>>>           if (current->set_child_tid)
> >>>>                   put_user(task_pid_vnr(current), current->set_child_tid);
> >>>> ...
> >>>> }
> >>>>
> >>>> the put_user() macro causes the code sequence to come out as follows:
> >>>>
> >>>> 1:    __enable_user_access()
> >>>> 2:    reg = task_pid_vnr(current);
> >>>> 3:    *current->set_child_tid = reg;
> >>>> 4:    __disable_user_access()
> >>>>
> >>>> The problem is that we may have a sleeping function as argument which
> >>>> could clear SR_SUM causing the panic above. This was fixed by
> >>>> evaluating the argument of the put_user() macro outside the user-enabled
> >>>> section in commit 285a76bb2cf5 ("riscv: evaluate put_user() arg before
> >>>> enabling user access")"
> >>>>
> >>>> In order for riscv to take advantage of unsafe_get/put_XXX() macros and
> >>>> to avoid the same issue we had with put_user() and sleeping functions we
> >>>> must ensure code flow can go through switch_to() from within a region of
> >>>> code with SR_SUM enabled and come back with SR_SUM still enabled. This
> >>>> patch addresses the problem allowing future work to enable full use of
> >>>> unsafe_get/put_XXX() macros without needing to take a CSR bit flip cost
> >>>> on every access. Make switch_to() save and restore SR_SUM.
> >>>>
> >>>> Reported-by: syzbot+e74b94fe601ab9552d69@syzkaller.appspotmail.com
> >>>> Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
> >>>> Signed-off-by: Cyril Bur <cyrilbur@tenstorrent.com>
> >>>> ---
> >>>>    arch/riscv/include/asm/processor.h | 1 +
> >>>>    arch/riscv/kernel/asm-offsets.c    | 5 +++++
> >>>>    arch/riscv/kernel/entry.S          | 8 ++++++++
> >>>>    3 files changed, 14 insertions(+)
> >>>>
> >>>> diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/ asm/
> >>>> processor.h
> >>>> index 5f56eb9d114a..58fd11c89fe9 100644
> >>>> --- a/arch/riscv/include/asm/processor.h
> >>>> +++ b/arch/riscv/include/asm/processor.h
> >>>> @@ -103,6 +103,7 @@ struct thread_struct {
> >>>>        struct __riscv_d_ext_state fstate;
> >>>>        unsigned long bad_cause;
> >>>>        unsigned long envcfg;
> >>>> +    unsigned long status;
> >>>>        u32 riscv_v_flags;
> >>>>        u32 vstate_ctrl;
> >>>>        struct __riscv_v_ext_state vstate;
> >>>> diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm- offsets.c
> >>>> index 16490755304e..969c65b1fe41 100644
> >>>> --- a/arch/riscv/kernel/asm-offsets.c
> >>>> +++ b/arch/riscv/kernel/asm-offsets.c
> >>>> @@ -34,6 +34,7 @@ void asm_offsets(void)
> >>>>        OFFSET(TASK_THREAD_S9, task_struct, thread.s[9]);
> >>>>        OFFSET(TASK_THREAD_S10, task_struct, thread.s[10]);
> >>>>        OFFSET(TASK_THREAD_S11, task_struct, thread.s[11]);
> >>>> +    OFFSET(TASK_THREAD_STATUS, task_struct, thread.status);
> >>>>        OFFSET(TASK_TI_CPU, task_struct, thread_info.cpu);
> >>>>        OFFSET(TASK_TI_PREEMPT_COUNT, task_struct, thread_info.preempt_count);
> >>>> @@ -346,6 +347,10 @@ void asm_offsets(void)
> >>>>              offsetof(struct task_struct, thread.s[11])
> >>>>            - offsetof(struct task_struct, thread.ra)
> >>>>        );
> >>>> +    DEFINE(TASK_THREAD_STATUS_RA,
> >>>> +          offsetof(struct task_struct, thread.status)
> >>>> +        - offsetof(struct task_struct, thread.ra)
> >>>> +    );
> >>>>        DEFINE(TASK_THREAD_F0_F0,
> >>>>              offsetof(struct task_struct, thread.fstate.f[0])
> >>>> diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
> >>>> index 33a5a9f2a0d4..00bd0de9faa2 100644
> >>>> --- a/arch/riscv/kernel/entry.S
> >>>> +++ b/arch/riscv/kernel/entry.S
> >>>> @@ -397,9 +397,17 @@ SYM_FUNC_START(__switch_to)
> >>>>        REG_S s9,  TASK_THREAD_S9_RA(a3)
> >>>>        REG_S s10, TASK_THREAD_S10_RA(a3)
> >>>>        REG_S s11, TASK_THREAD_S11_RA(a3)
> >>>> +
> >>>> +    /* save the user space access flag */
> >>>> +    li    s0, SR_SUM
> >>>
> >>> This is not needed anymore ^ but I'll remove it when merging your patchset.
> >>>
> >> Could you be more specific about what "this" is?
> >>
> >> If we don't save/restore the SR_SUM bit I think our old friend
> >> the sched_tail bug will just return.
> > I think Alex is saying the `li` instruction above is not needed because s0 is
> > unused. But instead I think there is an `and` instruction missing here. The
> > patch as merged ORs the entirety of the old sstatus with the new sstatus, not
> > just the SUM bit, which seems extremely dangerous.
>

Thanks for noticing this. I've also spent a bit of time pondering...

If this were masked with an "and" instruction, I think we should rename
the member to "status_sum" to prevent confusion, as it would then only
hold the SUM bit. Or maybe we could define a bit mask and only touch
(and save/restore) the specified bits.

Thanks,
Andy


>
> I should have checked the definition of csrs, I thought it would write
> the csr, but you're right it ORs with the current csr value which isn't
> good at all.
>
> @Cyril Can you send a patch for that? Which also removes the `li`
> instruction that I forgot to remove :) I think we can even ask Palmer to
> squash those fixes directly into the patch.
>
> Let me know if you can't do it and I'll do.
>
> Thanks Samuel for noticing,
>
> Alex
>
>
> >
> > Regards,
> > Samuel
> >
> >>>> +    csrr  s1, CSR_STATUS
> >>>> +    REG_S s1, TASK_THREAD_STATUS_RA(a3)
> >>>> +
> >>>>        /* Save the kernel shadow call stack pointer */
> >>>>        scs_save_current
> >>>>        /* Restore context from next->thread */
> >>>> +    REG_L s0,  TASK_THREAD_STATUS_RA(a4)
> >>>> +    csrs  CSR_STATUS, s0
> >>>>        REG_L ra,  TASK_THREAD_RA_RA(a4)
> >>>>        REG_L sp,  TASK_THREAD_SP_RA(a4)
> >>>>        REG_L s0,  TASK_THREAD_S0_RA(a4)
> >>> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
> >>>
> >>> Thanks for the multiple revisions!
> >>>
> >>> Alex
> >>>
> >>>
> >>> _______________________________________________
> >>> linux-riscv mailing list
> >>> linux-riscv@lists.infradead.org
> >>> http://lists.infradead.org/mailman/listinfo/linux-riscv
> >>>
> >>
> >
> > _______________________________________________
> > linux-riscv mailing list
> > linux-riscv@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-riscv
>
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 1/5] riscv: save the SR_SUM status over switches
  2025-05-22 14:49         ` Deepak Gupta
@ 2025-05-22 17:42           ` Andy Chiu
  2025-05-22 22:43             ` Deepak Gupta
  0 siblings, 1 reply; 32+ messages in thread
From: Andy Chiu @ 2025-05-22 17:42 UTC (permalink / raw)
  To: Deepak Gupta
  Cc: Ben Dooks, Cyril Bur, palmer, aou, paul.walmsley, charlie, jrtc27,
	alex, linux-riscv, linux-kernel, jszhang,
	syzbot+e74b94fe601ab9552d69

On Thu, May 22, 2025 at 11:09 PM Deepak Gupta <debug@rivosinc.com> wrote:
>
> On Thu, May 22, 2025 at 07:23:32AM +0100, Ben Dooks wrote:
> >On 20/05/2025 17:49, Deepak Gupta wrote:
> >>I did give this patch my RB and had planned to come back to it to see
> >>if it impacts cfi related patches. Thanks to alex for bringing it to my
> >>attention again. As it stands today, it doesn't impact cfi related
> >>changes but I've some concerns.
> >>
> >>Overall I do agree we should reduce number of SSTATUS accesses.
> >>
> >>Couple of questions on introducing new `sstatus` field (inline)
> >>
> >>On Tue, Apr 22, 2025 at 04:01:35PM -0700, Deepak Gupta wrote:
> >>>On Thu, Apr 10, 2025 at 07:05:22AM +0000, Cyril Bur wrote:
> >>>>From: Ben Dooks <ben.dooks@codethink.co.uk>
> >>>>
> >>>>When threads/tasks are switched we need to ensure the old execution's
> >>>>SR_SUM state is saved and the new thread has the old SR_SUM state
> >>>>restored.
> >>>>
> >>>>The issue was seen under heavy load especially with the syz-stress tool
> >>>>running, with crashes as follows in schedule_tail:
> >>>>
> >>>>Unable to handle kernel access to user memory without uaccess routines
> >>>>at virtual address 000000002749f0d0
> >>>>Oops [#1]
> >>>>Modules linked in:
> >>>>CPU: 1 PID: 4875 Comm: syz-executor.0 Not tainted
> >>>>5.12.0-rc2-syzkaller-00467-g0d7588ab9ef9 #0
> >>>>Hardware name: riscv-virtio,qemu (DT)
> >>>>epc : schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
> >>>>ra : task_pid_vnr include/linux/sched.h:1421 [inline]
> >>>>ra : schedule_tail+0x70/0xb2 kernel/sched/core.c:4264
> >>>>epc : ffffffe00008c8b0 ra : ffffffe00008c8ae sp : ffffffe025d17ec0
> >>>>gp : ffffffe005d25378 tp : ffffffe00f0d0000 t0 : 0000000000000000
> >>>>t1 : 0000000000000001 t2 : 00000000000f4240 s0 : ffffffe025d17ee0
> >>>>s1 : 000000002749f0d0 a0 : 000000000000002a a1 : 0000000000000003
> >>>>a2 : 1ffffffc0cfac500 a3 : ffffffe0000c80cc a4 : 5ae9db91c19bbe00
> >>>>a5 : 0000000000000000 a6 : 0000000000f00000 a7 : ffffffe000082eba
> >>>>s2 : 0000000000040000 s3 : ffffffe00eef96c0 s4 : ffffffe022c77fe0
> >>>>s5 : 0000000000004000 s6 : ffffffe067d74e00 s7 : ffffffe067d74850
> >>>>s8 : ffffffe067d73e18 s9 : ffffffe067d74e00 s10: ffffffe00eef96e8
> >>>>s11: 000000ae6cdf8368 t3 : 5ae9db91c19bbe00 t4 : ffffffc4043cafb2
> >>>>t5 : ffffffc4043cafba t6 : 0000000000040000
> >>>>status: 0000000000000120 badaddr: 000000002749f0d0 cause:
> >>>>000000000000000f
> >>>>Call Trace:
> >>>>[<ffffffe00008c8b0>] schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
> >>>>[<ffffffe000005570>] ret_from_exception+0x0/0x14
> >>>>Dumping ftrace buffer:
> >>>> (ftrace buffer empty)
> >>>>---[ end trace b5f8f9231dc87dda ]---
> >>>>
> >>>>The issue comes from the put_user() in schedule_tail
> >>>>(kernel/sched/core.c) doing the following:
> >>>>
> >>>>asmlinkage __visible void schedule_tail(struct task_struct *prev)
> >>>>{
> >>>>...
> >>>>      if (current->set_child_tid)
> >>>>              put_user(task_pid_vnr(current), current->set_child_tid);
> >>>>...
> >>>>}
> >>>>
> >>>>the put_user() macro causes the code sequence to come out as follows:
> >>>>
> >>>>1:    __enable_user_access()
> >>>>2:    reg = task_pid_vnr(current);
> >>>>3:    *current->set_child_tid = reg;
> >>>>4:    __disable_user_access()
> >>>>
> >>>>The problem is that we may have a sleeping function as argument which
> >>>>could clear SR_SUM causing the panic above. This was fixed by
> >>>>evaluating the argument of the put_user() macro outside the user-enabled
> >>>>section in commit 285a76bb2cf5 ("riscv: evaluate put_user() arg before
> >>>>enabling user access")"
> >>>>
> >>>>In order for riscv to take advantage of unsafe_get/put_XXX() macros and
> >>>>to avoid the same issue we had with put_user() and sleeping functions we
> >>>>must ensure code flow can go through switch_to() from within a region of
> >>>>code with SR_SUM enabled and come back with SR_SUM still enabled. This
> >>>>patch addresses the problem allowing future work to enable full use of
> >>>>unsafe_get/put_XXX() macros without needing to take a CSR bit flip cost
> >>>>on every access. Make switch_to() save and restore SR_SUM.
> >>>>
> >>>>Reported-by: syzbot+e74b94fe601ab9552d69@syzkaller.appspotmail.com
> >>>>Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
> >>>>Signed-off-by: Cyril Bur <cyrilbur@tenstorrent.com>
> >>>>---
> >>>>arch/riscv/include/asm/processor.h | 1 +
> >>>>arch/riscv/kernel/asm-offsets.c    | 5 +++++
> >>>>arch/riscv/kernel/entry.S          | 8 ++++++++
> >>>>3 files changed, 14 insertions(+)
> >>>>
> >>>>diff --git a/arch/riscv/include/asm/processor.h
> >>>>b/arch/riscv/include/ asm/processor.h
> >>>>index 5f56eb9d114a..58fd11c89fe9 100644
> >>>>--- a/arch/riscv/include/asm/processor.h
> >>>>+++ b/arch/riscv/include/asm/processor.h
> >>>>@@ -103,6 +103,7 @@ struct thread_struct {
> >>>>    struct __riscv_d_ext_state fstate;
> >>>>    unsigned long bad_cause;
> >>>>    unsigned long envcfg;
> >>>>+    unsigned long status;
> >>
> >>Do we really need a new member field in `thread_struct`. We already have
> >>`sstatus` in `pt_regs` which reflects overall execution environment
> >>situation
> >>for current thread. This gets saved and restored on trap entry and exit.
> >>
> >>If we put `status` in `thread_struct` it creates ambiguity in terms
> >>of which
> >>`status` to save to and pick from for future maintainability
> >>purposes as the
> >>fields get introduced to this CSR.
> >>
> >>Why can't we access current trap frame's `sstatus` image in
> >>`__switch_to` to
> >>save and restore?
> >>
> >>Let me know if I am missing something obvious here. If there is a
> >>complication,
> >>I am missing here and we do end up using this member field, I would
> >>rename it
> >>to something like `status_kernel` to reflect that. So that future
> >>changes are
> >>cognizant of the fact that we have split `status`. One for kernel
> >>execution env
> >>per thread and one for controlling user execution env per thread.
> >
> >This is so long ago now I cannot remember if there was any sstatus in
> >the pt_regs field,
>
> The FS/VS bits encode the status of floating point and vector on a per-thread
> basis. So `status` has been part of `pt_regs` for quite a while.
>
> > and if kernel threads have the same context as their
> >userland parts.
>
> I didn't mean kernel threads. What I meant was the kernel execution environment
> per-thread. A userland thread does spend some time in the kernel, and the kernel
> does things on its behalf. One of those things is touching user memory, and that
> requires mucking with this CSR. So what I meant was: are we splitting `status`
> on a per-thread basis for the time spent in user mode and in the kernel?
>
> Getting back to the original question:
> As I said, each thread spends some time in user mode and some in the kernel.
> `status` in `pt_regs` is saved on trap entry and restored on trap exit. In a
> sense, the `status` field in `pt_regs` reflects the execution status of the
> thread on a per-trap basis. Introducing `status` in `thread_struct` creates
> confusion (if not today, certainly in the future) about which `status` to pick
> from when we are doing save/restore.

I agree that it's confusing. sstatus is already saved in pt_regs on
trap entry/return, so adding another field adds code complexity and
makes the data inconsistent. But perhaps we'd eventually need something
like this (I will explain why). Still, there might be a better
approach.

Yes, we can always reflect pt_regs for sstatus. We all know that
pt_regs reflects sstatus at trap entry, and the pt_regs at the scheduling
point refers to the "user's" pt_regs from when the thread first entered
kernel mode. Here are the reasons why SR_SUM may or may not be properly
tracked. First, if this is a trap-introduced context switch (such as
interrupting in a preemptible context after we manually enable user
access in put_user), then SR_SUM is saved somewhere on the kernel stack
and is not reachable via task_pt_regs() during the context switch. But
we are safe because the trap exit asm restores the correct SR_SUM.
However, if this is a self-initiated context switch (calling into
schedule()), then SR_SUM is not saved anywhere, possibly causing this
error.

Preemptible Vector in kernel mode also had this problem, where a
self-initiated context switch loses track of sstatus.VS. The way
I managed it was to track the VS bit at context switch time. However,
this bug shows that people are repeatedly facing the problem, and
maybe it suggests that we need a better way of managing sstatus
across context switches. Given the complex nature of this register,
which also touches the interrupt enable status, I don't think naively
saving/restoring the entire register is the way to go. Maybe the
variable deserves a more specific name and documentation. And if
we need a centralized place for managing these statuses, then it
also has to take care of sstatus.VS.
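
To sketch what I mean by a centralized place (names invented, not a concrete
proposal; the VS part would still have to cooperate with the preemptible
vector code rather than blindly overwrite it):

/*
 * sstatus bits that describe per-thread kernel execution state and
 * therefore need to survive a voluntary context switch.
 */
#define SR_THREAD_SWITCH_MASK	(SR_SUM | SR_VS)

static inline void thread_status_save(struct task_struct *prev)
{
	prev->thread.status = csr_read(CSR_STATUS) & SR_THREAD_SWITCH_MASK;
}

static inline void thread_status_restore(struct task_struct *next)
{
	/* never write the whole saved image back: SIE/SPP and friends
	 * must not be taken from it */
	csr_clear(CSR_STATUS, SR_THREAD_SWITCH_MASK);
	csr_set(CSR_STATUS, next->thread.status);
}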

Thanks,
Andy




>
> So my first question was: why not use `status` in `pt_regs`? It is as granular
> as it can get (it is available per thread context, per trap).
>
>
> I did ask Alex as well. I'll ping him again.
>
> >
> >Does anyone else have any comment on this?
> >
> >>
> >>>>    u32 riscv_v_flags;
> >>>>    u32 vstate_ctrl;
> >>>>    struct __riscv_v_ext_state vstate;
> >>>>diff --git a/arch/riscv/kernel/asm-offsets.c
> >>>>b/arch/riscv/kernel/asm- offsets.c
> >>>>index 16490755304e..969c65b1fe41 100644
> >>>>--- a/arch/riscv/kernel/asm-offsets.c
> >>>>+++ b/arch/riscv/kernel/asm-offsets.c
> >>>>@@ -34,6 +34,7 @@ void asm_offsets(void)
> >>>>    OFFSET(TASK_THREAD_S9, task_struct, thread.s[9]);
> >>>>    OFFSET(TASK_THREAD_S10, task_struct, thread.s[10]);
> >>>>    OFFSET(TASK_THREAD_S11, task_struct, thread.s[11]);
> >>
> >>_______________________________________________
> >>linux-riscv mailing list
> >>linux-riscv@lists.infradead.org
> >>http://lists.infradead.org/mailman/listinfo/linux-riscv
> >>
> >
> >
> >--
> >Ben Dooks                              http://www.codethink.co.uk/
> >Senior Engineer                                Codethink - Providing Genius
> >
> >https://www.codethink.co.uk/privacy.html
>
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 1/5] riscv: save the SR_SUM status over switches
  2025-05-22 17:40           ` Andy Chiu
@ 2025-05-22 20:03             ` Ben Dooks
  0 siblings, 0 replies; 32+ messages in thread
From: Ben Dooks @ 2025-05-22 20:03 UTC (permalink / raw)
  To: Andy Chiu, Alexandre Ghiti
  Cc: Samuel Holland, palmer, linux-riscv, linux-kernel, jszhang,
	syzbot+e74b94fe601ab9552d69, Cyril Bur, aou, paul.walmsley,
	charlie, jrtc27

On 22/05/2025 18:40, Andy Chiu wrote:
> Hi Samuel and Alex,
> 
> On Wed, May 21, 2025 at 10:35 PM Alexandre Ghiti <alex@ghiti.fr> wrote:
>>
>> Hi Samuel,
>>
>> On 5/21/25 15:38, Samuel Holland wrote:
>>> Hi Alex, Ben,
>>>
>>> On 2025-05-21 3:26 AM, Ben Dooks wrote:
>>>> On 22/04/2025 11:22, Alexandre Ghiti wrote:
>>>>> Hi Cyril,
>>>>>
>>>>> On 10/04/2025 09:05, Cyril Bur wrote:
>>>>>> From: Ben Dooks <ben.dooks@codethink.co.uk>
>>>>>>
>>>>>> When threads/tasks are switched we need to ensure the old execution's
>>>>>> SR_SUM state is saved and the new thread has the old SR_SUM state
>>>>>> restored.
>>>>>>
>>>>>> The issue was seen under heavy load especially with the syz-stress tool
>>>>>> running, with crashes as follows in schedule_tail:
>>>>>>
>>>>>> Unable to handle kernel access to user memory without uaccess routines
>>>>>> at virtual address 000000002749f0d0
>>>>>> Oops [#1]
>>>>>> Modules linked in:
>>>>>> CPU: 1 PID: 4875 Comm: syz-executor.0 Not tainted
>>>>>> 5.12.0-rc2-syzkaller-00467-g0d7588ab9ef9 #0
>>>>>> Hardware name: riscv-virtio,qemu (DT)
>>>>>> epc : schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
>>>>>>     ra : task_pid_vnr include/linux/sched.h:1421 [inline]
>>>>>>     ra : schedule_tail+0x70/0xb2 kernel/sched/core.c:4264
>>>>>> epc : ffffffe00008c8b0 ra : ffffffe00008c8ae sp : ffffffe025d17ec0
>>>>>>     gp : ffffffe005d25378 tp : ffffffe00f0d0000 t0 : 0000000000000000
>>>>>>     t1 : 0000000000000001 t2 : 00000000000f4240 s0 : ffffffe025d17ee0
>>>>>>     s1 : 000000002749f0d0 a0 : 000000000000002a a1 : 0000000000000003
>>>>>>     a2 : 1ffffffc0cfac500 a3 : ffffffe0000c80cc a4 : 5ae9db91c19bbe00
>>>>>>     a5 : 0000000000000000 a6 : 0000000000f00000 a7 : ffffffe000082eba
>>>>>>     s2 : 0000000000040000 s3 : ffffffe00eef96c0 s4 : ffffffe022c77fe0
>>>>>>     s5 : 0000000000004000 s6 : ffffffe067d74e00 s7 : ffffffe067d74850
>>>>>>     s8 : ffffffe067d73e18 s9 : ffffffe067d74e00 s10: ffffffe00eef96e8
>>>>>>     s11: 000000ae6cdf8368 t3 : 5ae9db91c19bbe00 t4 : ffffffc4043cafb2
>>>>>>     t5 : ffffffc4043cafba t6 : 0000000000040000
>>>>>> status: 0000000000000120 badaddr: 000000002749f0d0 cause:
>>>>>> 000000000000000f
>>>>>> Call Trace:
>>>>>> [<ffffffe00008c8b0>] schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
>>>>>> [<ffffffe000005570>] ret_from_exception+0x0/0x14
>>>>>> Dumping ftrace buffer:
>>>>>>       (ftrace buffer empty)
>>>>>> ---[ end trace b5f8f9231dc87dda ]---
>>>>>>
>>>>>> The issue comes from the put_user() in schedule_tail
>>>>>> (kernel/sched/core.c) doing the following:
>>>>>>
>>>>>> asmlinkage __visible void schedule_tail(struct task_struct *prev)
>>>>>> {
>>>>>> ...
>>>>>>            if (current->set_child_tid)
>>>>>>                    put_user(task_pid_vnr(current), current->set_child_tid);
>>>>>> ...
>>>>>> }
>>>>>>
>>>>>> the put_user() macro causes the code sequence to come out as follows:
>>>>>>
>>>>>> 1:    __enable_user_access()
>>>>>> 2:    reg = task_pid_vnr(current);
>>>>>> 3:    *current->set_child_tid = reg;
>>>>>> 4:    __disable_user_access()
>>>>>>
>>>>>> The problem is that we may have a sleeping function as argument which
>>>>>> could clear SR_SUM causing the panic above. This was fixed by
>>>>>> evaluating the argument of the put_user() macro outside the user-enabled
>>>>>> section in commit 285a76bb2cf5 ("riscv: evaluate put_user() arg before
>>>>>> enabling user access")"
>>>>>>
>>>>>> In order for riscv to take advantage of unsafe_get/put_XXX() macros and
>>>>>> to avoid the same issue we had with put_user() and sleeping functions we
>>>>>> must ensure code flow can go through switch_to() from within a region of
>>>>>> code with SR_SUM enabled and come back with SR_SUM still enabled. This
>>>>>> patch addresses the problem allowing future work to enable full use of
>>>>>> unsafe_get/put_XXX() macros without needing to take a CSR bit flip cost
>>>>>> on every access. Make switch_to() save and restore SR_SUM.
>>>>>>
>>>>>> Reported-by: syzbot+e74b94fe601ab9552d69@syzkaller.appspotmail.com
>>>>>> Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
>>>>>> Signed-off-by: Cyril Bur <cyrilbur@tenstorrent.com>
>>>>>> ---
>>>>>>     arch/riscv/include/asm/processor.h | 1 +
>>>>>>     arch/riscv/kernel/asm-offsets.c    | 5 +++++
>>>>>>     arch/riscv/kernel/entry.S          | 8 ++++++++
>>>>>>     3 files changed, 14 insertions(+)
>>>>>>
>>>>>> diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/ asm/
>>>>>> processor.h
>>>>>> index 5f56eb9d114a..58fd11c89fe9 100644
>>>>>> --- a/arch/riscv/include/asm/processor.h
>>>>>> +++ b/arch/riscv/include/asm/processor.h
>>>>>> @@ -103,6 +103,7 @@ struct thread_struct {
>>>>>>         struct __riscv_d_ext_state fstate;
>>>>>>         unsigned long bad_cause;
>>>>>>         unsigned long envcfg;
>>>>>> +    unsigned long status;
>>>>>>         u32 riscv_v_flags;
>>>>>>         u32 vstate_ctrl;
>>>>>>         struct __riscv_v_ext_state vstate;
>>>>>> diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm- offsets.c
>>>>>> index 16490755304e..969c65b1fe41 100644
>>>>>> --- a/arch/riscv/kernel/asm-offsets.c
>>>>>> +++ b/arch/riscv/kernel/asm-offsets.c
>>>>>> @@ -34,6 +34,7 @@ void asm_offsets(void)
>>>>>>         OFFSET(TASK_THREAD_S9, task_struct, thread.s[9]);
>>>>>>         OFFSET(TASK_THREAD_S10, task_struct, thread.s[10]);
>>>>>>         OFFSET(TASK_THREAD_S11, task_struct, thread.s[11]);
>>>>>> +    OFFSET(TASK_THREAD_STATUS, task_struct, thread.status);
>>>>>>         OFFSET(TASK_TI_CPU, task_struct, thread_info.cpu);
>>>>>>         OFFSET(TASK_TI_PREEMPT_COUNT, task_struct, thread_info.preempt_count);
>>>>>> @@ -346,6 +347,10 @@ void asm_offsets(void)
>>>>>>               offsetof(struct task_struct, thread.s[11])
>>>>>>             - offsetof(struct task_struct, thread.ra)
>>>>>>         );
>>>>>> +    DEFINE(TASK_THREAD_STATUS_RA,
>>>>>> +          offsetof(struct task_struct, thread.status)
>>>>>> +        - offsetof(struct task_struct, thread.ra)
>>>>>> +    );
>>>>>>         DEFINE(TASK_THREAD_F0_F0,
>>>>>>               offsetof(struct task_struct, thread.fstate.f[0])
>>>>>> diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
>>>>>> index 33a5a9f2a0d4..00bd0de9faa2 100644
>>>>>> --- a/arch/riscv/kernel/entry.S
>>>>>> +++ b/arch/riscv/kernel/entry.S
>>>>>> @@ -397,9 +397,17 @@ SYM_FUNC_START(__switch_to)
>>>>>>         REG_S s9,  TASK_THREAD_S9_RA(a3)
>>>>>>         REG_S s10, TASK_THREAD_S10_RA(a3)
>>>>>>         REG_S s11, TASK_THREAD_S11_RA(a3)
>>>>>> +
>>>>>> +    /* save the user space access flag */
>>>>>> +    li    s0, SR_SUM
>>>>>
>>>>> This is not needed anymore ^ but I'll remove it when merging your patchset.
>>>>>
>>>> Could you be more specific about what "this" is?
>>>>
>>>> If we don't save/restore the SR_SUM bit I think our old friend
>>>> the sched_tail bug will just return.
>>> I think Alex is saying the `li` instruction above is not needed because s0 is
>>> unused. But instead I think there is an `and` instruction missing here. The
>>> patch as merged ORs the entirety of the old sstatus with the new sstatus, not
>>> just the SUM bit, which seems extremely dangerous.
>>
> 
> Thanks for noticing this. I've also spent a bit of time pondering...
> 
> If this were masked with an "and" instruction, I think we should rename
> the member to "status_sum" to prevent confusion, as it would then only
> hold the SUM bit. Or maybe we could define a bit mask and only touch
> (and save/restore) the specified bits.
> 
> Thanks,
> Andy

So, is it worth just saving/restoring all the flags in SSTATUS, or do
we need some sort of mask (and if so, are there other flags we should
make sure get saved)?

I don't have time to set up a test system at the moment, and I am out
of the office until Tuesday the 27th anyway, with limited access to my
Codethink email.

-- 
Ben Dooks				http://www.codethink.co.uk/
Senior Engineer				Codethink - Providing Genius

https://www.codethink.co.uk/privacy.html

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 1/5] riscv: save the SR_SUM status over switches
  2025-05-22 17:42           ` Andy Chiu
@ 2025-05-22 22:43             ` Deepak Gupta
  2025-05-23 12:22               ` Alexandre Ghiti
  0 siblings, 1 reply; 32+ messages in thread
From: Deepak Gupta @ 2025-05-22 22:43 UTC (permalink / raw)
  To: Andy Chiu
  Cc: Ben Dooks, Cyril Bur, palmer, aou, paul.walmsley, charlie, jrtc27,
	alex, linux-riscv, linux-kernel, jszhang,
	syzbot+e74b94fe601ab9552d69

On Fri, May 23, 2025 at 01:42:49AM +0800, Andy Chiu wrote:
>On Thu, May 22, 2025 at 11:09 PM Deepak Gupta <debug@rivosinc.com> wrote:
>>
>> On Thu, May 22, 2025 at 07:23:32AM +0100, Ben Dooks wrote:
>> >On 20/05/2025 17:49, Deepak Gupta wrote:
>> >>I did give this patch my RB and had planned to come back to it to see
>> >>if it impacts cfi related patches. Thanks to alex for brinigng to my
>> >>attention again. As it stands today, it doesn't impact cfi related
>> >>changes but I've some concerns.
>> >>
>> >>Overall I do agree we should reduce number of SSTATUS accesses.
>> >>
>> >>Couple of questions on introducing new `sstatus` field (inline)
>> >>
>> >>On Tue, Apr 22, 2025 at 04:01:35PM -0700, Deepak Gupta wrote:
>> >>>On Thu, Apr 10, 2025 at 07:05:22AM +0000, Cyril Bur wrote:
>> >>>>From: Ben Dooks <ben.dooks@codethink.co.uk>
>> >>>>
>> >>>>When threads/tasks are switched we need to ensure the old execution's
>> >>>>SR_SUM state is saved and the new thread has the old SR_SUM state
>> >>>>restored.
>> >>>>
>> >>>>The issue was seen under heavy load especially with the syz-stress tool
>> >>>>running, with crashes as follows in schedule_tail:
>> >>>>
>> >>>>Unable to handle kernel access to user memory without uaccess routines
>> >>>>at virtual address 000000002749f0d0
>> >>>>Oops [#1]
>> >>>>Modules linked in:
>> >>>>CPU: 1 PID: 4875 Comm: syz-executor.0 Not tainted
>> >>>>5.12.0-rc2-syzkaller-00467-g0d7588ab9ef9 #0
>> >>>>Hardware name: riscv-virtio,qemu (DT)
>> >>>>epc : schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
>> >>>>ra : task_pid_vnr include/linux/sched.h:1421 [inline]
>> >>>>ra : schedule_tail+0x70/0xb2 kernel/sched/core.c:4264
>> >>>>epc : ffffffe00008c8b0 ra : ffffffe00008c8ae sp : ffffffe025d17ec0
>> >>>>gp : ffffffe005d25378 tp : ffffffe00f0d0000 t0 : 0000000000000000
>> >>>>t1 : 0000000000000001 t2 : 00000000000f4240 s0 : ffffffe025d17ee0
>> >>>>s1 : 000000002749f0d0 a0 : 000000000000002a a1 : 0000000000000003
>> >>>>a2 : 1ffffffc0cfac500 a3 : ffffffe0000c80cc a4 : 5ae9db91c19bbe00
>> >>>>a5 : 0000000000000000 a6 : 0000000000f00000 a7 : ffffffe000082eba
>> >>>>s2 : 0000000000040000 s3 : ffffffe00eef96c0 s4 : ffffffe022c77fe0
>> >>>>s5 : 0000000000004000 s6 : ffffffe067d74e00 s7 : ffffffe067d74850
>> >>>>s8 : ffffffe067d73e18 s9 : ffffffe067d74e00 s10: ffffffe00eef96e8
>> >>>>s11: 000000ae6cdf8368 t3 : 5ae9db91c19bbe00 t4 : ffffffc4043cafb2
>> >>>>t5 : ffffffc4043cafba t6 : 0000000000040000
>> >>>>status: 0000000000000120 badaddr: 000000002749f0d0 cause:
>> >>>>000000000000000f
>> >>>>Call Trace:
>> >>>>[<ffffffe00008c8b0>] schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
>> >>>>[<ffffffe000005570>] ret_from_exception+0x0/0x14
>> >>>>Dumping ftrace buffer:
>> >>>> (ftrace buffer empty)
>> >>>>---[ end trace b5f8f9231dc87dda ]---
>> >>>>
>> >>>>The issue comes from the put_user() in schedule_tail
>> >>>>(kernel/sched/core.c) doing the following:
>> >>>>
>> >>>>asmlinkage __visible void schedule_tail(struct task_struct *prev)
>> >>>>{
>> >>>>...
>> >>>>      if (current->set_child_tid)
>> >>>>              put_user(task_pid_vnr(current), current->set_child_tid);
>> >>>>...
>> >>>>}
>> >>>>
>> >>>>the put_user() macro causes the code sequence to come out as follows:
>> >>>>
>> >>>>1:    __enable_user_access()
>> >>>>2:    reg = task_pid_vnr(current);
>> >>>>3:    *current->set_child_tid = reg;
>> >>>>4:    __disable_user_access()
>> >>>>
>> >>>>The problem is that we may have a sleeping function as argument which
>> >>>>could clear SR_SUM causing the panic above. This was fixed by
>> >>>>evaluating the argument of the put_user() macro outside the user-enabled
>> >>>>section in commit 285a76bb2cf5 ("riscv: evaluate put_user() arg before
>> >>>>enabling user access")"
>> >>>>
>> >>>>In order for riscv to take advantage of unsafe_get/put_XXX() macros and
>> >>>>to avoid the same issue we had with put_user() and sleeping functions we
>> >>>>must ensure code flow can go through switch_to() from within a region of
>> >>>>code with SR_SUM enabled and come back with SR_SUM still enabled. This
>> >>>>patch addresses the problem allowing future work to enable full use of
>> >>>>unsafe_get/put_XXX() macros without needing to take a CSR bit flip cost
>> >>>>on every access. Make switch_to() save and restore SR_SUM.
>> >>>>
>> >>>>Reported-by: syzbot+e74b94fe601ab9552d69@syzkaller.appspotmail.com
>> >>>>Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
>> >>>>Signed-off-by: Cyril Bur <cyrilbur@tenstorrent.com>
>> >>>>---
>> >>>>arch/riscv/include/asm/processor.h | 1 +
>> >>>>arch/riscv/kernel/asm-offsets.c    | 5 +++++
>> >>>>arch/riscv/kernel/entry.S          | 8 ++++++++
>> >>>>3 files changed, 14 insertions(+)
>> >>>>
>> >>>>diff --git a/arch/riscv/include/asm/processor.h
>> >>>>b/arch/riscv/include/ asm/processor.h
>> >>>>index 5f56eb9d114a..58fd11c89fe9 100644
>> >>>>--- a/arch/riscv/include/asm/processor.h
>> >>>>+++ b/arch/riscv/include/asm/processor.h
>> >>>>@@ -103,6 +103,7 @@ struct thread_struct {
>> >>>>    struct __riscv_d_ext_state fstate;
>> >>>>    unsigned long bad_cause;
>> >>>>    unsigned long envcfg;
>> >>>>+    unsigned long status;
>> >>
>> >>Do we really need a new member field in `thread_struct`. We already have
>> >>`sstatus` in `pt_regs` which reflects overall execution environment
>> >>situation
>> >>for current thread. This gets saved and restored on trap entry and exit.
>> >>
>> >>If we put `status` in `thread_struct` it creates ambiguity in terms
>> >>of which
>> >>`status` to save to and pick from for future maintainability
>> >>purposes as the
>> >>fields get introduced to this CSR.
>> >>
>> >>Why can't we access current trap frame's `sstatus` image in
>> >>`__switch_to` to
>> >>save and restore?
>> >>
>> >>Let me know if I am missing something obvious here. If there is a
>> >>complication,
>> >>I am missing here and we do end up using this member field, I would
>> >>rename it
>> >>to something like `status_kernel` to reflect that. So that future
>> >>changes are
>> >>cognizant of the fact that we have split `status`. One for kernel
>> >>execution env
>> >>per thread and one for controlling user execution env per thread.
>> >
>> >This is so long ago now I cannot remember if there was any sstatus in
>> >the pt_regs field,
>>
>> FS/VS bits encode status of floating point and vector on per-thread basis.
>> So `status` has been part of `pt_regs` for quite a while.
>>
>> > and if kernel threads have the same context as their
>> >userland parts.
>>
>> I didn't mean kernel thread. What I meant was kernel execution environment
>> per-thread. A userland thread does spend sometime in kernel and kernel does
>> things on its behalf. One of those thing is touching user memory and that
>> requires mucking with this CSR. So what I meant was are we splitting `status`
>> on per-thread basis for their time spent in user and kernel.
>>
>> Getting back to original question--
>> As I said, each thread spends sometime in user or in kernel. `status` in
>> `pt_regs` is saved on trap entry and restored on trap exit. In a sense,
>> `status` field in `pt_regs` is reflecting execution status of the thread on per
>> trap basis. Introducing `status` in `thread_struct` creates a confusion (if not
>> for today, certainly for future) of which `status` to pick from when we are
>> doing save/restore.
>
>I agree that it's a confusion. sstatus is already saved on pt_regs on
>trap entries/return, adding another entry adds code complexity and
>makes data inconsistent. But, perhaps we'd eventually need something
>like this (I will explain why). Still, there might be a better
>approach.
>
>Yes, we can always reflect pt_regs for sstatus. We all know that
>pt_regs reflects sstatus at trap entry, and the pt_regs at scheduler
>point refers to "user's" pt_regs whenever it first enters kernel mode. Here
>are reasons why SR_SUM here may or may not be properly tracked. First,
>if this is a trap introduced context switch (such as interrupting in a
>preemptible context after we manually enable user access in put_user),
>then SR_SUM is saved somewhere in the kernel stack, and is not
>reference-able with task_pt_reg during context switch. But we are safe
>because the trap exit asm would help us restore the correct SR_SUM
>back. However, if this is a self-initiating context switch (calling
>into schedule()), then SR_SUM is not saved anywhere, and possibly
>causing this error.
>
>Preemptible Vector in the kernel mode also had this problem where a
>self-initiating context switch loses the track of sstatus.vs. The way
>I managed it is to track the VS bit at context switch time. However,
>this bug shows that people are repeatedly facing the problem, and
>maybe it suggests that we'd need a better way of managing sstatus
>across context switches. Given the complex nature of this register,
>which also touches the interrupt enable status, I don't think naively
>saving/restoring the entire register is the way to go. Maybe the
>variable deserves a more specific naming and documentation. And if
>we'd need a centralized place for managing these statuses, then it
>also has to take care of sstatus.VS.


IMHO, the problem we are trying to solve in this patch is easily solvable in
the manner below.


diff --git a/arch/riscv/include/asm/switch_to.h b/arch/riscv/include/asm/switch_to.h
index 0e71eb82f920..499d00a6fb67 100644
--- a/arch/riscv/include/asm/switch_to.h
+++ b/arch/riscv/include/asm/switch_to.h
@@ -58,6 +58,20 @@ static inline void __switch_to_fpu(struct task_struct *prev,
         fstate_restore(next, task_pt_regs(next));
  }
  
+static inline void __switch_to_status(struct task_struct *prev,
+                                  struct task_struct *next)
+{
+       struct pt_regs *regs;
+
+       /* save status */
+       regs = task_pt_regs(prev);
+       regs->status = csr_read(CSR_STATUS);
+
+       /* restore status */
+       regs = task_pt_regs(next);
+       csr_write(CSR_STATUS, regs->status);
+}
+
  static __always_inline bool has_fpu(void)
  {
         return riscv_has_extension_likely(RISCV_ISA_EXT_f) ||
@@ -115,6 +129,7 @@ do {                                                        \
         struct task_struct *__prev = (prev);            \
         struct task_struct *__next = (next);            \
         __set_prev_cpu(__prev->thread);                 \
+       __switch_to_status(__prev, __next);             \
         if (has_fpu())                                  \
                 __switch_to_fpu(__prev, __next);        \
         if (has_vector() || has_xtheadvector())         \
diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
index 8d25837a9384..a3b98c1be055 100644
--- a/arch/riscv/kernel/entry.S
+++ b/arch/riscv/kernel/entry.S
@@ -162,17 +162,8 @@ SYM_CODE_START(handle_exception)
         REG_S x5,  PT_T0(sp)
         save_from_x6_to_x31
  
-       /*
-        * Disable user-mode memory access as it should only be set in the
-        * actual user copy routines.
-        *
-        * Disable the FPU/Vector to detect illegal usage of floating point
-        * or vector in kernel space.
-        */
-       li t0, SR_SUM | SR_FS_VS | SR_ELP
-
         REG_L s0, TASK_TI_USER_SP(tp)
-       csrrc s1, CSR_STATUS, t0
+       csrr s1, CSR_STATUS
         save_userssp s2, s1
         csrr s2, CSR_EPC
         csrr s3, CSR_TVAL
@@ -185,6 +176,16 @@ SYM_CODE_START(handle_exception)
         REG_S s4, PT_CAUSE(sp)
         REG_S s5, PT_TP(sp)
  
+       /*
+        * It is fresh trap entry. Disable user-mode memory access as it should only be set in the
+        * actual user copy routines.
+        *
+        * Disable the FPU/Vector to detect illegal usage of floating point
+        * or vector in kernel space.
+        */
+       li t0, SR_SUM | SR_FS_VS | SR_ELP
+       csrrc s1, CSR_STATUS, t0
+
         /*
          * Set the scratch register to 0, so that if a recursive exception
          * occurs, the exception vector knows it came from the kernel



If, during the time spent in the kernel, the thread sets the SUM bit in status,
then the above `__switch_to_status` will ensure that `status` gets saved for the
current thread and restored for the next thread.

Furthermore, the current trap entry code clears FS/VS/SUM (for the right reasons):
a trap represents a non-linear change of control flow, so whatever executes next
shouldn't need SUM/FS/VS unless it explicitly sets them. This patch slightly
modifies the flow by first saving `status` on the trap frame (so if the previous
trap frame had SUM=1, it will be saved and restored), and then unconditionally
clearing SUM/FS/VS to ensure that the new trap context runs without SUM=1. This
allows nesting of trap frames without diluting the security properties of SUM.
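
For reference, the kind of grouped access this is meant to keep safe looks
roughly like the sketch below, using the generic user_access_begin() /
unsafe_put_user() interface (the helper itself is made up for illustration):

#include <linux/uaccess.h>

/*
 * Illustrative only: SR_SUM stays set across both stores instead of being
 * toggled per access.  If the task gets preempted between the two
 * unsafe_put_user() calls, switch_to() has to bring SR_SUM back for this
 * to keep working, which is what the patch above provides.
 */
static int put_two_words(u32 __user *dst, u32 lo, u32 hi)
{
        if (!user_access_begin(dst, 2 * sizeof(u32)))
                return -EFAULT;

        unsafe_put_user(lo, &dst[0], efault);
        unsafe_put_user(hi, &dst[1], efault);

        user_access_end();
        return 0;

efault:
        user_access_end();
        return -EFAULT;
}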

>
>Thanks,
>Andy
>
>
>
>
>>
>> So my first question was why not to use `status` in `pt_regs`. It is granular
>> as it can get (it is available per thread context per trap basis).
>>
>>
>> I did ask Alex as well. I'll ping him again.
>>
>> >
>> >Does anyone else have any comment on this?
>> >
>> >>
>> >>>>    u32 riscv_v_flags;
>> >>>>    u32 vstate_ctrl;
>> >>>>    struct __riscv_v_ext_state vstate;
>> >>>>diff --git a/arch/riscv/kernel/asm-offsets.c
>> >>>>b/arch/riscv/kernel/asm- offsets.c
>> >>>>index 16490755304e..969c65b1fe41 100644
>> >>>>--- a/arch/riscv/kernel/asm-offsets.c
>> >>>>+++ b/arch/riscv/kernel/asm-offsets.c
>> >>>>@@ -34,6 +34,7 @@ void asm_offsets(void)
>> >>>>    OFFSET(TASK_THREAD_S9, task_struct, thread.s[9]);
>> >>>>    OFFSET(TASK_THREAD_S10, task_struct, thread.s[10]);
>> >>>>    OFFSET(TASK_THREAD_S11, task_struct, thread.s[11]);
>> >>
>> >>_______________________________________________
>> >>linux-riscv mailing list
>> >>linux-riscv@lists.infradead.org
>> >>http://lists.infradead.org/mailman/listinfo/linux-riscv
>> >>
>> >
>> >
>> >--
>> >Ben Dooks                              http://www.codethink.co.uk/
>> >Senior Engineer                                Codethink - Providing Genius
>> >
>> >https://www.codethink.co.uk/privacy.html
>>
>> _______________________________________________
>> linux-riscv mailing list
>> linux-riscv@lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 1/5] riscv: save the SR_SUM status over switches
  2025-05-22 22:43             ` Deepak Gupta
@ 2025-05-23 12:22               ` Alexandre Ghiti
  2025-05-23 17:14                 ` Deepak Gupta
  0 siblings, 1 reply; 32+ messages in thread
From: Alexandre Ghiti @ 2025-05-23 12:22 UTC (permalink / raw)
  To: Deepak Gupta, Andy Chiu
  Cc: Ben Dooks, Cyril Bur, palmer, aou, paul.walmsley, charlie, jrtc27,
	linux-riscv, linux-kernel, jszhang, syzbot+e74b94fe601ab9552d69

Hi Andy, Deepak,

On 5/23/25 00:43, Deepak Gupta wrote:
> On Fri, May 23, 2025 at 01:42:49AM +0800, Andy Chiu wrote:
>> On Thu, May 22, 2025 at 11:09 PM Deepak Gupta <debug@rivosinc.com> 
>> wrote:
>>>
>>> On Thu, May 22, 2025 at 07:23:32AM +0100, Ben Dooks wrote:
>>> >On 20/05/2025 17:49, Deepak Gupta wrote:
>>> >>I did give this patch my RB and had planned to come back to it to see
>>> >>if it impacts cfi related patches. Thanks to alex for brinigng to my
>>> >>attention again. As it stands today, it doesn't impact cfi related
>>> >>changes but I've some concerns.
>>> >>
>>> >>Overall I do agree we should reduce number of SSTATUS accesses.
>>> >>
>>> >>Couple of questions on introducing new `sstatus` field (inline)
>>> >>
>>> >>On Tue, Apr 22, 2025 at 04:01:35PM -0700, Deepak Gupta wrote:
>>> >>>On Thu, Apr 10, 2025 at 07:05:22AM +0000, Cyril Bur wrote:
>>> >>>>From: Ben Dooks <ben.dooks@codethink.co.uk>
>>> >>>>
>>> >>>>When threads/tasks are switched we need to ensure the old 
>>> execution's
>>> >>>>SR_SUM state is saved and the new thread has the old SR_SUM state
>>> >>>>restored.
>>> >>>>
>>> >>>>The issue was seen under heavy load especially with the 
>>> syz-stress tool
>>> >>>>running, with crashes as follows in schedule_tail:
>>> >>>>
>>> >>>>Unable to handle kernel access to user memory without uaccess 
>>> routines
>>> >>>>at virtual address 000000002749f0d0
>>> >>>>Oops [#1]
>>> >>>>Modules linked in:
>>> >>>>CPU: 1 PID: 4875 Comm: syz-executor.0 Not tainted
>>> >>>>5.12.0-rc2-syzkaller-00467-g0d7588ab9ef9 #0
>>> >>>>Hardware name: riscv-virtio,qemu (DT)
>>> >>>>epc : schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
>>> >>>>ra : task_pid_vnr include/linux/sched.h:1421 [inline]
>>> >>>>ra : schedule_tail+0x70/0xb2 kernel/sched/core.c:4264
>>> >>>>epc : ffffffe00008c8b0 ra : ffffffe00008c8ae sp : ffffffe025d17ec0
>>> >>>>gp : ffffffe005d25378 tp : ffffffe00f0d0000 t0 : 0000000000000000
>>> >>>>t1 : 0000000000000001 t2 : 00000000000f4240 s0 : ffffffe025d17ee0
>>> >>>>s1 : 000000002749f0d0 a0 : 000000000000002a a1 : 0000000000000003
>>> >>>>a2 : 1ffffffc0cfac500 a3 : ffffffe0000c80cc a4 : 5ae9db91c19bbe00
>>> >>>>a5 : 0000000000000000 a6 : 0000000000f00000 a7 : ffffffe000082eba
>>> >>>>s2 : 0000000000040000 s3 : ffffffe00eef96c0 s4 : ffffffe022c77fe0
>>> >>>>s5 : 0000000000004000 s6 : ffffffe067d74e00 s7 : ffffffe067d74850
>>> >>>>s8 : ffffffe067d73e18 s9 : ffffffe067d74e00 s10: ffffffe00eef96e8
>>> >>>>s11: 000000ae6cdf8368 t3 : 5ae9db91c19bbe00 t4 : ffffffc4043cafb2
>>> >>>>t5 : ffffffc4043cafba t6 : 0000000000040000
>>> >>>>status: 0000000000000120 badaddr: 000000002749f0d0 cause:
>>> >>>>000000000000000f
>>> >>>>Call Trace:
>>> >>>>[<ffffffe00008c8b0>] schedule_tail+0x72/0xb2 
>>> kernel/sched/core.c:4264
>>> >>>>[<ffffffe000005570>] ret_from_exception+0x0/0x14
>>> >>>>Dumping ftrace buffer:
>>> >>>> (ftrace buffer empty)
>>> >>>>---[ end trace b5f8f9231dc87dda ]---
>>> >>>>
>>> >>>>The issue comes from the put_user() in schedule_tail
>>> >>>>(kernel/sched/core.c) doing the following:
>>> >>>>
>>> >>>>asmlinkage __visible void schedule_tail(struct task_struct *prev)
>>> >>>>{
>>> >>>>...
>>> >>>>      if (current->set_child_tid)
>>> >>>>              put_user(task_pid_vnr(current), 
>>> current->set_child_tid);
>>> >>>>...
>>> >>>>}
>>> >>>>
>>> >>>>the put_user() macro causes the code sequence to come out as 
>>> follows:
>>> >>>>
>>> >>>>1:    __enable_user_access()
>>> >>>>2:    reg = task_pid_vnr(current);
>>> >>>>3:    *current->set_child_tid = reg;
>>> >>>>4:    __disable_user_access()
>>> >>>>
>>> >>>>The problem is that we may have a sleeping function as argument 
>>> which
>>> >>>>could clear SR_SUM causing the panic above. This was fixed by
>>> >>>>evaluating the argument of the put_user() macro outside the 
>>> user-enabled
>>> >>>>section in commit 285a76bb2cf5 ("riscv: evaluate put_user() arg 
>>> before
>>> >>>>enabling user access")"
>>> >>>>
>>> >>>>In order for riscv to take advantage of unsafe_get/put_XXX() 
>>> macros and
>>> >>>>to avoid the same issue we had with put_user() and sleeping 
>>> functions we
>>> >>>>must ensure code flow can go through switch_to() from within a 
>>> region of
>>> >>>>code with SR_SUM enabled and come back with SR_SUM still 
>>> enabled. This
>>> >>>>patch addresses the problem allowing future work to enable full 
>>> use of
>>> >>>>unsafe_get/put_XXX() macros without needing to take a CSR bit 
>>> flip cost
>>> >>>>on every access. Make switch_to() save and restore SR_SUM.
>>> >>>>
>>> >>>>Reported-by: syzbot+e74b94fe601ab9552d69@syzkaller.appspotmail.com
>>> >>>>Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
>>> >>>>Signed-off-by: Cyril Bur <cyrilbur@tenstorrent.com>
>>> >>>>---
>>> >>>>arch/riscv/include/asm/processor.h | 1 +
>>> >>>>arch/riscv/kernel/asm-offsets.c    | 5 +++++
>>> >>>>arch/riscv/kernel/entry.S          | 8 ++++++++
>>> >>>>3 files changed, 14 insertions(+)
>>> >>>>
>>> >>>>diff --git a/arch/riscv/include/asm/processor.h
>>> >>>>b/arch/riscv/include/ asm/processor.h
>>> >>>>index 5f56eb9d114a..58fd11c89fe9 100644
>>> >>>>--- a/arch/riscv/include/asm/processor.h
>>> >>>>+++ b/arch/riscv/include/asm/processor.h
>>> >>>>@@ -103,6 +103,7 @@ struct thread_struct {
>>> >>>>    struct __riscv_d_ext_state fstate;
>>> >>>>    unsigned long bad_cause;
>>> >>>>    unsigned long envcfg;
>>> >>>>+    unsigned long status;
>>> >>
>>> >>Do we really need a new member field in `thread_struct`. We 
>>> already have
>>> >>`sstatus` in `pt_regs` which reflects overall execution environment
>>> >>situation
>>> >>for current thread. This gets saved and restored on trap entry and 
>>> exit.
>>> >>
>>> >>If we put `status` in `thread_struct` it creates ambiguity in terms
>>> >>of which
>>> >>`status` to save to and pick from from future maintainibility
>>> >>purposes as the
>>> >>fields get introduced to this CSR.
>>> >>
>>> >>Why can't we access current trap frame's `sstatus` image in
>>> >>`__switch_to` to
>>> >>save and restore?
>>> >>
>>> >>Let me know if I am missing something obvious here. If there is a
>>> >>complication,
>>> >>I am missing here and we do end up using this member field, I would
>>> >>rename it
>>> >>to something like `status_kernel` to reflect that. So that future
>>> >>changes are
>>> >>cognizant of the fact that we have split `status`. One for kernel
>>> >>execution env
>>> >>per thread and one for controlling user execution env per thread.
>>> >
>>> >This is so long ago now I cannot remember if there was any sstatus in
>>> >the pt_regs field,
>>>
>>> FS/VS bits encode status of floating point and vector on per-thread 
>>> basis.
>>> So `status` has been part of `pt_regs` for quite a while.
>>>
>>> > and if kernel threads have the same context as their
>>> >userland parts.
>>>
>>> I didn't mean kernel thread. What I meant was kernel execution 
>>> environment
>>> per-thread. A userland thread does spend sometime in kernel and 
>>> kernel does
>>> things on its behalf. One of those thing is touching user memory and 
>>> that
>>> requires mucking with this CSR. So what I meant was are we splitting 
>>> `status`
>>> on per-thread basis for their time spent in user and kernel.
>>>
>>> Getting back to original question--
>>> As I said, each thread spends sometime in user or in kernel. 
>>> `status` in
>>> `pt_regs` is saved on trap entry and restored on trap exit. In a sense,
>>> `status` field in `pt_regs` is reflecting execution status of the 
>>> thread on per
>>> trap basis. Introducing `status` in `thread_struct` creates a 
>>> confusion (if not
>>> for today, certainly for future) of which `status` to pick from when 
>>> we are
>>> doing save/restore.
>>
>> I agree that it's a confusion. sstatus is already saved on pt_regs on
>> trap entries/return, adding another entry adds code complexity and
>> makes data inconsistent. But, perhaps we'd eventually need something
>> like this (I will explain why). Still, there might be a better
>> approach.
>>
>> Yes, we can always reflect pt_regs for sstatus. We all know that
>> pt_regs reflects sstatus at trap entry, and the pt_regs at scheduler
>> point refers to "user's" pt_regs whenever it first enters kernel 
>> mode. Here
>> are reasons why SR_SUM here may or may not be properly tracked. First,
>> if this is a trap introduced context switch (such as interrupting in a
>> preemptible context after we manually enable user access in put_user),
>> then SR_SUM is saved somewhere in the kernel stack, and is not
>> reference-able with task_pt_reg during context switch. But we are safe
>> because the trap exit asm would help us restore the correct SR_SUM
>> back. However, if this is a self-initiating context switch (calling
>> into schedule()), then SR_SUM is not saved anywhere, and possibly
>> causing this error.
>>
>> Preemptible Vector in the kernel mode also had this problem where a
>> self-initiating context switch loses the track of sstatus.vs. The way
>> I managed it is to track the VS bit at context switch time. However,
>> this bug shows that people are repeatedly facing the problem, and
>> maybe it suggests that we'd need a better way of managing sstatus
>> across context switches. Given the complex nature of this register,
>> which also touches the interrupt enable status, I don't think naively
>> saving/restoring the entire register is the way to go. Maybe the
>> variable deserves a more specific naming and documentation. And if
>> we'd need a centralized place for managing these statuses, then it
>> also has to take care of sstatus.VS.


Andy, thanks for the precise explanation of the problem :)

So it took me some time, but here are my thoughts on this. We should treat
pt_regs and thread_struct differently as they do not represent the same thing:
- pt_regs represents the context of a thread when it takes a trap
- thread_struct represents a "kernel-induced" (or an "in-kernel") context
not caused by traps

That's why I don't really like Deepak's proposal below as it mixes both 
and I find it tricky.

I can't find a situation where saving/restoring the entire sstatus at
context switch is a problem though; does anyone have such a thing in mind?

Finally, I understand that having another copy of sstatus in thread_struct
is not intuitive, and we should either explain why or only store the SUM bit
(like for sstatus.VS).
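
For illustration, a minimal sketch of the "store only the SUM bit" option,
assuming a hypothetical sum_enabled flag added to thread_struct (neither the
field nor the helper below exists in the patch):

/*
 * Hypothetical: track only SR_SUM across a context switch, analogous to how
 * the VS state is handled, instead of saving the whole sstatus.
 */
static inline void __switch_to_sum(struct task_struct *prev,
                                   struct task_struct *next)
{
        /* Remember whether prev was switched out with user access enabled. */
        prev->thread.sum_enabled = !!(csr_read(CSR_STATUS) & SR_SUM);

        /* Re-enable it only if next had it enabled when it was switched out. */
        if (next->thread.sum_enabled)
                csr_set(CSR_STATUS, SR_SUM);
        else
                csr_clear(CSR_STATUS, SR_SUM);
}

That would keep SIE, FS, VS and the rest of sstatus out of this path entirely,
while still costing roughly one CSR read and one CSR write per switch.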

Please continue the discussion as we need to find a solution that 
pleases everyone soon :)

Thanks all for jumping in,

Alex


>
>
> IMHO, the problem we are trying to solve in this patch is easily 
> solvable in
> below manner.
>
>
> diff --git a/arch/riscv/include/asm/switch_to.h 
> b/arch/riscv/include/asm/switch_to.h
> index 0e71eb82f920..499d00a6fb67 100644
> --- a/arch/riscv/include/asm/switch_to.h
> +++ b/arch/riscv/include/asm/switch_to.h
> @@ -58,6 +58,20 @@ static inline void __switch_to_fpu(struct 
> task_struct *prev,
>         fstate_restore(next, task_pt_regs(next));
>  }
>
> +static inline void __switch_to_status(struct task_struct *prev,
> +                                  struct task_struct *next)
> +{
> +       struct pt_regs *regs;
> +
> +       /* save status */
> +       regs = task_pt_regs(prev);
> +       regs->status = csr_read(CSR_STATUS);
> +
> +       /* restore status */
> +       regs = task_pt_regs(next);
> +       csr_write(CSR_STATUS, regs->status);
> +}
> +
>  static __always_inline bool has_fpu(void)
>  {
>         return riscv_has_extension_likely(RISCV_ISA_EXT_f) ||
> @@ -115,6 +129,7 @@ do 
> {                                                        \
>         struct task_struct *__prev = (prev);            \
>         struct task_struct *__next = (next);            \
>         __set_prev_cpu(__prev->thread);                 \
> +       __switch_to_status(__prev, __next)              \
>         if (has_fpu())                                  \
>                 __switch_to_fpu(__prev, __next);        \
>         if (has_vector() || has_xtheadvector())         \
> diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
> index 8d25837a9384..a3b98c1be055 100644
> --- a/arch/riscv/kernel/entry.S
> +++ b/arch/riscv/kernel/entry.S
> @@ -162,17 +162,8 @@ SYM_CODE_START(handle_exception)
>         REG_S x5,  PT_T0(sp)
>         save_from_x6_to_x31
>
> -       /*
> -        * Disable user-mode memory access as it should only be set in 
> the
> -        * actual user copy routines.
> -        *
> -        * Disable the FPU/Vector to detect illegal usage of floating 
> point
> -        * or vector in kernel space.
> -        */
> -       li t0, SR_SUM | SR_FS_VS | SR_ELP
> -
>         REG_L s0, TASK_TI_USER_SP(tp)
> -       csrrc s1, CSR_STATUS, t0
> +       csrr s1, CSR_STATUS
>         save_userssp s2, s1
>         csrr s2, CSR_EPC
>         csrr s3, CSR_TVAL
> @@ -185,6 +176,16 @@ SYM_CODE_START(handle_exception)
>         REG_S s4, PT_CAUSE(sp)
>         REG_S s5, PT_TP(sp)
>
> +       /*
> +        * It is fresh trap entry. Disable user-mode memory access as 
> it should only be set in the
> +        * actual user copy routines.
> +        *
> +        * Disable the FPU/Vector to detect illegal usage of floating 
> point
> +        * or vector in kernel space.
> +        */
> +       li t0, SR_SUM | SR_FS_VS | SR_ELP
> +       csrrc s1, CSR_STATUS, t0
> +
>         /*
>          * Set the scratch register to 0, so that if a recursive 
> exception
>          * occurs, the exception vector knows it came from the kernel
>
>
>
> During the time spent in kernel if sets SUM bit in status then, above
> `__switch_to_status` will ensure that `status` will get saved for current
> thread and restored for next thread.
>
> Furthermore, current trap entry code clears FS/VS/SUM (for right 
> reasons). It
> represents non-linear change of control flow and thus whatever will 
> execute next
> shouldn't need SUM/FS/VS unless it wants to set it). This patch slightly
> modifies the flow by first saving the `status` on trap frame (thus if 
> previous
> trap frame had SUM=1, it will be saved and restored). And then it
> unconditionally clears the SUM/FS/VS to ensure that this new trap 
> context runs
> without needing SUM=1. This ensures nesting of trap frames without 
> diluting
> security properties of SUM.
>
>>
>> Thanks,
>> Andy
>>
>>
>>
>>
>>>
>>> So my first question was why not to use `status` in `pt_regs`. It is 
>>> granular
>>> as it can get (it is available per thread context per trap basis).
>>>
>>>
>>> I did ask Alex as well. I'll ping him again.
>>>
>>> >
>>> >Does anyone else have any comment on this?
>>> >
>>> >>
>>> >>>>    u32 riscv_v_flags;
>>> >>>>    u32 vstate_ctrl;
>>> >>>>    struct __riscv_v_ext_state vstate;
>>> >>>>diff --git a/arch/riscv/kernel/asm-offsets.c
>>> >>>>b/arch/riscv/kernel/asm- offsets.c
>>> >>>>index 16490755304e..969c65b1fe41 100644
>>> >>>>--- a/arch/riscv/kernel/asm-offsets.c
>>> >>>>+++ b/arch/riscv/kernel/asm-offsets.c
>>> >>>>@@ -34,6 +34,7 @@ void asm_offsets(void)
>>> >>>>    OFFSET(TASK_THREAD_S9, task_struct, thread.s[9]);
>>> >>>>    OFFSET(TASK_THREAD_S10, task_struct, thread.s[10]);
>>> >>>>    OFFSET(TASK_THREAD_S11, task_struct, thread.s[11]);
>>> >>
>>> >>_______________________________________________
>>> >>linux-riscv mailing list
>>> >>linux-riscv@lists.infradead.org
>>> >>http://lists.infradead.org/mailman/listinfo/linux-riscv
>>> >>
>>> >
>>> >
>>> >--
>>> >Ben Dooks http://www.codethink.co.uk/
>>> >Senior Engineer                                Codethink - 
>>> Providing Genius
>>> >
>>> >https://www.codethink.co.uk/privacy.html
>>>
>>> _______________________________________________
>>> linux-riscv mailing list
>>> linux-riscv@lists.infradead.org
>>> http://lists.infradead.org/mailman/listinfo/linux-riscv
>
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 1/5] riscv: save the SR_SUM status over switches
  2025-05-23 12:22               ` Alexandre Ghiti
@ 2025-05-23 17:14                 ` Deepak Gupta
  2025-05-23 20:00                   ` Alexandre Ghiti
  2025-05-24 10:00                   ` Andy Chiu
  0 siblings, 2 replies; 32+ messages in thread
From: Deepak Gupta @ 2025-05-23 17:14 UTC (permalink / raw)
  To: Alexandre Ghiti
  Cc: Andy Chiu, Ben Dooks, Cyril Bur, palmer, aou, paul.walmsley,
	charlie, jrtc27, linux-riscv, linux-kernel, jszhang,
	syzbot+e74b94fe601ab9552d69

On Fri, May 23, 2025 at 02:22:21PM +0200, Alexandre Ghiti wrote:
>Hi Andy, Deepak,
>
>On 5/23/25 00:43, Deepak Gupta wrote:
>>On Fri, May 23, 2025 at 01:42:49AM +0800, Andy Chiu wrote:
>>>On Thu, May 22, 2025 at 11:09 PM Deepak Gupta <debug@rivosinc.com> 
>>>wrote:
>>>>
>>>>On Thu, May 22, 2025 at 07:23:32AM +0100, Ben Dooks wrote:
>>>>>On 20/05/2025 17:49, Deepak Gupta wrote:
>>>>>>I did give this patch my RB and had planned to come back to it to see
>>>>>>if it impacts cfi related patches. Thanks to alex for brinigng to my
>>>>>>attention again. As it stands today, it doesn't impact cfi related
>>>>>>changes but I've some concerns.
>>>>>>
>>>>>>Overall I do agree we should reduce number of SSTATUS accesses.
>>>>>>
>>>>>>Couple of questions on introducing new `sstatus` field (inline)
>>>>>>
>>>>>>On Tue, Apr 22, 2025 at 04:01:35PM -0700, Deepak Gupta wrote:
>>>>>>>On Thu, Apr 10, 2025 at 07:05:22AM +0000, Cyril Bur wrote:
>>>>>>>>From: Ben Dooks <ben.dooks@codethink.co.uk>
>>>>>>>>
>>>>>>>>When threads/tasks are switched we need to ensure the old 
>>>>execution's
>>>>>>>>SR_SUM state is saved and the new thread has the old SR_SUM state
>>>>>>>>restored.
>>>>>>>>
>>>>>>>>The issue was seen under heavy load especially with the 
>>>>syz-stress tool
>>>>>>>>running, with crashes as follows in schedule_tail:
>>>>>>>>
>>>>>>>>Unable to handle kernel access to user memory without 
>>>>uaccess routines
>>>>>>>>at virtual address 000000002749f0d0
>>>>>>>>Oops [#1]
>>>>>>>>Modules linked in:
>>>>>>>>CPU: 1 PID: 4875 Comm: syz-executor.0 Not tainted
>>>>>>>>5.12.0-rc2-syzkaller-00467-g0d7588ab9ef9 #0
>>>>>>>>Hardware name: riscv-virtio,qemu (DT)
>>>>>>>>epc : schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
>>>>>>>>ra : task_pid_vnr include/linux/sched.h:1421 [inline]
>>>>>>>>ra : schedule_tail+0x70/0xb2 kernel/sched/core.c:4264
>>>>>>>>epc : ffffffe00008c8b0 ra : ffffffe00008c8ae sp : ffffffe025d17ec0
>>>>>>>>gp : ffffffe005d25378 tp : ffffffe00f0d0000 t0 : 0000000000000000
>>>>>>>>t1 : 0000000000000001 t2 : 00000000000f4240 s0 : ffffffe025d17ee0
>>>>>>>>s1 : 000000002749f0d0 a0 : 000000000000002a a1 : 0000000000000003
>>>>>>>>a2 : 1ffffffc0cfac500 a3 : ffffffe0000c80cc a4 : 5ae9db91c19bbe00
>>>>>>>>a5 : 0000000000000000 a6 : 0000000000f00000 a7 : ffffffe000082eba
>>>>>>>>s2 : 0000000000040000 s3 : ffffffe00eef96c0 s4 : ffffffe022c77fe0
>>>>>>>>s5 : 0000000000004000 s6 : ffffffe067d74e00 s7 : ffffffe067d74850
>>>>>>>>s8 : ffffffe067d73e18 s9 : ffffffe067d74e00 s10: ffffffe00eef96e8
>>>>>>>>s11: 000000ae6cdf8368 t3 : 5ae9db91c19bbe00 t4 : ffffffc4043cafb2
>>>>>>>>t5 : ffffffc4043cafba t6 : 0000000000040000
>>>>>>>>status: 0000000000000120 badaddr: 000000002749f0d0 cause:
>>>>>>>>000000000000000f
>>>>>>>>Call Trace:
>>>>>>>>[<ffffffe00008c8b0>] schedule_tail+0x72/0xb2 
>>>>kernel/sched/core.c:4264
>>>>>>>>[<ffffffe000005570>] ret_from_exception+0x0/0x14
>>>>>>>>Dumping ftrace buffer:
>>>>>>>> (ftrace buffer empty)
>>>>>>>>---[ end trace b5f8f9231dc87dda ]---
>>>>>>>>
>>>>>>>>The issue comes from the put_user() in schedule_tail
>>>>>>>>(kernel/sched/core.c) doing the following:
>>>>>>>>
>>>>>>>>asmlinkage __visible void schedule_tail(struct task_struct *prev)
>>>>>>>>{
>>>>>>>>...
>>>>>>>>      if (current->set_child_tid)
>>>>>>>>              put_user(task_pid_vnr(current), 
>>>>current->set_child_tid);
>>>>>>>>...
>>>>>>>>}
>>>>>>>>
>>>>>>>>the put_user() macro causes the code sequence to come out as 
>>>>follows:
>>>>>>>>
>>>>>>>>1:    __enable_user_access()
>>>>>>>>2:    reg = task_pid_vnr(current);
>>>>>>>>3:    *current->set_child_tid = reg;
>>>>>>>>4:    __disable_user_access()
>>>>>>>>
>>>>>>>>The problem is that we may have a sleeping function as 
>>>>argument which
>>>>>>>>could clear SR_SUM causing the panic above. This was fixed by
>>>>>>>>evaluating the argument of the put_user() macro outside the 
>>>>user-enabled
>>>>>>>>section in commit 285a76bb2cf5 ("riscv: evaluate put_user() 
>>>>arg before
>>>>>>>>enabling user access")"
>>>>>>>>
>>>>>>>>In order for riscv to take advantage of unsafe_get/put_XXX() 
>>>>macros and
>>>>>>>>to avoid the same issue we had with put_user() and sleeping 
>>>>functions we
>>>>>>>>must ensure code flow can go through switch_to() from within 
>>>>a region of
>>>>>>>>code with SR_SUM enabled and come back with SR_SUM still 
>>>>enabled. This
>>>>>>>>patch addresses the problem allowing future work to enable 
>>>>full use of
>>>>>>>>unsafe_get/put_XXX() macros without needing to take a CSR 
>>>>bit flip cost
>>>>>>>>on every access. Make switch_to() save and restore SR_SUM.
>>>>>>>>
>>>>>>>>Reported-by: syzbot+e74b94fe601ab9552d69@syzkaller.appspotmail.com
>>>>>>>>Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
>>>>>>>>Signed-off-by: Cyril Bur <cyrilbur@tenstorrent.com>
>>>>>>>>---
>>>>>>>>arch/riscv/include/asm/processor.h | 1 +
>>>>>>>>arch/riscv/kernel/asm-offsets.c    | 5 +++++
>>>>>>>>arch/riscv/kernel/entry.S          | 8 ++++++++
>>>>>>>>3 files changed, 14 insertions(+)
>>>>>>>>
>>>>>>>>diff --git a/arch/riscv/include/asm/processor.h
>>>>>>>>b/arch/riscv/include/ asm/processor.h
>>>>>>>>index 5f56eb9d114a..58fd11c89fe9 100644
>>>>>>>>--- a/arch/riscv/include/asm/processor.h
>>>>>>>>+++ b/arch/riscv/include/asm/processor.h
>>>>>>>>@@ -103,6 +103,7 @@ struct thread_struct {
>>>>>>>>    struct __riscv_d_ext_state fstate;
>>>>>>>>    unsigned long bad_cause;
>>>>>>>>    unsigned long envcfg;
>>>>>>>>+    unsigned long status;
>>>>>>
>>>>>>Do we really need a new member field in `thread_struct`. We 
>>>>already have
>>>>>>`sstatus` in `pt_regs` which reflects overall execution environment
>>>>>>situation
>>>>>>for current thread. This gets saved and restored on trap entry 
>>>>and exit.
>>>>>>
>>>>>>If we put `status` in `thread_struct` it creates ambiguity in terms
>>>>>>of which
>>>>>>`status` to save to and pick from from future maintainibility
>>>>>>purposes as the
>>>>>>fields get introduced to this CSR.
>>>>>>
>>>>>>Why can't we access current trap frame's `sstatus` image in
>>>>>>`__switch_to` to
>>>>>>save and restore?
>>>>>>
>>>>>>Let me know if I am missing something obvious here. If there is a
>>>>>>complication,
>>>>>>I am missing here and we do end up using this member field, I would
>>>>>>rename it
>>>>>>to something like `status_kernel` to reflect that. So that future
>>>>>>changes are
>>>>>>cognizant of the fact that we have split `status`. One for kernel
>>>>>>execution env
>>>>>>per thread and one for controlling user execution env per thread.
>>>>>
>>>>>This is so long ago now I cannot remember if there was any sstatus in
>>>>>the pt_regs field,
>>>>
>>>>FS/VS bits encode status of floating point and vector on 
>>>>per-thread basis.
>>>>So `status` has been part of `pt_regs` for quite a while.
>>>>
>>>>> and if kernel threads have the same context as their
>>>>>userland parts.
>>>>
>>>>I didn't mean kernel thread. What I meant was kernel execution 
>>>>environment
>>>>per-thread. A userland thread does spend sometime in kernel and 
>>>>kernel does
>>>>things on its behalf. One of those thing is touching user memory 
>>>>and that
>>>>requires mucking with this CSR. So what I meant was are we 
>>>>splitting `status`
>>>>on per-thread basis for their time spent in user and kernel.
>>>>
>>>>Getting back to original question--
>>>>As I said, each thread spends sometime in user or in kernel. 
>>>>`status` in
>>>>`pt_regs` is saved on trap entry and restored on trap exit. In a sense,
>>>>`status` field in `pt_regs` is reflecting execution status of 
>>>>the thread on per
>>>>trap basis. Introducing `status` in `thread_struct` creates a 
>>>>confusion (if not
>>>>for today, certainly for future) of which `status` to pick from 
>>>>when we are
>>>>doing save/restore.
>>>
>>>I agree that it's a confusion. sstatus is already saved on pt_regs on
>>>trap entries/return, adding another entry adds code complexity and
>>>makes data inconsistent. But, perhaps we'd eventually need something
>>>like this (I will explain why). Still, there might be a better
>>>approach.
>>>
>>>Yes, we can always reflect pt_regs for sstatus. We all know that
>>>pt_regs reflects sstatus at trap entry, and the pt_regs at scheduler
>>>point refers to "user's" pt_regs whenever it first enters kernel 
>>>mode. Here
>>>are reasons why SR_SUM here may or may not be properly tracked. First,
>>>if this is a trap introduced context switch (such as interrupting in a
>>>preemptible context after we manually enable user access in put_user),
>>>then SR_SUM is saved somewhere in the kernel stack, and is not
>>>reference-able with task_pt_reg during context switch. But we are safe
>>>because the trap exit asm would help us restore the correct SR_SUM
>>>back. However, if this is a self-initiating context switch (calling
>>>into schedule()), then SR_SUM is not saved anywhere, and possibly
>>>causing this error.
>>>
>>>Preemptible Vector in the kernel mode also had this problem where a
>>>self-initiating context switch loses the track of sstatus.vs. The way
>>>I managed it is to track the VS bit at context switch time. However,
>>>this bug shows that people are repeatedly facing the problem, and
>>>maybe it suggests that we'd need a better way of managing sstatus
>>>across context switches. Given the complex nature of this register,
>>>which also touches the interrupt enable status, I don't think naively
>>>saving/restoring the entire register is the way to go. Maybe the
>>>variable deserves a more specific naming and documentation. And if
>>>we'd need a centralized place for managing these statuses, then it
>>>also has to take care of sstatus.VS.
>
>
>Andy, thanks for the precise explanation of the problem :)
>
>So it took me some time but here are my thoughts on this. We should 
>treat pt_regs and thread_struct differently as they do not represent 
>the same thing:
>- pt_regs represents the context of a thread when it takes a trap
>- thread_struct represents a "kernel-induced" (or a "in-kernel") 
>context not caused by traps

Exactly, they represent different contexts of execution. A trap represents a
non-linear control flow change and thus a fresh start of execution control
flow into the kernel, while `kernel-induced` ones are also non-linear but
fully a kernel/software construct.

A fresh trapped execution context shouldn't have SUM set, which is how it
currently is in the kernel. This bit gets cleared on trap entry and `sstatus` gets
saved in `pt_regs` (including SR_IE) so that it can be restored whenever
`sret` happens.

The problem we're seeing here is twofold:

1) We don't want to set and clear SUM for each word when we are accessing an
   array/string. This is a software problem and this entire series addresses it.

2) To avoid the first problem we are optimizing the CSR access by setting it
   once and clearing it once. But now we don't want to lose this bit if there
   were:

	a) a trap in between
	b) a kernel-induced schedule out
	c) a) followed by b)
	d) a) followed by another a)
	e) nested traps

If a) occurs, we are definitely losing the bit as per the current code. If b)
happens, it is the same situation.

Saving it in `thread_struct` only addresses `b`, and not `a`, `c`, `d` and
`e`. IMHO `e` is a far-fetched situation, but I believe `a`, `b`, `c` and `d` happen
during the normal runtime of the kernel.

So it all depends on the nesting level of traps supported by the riscv kernel.

Illustrating the `c + d` example: suppose the kernel can take 2 nested levels of
traps, with the first trap context having had the SUM bit set but the second trap
having it clear, and now comes the switch out of this thread. At this point, if it
were saved in `thread_struct`, SUM would be lost for the first trap.

Later, when the thread gets switched in again, you would go into the 2nd trap
context without SUM (because `thread_struct` didn't have it saved), which is
fine. However, when the 2nd trap context eventually performs `sret`, it will
go back to the first trap context, where SUM was expected to be set because it
was touching user memory.

A good example would be a syscall, so that's the first trap. The SUM bit is set,
user memory is touched, and a trap (page fault) is taken. Now the code is in the
second trap, which should clear the SUM bit. Somewhere in the memory manager
stack, the thread is scheduled out and `sstatus` is saved in `thread_struct`.
This only serves the current trap context's needs and not those of the one where
`SUM` needed to be set.

We can support such nesting only by ensuring the following:

On trap entry:
- save `status` in `pt_regs` or some other FILO data structure
- clear SUM (and the other bits that need to be cleared)

On trap return:
- reload `status` from `pt_regs` or some FILO data structure

Quite analogous to what we do for SR_IE as well.
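
In C-like pseudocode the discipline would look roughly as below (the functions
are just stand-ins for the handle_exception / ret_from_exception asm, not real
code):

/* Stand-in for the trap entry path. */
void trap_entry(struct pt_regs *regs)
{
        /* Save the interrupted context's status first (FILO via the stack). */
        regs->status = csr_read(CSR_STATUS);

        /* Then run this trap context with SUM/FS/VS/ELP cleared. */
        csr_clear(CSR_STATUS, SR_SUM | SR_FS_VS | SR_ELP);

        /* ... handle the trap; this may trap again or schedule ... */
}

/* Stand-in for the trap return path. */
void trap_return(struct pt_regs *regs)
{
        /* Restore exactly what the interrupted context had, SUM included. */
        csr_write(CSR_STATUS, regs->status);
        /* sret */
}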

>
>That's why I don't really like Deepak's proposal below as it mixes 
>both and I find it tricky.
>
>I can't find a situation where saving/restoring the entire sstatus at 
>context-switch is a problem though, does anyone have such thing in 
>mind?
>
>Finally I understand that having another copy of sstatus in 
>thread_struct is not intuitive and we should, either explain why or 
>only store the SUM bit (like for sstatus.VS).
>
>Please continue the discussion as we need to find a solution that 
>pleases everyone soon :)
>
>Thanks all for jumping in,
>
>Alex
>
>
>>
>>
>>IMHO, the problem we are trying to solve in this patch is easily 
>>solvable in
>>below manner.
>>
>>
>>diff --git a/arch/riscv/include/asm/switch_to.h 
>>b/arch/riscv/include/asm/switch_to.h
>>index 0e71eb82f920..499d00a6fb67 100644
>>--- a/arch/riscv/include/asm/switch_to.h
>>+++ b/arch/riscv/include/asm/switch_to.h
>>@@ -58,6 +58,20 @@ static inline void __switch_to_fpu(struct 
>>task_struct *prev,
>>        fstate_restore(next, task_pt_regs(next));
>> }
>>
>>+static inline void __switch_to_status(struct task_struct *prev,
>>+                                  struct task_struct *next)
>>+{
>>+       struct pt_regs *regs;
>>+
>>+       /* save status */
>>+       regs = task_pt_regs(prev);
>>+       regs->status = csr_read(CSR_STATUS);
>>+
>>+       /* restore status */
>>+       regs = task_pt_regs(next);
>>+       csr_write(CSR_STATUS, regs->status);
>>+}
>>+
>> static __always_inline bool has_fpu(void)
>> {
>>        return riscv_has_extension_likely(RISCV_ISA_EXT_f) ||
>>@@ -115,6 +129,7 @@ do 
>>{                                                        \
>>        struct task_struct *__prev = (prev);            \
>>        struct task_struct *__next = (next);            \
>>        __set_prev_cpu(__prev->thread);                 \
>>+       __switch_to_status(__prev, __next)              \
>>        if (has_fpu())                                  \
>>                __switch_to_fpu(__prev, __next);        \
>>        if (has_vector() || has_xtheadvector())         \
>>diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
>>index 8d25837a9384..a3b98c1be055 100644
>>--- a/arch/riscv/kernel/entry.S
>>+++ b/arch/riscv/kernel/entry.S
>>@@ -162,17 +162,8 @@ SYM_CODE_START(handle_exception)
>>        REG_S x5,  PT_T0(sp)
>>        save_from_x6_to_x31
>>
>>-       /*
>>-        * Disable user-mode memory access as it should only be set 
>>in the
>>-        * actual user copy routines.
>>-        *
>>-        * Disable the FPU/Vector to detect illegal usage of 
>>floating point
>>-        * or vector in kernel space.
>>-        */
>>-       li t0, SR_SUM | SR_FS_VS | SR_ELP
>>-
>>        REG_L s0, TASK_TI_USER_SP(tp)
>>-       csrrc s1, CSR_STATUS, t0
>>+       csrr s1, CSR_STATUS
>>        save_userssp s2, s1
>>        csrr s2, CSR_EPC
>>        csrr s3, CSR_TVAL
>>@@ -185,6 +176,16 @@ SYM_CODE_START(handle_exception)
>>        REG_S s4, PT_CAUSE(sp)
>>        REG_S s5, PT_TP(sp)
>>
>>+       /*
>>+        * It is fresh trap entry. Disable user-mode memory access 
>>as it should only be set in the
>>+        * actual user copy routines.
>>+        *
>>+        * Disable the FPU/Vector to detect illegal usage of 
>>floating point
>>+        * or vector in kernel space.
>>+        */
>>+       li t0, SR_SUM | SR_FS_VS | SR_ELP
>>+       csrrc s1, CSR_STATUS, t0
>>+
>>        /*
>>         * Set the scratch register to 0, so that if a recursive 
>>exception
>>         * occurs, the exception vector knows it came from the kernel
>>
>>
>>
>>During the time spent in kernel if sets SUM bit in status then, above
>>`__switch_to_status` will ensure that `status` will get saved for current
>>thread and restored for next thread.
>>
>>Furthermore, current trap entry code clears FS/VS/SUM (for right 
>>reasons). It
>>represents non-linear change of control flow and thus whatever will 
>>execute next
>>shouldn't need SUM/FS/VS unless it wants to set it). This patch slightly
>>modifies the flow by first saving the `status` on trap frame (thus 
>>if previous
>>trap frame had SUM=1, it will be saved and restored). And then it
>>unconditionally clears the SUM/FS/VS to ensure that this new trap 
>>context runs
>>without needing SUM=1. This ensures nesting of trap frames without 
>>diluting
>>security properties of SUM.
>>
>>>
>>>Thanks,
>>>Andy
>>>
>>>
>>>
>>>
>>>>
>>>>So my first question was why not to use `status` in `pt_regs`. 
>>>>It is granular
>>>>as it can get (it is available per thread context per trap basis).
>>>>
>>>>
>>>>I did ask Alex as well. I'll ping him again.
>>>>
>>>>>
>>>>>Does anyone else have any comment on this?
>>>>>
>>>>>>
>>>>>>>>    u32 riscv_v_flags;
>>>>>>>>    u32 vstate_ctrl;
>>>>>>>>    struct __riscv_v_ext_state vstate;
>>>>>>>>diff --git a/arch/riscv/kernel/asm-offsets.c
>>>>>>>>b/arch/riscv/kernel/asm- offsets.c
>>>>>>>>index 16490755304e..969c65b1fe41 100644
>>>>>>>>--- a/arch/riscv/kernel/asm-offsets.c
>>>>>>>>+++ b/arch/riscv/kernel/asm-offsets.c
>>>>>>>>@@ -34,6 +34,7 @@ void asm_offsets(void)
>>>>>>>>    OFFSET(TASK_THREAD_S9, task_struct, thread.s[9]);
>>>>>>>>    OFFSET(TASK_THREAD_S10, task_struct, thread.s[10]);
>>>>>>>>    OFFSET(TASK_THREAD_S11, task_struct, thread.s[11]);
>>>>>>
>>>>>>_______________________________________________
>>>>>>linux-riscv mailing list
>>>>>>linux-riscv@lists.infradead.org
>>>>>>http://lists.infradead.org/mailman/listinfo/linux-riscv
>>>>>>
>>>>>
>>>>>
>>>>>--
>>>>>Ben Dooks http://www.codethink.co.uk/
>>>>>Senior Engineer                                Codethink - 
>>>>Providing Genius
>>>>>
>>>>>https://www.codethink.co.uk/privacy.html
>>>>
>>>>_______________________________________________
>>>>linux-riscv mailing list
>>>>linux-riscv@lists.infradead.org
>>>>http://lists.infradead.org/mailman/listinfo/linux-riscv
>>
>>_______________________________________________
>>linux-riscv mailing list
>>linux-riscv@lists.infradead.org
>>http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 1/5] riscv: save the SR_SUM status over switches
  2025-05-23 17:14                 ` Deepak Gupta
@ 2025-05-23 20:00                   ` Alexandre Ghiti
  2025-05-27 19:34                     ` Deepak Gupta
  2025-05-24 10:00                   ` Andy Chiu
  1 sibling, 1 reply; 32+ messages in thread
From: Alexandre Ghiti @ 2025-05-23 20:00 UTC (permalink / raw)
  To: Deepak Gupta
  Cc: Andy Chiu, Ben Dooks, Cyril Bur, palmer, aou, paul.walmsley,
	charlie, jrtc27, linux-riscv, linux-kernel, jszhang,
	syzbot+e74b94fe601ab9552d69


On 5/23/25 19:14, Deepak Gupta wrote:
> On Fri, May 23, 2025 at 02:22:21PM +0200, Alexandre Ghiti wrote:
>> Hi Andy, Deepak,
>>
>> On 5/23/25 00:43, Deepak Gupta wrote:
>>> On Fri, May 23, 2025 at 01:42:49AM +0800, Andy Chiu wrote:
>>>> On Thu, May 22, 2025 at 11:09 PM Deepak Gupta <debug@rivosinc.com> 
>>>> wrote:
>>>>>
>>>>> On Thu, May 22, 2025 at 07:23:32AM +0100, Ben Dooks wrote:
>>>>>> On 20/05/2025 17:49, Deepak Gupta wrote:
>>>>>>> I did give this patch my RB and had planned to come back to it 
>>>>>>> to see
>>>>>>> if it impacts cfi related patches. Thanks to alex for brinigng 
>>>>>>> to my
>>>>>>> attention again. As it stands today, it doesn't impact cfi related
>>>>>>> changes but I've some concerns.
>>>>>>>
>>>>>>> Overall I do agree we should reduce number of SSTATUS accesses.
>>>>>>>
>>>>>>> Couple of questions on introducing new `sstatus` field (inline)
>>>>>>>
>>>>>>> On Tue, Apr 22, 2025 at 04:01:35PM -0700, Deepak Gupta wrote:
>>>>>>>> On Thu, Apr 10, 2025 at 07:05:22AM +0000, Cyril Bur wrote:
>>>>>>>>> From: Ben Dooks <ben.dooks@codethink.co.uk>
>>>>>>>>>
>>>>>>>>> When threads/tasks are switched we need to ensure the old 
>>>>> execution's
>>>>>>>>> SR_SUM state is saved and the new thread has the old SR_SUM state
>>>>>>>>> restored.
>>>>>>>>>
>>>>>>>>> The issue was seen under heavy load especially with the 
>>>>> syz-stress tool
>>>>>>>>> running, with crashes as follows in schedule_tail:
>>>>>>>>>
>>>>>>>>> Unable to handle kernel access to user memory without 
>>>>> uaccess routines
>>>>>>>>> at virtual address 000000002749f0d0
>>>>>>>>> Oops [#1]
>>>>>>>>> Modules linked in:
>>>>>>>>> CPU: 1 PID: 4875 Comm: syz-executor.0 Not tainted
>>>>>>>>> 5.12.0-rc2-syzkaller-00467-g0d7588ab9ef9 #0
>>>>>>>>> Hardware name: riscv-virtio,qemu (DT)
>>>>>>>>> epc : schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
>>>>>>>>> ra : task_pid_vnr include/linux/sched.h:1421 [inline]
>>>>>>>>> ra : schedule_tail+0x70/0xb2 kernel/sched/core.c:4264
>>>>>>>>> epc : ffffffe00008c8b0 ra : ffffffe00008c8ae sp : 
>>>>>>>>> ffffffe025d17ec0
>>>>>>>>> gp : ffffffe005d25378 tp : ffffffe00f0d0000 t0 : 0000000000000000
>>>>>>>>> t1 : 0000000000000001 t2 : 00000000000f4240 s0 : ffffffe025d17ee0
>>>>>>>>> s1 : 000000002749f0d0 a0 : 000000000000002a a1 : 0000000000000003
>>>>>>>>> a2 : 1ffffffc0cfac500 a3 : ffffffe0000c80cc a4 : 5ae9db91c19bbe00
>>>>>>>>> a5 : 0000000000000000 a6 : 0000000000f00000 a7 : ffffffe000082eba
>>>>>>>>> s2 : 0000000000040000 s3 : ffffffe00eef96c0 s4 : ffffffe022c77fe0
>>>>>>>>> s5 : 0000000000004000 s6 : ffffffe067d74e00 s7 : ffffffe067d74850
>>>>>>>>> s8 : ffffffe067d73e18 s9 : ffffffe067d74e00 s10: ffffffe00eef96e8
>>>>>>>>> s11: 000000ae6cdf8368 t3 : 5ae9db91c19bbe00 t4 : ffffffc4043cafb2
>>>>>>>>> t5 : ffffffc4043cafba t6 : 0000000000040000
>>>>>>>>> status: 0000000000000120 badaddr: 000000002749f0d0 cause:
>>>>>>>>> 000000000000000f
>>>>>>>>> Call Trace:
>>>>>>>>> [<ffffffe00008c8b0>] schedule_tail+0x72/0xb2 
>>>>> kernel/sched/core.c:4264
>>>>>>>>> [<ffffffe000005570>] ret_from_exception+0x0/0x14
>>>>>>>>> Dumping ftrace buffer:
>>>>>>>>> (ftrace buffer empty)
>>>>>>>>> ---[ end trace b5f8f9231dc87dda ]---
>>>>>>>>>
>>>>>>>>> The issue comes from the put_user() in schedule_tail
>>>>>>>>> (kernel/sched/core.c) doing the following:
>>>>>>>>>
>>>>>>>>> asmlinkage __visible void schedule_tail(struct task_struct *prev)
>>>>>>>>> {
>>>>>>>>> ...
>>>>>>>>>       if (current->set_child_tid)
>>>>>>>>>               put_user(task_pid_vnr(current), 
>>>>> current->set_child_tid);
>>>>>>>>> ...
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> the put_user() macro causes the code sequence to come out as 
>>>>> follows:
>>>>>>>>>
>>>>>>>>> 1:    __enable_user_access()
>>>>>>>>> 2:    reg = task_pid_vnr(current);
>>>>>>>>> 3:    *current->set_child_tid = reg;
>>>>>>>>> 4:    __disable_user_access()
>>>>>>>>>
>>>>>>>>> The problem is that we may have a sleeping function as 
>>>>> argument which
>>>>>>>>> could clear SR_SUM causing the panic above. This was fixed by
>>>>>>>>> evaluating the argument of the put_user() macro outside the 
>>>>> user-enabled
>>>>>>>>> section in commit 285a76bb2cf5 ("riscv: evaluate put_user() 
>>>>> arg before
>>>>>>>>> enabling user access")"
>>>>>>>>>
>>>>>>>>> In order for riscv to take advantage of unsafe_get/put_XXX() 
>>>>> macros and
>>>>>>>>> to avoid the same issue we had with put_user() and sleeping 
>>>>> functions we
>>>>>>>>> must ensure code flow can go through switch_to() from within 
>>>>> a region of
>>>>>>>>> code with SR_SUM enabled and come back with SR_SUM still 
>>>>> enabled. This
>>>>>>>>> patch addresses the problem allowing future work to enable 
>>>>> full use of
>>>>>>>>> unsafe_get/put_XXX() macros without needing to take a CSR 
>>>>> bit flip cost
>>>>>>>>> on every access. Make switch_to() save and restore SR_SUM.
>>>>>>>>>
>>>>>>>>> Reported-by: 
>>>>>>>>> syzbot+e74b94fe601ab9552d69@syzkaller.appspotmail.com
>>>>>>>>> Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
>>>>>>>>> Signed-off-by: Cyril Bur <cyrilbur@tenstorrent.com>
>>>>>>>>> ---
>>>>>>>>> arch/riscv/include/asm/processor.h | 1 +
>>>>>>>>> arch/riscv/kernel/asm-offsets.c    | 5 +++++
>>>>>>>>> arch/riscv/kernel/entry.S          | 8 ++++++++
>>>>>>>>> 3 files changed, 14 insertions(+)
>>>>>>>>>
>>>>>>>>> diff --git a/arch/riscv/include/asm/processor.h
>>>>>>>>> b/arch/riscv/include/ asm/processor.h
>>>>>>>>> index 5f56eb9d114a..58fd11c89fe9 100644
>>>>>>>>> --- a/arch/riscv/include/asm/processor.h
>>>>>>>>> +++ b/arch/riscv/include/asm/processor.h
>>>>>>>>> @@ -103,6 +103,7 @@ struct thread_struct {
>>>>>>>>>     struct __riscv_d_ext_state fstate;
>>>>>>>>>     unsigned long bad_cause;
>>>>>>>>>     unsigned long envcfg;
>>>>>>>>> +    unsigned long status;
>>>>>>>
>>>>>>> Do we really need a new member field in `thread_struct`. We 
>>>>> already have
>>>>>>> `sstatus` in `pt_regs` which reflects overall execution environment
>>>>>>> situation
>>>>>>> for current thread. This gets saved and restored on trap entry 
>>>>> and exit.
>>>>>>>
>>>>>>> If we put `status` in `thread_struct` it creates ambiguity in terms
>>>>>>> of which
>>>>>>> `status` to save to and pick from from future maintainibility
>>>>>>> purposes as the
>>>>>>> fields get introduced to this CSR.
>>>>>>>
>>>>>>> Why can't we access current trap frame's `sstatus` image in
>>>>>>> `__switch_to` to
>>>>>>> save and restore?
>>>>>>>
>>>>>>> Let me know if I am missing something obvious here. If there is a
>>>>>>> complication,
>>>>>>> I am missing here and we do end up using this member field, I would
>>>>>>> rename it
>>>>>>> to something like `status_kernel` to reflect that. So that future
>>>>>>> changes are
>>>>>>> cognizant of the fact that we have split `status`. One for kernel
>>>>>>> execution env
>>>>>>> per thread and one for controlling user execution env per thread.
>>>>>>
>>>>>> This is so long ago now I cannot remember if there was any 
>>>>>> sstatus in
>>>>>> the pt_regs field,
>>>>>
>>>>> FS/VS bits encode status of floating point and vector on 
>>>>> per-thread basis.
>>>>> So `status` has been part of `pt_regs` for quite a while.
>>>>>
>>>>>> and if kernel threads have the same context as their
>>>>>> userland parts.
>>>>>
>>>>> I didn't mean kernel thread. What I meant was kernel execution 
>>>>> environment
>>>>> per-thread. A userland thread does spend sometime in kernel and 
>>>>> kernel does
>>>>> things on its behalf. One of those thing is touching user memory 
>>>>> and that
>>>>> requires mucking with this CSR. So what I meant was are we 
>>>>> splitting `status`
>>>>> on per-thread basis for their time spent in user and kernel.
>>>>>
>>>>> Getting back to original question--
>>>>> As I said, each thread spends sometime in user or in kernel. 
>>>>> `status` in
>>>>> `pt_regs` is saved on trap entry and restored on trap exit. In a 
>>>>> sense,
>>>>> `status` field in `pt_regs` is reflecting execution status of the 
>>>>> thread on per
>>>>> trap basis. Introducing `status` in `thread_struct` creates a 
>>>>> confusion (if not
>>>>> for today, certainly for future) of which `status` to pick from 
>>>>> when we are
>>>>> doing save/restore.
>>>>
>>>> I agree that it's a confusion. sstatus is already saved on pt_regs on
>>>> trap entries/return, adding another entry adds code complexity and
>>>> makes data inconsistent. But, perhaps we'd eventually need something
>>>> like this (I will explain why). Still, there might be a better
>>>> approach.
>>>>
>>>> Yes, we can always reflect pt_regs for sstatus. We all know that
>>>> pt_regs reflects sstatus at trap entry, and the pt_regs at scheduler
>>>> point refers to "user's" pt_regs whenever it first enters kernel 
>>>> mode. Here
>>>> are reasons why SR_SUM here may or may not be properly tracked. First,
>>>> if this is a trap introduced context switch (such as interrupting in a
>>>> preemptible context after we manually enable user access in put_user),
>>>> then SR_SUM is saved somewhere in the kernel stack, and is not
>>>> reference-able with task_pt_reg during context switch. But we are safe
>>>> because the trap exit asm would help us restore the correct SR_SUM
>>>> back. However, if this is a self-initiating context switch (calling
>>>> into schedule()), then SR_SUM is not saved anywhere, and possibly
>>>> causing this error.
>>>>
>>>> Preemptible Vector in the kernel mode also had this problem where a
>>>> self-initiating context switch loses the track of sstatus.vs. The way
>>>> I managed it is to track the VS bit at context switch time. However,
>>>> this bug shows that people are repeatedly facing the problem, and
>>>> maybe it suggests that we'd need a better way of managing sstatus
>>>> across context switches. Given the complex nature of this register,
>>>> which also touches the interrupt enable status, I don't think naively
>>>> saving/restoring the entire register is the way to go. Maybe the
>>>> variable deserves a more specific naming and documentation. And if
>>>> we'd need a centralized place for managing these statuses, then it
>>>> also has to take care of sstatus.VS.
>>
>>
>> Andy, thanks for the precise explanation of the problem :)
>>
>> So it took me some time but here are my thoughts on this. We should 
>> treat pt_regs and thread_struct differently as they do not represent 
>> the same thing:
>> - pt_regs represents the context of a thread when it takes a trap
>> - thread_struct represents a "kernel-induced" (or a "in-kernel") 
>> context not caused by traps
>
> Exactly they represent different context of execution. Trap represents a
> non-linear control flow change and thus a fresh start of execution 
> control
> flow into kernel while `kernel-induced` one's are again non-linear but
> fully a kernel/software construct.
>
> A fresh trapped execution context shouldn't have SUM set which is how 
> it is
> currently in kernel. This bit gets cleared in trap entry and `sstatus` 
> gets
> saved in `pt_regs` (including SR_IE) so that it could be restored 
> whenever
> `sret` happens.
>
> The problem we'are seeing here is two fold--
>
> 1) We don't want to set and clear when we are accessing array/string 
> for each
>    word. This is software problem and this entire series is addressing 
> it.
>
> 2) To avoid first problem we are optimizing the access to CSR by 
> setting it
>    once and clearing it once. But now we don't want to loose this bit 
> if there
>    were:
>
>     a) trap in between
>     b) kernel induced schedule out
>         c) a) followed by b)
>         d) a) followed by another a)
>         e) nested traps
>
> If a) occurs, we are definitley loosing the bit as per current code.


If a trap occurs while the SUM bit is set, the SUM bit will be saved in
pt_regs and restored when we come back, so we don't lose it when a) occurs.


> If b)
> happens, then it is the same situation.


Currently, we do lose it in that case indeed.


>
> Saving it in `thread_struct` only addresses `b`, and not `a`, `c`, `d`
> and `e`. IMHO `e` is a far-fetched situation, but I believe `a`, `b`,
> `c` and `d` happen during the normal runtime of the kernel.
>
> So it all depends on nesting level of traps supported by riscv kernel.
>
> Illustrating the `c + d` example: if the kernel can take 2 nested levels
> of traps, with the first trap context having had the SUM bit set but the
> second trap having it clear, and now comes the switch out of this thread,
> then at this point, if it were saved in `thread_struct`, SUM would be
> lost for the first trap.
>
> Later, when the thread gets switched in again, you would go into the 2nd
> trap context without SUM (because `thread_struct` didn't have it saved),
> which is fine. Although when the 2nd trap context eventually performs
> `sret`, it will go back to the first trap context where SUM was expected
> to be set because it is touching user memory.
>
> A good example would be a syscall, so that's the first trap. SUM bit 
> is set,
> touched user memory and took a trap (page fault). Now code is in 
> second trap
> which should clear the SUM bit. Somewhere in memory manager stack, 
> thread is
> scheduled out and now `sstatus` is saved in `thread_struct`. This is only
> serving current trap context needs and not the one where `SUM` needed 
> to be
> set.


Hmm to me we don't lose the SUM bit in case of a trap, only when eager 
schedule happens:

thread A
|
|-> syscall
       |
       SUM bit is set
       |
        -> page fault (trap)
             |
              sstatus with SUM bit set is saved on pt_regs
              SUM bit is cleared
             |
              -> eager schedule
                  |
                  -> we save SUM bit cleared in thread_struct
                      |
                      |
                       schedule thread B....
                      |
                      |
                     <- switch_to thread A again
                  |
                  we restore SUM bit cleared from thread_struct
                  |
                <- we resume execution of page fault trap
               |
               so we restore SUM bit saved on pt_regs which *has* SUM 
bit set
               |
             <- sret
           |
           SUM bit is set and we continue the first syscall.

So based on my wonderful ascii art, it works :) Or did I miss something?
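
(Side note for readers following the thread: the timeline above is exactly
the situation this series wants to make legal for grouped accesses. Below
is a minimal sketch of the intended usage built on the generic uaccess API;
the function name and the error label are made up for illustration.)

int put_child_tid(pid_t __user *uaddr, pid_t val)
{
        if (!user_access_begin(uaddr, sizeof(*uaddr)))  /* sets SR_SUM once */
                return -EFAULT;
        /* may fault and, in a preemptible section, schedule() before returning */
        unsafe_put_user(val, uaddr, efault);
        user_access_end();                              /* clears SR_SUM once */
        return 0;
efault:
        user_access_end();
        return -EFAULT;
}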


>
> We can support such nesting only by ensuring below
>
> On trap entry do
> - save `status` in `pt_regs` or some other FILO data structure
> - clear SUM (and other bits needed to be cleared)
>
> On trap return do
> - reload `status` from `pt_regs` or some FILO data structure
>
> Quite analogous to what we do for SR_IE as well.
>
>>
>> That's why I don't really like Deepak's proposal below as it mixes 
>> both and I find it tricky.
>>
>> I can't find a situation where saving/restoring the entire sstatus at 
>> context-switch is a problem though, does anyone have such thing in mind?
>>
>> Finally I understand that having another copy of sstatus in 
>> thread_struct is not intuitive and we should, either explain why or 
>> only store the SUM bit (like for sstatus.VS).
>>
>> Please continue the discussion as we need to find a solution that 
>> pleases everyone soon :)
>>
>> Thanks all for jumping in,
>>
>> Alex
>>
>>
>>>
>>>
>>> IMHO, the problem we are trying to solve in this patch is easily 
>>> solvable in
>>> below manner.
>>>
>>>
>>> diff --git a/arch/riscv/include/asm/switch_to.h 
>>> b/arch/riscv/include/asm/switch_to.h
>>> index 0e71eb82f920..499d00a6fb67 100644
>>> --- a/arch/riscv/include/asm/switch_to.h
>>> +++ b/arch/riscv/include/asm/switch_to.h
>>> @@ -58,6 +58,20 @@ static inline void __switch_to_fpu(struct 
>>> task_struct *prev,
>>>         fstate_restore(next, task_pt_regs(next));
>>>  }
>>>
>>> +static inline void __switch_to_status(struct task_struct *prev,
>>> +                                  struct task_struct *next)
>>> +{
>>> +       struct pt_regs *regs;
>>> +
>>> +       /* save status */
>>> +       regs = task_pt_regs(prev);
>>> +       regs->status = csr_read(CSR_STATUS);
>>> +
>>> +       /* restore status */
>>> +       regs = task_pt_regs(next);
>>> +       csr_write(CSR_STATUS, regs->status);
>>> +}
>>> +
>>>  static __always_inline bool has_fpu(void)
>>>  {
>>>         return riscv_has_extension_likely(RISCV_ISA_EXT_f) ||
>>> @@ -115,6 +129,7 @@ do 
>>> {                                                        \
>>>         struct task_struct *__prev = (prev);            \
>>>         struct task_struct *__next = (next);            \
>>>         __set_prev_cpu(__prev->thread);                 \
>>> +       __switch_to_status(__prev, __next)              \
>>>         if (has_fpu())                                  \
>>>                 __switch_to_fpu(__prev, __next);        \
>>>         if (has_vector() || has_xtheadvector())         \
>>> diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
>>> index 8d25837a9384..a3b98c1be055 100644
>>> --- a/arch/riscv/kernel/entry.S
>>> +++ b/arch/riscv/kernel/entry.S
>>> @@ -162,17 +162,8 @@ SYM_CODE_START(handle_exception)
>>>         REG_S x5,  PT_T0(sp)
>>>         save_from_x6_to_x31
>>>
>>> -       /*
>>> -        * Disable user-mode memory access as it should only be set 
>>> in the
>>> -        * actual user copy routines.
>>> -        *
>>> -        * Disable the FPU/Vector to detect illegal usage of 
>>> floating point
>>> -        * or vector in kernel space.
>>> -        */
>>> -       li t0, SR_SUM | SR_FS_VS | SR_ELP
>>> -
>>>         REG_L s0, TASK_TI_USER_SP(tp)
>>> -       csrrc s1, CSR_STATUS, t0
>>> +       csrr s1, CSR_STATUS
>>>         save_userssp s2, s1
>>>         csrr s2, CSR_EPC
>>>         csrr s3, CSR_TVAL
>>> @@ -185,6 +176,16 @@ SYM_CODE_START(handle_exception)
>>>         REG_S s4, PT_CAUSE(sp)
>>>         REG_S s5, PT_TP(sp)
>>>
>>> +       /*
>>> +        * It is fresh trap entry. Disable user-mode memory access 
>>> as it should only be set in the
>>> +        * actual user copy routines.
>>> +        *
>>> +        * Disable the FPU/Vector to detect illegal usage of 
>>> floating point
>>> +        * or vector in kernel space.
>>> +        */
>>> +       li t0, SR_SUM | SR_FS_VS | SR_ELP
>>> +       csrrc s1, CSR_STATUS, t0
>>> +
>>>         /*
>>>          * Set the scratch register to 0, so that if a recursive 
>>> exception
>>>          * occurs, the exception vector knows it came from the kernel
>>>
>>>
>>>
>>> During the time spent in the kernel, if the SUM bit gets set in status,
>>> then the above `__switch_to_status` will ensure that `status` gets saved
>>> for the current thread and restored for the next thread.
>>>
>>> Furthermore, current trap entry code clears FS/VS/SUM (for right 
>>> reasons). It
>>> represents non-linear change of control flow and thus whatever will 
>>> execute next
>>> shouldn't need SUM/FS/VS unless it wants to set it). This patch 
>>> slightly
>>> modifies the flow by first saving the `status` on trap frame (thus 
>>> if previous
>>> trap frame had SUM=1, it will be saved and restored). And then it
>>> unconditionally clears the SUM/FS/VS to ensure that this new trap 
>>> context runs
>>> without needing SUM=1. This ensures nesting of trap frames without 
>>> diluting
>>> security properties of SUM.
>>>
>>>>
>>>> Thanks,
>>>> Andy
>>>>
>>>>
>>>>
>>>>
>>>>>
>>>>> So my first question was why not to use `status` in `pt_regs`. It 
>>>>> is granular
>>>>> as it can get (it is available per thread context per trap basis).
>>>>>
>>>>>
>>>>> I did ask Alex as well. I'll ping him again.
>>>>>
>>>>>>
>>>>>> Does anyone else have any comment on this?
>>>>>>
>>>>>>>
>>>>>>>>>     u32 riscv_v_flags;
>>>>>>>>>     u32 vstate_ctrl;
>>>>>>>>>     struct __riscv_v_ext_state vstate;
>>>>>>>>> diff --git a/arch/riscv/kernel/asm-offsets.c
>>>>>>>>> b/arch/riscv/kernel/asm- offsets.c
>>>>>>>>> index 16490755304e..969c65b1fe41 100644
>>>>>>>>> --- a/arch/riscv/kernel/asm-offsets.c
>>>>>>>>> +++ b/arch/riscv/kernel/asm-offsets.c
>>>>>>>>> @@ -34,6 +34,7 @@ void asm_offsets(void)
>>>>>>>>>     OFFSET(TASK_THREAD_S9, task_struct, thread.s[9]);
>>>>>>>>>     OFFSET(TASK_THREAD_S10, task_struct, thread.s[10]);
>>>>>>>>>     OFFSET(TASK_THREAD_S11, task_struct, thread.s[11]);
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> linux-riscv mailing list
>>>>>>> linux-riscv@lists.infradead.org
>>>>>>> http://lists.infradead.org/mailman/listinfo/linux-riscv
>>>>>>>
>>>>>>
>>>>>>
>>>>>> -- 
>>>>>> Ben Dooks http://www.codethink.co.uk/
>>>>>> Senior Engineer                                Codethink - 
>>>>> Providing Genius
>>>>>>
>>>>>> https://www.codethink.co.uk/privacy.html
>>>>>
>>>>> _______________________________________________
>>>>> linux-riscv mailing list
>>>>> linux-riscv@lists.infradead.org
>>>>> http://lists.infradead.org/mailman/listinfo/linux-riscv
>>>
>>> _______________________________________________
>>> linux-riscv mailing list
>>> linux-riscv@lists.infradead.org
>>> http://lists.infradead.org/mailman/listinfo/linux-riscv
>
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 1/5] riscv: save the SR_SUM status over switches
  2025-05-23 17:14                 ` Deepak Gupta
  2025-05-23 20:00                   ` Alexandre Ghiti
@ 2025-05-24 10:00                   ` Andy Chiu
  2025-05-27 20:58                     ` Deepak Gupta
  1 sibling, 1 reply; 32+ messages in thread
From: Andy Chiu @ 2025-05-24 10:00 UTC (permalink / raw)
  To: Deepak Gupta
  Cc: Alexandre Ghiti, Ben Dooks, Cyril Bur, palmer, aou, paul.walmsley,
	charlie, jrtc27, linux-riscv, linux-kernel, jszhang,
	syzbot+e74b94fe601ab9552d69

On Sat, May 24, 2025 at 1:14 AM Deepak Gupta <debug@rivosinc.com> wrote:
>
> On Fri, May 23, 2025 at 02:22:21PM +0200, Alexandre Ghiti wrote:
> >Hi Andy, Deepak,
> >
> >On 5/23/25 00:43, Deepak Gupta wrote:
> >>On Fri, May 23, 2025 at 01:42:49AM +0800, Andy Chiu wrote:
> >>>On Thu, May 22, 2025 at 11:09 PM Deepak Gupta <debug@rivosinc.com>
> >>>wrote:
> >>>>
> >>>>On Thu, May 22, 2025 at 07:23:32AM +0100, Ben Dooks wrote:
> >>>>>On 20/05/2025 17:49, Deepak Gupta wrote:
> >>>>>>I did give this patch my RB and had planned to come back to it to see
> >>>>>>if it impacts cfi related patches. Thanks to alex for brinigng to my
> >>>>>>attention again. As it stands today, it doesn't impact cfi related
> >>>>>>changes but I've some concerns.
> >>>>>>
> >>>>>>Overall I do agree we should reduce number of SSTATUS accesses.
> >>>>>>
> >>>>>>Couple of questions on introducing new `sstatus` field (inline)
> >>>>>>
> >>>>>>On Tue, Apr 22, 2025 at 04:01:35PM -0700, Deepak Gupta wrote:
> >>>>>>>On Thu, Apr 10, 2025 at 07:05:22AM +0000, Cyril Bur wrote:
> >>>>>>>>From: Ben Dooks <ben.dooks@codethink.co.uk>
> >>>>>>>>
> >>>>>>>>When threads/tasks are switched we need to ensure the old
> >>>>execution's
> >>>>>>>>SR_SUM state is saved and the new thread has the old SR_SUM state
> >>>>>>>>restored.
> >>>>>>>>
> >>>>>>>>The issue was seen under heavy load especially with the
> >>>>syz-stress tool
> >>>>>>>>running, with crashes as follows in schedule_tail:
> >>>>>>>>
> >>>>>>>>Unable to handle kernel access to user memory without
> >>>>uaccess routines
> >>>>>>>>at virtual address 000000002749f0d0
> >>>>>>>>Oops [#1]
> >>>>>>>>Modules linked in:
> >>>>>>>>CPU: 1 PID: 4875 Comm: syz-executor.0 Not tainted
> >>>>>>>>5.12.0-rc2-syzkaller-00467-g0d7588ab9ef9 #0
> >>>>>>>>Hardware name: riscv-virtio,qemu (DT)
> >>>>>>>>epc : schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
> >>>>>>>>ra : task_pid_vnr include/linux/sched.h:1421 [inline]
> >>>>>>>>ra : schedule_tail+0x70/0xb2 kernel/sched/core.c:4264
> >>>>>>>>epc : ffffffe00008c8b0 ra : ffffffe00008c8ae sp : ffffffe025d17ec0
> >>>>>>>>gp : ffffffe005d25378 tp : ffffffe00f0d0000 t0 : 0000000000000000
> >>>>>>>>t1 : 0000000000000001 t2 : 00000000000f4240 s0 : ffffffe025d17ee0
> >>>>>>>>s1 : 000000002749f0d0 a0 : 000000000000002a a1 : 0000000000000003
> >>>>>>>>a2 : 1ffffffc0cfac500 a3 : ffffffe0000c80cc a4 : 5ae9db91c19bbe00
> >>>>>>>>a5 : 0000000000000000 a6 : 0000000000f00000 a7 : ffffffe000082eba
> >>>>>>>>s2 : 0000000000040000 s3 : ffffffe00eef96c0 s4 : ffffffe022c77fe0
> >>>>>>>>s5 : 0000000000004000 s6 : ffffffe067d74e00 s7 : ffffffe067d74850
> >>>>>>>>s8 : ffffffe067d73e18 s9 : ffffffe067d74e00 s10: ffffffe00eef96e8
> >>>>>>>>s11: 000000ae6cdf8368 t3 : 5ae9db91c19bbe00 t4 : ffffffc4043cafb2
> >>>>>>>>t5 : ffffffc4043cafba t6 : 0000000000040000
> >>>>>>>>status: 0000000000000120 badaddr: 000000002749f0d0 cause:
> >>>>>>>>000000000000000f
> >>>>>>>>Call Trace:
> >>>>>>>>[<ffffffe00008c8b0>] schedule_tail+0x72/0xb2
> >>>>kernel/sched/core.c:4264
> >>>>>>>>[<ffffffe000005570>] ret_from_exception+0x0/0x14
> >>>>>>>>Dumping ftrace buffer:
> >>>>>>>> (ftrace buffer empty)
> >>>>>>>>---[ end trace b5f8f9231dc87dda ]---
> >>>>>>>>
> >>>>>>>>The issue comes from the put_user() in schedule_tail
> >>>>>>>>(kernel/sched/core.c) doing the following:
> >>>>>>>>
> >>>>>>>>asmlinkage __visible void schedule_tail(struct task_struct *prev)
> >>>>>>>>{
> >>>>>>>>...
> >>>>>>>>      if (current->set_child_tid)
> >>>>>>>>              put_user(task_pid_vnr(current),
> >>>>current->set_child_tid);
> >>>>>>>>...
> >>>>>>>>}
> >>>>>>>>
> >>>>>>>>the put_user() macro causes the code sequence to come out as
> >>>>follows:
> >>>>>>>>
> >>>>>>>>1:    __enable_user_access()
> >>>>>>>>2:    reg = task_pid_vnr(current);
> >>>>>>>>3:    *current->set_child_tid = reg;
> >>>>>>>>4:    __disable_user_access()
> >>>>>>>>
> >>>>>>>>The problem is that we may have a sleeping function as
> >>>>argument which
> >>>>>>>>could clear SR_SUM causing the panic above. This was fixed by
> >>>>>>>>evaluating the argument of the put_user() macro outside the
> >>>>user-enabled
> >>>>>>>>section in commit 285a76bb2cf5 ("riscv: evaluate put_user()
> >>>>arg before
> >>>>>>>>enabling user access")"
> >>>>>>>>
> >>>>>>>>In order for riscv to take advantage of unsafe_get/put_XXX()
> >>>>macros and
> >>>>>>>>to avoid the same issue we had with put_user() and sleeping
> >>>>functions we
> >>>>>>>>must ensure code flow can go through switch_to() from within
> >>>>a region of
> >>>>>>>>code with SR_SUM enabled and come back with SR_SUM still
> >>>>enabled. This
> >>>>>>>>patch addresses the problem allowing future work to enable
> >>>>full use of
> >>>>>>>>unsafe_get/put_XXX() macros without needing to take a CSR
> >>>>bit flip cost
> >>>>>>>>on every access. Make switch_to() save and restore SR_SUM.
> >>>>>>>>
> >>>>>>>>Reported-by: syzbot+e74b94fe601ab9552d69@syzkaller.appspotmail.com
> >>>>>>>>Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
> >>>>>>>>Signed-off-by: Cyril Bur <cyrilbur@tenstorrent.com>
> >>>>>>>>---
> >>>>>>>>arch/riscv/include/asm/processor.h | 1 +
> >>>>>>>>arch/riscv/kernel/asm-offsets.c    | 5 +++++
> >>>>>>>>arch/riscv/kernel/entry.S          | 8 ++++++++
> >>>>>>>>3 files changed, 14 insertions(+)
> >>>>>>>>
> >>>>>>>>diff --git a/arch/riscv/include/asm/processor.h
> >>>>>>>>b/arch/riscv/include/ asm/processor.h
> >>>>>>>>index 5f56eb9d114a..58fd11c89fe9 100644
> >>>>>>>>--- a/arch/riscv/include/asm/processor.h
> >>>>>>>>+++ b/arch/riscv/include/asm/processor.h
> >>>>>>>>@@ -103,6 +103,7 @@ struct thread_struct {
> >>>>>>>>    struct __riscv_d_ext_state fstate;
> >>>>>>>>    unsigned long bad_cause;
> >>>>>>>>    unsigned long envcfg;
> >>>>>>>>+    unsigned long status;
> >>>>>>
> >>>>>>Do we really need a new member field in `thread_struct`. We
> >>>>already have
> >>>>>>`sstatus` in `pt_regs` which reflects overall execution environment
> >>>>>>situation
> >>>>>>for current thread. This gets saved and restored on trap entry
> >>>>and exit.
> >>>>>>
> >>>>>>If we put `status` in `thread_struct` it creates ambiguity in terms
> >>>>>>of which
> >>>>>>`status` to save to and pick from, for future maintainability
> >>>>>>purposes as the
> >>>>>>fields get introduced to this CSR.
> >>>>>>
> >>>>>>Why can't we access current trap frame's `sstatus` image in
> >>>>>>`__switch_to` to
> >>>>>>save and restore?
> >>>>>>
> >>>>>>Let me know if I am missing something obvious here. If there is a
> >>>>>>complication,
> >>>>>>I am missing here and we do end up using this member field, I would
> >>>>>>rename it
> >>>>>>to something like `status_kernel` to reflect that. So that future
> >>>>>>changes are
> >>>>>>cognizant of the fact that we have split `status`. One for kernel
> >>>>>>execution env
> >>>>>>per thread and one for controlling user execution env per thread.
> >>>>>
> >>>>>This is so long ago now I cannot remember if there was any sstatus in
> >>>>>the pt_regs field,
> >>>>
> >>>>FS/VS bits encode status of floating point and vector on
> >>>>per-thread basis.
> >>>>So `status` has been part of `pt_regs` for quite a while.
> >>>>
> >>>>> and if kernel threads have the same context as their
> >>>>>userland parts.
> >>>>
> >>>>I didn't mean kernel thread. What I meant was kernel execution
> >>>>environment
> >>>>per-thread. A userland thread does spend sometime in kernel and
> >>>>kernel does
> >>>>things on its behalf. One of those thing is touching user memory
> >>>>and that
> >>>>requires mucking with this CSR. So what I meant was are we
> >>>>splitting `status`
> >>>>on per-thread basis for their time spent in user and kernel.
> >>>>
> >>>>Getting back to original question--
> >>>>As I said, each thread spends sometime in user or in kernel.
> >>>>`status` in
> >>>>`pt_regs` is saved on trap entry and restored on trap exit. In a sense,
> >>>>`status` field in `pt_regs` is reflecting execution status of
> >>>>the thread on per
> >>>>trap basis. Introducing `status` in `thread_struct` creates a
> >>>>confusion (if not
> >>>>for today, certainly for future) of which `status` to pick from
> >>>>when we are
> >>>>doing save/restore.
> >>>
> >>>I agree that it's a confusion. sstatus is already saved on pt_regs on
> >>>trap entries/return, adding another entry adds code complexity and
> >>>makes data inconsistent. But, perhaps we'd eventually need something
> >>>like this (I will explain why). Still, there might be a better
> >>>approach.
> >>>
> >>>Yes, we can always reflect pt_regs for sstatus. We all know that
> >>>pt_regs reflects sstatus at trap entry, and the pt_regs at scheduler
> >>>point refers to "user's" pt_regs whenever it first enters kernel
> >>>mode. Here
> >>>are reasons why SR_SUM here may or may not be properly tracked. First,
> >>>if this is a trap introduced context switch (such as interrupting in a
> >>>preemptible context after we manually enable user access in put_user),
> >>>then SR_SUM is saved somewhere in the kernel stack, and is not
> >>>reference-able with task_pt_reg during context switch. But we are safe
> >>>because the trap exit asm would help us restore the correct SR_SUM
> >>>back. However, if this is a self-initiating context switch (calling
> >>>into schedule()), then SR_SUM is not saved anywhere, and possibly
> >>>causing this error.
> >>>
> >>>Preemptible Vector in the kernel mode also had this problem where a
> >>>self-initiating context switch loses the track of sstatus.vs. The way
> >>>I managed it is to track the VS bit at context switch time. However,
> >>>this bug shows that people are repeatedly facing the problem, and
> >>>maybe it suggests that we'd need a better way of managing sstatus
> >>>across context switches. Given the complex nature of this register,
> >>>which also touches the interrupt enable status, I don't think naively
> >>>saving/restoring the entire register is the way to go. Maybe the
> >>>variable deserves a more specific naming and documentation. And if
> >>>we'd need a centralized place for managing these statuses, then it
> >>>also has to take care of sstatus.VS.
> >
> >
> >Andy, thanks for the precise explanation of the problem :)

Thanks for reading it, Alex! It's my bad for making it wordy.

> >
> >So it took me some time but here are my thoughts on this. We should
> >treat pt_regs and thread_struct differently as they do not represent
> >the same thing:
> >- pt_regs represents the context of a thread when it takes a trap
> >- thread_struct represents a "kernel-induced" (or a "in-kernel")
> >context not caused by traps
>
> Exactly they represent different context of execution. Trap represents a
> non-linear control flow change and thus a fresh start of execution control
> flow into kernel while `kernel-induced` one's are again non-linear but
> fully a kernel/software construct.
>
> A fresh trapped execution context shouldn't have SUM set which is how it is
> currently in kernel. This bit gets cleared in trap entry and `sstatus` gets
> saved in `pt_regs` (including SR_IE) so that it could be restored whenever
> `sret` happens.
>
> The problem we're seeing here is twofold:
>
> 1) We don't want to set and clear the bit for each word when we are accessing
>     an array/string. This is a software problem and this entire series is
>     addressing it.
>
> 2) To avoid the first problem we are optimizing the access to the CSR by
>     setting it once and clearing it once. But now we don't want to lose this
>     bit if there were:
>
>         a) trap in between
>          b) kernel induced schedule out
>          c) a) followed by b)
>          d) a) followed by another a)
>          e) nested traps
>
> If a) occurs, we are definitely losing the bit as per the current code. If b)
> happens, the situation is the same.
>
> Saving it in `thread_struct` only addresses `b`. And not `a`, `c`, `d` and
> `e`. IMHO `e` is far-fetched situation but I believe `a`, `b`, `c` and `d` happen
> during normal runtime of kernel.

The trap entry/exit routines should always take care of the trap cases:
whenever the kernel traps, SUM is saved to a pt_regs somewhere on the
kernel stack. Yes, a task may be scheduled out after a trap, which is
common, but please be aware that after scheduling back to the original
task, it then has to execute the trap exit and thus restores SUM before
going back to the original code (where it received the exception).
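
(A rough C-level sketch of the invariant described above; the real code is
the asm in handle_exception/ret_from_exception, and the function names here
are purely illustrative.)

static void trap_entry(struct pt_regs *regs)
{
        /* park the live status, SUM included, in the trap frame and
         * clear the bits a fresh kernel context must not inherit */
        regs->status = csr_read_clear(CSR_STATUS, SR_SUM | SR_FS_VS);
        /* ... handle the trap; this path may schedule() ... */
}

static void trap_exit(struct pt_regs *regs)
{
        /* the saved value, SUM included, comes back no matter how many
         * times the task was scheduled out in between */
        csr_write(CSR_STATUS, regs->status);
        /* sret */
}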

>
> So it all depends on nesting level of traps supported by riscv kernel.
>
> Illustrating the `c + d` example: if the kernel can take 2 nested levels of traps,
> with the first trap context having had the SUM bit set but the second trap having
> it clear, and now comes the switch out of this thread, then at this point, if it
> were saved in `thread_struct`, SUM would be lost for the first trap.

No, the trap exit always restores the in-context (correct) sstatus back

>
> Later, when the thread gets switched in again, you would go into the 2nd trap
> context without SUM (because `thread_struct` didn't have it saved), which is
> fine. Although when the 2nd trap context eventually performs `sret`, it will
> go back to the first trap context where SUM was expected to be set because it
> is touching user memory.
>
> A good example would be a syscall, so that's the first trap. SUM bit is set,
> touched user memory and took a trap (page fault). Now code is in second trap
> which should clear the SUM bit. Somewhere in memory manager stack, thread is
> scheduled out and now `sstatus` is saved in `thread_struct`. This is only
> serving current trap context needs and not the one where `SUM` needed to be
> set.
>
> We can support such nesting only by ensuring below
>
> On trap entry do
> - save `status` in `pt_regs` or some other FILO data structure
> - clear SUM (and other bits needed to be cleared)
>
> On trap return do
> - reload `status` from `pt_regs` or some FILO data structure
>
> Quite analogous to what we do for SR_IE as well.

I am not sure if I understand what FILO is, but the current trap
handling routines do save/restore sstatus, which can be found at
handle_exception and ret_from_exception, as of today.

>
> >
> >That's why I don't really like Deepak's proposal below as it mixes
> >both and I find it tricky.
> >
> >I can't find a situation where saving/restoring the entire sstatus at
> >context-switch is a problem though, does anyone have such thing in
> >mind?

I agree that we should keep track of sstatus somewhere and be explicit
about what context it tracks.

sstatus does not just track per-thread status; some bits are machine-wide.
Though __switch_to is always called with interrupts disabled, I think
conceptually the interrupt enable status should not be saved/restored on
a per-thread basis.

Just FYI, some statuses are currently managed by individual modules (for
example, the live sstatus.VS is managed in asm/vector.h). We can discuss
what is preferred. The final patch should take care of this, or should
document that VS is managed elsewhere, if we would like centralized
sstatus management.

Personally, I would prefer a centralized sstatus management that only
touches SUM. This avoids duplicating the condition matching for vector
in other places. But maybe there are better ways.
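
(One possible shape for such a centralized helper, as a sketch only: a mask
decides which bits are treated as per-thread state at switch time, so SR_IE
is left alone and sstatus.VS can keep being handled by the vector code. The
mask and the helper name are hypothetical.)

#define SWITCH_STATUS_MASK      SR_SUM  /* sstatus.VS stays with asm/vector.h */

static inline void __switch_to_status_bits(struct task_struct *prev,
                                           struct task_struct *next)
{
        unsigned long status = csr_read(CSR_STATUS);

        /* remember only the masked bits for the outgoing task */
        prev->thread.status = status & SWITCH_STATUS_MASK;
        /* splice the incoming task's bits into the live value */
        csr_write(CSR_STATUS, (status & ~SWITCH_STATUS_MASK) |
                              (next->thread.status & SWITCH_STATUS_MASK));
}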

Thanks,
Andy




> >
> >Finally I understand that having another copy of sstatus in
> >thread_struct is not intuitive and we should, either explain why or
> >only store the SUM bit (like for sstatus.VS).
> >
> >Please continue the discussion as we need to find a solution that
> >pleases everyone soon :)
> >
> >Thanks all for jumping in,
> >
> >Alex
> >
> >
> >>
> >>
> >>IMHO, the problem we are trying to solve in this patch is easily
> >>solvable in
> >>below manner.
> >>
> >>
> >>diff --git a/arch/riscv/include/asm/switch_to.h
> >>b/arch/riscv/include/asm/switch_to.h
> >>index 0e71eb82f920..499d00a6fb67 100644
> >>--- a/arch/riscv/include/asm/switch_to.h
> >>+++ b/arch/riscv/include/asm/switch_to.h
> >>@@ -58,6 +58,20 @@ static inline void __switch_to_fpu(struct
> >>task_struct *prev,
> >>        fstate_restore(next, task_pt_regs(next));
> >> }
> >>
> >>+static inline void __switch_to_status(struct task_struct *prev,
> >>+                                  struct task_struct *next)
> >>+{
> >>+       struct pt_regs *regs;
> >>+
> >>+       /* save status */
> >>+       regs = task_pt_regs(prev);
> >>+       regs->status = csr_read(CSR_STATUS);
> >>+
> >>+       /* restore status */
> >>+       regs = task_pt_regs(next);
> >>+       csr_write(CSR_STATUS, regs->status);
> >>+}
> >>+
> >> static __always_inline bool has_fpu(void)
> >> {
> >>        return riscv_has_extension_likely(RISCV_ISA_EXT_f) ||
> >>@@ -115,6 +129,7 @@ do
> >>{                                                        \
> >>        struct task_struct *__prev = (prev);            \
> >>        struct task_struct *__next = (next);            \
> >>        __set_prev_cpu(__prev->thread);                 \
> >>+       __switch_to_status(__prev, __next)              \
> >>        if (has_fpu())                                  \
> >>                __switch_to_fpu(__prev, __next);        \
> >>        if (has_vector() || has_xtheadvector())         \
> >>diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
> >>index 8d25837a9384..a3b98c1be055 100644
> >>--- a/arch/riscv/kernel/entry.S
> >>+++ b/arch/riscv/kernel/entry.S
> >>@@ -162,17 +162,8 @@ SYM_CODE_START(handle_exception)
> >>        REG_S x5,  PT_T0(sp)
> >>        save_from_x6_to_x31
> >>
> >>-       /*
> >>-        * Disable user-mode memory access as it should only be set
> >>in the
> >>-        * actual user copy routines.
> >>-        *
> >>-        * Disable the FPU/Vector to detect illegal usage of
> >>floating point
> >>-        * or vector in kernel space.
> >>-        */
> >>-       li t0, SR_SUM | SR_FS_VS | SR_ELP
> >>-
> >>        REG_L s0, TASK_TI_USER_SP(tp)
> >>-       csrrc s1, CSR_STATUS, t0
> >>+       csrr s1, CSR_STATUS
> >>        save_userssp s2, s1
> >>        csrr s2, CSR_EPC
> >>        csrr s3, CSR_TVAL
> >>@@ -185,6 +176,16 @@ SYM_CODE_START(handle_exception)
> >>        REG_S s4, PT_CAUSE(sp)
> >>        REG_S s5, PT_TP(sp)
> >>
> >>+       /*
> >>+        * It is fresh trap entry. Disable user-mode memory access
> >>as it should only be set in the
> >>+        * actual user copy routines.
> >>+        *
> >>+        * Disable the FPU/Vector to detect illegal usage of
> >>floating point
> >>+        * or vector in kernel space.
> >>+        */
> >>+       li t0, SR_SUM | SR_FS_VS | SR_ELP
> >>+       csrrc s1, CSR_STATUS, t0
> >>+
> >>        /*
> >>         * Set the scratch register to 0, so that if a recursive
> >>exception
> >>         * occurs, the exception vector knows it came from the kernel
> >>
> >>
> >>
> >>During the time spent in the kernel, if the SUM bit gets set in status,
> >>then the above `__switch_to_status` will ensure that `status` gets saved
> >>for the current thread and restored for the next thread.
> >>
> >>Furthermore, current trap entry code clears FS/VS/SUM (for right
> >>reasons). It
> >>represents non-linear change of control flow and thus whatever will
> >>execute next
> >>shouldn't need SUM/FS/VS unless it wants to set it). This patch slightly
> >>modifies the flow by first saving the `status` on trap frame (thus
> >>if previous
> >>trap frame had SUM=1, it will be saved and restored). And then it
> >>unconditionally clears the SUM/FS/VS to ensure that this new trap
> >>context runs
> >>without needing SUM=1. This ensures nesting of trap frames without
> >>diluting
> >>security properties of SUM.
> >>
> >>>
> >>>Thanks,
> >>>Andy
> >>>
> >>>
> >>>
> >>>
> >>>>
> >>>>So my first question was why not to use `status` in `pt_regs`.
> >>>>It is granular
> >>>>as it can get (it is available per thread context per trap basis).
> >>>>
> >>>>
> >>>>I did ask Alex as well. I'll ping him again.
> >>>>
> >>>>>
> >>>>>Does anyone else have any comment on this?
> >>>>>
> >>>>>>
> >>>>>>>>    u32 riscv_v_flags;
> >>>>>>>>    u32 vstate_ctrl;
> >>>>>>>>    struct __riscv_v_ext_state vstate;
> >>>>>>>>diff --git a/arch/riscv/kernel/asm-offsets.c
> >>>>>>>>b/arch/riscv/kernel/asm- offsets.c
> >>>>>>>>index 16490755304e..969c65b1fe41 100644
> >>>>>>>>--- a/arch/riscv/kernel/asm-offsets.c
> >>>>>>>>+++ b/arch/riscv/kernel/asm-offsets.c
> >>>>>>>>@@ -34,6 +34,7 @@ void asm_offsets(void)
> >>>>>>>>    OFFSET(TASK_THREAD_S9, task_struct, thread.s[9]);
> >>>>>>>>    OFFSET(TASK_THREAD_S10, task_struct, thread.s[10]);
> >>>>>>>>    OFFSET(TASK_THREAD_S11, task_struct, thread.s[11]);
> >>>>>>
> >>>>>>_______________________________________________
> >>>>>>linux-riscv mailing list
> >>>>>>linux-riscv@lists.infradead.org
> >>>>>>http://lists.infradead.org/mailman/listinfo/linux-riscv
> >>>>>>
> >>>>>
> >>>>>
> >>>>>--
> >>>>>Ben Dooks http://www.codethink.co.uk/
> >>>>>Senior Engineer                                Codethink -
> >>>>Providing Genius
> >>>>>
> >>>>>https://www.codethink.co.uk/privacy.html
> >>>>
> >>>>_______________________________________________
> >>>>linux-riscv mailing list
> >>>>linux-riscv@lists.infradead.org
> >>>>http://lists.infradead.org/mailman/listinfo/linux-riscv
> >>
> >>_______________________________________________
> >>linux-riscv mailing list
> >>linux-riscv@lists.infradead.org
> >>http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 1/5] riscv: save the SR_SUM status over switches
  2025-05-23 20:00                   ` Alexandre Ghiti
@ 2025-05-27 19:34                     ` Deepak Gupta
  0 siblings, 0 replies; 32+ messages in thread
From: Deepak Gupta @ 2025-05-27 19:34 UTC (permalink / raw)
  To: Alexandre Ghiti
  Cc: Andy Chiu, Ben Dooks, Cyril Bur, palmer, aou, paul.walmsley,
	charlie, jrtc27, linux-riscv, linux-kernel, jszhang,
	syzbot+e74b94fe601ab9552d69

On Fri, May 23, 2025 at 10:00:11PM +0200, Alexandre Ghiti wrote:
>
>On 5/23/25 19:14, Deepak Gupta wrote:
>>On Fri, May 23, 2025 at 02:22:21PM +0200, Alexandre Ghiti wrote:
>>>Hi Andy, Deepak,
>>>
>>>On 5/23/25 00:43, Deepak Gupta wrote:
>>>>On Fri, May 23, 2025 at 01:42:49AM +0800, Andy Chiu wrote:
>>>>>On Thu, May 22, 2025 at 11:09 PM Deepak Gupta 
>>>>><debug@rivosinc.com> wrote:
>>>>>>
>>>>>>On Thu, May 22, 2025 at 07:23:32AM +0100, Ben Dooks wrote:
>>>>>>>On 20/05/2025 17:49, Deepak Gupta wrote:
>>>>>>>>I did give this patch my RB and had planned to come back 
>>>>>>>>to it to see
>>>>>>>>if it impacts cfi related patches. Thanks to alex for 
>>>>>>>>bringing this to my
>>>>>>>>attention again. As it stands today, it doesn't impact cfi related
>>>>>>>>changes but I've some concerns.
>>>>>>>>
>>>>>>>>Overall I do agree we should reduce number of SSTATUS accesses.
>>>>>>>>
>>>>>>>>Couple of questions on introducing new `sstatus` field (inline)
>>>>>>>>
>>>>>>>>On Tue, Apr 22, 2025 at 04:01:35PM -0700, Deepak Gupta wrote:
>>>>>>>>>On Thu, Apr 10, 2025 at 07:05:22AM +0000, Cyril Bur wrote:
>>>>>>>>>>From: Ben Dooks <ben.dooks@codethink.co.uk>
>>>>>>>>>>
>>>>>>>>>>When threads/tasks are switched we need to ensure 
>>>>>>>>>>the old
>>>>>>execution's
>>>>>>>>>>SR_SUM state is saved and the new thread has the old SR_SUM state
>>>>>>>>>>restored.
>>>>>>>>>>
>>>>>>>>>>The issue was seen under heavy load especially with 
>>>>>>>>>>the
>>>>>>syz-stress tool
>>>>>>>>>>running, with crashes as follows in schedule_tail:
>>>>>>>>>>
>>>>>>>>>>Unable to handle kernel access to user memory 
>>>>>>>>>>without
>>>>>>uaccess routines
>>>>>>>>>>at virtual address 000000002749f0d0
>>>>>>>>>>Oops [#1]
>>>>>>>>>>Modules linked in:
>>>>>>>>>>CPU: 1 PID: 4875 Comm: syz-executor.0 Not tainted
>>>>>>>>>>5.12.0-rc2-syzkaller-00467-g0d7588ab9ef9 #0
>>>>>>>>>>Hardware name: riscv-virtio,qemu (DT)
>>>>>>>>>>epc : schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
>>>>>>>>>>ra : task_pid_vnr include/linux/sched.h:1421 [inline]
>>>>>>>>>>ra : schedule_tail+0x70/0xb2 kernel/sched/core.c:4264
>>>>>>>>>>epc : ffffffe00008c8b0 ra : ffffffe00008c8ae sp : 
>>>>>>>>>>ffffffe025d17ec0
>>>>>>>>>>gp : ffffffe005d25378 tp : ffffffe00f0d0000 t0 : 0000000000000000
>>>>>>>>>>t1 : 0000000000000001 t2 : 00000000000f4240 s0 : ffffffe025d17ee0
>>>>>>>>>>s1 : 000000002749f0d0 a0 : 000000000000002a a1 : 0000000000000003
>>>>>>>>>>a2 : 1ffffffc0cfac500 a3 : ffffffe0000c80cc a4 : 5ae9db91c19bbe00
>>>>>>>>>>a5 : 0000000000000000 a6 : 0000000000f00000 a7 : ffffffe000082eba
>>>>>>>>>>s2 : 0000000000040000 s3 : ffffffe00eef96c0 s4 : ffffffe022c77fe0
>>>>>>>>>>s5 : 0000000000004000 s6 : ffffffe067d74e00 s7 : ffffffe067d74850
>>>>>>>>>>s8 : ffffffe067d73e18 s9 : ffffffe067d74e00 s10: ffffffe00eef96e8
>>>>>>>>>>s11: 000000ae6cdf8368 t3 : 5ae9db91c19bbe00 t4 : ffffffc4043cafb2
>>>>>>>>>>t5 : ffffffc4043cafba t6 : 0000000000040000
>>>>>>>>>>status: 0000000000000120 badaddr: 000000002749f0d0 cause:
>>>>>>>>>>000000000000000f
>>>>>>>>>>Call Trace:
>>>>>>>>>>[<ffffffe00008c8b0>] schedule_tail+0x72/0xb2
>>>>>>kernel/sched/core.c:4264
>>>>>>>>>>[<ffffffe000005570>] ret_from_exception+0x0/0x14
>>>>>>>>>>Dumping ftrace buffer:
>>>>>>>>>>(ftrace buffer empty)
>>>>>>>>>>---[ end trace b5f8f9231dc87dda ]---
>>>>>>>>>>
>>>>>>>>>>The issue comes from the put_user() in schedule_tail
>>>>>>>>>>(kernel/sched/core.c) doing the following:
>>>>>>>>>>
>>>>>>>>>>asmlinkage __visible void schedule_tail(struct task_struct *prev)
>>>>>>>>>>{
>>>>>>>>>>...
>>>>>>>>>>      if (current->set_child_tid)
>>>>>>>>>>              put_user(task_pid_vnr(current),
>>>>>>current->set_child_tid);
>>>>>>>>>>...
>>>>>>>>>>}
>>>>>>>>>>
>>>>>>>>>>the put_user() macro causes the code sequence to 
>>>>>>>>>>come out as
>>>>>>follows:
>>>>>>>>>>
>>>>>>>>>>1:    __enable_user_access()
>>>>>>>>>>2:    reg = task_pid_vnr(current);
>>>>>>>>>>3:    *current->set_child_tid = reg;
>>>>>>>>>>4:    __disable_user_access()
>>>>>>>>>>
>>>>>>>>>>The problem is that we may have a sleeping function 
>>>>>>>>>>as
>>>>>>argument which
>>>>>>>>>>could clear SR_SUM causing the panic above. This was fixed by
>>>>>>>>>>evaluating the argument of the put_user() macro 
>>>>>>>>>>outside the
>>>>>>user-enabled
>>>>>>>>>>section in commit 285a76bb2cf5 ("riscv: evaluate 
>>>>>>>>>>put_user()
>>>>>>arg before
>>>>>>>>>>enabling user access")"
>>>>>>>>>>
>>>>>>>>>>In order for riscv to take advantage of 
>>>>>>>>>>unsafe_get/put_XXX()
>>>>>>macros and
>>>>>>>>>>to avoid the same issue we had with put_user() and 
>>>>>>>>>>sleeping
>>>>>>functions we
>>>>>>>>>>must ensure code flow can go through switch_to() 
>>>>>>>>>>from within
>>>>>>a region of
>>>>>>>>>>code with SR_SUM enabled and come back with SR_SUM 
>>>>>>>>>>still
>>>>>>enabled. This
>>>>>>>>>>patch addresses the problem allowing future work to 
>>>>>>>>>>enable
>>>>>>full use of
>>>>>>>>>>unsafe_get/put_XXX() macros without needing to take 
>>>>>>>>>>a CSR
>>>>>>bit flip cost
>>>>>>>>>>on every access. Make switch_to() save and restore SR_SUM.
>>>>>>>>>>
>>>>>>>>>>Reported-by: 
>>>>>>>>>>syzbot+e74b94fe601ab9552d69@syzkaller.appspotmail.com
>>>>>>>>>>Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
>>>>>>>>>>Signed-off-by: Cyril Bur <cyrilbur@tenstorrent.com>
>>>>>>>>>>---
>>>>>>>>>>arch/riscv/include/asm/processor.h | 1 +
>>>>>>>>>>arch/riscv/kernel/asm-offsets.c    | 5 +++++
>>>>>>>>>>arch/riscv/kernel/entry.S          | 8 ++++++++
>>>>>>>>>>3 files changed, 14 insertions(+)
>>>>>>>>>>
>>>>>>>>>>diff --git a/arch/riscv/include/asm/processor.h
>>>>>>>>>>b/arch/riscv/include/ asm/processor.h
>>>>>>>>>>index 5f56eb9d114a..58fd11c89fe9 100644
>>>>>>>>>>--- a/arch/riscv/include/asm/processor.h
>>>>>>>>>>+++ b/arch/riscv/include/asm/processor.h
>>>>>>>>>>@@ -103,6 +103,7 @@ struct thread_struct {
>>>>>>>>>>    struct __riscv_d_ext_state fstate;
>>>>>>>>>>    unsigned long bad_cause;
>>>>>>>>>>    unsigned long envcfg;
>>>>>>>>>>+    unsigned long status;
>>>>>>>>
>>>>>>>>Do we really need a new member field in `thread_struct`. 
>>>>>>>>We
>>>>>>already have
>>>>>>>>`sstatus` in `pt_regs` which reflects overall execution environment
>>>>>>>>situation
>>>>>>>>for current thread. This gets saved and restored on trap 
>>>>>>>>entry
>>>>>>and exit.
>>>>>>>>
>>>>>>>>If we put `status` in `thread_struct` it creates ambiguity in terms
>>>>>>>>of which
>>>>>>>>`status` to save to and pick from from future maintainibility
>>>>>>>>purposes as the
>>>>>>>>fields get introduced to this CSR.
>>>>>>>>
>>>>>>>>Why can't we access current trap frame's `sstatus` image in
>>>>>>>>`__switch_to` to
>>>>>>>>save and restore?
>>>>>>>>
>>>>>>>>Let me know if I am missing something obvious here. If there is a
>>>>>>>>complication,
>>>>>>>>I am missing here and we do end up using this member field, I would
>>>>>>>>rename it
>>>>>>>>to something like `status_kernel` to reflect that. So that future
>>>>>>>>changes are
>>>>>>>>cognizant of the fact that we have split `status`. One for kernel
>>>>>>>>execution env
>>>>>>>>per thread and one for controlling user execution env per thread.
>>>>>>>
>>>>>>>This is so long ago now I cannot remember if there was any 
>>>>>>>sstatus in
>>>>>>>the pt_regs field,
>>>>>>
>>>>>>FS/VS bits encode status of floating point and vector on 
>>>>>>per-thread basis.
>>>>>>So `status` has been part of `pt_regs` for quite a while.
>>>>>>
>>>>>>>and if kernel threads have the same context as their
>>>>>>>userland parts.
>>>>>>
>>>>>>I didn't mean kernel thread. What I meant was kernel 
>>>>>>execution environment
>>>>>>per-thread. A userland thread does spend sometime in kernel 
>>>>>>and kernel does
>>>>>>things on its behalf. One of those thing is touching user 
>>>>>>memory and that
>>>>>>requires mucking with this CSR. So what I meant was are we 
>>>>>>splitting `status`
>>>>>>on per-thread basis for their time spent in user and kernel.
>>>>>>
>>>>>>Getting back to original question--
>>>>>>As I said, each thread spends sometime in user or in kernel. 
>>>>>>`status` in
>>>>>>`pt_regs` is saved on trap entry and restored on trap exit. 
>>>>>>In a sense,
>>>>>>`status` field in `pt_regs` is reflecting execution status 
>>>>>>of the thread on per
>>>>>>trap basis. Introducing `status` in `thread_struct` creates 
>>>>>>a confusion (if not
>>>>>>for today, certainly for future) of which `status` to pick 
>>>>>>from when we are
>>>>>>doing save/restore.
>>>>>
>>>>>I agree that it's a confusion. sstatus is already saved on pt_regs on
>>>>>trap entries/return, adding another entry adds code complexity and
>>>>>makes data inconsistent. But, perhaps we'd eventually need something
>>>>>like this (I will explain why). Still, there might be a better
>>>>>approach.
>>>>>
>>>>>Yes, we can always reflect pt_regs for sstatus. We all know that
>>>>>pt_regs reflects sstatus at trap entry, and the pt_regs at scheduler
>>>>>point refers to "user's" pt_regs whenever it first enters 
>>>>>kernel mode. Here
>>>>>are reasons why SR_SUM here may or may not be properly tracked. First,
>>>>>if this is a trap introduced context switch (such as interrupting in a
>>>>>preemptible context after we manually enable user access in put_user),
>>>>>then SR_SUM is saved somewhere in the kernel stack, and is not
>>>>>reference-able with task_pt_reg during context switch. But we are safe
>>>>>because the trap exit asm would help us restore the correct SR_SUM
>>>>>back. However, if this is a self-initiating context switch (calling
>>>>>into schedule()), then SR_SUM is not saved anywhere, and possibly
>>>>>causing this error.
>>>>>
>>>>>Preemptible Vector in the kernel mode also had this problem where a
>>>>>self-initiating context switch loses the track of sstatus.vs. The way
>>>>>I managed it is to track the VS bit at context switch time. However,
>>>>>this bug shows that people are repeatedly facing the problem, and
>>>>>maybe it suggests that we'd need a better way of managing sstatus
>>>>>across context switches. Given the complex nature of this register,
>>>>>which also touches the interrupt enable status, I don't think naively
>>>>>saving/restoring the entire register is the way to go. Maybe the
>>>>>variable deserves a more specific naming and documentation. And if
>>>>>we'd need a centralized place for managing these statuses, then it
>>>>>also has to take care of sstatus.VS.
>>>
>>>
>>>Andy, thanks for the precise explanation of the problem :)
>>>
>>>So it took me some time but here are my thoughts on this. We 
>>>should treat pt_regs and thread_struct differently as they do not 
>>>represent the same thing:
>>>- pt_regs represents the context of a thread when it takes a trap
>>>- thread_struct represents a "kernel-induced" (or a "in-kernel") 
>>>context not caused by traps
>>
>>Exactly they represent different context of execution. Trap represents a
>>non-linear control flow change and thus a fresh start of execution 
>>control
>>flow into kernel while `kernel-induced` one's are again non-linear but
>>fully a kernel/software construct.
>>
>>A fresh trapped execution context shouldn't have SUM set which is 
>>how it is
>>currently in kernel. This bit gets cleared in trap entry and 
>>`sstatus` gets
>>saved in `pt_regs` (including SR_IE) so that it could be restored 
>>whenever
>>`sret` happens.
>>
>>The problem we're seeing here is twofold:
>>
>>1) We don't want to set and clear the bit for each word when we are
>>   accessing an array/string. This is a software problem and this
>>   entire series is addressing it.
>>
>>2) To avoid the first problem we are optimizing the access to the CSR
>>   by setting it once and clearing it once. But now we don't want to
>>   lose this bit if there were:
>>
>>    a) trap in between
>>    b) kernel induced schedule out
>>    c) a) followed by b)
>>    d) a) followed by another a)
>>    e) nested traps
>>
>>If a) occurs, we are definitely losing the bit as per the current code.
>
>
>If a trap occurs while the SUM bit is set, the SUM bit will be saved 
>in pt_regs and restored when we come back so we don't lose it when a) 
>occurs.

Yes, my bad on that, sorry about that.

a) is fine with the current `status` save/restore in pt_regs on the trap frame.

>
>
>>If b)
>>happens then also the same situation.
>
>
>Currently, we do lose it in that case indeed.
>
>
>>
>>Saving it in `thread_struct` only addresses `b`, and not `a`, `c`, `d`
>>and `e`. IMHO `e` is a far-fetched situation, but I believe `a`, `b`,
>>`c` and `d` happen during the normal runtime of the kernel.
>>
>>So it all depends on nesting level of traps supported by riscv kernel.
>>
>>Illustrating the `c + d` example: if the kernel can take 2 nested levels
>>of traps, with the first trap context having had the SUM bit set but the
>>second trap having it clear, and now comes the switch out of this thread,
>>then at this point, if it were saved in `thread_struct`, SUM would be
>>lost for the first trap.
>>
>>Later, when the thread gets switched in again, you would go into the 2nd
>>trap context without SUM (because `thread_struct` didn't have it saved),
>>which is fine. Although when the 2nd trap context eventually performs
>>`sret`, it will go back to the first trap context where SUM was expected
>>to be set because it is touching user memory.
>>
>>A good example would be a syscall, so that's the first trap. SUM bit 
>>is set,
>>touched user memory and took a trap (page fault). Now code is in 
>>second trap
>>which should clear the SUM bit. Somewhere in memory manager stack, 
>>thread is
>>scheduled out and now `sstatus` is saved in `thread_struct`. This is only
>>serving current trap context needs and not the one where `SUM` 
>>needed to be
>>set.
>
>
>Hmm to me we don't lose the SUM bit in case of a trap, only when eager 
>schedule happens:
>
>thread A
>|
>|-> syscall
>      |
>      SUM bit is set
>      |
>       -> page fault (trap)
>            |
>             sstatus with SUM bit set is saved on pt_regs
>             SUM bit is cleared
>            |
>             -> eager schedule
>                 |
>                 -> we save SUM bit cleared in thread_struct
>                     |
>                     |
>                      schedule thread B....
>                     |
>                     |
>                    <- switch_to thread A again
>                 |
>                 we restore SUM bit cleared from thread_struct
>                 |
>               <- we resume execution of page fault trap
>              |
>              so we restore SUM bit saved on pt_regs which *has* SUM 
>bit set
>              |
>            <- sret
>          |
>          SUM bit is set and we continue the first syscall.
>
>So based on my wonderful ascii art, it works :) Or did I miss something?

Again, I think I missed/confused it in my head when I was trying to ascertain
which `status` would be picked in which situation.

Two questions:

1) In this particular case, there won't be any yielding (kernel-induced) between
    `set SUM` and `clear SUM`, right?


2) Will there be nesting of kernel-induced events? If not, then I believe the
    current patch is good enough.


If I have to summarize:
- Nesting of `SUM` save/restore across traps is already handled by trap entry/exit.
- Kernel-induced control flow changes (scheduling) are not allowed between the set
   and clear of SUM (and likely future status bits).
- If nesting of kernel-induced events doesn't need to be supported and their
   invocation follows the 2nd rule, then having it in thread_struct makes sense.
   I would ideally call it something else to indicate the intentionality.


Let me know if I got it right this time?
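
(If rule 2 above ever needs to be enforced rather than just documented, a
debug-only check along these lines could catch violations; this is purely a
hypothetical sketch, nothing like it exists in the tree today.)

static inline void debug_check_no_sum_on_sleep(void)
{
        /* hypothetical: sleeping with user access still open would be a
         * bug on a kernel without the switch_to() save/restore */
        if (IS_ENABLED(CONFIG_DEBUG_ATOMIC_SLEEP))
                WARN_ON_ONCE(csr_read(CSR_STATUS) & SR_SUM);
}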


>
>
>>
>>We can support such nesting only by ensuring below
>>
>>On trap entry do
>>- save `status` in `pt_regs` or some other FILO data structure
>>- clear SUM (and other bits needed to be cleared)
>>
>>On trap return do
>>- reload `status` from `pt_regs` or some FILO data structure
>>
>>Quite analogous to what we do for SR_IE as well.
>>
>>>
>>>That's why I don't really like Deepak's proposal below as it mixes 
>>>both and I find it tricky.
>>>
>>>I can't find a situation where saving/restoring the entire sstatus 
>>>at context-switch is a problem though, does anyone have such thing 
>>>in mind?
>>>
>>>Finally I understand that having another copy of sstatus in 
>>>thread_struct is not intuitive and we should, either explain why 
>>>or only store the SUM bit (like for sstatus.VS).
>>>
>>>Please continue the discussion as we need to find a solution that 
>>>pleases everyone soon :)
>>>
>>>Thanks all for jumping in,
>>>
>>>Alex
>>>
>>>
>>>>
>>>>
>>>>IMHO, the problem we are trying to solve in this patch is easily 
>>>>solvable in
>>>>below manner.
>>>>
>>>>
>>>>diff --git a/arch/riscv/include/asm/switch_to.h 
>>>>b/arch/riscv/include/asm/switch_to.h
>>>>index 0e71eb82f920..499d00a6fb67 100644
>>>>--- a/arch/riscv/include/asm/switch_to.h
>>>>+++ b/arch/riscv/include/asm/switch_to.h
>>>>@@ -58,6 +58,20 @@ static inline void __switch_to_fpu(struct 
>>>>task_struct *prev,
>>>>        fstate_restore(next, task_pt_regs(next));
>>>> }
>>>>
>>>>+static inline void __switch_to_status(struct task_struct *prev,
>>>>+                                  struct task_struct *next)
>>>>+{
>>>>+       struct pt_regs *regs;
>>>>+
>>>>+       /* save status */
>>>>+       regs = task_pt_regs(prev);
>>>>+       regs->status = csr_read(CSR_STATUS);
>>>>+
>>>>+       /* restore status */
>>>>+       regs = task_pt_regs(next);
>>>>+       csr_write(CSR_STATUS, regs->status);
>>>>+}
>>>>+
>>>> static __always_inline bool has_fpu(void)
>>>> {
>>>>        return riscv_has_extension_likely(RISCV_ISA_EXT_f) ||
>>>>@@ -115,6 +129,7 @@ do 
>>>>{                                                        \
>>>>        struct task_struct *__prev = (prev);            \
>>>>        struct task_struct *__next = (next);            \
>>>>        __set_prev_cpu(__prev->thread);                 \
>>>>+       __switch_to_status(__prev, __next)              \
>>>>        if (has_fpu())                                  \
>>>>                __switch_to_fpu(__prev, __next);        \
>>>>        if (has_vector() || has_xtheadvector())         \
>>>>diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
>>>>index 8d25837a9384..a3b98c1be055 100644
>>>>--- a/arch/riscv/kernel/entry.S
>>>>+++ b/arch/riscv/kernel/entry.S
>>>>@@ -162,17 +162,8 @@ SYM_CODE_START(handle_exception)
>>>>        REG_S x5,  PT_T0(sp)
>>>>        save_from_x6_to_x31
>>>>
>>>>-       /*
>>>>-        * Disable user-mode memory access as it should only be 
>>>>set in the
>>>>-        * actual user copy routines.
>>>>-        *
>>>>-        * Disable the FPU/Vector to detect illegal usage of 
>>>>floating point
>>>>-        * or vector in kernel space.
>>>>-        */
>>>>-       li t0, SR_SUM | SR_FS_VS | SR_ELP
>>>>-
>>>>        REG_L s0, TASK_TI_USER_SP(tp)
>>>>-       csrrc s1, CSR_STATUS, t0
>>>>+       csrr s1, CSR_STATUS
>>>>        save_userssp s2, s1
>>>>        csrr s2, CSR_EPC
>>>>        csrr s3, CSR_TVAL
>>>>@@ -185,6 +176,16 @@ SYM_CODE_START(handle_exception)
>>>>        REG_S s4, PT_CAUSE(sp)
>>>>        REG_S s5, PT_TP(sp)
>>>>
>>>>+       /*
>>>>+        * It is fresh trap entry. Disable user-mode memory 
>>>>access as it should only be set in the
>>>>+        * actual user copy routines.
>>>>+        *
>>>>+        * Disable the FPU/Vector to detect illegal usage of 
>>>>floating point
>>>>+        * or vector in kernel space.
>>>>+        */
>>>>+       li t0, SR_SUM | SR_FS_VS | SR_ELP
>>>>+       csrrc s1, CSR_STATUS, t0
>>>>+
>>>>        /*
>>>>         * Set the scratch register to 0, so that if a recursive 
>>>>exception
>>>>         * occurs, the exception vector knows it came from the kernel
>>>>
>>>>
>>>>
>>>>During the time spent in kernel if sets SUM bit in status then, above
>>>>`__switch_to_status` will ensure that `status` will get saved 
>>>>for current
>>>>thread and restored for next thread.
>>>>
>>>>Furthermore, current trap entry code clears FS/VS/SUM (for right 
>>>>reasons). It
>>>>represents non-linear change of control flow and thus whatever 
>>>>will execute next
>>>>shouldn't need SUM/FS/VS unless it wants to set it). This patch 
>>>>slightly
>>>>modifies the flow by first saving the `status` on trap frame 
>>>>(thus if previous
>>>>trap frame had SUM=1, it will be saved and restored). And then it
>>>>unconditionally clears the SUM/FS/VS to ensure that this new 
>>>>trap context runs
>>>>without needing SUM=1. This ensures nesting of trap frames 
>>>>without diluting
>>>>security properties of SUM.
>>>>
>>>>>
>>>>>Thanks,
>>>>>Andy
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>>So my first question was why not to use `status` in 
>>>>>>`pt_regs`. It is granular
>>>>>>as it can get (it is available per thread context per trap basis).
>>>>>>
>>>>>>
>>>>>>I did ask Alex as well. I'll ping him again.
>>>>>>
>>>>>>>
>>>>>>>Does anyone else have any comment on this?
>>>>>>>
>>>>>>>>
>>>>>>>>>>    u32 riscv_v_flags;
>>>>>>>>>>    u32 vstate_ctrl;
>>>>>>>>>>    struct __riscv_v_ext_state vstate;
>>>>>>>>>>diff --git a/arch/riscv/kernel/asm-offsets.c
>>>>>>>>>>b/arch/riscv/kernel/asm- offsets.c
>>>>>>>>>>index 16490755304e..969c65b1fe41 100644
>>>>>>>>>>--- a/arch/riscv/kernel/asm-offsets.c
>>>>>>>>>>+++ b/arch/riscv/kernel/asm-offsets.c
>>>>>>>>>>@@ -34,6 +34,7 @@ void asm_offsets(void)
>>>>>>>>>>    OFFSET(TASK_THREAD_S9, task_struct, thread.s[9]);
>>>>>>>>>>    OFFSET(TASK_THREAD_S10, task_struct, thread.s[10]);
>>>>>>>>>>    OFFSET(TASK_THREAD_S11, task_struct, thread.s[11]);
>>>>>>>>
>>>>>>>>_______________________________________________
>>>>>>>>linux-riscv mailing list
>>>>>>>>linux-riscv@lists.infradead.org
>>>>>>>>http://lists.infradead.org/mailman/listinfo/linux-riscv
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>-- 
>>>>>>>Ben Dooks http://www.codethink.co.uk/
>>>>>>>Senior Engineer                                Codethink -
>>>>>>Providing Genius
>>>>>>>
>>>>>>>https://www.codethink.co.uk/privacy.html
>>>>>>
>>>>>>_______________________________________________
>>>>>>linux-riscv mailing list
>>>>>>linux-riscv@lists.infradead.org
>>>>>>http://lists.infradead.org/mailman/listinfo/linux-riscv
>>>>
>>>>_______________________________________________
>>>>linux-riscv mailing list
>>>>linux-riscv@lists.infradead.org
>>>>http://lists.infradead.org/mailman/listinfo/linux-riscv
>>
>>_______________________________________________
>>linux-riscv mailing list
>>linux-riscv@lists.infradead.org
>>http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 1/5] riscv: save the SR_SUM status over switches
  2025-05-24 10:00                   ` Andy Chiu
@ 2025-05-27 20:58                     ` Deepak Gupta
  0 siblings, 0 replies; 32+ messages in thread
From: Deepak Gupta @ 2025-05-27 20:58 UTC (permalink / raw)
  To: Andy Chiu
  Cc: Alexandre Ghiti, Ben Dooks, Cyril Bur, palmer, aou, paul.walmsley,
	charlie, jrtc27, linux-riscv, linux-kernel, jszhang,
	syzbot+e74b94fe601ab9552d69

On Sat, May 24, 2025 at 06:00:00PM +0800, Andy Chiu wrote:
>On Sat, May 24, 2025 at 1:14 AM Deepak Gupta <debug@rivosinc.com> wrote:
>>
>> On Fri, May 23, 2025 at 02:22:21PM +0200, Alexandre Ghiti wrote:
>> >Hi Andy, Deepak,
>> >
>> >On 5/23/25 00:43, Deepak Gupta wrote:
>> >>On Fri, May 23, 2025 at 01:42:49AM +0800, Andy Chiu wrote:
>> >>>On Thu, May 22, 2025 at 11:09 PM Deepak Gupta <debug@rivosinc.com>
>> >>>wrote:
>> >>>>
>> >>>>On Thu, May 22, 2025 at 07:23:32AM +0100, Ben Dooks wrote:
>> >>>>>On 20/05/2025 17:49, Deepak Gupta wrote:
>> >>>>>>I did give this patch my RB and had planned to come back to it to see
>> >>>>>>if it impacts cfi related patches. Thanks to alex for bringing this to my
>> >>>>>>attention again. As it stands today, it doesn't impact cfi related
>> >>>>>>changes but I've some concerns.
>> >>>>>>
>> >>>>>>Overall I do agree we should reduce number of SSTATUS accesses.
>> >>>>>>
>> >>>>>>Couple of questions on introducing new `sstatus` field (inline)
>> >>>>>>
>> >>>>>>On Tue, Apr 22, 2025 at 04:01:35PM -0700, Deepak Gupta wrote:
>> >>>>>>>On Thu, Apr 10, 2025 at 07:05:22AM +0000, Cyril Bur wrote:
>> >>>>>>>>From: Ben Dooks <ben.dooks@codethink.co.uk>
>> >>>>>>>>
>> >>>>>>>>When threads/tasks are switched we need to ensure the old
>> >>>>execution's
>> >>>>>>>>SR_SUM state is saved and the new thread has the old SR_SUM state
>> >>>>>>>>restored.
>> >>>>>>>>
>> >>>>>>>>The issue was seen under heavy load especially with the
>> >>>>syz-stress tool
>> >>>>>>>>running, with crashes as follows in schedule_tail:
>> >>>>>>>>
>> >>>>>>>>Unable to handle kernel access to user memory without
>> >>>>uaccess routines
>> >>>>>>>>at virtual address 000000002749f0d0
>> >>>>>>>>Oops [#1]
>> >>>>>>>>Modules linked in:
>> >>>>>>>>CPU: 1 PID: 4875 Comm: syz-executor.0 Not tainted
>> >>>>>>>>5.12.0-rc2-syzkaller-00467-g0d7588ab9ef9 #0
>> >>>>>>>>Hardware name: riscv-virtio,qemu (DT)
>> >>>>>>>>epc : schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
>> >>>>>>>>ra : task_pid_vnr include/linux/sched.h:1421 [inline]
>> >>>>>>>>ra : schedule_tail+0x70/0xb2 kernel/sched/core.c:4264
>> >>>>>>>>epc : ffffffe00008c8b0 ra : ffffffe00008c8ae sp : ffffffe025d17ec0
>> >>>>>>>>gp : ffffffe005d25378 tp : ffffffe00f0d0000 t0 : 0000000000000000
>> >>>>>>>>t1 : 0000000000000001 t2 : 00000000000f4240 s0 : ffffffe025d17ee0
>> >>>>>>>>s1 : 000000002749f0d0 a0 : 000000000000002a a1 : 0000000000000003
>> >>>>>>>>a2 : 1ffffffc0cfac500 a3 : ffffffe0000c80cc a4 : 5ae9db91c19bbe00
>> >>>>>>>>a5 : 0000000000000000 a6 : 0000000000f00000 a7 : ffffffe000082eba
>> >>>>>>>>s2 : 0000000000040000 s3 : ffffffe00eef96c0 s4 : ffffffe022c77fe0
>> >>>>>>>>s5 : 0000000000004000 s6 : ffffffe067d74e00 s7 : ffffffe067d74850
>> >>>>>>>>s8 : ffffffe067d73e18 s9 : ffffffe067d74e00 s10: ffffffe00eef96e8
>> >>>>>>>>s11: 000000ae6cdf8368 t3 : 5ae9db91c19bbe00 t4 : ffffffc4043cafb2
>> >>>>>>>>t5 : ffffffc4043cafba t6 : 0000000000040000
>> >>>>>>>>status: 0000000000000120 badaddr: 000000002749f0d0 cause:
>> >>>>>>>>000000000000000f
>> >>>>>>>>Call Trace:
>> >>>>>>>>[<ffffffe00008c8b0>] schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
>> >>>>>>>>[<ffffffe000005570>] ret_from_exception+0x0/0x14
>> >>>>>>>>Dumping ftrace buffer:
>> >>>>>>>> (ftrace buffer empty)
>> >>>>>>>>---[ end trace b5f8f9231dc87dda ]---
>> >>>>>>>>
>> >>>>>>>>The issue comes from the put_user() in schedule_tail
>> >>>>>>>>(kernel/sched/core.c) doing the following:
>> >>>>>>>>
>> >>>>>>>>asmlinkage __visible void schedule_tail(struct task_struct *prev)
>> >>>>>>>>{
>> >>>>>>>>...
>> >>>>>>>>      if (current->set_child_tid)
>> >>>>>>>>              put_user(task_pid_vnr(current), current->set_child_tid);
>> >>>>>>>>...
>> >>>>>>>>}
>> >>>>>>>>
>> >>>>>>>>the put_user() macro causes the code sequence to come out as follows:
>> >>>>>>>>
>> >>>>>>>>1:    __enable_user_access()
>> >>>>>>>>2:    reg = task_pid_vnr(current);
>> >>>>>>>>3:    *current->set_child_tid = reg;
>> >>>>>>>>4:    __disable_user_access()
>> >>>>>>>>
>> >>>>>>>>The problem is that we may have a sleeping function as argument which
>> >>>>>>>>could clear SR_SUM causing the panic above. This was fixed by
>> >>>>>>>>evaluating the argument of the put_user() macro outside the user-enabled
>> >>>>>>>>section in commit 285a76bb2cf5 ("riscv: evaluate put_user() arg before
>> >>>>>>>>enabling user access")
>> >>>>>>>>
>> >>>>>>>>In order for riscv to take advantage of unsafe_get/put_XXX() macros and
>> >>>>>>>>to avoid the same issue we had with put_user() and sleeping functions we
>> >>>>>>>>must ensure code flow can go through switch_to() from within a region of
>> >>>>>>>>code with SR_SUM enabled and come back with SR_SUM still enabled. This
>> >>>>>>>>patch addresses the problem allowing future work to enable full use of
>> >>>>>>>>unsafe_get/put_XXX() macros without needing to take a CSR bit flip cost
>> >>>>>>>>on every access. Make switch_to() save and restore SR_SUM.
>> >>>>>>>>
>> >>>>>>>>Reported-by: syzbot+e74b94fe601ab9552d69@syzkaller.appspotmail.com
>> >>>>>>>>Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
>> >>>>>>>>Signed-off-by: Cyril Bur <cyrilbur@tenstorrent.com>
>> >>>>>>>>---
>> >>>>>>>>arch/riscv/include/asm/processor.h | 1 +
>> >>>>>>>>arch/riscv/kernel/asm-offsets.c    | 5 +++++
>> >>>>>>>>arch/riscv/kernel/entry.S          | 8 ++++++++
>> >>>>>>>>3 files changed, 14 insertions(+)
>> >>>>>>>>
>> >>>>>>>>diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h
>> >>>>>>>>index 5f56eb9d114a..58fd11c89fe9 100644
>> >>>>>>>>--- a/arch/riscv/include/asm/processor.h
>> >>>>>>>>+++ b/arch/riscv/include/asm/processor.h
>> >>>>>>>>@@ -103,6 +103,7 @@ struct thread_struct {
>> >>>>>>>>    struct __riscv_d_ext_state fstate;
>> >>>>>>>>    unsigned long bad_cause;
>> >>>>>>>>    unsigned long envcfg;
>> >>>>>>>>+    unsigned long status;
>> >>>>>>
>> >>>>>>Do we really need a new member field in `thread_struct`? We already have
>> >>>>>>`sstatus` in `pt_regs`, which reflects the overall execution environment
>> >>>>>>of the current thread. It gets saved and restored on trap entry and exit.
>> >>>>>>
>> >>>>>>If we put `status` in `thread_struct` it creates ambiguity, for future
>> >>>>>>maintainability purposes, about which `status` to save to and pick from
>> >>>>>>as new fields get introduced to this CSR.
>> >>>>>>
>> >>>>>>Why can't we access the current trap frame's `sstatus` image in
>> >>>>>>`__switch_to` to save and restore?
>> >>>>>>
>> >>>>>>Let me know if I am missing something obvious here. If there is a
>> >>>>>>complication I am missing and we do end up using this member field, I
>> >>>>>>would rename it to something like `status_kernel` to reflect that, so
>> >>>>>>that future changes are cognizant of the fact that we have split
>> >>>>>>`status`: one for the kernel execution env per thread and one for
>> >>>>>>controlling the user execution env per thread.
>> >>>>>
>> >>>>>This is so long ago now I cannot remember if there was any sstatus in
>> >>>>>the pt_regs field,
>> >>>>
>> >>>>The FS/VS bits encode the status of floating point and vector on a
>> >>>>per-thread basis, so `status` has been part of `pt_regs` for quite a while.
>> >>>>
>> >>>>> and if kernel threads have the same context as their
>> >>>>>userland parts.
>> >>>>
>> >>>>I didn't mean kernel threads. What I meant was the kernel execution
>> >>>>environment per thread. A userland thread does spend some time in the
>> >>>>kernel, and the kernel does things on its behalf. One of those things is
>> >>>>touching user memory, and that requires mucking with this CSR. So what I
>> >>>>meant was: are we splitting `status` on a per-thread basis for the time
>> >>>>spent in user and in kernel?
>> >>>>
>> >>>>Getting back to the original question:
>> >>>>As I said, each thread spends some time in user and some in kernel. The
>> >>>>`status` in `pt_regs` is saved on trap entry and restored on trap exit.
>> >>>>In a sense, the `status` field in `pt_regs` reflects the execution status
>> >>>>of the thread on a per-trap basis. Introducing `status` in
>> >>>>`thread_struct` creates confusion (if not for today, certainly for the
>> >>>>future) about which `status` to pick from when we are doing save/restore.
>> >>>
>> >>>I agree that it is confusing. sstatus is already saved to pt_regs on
>> >>>trap entry/return; adding another copy adds code complexity and risks
>> >>>the data becoming inconsistent. But perhaps we'd eventually need
>> >>>something like this (I will explain why). Still, there might be a
>> >>>better approach.
>> >>>
>> >>>Yes, we can always use pt_regs to reflect sstatus. We all know that
>> >>>pt_regs reflects sstatus at trap entry, and the pt_regs at the scheduler
>> >>>point refers to the "user's" pt_regs whenever it first enters kernel
>> >>>mode. Here are reasons why SR_SUM may or may not be properly tracked.
>> >>>First, if this is a trap-introduced context switch (such as interrupting
>> >>>in a preemptible context after we manually enable user access in
>> >>>put_user), then SR_SUM is saved somewhere on the kernel stack, and is not
>> >>>referenceable with task_pt_regs() during the context switch. But we are
>> >>>safe because the trap exit asm restores the correct SR_SUM. However, if
>> >>>this is a self-initiated context switch (calling into schedule()), then
>> >>>SR_SUM is not saved anywhere, possibly causing this error.
>> >>>
>> >>>Preemptible Vector in kernel mode also had this problem, where a
>> >>>self-initiated context switch loses track of sstatus.VS. The way I
>> >>>managed it was to track the VS bit at context switch time. However,
>> >>>this bug shows that people are repeatedly facing the problem, and maybe
>> >>>it suggests that we need a better way of managing sstatus across context
>> >>>switches. Given the complex nature of this register, which also touches
>> >>>the interrupt enable status, I don't think naively saving/restoring the
>> >>>entire register is the way to go. Maybe the variable deserves more
>> >>>specific naming and documentation. And if we need a centralized place
>> >>>for managing these statuses, then it also has to take care of sstatus.VS.
>> >
>> >
>> >Andy, thanks for the precise explanation of the problem :)
>
>Thanks for reading it Alex! It's my bad making it wordy
>
>> >
>> >So it took me some time but here are my thoughts on this. We should
>> >treat pt_regs and thread_struct differently as they do not represent
>> >the same thing:
>> >- pt_regs represents the context of a thread when it takes a trap
>> >- thread_struct represents a "kernel-induced" (or an "in-kernel")
>> >context not caused by traps
>>
>> Exactly, they represent different contexts of execution. A trap represents a
>> non-linear control flow change and thus a fresh start of execution control
>> flow into the kernel, while `kernel-induced` ones are also non-linear but
>> fully a kernel/software construct.
>>
>> A fresh trapped execution context shouldn't have SUM set, which is how it is
>> currently in the kernel. This bit gets cleared on trap entry and `sstatus`
>> gets saved in `pt_regs` (including SR_IE) so that it can be restored whenever
>> `sret` happens.
>>
>> The problem we're seeing here is twofold:
>>
>> 1) We don't want to set and clear SUM for each word when we are accessing an
>>     array/string. This is a software problem and this entire series addresses
>>     it; a sketch of the grouped-access pattern follows below.
>>
>> 2) To avoid the first problem we optimize the CSR access by setting it once
>>     and clearing it once. But now we don't want to lose this bit if there
>>     were:
>>
>>         a) a trap in between
>>         b) a kernel-induced schedule out
>>         c) a) followed by b)
>>         d) a) followed by another a)
>>         e) nested traps
>>
>> If a) occurs, we definitely lose the bit as per the current code. If b)
>> happens, the situation is the same.
>>
>> Saving it in `thread_struct` only addresses `b`, and not `a`, `c`, `d` and
>> `e`. IMHO `e` is a far-fetched situation, but I believe `a`, `b`, `c` and `d`
>> happen during the normal runtime of the kernel.
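To make (1) above concrete, here is a minimal sketch of the grouped-access
pattern using the generic user_access_begin()/unsafe_put_user() interface (the
helper itself is made up for illustration; the riscv implementation of these
primitives is what this series adds):

#include <linux/types.h>
#include <linux/uaccess.h>

/* Hypothetical helper: two user stores inside a single user-access window. */
static int put_pair_to_user(u32 __user *dst, u32 a, u32 b)
{
	if (!user_access_begin(dst, 2 * sizeof(u32)))
		return -EFAULT;

	/* No CSR write per store; a fault jumps straight to the label. */
	unsafe_put_user(a, &dst[0], efault);
	unsafe_put_user(b, &dst[1], efault);
	user_access_end();
	return 0;

efault:
	user_access_end();
	return -EFAULT;
}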
>
>The trap entry/exit routine should always take care of the trap cases:
>whenever the kernel traps, SUM is saved to a pt_regs somewhere on the
>kernel stack. Yes, a task may be scheduled out after a trap, which is
>common, but please be aware that after scheduling back to the original
>task, it then has to execute the trap exit and thus restores SUM before
>going back to the original code (where it received the exception).

Yes, you are right. Thanks for correcting me.

As I mentioned in another fork of the thread, the nesting of traps is taken
care of by trap entry/exit.
It's all about kernel-induced events then.

Is there nesting of kernel-induced events?
If there is no nesting then sure, a field in `thread_struct` is fine.
But in that case save/restore in `pt_regs` is also fine, and keeps a single
image which truly represents the current context and trap together (a rough
sketch of the `thread_struct` variant follows below).
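For comparison with the pt_regs-based diff quoted further down, here is a
rough C rendering of the `thread_struct` option in its simplest whole-register
form. This is an illustration only: the in-tree patch does the equivalent in
assembly in __switch_to, and the helper name here is invented.

#include <linux/sched.h>
#include <asm/csr.h>

/* Sketch: carry the whole sstatus image across __switch_to() in thread_struct. */
static inline void __switch_to_thread_status(struct task_struct *prev,
					     struct task_struct *next)
{
	prev->thread.status = csr_read(CSR_STATUS);
	/*
	 * Note the concern raised later in the thread: this also carries bits
	 * such as SIE and VS that are arguably not a per-thread property here.
	 */
	csr_write(CSR_STATUS, next->thread.status);
}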

>
>>
>> So it all depends on the nesting level of traps supported by the riscv kernel.
>>
>> Illustrating the `c + d` example: if the kernel can take 2 nested levels of
>> traps, with the first trap context having had the SUM bit set but the second
>> trap having it clear, and now comes the switch out of this thread, then if it
>> were saved in `thread_struct`, SUM would be lost for the first trap.
>
>No, the trap exit always restores the in-context (correct) sstatus back
>
>>
>> Later, when the thread gets switched in again, you would go into the 2nd trap
>> context without SUM (because `thread_struct` didn't have it saved), which is
>> fine. But when the 2nd trap context eventually performs `sret`, it will go
>> back to the first trap context, where SUM was expected to be set because it
>> was touching user memory.
>>
>> A good example would be a syscall, so that's the first trap. The SUM bit is
>> set, user memory is touched and a trap (page fault) is taken. Now the code is
>> in the second trap, which should clear the SUM bit. Somewhere in the memory
>> manager's stack the thread is scheduled out and `sstatus` is saved in
>> `thread_struct`. This only serves the current trap context's needs and not
>> the one where `SUM` needed to be set.
>>
>> We can support such nesting only by ensuring the following (sketched below):
>>
>> On trap entry:
>> - save `status` in `pt_regs` or some other FILO (stack-like) data structure
>> - clear SUM (and the other bits that need to be cleared)
>>
>> On trap return:
>> - reload `status` from `pt_regs` or that FILO data structure
>>
>> Quite analogous to what we do for SR_IE as well.
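Rendered as pseudo-C purely for illustration: the real code is the assembly in
handle_exception/ret_from_exception, and the helper names and the exact clear
mask below are assumptions, not the in-tree symbols.

/* Trap entry: push the live status onto this trap's frame, then sanitize. */
static void trap_entry_save_status(struct pt_regs *regs)
{
	regs->status = csr_read(CSR_STATUS);		/* FILO: newest frame on top */
	csr_clear(CSR_STATUS, SR_SUM | SR_FS_VS);	/* fresh context: no SUM, no FP/V */
}

/* Trap return: pop it back, so an outer frame that had SUM=1 gets it again. */
static void trap_return_restore_status(struct pt_regs *regs)
{
	csr_write(CSR_STATUS, regs->status);
}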
>
>I am not sure if I understand what FILO is, but the current trap
>handling routines do save/restore sstatus, which can be found at
>handle_exception and ret_from_exception, as of today.
>
>>
>> >
>> >That's why I don't really like Deepak's proposal below as it mixes
>> >both and I find it tricky.
>> >
>> >I can't find a situation where saving/restoring the entire sstatus at
>> >context switch is a problem, though. Does anyone have such a case in
>> >mind?
>
>I agree that we should keep track of sstatus somewhere and be explicit
>about what context it tracks.
>
>sstatus does not just track per-thread status; some bits are machine-wide.
>Though __switch_to is always called with interrupts disabled, I think
>conceptually the interrupt enable status should not be saved/restored on a
>per-thread basis.
>
>Just FYI, some statuses are currently managed by individual
>modules (for example, the live sstatus.VS is managed in asm/vector.h). We
>can discuss what is preferred. The final patch should take care of
>this, or should document that VS is managed elsewhere, if we would
>like centralized sstatus management.
>
>Personally, I would prefer centralized sstatus management that only
>touches SUM. This avoids duplicating the vector condition checks in other
>places. But maybe there are better ways.
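Making that preference concrete, a sketch with an explicit mask of the bits
the context switch itself would carry per thread. The mask and helper names
are invented; SR_SIE stays with the usual IRQ-flag handling and SR_FS/SR_VS
stay with the FPU/vector switch helpers.

/* Illustrative only: the bits __switch_to() itself carries per thread. */
#define SWITCH_STATUS_MASK	SR_SUM

static inline void thread_status_save(struct task_struct *tsk)
{
	tsk->thread.status = csr_read(CSR_STATUS) & SWITCH_STATUS_MASK;
}

static inline void thread_status_restore(struct task_struct *tsk)
{
	csr_clear(CSR_STATUS, SWITCH_STATUS_MASK);
	csr_set(CSR_STATUS, tsk->thread.status);
}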
>
>Thanks,
>Andy
>
>
>
>
>> >
>> >Finally, I understand that having another copy of sstatus in
>> >thread_struct is not intuitive and we should either explain why, or
>> >only store the SUM bit (like for sstatus.VS).
>> >
>> >Please continue the discussion as we need to find a solution that
>> >pleases everyone soon :)
>> >
>> >Thanks all for jumping in,
>> >
>> >Alex
>> >
>> >
>> >>
>> >>
>> >>IMHO, the problem we are trying to solve in this patch is easily
>> >>solvable in the manner below.
>> >>
>> >>
>> >>diff --git a/arch/riscv/include/asm/switch_to.h b/arch/riscv/include/asm/switch_to.h
>> >>index 0e71eb82f920..499d00a6fb67 100644
>> >>--- a/arch/riscv/include/asm/switch_to.h
>> >>+++ b/arch/riscv/include/asm/switch_to.h
>> >>@@ -58,6 +58,20 @@ static inline void __switch_to_fpu(struct task_struct *prev,
>> >>        fstate_restore(next, task_pt_regs(next));
>> >> }
>> >>
>> >>+static inline void __switch_to_status(struct task_struct *prev,
>> >>+                                  struct task_struct *next)
>> >>+{
>> >>+       struct pt_regs *regs;
>> >>+
>> >>+       /* save status */
>> >>+       regs = task_pt_regs(prev);
>> >>+       regs->status = csr_read(CSR_STATUS);
>> >>+
>> >>+       /* restore status */
>> >>+       regs = task_pt_regs(next);
>> >>+       csr_write(CSR_STATUS, regs->status);
>> >>+}
>> >>+
>> >> static __always_inline bool has_fpu(void)
>> >> {
>> >>        return riscv_has_extension_likely(RISCV_ISA_EXT_f) ||
>> >>@@ -115,6 +129,7 @@ do {                                                        \
>> >>        struct task_struct *__prev = (prev);            \
>> >>        struct task_struct *__next = (next);            \
>> >>        __set_prev_cpu(__prev->thread);                 \
>> >>+       __switch_to_status(__prev, __next);             \
>> >>        if (has_fpu())                                  \
>> >>                __switch_to_fpu(__prev, __next);        \
>> >>        if (has_vector() || has_xtheadvector())         \
>> >>diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
>> >>index 8d25837a9384..a3b98c1be055 100644
>> >>--- a/arch/riscv/kernel/entry.S
>> >>+++ b/arch/riscv/kernel/entry.S
>> >>@@ -162,17 +162,8 @@ SYM_CODE_START(handle_exception)
>> >>        REG_S x5,  PT_T0(sp)
>> >>        save_from_x6_to_x31
>> >>
>> >>-       /*
>> >>-        * Disable user-mode memory access as it should only be set in the
>> >>-        * actual user copy routines.
>> >>-        *
>> >>-        * Disable the FPU/Vector to detect illegal usage of floating point
>> >>-        * or vector in kernel space.
>> >>-        */
>> >>-       li t0, SR_SUM | SR_FS_VS | SR_ELP
>> >>-
>> >>        REG_L s0, TASK_TI_USER_SP(tp)
>> >>-       csrrc s1, CSR_STATUS, t0
>> >>+       csrr s1, CSR_STATUS
>> >>        save_userssp s2, s1
>> >>        csrr s2, CSR_EPC
>> >>        csrr s3, CSR_TVAL
>> >>@@ -185,6 +176,16 @@ SYM_CODE_START(handle_exception)
>> >>        REG_S s4, PT_CAUSE(sp)
>> >>        REG_S s5, PT_TP(sp)
>> >>
>> >>+       /*
>> >>+        * It is a fresh trap entry. Disable user-mode memory access as it
>> >>+        * should only be set in the actual user copy routines.
>> >>+        *
>> >>+        * Disable the FPU/Vector to detect illegal usage of floating point
>> >>+        * or vector in kernel space.
>> >>+        */
>> >>+       li t0, SR_SUM | SR_FS_VS | SR_ELP
>> >>+       csrrc s1, CSR_STATUS, t0
>> >>+
>> >>        /*
>> >>         * Set the scratch register to 0, so that if a recursive exception
>> >>         * occurs, the exception vector knows it came from the kernel
>> >>
>> >>
>> >>
>> >>If the kernel sets the SUM bit in status during its time in the kernel,
>> >>the above `__switch_to_status` will ensure that `status` gets saved for
>> >>the current thread and restored for the next thread.
>> >>
>> >>Furthermore, the current trap entry code clears FS/VS/SUM (for the right
>> >>reasons): a trap represents a non-linear change of control flow, and thus
>> >>whatever executes next shouldn't need SUM/FS/VS unless it wants to set
>> >>them. This patch slightly modifies the flow by first saving `status` on
>> >>the trap frame (thus if the previous trap frame had SUM=1, it will be
>> >>saved and restored), and then unconditionally clearing SUM/FS/VS to
>> >>ensure that this new trap context runs without needing SUM=1. This allows
>> >>nesting of trap frames without diluting the security properties of SUM.
>> >>
>> >>>
>> >>>Thanks,
>> >>>Andy
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>>
>> >>>>So my first question was why not use `status` in `pt_regs`. It is as
>> >>>>granular as it can get (it is available per thread context, per trap).
>> >>>>
>> >>>>
>> >>>>I did ask Alex as well. I'll ping him again.
>> >>>>
>> >>>>>
>> >>>>>Does anyone else have any comment on this?
>> >>>>>
>> >>>>>>
>> >>>>>>>>    u32 riscv_v_flags;
>> >>>>>>>>    u32 vstate_ctrl;
>> >>>>>>>>    struct __riscv_v_ext_state vstate;
>> >>>>>>>>diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
>> >>>>>>>>index 16490755304e..969c65b1fe41 100644
>> >>>>>>>>--- a/arch/riscv/kernel/asm-offsets.c
>> >>>>>>>>+++ b/arch/riscv/kernel/asm-offsets.c
>> >>>>>>>>@@ -34,6 +34,7 @@ void asm_offsets(void)
>> >>>>>>>>    OFFSET(TASK_THREAD_S9, task_struct, thread.s[9]);
>> >>>>>>>>    OFFSET(TASK_THREAD_S10, task_struct, thread.s[10]);
>> >>>>>>>>    OFFSET(TASK_THREAD_S11, task_struct, thread.s[11]);
>> >>>>>>

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2025-05-27 20:58 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-10  7:05 [PATCH v6 0/5] riscv: uaccess: optimisations Cyril Bur
2025-04-10  7:05 ` [PATCH v6 1/5] riscv: save the SR_SUM status over switches Cyril Bur
2025-04-22 10:22   ` Alexandre Ghiti
2025-05-21  8:26     ` Ben Dooks
2025-05-21 13:38       ` Samuel Holland
2025-05-21 14:30         ` Alexandre Ghiti
2025-05-21 14:45           ` Cyril Bur
2025-05-22 16:15           ` [EXT] " Cyril Bur
2025-05-22 17:40           ` Andy Chiu
2025-05-22 20:03             ` Ben Dooks
2025-04-22 23:01   ` Deepak Gupta
2025-04-23  6:44     ` Alexandre Ghiti
2025-05-20 16:49     ` Deepak Gupta
2025-05-22  6:23       ` Ben Dooks
2025-05-22 14:49         ` Deepak Gupta
2025-05-22 17:42           ` Andy Chiu
2025-05-22 22:43             ` Deepak Gupta
2025-05-23 12:22               ` Alexandre Ghiti
2025-05-23 17:14                 ` Deepak Gupta
2025-05-23 20:00                   ` Alexandre Ghiti
2025-05-27 19:34                     ` Deepak Gupta
2025-05-24 10:00                   ` Andy Chiu
2025-05-27 20:58                     ` Deepak Gupta
2025-04-10  7:05 ` [PATCH v6 2/5] riscv: implement user_access_begin() and families Cyril Bur
2025-04-22 10:26   ` Alexandre Ghiti
2025-04-10  7:05 ` [PATCH v6 3/5] riscv: uaccess: use input constraints for ptr of __put_user() Cyril Bur
2025-04-22 12:10   ` Alexandre Ghiti
2025-04-10  7:05 ` [PATCH v6 4/5] riscv: uaccess: use 'asm goto' for put_user() Cyril Bur
2025-04-22 10:36   ` Alexandre Ghiti
2025-04-10  7:05 ` [PATCH v6 5/5] riscv: uaccess: use 'asm_goto_output' for get_user() Cyril Bur
2025-04-22 12:19   ` Alexandre Ghiti
2025-05-09 17:30 ` [PATCH v6 0/5] riscv: uaccess: optimisations patchwork-bot+linux-riscv

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).