* [PATCH v2 0/4] entry: Move ret_from_fork() to C and inline syscall_exit_to_user_mode()
@ 2025-01-23 19:14 Charlie Jenkins
2025-01-23 19:14 ` [PATCH v2 1/4] riscv: entry: Convert ret_from_fork() to C Charlie Jenkins
` (3 more replies)
0 siblings, 4 replies; 14+ messages in thread
From: Charlie Jenkins @ 2025-01-23 19:14 UTC (permalink / raw)
To: Paul Walmsley, Palmer Dabbelt, Huacai Chen, WANG Xuerui,
Thomas Gleixner, Peter Zijlstra, Andy Lutomirski, Alexandre Ghiti
Cc: linux-riscv, linux-kernel, loongarch, Charlie Jenkins
Similar to commit 221a164035fd ("entry: Move
syscall_enter_from_user_mode() to header file"), move
syscall_exit_to_user_mode() to the header file as well.
Testing was done with the byte-unixbench [1] syscall benchmark (which
calls getpid) and QEMU. On riscv I measured a 7.09246% improvement, on
x86 a 2.98843% improvement, on loongarch a 6.07954% improvement, and on
s390 a 11.1328% improvement.
Since this is on QEMU, I know these numbers are not perfect, but they
show a trend of general improvement across all architectures that use
the generic entry code.
[1] https://github.com/kdlucas/byte-unixbench
Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
---
Changes in v2:
- Fixup compilation issues for loongarch
- Fixup compilation issues with CONFIG_CONTEXT_TRACKING_USER
- Link to v1: https://lore.kernel.org/r/20250122-riscv_optimize_entry-v1-0-4ee95559cfd0@rivosinc.com
---
Charlie Jenkins (4):
riscv: entry: Convert ret_from_fork() to C
riscv: entry: Split ret_from_fork() into user and kernel
loongarch: entry: Migrate ret_from_fork() to C
entry: Inline syscall_exit_to_user_mode()
arch/loongarch/include/asm/asm-prototypes.h | 5 +++
arch/loongarch/include/asm/switch_to.h | 8 +++++
arch/loongarch/kernel/entry.S | 22 ++++++-------
arch/loongarch/kernel/process.c | 34 ++++++++++++++++----
arch/riscv/include/asm/asm-prototypes.h | 2 ++
arch/riscv/kernel/entry.S | 20 +++++++-----
arch/riscv/kernel/process.c | 21 +++++++++++--
include/linux/entry-common.h | 43 +++++++++++++++++++++++--
kernel/entry/common.c | 49 +----------------------------
9 files changed, 125 insertions(+), 79 deletions(-)
---
base-commit: ffd294d346d185b70e28b1a28abe367bbfe53c04
change-id: 20240402-riscv_optimize_entry-583843420325
--
- Charlie
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH v2 1/4] riscv: entry: Convert ret_from_fork() to C
2025-01-23 19:14 [PATCH v2 0/4] entry: Move ret_from_fork() to C and inline syscall_exit_to_user_mode() Charlie Jenkins
@ 2025-01-23 19:14 ` Charlie Jenkins
2025-01-24 13:14 ` Brian Gerst
2025-01-23 19:14 ` [PATCH v2 2/4] riscv: entry: Split ret_from_fork() into user and kernel Charlie Jenkins
` (2 subsequent siblings)
3 siblings, 1 reply; 14+ messages in thread
From: Charlie Jenkins @ 2025-01-23 19:14 UTC (permalink / raw)
To: Paul Walmsley, Palmer Dabbelt, Huacai Chen, WANG Xuerui,
Thomas Gleixner, Peter Zijlstra, Andy Lutomirski, Alexandre Ghiti
Cc: linux-riscv, linux-kernel, loongarch, Charlie Jenkins
Move the main section of ret_from_fork() to C to allow inlining of
syscall_exit_to_user_mode().
Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
---
arch/riscv/include/asm/asm-prototypes.h | 1 +
arch/riscv/kernel/entry.S | 15 ++++++---------
arch/riscv/kernel/process.c | 14 ++++++++++++--
3 files changed, 19 insertions(+), 11 deletions(-)
diff --git a/arch/riscv/include/asm/asm-prototypes.h b/arch/riscv/include/asm/asm-prototypes.h
index cd627ec289f163a630b73dd03dd52a6b28692997..733ff609778797001006c33bba9e3cc5b1f15387 100644
--- a/arch/riscv/include/asm/asm-prototypes.h
+++ b/arch/riscv/include/asm/asm-prototypes.h
@@ -52,6 +52,7 @@ DECLARE_DO_ERROR_INFO(do_trap_ecall_s);
DECLARE_DO_ERROR_INFO(do_trap_ecall_m);
DECLARE_DO_ERROR_INFO(do_trap_break);
+asmlinkage void ret_from_fork(void *fn_arg, int (*fn)(void *), struct pt_regs *regs);
asmlinkage void handle_bad_stack(struct pt_regs *regs);
asmlinkage void do_page_fault(struct pt_regs *regs);
asmlinkage void do_irq(struct pt_regs *regs);
diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
index 33a5a9f2a0d4e1eeccfb3621b9e518b88e1b0704..9225c322279aa90e737b1d7144db084319cf8103 100644
--- a/arch/riscv/kernel/entry.S
+++ b/arch/riscv/kernel/entry.S
@@ -319,17 +319,14 @@ SYM_CODE_END(handle_kernel_stack_overflow)
ASM_NOKPROBE(handle_kernel_stack_overflow)
#endif
-SYM_CODE_START(ret_from_fork)
+SYM_CODE_START(ret_from_fork_asm)
call schedule_tail
- beqz s0, 1f /* not from kernel thread */
- /* Call fn(arg) */
- move a0, s1
- jalr s0
-1:
- move a0, sp /* pt_regs */
- call syscall_exit_to_user_mode
+ move a0, s1 /* fn */
+ move a1, s0 /* fn_arg */
+ move a2, sp /* pt_regs */
+ call ret_from_fork
j ret_from_exception
-SYM_CODE_END(ret_from_fork)
+SYM_CODE_END(ret_from_fork_asm)
#ifdef CONFIG_IRQ_STACKS
/*
diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
index 58b6482c2bf662bf5224ca50c8e21a68760a6b41..0d07e6d8f6b57beba438dbba5e8c74a014582bee 100644
--- a/arch/riscv/kernel/process.c
+++ b/arch/riscv/kernel/process.c
@@ -17,7 +17,9 @@
#include <linux/ptrace.h>
#include <linux/uaccess.h>
#include <linux/personality.h>
+#include <linux/entry-common.h>
+#include <asm/asm-prototypes.h>
#include <asm/unistd.h>
#include <asm/processor.h>
#include <asm/csr.h>
@@ -36,7 +38,7 @@ unsigned long __stack_chk_guard __read_mostly;
EXPORT_SYMBOL(__stack_chk_guard);
#endif
-extern asmlinkage void ret_from_fork(void);
+extern asmlinkage void ret_from_fork_asm(void);
void noinstr arch_cpu_idle(void)
{
@@ -206,6 +208,14 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
return 0;
}
+asmlinkage void ret_from_fork(void *fn_arg, int (*fn)(void *), struct pt_regs *regs)
+{
+ if (unlikely(fn))
+ fn(fn_arg);
+
+ syscall_exit_to_user_mode(regs);
+}
+
int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
{
unsigned long clone_flags = args->flags;
@@ -242,7 +252,7 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
p->thread.riscv_v_flags = 0;
if (has_vector())
riscv_v_thread_alloc(p);
- p->thread.ra = (unsigned long)ret_from_fork;
+ p->thread.ra = (unsigned long)ret_from_fork_asm;
p->thread.sp = (unsigned long)childregs; /* kernel sp */
return 0;
}
--
2.43.0
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH v2 2/4] riscv: entry: Split ret_from_fork() into user and kernel
2025-01-23 19:14 [PATCH v2 0/4] entry: Move ret_from_fork() to C and inline syscall_exit_to_user_mode() Charlie Jenkins
2025-01-23 19:14 ` [PATCH v2 1/4] riscv: entry: Convert ret_from_fork() to C Charlie Jenkins
@ 2025-01-23 19:14 ` Charlie Jenkins
2025-01-24 7:19 ` Alexandre Ghiti
2025-01-23 19:14 ` [PATCH v2 3/4] loongarch: entry: Migrate ret_from_fork() to C Charlie Jenkins
2025-01-23 19:14 ` [PATCH v2 4/4] entry: Inline syscall_exit_to_user_mode() Charlie Jenkins
3 siblings, 1 reply; 14+ messages in thread
From: Charlie Jenkins @ 2025-01-23 19:14 UTC (permalink / raw)
To: Paul Walmsley, Palmer Dabbelt, Huacai Chen, WANG Xuerui,
Thomas Gleixner, Peter Zijlstra, Andy Lutomirski, Alexandre Ghiti
Cc: linux-riscv, linux-kernel, loongarch, Charlie Jenkins
This function was unified into a single function in commit ab9164dae273
("riscv: entry: Consolidate ret_from_kernel_thread into ret_from_fork").
However that imposed a performance degradation. Partially reverting this
commit to have ret_from_fork() split again results in a 1% increase on
the number of times fork is able to be called per second.
Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
---
arch/riscv/include/asm/asm-prototypes.h | 3 ++-
arch/riscv/kernel/entry.S | 13 ++++++++++---
arch/riscv/kernel/process.c | 17 +++++++++++------
3 files changed, 23 insertions(+), 10 deletions(-)
diff --git a/arch/riscv/include/asm/asm-prototypes.h b/arch/riscv/include/asm/asm-prototypes.h
index 733ff609778797001006c33bba9e3cc5b1f15387..bfc8ea5f9319b19449ec59493b45b926df888832 100644
--- a/arch/riscv/include/asm/asm-prototypes.h
+++ b/arch/riscv/include/asm/asm-prototypes.h
@@ -52,7 +52,8 @@ DECLARE_DO_ERROR_INFO(do_trap_ecall_s);
DECLARE_DO_ERROR_INFO(do_trap_ecall_m);
DECLARE_DO_ERROR_INFO(do_trap_break);
-asmlinkage void ret_from_fork(void *fn_arg, int (*fn)(void *), struct pt_regs *regs);
+asmlinkage void ret_from_fork_kernel(void *fn_arg, int (*fn)(void *), struct pt_regs *regs);
+asmlinkage void ret_from_fork_user(struct pt_regs *regs);
asmlinkage void handle_bad_stack(struct pt_regs *regs);
asmlinkage void do_page_fault(struct pt_regs *regs);
asmlinkage void do_irq(struct pt_regs *regs);
diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
index 9225c322279aa90e737b1d7144db084319cf8103..9386ef7444267f0b9bf8a0550f4e31deaeb85881 100644
--- a/arch/riscv/kernel/entry.S
+++ b/arch/riscv/kernel/entry.S
@@ -319,14 +319,21 @@ SYM_CODE_END(handle_kernel_stack_overflow)
ASM_NOKPROBE(handle_kernel_stack_overflow)
#endif
-SYM_CODE_START(ret_from_fork_asm)
+SYM_CODE_START(ret_from_fork_kernel_asm)
call schedule_tail
move a0, s1 /* fn */
move a1, s0 /* fn_arg */
move a2, sp /* pt_regs */
- call ret_from_fork
+ call ret_from_fork_kernel
j ret_from_exception
-SYM_CODE_END(ret_from_fork_asm)
+SYM_CODE_END(ret_from_fork_kernel_asm)
+
+SYM_CODE_START(ret_from_fork_user_asm)
+ call schedule_tail
+ move a0, sp /* pt_regs */
+ call ret_from_fork_user
+ j ret_from_exception
+SYM_CODE_END(ret_from_fork_user_asm)
#ifdef CONFIG_IRQ_STACKS
/*
diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
index 0d07e6d8f6b57beba438dbba5e8c74a014582bee..5f15236cb526bd9fe61636ed372b4b76c94df946 100644
--- a/arch/riscv/kernel/process.c
+++ b/arch/riscv/kernel/process.c
@@ -38,7 +38,8 @@ unsigned long __stack_chk_guard __read_mostly;
EXPORT_SYMBOL(__stack_chk_guard);
#endif
-extern asmlinkage void ret_from_fork_asm(void);
+extern asmlinkage void ret_from_fork_kernel_asm(void);
+extern asmlinkage void ret_from_fork_user_asm(void);
void noinstr arch_cpu_idle(void)
{
@@ -208,14 +209,18 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
return 0;
}
-asmlinkage void ret_from_fork(void *fn_arg, int (*fn)(void *), struct pt_regs *regs)
+asmlinkage void ret_from_fork_kernel(void *fn_arg, int (*fn)(void *), struct pt_regs *regs)
{
- if (unlikely(fn))
- fn(fn_arg);
+ fn(fn_arg);
syscall_exit_to_user_mode(regs);
}
+asmlinkage void ret_from_fork_user(struct pt_regs *regs)
+{
+ syscall_exit_to_user_mode(regs);
+}
+
int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
{
unsigned long clone_flags = args->flags;
@@ -238,6 +243,7 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
p->thread.s[0] = (unsigned long)args->fn;
p->thread.s[1] = (unsigned long)args->fn_arg;
+ p->thread.ra = (unsigned long)ret_from_fork_kernel_asm;
} else {
*childregs = *(current_pt_regs());
/* Turn off status.VS */
@@ -247,12 +253,11 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
if (clone_flags & CLONE_SETTLS)
childregs->tp = tls;
childregs->a0 = 0; /* Return value of fork() */
- p->thread.s[0] = 0;
+ p->thread.ra = (unsigned long)ret_from_fork_user_asm;
}
p->thread.riscv_v_flags = 0;
if (has_vector())
riscv_v_thread_alloc(p);
- p->thread.ra = (unsigned long)ret_from_fork_asm;
p->thread.sp = (unsigned long)childregs; /* kernel sp */
return 0;
}
--
2.43.0
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH v2 3/4] loongarch: entry: Migrate ret_from_fork() to C
2025-01-23 19:14 [PATCH v2 0/4] entry: Move ret_from_fork() to C and inline syscall_exit_to_user_mode() Charlie Jenkins
2025-01-23 19:14 ` [PATCH v2 1/4] riscv: entry: Convert ret_from_fork() to C Charlie Jenkins
2025-01-23 19:14 ` [PATCH v2 2/4] riscv: entry: Split ret_from_fork() into user and kernel Charlie Jenkins
@ 2025-01-23 19:14 ` Charlie Jenkins
2025-01-24 9:05 ` Huacai Chen
2025-01-23 19:14 ` [PATCH v2 4/4] entry: Inline syscall_exit_to_user_mode() Charlie Jenkins
3 siblings, 1 reply; 14+ messages in thread
From: Charlie Jenkins @ 2025-01-23 19:14 UTC (permalink / raw)
To: Paul Walmsley, Palmer Dabbelt, Huacai Chen, WANG Xuerui,
Thomas Gleixner, Peter Zijlstra, Andy Lutomirski, Alexandre Ghiti
Cc: linux-riscv, linux-kernel, loongarch, Charlie Jenkins
Loongarch is the only architecture that calls
syscall_exit_to_user_mode() from asm. Move the call into C so that this
function can be inlined across all architectures.
Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
---
arch/loongarch/include/asm/asm-prototypes.h | 5 +++++
arch/loongarch/include/asm/switch_to.h | 8 +++++++
arch/loongarch/kernel/entry.S | 22 +++++++++----------
arch/loongarch/kernel/process.c | 34 ++++++++++++++++++++++++-----
4 files changed, 51 insertions(+), 18 deletions(-)
diff --git a/arch/loongarch/include/asm/asm-prototypes.h b/arch/loongarch/include/asm/asm-prototypes.h
index 51f224bcfc654228ae423e9a066b25b35102a5b9..0195d4309fd29f94664d5f34247198c769033b1b 100644
--- a/arch/loongarch/include/asm/asm-prototypes.h
+++ b/arch/loongarch/include/asm/asm-prototypes.h
@@ -12,3 +12,8 @@ __int128_t __ashlti3(__int128_t a, int b);
__int128_t __ashrti3(__int128_t a, int b);
__int128_t __lshrti3(__int128_t a, int b);
#endif
+
+asmlinkage void noinstr __no_stack_protector ret_from_kernel_thread(struct task_struct *prev,
+ struct pt_regs *regs,
+ int (*fn)(void *),
+ void *fn_arg);
diff --git a/arch/loongarch/include/asm/switch_to.h b/arch/loongarch/include/asm/switch_to.h
index 5b225aff3ba21aa06d0713bc8e73e1b941389630..a1c5576f1fd145670e13038bec6dd390486099ab 100644
--- a/arch/loongarch/include/asm/switch_to.h
+++ b/arch/loongarch/include/asm/switch_to.h
@@ -26,6 +26,14 @@ extern asmlinkage struct task_struct *__switch_to(struct task_struct *prev,
struct task_struct *next, struct thread_info *next_ti,
void *sched_ra, void *sched_cfa);
+void noinstr __no_stack_protector ret_from_kernel_thread(struct task_struct *prev,
+ struct pt_regs *regs,
+ int (*fn)(void *),
+ void *fn_arg);
+
+void noinstr __no_stack_protector ret_from_fork(struct task_struct *prev,
+ struct pt_regs *regs);
+
/*
* For newly created kernel threads switch_to() will return to
* ret_from_kernel_thread, newly created user threads to ret_from_fork.
diff --git a/arch/loongarch/kernel/entry.S b/arch/loongarch/kernel/entry.S
index 48e7e34e355e83eae8165957ba2eac05a8bf17df..2abc29e573810e000f2fef4646ddca0dbb80eabe 100644
--- a/arch/loongarch/kernel/entry.S
+++ b/arch/loongarch/kernel/entry.S
@@ -77,24 +77,22 @@ SYM_CODE_START(handle_syscall)
SYM_CODE_END(handle_syscall)
_ASM_NOKPROBE(handle_syscall)
-SYM_CODE_START(ret_from_fork)
+SYM_CODE_START(ret_from_fork_asm)
UNWIND_HINT_REGS
- bl schedule_tail # a0 = struct task_struct *prev
- move a0, sp
- bl syscall_exit_to_user_mode
+ move a1, sp
+ bl ret_from_fork
RESTORE_STATIC
RESTORE_SOME
RESTORE_SP_AND_RET
-SYM_CODE_END(ret_from_fork)
+SYM_CODE_END(ret_from_fork_asm)
-SYM_CODE_START(ret_from_kernel_thread)
+SYM_CODE_START(ret_from_kernel_thread_asm)
UNWIND_HINT_REGS
- bl schedule_tail # a0 = struct task_struct *prev
- move a0, s1
- jirl ra, s0, 0
- move a0, sp
- bl syscall_exit_to_user_mode
+ move a1, sp
+ move a2, s0
+ move a3, s1
+ bl ret_from_kernel_thread
RESTORE_STATIC
RESTORE_SOME
RESTORE_SP_AND_RET
-SYM_CODE_END(ret_from_kernel_thread)
+SYM_CODE_END(ret_from_kernel_thread_asm)
diff --git a/arch/loongarch/kernel/process.c b/arch/loongarch/kernel/process.c
index 6e58f65455c7ca3eae2e88ed852c8655a6701e5c..16cc949fe43443d70f1d865ce04595c2d8c1615b 100644
--- a/arch/loongarch/kernel/process.c
+++ b/arch/loongarch/kernel/process.c
@@ -14,6 +14,7 @@
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/errno.h>
+#include <linux/entry-common.h>
#include <linux/sched.h>
#include <linux/sched/debug.h>
#include <linux/sched/task.h>
@@ -33,6 +34,7 @@
#include <linux/prctl.h>
#include <linux/nmi.h>
+#include <asm/asm-prototypes.h>
#include <asm/asm.h>
#include <asm/bootinfo.h>
#include <asm/cpu.h>
@@ -47,6 +49,7 @@
#include <asm/pgtable.h>
#include <asm/processor.h>
#include <asm/reg.h>
+#include <asm/switch_to.h>
#include <asm/unwind.h>
#include <asm/vdso.h>
@@ -63,8 +66,9 @@ EXPORT_SYMBOL(__stack_chk_guard);
unsigned long boot_option_idle_override = IDLE_NO_OVERRIDE;
EXPORT_SYMBOL(boot_option_idle_override);
-asmlinkage void ret_from_fork(void);
-asmlinkage void ret_from_kernel_thread(void);
+asmlinkage void restore_and_ret(void);
+asmlinkage void ret_from_fork_asm(void);
+asmlinkage void ret_from_kernel_thread_asm(void);
void start_thread(struct pt_regs *regs, unsigned long pc, unsigned long sp)
{
@@ -138,6 +142,24 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
return 0;
}
+asmlinkage void noinstr __no_stack_protector ret_from_kernel_thread(struct task_struct *prev,
+ struct pt_regs *regs,
+ int (*fn)(void *),
+ void *fn_arg)
+{
+ schedule_tail(prev);
+
+ fn(fn_arg);
+
+ syscall_exit_to_user_mode(regs);
+}
+
+void noinstr __no_stack_protector ret_from_fork(struct task_struct *prev, struct pt_regs *regs)
+{
+ schedule_tail(prev);
+ syscall_exit_to_user_mode(regs);
+}
+
/*
* Copy architecture-specific thread state
*/
@@ -165,8 +187,8 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
p->thread.reg03 = childksp;
p->thread.reg23 = (unsigned long)args->fn;
p->thread.reg24 = (unsigned long)args->fn_arg;
- p->thread.reg01 = (unsigned long)ret_from_kernel_thread;
- p->thread.sched_ra = (unsigned long)ret_from_kernel_thread;
+ p->thread.reg01 = (unsigned long)ret_from_kernel_thread_asm;
+ p->thread.sched_ra = (unsigned long)ret_from_kernel_thread_asm;
memset(childregs, 0, sizeof(struct pt_regs));
childregs->csr_euen = p->thread.csr_euen;
childregs->csr_crmd = p->thread.csr_crmd;
@@ -182,8 +204,8 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
childregs->regs[3] = usp;
p->thread.reg03 = (unsigned long) childregs;
- p->thread.reg01 = (unsigned long) ret_from_fork;
- p->thread.sched_ra = (unsigned long) ret_from_fork;
+ p->thread.reg01 = (unsigned long) ret_from_fork_asm;
+ p->thread.sched_ra = (unsigned long) ret_from_fork_asm;
/*
* New tasks lose permission to use the fpu. This accelerates context
--
2.43.0
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH v2 4/4] entry: Inline syscall_exit_to_user_mode()
2025-01-23 19:14 [PATCH v2 0/4] entry: Move ret_from_fork() to C and inline syscall_exit_to_user_mode() Charlie Jenkins
` (2 preceding siblings ...)
2025-01-23 19:14 ` [PATCH v2 3/4] loongarch: entry: Migrate ret_from_fork() to C Charlie Jenkins
@ 2025-01-23 19:14 ` Charlie Jenkins
3 siblings, 0 replies; 14+ messages in thread
From: Charlie Jenkins @ 2025-01-23 19:14 UTC (permalink / raw)
To: Paul Walmsley, Palmer Dabbelt, Huacai Chen, WANG Xuerui,
Thomas Gleixner, Peter Zijlstra, Andy Lutomirski, Alexandre Ghiti
Cc: linux-riscv, linux-kernel, loongarch, Charlie Jenkins
Architectures using the generic entry code can be optimized by having
syscall_exit_to_user_mode inlined.
Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
---
include/linux/entry-common.h | 43 ++++++++++++++++++++++++++++++++++++--
kernel/entry/common.c | 49 +-------------------------------------------
2 files changed, 42 insertions(+), 50 deletions(-)
diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h
index fc61d0205c97084acc89c8e45e088946f5e6d9b2..ee1c400bc0eb0ebb5850f95e856b819fca7b3577 100644
--- a/include/linux/entry-common.h
+++ b/include/linux/entry-common.h
@@ -14,6 +14,7 @@
#include <linux/kmsan.h>
#include <asm/entry-common.h>
+#include <asm/syscall.h>
/*
* Define dummy _TIF work flags if not defined by the architecture or for
@@ -366,6 +367,15 @@ static __always_inline void exit_to_user_mode(void)
lockdep_hardirqs_on(CALLER_ADDR0);
}
+/**
+ * syscall_exit_work - Handle work before returning to user mode
+ * @regs: Pointer to current pt_regs
+ * @work: Current thread syscall work
+ *
+ * Do one-time syscall specific work.
+ */
+void syscall_exit_work(struct pt_regs *regs, unsigned long work);
+
/**
* syscall_exit_to_user_mode_work - Handle work before returning to user mode
* @regs: Pointer to currents pt_regs
@@ -379,7 +389,30 @@ static __always_inline void exit_to_user_mode(void)
* make the final state transitions. Interrupts must stay disabled between
* return from this function and the invocation of exit_to_user_mode().
*/
-void syscall_exit_to_user_mode_work(struct pt_regs *regs);
+static __always_inline void syscall_exit_to_user_mode_work(struct pt_regs *regs)
+{
+ unsigned long work = READ_ONCE(current_thread_info()->syscall_work);
+ unsigned long nr = syscall_get_nr(current, regs);
+
+ CT_WARN_ON(ct_state() != PERF_CONTEXT_KERNEL);
+
+ if (IS_ENABLED(CONFIG_PROVE_LOCKING)) {
+ if (WARN(irqs_disabled(), "syscall %lu left IRQs disabled", nr))
+ local_irq_enable();
+ }
+
+ rseq_syscall(regs);
+
+ /*
+ * Do one-time syscall specific work. If these work items are
+ * enabled, we want to run them exactly once per syscall exit with
+ * interrupts enabled.
+ */
+ if (unlikely(work & SYSCALL_WORK_EXIT))
+ syscall_exit_work(regs, work);
+ local_irq_disable_exit_to_user();
+ exit_to_user_mode_prepare(regs);
+}
/**
* syscall_exit_to_user_mode - Handle work before returning to user mode
@@ -410,7 +443,13 @@ void syscall_exit_to_user_mode_work(struct pt_regs *regs);
* exit_to_user_mode(). This function is preferred unless there is a
* compelling architectural reason to use the separate functions.
*/
-void syscall_exit_to_user_mode(struct pt_regs *regs);
+static __always_inline void syscall_exit_to_user_mode(struct pt_regs *regs)
+{
+ instrumentation_begin();
+ syscall_exit_to_user_mode_work(regs);
+ instrumentation_end();
+ exit_to_user_mode();
+}
/**
* irqentry_enter_from_user_mode - Establish state before invoking the irq handler
diff --git a/kernel/entry/common.c b/kernel/entry/common.c
index e33691d5adf7aab4af54cf2bf8e5ef5bd6ad1424..f55e421fb196dd5f9d4e34dd85ae096c774cf879 100644
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -146,7 +146,7 @@ static inline bool report_single_step(unsigned long work)
return work & SYSCALL_WORK_SYSCALL_EXIT_TRAP;
}
-static void syscall_exit_work(struct pt_regs *regs, unsigned long work)
+void syscall_exit_work(struct pt_regs *regs, unsigned long work)
{
bool step;
@@ -173,53 +173,6 @@ static void syscall_exit_work(struct pt_regs *regs, unsigned long work)
ptrace_report_syscall_exit(regs, step);
}
-/*
- * Syscall specific exit to user mode preparation. Runs with interrupts
- * enabled.
- */
-static void syscall_exit_to_user_mode_prepare(struct pt_regs *regs)
-{
- unsigned long work = READ_ONCE(current_thread_info()->syscall_work);
- unsigned long nr = syscall_get_nr(current, regs);
-
- CT_WARN_ON(ct_state() != CT_STATE_KERNEL);
-
- if (IS_ENABLED(CONFIG_PROVE_LOCKING)) {
- if (WARN(irqs_disabled(), "syscall %lu left IRQs disabled", nr))
- local_irq_enable();
- }
-
- rseq_syscall(regs);
-
- /*
- * Do one-time syscall specific work. If these work items are
- * enabled, we want to run them exactly once per syscall exit with
- * interrupts enabled.
- */
- if (unlikely(work & SYSCALL_WORK_EXIT))
- syscall_exit_work(regs, work);
-}
-
-static __always_inline void __syscall_exit_to_user_mode_work(struct pt_regs *regs)
-{
- syscall_exit_to_user_mode_prepare(regs);
- local_irq_disable_exit_to_user();
- exit_to_user_mode_prepare(regs);
-}
-
-void syscall_exit_to_user_mode_work(struct pt_regs *regs)
-{
- __syscall_exit_to_user_mode_work(regs);
-}
-
-__visible noinstr void syscall_exit_to_user_mode(struct pt_regs *regs)
-{
- instrumentation_begin();
- __syscall_exit_to_user_mode_work(regs);
- instrumentation_end();
- exit_to_user_mode();
-}
-
noinstr void irqentry_enter_from_user_mode(struct pt_regs *regs)
{
enter_from_user_mode(regs);
--
2.43.0
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH v2 2/4] riscv: entry: Split ret_from_fork() into user and kernel
2025-01-23 19:14 ` [PATCH v2 2/4] riscv: entry: Split ret_from_fork() into user and kernel Charlie Jenkins
@ 2025-01-24 7:19 ` Alexandre Ghiti
2025-01-24 7:53 ` Charlie Jenkins
0 siblings, 1 reply; 14+ messages in thread
From: Alexandre Ghiti @ 2025-01-24 7:19 UTC (permalink / raw)
To: Charlie Jenkins, Paul Walmsley, Palmer Dabbelt, Huacai Chen,
WANG Xuerui, Thomas Gleixner, Peter Zijlstra, Andy Lutomirski,
Alexandre Ghiti
Cc: linux-riscv, linux-kernel, loongarch
Hi Charlie,
On 23/01/2025 20:14, Charlie Jenkins wrote:
> This function was unified into a single function in commit ab9164dae273
> ("riscv: entry: Consolidate ret_from_kernel_thread into ret_from_fork").
> However that imposed a performance degradation. Partially reverting this
> commit to have ret_from_fork() split again results in a 1% increase on
> the number of times fork is able to be called per second.
>
> Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
> ---
> arch/riscv/include/asm/asm-prototypes.h | 3 ++-
> arch/riscv/kernel/entry.S | 13 ++++++++++---
> arch/riscv/kernel/process.c | 17 +++++++++++------
> 3 files changed, 23 insertions(+), 10 deletions(-)
>
> diff --git a/arch/riscv/include/asm/asm-prototypes.h b/arch/riscv/include/asm/asm-prototypes.h
> index 733ff609778797001006c33bba9e3cc5b1f15387..bfc8ea5f9319b19449ec59493b45b926df888832 100644
> --- a/arch/riscv/include/asm/asm-prototypes.h
> +++ b/arch/riscv/include/asm/asm-prototypes.h
> @@ -52,7 +52,8 @@ DECLARE_DO_ERROR_INFO(do_trap_ecall_s);
> DECLARE_DO_ERROR_INFO(do_trap_ecall_m);
> DECLARE_DO_ERROR_INFO(do_trap_break);
>
> -asmlinkage void ret_from_fork(void *fn_arg, int (*fn)(void *), struct pt_regs *regs);
> +asmlinkage void ret_from_fork_kernel(void *fn_arg, int (*fn)(void *), struct pt_regs *regs);
> +asmlinkage void ret_from_fork_user(struct pt_regs *regs);
> asmlinkage void handle_bad_stack(struct pt_regs *regs);
> asmlinkage void do_page_fault(struct pt_regs *regs);
> asmlinkage void do_irq(struct pt_regs *regs);
> diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
> index 9225c322279aa90e737b1d7144db084319cf8103..9386ef7444267f0b9bf8a0550f4e31deaeb85881 100644
> --- a/arch/riscv/kernel/entry.S
> +++ b/arch/riscv/kernel/entry.S
> @@ -319,14 +319,21 @@ SYM_CODE_END(handle_kernel_stack_overflow)
> ASM_NOKPROBE(handle_kernel_stack_overflow)
> #endif
>
> -SYM_CODE_START(ret_from_fork_asm)
> +SYM_CODE_START(ret_from_fork_kernel_asm)
> call schedule_tail
> move a0, s1 /* fn */
> move a1, s0 /* fn_arg */
> move a2, sp /* pt_regs */
> - call ret_from_fork
> + call ret_from_fork_kernel
> j ret_from_exception
> -SYM_CODE_END(ret_from_fork_asm)
> +SYM_CODE_END(ret_from_fork_kernel_asm)
> +
> +SYM_CODE_START(ret_from_fork_user_asm)
> + call schedule_tail
> + move a0, sp /* pt_regs */
> + call ret_from_fork_user
> + j ret_from_exception
> +SYM_CODE_END(ret_from_fork_user_asm)
>
> #ifdef CONFIG_IRQ_STACKS
> /*
> diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
> index 0d07e6d8f6b57beba438dbba5e8c74a014582bee..5f15236cb526bd9fe61636ed372b4b76c94df946 100644
> --- a/arch/riscv/kernel/process.c
> +++ b/arch/riscv/kernel/process.c
> @@ -38,7 +38,8 @@ unsigned long __stack_chk_guard __read_mostly;
> EXPORT_SYMBOL(__stack_chk_guard);
> #endif
>
> -extern asmlinkage void ret_from_fork_asm(void);
> +extern asmlinkage void ret_from_fork_kernel_asm(void);
> +extern asmlinkage void ret_from_fork_user_asm(void);
>
> void noinstr arch_cpu_idle(void)
> {
> @@ -208,14 +209,18 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
> return 0;
> }
>
> -asmlinkage void ret_from_fork(void *fn_arg, int (*fn)(void *), struct pt_regs *regs)
> +asmlinkage void ret_from_fork_kernel(void *fn_arg, int (*fn)(void *), struct pt_regs *regs)
> {
> - if (unlikely(fn))
> - fn(fn_arg);
> + fn(fn_arg);
>
> syscall_exit_to_user_mode(regs);
> }
>
> +asmlinkage void ret_from_fork_user(struct pt_regs *regs)
> +{
> + syscall_exit_to_user_mode(regs);
> +}
> +
> int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
> {
> unsigned long clone_flags = args->flags;
> @@ -238,6 +243,7 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
>
> p->thread.s[0] = (unsigned long)args->fn;
> p->thread.s[1] = (unsigned long)args->fn_arg;
> + p->thread.ra = (unsigned long)ret_from_fork_kernel_asm;
> } else {
> *childregs = *(current_pt_regs());
> /* Turn off status.VS */
> @@ -247,12 +253,11 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
> if (clone_flags & CLONE_SETTLS)
> childregs->tp = tls;
> childregs->a0 = 0; /* Return value of fork() */
> - p->thread.s[0] = 0;
> + p->thread.ra = (unsigned long)ret_from_fork_user_asm;
> }
> p->thread.riscv_v_flags = 0;
> if (has_vector())
> riscv_v_thread_alloc(p);
> - p->thread.ra = (unsigned long)ret_from_fork_asm;
> p->thread.sp = (unsigned long)childregs; /* kernel sp */
> return 0;
> }
>
Can you benchmark this change on some HW? I'm not sure we would indeed
gain this 1%.
Thanks,
Alex
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 2/4] riscv: entry: Split ret_from_fork() into user and kernel
2025-01-24 7:19 ` Alexandre Ghiti
@ 2025-01-24 7:53 ` Charlie Jenkins
2025-01-24 13:08 ` Brian Gerst
0 siblings, 1 reply; 14+ messages in thread
From: Charlie Jenkins @ 2025-01-24 7:53 UTC (permalink / raw)
To: Alexandre Ghiti
Cc: Paul Walmsley, Palmer Dabbelt, Huacai Chen, WANG Xuerui,
Thomas Gleixner, Peter Zijlstra, Andy Lutomirski, Alexandre Ghiti,
linux-riscv, linux-kernel, loongarch
On Fri, Jan 24, 2025 at 08:19:18AM +0100, Alexandre Ghiti wrote:
> Hi Charlie,
>
> On 23/01/2025 20:14, Charlie Jenkins wrote:
> > This function was unified into a single function in commit ab9164dae273
> > ("riscv: entry: Consolidate ret_from_kernel_thread into ret_from_fork").
> > However that imposed a performance degradation. Partially reverting this
> > commit to have ret_from_fork() split again results in a 1% increase on
> > the number of times fork is able to be called per second.
> >
> > Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
> > ---
> > arch/riscv/include/asm/asm-prototypes.h | 3 ++-
> > arch/riscv/kernel/entry.S | 13 ++++++++++---
> > arch/riscv/kernel/process.c | 17 +++++++++++------
> > 3 files changed, 23 insertions(+), 10 deletions(-)
> >
> > diff --git a/arch/riscv/include/asm/asm-prototypes.h b/arch/riscv/include/asm/asm-prototypes.h
> > index 733ff609778797001006c33bba9e3cc5b1f15387..bfc8ea5f9319b19449ec59493b45b926df888832 100644
> > --- a/arch/riscv/include/asm/asm-prototypes.h
> > +++ b/arch/riscv/include/asm/asm-prototypes.h
> > @@ -52,7 +52,8 @@ DECLARE_DO_ERROR_INFO(do_trap_ecall_s);
> > DECLARE_DO_ERROR_INFO(do_trap_ecall_m);
> > DECLARE_DO_ERROR_INFO(do_trap_break);
> > -asmlinkage void ret_from_fork(void *fn_arg, int (*fn)(void *), struct pt_regs *regs);
> > +asmlinkage void ret_from_fork_kernel(void *fn_arg, int (*fn)(void *), struct pt_regs *regs);
> > +asmlinkage void ret_from_fork_user(struct pt_regs *regs);
> > asmlinkage void handle_bad_stack(struct pt_regs *regs);
> > asmlinkage void do_page_fault(struct pt_regs *regs);
> > asmlinkage void do_irq(struct pt_regs *regs);
> > diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
> > index 9225c322279aa90e737b1d7144db084319cf8103..9386ef7444267f0b9bf8a0550f4e31deaeb85881 100644
> > --- a/arch/riscv/kernel/entry.S
> > +++ b/arch/riscv/kernel/entry.S
> > @@ -319,14 +319,21 @@ SYM_CODE_END(handle_kernel_stack_overflow)
> > ASM_NOKPROBE(handle_kernel_stack_overflow)
> > #endif
> > -SYM_CODE_START(ret_from_fork_asm)
> > +SYM_CODE_START(ret_from_fork_kernel_asm)
> > call schedule_tail
> > move a0, s1 /* fn */
> > move a1, s0 /* fn_arg */
> > move a2, sp /* pt_regs */
> > - call ret_from_fork
> > + call ret_from_fork_kernel
> > j ret_from_exception
> > -SYM_CODE_END(ret_from_fork_asm)
> > +SYM_CODE_END(ret_from_fork_kernel_asm)
> > +
> > +SYM_CODE_START(ret_from_fork_user_asm)
> > + call schedule_tail
> > + move a0, sp /* pt_regs */
> > + call ret_from_fork_user
> > + j ret_from_exception
> > +SYM_CODE_END(ret_from_fork_user_asm)
> > #ifdef CONFIG_IRQ_STACKS
> > /*
> > diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
> > index 0d07e6d8f6b57beba438dbba5e8c74a014582bee..5f15236cb526bd9fe61636ed372b4b76c94df946 100644
> > --- a/arch/riscv/kernel/process.c
> > +++ b/arch/riscv/kernel/process.c
> > @@ -38,7 +38,8 @@ unsigned long __stack_chk_guard __read_mostly;
> > EXPORT_SYMBOL(__stack_chk_guard);
> > #endif
> > -extern asmlinkage void ret_from_fork_asm(void);
> > +extern asmlinkage void ret_from_fork_kernel_asm(void);
> > +extern asmlinkage void ret_from_fork_user_asm(void);
> > void noinstr arch_cpu_idle(void)
> > {
> > @@ -208,14 +209,18 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
> > return 0;
> > }
> > -asmlinkage void ret_from_fork(void *fn_arg, int (*fn)(void *), struct pt_regs *regs)
> > +asmlinkage void ret_from_fork_kernel(void *fn_arg, int (*fn)(void *), struct pt_regs *regs)
> > {
> > - if (unlikely(fn))
> > - fn(fn_arg);
> > + fn(fn_arg);
> > syscall_exit_to_user_mode(regs);
> > }
> > +asmlinkage void ret_from_fork_user(struct pt_regs *regs)
> > +{
> > + syscall_exit_to_user_mode(regs);
> > +}
> > +
> > int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
> > {
> > unsigned long clone_flags = args->flags;
> > @@ -238,6 +243,7 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
> > p->thread.s[0] = (unsigned long)args->fn;
> > p->thread.s[1] = (unsigned long)args->fn_arg;
> > + p->thread.ra = (unsigned long)ret_from_fork_kernel_asm;
> > } else {
> > *childregs = *(current_pt_regs());
> > /* Turn off status.VS */
> > @@ -247,12 +253,11 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
> > if (clone_flags & CLONE_SETTLS)
> > childregs->tp = tls;
> > childregs->a0 = 0; /* Return value of fork() */
> > - p->thread.s[0] = 0;
> > + p->thread.ra = (unsigned long)ret_from_fork_user_asm;
> > }
> > p->thread.riscv_v_flags = 0;
> > if (has_vector())
> > riscv_v_thread_alloc(p);
> > - p->thread.ra = (unsigned long)ret_from_fork_asm;
> > p->thread.sp = (unsigned long)childregs; /* kernel sp */
> > return 0;
> > }
> >
>
> Can you benchmark this change on some HW? I'm not sure we would indeed gain
> this 1%.
It reduces the syscall path by 3 instructions, two for not needing to
move the fn and fn_args from:
move a0, s1 /* fn */
move a1, s0 /* fn_arg */
And one for not needing to do the conditional. This one is also saved on
kernel threads.
It's a very small improvement, but there is only something like 100
instructions along the direct syscall path so it ends up being a large
percentage. On hardware moving registers is very cheap and this branch
will be almost always be correctly predicted so the cost is close to
zero. I just figured that since I am making changes around here it would
be nice if it was optimal instead of being close to optimal.
- Charlie
>
> Thanks,
>
> Alex
>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 3/4] loongarch: entry: Migrate ret_from_fork() to C
2025-01-23 19:14 ` [PATCH v2 3/4] loongarch: entry: Migrate ret_from_fork() to C Charlie Jenkins
@ 2025-01-24 9:05 ` Huacai Chen
2025-01-24 18:28 ` Charlie Jenkins
0 siblings, 1 reply; 14+ messages in thread
From: Huacai Chen @ 2025-01-24 9:05 UTC (permalink / raw)
To: Charlie Jenkins
Cc: Paul Walmsley, Palmer Dabbelt, WANG Xuerui, Thomas Gleixner,
Peter Zijlstra, Andy Lutomirski, Alexandre Ghiti, linux-riscv,
linux-kernel, loongarch
Hi, Charlie,
On Fri, Jan 24, 2025 at 3:15 AM Charlie Jenkins <charlie@rivosinc.com> wrote:
>
> Loongarch is the only architecture that calls
We usually use "LoongArch" instead of "loongarch" or "Loongarch".
> syscall_exit_to_user_mode() from asm. Move the call into C so that this
> function can be inlined across all architectures.
>
> Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
> ---
> arch/loongarch/include/asm/asm-prototypes.h | 5 +++++
> arch/loongarch/include/asm/switch_to.h | 8 +++++++
> arch/loongarch/kernel/entry.S | 22 +++++++++----------
> arch/loongarch/kernel/process.c | 34 ++++++++++++++++++++++++-----
> 4 files changed, 51 insertions(+), 18 deletions(-)
>
> diff --git a/arch/loongarch/include/asm/asm-prototypes.h b/arch/loongarch/include/asm/asm-prototypes.h
> index 51f224bcfc654228ae423e9a066b25b35102a5b9..0195d4309fd29f94664d5f34247198c769033b1b 100644
> --- a/arch/loongarch/include/asm/asm-prototypes.h
> +++ b/arch/loongarch/include/asm/asm-prototypes.h
> @@ -12,3 +12,8 @@ __int128_t __ashlti3(__int128_t a, int b);
> __int128_t __ashrti3(__int128_t a, int b);
> __int128_t __lshrti3(__int128_t a, int b);
> #endif
> +
> +asmlinkage void noinstr __no_stack_protector ret_from_kernel_thread(struct task_struct *prev,
> + struct pt_regs *regs,
> + int (*fn)(void *),
> + void *fn_arg);
It is a little strange that we only need to declare
ret_from_kernel_thread() but not ret_from_fork().
> diff --git a/arch/loongarch/include/asm/switch_to.h b/arch/loongarch/include/asm/switch_to.h
> index 5b225aff3ba21aa06d0713bc8e73e1b941389630..a1c5576f1fd145670e13038bec6dd390486099ab 100644
> --- a/arch/loongarch/include/asm/switch_to.h
> +++ b/arch/loongarch/include/asm/switch_to.h
> @@ -26,6 +26,14 @@ extern asmlinkage struct task_struct *__switch_to(struct task_struct *prev,
> struct task_struct *next, struct thread_info *next_ti,
> void *sched_ra, void *sched_cfa);
>
> +void noinstr __no_stack_protector ret_from_kernel_thread(struct task_struct *prev,
> + struct pt_regs *regs,
> + int (*fn)(void *),
> + void *fn_arg);
> +
> +void noinstr __no_stack_protector ret_from_fork(struct task_struct *prev,
> + struct pt_regs *regs);
> +
I prefer alpha-betical order, which means put ret_from_fork() before
ret_from_kernel_thread().
> /*
> * For newly created kernel threads switch_to() will return to
> * ret_from_kernel_thread, newly created user threads to ret_from_fork.
> diff --git a/arch/loongarch/kernel/entry.S b/arch/loongarch/kernel/entry.S
> index 48e7e34e355e83eae8165957ba2eac05a8bf17df..2abc29e573810e000f2fef4646ddca0dbb80eabe 100644
> --- a/arch/loongarch/kernel/entry.S
> +++ b/arch/loongarch/kernel/entry.S
> @@ -77,24 +77,22 @@ SYM_CODE_START(handle_syscall)
> SYM_CODE_END(handle_syscall)
> _ASM_NOKPROBE(handle_syscall)
>
> -SYM_CODE_START(ret_from_fork)
> +SYM_CODE_START(ret_from_fork_asm)
> UNWIND_HINT_REGS
> - bl schedule_tail # a0 = struct task_struct *prev
> - move a0, sp
> - bl syscall_exit_to_user_mode
> + move a1, sp
> + bl ret_from_fork
> RESTORE_STATIC
> RESTORE_SOME
> RESTORE_SP_AND_RET
> -SYM_CODE_END(ret_from_fork)
> +SYM_CODE_END(ret_from_fork_asm)
>
> -SYM_CODE_START(ret_from_kernel_thread)
> +SYM_CODE_START(ret_from_kernel_thread_asm)
> UNWIND_HINT_REGS
> - bl schedule_tail # a0 = struct task_struct *prev
> - move a0, s1
> - jirl ra, s0, 0
> - move a0, sp
> - bl syscall_exit_to_user_mode
> + move a1, sp
> + move a2, s0
> + move a3, s1
> + bl ret_from_kernel_thread
> RESTORE_STATIC
> RESTORE_SOME
> RESTORE_SP_AND_RET
> -SYM_CODE_END(ret_from_kernel_thread)
> +SYM_CODE_END(ret_from_kernel_thread_asm)
> diff --git a/arch/loongarch/kernel/process.c b/arch/loongarch/kernel/process.c
> index 6e58f65455c7ca3eae2e88ed852c8655a6701e5c..16cc949fe43443d70f1d865ce04595c2d8c1615b 100644
> --- a/arch/loongarch/kernel/process.c
> +++ b/arch/loongarch/kernel/process.c
> @@ -14,6 +14,7 @@
> #include <linux/init.h>
> #include <linux/kernel.h>
> #include <linux/errno.h>
> +#include <linux/entry-common.h>
> #include <linux/sched.h>
> #include <linux/sched/debug.h>
> #include <linux/sched/task.h>
> @@ -33,6 +34,7 @@
> #include <linux/prctl.h>
> #include <linux/nmi.h>
>
> +#include <asm/asm-prototypes.h>
> #include <asm/asm.h>
> #include <asm/bootinfo.h>
> #include <asm/cpu.h>
> @@ -47,6 +49,7 @@
> #include <asm/pgtable.h>
> #include <asm/processor.h>
> #include <asm/reg.h>
> +#include <asm/switch_to.h>
> #include <asm/unwind.h>
> #include <asm/vdso.h>
>
> @@ -63,8 +66,9 @@ EXPORT_SYMBOL(__stack_chk_guard);
> unsigned long boot_option_idle_override = IDLE_NO_OVERRIDE;
> EXPORT_SYMBOL(boot_option_idle_override);
>
> -asmlinkage void ret_from_fork(void);
> -asmlinkage void ret_from_kernel_thread(void);
> +asmlinkage void restore_and_ret(void);
> +asmlinkage void ret_from_fork_asm(void);
> +asmlinkage void ret_from_kernel_thread_asm(void);
>
> void start_thread(struct pt_regs *regs, unsigned long pc, unsigned long sp)
> {
> @@ -138,6 +142,24 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
> return 0;
> }
>
> +asmlinkage void noinstr __no_stack_protector ret_from_kernel_thread(struct task_struct *prev,
> + struct pt_regs *regs,
> + int (*fn)(void *),
> + void *fn_arg)
> +{
> + schedule_tail(prev);
> +
> + fn(fn_arg);
> +
The two blank lines can be removed, and again, I prefer alpha-betical
order, which means put ret_from_fork() before
ret_from_kernel_thread().
Huacai
> + syscall_exit_to_user_mode(regs);
> +}
> +
> +void noinstr __no_stack_protector ret_from_fork(struct task_struct *prev, struct pt_regs *regs)
> +{
> + schedule_tail(prev);
> + syscall_exit_to_user_mode(regs);
> +}
> +
> /*
> * Copy architecture-specific thread state
> */
> @@ -165,8 +187,8 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
> p->thread.reg03 = childksp;
> p->thread.reg23 = (unsigned long)args->fn;
> p->thread.reg24 = (unsigned long)args->fn_arg;
> - p->thread.reg01 = (unsigned long)ret_from_kernel_thread;
> - p->thread.sched_ra = (unsigned long)ret_from_kernel_thread;
> + p->thread.reg01 = (unsigned long)ret_from_kernel_thread_asm;
> + p->thread.sched_ra = (unsigned long)ret_from_kernel_thread_asm;
> memset(childregs, 0, sizeof(struct pt_regs));
> childregs->csr_euen = p->thread.csr_euen;
> childregs->csr_crmd = p->thread.csr_crmd;
> @@ -182,8 +204,8 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
> childregs->regs[3] = usp;
>
> p->thread.reg03 = (unsigned long) childregs;
> - p->thread.reg01 = (unsigned long) ret_from_fork;
> - p->thread.sched_ra = (unsigned long) ret_from_fork;
> + p->thread.reg01 = (unsigned long) ret_from_fork_asm;
> + p->thread.sched_ra = (unsigned long) ret_from_fork_asm;
>
> /*
> * New tasks lose permission to use the fpu. This accelerates context
>
> --
> 2.43.0
>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 2/4] riscv: entry: Split ret_from_fork() into user and kernel
2025-01-24 7:53 ` Charlie Jenkins
@ 2025-01-24 13:08 ` Brian Gerst
2025-01-24 18:26 ` Charlie Jenkins
0 siblings, 1 reply; 14+ messages in thread
From: Brian Gerst @ 2025-01-24 13:08 UTC (permalink / raw)
To: Charlie Jenkins
Cc: Alexandre Ghiti, Paul Walmsley, Palmer Dabbelt, Huacai Chen,
WANG Xuerui, Thomas Gleixner, Peter Zijlstra, Andy Lutomirski,
Alexandre Ghiti, linux-riscv, linux-kernel, loongarch
On Fri, Jan 24, 2025 at 2:53 AM Charlie Jenkins <charlie@rivosinc.com> wrote:
>
> On Fri, Jan 24, 2025 at 08:19:18AM +0100, Alexandre Ghiti wrote:
> > Hi Charlie,
> >
> > On 23/01/2025 20:14, Charlie Jenkins wrote:
> > > This function was unified into a single function in commit ab9164dae273
> > > ("riscv: entry: Consolidate ret_from_kernel_thread into ret_from_fork").
> > > However that imposed a performance degradation. Partially reverting this
> > > commit to have ret_from_fork() split again results in a 1% increase on
> > > the number of times fork is able to be called per second.
> > >
> > > Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
> > > ---
> > > arch/riscv/include/asm/asm-prototypes.h | 3 ++-
> > > arch/riscv/kernel/entry.S | 13 ++++++++++---
> > > arch/riscv/kernel/process.c | 17 +++++++++++------
> > > 3 files changed, 23 insertions(+), 10 deletions(-)
> > >
> > > diff --git a/arch/riscv/include/asm/asm-prototypes.h b/arch/riscv/include/asm/asm-prototypes.h
> > > index 733ff609778797001006c33bba9e3cc5b1f15387..bfc8ea5f9319b19449ec59493b45b926df888832 100644
> > > --- a/arch/riscv/include/asm/asm-prototypes.h
> > > +++ b/arch/riscv/include/asm/asm-prototypes.h
> > > @@ -52,7 +52,8 @@ DECLARE_DO_ERROR_INFO(do_trap_ecall_s);
> > > DECLARE_DO_ERROR_INFO(do_trap_ecall_m);
> > > DECLARE_DO_ERROR_INFO(do_trap_break);
> > > -asmlinkage void ret_from_fork(void *fn_arg, int (*fn)(void *), struct pt_regs *regs);
> > > +asmlinkage void ret_from_fork_kernel(void *fn_arg, int (*fn)(void *), struct pt_regs *regs);
> > > +asmlinkage void ret_from_fork_user(struct pt_regs *regs);
> > > asmlinkage void handle_bad_stack(struct pt_regs *regs);
> > > asmlinkage void do_page_fault(struct pt_regs *regs);
> > > asmlinkage void do_irq(struct pt_regs *regs);
> > > diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
> > > index 9225c322279aa90e737b1d7144db084319cf8103..9386ef7444267f0b9bf8a0550f4e31deaeb85881 100644
> > > --- a/arch/riscv/kernel/entry.S
> > > +++ b/arch/riscv/kernel/entry.S
> > > @@ -319,14 +319,21 @@ SYM_CODE_END(handle_kernel_stack_overflow)
> > > ASM_NOKPROBE(handle_kernel_stack_overflow)
> > > #endif
> > > -SYM_CODE_START(ret_from_fork_asm)
> > > +SYM_CODE_START(ret_from_fork_kernel_asm)
> > > call schedule_tail
> > > move a0, s1 /* fn */
> > > move a1, s0 /* fn_arg */
> > > move a2, sp /* pt_regs */
> > > - call ret_from_fork
> > > + call ret_from_fork_kernel
> > > j ret_from_exception
> > > -SYM_CODE_END(ret_from_fork_asm)
> > > +SYM_CODE_END(ret_from_fork_kernel_asm)
> > > +
> > > +SYM_CODE_START(ret_from_fork_user_asm)
> > > + call schedule_tail
> > > + move a0, sp /* pt_regs */
> > > + call ret_from_fork_user
> > > + j ret_from_exception
> > > +SYM_CODE_END(ret_from_fork_user_asm)
> > > #ifdef CONFIG_IRQ_STACKS
> > > /*
> > > diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
> > > index 0d07e6d8f6b57beba438dbba5e8c74a014582bee..5f15236cb526bd9fe61636ed372b4b76c94df946 100644
> > > --- a/arch/riscv/kernel/process.c
> > > +++ b/arch/riscv/kernel/process.c
> > > @@ -38,7 +38,8 @@ unsigned long __stack_chk_guard __read_mostly;
> > > EXPORT_SYMBOL(__stack_chk_guard);
> > > #endif
> > > -extern asmlinkage void ret_from_fork_asm(void);
> > > +extern asmlinkage void ret_from_fork_kernel_asm(void);
> > > +extern asmlinkage void ret_from_fork_user_asm(void);
> > > void noinstr arch_cpu_idle(void)
> > > {
> > > @@ -208,14 +209,18 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
> > > return 0;
> > > }
> > > -asmlinkage void ret_from_fork(void *fn_arg, int (*fn)(void *), struct pt_regs *regs)
> > > +asmlinkage void ret_from_fork_kernel(void *fn_arg, int (*fn)(void *), struct pt_regs *regs)
> > > {
> > > - if (unlikely(fn))
> > > - fn(fn_arg);
> > > + fn(fn_arg);
> > > syscall_exit_to_user_mode(regs);
> > > }
> > > +asmlinkage void ret_from_fork_user(struct pt_regs *regs)
> > > +{
> > > + syscall_exit_to_user_mode(regs);
> > > +}
> > > +
> > > int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
> > > {
> > > unsigned long clone_flags = args->flags;
> > > @@ -238,6 +243,7 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
> > > p->thread.s[0] = (unsigned long)args->fn;
> > > p->thread.s[1] = (unsigned long)args->fn_arg;
> > > + p->thread.ra = (unsigned long)ret_from_fork_kernel_asm;
> > > } else {
> > > *childregs = *(current_pt_regs());
> > > /* Turn off status.VS */
> > > @@ -247,12 +253,11 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
> > > if (clone_flags & CLONE_SETTLS)
> > > childregs->tp = tls;
> > > childregs->a0 = 0; /* Return value of fork() */
> > > - p->thread.s[0] = 0;
> > > + p->thread.ra = (unsigned long)ret_from_fork_user_asm;
> > > }
> > > p->thread.riscv_v_flags = 0;
> > > if (has_vector())
> > > riscv_v_thread_alloc(p);
> > > - p->thread.ra = (unsigned long)ret_from_fork_asm;
> > > p->thread.sp = (unsigned long)childregs; /* kernel sp */
> > > return 0;
> > > }
> > >
> >
> > Can you benchmark this change on some HW? I'm not sure we would indeed gain
> > this 1%.
>
> It reduces the syscall path by 3 instructions, two for not needing to
> move the fn and fn_args from:
>
> move a0, s1 /* fn */
> move a1, s0 /* fn_arg */
>
> And one for not needing to do the conditional. This one is also saved on
> kernel threads.
>
> It's a very small improvement, but there is only something like 100
> instructions along the direct syscall path so it ends up being a large
> percentage. On hardware moving registers is very cheap and this branch
> will be almost always be correctly predicted so the cost is close to
> zero. I just figured that since I am making changes around here it would
> be nice if it was optimal instead of being close to optimal.
That may be the case on the child process side, but compared to the
cost of the fork on the parent process side (allocating and
initializing a new task struct), it's miniscule.
Brian Gerst
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 1/4] riscv: entry: Convert ret_from_fork() to C
2025-01-23 19:14 ` [PATCH v2 1/4] riscv: entry: Convert ret_from_fork() to C Charlie Jenkins
@ 2025-01-24 13:14 ` Brian Gerst
2025-01-24 18:23 ` Charlie Jenkins
0 siblings, 1 reply; 14+ messages in thread
From: Brian Gerst @ 2025-01-24 13:14 UTC (permalink / raw)
To: Charlie Jenkins
Cc: Paul Walmsley, Palmer Dabbelt, Huacai Chen, WANG Xuerui,
Thomas Gleixner, Peter Zijlstra, Andy Lutomirski, Alexandre Ghiti,
linux-riscv, linux-kernel, loongarch
On Thu, Jan 23, 2025 at 2:15 PM Charlie Jenkins <charlie@rivosinc.com> wrote:
>
> Move the main section of ret_from_fork() to C to allow inlining of
> syscall_exit_to_user_mode().
>
> Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
> ---
> arch/riscv/include/asm/asm-prototypes.h | 1 +
> arch/riscv/kernel/entry.S | 15 ++++++---------
> arch/riscv/kernel/process.c | 14 ++++++++++++--
> 3 files changed, 19 insertions(+), 11 deletions(-)
>
> diff --git a/arch/riscv/include/asm/asm-prototypes.h b/arch/riscv/include/asm/asm-prototypes.h
> index cd627ec289f163a630b73dd03dd52a6b28692997..733ff609778797001006c33bba9e3cc5b1f15387 100644
> --- a/arch/riscv/include/asm/asm-prototypes.h
> +++ b/arch/riscv/include/asm/asm-prototypes.h
> @@ -52,6 +52,7 @@ DECLARE_DO_ERROR_INFO(do_trap_ecall_s);
> DECLARE_DO_ERROR_INFO(do_trap_ecall_m);
> DECLARE_DO_ERROR_INFO(do_trap_break);
>
> +asmlinkage void ret_from_fork(void *fn_arg, int (*fn)(void *), struct pt_regs *regs);
> asmlinkage void handle_bad_stack(struct pt_regs *regs);
> asmlinkage void do_page_fault(struct pt_regs *regs);
> asmlinkage void do_irq(struct pt_regs *regs);
> diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
> index 33a5a9f2a0d4e1eeccfb3621b9e518b88e1b0704..9225c322279aa90e737b1d7144db084319cf8103 100644
> --- a/arch/riscv/kernel/entry.S
> +++ b/arch/riscv/kernel/entry.S
> @@ -319,17 +319,14 @@ SYM_CODE_END(handle_kernel_stack_overflow)
> ASM_NOKPROBE(handle_kernel_stack_overflow)
> #endif
>
> -SYM_CODE_START(ret_from_fork)
> +SYM_CODE_START(ret_from_fork_asm)
> call schedule_tail
> - beqz s0, 1f /* not from kernel thread */
> - /* Call fn(arg) */
> - move a0, s1
> - jalr s0
> -1:
> - move a0, sp /* pt_regs */
> - call syscall_exit_to_user_mode
> + move a0, s1 /* fn */
> + move a1, s0 /* fn_arg */
> + move a2, sp /* pt_regs */
> + call ret_from_fork
> j ret_from_exception
> -SYM_CODE_END(ret_from_fork)
> +SYM_CODE_END(ret_from_fork_asm)
>
> #ifdef CONFIG_IRQ_STACKS
> /*
> diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
> index 58b6482c2bf662bf5224ca50c8e21a68760a6b41..0d07e6d8f6b57beba438dbba5e8c74a014582bee 100644
> --- a/arch/riscv/kernel/process.c
> +++ b/arch/riscv/kernel/process.c
> @@ -17,7 +17,9 @@
> #include <linux/ptrace.h>
> #include <linux/uaccess.h>
> #include <linux/personality.h>
> +#include <linux/entry-common.h>
>
> +#include <asm/asm-prototypes.h>
> #include <asm/unistd.h>
> #include <asm/processor.h>
> #include <asm/csr.h>
> @@ -36,7 +38,7 @@ unsigned long __stack_chk_guard __read_mostly;
> EXPORT_SYMBOL(__stack_chk_guard);
> #endif
>
> -extern asmlinkage void ret_from_fork(void);
> +extern asmlinkage void ret_from_fork_asm(void);
>
> void noinstr arch_cpu_idle(void)
> {
> @@ -206,6 +208,14 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
> return 0;
> }
>
> +asmlinkage void ret_from_fork(void *fn_arg, int (*fn)(void *), struct pt_regs *regs)
> +{
> + if (unlikely(fn))
> + fn(fn_arg);
> +
> + syscall_exit_to_user_mode(regs);
> +}
> +
> int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
> {
> unsigned long clone_flags = args->flags;
> @@ -242,7 +252,7 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
> p->thread.riscv_v_flags = 0;
> if (has_vector())
> riscv_v_thread_alloc(p);
> - p->thread.ra = (unsigned long)ret_from_fork;
> + p->thread.ra = (unsigned long)ret_from_fork_asm;
> p->thread.sp = (unsigned long)childregs; /* kernel sp */
> return 0;
> }
>
> --
> 2.43.0
>
>
Is there a specific reason you didn't move the call to schedule_tail()
to the C function, like on x86?
Brian Gerst
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 1/4] riscv: entry: Convert ret_from_fork() to C
2025-01-24 13:14 ` Brian Gerst
@ 2025-01-24 18:23 ` Charlie Jenkins
0 siblings, 0 replies; 14+ messages in thread
From: Charlie Jenkins @ 2025-01-24 18:23 UTC (permalink / raw)
To: Brian Gerst
Cc: Paul Walmsley, Palmer Dabbelt, Huacai Chen, WANG Xuerui,
Thomas Gleixner, Peter Zijlstra, Andy Lutomirski, Alexandre Ghiti,
linux-riscv, linux-kernel, loongarch
On Fri, Jan 24, 2025 at 08:14:08AM -0500, Brian Gerst wrote:
> On Thu, Jan 23, 2025 at 2:15 PM Charlie Jenkins <charlie@rivosinc.com> wrote:
> >
> > Move the main section of ret_from_fork() to C to allow inlining of
> > syscall_exit_to_user_mode().
> >
> > Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
> > ---
> > arch/riscv/include/asm/asm-prototypes.h | 1 +
> > arch/riscv/kernel/entry.S | 15 ++++++---------
> > arch/riscv/kernel/process.c | 14 ++++++++++++--
> > 3 files changed, 19 insertions(+), 11 deletions(-)
> >
> > diff --git a/arch/riscv/include/asm/asm-prototypes.h b/arch/riscv/include/asm/asm-prototypes.h
> > index cd627ec289f163a630b73dd03dd52a6b28692997..733ff609778797001006c33bba9e3cc5b1f15387 100644
> > --- a/arch/riscv/include/asm/asm-prototypes.h
> > +++ b/arch/riscv/include/asm/asm-prototypes.h
> > @@ -52,6 +52,7 @@ DECLARE_DO_ERROR_INFO(do_trap_ecall_s);
> > DECLARE_DO_ERROR_INFO(do_trap_ecall_m);
> > DECLARE_DO_ERROR_INFO(do_trap_break);
> >
> > +asmlinkage void ret_from_fork(void *fn_arg, int (*fn)(void *), struct pt_regs *regs);
> > asmlinkage void handle_bad_stack(struct pt_regs *regs);
> > asmlinkage void do_page_fault(struct pt_regs *regs);
> > asmlinkage void do_irq(struct pt_regs *regs);
> > diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
> > index 33a5a9f2a0d4e1eeccfb3621b9e518b88e1b0704..9225c322279aa90e737b1d7144db084319cf8103 100644
> > --- a/arch/riscv/kernel/entry.S
> > +++ b/arch/riscv/kernel/entry.S
> > @@ -319,17 +319,14 @@ SYM_CODE_END(handle_kernel_stack_overflow)
> > ASM_NOKPROBE(handle_kernel_stack_overflow)
> > #endif
> >
> > -SYM_CODE_START(ret_from_fork)
> > +SYM_CODE_START(ret_from_fork_asm)
> > call schedule_tail
> > - beqz s0, 1f /* not from kernel thread */
> > - /* Call fn(arg) */
> > - move a0, s1
> > - jalr s0
> > -1:
> > - move a0, sp /* pt_regs */
> > - call syscall_exit_to_user_mode
> > + move a0, s1 /* fn */
> > + move a1, s0 /* fn_arg */
> > + move a2, sp /* pt_regs */
> > + call ret_from_fork
> > j ret_from_exception
> > -SYM_CODE_END(ret_from_fork)
> > +SYM_CODE_END(ret_from_fork_asm)
> >
> > #ifdef CONFIG_IRQ_STACKS
> > /*
> > diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
> > index 58b6482c2bf662bf5224ca50c8e21a68760a6b41..0d07e6d8f6b57beba438dbba5e8c74a014582bee 100644
> > --- a/arch/riscv/kernel/process.c
> > +++ b/arch/riscv/kernel/process.c
> > @@ -17,7 +17,9 @@
> > #include <linux/ptrace.h>
> > #include <linux/uaccess.h>
> > #include <linux/personality.h>
> > +#include <linux/entry-common.h>
> >
> > +#include <asm/asm-prototypes.h>
> > #include <asm/unistd.h>
> > #include <asm/processor.h>
> > #include <asm/csr.h>
> > @@ -36,7 +38,7 @@ unsigned long __stack_chk_guard __read_mostly;
> > EXPORT_SYMBOL(__stack_chk_guard);
> > #endif
> >
> > -extern asmlinkage void ret_from_fork(void);
> > +extern asmlinkage void ret_from_fork_asm(void);
> >
> > void noinstr arch_cpu_idle(void)
> > {
> > @@ -206,6 +208,14 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
> > return 0;
> > }
> >
> > +asmlinkage void ret_from_fork(void *fn_arg, int (*fn)(void *), struct pt_regs *regs)
> > +{
> > + if (unlikely(fn))
> > + fn(fn_arg);
> > +
> > + syscall_exit_to_user_mode(regs);
> > +}
> > +
> > int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
> > {
> > unsigned long clone_flags = args->flags;
> > @@ -242,7 +252,7 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
> > p->thread.riscv_v_flags = 0;
> > if (has_vector())
> > riscv_v_thread_alloc(p);
> > - p->thread.ra = (unsigned long)ret_from_fork;
> > + p->thread.ra = (unsigned long)ret_from_fork_asm;
> > p->thread.sp = (unsigned long)childregs; /* kernel sp */
> > return 0;
> > }
> >
> > --
> > 2.43.0
> >
> >
>
> Is there a specific reason you didn't move the call to schedule_tail()
> to the C function, like on x86?
Yes, the generated code ends up being dramatically worse if
schedule_tail() is moved into C. This is because the arg for
schedule_tail() is already in a0 so the extra stack manipulation
instructions end up taking up a lot of instructions.
With this change:
<ret_from_fork_asm>:
ff65b097 auipc ra,0xff65b
1ee080e7 jalr 494(ra) # ffffffff8005038a <schedule_tail>
8526 mv a0,s1
85a2 mv a1,s0
860a mv a2,sp
ff61b097 auipc ra,0xff61b
606080e7 jalr 1542(ra) # ffffffff800107b0 <ret_from_fork>
b5f5 j ffffffff809f509e <ret_from_exception>
<ret_from_fork>:
1101 addi sp,sp,-32
e822 sd s0,16(sp)
ec06 sd ra,24(sp)
1000 addi s0,sp,32
e991 bnez a1,ffffffff800107cc <ret_from_fork+0x1c>
8532 mv a0,a2
009db097 auipc ra,0x9db
a32080e7 jalr -1486(ra) # ffffffff809eb1ee <syscall_exit_to_user_mode>
60e2 ld ra,24(sp)
6442 ld s0,16(sp)
6105 addi sp,sp,32
8082 ret
<following instructions used only in the kernel thread case>
fec43423 sd a2,-24(s0)
9582 jalr a1
fe843603 ld a2,-24(s0)
8532 mv a0,a2
009db097 auipc ra,0x9db
a16080e7 jalr -1514(ra) # ffffffff809eb1ee <syscall_exit_to_user_mode>
60e2 ld ra,24(sp)
6442 ld s0,16(sp)
6105 addi sp,sp,32
8082 ret
Contrasted with what this looks like if schedule_tail() is called from
C.
<ret_from_fork_asm>:
85a6 mv a1,s1
8622 mv a2,s0
868a mv a3,sp
ff61b097 auipc ra,0xff61b
60e080e7 jalr 1550(ra) # ffffffff800107b0 <ret_from_fork>
bdd5 j ffffffff809f509e <ret_from_exception>
<ret_from_fork>:
7179 addi sp,sp,-48
f022 sd s0,32(sp)
ec26 sd s1,24(sp)
e84a sd s2,16(sp)
e44e sd s3,8(sp)
f406 sd ra,40(sp)
1800 addi s0,sp,48
84b2 mv s1,a2
89ae mv s3,a1
8936 mv s2,a3
3c73f0ef jal ffffffff8005038a <schedule_tail>
ec89 bnez s1,ffffffff800107e2 <ret_from_fork+0x32>
854a mv a0,s2
009db097 auipc ra,0x9db
a22080e7 jalr -1502(ra) # ffffffff809eb1ee <syscall_exit_to_user_mode>
70a2 ld ra,40(sp)
7402 ld s0,32(sp)
64e2 ld s1,24(sp)
6942 ld s2,16(sp)
69a2 ld s3,8(sp)
6145 addi sp,sp,48
8082 ret
854e mv a0,s3
9482 jalr s1
b7d5 j ffffffff800107ca <ret_from_fork+0x1a>
ret_from_fork_asm ends up being 2 instructions more when calling from
asm, but the user fork ret_from_fork ends up being only 12 instructions
rather than 22 instructions when calling from C. If we were able to mix
asm and C code in a naked function we would be able to get rid of the
stack manipulation and still be able to inline C but we don't live in
that world...
- Charlie
>
>
> Brian Gerst
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 2/4] riscv: entry: Split ret_from_fork() into user and kernel
2025-01-24 13:08 ` Brian Gerst
@ 2025-01-24 18:26 ` Charlie Jenkins
0 siblings, 0 replies; 14+ messages in thread
From: Charlie Jenkins @ 2025-01-24 18:26 UTC (permalink / raw)
To: Brian Gerst
Cc: Alexandre Ghiti, Paul Walmsley, Palmer Dabbelt, Huacai Chen,
WANG Xuerui, Thomas Gleixner, Peter Zijlstra, Andy Lutomirski,
Alexandre Ghiti, linux-riscv, linux-kernel, loongarch
On Fri, Jan 24, 2025 at 08:08:44AM -0500, Brian Gerst wrote:
> On Fri, Jan 24, 2025 at 2:53 AM Charlie Jenkins <charlie@rivosinc.com> wrote:
> >
> > On Fri, Jan 24, 2025 at 08:19:18AM +0100, Alexandre Ghiti wrote:
> > > Hi Charlie,
> > >
> > > On 23/01/2025 20:14, Charlie Jenkins wrote:
> > > > This function was unified into a single function in commit ab9164dae273
> > > > ("riscv: entry: Consolidate ret_from_kernel_thread into ret_from_fork").
> > > > However that imposed a performance degradation. Partially reverting this
> > > > commit to have ret_from_fork() split again results in a 1% increase on
> > > > the number of times fork is able to be called per second.
> > > >
> > > > Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
> > > > ---
> > > > arch/riscv/include/asm/asm-prototypes.h | 3 ++-
> > > > arch/riscv/kernel/entry.S | 13 ++++++++++---
> > > > arch/riscv/kernel/process.c | 17 +++++++++++------
> > > > 3 files changed, 23 insertions(+), 10 deletions(-)
> > > >
> > > > diff --git a/arch/riscv/include/asm/asm-prototypes.h b/arch/riscv/include/asm/asm-prototypes.h
> > > > index 733ff609778797001006c33bba9e3cc5b1f15387..bfc8ea5f9319b19449ec59493b45b926df888832 100644
> > > > --- a/arch/riscv/include/asm/asm-prototypes.h
> > > > +++ b/arch/riscv/include/asm/asm-prototypes.h
> > > > @@ -52,7 +52,8 @@ DECLARE_DO_ERROR_INFO(do_trap_ecall_s);
> > > > DECLARE_DO_ERROR_INFO(do_trap_ecall_m);
> > > > DECLARE_DO_ERROR_INFO(do_trap_break);
> > > > -asmlinkage void ret_from_fork(void *fn_arg, int (*fn)(void *), struct pt_regs *regs);
> > > > +asmlinkage void ret_from_fork_kernel(void *fn_arg, int (*fn)(void *), struct pt_regs *regs);
> > > > +asmlinkage void ret_from_fork_user(struct pt_regs *regs);
> > > > asmlinkage void handle_bad_stack(struct pt_regs *regs);
> > > > asmlinkage void do_page_fault(struct pt_regs *regs);
> > > > asmlinkage void do_irq(struct pt_regs *regs);
> > > > diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
> > > > index 9225c322279aa90e737b1d7144db084319cf8103..9386ef7444267f0b9bf8a0550f4e31deaeb85881 100644
> > > > --- a/arch/riscv/kernel/entry.S
> > > > +++ b/arch/riscv/kernel/entry.S
> > > > @@ -319,14 +319,21 @@ SYM_CODE_END(handle_kernel_stack_overflow)
> > > > ASM_NOKPROBE(handle_kernel_stack_overflow)
> > > > #endif
> > > > -SYM_CODE_START(ret_from_fork_asm)
> > > > +SYM_CODE_START(ret_from_fork_kernel_asm)
> > > > call schedule_tail
> > > > move a0, s1 /* fn */
> > > > move a1, s0 /* fn_arg */
> > > > move a2, sp /* pt_regs */
> > > > - call ret_from_fork
> > > > + call ret_from_fork_kernel
> > > > j ret_from_exception
> > > > -SYM_CODE_END(ret_from_fork_asm)
> > > > +SYM_CODE_END(ret_from_fork_kernel_asm)
> > > > +
> > > > +SYM_CODE_START(ret_from_fork_user_asm)
> > > > + call schedule_tail
> > > > + move a0, sp /* pt_regs */
> > > > + call ret_from_fork_user
> > > > + j ret_from_exception
> > > > +SYM_CODE_END(ret_from_fork_user_asm)
> > > > #ifdef CONFIG_IRQ_STACKS
> > > > /*
> > > > diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
> > > > index 0d07e6d8f6b57beba438dbba5e8c74a014582bee..5f15236cb526bd9fe61636ed372b4b76c94df946 100644
> > > > --- a/arch/riscv/kernel/process.c
> > > > +++ b/arch/riscv/kernel/process.c
> > > > @@ -38,7 +38,8 @@ unsigned long __stack_chk_guard __read_mostly;
> > > > EXPORT_SYMBOL(__stack_chk_guard);
> > > > #endif
> > > > -extern asmlinkage void ret_from_fork_asm(void);
> > > > +extern asmlinkage void ret_from_fork_kernel_asm(void);
> > > > +extern asmlinkage void ret_from_fork_user_asm(void);
> > > > void noinstr arch_cpu_idle(void)
> > > > {
> > > > @@ -208,14 +209,18 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
> > > > return 0;
> > > > }
> > > > -asmlinkage void ret_from_fork(void *fn_arg, int (*fn)(void *), struct pt_regs *regs)
> > > > +asmlinkage void ret_from_fork_kernel(void *fn_arg, int (*fn)(void *), struct pt_regs *regs)
> > > > {
> > > > - if (unlikely(fn))
> > > > - fn(fn_arg);
> > > > + fn(fn_arg);
> > > > syscall_exit_to_user_mode(regs);
> > > > }
> > > > +asmlinkage void ret_from_fork_user(struct pt_regs *regs)
> > > > +{
> > > > + syscall_exit_to_user_mode(regs);
> > > > +}
> > > > +
> > > > int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
> > > > {
> > > > unsigned long clone_flags = args->flags;
> > > > @@ -238,6 +243,7 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
> > > > p->thread.s[0] = (unsigned long)args->fn;
> > > > p->thread.s[1] = (unsigned long)args->fn_arg;
> > > > + p->thread.ra = (unsigned long)ret_from_fork_kernel_asm;
> > > > } else {
> > > > *childregs = *(current_pt_regs());
> > > > /* Turn off status.VS */
> > > > @@ -247,12 +253,11 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
> > > > if (clone_flags & CLONE_SETTLS)
> > > > childregs->tp = tls;
> > > > childregs->a0 = 0; /* Return value of fork() */
> > > > - p->thread.s[0] = 0;
> > > > + p->thread.ra = (unsigned long)ret_from_fork_user_asm;
> > > > }
> > > > p->thread.riscv_v_flags = 0;
> > > > if (has_vector())
> > > > riscv_v_thread_alloc(p);
> > > > - p->thread.ra = (unsigned long)ret_from_fork_asm;
> > > > p->thread.sp = (unsigned long)childregs; /* kernel sp */
> > > > return 0;
> > > > }
> > > >
> > >
> > > Can you benchmark this change on some HW? I'm not sure we would indeed gain
> > > this 1%.
> >
> > It reduces the syscall path by 3 instructions, two for not needing to
> > move the fn and fn_args from:
> >
> > move a0, s1 /* fn */
> > move a1, s0 /* fn_arg */
> >
> > And one for not needing to do the conditional. This one is also saved on
> > kernel threads.
> >
> > It's a very small improvement, but there is only something like 100
> > instructions along the direct syscall path so it ends up being a large
> > percentage. On hardware moving registers is very cheap and this branch
> > will be almost always be correctly predicted so the cost is close to
> > zero. I just figured that since I am making changes around here it would
> > be nice if it was optimal instead of being close to optimal.
>
> That may be the case on the child process side, but compared to the
> cost of the fork on the parent process side (allocating and
> initializing a new task struct), it's miniscule.
>
Yes that is a good point. The change will have a probably unnoticeable
effect, mostly just depends on how people want this code to look.
- Charlie
>
>
> Brian Gerst
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 3/4] loongarch: entry: Migrate ret_from_fork() to C
2025-01-24 9:05 ` Huacai Chen
@ 2025-01-24 18:28 ` Charlie Jenkins
2025-01-24 22:23 ` Charlie Jenkins
0 siblings, 1 reply; 14+ messages in thread
From: Charlie Jenkins @ 2025-01-24 18:28 UTC (permalink / raw)
To: Huacai Chen
Cc: Paul Walmsley, Palmer Dabbelt, WANG Xuerui, Thomas Gleixner,
Peter Zijlstra, Andy Lutomirski, Alexandre Ghiti, linux-riscv,
linux-kernel, loongarch
On Fri, Jan 24, 2025 at 05:05:21PM +0800, Huacai Chen wrote:
> Hi, Charlie,
>
> On Fri, Jan 24, 2025 at 3:15 AM Charlie Jenkins <charlie@rivosinc.com> wrote:
> >
> > Loongarch is the only architecture that calls
> We usually use "LoongArch" instead of "loongarch" or "Loongarch".
>
> > syscall_exit_to_user_mode() from asm. Move the call into C so that this
> > function can be inlined across all architectures.
> >
> > Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
> > ---
> > arch/loongarch/include/asm/asm-prototypes.h | 5 +++++
> > arch/loongarch/include/asm/switch_to.h | 8 +++++++
> > arch/loongarch/kernel/entry.S | 22 +++++++++----------
> > arch/loongarch/kernel/process.c | 34 ++++++++++++++++++++++++-----
> > 4 files changed, 51 insertions(+), 18 deletions(-)
> >
> > diff --git a/arch/loongarch/include/asm/asm-prototypes.h b/arch/loongarch/include/asm/asm-prototypes.h
> > index 51f224bcfc654228ae423e9a066b25b35102a5b9..0195d4309fd29f94664d5f34247198c769033b1b 100644
> > --- a/arch/loongarch/include/asm/asm-prototypes.h
> > +++ b/arch/loongarch/include/asm/asm-prototypes.h
> > @@ -12,3 +12,8 @@ __int128_t __ashlti3(__int128_t a, int b);
> > __int128_t __ashrti3(__int128_t a, int b);
> > __int128_t __lshrti3(__int128_t a, int b);
> > #endif
> > +
> > +asmlinkage void noinstr __no_stack_protector ret_from_kernel_thread(struct task_struct *prev,
> > + struct pt_regs *regs,
> > + int (*fn)(void *),
> > + void *fn_arg);
> It is a little strange that we only need to declare
> ret_from_kernel_thread() but not ret_from_fork().
Just an oversight by me, thank you for pointing that out.
>
> > diff --git a/arch/loongarch/include/asm/switch_to.h b/arch/loongarch/include/asm/switch_to.h
> > index 5b225aff3ba21aa06d0713bc8e73e1b941389630..a1c5576f1fd145670e13038bec6dd390486099ab 100644
> > --- a/arch/loongarch/include/asm/switch_to.h
> > +++ b/arch/loongarch/include/asm/switch_to.h
> > @@ -26,6 +26,14 @@ extern asmlinkage struct task_struct *__switch_to(struct task_struct *prev,
> > struct task_struct *next, struct thread_info *next_ti,
> > void *sched_ra, void *sched_cfa);
> >
> > +void noinstr __no_stack_protector ret_from_kernel_thread(struct task_struct *prev,
> > + struct pt_regs *regs,
> > + int (*fn)(void *),
> > + void *fn_arg);
> > +
> > +void noinstr __no_stack_protector ret_from_fork(struct task_struct *prev,
> > + struct pt_regs *regs);
> > +
> I prefer alpha-betical order, which means put ret_from_fork() before
> ret_from_kernel_thread().
Makes sense!
>
> > /*
> > * For newly created kernel threads switch_to() will return to
> > * ret_from_kernel_thread, newly created user threads to ret_from_fork.
> > diff --git a/arch/loongarch/kernel/entry.S b/arch/loongarch/kernel/entry.S
> > index 48e7e34e355e83eae8165957ba2eac05a8bf17df..2abc29e573810e000f2fef4646ddca0dbb80eabe 100644
> > --- a/arch/loongarch/kernel/entry.S
> > +++ b/arch/loongarch/kernel/entry.S
> > @@ -77,24 +77,22 @@ SYM_CODE_START(handle_syscall)
> > SYM_CODE_END(handle_syscall)
> > _ASM_NOKPROBE(handle_syscall)
> >
> > -SYM_CODE_START(ret_from_fork)
> > +SYM_CODE_START(ret_from_fork_asm)
> > UNWIND_HINT_REGS
> > - bl schedule_tail # a0 = struct task_struct *prev
> > - move a0, sp
> > - bl syscall_exit_to_user_mode
> > + move a1, sp
> > + bl ret_from_fork
> > RESTORE_STATIC
> > RESTORE_SOME
> > RESTORE_SP_AND_RET
> > -SYM_CODE_END(ret_from_fork)
> > +SYM_CODE_END(ret_from_fork_asm)
> >
> > -SYM_CODE_START(ret_from_kernel_thread)
> > +SYM_CODE_START(ret_from_kernel_thread_asm)
> > UNWIND_HINT_REGS
> > - bl schedule_tail # a0 = struct task_struct *prev
> > - move a0, s1
> > - jirl ra, s0, 0
> > - move a0, sp
> > - bl syscall_exit_to_user_mode
> > + move a1, sp
> > + move a2, s0
> > + move a3, s1
> > + bl ret_from_kernel_thread
> > RESTORE_STATIC
> > RESTORE_SOME
> > RESTORE_SP_AND_RET
> > -SYM_CODE_END(ret_from_kernel_thread)
> > +SYM_CODE_END(ret_from_kernel_thread_asm)
> > diff --git a/arch/loongarch/kernel/process.c b/arch/loongarch/kernel/process.c
> > index 6e58f65455c7ca3eae2e88ed852c8655a6701e5c..16cc949fe43443d70f1d865ce04595c2d8c1615b 100644
> > --- a/arch/loongarch/kernel/process.c
> > +++ b/arch/loongarch/kernel/process.c
> > @@ -14,6 +14,7 @@
> > #include <linux/init.h>
> > #include <linux/kernel.h>
> > #include <linux/errno.h>
> > +#include <linux/entry-common.h>
> > #include <linux/sched.h>
> > #include <linux/sched/debug.h>
> > #include <linux/sched/task.h>
> > @@ -33,6 +34,7 @@
> > #include <linux/prctl.h>
> > #include <linux/nmi.h>
> >
> > +#include <asm/asm-prototypes.h>
> > #include <asm/asm.h>
> > #include <asm/bootinfo.h>
> > #include <asm/cpu.h>
> > @@ -47,6 +49,7 @@
> > #include <asm/pgtable.h>
> > #include <asm/processor.h>
> > #include <asm/reg.h>
> > +#include <asm/switch_to.h>
> > #include <asm/unwind.h>
> > #include <asm/vdso.h>
> >
> > @@ -63,8 +66,9 @@ EXPORT_SYMBOL(__stack_chk_guard);
> > unsigned long boot_option_idle_override = IDLE_NO_OVERRIDE;
> > EXPORT_SYMBOL(boot_option_idle_override);
> >
> > -asmlinkage void ret_from_fork(void);
> > -asmlinkage void ret_from_kernel_thread(void);
> > +asmlinkage void restore_and_ret(void);
> > +asmlinkage void ret_from_fork_asm(void);
> > +asmlinkage void ret_from_kernel_thread_asm(void);
> >
> > void start_thread(struct pt_regs *regs, unsigned long pc, unsigned long sp)
> > {
> > @@ -138,6 +142,24 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
> > return 0;
> > }
> >
> > +asmlinkage void noinstr __no_stack_protector ret_from_kernel_thread(struct task_struct *prev,
> > + struct pt_regs *regs,
> > + int (*fn)(void *),
> > + void *fn_arg)
> > +{
> > + schedule_tail(prev);
> > +
> > + fn(fn_arg);
> > +
> The two blank lines can be removed, and again, I prefer alpha-betical
> order, which means put ret_from_fork() before
> ret_from_kernel_thread().
Will update in the next version.
- Charlie
>
> Huacai
>
> > + syscall_exit_to_user_mode(regs);
> > +}
> > +
> > +void noinstr __no_stack_protector ret_from_fork(struct task_struct *prev, struct pt_regs *regs)
> > +{
> > + schedule_tail(prev);
> > + syscall_exit_to_user_mode(regs);
> > +}
> > +
> > /*
> > * Copy architecture-specific thread state
> > */
> > @@ -165,8 +187,8 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
> > p->thread.reg03 = childksp;
> > p->thread.reg23 = (unsigned long)args->fn;
> > p->thread.reg24 = (unsigned long)args->fn_arg;
> > - p->thread.reg01 = (unsigned long)ret_from_kernel_thread;
> > - p->thread.sched_ra = (unsigned long)ret_from_kernel_thread;
> > + p->thread.reg01 = (unsigned long)ret_from_kernel_thread_asm;
> > + p->thread.sched_ra = (unsigned long)ret_from_kernel_thread_asm;
> > memset(childregs, 0, sizeof(struct pt_regs));
> > childregs->csr_euen = p->thread.csr_euen;
> > childregs->csr_crmd = p->thread.csr_crmd;
> > @@ -182,8 +204,8 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
> > childregs->regs[3] = usp;
> >
> > p->thread.reg03 = (unsigned long) childregs;
> > - p->thread.reg01 = (unsigned long) ret_from_fork;
> > - p->thread.sched_ra = (unsigned long) ret_from_fork;
> > + p->thread.reg01 = (unsigned long) ret_from_fork_asm;
> > + p->thread.sched_ra = (unsigned long) ret_from_fork_asm;
> >
> > /*
> > * New tasks lose permission to use the fpu. This accelerates context
> >
> > --
> > 2.43.0
> >
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 3/4] loongarch: entry: Migrate ret_from_fork() to C
2025-01-24 18:28 ` Charlie Jenkins
@ 2025-01-24 22:23 ` Charlie Jenkins
0 siblings, 0 replies; 14+ messages in thread
From: Charlie Jenkins @ 2025-01-24 22:23 UTC (permalink / raw)
To: Huacai Chen
Cc: Paul Walmsley, Palmer Dabbelt, WANG Xuerui, Thomas Gleixner,
Peter Zijlstra, Andy Lutomirski, Alexandre Ghiti, linux-riscv,
linux-kernel, loongarch
On Fri, Jan 24, 2025 at 10:28:50AM -0800, Charlie Jenkins wrote:
> On Fri, Jan 24, 2025 at 05:05:21PM +0800, Huacai Chen wrote:
> > Hi, Charlie,
> >
> > On Fri, Jan 24, 2025 at 3:15 AM Charlie Jenkins <charlie@rivosinc.com> wrote:
> > >
> > > Loongarch is the only architecture that calls
> > We usually use "LoongArch" instead of "loongarch" or "Loongarch".
> >
> > > syscall_exit_to_user_mode() from asm. Move the call into C so that this
> > > function can be inlined across all architectures.
> > >
> > > Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
> > > ---
> > > arch/loongarch/include/asm/asm-prototypes.h | 5 +++++
> > > arch/loongarch/include/asm/switch_to.h | 8 +++++++
> > > arch/loongarch/kernel/entry.S | 22 +++++++++----------
> > > arch/loongarch/kernel/process.c | 34 ++++++++++++++++++++++++-----
> > > 4 files changed, 51 insertions(+), 18 deletions(-)
> > >
> > > diff --git a/arch/loongarch/include/asm/asm-prototypes.h b/arch/loongarch/include/asm/asm-prototypes.h
> > > index 51f224bcfc654228ae423e9a066b25b35102a5b9..0195d4309fd29f94664d5f34247198c769033b1b 100644
> > > --- a/arch/loongarch/include/asm/asm-prototypes.h
> > > +++ b/arch/loongarch/include/asm/asm-prototypes.h
> > > @@ -12,3 +12,8 @@ __int128_t __ashlti3(__int128_t a, int b);
> > > __int128_t __ashrti3(__int128_t a, int b);
> > > __int128_t __lshrti3(__int128_t a, int b);
> > > #endif
> > > +
> > > +asmlinkage void noinstr __no_stack_protector ret_from_kernel_thread(struct task_struct *prev,
> > > + struct pt_regs *regs,
> > > + int (*fn)(void *),
> > > + void *fn_arg);
> > It is a little strange that we only need to declare
> > ret_from_kernel_thread() but not ret_from_fork().
>
> Just an oversight by me, thank you for pointing that out.
Oh I see what I did, I meant to put these functions in asm-prototypes
and not in switch_to but I ended up putting them in both.
- Charlie
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2025-01-24 22:23 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-01-23 19:14 [PATCH v2 0/4] entry: Move ret_from_fork() to C and inline syscall_exit_to_user_mode() Charlie Jenkins
2025-01-23 19:14 ` [PATCH v2 1/4] riscv: entry: Convert ret_from_fork() to C Charlie Jenkins
2025-01-24 13:14 ` Brian Gerst
2025-01-24 18:23 ` Charlie Jenkins
2025-01-23 19:14 ` [PATCH v2 2/4] riscv: entry: Split ret_from_fork() into user and kernel Charlie Jenkins
2025-01-24 7:19 ` Alexandre Ghiti
2025-01-24 7:53 ` Charlie Jenkins
2025-01-24 13:08 ` Brian Gerst
2025-01-24 18:26 ` Charlie Jenkins
2025-01-23 19:14 ` [PATCH v2 3/4] loongarch: entry: Migrate ret_from_fork() to C Charlie Jenkins
2025-01-24 9:05 ` Huacai Chen
2025-01-24 18:28 ` Charlie Jenkins
2025-01-24 22:23 ` Charlie Jenkins
2025-01-23 19:14 ` [PATCH v2 4/4] entry: Inline syscall_exit_to_user_mode() Charlie Jenkins
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).