loongarch.lists.linux.dev archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/4] entry: Move ret_from_fork() to C and inline syscall_exit_to_user_mode()
@ 2025-01-23 19:14 Charlie Jenkins
  2025-01-23 19:14 ` [PATCH v2 1/4] riscv: entry: Convert ret_from_fork() to C Charlie Jenkins
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Charlie Jenkins @ 2025-01-23 19:14 UTC (permalink / raw)
  To: Paul Walmsley, Palmer Dabbelt, Huacai Chen, WANG Xuerui,
	Thomas Gleixner, Peter Zijlstra, Andy Lutomirski, Alexandre Ghiti
  Cc: linux-riscv, linux-kernel, loongarch, Charlie Jenkins

Similar to commit 221a164035fd ("entry: Move
syscall_enter_from_user_mode() to header file"), move
syscall_exit_to_user_mode() to the header file as well.

Testing was done with the byte-unixbench [1] syscall benchmark (which
calls getpid) and QEMU. On riscv I measured a 7.09246% improvement, on
x86 a 2.98843% improvement, on loongarch a 6.07954% improvement, and on
s390 a 11.1328% improvement.

Since this is on QEMU, I know these numbers are not perfect, but they
show a trend of general improvement across all architectures that use
the generic entry code.

[1] https://github.com/kdlucas/byte-unixbench

Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
---
Changes in v2:
- Fixup compilation issues for loongarch
- Fixup compilation issues with CONFIG_CONTEXT_TRACKING_USER
- Link to v1: https://lore.kernel.org/r/20250122-riscv_optimize_entry-v1-0-4ee95559cfd0@rivosinc.com

---
Charlie Jenkins (4):
      riscv: entry: Convert ret_from_fork() to C
      riscv: entry: Split ret_from_fork() into user and kernel
      loongarch: entry: Migrate ret_from_fork() to C
      entry: Inline syscall_exit_to_user_mode()

 arch/loongarch/include/asm/asm-prototypes.h |  5 +++
 arch/loongarch/include/asm/switch_to.h      |  8 +++++
 arch/loongarch/kernel/entry.S               | 22 ++++++-------
 arch/loongarch/kernel/process.c             | 34 ++++++++++++++++----
 arch/riscv/include/asm/asm-prototypes.h     |  2 ++
 arch/riscv/kernel/entry.S                   | 20 +++++++-----
 arch/riscv/kernel/process.c                 | 21 +++++++++++--
 include/linux/entry-common.h                | 43 +++++++++++++++++++++++--
 kernel/entry/common.c                       | 49 +----------------------------
 9 files changed, 125 insertions(+), 79 deletions(-)
---
base-commit: ffd294d346d185b70e28b1a28abe367bbfe53c04
change-id: 20240402-riscv_optimize_entry-583843420325
-- 
- Charlie


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v2 1/4] riscv: entry: Convert ret_from_fork() to C
  2025-01-23 19:14 [PATCH v2 0/4] entry: Move ret_from_fork() to C and inline syscall_exit_to_user_mode() Charlie Jenkins
@ 2025-01-23 19:14 ` Charlie Jenkins
  2025-01-24 13:14   ` Brian Gerst
  2025-01-23 19:14 ` [PATCH v2 2/4] riscv: entry: Split ret_from_fork() into user and kernel Charlie Jenkins
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 14+ messages in thread
From: Charlie Jenkins @ 2025-01-23 19:14 UTC (permalink / raw)
  To: Paul Walmsley, Palmer Dabbelt, Huacai Chen, WANG Xuerui,
	Thomas Gleixner, Peter Zijlstra, Andy Lutomirski, Alexandre Ghiti
  Cc: linux-riscv, linux-kernel, loongarch, Charlie Jenkins

Move the main section of ret_from_fork() to C to allow inlining of
syscall_exit_to_user_mode().

Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
---
 arch/riscv/include/asm/asm-prototypes.h |  1 +
 arch/riscv/kernel/entry.S               | 15 ++++++---------
 arch/riscv/kernel/process.c             | 14 ++++++++++++--
 3 files changed, 19 insertions(+), 11 deletions(-)

diff --git a/arch/riscv/include/asm/asm-prototypes.h b/arch/riscv/include/asm/asm-prototypes.h
index cd627ec289f163a630b73dd03dd52a6b28692997..733ff609778797001006c33bba9e3cc5b1f15387 100644
--- a/arch/riscv/include/asm/asm-prototypes.h
+++ b/arch/riscv/include/asm/asm-prototypes.h
@@ -52,6 +52,7 @@ DECLARE_DO_ERROR_INFO(do_trap_ecall_s);
 DECLARE_DO_ERROR_INFO(do_trap_ecall_m);
 DECLARE_DO_ERROR_INFO(do_trap_break);
 
+asmlinkage void ret_from_fork(void *fn_arg, int (*fn)(void *), struct pt_regs *regs);
 asmlinkage void handle_bad_stack(struct pt_regs *regs);
 asmlinkage void do_page_fault(struct pt_regs *regs);
 asmlinkage void do_irq(struct pt_regs *regs);
diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
index 33a5a9f2a0d4e1eeccfb3621b9e518b88e1b0704..9225c322279aa90e737b1d7144db084319cf8103 100644
--- a/arch/riscv/kernel/entry.S
+++ b/arch/riscv/kernel/entry.S
@@ -319,17 +319,14 @@ SYM_CODE_END(handle_kernel_stack_overflow)
 ASM_NOKPROBE(handle_kernel_stack_overflow)
 #endif
 
-SYM_CODE_START(ret_from_fork)
+SYM_CODE_START(ret_from_fork_asm)
 	call schedule_tail
-	beqz s0, 1f	/* not from kernel thread */
-	/* Call fn(arg) */
-	move a0, s1
-	jalr s0
-1:
-	move a0, sp /* pt_regs */
-	call syscall_exit_to_user_mode
+	move a0, s1 /* fn */
+	move a1, s0 /* fn_arg */
+	move a2, sp /* pt_regs */
+	call ret_from_fork
 	j ret_from_exception
-SYM_CODE_END(ret_from_fork)
+SYM_CODE_END(ret_from_fork_asm)
 
 #ifdef CONFIG_IRQ_STACKS
 /*
diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
index 58b6482c2bf662bf5224ca50c8e21a68760a6b41..0d07e6d8f6b57beba438dbba5e8c74a014582bee 100644
--- a/arch/riscv/kernel/process.c
+++ b/arch/riscv/kernel/process.c
@@ -17,7 +17,9 @@
 #include <linux/ptrace.h>
 #include <linux/uaccess.h>
 #include <linux/personality.h>
+#include <linux/entry-common.h>
 
+#include <asm/asm-prototypes.h>
 #include <asm/unistd.h>
 #include <asm/processor.h>
 #include <asm/csr.h>
@@ -36,7 +38,7 @@ unsigned long __stack_chk_guard __read_mostly;
 EXPORT_SYMBOL(__stack_chk_guard);
 #endif
 
-extern asmlinkage void ret_from_fork(void);
+extern asmlinkage void ret_from_fork_asm(void);
 
 void noinstr arch_cpu_idle(void)
 {
@@ -206,6 +208,14 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
 	return 0;
 }
 
+asmlinkage void ret_from_fork(void *fn_arg, int (*fn)(void *), struct pt_regs *regs)
+{
+	if (unlikely(fn))
+		fn(fn_arg);
+
+	syscall_exit_to_user_mode(regs);
+}
+
 int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
 {
 	unsigned long clone_flags = args->flags;
@@ -242,7 +252,7 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
 	p->thread.riscv_v_flags = 0;
 	if (has_vector())
 		riscv_v_thread_alloc(p);
-	p->thread.ra = (unsigned long)ret_from_fork;
+	p->thread.ra = (unsigned long)ret_from_fork_asm;
 	p->thread.sp = (unsigned long)childregs; /* kernel sp */
 	return 0;
 }

-- 
2.43.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 2/4] riscv: entry: Split ret_from_fork() into user and kernel
  2025-01-23 19:14 [PATCH v2 0/4] entry: Move ret_from_fork() to C and inline syscall_exit_to_user_mode() Charlie Jenkins
  2025-01-23 19:14 ` [PATCH v2 1/4] riscv: entry: Convert ret_from_fork() to C Charlie Jenkins
@ 2025-01-23 19:14 ` Charlie Jenkins
  2025-01-24  7:19   ` Alexandre Ghiti
  2025-01-23 19:14 ` [PATCH v2 3/4] loongarch: entry: Migrate ret_from_fork() to C Charlie Jenkins
  2025-01-23 19:14 ` [PATCH v2 4/4] entry: Inline syscall_exit_to_user_mode() Charlie Jenkins
  3 siblings, 1 reply; 14+ messages in thread
From: Charlie Jenkins @ 2025-01-23 19:14 UTC (permalink / raw)
  To: Paul Walmsley, Palmer Dabbelt, Huacai Chen, WANG Xuerui,
	Thomas Gleixner, Peter Zijlstra, Andy Lutomirski, Alexandre Ghiti
  Cc: linux-riscv, linux-kernel, loongarch, Charlie Jenkins

This function was unified into a single function in commit ab9164dae273
("riscv: entry: Consolidate ret_from_kernel_thread into ret_from_fork").
However that imposed a performance degradation. Partially reverting this
commit to have ret_from_fork() split again results in a 1% increase on
the number of times fork is able to be called per second.

Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
---
 arch/riscv/include/asm/asm-prototypes.h |  3 ++-
 arch/riscv/kernel/entry.S               | 13 ++++++++++---
 arch/riscv/kernel/process.c             | 17 +++++++++++------
 3 files changed, 23 insertions(+), 10 deletions(-)

diff --git a/arch/riscv/include/asm/asm-prototypes.h b/arch/riscv/include/asm/asm-prototypes.h
index 733ff609778797001006c33bba9e3cc5b1f15387..bfc8ea5f9319b19449ec59493b45b926df888832 100644
--- a/arch/riscv/include/asm/asm-prototypes.h
+++ b/arch/riscv/include/asm/asm-prototypes.h
@@ -52,7 +52,8 @@ DECLARE_DO_ERROR_INFO(do_trap_ecall_s);
 DECLARE_DO_ERROR_INFO(do_trap_ecall_m);
 DECLARE_DO_ERROR_INFO(do_trap_break);
 
-asmlinkage void ret_from_fork(void *fn_arg, int (*fn)(void *), struct pt_regs *regs);
+asmlinkage void ret_from_fork_kernel(void *fn_arg, int (*fn)(void *), struct pt_regs *regs);
+asmlinkage void ret_from_fork_user(struct pt_regs *regs);
 asmlinkage void handle_bad_stack(struct pt_regs *regs);
 asmlinkage void do_page_fault(struct pt_regs *regs);
 asmlinkage void do_irq(struct pt_regs *regs);
diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
index 9225c322279aa90e737b1d7144db084319cf8103..9386ef7444267f0b9bf8a0550f4e31deaeb85881 100644
--- a/arch/riscv/kernel/entry.S
+++ b/arch/riscv/kernel/entry.S
@@ -319,14 +319,21 @@ SYM_CODE_END(handle_kernel_stack_overflow)
 ASM_NOKPROBE(handle_kernel_stack_overflow)
 #endif
 
-SYM_CODE_START(ret_from_fork_asm)
+SYM_CODE_START(ret_from_fork_kernel_asm)
 	call schedule_tail
 	move a0, s1 /* fn */
 	move a1, s0 /* fn_arg */
 	move a2, sp /* pt_regs */
-	call ret_from_fork
+	call ret_from_fork_kernel
 	j ret_from_exception
-SYM_CODE_END(ret_from_fork_asm)
+SYM_CODE_END(ret_from_fork_kernel_asm)
+
+SYM_CODE_START(ret_from_fork_user_asm)
+	call schedule_tail
+	move a0, sp /* pt_regs */
+	call ret_from_fork_user
+	j ret_from_exception
+SYM_CODE_END(ret_from_fork_user_asm)
 
 #ifdef CONFIG_IRQ_STACKS
 /*
diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
index 0d07e6d8f6b57beba438dbba5e8c74a014582bee..5f15236cb526bd9fe61636ed372b4b76c94df946 100644
--- a/arch/riscv/kernel/process.c
+++ b/arch/riscv/kernel/process.c
@@ -38,7 +38,8 @@ unsigned long __stack_chk_guard __read_mostly;
 EXPORT_SYMBOL(__stack_chk_guard);
 #endif
 
-extern asmlinkage void ret_from_fork_asm(void);
+extern asmlinkage void ret_from_fork_kernel_asm(void);
+extern asmlinkage void ret_from_fork_user_asm(void);
 
 void noinstr arch_cpu_idle(void)
 {
@@ -208,14 +209,18 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
 	return 0;
 }
 
-asmlinkage void ret_from_fork(void *fn_arg, int (*fn)(void *), struct pt_regs *regs)
+asmlinkage void ret_from_fork_kernel(void *fn_arg, int (*fn)(void *), struct pt_regs *regs)
 {
-	if (unlikely(fn))
-		fn(fn_arg);
+	fn(fn_arg);
 
 	syscall_exit_to_user_mode(regs);
 }
 
+asmlinkage void ret_from_fork_user(struct pt_regs *regs)
+{
+	syscall_exit_to_user_mode(regs);
+}
+
 int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
 {
 	unsigned long clone_flags = args->flags;
@@ -238,6 +243,7 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
 
 		p->thread.s[0] = (unsigned long)args->fn;
 		p->thread.s[1] = (unsigned long)args->fn_arg;
+		p->thread.ra = (unsigned long)ret_from_fork_kernel_asm;
 	} else {
 		*childregs = *(current_pt_regs());
 		/* Turn off status.VS */
@@ -247,12 +253,11 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
 		if (clone_flags & CLONE_SETTLS)
 			childregs->tp = tls;
 		childregs->a0 = 0; /* Return value of fork() */
-		p->thread.s[0] = 0;
+		p->thread.ra = (unsigned long)ret_from_fork_user_asm;
 	}
 	p->thread.riscv_v_flags = 0;
 	if (has_vector())
 		riscv_v_thread_alloc(p);
-	p->thread.ra = (unsigned long)ret_from_fork_asm;
 	p->thread.sp = (unsigned long)childregs; /* kernel sp */
 	return 0;
 }

-- 
2.43.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 3/4] loongarch: entry: Migrate ret_from_fork() to C
  2025-01-23 19:14 [PATCH v2 0/4] entry: Move ret_from_fork() to C and inline syscall_exit_to_user_mode() Charlie Jenkins
  2025-01-23 19:14 ` [PATCH v2 1/4] riscv: entry: Convert ret_from_fork() to C Charlie Jenkins
  2025-01-23 19:14 ` [PATCH v2 2/4] riscv: entry: Split ret_from_fork() into user and kernel Charlie Jenkins
@ 2025-01-23 19:14 ` Charlie Jenkins
  2025-01-24  9:05   ` Huacai Chen
  2025-01-23 19:14 ` [PATCH v2 4/4] entry: Inline syscall_exit_to_user_mode() Charlie Jenkins
  3 siblings, 1 reply; 14+ messages in thread
From: Charlie Jenkins @ 2025-01-23 19:14 UTC (permalink / raw)
  To: Paul Walmsley, Palmer Dabbelt, Huacai Chen, WANG Xuerui,
	Thomas Gleixner, Peter Zijlstra, Andy Lutomirski, Alexandre Ghiti
  Cc: linux-riscv, linux-kernel, loongarch, Charlie Jenkins

Loongarch is the only architecture that calls
syscall_exit_to_user_mode() from asm. Move the call into C so that this
function can be inlined across all architectures.

Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
---
 arch/loongarch/include/asm/asm-prototypes.h |  5 +++++
 arch/loongarch/include/asm/switch_to.h      |  8 +++++++
 arch/loongarch/kernel/entry.S               | 22 +++++++++----------
 arch/loongarch/kernel/process.c             | 34 ++++++++++++++++++++++++-----
 4 files changed, 51 insertions(+), 18 deletions(-)

diff --git a/arch/loongarch/include/asm/asm-prototypes.h b/arch/loongarch/include/asm/asm-prototypes.h
index 51f224bcfc654228ae423e9a066b25b35102a5b9..0195d4309fd29f94664d5f34247198c769033b1b 100644
--- a/arch/loongarch/include/asm/asm-prototypes.h
+++ b/arch/loongarch/include/asm/asm-prototypes.h
@@ -12,3 +12,8 @@ __int128_t __ashlti3(__int128_t a, int b);
 __int128_t __ashrti3(__int128_t a, int b);
 __int128_t __lshrti3(__int128_t a, int b);
 #endif
+
+asmlinkage void noinstr __no_stack_protector ret_from_kernel_thread(struct task_struct *prev,
+								    struct pt_regs *regs,
+								    int (*fn)(void *),
+								    void *fn_arg);
diff --git a/arch/loongarch/include/asm/switch_to.h b/arch/loongarch/include/asm/switch_to.h
index 5b225aff3ba21aa06d0713bc8e73e1b941389630..a1c5576f1fd145670e13038bec6dd390486099ab 100644
--- a/arch/loongarch/include/asm/switch_to.h
+++ b/arch/loongarch/include/asm/switch_to.h
@@ -26,6 +26,14 @@ extern asmlinkage struct task_struct *__switch_to(struct task_struct *prev,
 			struct task_struct *next, struct thread_info *next_ti,
 			void *sched_ra, void *sched_cfa);
 
+void noinstr __no_stack_protector ret_from_kernel_thread(struct task_struct *prev,
+							 struct pt_regs *regs,
+							 int (*fn)(void *),
+							 void *fn_arg);
+
+void noinstr __no_stack_protector ret_from_fork(struct task_struct *prev,
+						struct pt_regs *regs);
+
 /*
  * For newly created kernel threads switch_to() will return to
  * ret_from_kernel_thread, newly created user threads to ret_from_fork.
diff --git a/arch/loongarch/kernel/entry.S b/arch/loongarch/kernel/entry.S
index 48e7e34e355e83eae8165957ba2eac05a8bf17df..2abc29e573810e000f2fef4646ddca0dbb80eabe 100644
--- a/arch/loongarch/kernel/entry.S
+++ b/arch/loongarch/kernel/entry.S
@@ -77,24 +77,22 @@ SYM_CODE_START(handle_syscall)
 SYM_CODE_END(handle_syscall)
 _ASM_NOKPROBE(handle_syscall)
 
-SYM_CODE_START(ret_from_fork)
+SYM_CODE_START(ret_from_fork_asm)
 	UNWIND_HINT_REGS
-	bl		schedule_tail		# a0 = struct task_struct *prev
-	move		a0, sp
-	bl 		syscall_exit_to_user_mode
+	move		a1, sp
+	bl 		ret_from_fork
 	RESTORE_STATIC
 	RESTORE_SOME
 	RESTORE_SP_AND_RET
-SYM_CODE_END(ret_from_fork)
+SYM_CODE_END(ret_from_fork_asm)
 
-SYM_CODE_START(ret_from_kernel_thread)
+SYM_CODE_START(ret_from_kernel_thread_asm)
 	UNWIND_HINT_REGS
-	bl		schedule_tail		# a0 = struct task_struct *prev
-	move		a0, s1
-	jirl		ra, s0, 0
-	move		a0, sp
-	bl		syscall_exit_to_user_mode
+	move		a1, sp
+	move		a2, s0
+	move		a3, s1
+	bl		ret_from_kernel_thread
 	RESTORE_STATIC
 	RESTORE_SOME
 	RESTORE_SP_AND_RET
-SYM_CODE_END(ret_from_kernel_thread)
+SYM_CODE_END(ret_from_kernel_thread_asm)
diff --git a/arch/loongarch/kernel/process.c b/arch/loongarch/kernel/process.c
index 6e58f65455c7ca3eae2e88ed852c8655a6701e5c..16cc949fe43443d70f1d865ce04595c2d8c1615b 100644
--- a/arch/loongarch/kernel/process.c
+++ b/arch/loongarch/kernel/process.c
@@ -14,6 +14,7 @@
 #include <linux/init.h>
 #include <linux/kernel.h>
 #include <linux/errno.h>
+#include <linux/entry-common.h>
 #include <linux/sched.h>
 #include <linux/sched/debug.h>
 #include <linux/sched/task.h>
@@ -33,6 +34,7 @@
 #include <linux/prctl.h>
 #include <linux/nmi.h>
 
+#include <asm/asm-prototypes.h>
 #include <asm/asm.h>
 #include <asm/bootinfo.h>
 #include <asm/cpu.h>
@@ -47,6 +49,7 @@
 #include <asm/pgtable.h>
 #include <asm/processor.h>
 #include <asm/reg.h>
+#include <asm/switch_to.h>
 #include <asm/unwind.h>
 #include <asm/vdso.h>
 
@@ -63,8 +66,9 @@ EXPORT_SYMBOL(__stack_chk_guard);
 unsigned long boot_option_idle_override = IDLE_NO_OVERRIDE;
 EXPORT_SYMBOL(boot_option_idle_override);
 
-asmlinkage void ret_from_fork(void);
-asmlinkage void ret_from_kernel_thread(void);
+asmlinkage void restore_and_ret(void);
+asmlinkage void ret_from_fork_asm(void);
+asmlinkage void ret_from_kernel_thread_asm(void);
 
 void start_thread(struct pt_regs *regs, unsigned long pc, unsigned long sp)
 {
@@ -138,6 +142,24 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
 	return 0;
 }
 
+asmlinkage void noinstr __no_stack_protector ret_from_kernel_thread(struct task_struct *prev,
+								    struct pt_regs *regs,
+								    int (*fn)(void *),
+								    void *fn_arg)
+{
+	schedule_tail(prev);
+
+	fn(fn_arg);
+
+	syscall_exit_to_user_mode(regs);
+}
+
+void noinstr __no_stack_protector ret_from_fork(struct task_struct *prev, struct pt_regs *regs)
+{
+	schedule_tail(prev);
+	syscall_exit_to_user_mode(regs);
+}
+
 /*
  * Copy architecture-specific thread state
  */
@@ -165,8 +187,8 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
 		p->thread.reg03 = childksp;
 		p->thread.reg23 = (unsigned long)args->fn;
 		p->thread.reg24 = (unsigned long)args->fn_arg;
-		p->thread.reg01 = (unsigned long)ret_from_kernel_thread;
-		p->thread.sched_ra = (unsigned long)ret_from_kernel_thread;
+		p->thread.reg01 = (unsigned long)ret_from_kernel_thread_asm;
+		p->thread.sched_ra = (unsigned long)ret_from_kernel_thread_asm;
 		memset(childregs, 0, sizeof(struct pt_regs));
 		childregs->csr_euen = p->thread.csr_euen;
 		childregs->csr_crmd = p->thread.csr_crmd;
@@ -182,8 +204,8 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
 		childregs->regs[3] = usp;
 
 	p->thread.reg03 = (unsigned long) childregs;
-	p->thread.reg01 = (unsigned long) ret_from_fork;
-	p->thread.sched_ra = (unsigned long) ret_from_fork;
+	p->thread.reg01 = (unsigned long) ret_from_fork_asm;
+	p->thread.sched_ra = (unsigned long) ret_from_fork_asm;
 
 	/*
 	 * New tasks lose permission to use the fpu. This accelerates context

-- 
2.43.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 4/4] entry: Inline syscall_exit_to_user_mode()
  2025-01-23 19:14 [PATCH v2 0/4] entry: Move ret_from_fork() to C and inline syscall_exit_to_user_mode() Charlie Jenkins
                   ` (2 preceding siblings ...)
  2025-01-23 19:14 ` [PATCH v2 3/4] loongarch: entry: Migrate ret_from_fork() to C Charlie Jenkins
@ 2025-01-23 19:14 ` Charlie Jenkins
  3 siblings, 0 replies; 14+ messages in thread
From: Charlie Jenkins @ 2025-01-23 19:14 UTC (permalink / raw)
  To: Paul Walmsley, Palmer Dabbelt, Huacai Chen, WANG Xuerui,
	Thomas Gleixner, Peter Zijlstra, Andy Lutomirski, Alexandre Ghiti
  Cc: linux-riscv, linux-kernel, loongarch, Charlie Jenkins

Architectures using the generic entry code can be optimized by having
syscall_exit_to_user_mode inlined.

Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
---
 include/linux/entry-common.h | 43 ++++++++++++++++++++++++++++++++++++--
 kernel/entry/common.c        | 49 +-------------------------------------------
 2 files changed, 42 insertions(+), 50 deletions(-)

diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h
index fc61d0205c97084acc89c8e45e088946f5e6d9b2..ee1c400bc0eb0ebb5850f95e856b819fca7b3577 100644
--- a/include/linux/entry-common.h
+++ b/include/linux/entry-common.h
@@ -14,6 +14,7 @@
 #include <linux/kmsan.h>
 
 #include <asm/entry-common.h>
+#include <asm/syscall.h>
 
 /*
  * Define dummy _TIF work flags if not defined by the architecture or for
@@ -366,6 +367,15 @@ static __always_inline void exit_to_user_mode(void)
 	lockdep_hardirqs_on(CALLER_ADDR0);
 }
 
+/**
+ * syscall_exit_work - Handle work before returning to user mode
+ * @regs:	Pointer to current pt_regs
+ * @work:	Current thread syscall work
+ *
+ * Do one-time syscall specific work.
+ */
+void syscall_exit_work(struct pt_regs *regs, unsigned long work);
+
 /**
  * syscall_exit_to_user_mode_work - Handle work before returning to user mode
  * @regs:	Pointer to currents pt_regs
@@ -379,7 +389,30 @@ static __always_inline void exit_to_user_mode(void)
  * make the final state transitions. Interrupts must stay disabled between
  * return from this function and the invocation of exit_to_user_mode().
  */
-void syscall_exit_to_user_mode_work(struct pt_regs *regs);
+static __always_inline void syscall_exit_to_user_mode_work(struct pt_regs *regs)
+{
+	unsigned long work = READ_ONCE(current_thread_info()->syscall_work);
+	unsigned long nr = syscall_get_nr(current, regs);
+
+	CT_WARN_ON(ct_state() != PERF_CONTEXT_KERNEL);
+
+	if (IS_ENABLED(CONFIG_PROVE_LOCKING)) {
+		if (WARN(irqs_disabled(), "syscall %lu left IRQs disabled", nr))
+			local_irq_enable();
+	}
+
+	rseq_syscall(regs);
+
+	/*
+	 * Do one-time syscall specific work. If these work items are
+	 * enabled, we want to run them exactly once per syscall exit with
+	 * interrupts enabled.
+	 */
+	if (unlikely(work & SYSCALL_WORK_EXIT))
+		syscall_exit_work(regs, work);
+	local_irq_disable_exit_to_user();
+	exit_to_user_mode_prepare(regs);
+}
 
 /**
  * syscall_exit_to_user_mode - Handle work before returning to user mode
@@ -410,7 +443,13 @@ void syscall_exit_to_user_mode_work(struct pt_regs *regs);
  * exit_to_user_mode(). This function is preferred unless there is a
  * compelling architectural reason to use the separate functions.
  */
-void syscall_exit_to_user_mode(struct pt_regs *regs);
+static __always_inline void syscall_exit_to_user_mode(struct pt_regs *regs)
+{
+	instrumentation_begin();
+	syscall_exit_to_user_mode_work(regs);
+	instrumentation_end();
+	exit_to_user_mode();
+}
 
 /**
  * irqentry_enter_from_user_mode - Establish state before invoking the irq handler
diff --git a/kernel/entry/common.c b/kernel/entry/common.c
index e33691d5adf7aab4af54cf2bf8e5ef5bd6ad1424..f55e421fb196dd5f9d4e34dd85ae096c774cf879 100644
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -146,7 +146,7 @@ static inline bool report_single_step(unsigned long work)
 	return work & SYSCALL_WORK_SYSCALL_EXIT_TRAP;
 }
 
-static void syscall_exit_work(struct pt_regs *regs, unsigned long work)
+void syscall_exit_work(struct pt_regs *regs, unsigned long work)
 {
 	bool step;
 
@@ -173,53 +173,6 @@ static void syscall_exit_work(struct pt_regs *regs, unsigned long work)
 		ptrace_report_syscall_exit(regs, step);
 }
 
-/*
- * Syscall specific exit to user mode preparation. Runs with interrupts
- * enabled.
- */
-static void syscall_exit_to_user_mode_prepare(struct pt_regs *regs)
-{
-	unsigned long work = READ_ONCE(current_thread_info()->syscall_work);
-	unsigned long nr = syscall_get_nr(current, regs);
-
-	CT_WARN_ON(ct_state() != CT_STATE_KERNEL);
-
-	if (IS_ENABLED(CONFIG_PROVE_LOCKING)) {
-		if (WARN(irqs_disabled(), "syscall %lu left IRQs disabled", nr))
-			local_irq_enable();
-	}
-
-	rseq_syscall(regs);
-
-	/*
-	 * Do one-time syscall specific work. If these work items are
-	 * enabled, we want to run them exactly once per syscall exit with
-	 * interrupts enabled.
-	 */
-	if (unlikely(work & SYSCALL_WORK_EXIT))
-		syscall_exit_work(regs, work);
-}
-
-static __always_inline void __syscall_exit_to_user_mode_work(struct pt_regs *regs)
-{
-	syscall_exit_to_user_mode_prepare(regs);
-	local_irq_disable_exit_to_user();
-	exit_to_user_mode_prepare(regs);
-}
-
-void syscall_exit_to_user_mode_work(struct pt_regs *regs)
-{
-	__syscall_exit_to_user_mode_work(regs);
-}
-
-__visible noinstr void syscall_exit_to_user_mode(struct pt_regs *regs)
-{
-	instrumentation_begin();
-	__syscall_exit_to_user_mode_work(regs);
-	instrumentation_end();
-	exit_to_user_mode();
-}
-
 noinstr void irqentry_enter_from_user_mode(struct pt_regs *regs)
 {
 	enter_from_user_mode(regs);

-- 
2.43.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 2/4] riscv: entry: Split ret_from_fork() into user and kernel
  2025-01-23 19:14 ` [PATCH v2 2/4] riscv: entry: Split ret_from_fork() into user and kernel Charlie Jenkins
@ 2025-01-24  7:19   ` Alexandre Ghiti
  2025-01-24  7:53     ` Charlie Jenkins
  0 siblings, 1 reply; 14+ messages in thread
From: Alexandre Ghiti @ 2025-01-24  7:19 UTC (permalink / raw)
  To: Charlie Jenkins, Paul Walmsley, Palmer Dabbelt, Huacai Chen,
	WANG Xuerui, Thomas Gleixner, Peter Zijlstra, Andy Lutomirski,
	Alexandre Ghiti
  Cc: linux-riscv, linux-kernel, loongarch

Hi Charlie,

On 23/01/2025 20:14, Charlie Jenkins wrote:
> This function was unified into a single function in commit ab9164dae273
> ("riscv: entry: Consolidate ret_from_kernel_thread into ret_from_fork").
> However that imposed a performance degradation. Partially reverting this
> commit to have ret_from_fork() split again results in a 1% increase on
> the number of times fork is able to be called per second.
>
> Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
> ---
>   arch/riscv/include/asm/asm-prototypes.h |  3 ++-
>   arch/riscv/kernel/entry.S               | 13 ++++++++++---
>   arch/riscv/kernel/process.c             | 17 +++++++++++------
>   3 files changed, 23 insertions(+), 10 deletions(-)
>
> diff --git a/arch/riscv/include/asm/asm-prototypes.h b/arch/riscv/include/asm/asm-prototypes.h
> index 733ff609778797001006c33bba9e3cc5b1f15387..bfc8ea5f9319b19449ec59493b45b926df888832 100644
> --- a/arch/riscv/include/asm/asm-prototypes.h
> +++ b/arch/riscv/include/asm/asm-prototypes.h
> @@ -52,7 +52,8 @@ DECLARE_DO_ERROR_INFO(do_trap_ecall_s);
>   DECLARE_DO_ERROR_INFO(do_trap_ecall_m);
>   DECLARE_DO_ERROR_INFO(do_trap_break);
>   
> -asmlinkage void ret_from_fork(void *fn_arg, int (*fn)(void *), struct pt_regs *regs);
> +asmlinkage void ret_from_fork_kernel(void *fn_arg, int (*fn)(void *), struct pt_regs *regs);
> +asmlinkage void ret_from_fork_user(struct pt_regs *regs);
>   asmlinkage void handle_bad_stack(struct pt_regs *regs);
>   asmlinkage void do_page_fault(struct pt_regs *regs);
>   asmlinkage void do_irq(struct pt_regs *regs);
> diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
> index 9225c322279aa90e737b1d7144db084319cf8103..9386ef7444267f0b9bf8a0550f4e31deaeb85881 100644
> --- a/arch/riscv/kernel/entry.S
> +++ b/arch/riscv/kernel/entry.S
> @@ -319,14 +319,21 @@ SYM_CODE_END(handle_kernel_stack_overflow)
>   ASM_NOKPROBE(handle_kernel_stack_overflow)
>   #endif
>   
> -SYM_CODE_START(ret_from_fork_asm)
> +SYM_CODE_START(ret_from_fork_kernel_asm)
>   	call schedule_tail
>   	move a0, s1 /* fn */
>   	move a1, s0 /* fn_arg */
>   	move a2, sp /* pt_regs */
> -	call ret_from_fork
> +	call ret_from_fork_kernel
>   	j ret_from_exception
> -SYM_CODE_END(ret_from_fork_asm)
> +SYM_CODE_END(ret_from_fork_kernel_asm)
> +
> +SYM_CODE_START(ret_from_fork_user_asm)
> +	call schedule_tail
> +	move a0, sp /* pt_regs */
> +	call ret_from_fork_user
> +	j ret_from_exception
> +SYM_CODE_END(ret_from_fork_user_asm)
>   
>   #ifdef CONFIG_IRQ_STACKS
>   /*
> diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
> index 0d07e6d8f6b57beba438dbba5e8c74a014582bee..5f15236cb526bd9fe61636ed372b4b76c94df946 100644
> --- a/arch/riscv/kernel/process.c
> +++ b/arch/riscv/kernel/process.c
> @@ -38,7 +38,8 @@ unsigned long __stack_chk_guard __read_mostly;
>   EXPORT_SYMBOL(__stack_chk_guard);
>   #endif
>   
> -extern asmlinkage void ret_from_fork_asm(void);
> +extern asmlinkage void ret_from_fork_kernel_asm(void);
> +extern asmlinkage void ret_from_fork_user_asm(void);
>   
>   void noinstr arch_cpu_idle(void)
>   {
> @@ -208,14 +209,18 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
>   	return 0;
>   }
>   
> -asmlinkage void ret_from_fork(void *fn_arg, int (*fn)(void *), struct pt_regs *regs)
> +asmlinkage void ret_from_fork_kernel(void *fn_arg, int (*fn)(void *), struct pt_regs *regs)
>   {
> -	if (unlikely(fn))
> -		fn(fn_arg);
> +	fn(fn_arg);
>   
>   	syscall_exit_to_user_mode(regs);
>   }
>   
> +asmlinkage void ret_from_fork_user(struct pt_regs *regs)
> +{
> +	syscall_exit_to_user_mode(regs);
> +}
> +
>   int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
>   {
>   	unsigned long clone_flags = args->flags;
> @@ -238,6 +243,7 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
>   
>   		p->thread.s[0] = (unsigned long)args->fn;
>   		p->thread.s[1] = (unsigned long)args->fn_arg;
> +		p->thread.ra = (unsigned long)ret_from_fork_kernel_asm;
>   	} else {
>   		*childregs = *(current_pt_regs());
>   		/* Turn off status.VS */
> @@ -247,12 +253,11 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
>   		if (clone_flags & CLONE_SETTLS)
>   			childregs->tp = tls;
>   		childregs->a0 = 0; /* Return value of fork() */
> -		p->thread.s[0] = 0;
> +		p->thread.ra = (unsigned long)ret_from_fork_user_asm;
>   	}
>   	p->thread.riscv_v_flags = 0;
>   	if (has_vector())
>   		riscv_v_thread_alloc(p);
> -	p->thread.ra = (unsigned long)ret_from_fork_asm;
>   	p->thread.sp = (unsigned long)childregs; /* kernel sp */
>   	return 0;
>   }
>

Can you benchmark this change on some HW? I'm not sure we would indeed 
gain this 1%.

Thanks,

Alex


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 2/4] riscv: entry: Split ret_from_fork() into user and kernel
  2025-01-24  7:19   ` Alexandre Ghiti
@ 2025-01-24  7:53     ` Charlie Jenkins
  2025-01-24 13:08       ` Brian Gerst
  0 siblings, 1 reply; 14+ messages in thread
From: Charlie Jenkins @ 2025-01-24  7:53 UTC (permalink / raw)
  To: Alexandre Ghiti
  Cc: Paul Walmsley, Palmer Dabbelt, Huacai Chen, WANG Xuerui,
	Thomas Gleixner, Peter Zijlstra, Andy Lutomirski, Alexandre Ghiti,
	linux-riscv, linux-kernel, loongarch

On Fri, Jan 24, 2025 at 08:19:18AM +0100, Alexandre Ghiti wrote:
> Hi Charlie,
> 
> On 23/01/2025 20:14, Charlie Jenkins wrote:
> > This function was unified into a single function in commit ab9164dae273
> > ("riscv: entry: Consolidate ret_from_kernel_thread into ret_from_fork").
> > However that imposed a performance degradation. Partially reverting this
> > commit to have ret_from_fork() split again results in a 1% increase on
> > the number of times fork is able to be called per second.
> > 
> > Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
> > ---
> >   arch/riscv/include/asm/asm-prototypes.h |  3 ++-
> >   arch/riscv/kernel/entry.S               | 13 ++++++++++---
> >   arch/riscv/kernel/process.c             | 17 +++++++++++------
> >   3 files changed, 23 insertions(+), 10 deletions(-)
> > 
> > diff --git a/arch/riscv/include/asm/asm-prototypes.h b/arch/riscv/include/asm/asm-prototypes.h
> > index 733ff609778797001006c33bba9e3cc5b1f15387..bfc8ea5f9319b19449ec59493b45b926df888832 100644
> > --- a/arch/riscv/include/asm/asm-prototypes.h
> > +++ b/arch/riscv/include/asm/asm-prototypes.h
> > @@ -52,7 +52,8 @@ DECLARE_DO_ERROR_INFO(do_trap_ecall_s);
> >   DECLARE_DO_ERROR_INFO(do_trap_ecall_m);
> >   DECLARE_DO_ERROR_INFO(do_trap_break);
> > -asmlinkage void ret_from_fork(void *fn_arg, int (*fn)(void *), struct pt_regs *regs);
> > +asmlinkage void ret_from_fork_kernel(void *fn_arg, int (*fn)(void *), struct pt_regs *regs);
> > +asmlinkage void ret_from_fork_user(struct pt_regs *regs);
> >   asmlinkage void handle_bad_stack(struct pt_regs *regs);
> >   asmlinkage void do_page_fault(struct pt_regs *regs);
> >   asmlinkage void do_irq(struct pt_regs *regs);
> > diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
> > index 9225c322279aa90e737b1d7144db084319cf8103..9386ef7444267f0b9bf8a0550f4e31deaeb85881 100644
> > --- a/arch/riscv/kernel/entry.S
> > +++ b/arch/riscv/kernel/entry.S
> > @@ -319,14 +319,21 @@ SYM_CODE_END(handle_kernel_stack_overflow)
> >   ASM_NOKPROBE(handle_kernel_stack_overflow)
> >   #endif
> > -SYM_CODE_START(ret_from_fork_asm)
> > +SYM_CODE_START(ret_from_fork_kernel_asm)
> >   	call schedule_tail
> >   	move a0, s1 /* fn */
> >   	move a1, s0 /* fn_arg */
> >   	move a2, sp /* pt_regs */
> > -	call ret_from_fork
> > +	call ret_from_fork_kernel
> >   	j ret_from_exception
> > -SYM_CODE_END(ret_from_fork_asm)
> > +SYM_CODE_END(ret_from_fork_kernel_asm)
> > +
> > +SYM_CODE_START(ret_from_fork_user_asm)
> > +	call schedule_tail
> > +	move a0, sp /* pt_regs */
> > +	call ret_from_fork_user
> > +	j ret_from_exception
> > +SYM_CODE_END(ret_from_fork_user_asm)
> >   #ifdef CONFIG_IRQ_STACKS
> >   /*
> > diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
> > index 0d07e6d8f6b57beba438dbba5e8c74a014582bee..5f15236cb526bd9fe61636ed372b4b76c94df946 100644
> > --- a/arch/riscv/kernel/process.c
> > +++ b/arch/riscv/kernel/process.c
> > @@ -38,7 +38,8 @@ unsigned long __stack_chk_guard __read_mostly;
> >   EXPORT_SYMBOL(__stack_chk_guard);
> >   #endif
> > -extern asmlinkage void ret_from_fork_asm(void);
> > +extern asmlinkage void ret_from_fork_kernel_asm(void);
> > +extern asmlinkage void ret_from_fork_user_asm(void);
> >   void noinstr arch_cpu_idle(void)
> >   {
> > @@ -208,14 +209,18 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
> >   	return 0;
> >   }
> > -asmlinkage void ret_from_fork(void *fn_arg, int (*fn)(void *), struct pt_regs *regs)
> > +asmlinkage void ret_from_fork_kernel(void *fn_arg, int (*fn)(void *), struct pt_regs *regs)
> >   {
> > -	if (unlikely(fn))
> > -		fn(fn_arg);
> > +	fn(fn_arg);
> >   	syscall_exit_to_user_mode(regs);
> >   }
> > +asmlinkage void ret_from_fork_user(struct pt_regs *regs)
> > +{
> > +	syscall_exit_to_user_mode(regs);
> > +}
> > +
> >   int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
> >   {
> >   	unsigned long clone_flags = args->flags;
> > @@ -238,6 +243,7 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
> >   		p->thread.s[0] = (unsigned long)args->fn;
> >   		p->thread.s[1] = (unsigned long)args->fn_arg;
> > +		p->thread.ra = (unsigned long)ret_from_fork_kernel_asm;
> >   	} else {
> >   		*childregs = *(current_pt_regs());
> >   		/* Turn off status.VS */
> > @@ -247,12 +253,11 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
> >   		if (clone_flags & CLONE_SETTLS)
> >   			childregs->tp = tls;
> >   		childregs->a0 = 0; /* Return value of fork() */
> > -		p->thread.s[0] = 0;
> > +		p->thread.ra = (unsigned long)ret_from_fork_user_asm;
> >   	}
> >   	p->thread.riscv_v_flags = 0;
> >   	if (has_vector())
> >   		riscv_v_thread_alloc(p);
> > -	p->thread.ra = (unsigned long)ret_from_fork_asm;
> >   	p->thread.sp = (unsigned long)childregs; /* kernel sp */
> >   	return 0;
> >   }
> > 
> 
> Can you benchmark this change on some HW? I'm not sure we would indeed gain
> this 1%.

It reduces the syscall path by 3 instructions, two for not needing to
move the fn and fn_args from:

move a0, s1 /* fn */
move a1, s0 /* fn_arg */

And one for not needing to do the conditional. This one is also saved on
kernel threads.

It's a very small improvement, but there is only something like 100
instructions along the direct syscall path so it ends up being a large
percentage. On hardware moving registers is very cheap and this branch
will be almost always be correctly predicted so the cost is close to
zero. I just figured that since I am making changes around here it would
be nice if it was optimal instead of being close to optimal.

- Charlie

> 
> Thanks,
> 
> Alex
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 3/4] loongarch: entry: Migrate ret_from_fork() to C
  2025-01-23 19:14 ` [PATCH v2 3/4] loongarch: entry: Migrate ret_from_fork() to C Charlie Jenkins
@ 2025-01-24  9:05   ` Huacai Chen
  2025-01-24 18:28     ` Charlie Jenkins
  0 siblings, 1 reply; 14+ messages in thread
From: Huacai Chen @ 2025-01-24  9:05 UTC (permalink / raw)
  To: Charlie Jenkins
  Cc: Paul Walmsley, Palmer Dabbelt, WANG Xuerui, Thomas Gleixner,
	Peter Zijlstra, Andy Lutomirski, Alexandre Ghiti, linux-riscv,
	linux-kernel, loongarch

Hi, Charlie,

On Fri, Jan 24, 2025 at 3:15 AM Charlie Jenkins <charlie@rivosinc.com> wrote:
>
> Loongarch is the only architecture that calls
We usually use "LoongArch" instead of "loongarch" or "Loongarch".

> syscall_exit_to_user_mode() from asm. Move the call into C so that this
> function can be inlined across all architectures.
>
> Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
> ---
>  arch/loongarch/include/asm/asm-prototypes.h |  5 +++++
>  arch/loongarch/include/asm/switch_to.h      |  8 +++++++
>  arch/loongarch/kernel/entry.S               | 22 +++++++++----------
>  arch/loongarch/kernel/process.c             | 34 ++++++++++++++++++++++++-----
>  4 files changed, 51 insertions(+), 18 deletions(-)
>
> diff --git a/arch/loongarch/include/asm/asm-prototypes.h b/arch/loongarch/include/asm/asm-prototypes.h
> index 51f224bcfc654228ae423e9a066b25b35102a5b9..0195d4309fd29f94664d5f34247198c769033b1b 100644
> --- a/arch/loongarch/include/asm/asm-prototypes.h
> +++ b/arch/loongarch/include/asm/asm-prototypes.h
> @@ -12,3 +12,8 @@ __int128_t __ashlti3(__int128_t a, int b);
>  __int128_t __ashrti3(__int128_t a, int b);
>  __int128_t __lshrti3(__int128_t a, int b);
>  #endif
> +
> +asmlinkage void noinstr __no_stack_protector ret_from_kernel_thread(struct task_struct *prev,
> +                                                                   struct pt_regs *regs,
> +                                                                   int (*fn)(void *),
> +                                                                   void *fn_arg);
It is a little strange that we only need to declare
ret_from_kernel_thread() but not ret_from_fork().

> diff --git a/arch/loongarch/include/asm/switch_to.h b/arch/loongarch/include/asm/switch_to.h
> index 5b225aff3ba21aa06d0713bc8e73e1b941389630..a1c5576f1fd145670e13038bec6dd390486099ab 100644
> --- a/arch/loongarch/include/asm/switch_to.h
> +++ b/arch/loongarch/include/asm/switch_to.h
> @@ -26,6 +26,14 @@ extern asmlinkage struct task_struct *__switch_to(struct task_struct *prev,
>                         struct task_struct *next, struct thread_info *next_ti,
>                         void *sched_ra, void *sched_cfa);
>
> +void noinstr __no_stack_protector ret_from_kernel_thread(struct task_struct *prev,
> +                                                        struct pt_regs *regs,
> +                                                        int (*fn)(void *),
> +                                                        void *fn_arg);
> +
> +void noinstr __no_stack_protector ret_from_fork(struct task_struct *prev,
> +                                               struct pt_regs *regs);
> +
I prefer alpha-betical order, which means put ret_from_fork() before
ret_from_kernel_thread().

>  /*
>   * For newly created kernel threads switch_to() will return to
>   * ret_from_kernel_thread, newly created user threads to ret_from_fork.
> diff --git a/arch/loongarch/kernel/entry.S b/arch/loongarch/kernel/entry.S
> index 48e7e34e355e83eae8165957ba2eac05a8bf17df..2abc29e573810e000f2fef4646ddca0dbb80eabe 100644
> --- a/arch/loongarch/kernel/entry.S
> +++ b/arch/loongarch/kernel/entry.S
> @@ -77,24 +77,22 @@ SYM_CODE_START(handle_syscall)
>  SYM_CODE_END(handle_syscall)
>  _ASM_NOKPROBE(handle_syscall)
>
> -SYM_CODE_START(ret_from_fork)
> +SYM_CODE_START(ret_from_fork_asm)
>         UNWIND_HINT_REGS
> -       bl              schedule_tail           # a0 = struct task_struct *prev
> -       move            a0, sp
> -       bl              syscall_exit_to_user_mode
> +       move            a1, sp
> +       bl              ret_from_fork
>         RESTORE_STATIC
>         RESTORE_SOME
>         RESTORE_SP_AND_RET
> -SYM_CODE_END(ret_from_fork)
> +SYM_CODE_END(ret_from_fork_asm)
>
> -SYM_CODE_START(ret_from_kernel_thread)
> +SYM_CODE_START(ret_from_kernel_thread_asm)
>         UNWIND_HINT_REGS
> -       bl              schedule_tail           # a0 = struct task_struct *prev
> -       move            a0, s1
> -       jirl            ra, s0, 0
> -       move            a0, sp
> -       bl              syscall_exit_to_user_mode
> +       move            a1, sp
> +       move            a2, s0
> +       move            a3, s1
> +       bl              ret_from_kernel_thread
>         RESTORE_STATIC
>         RESTORE_SOME
>         RESTORE_SP_AND_RET
> -SYM_CODE_END(ret_from_kernel_thread)
> +SYM_CODE_END(ret_from_kernel_thread_asm)
> diff --git a/arch/loongarch/kernel/process.c b/arch/loongarch/kernel/process.c
> index 6e58f65455c7ca3eae2e88ed852c8655a6701e5c..16cc949fe43443d70f1d865ce04595c2d8c1615b 100644
> --- a/arch/loongarch/kernel/process.c
> +++ b/arch/loongarch/kernel/process.c
> @@ -14,6 +14,7 @@
>  #include <linux/init.h>
>  #include <linux/kernel.h>
>  #include <linux/errno.h>
> +#include <linux/entry-common.h>
>  #include <linux/sched.h>
>  #include <linux/sched/debug.h>
>  #include <linux/sched/task.h>
> @@ -33,6 +34,7 @@
>  #include <linux/prctl.h>
>  #include <linux/nmi.h>
>
> +#include <asm/asm-prototypes.h>
>  #include <asm/asm.h>
>  #include <asm/bootinfo.h>
>  #include <asm/cpu.h>
> @@ -47,6 +49,7 @@
>  #include <asm/pgtable.h>
>  #include <asm/processor.h>
>  #include <asm/reg.h>
> +#include <asm/switch_to.h>
>  #include <asm/unwind.h>
>  #include <asm/vdso.h>
>
> @@ -63,8 +66,9 @@ EXPORT_SYMBOL(__stack_chk_guard);
>  unsigned long boot_option_idle_override = IDLE_NO_OVERRIDE;
>  EXPORT_SYMBOL(boot_option_idle_override);
>
> -asmlinkage void ret_from_fork(void);
> -asmlinkage void ret_from_kernel_thread(void);
> +asmlinkage void restore_and_ret(void);
> +asmlinkage void ret_from_fork_asm(void);
> +asmlinkage void ret_from_kernel_thread_asm(void);
>
>  void start_thread(struct pt_regs *regs, unsigned long pc, unsigned long sp)
>  {
> @@ -138,6 +142,24 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
>         return 0;
>  }
>
> +asmlinkage void noinstr __no_stack_protector ret_from_kernel_thread(struct task_struct *prev,
> +                                                                   struct pt_regs *regs,
> +                                                                   int (*fn)(void *),
> +                                                                   void *fn_arg)
> +{
> +       schedule_tail(prev);
> +
> +       fn(fn_arg);
> +
The two blank lines can be removed, and again, I prefer alpha-betical
order, which means put ret_from_fork() before
ret_from_kernel_thread().

Huacai

> +       syscall_exit_to_user_mode(regs);
> +}
> +
> +void noinstr __no_stack_protector ret_from_fork(struct task_struct *prev, struct pt_regs *regs)
> +{
> +       schedule_tail(prev);
> +       syscall_exit_to_user_mode(regs);
> +}
> +
>  /*
>   * Copy architecture-specific thread state
>   */
> @@ -165,8 +187,8 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
>                 p->thread.reg03 = childksp;
>                 p->thread.reg23 = (unsigned long)args->fn;
>                 p->thread.reg24 = (unsigned long)args->fn_arg;
> -               p->thread.reg01 = (unsigned long)ret_from_kernel_thread;
> -               p->thread.sched_ra = (unsigned long)ret_from_kernel_thread;
> +               p->thread.reg01 = (unsigned long)ret_from_kernel_thread_asm;
> +               p->thread.sched_ra = (unsigned long)ret_from_kernel_thread_asm;
>                 memset(childregs, 0, sizeof(struct pt_regs));
>                 childregs->csr_euen = p->thread.csr_euen;
>                 childregs->csr_crmd = p->thread.csr_crmd;
> @@ -182,8 +204,8 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
>                 childregs->regs[3] = usp;
>
>         p->thread.reg03 = (unsigned long) childregs;
> -       p->thread.reg01 = (unsigned long) ret_from_fork;
> -       p->thread.sched_ra = (unsigned long) ret_from_fork;
> +       p->thread.reg01 = (unsigned long) ret_from_fork_asm;
> +       p->thread.sched_ra = (unsigned long) ret_from_fork_asm;
>
>         /*
>          * New tasks lose permission to use the fpu. This accelerates context
>
> --
> 2.43.0
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 2/4] riscv: entry: Split ret_from_fork() into user and kernel
  2025-01-24  7:53     ` Charlie Jenkins
@ 2025-01-24 13:08       ` Brian Gerst
  2025-01-24 18:26         ` Charlie Jenkins
  0 siblings, 1 reply; 14+ messages in thread
From: Brian Gerst @ 2025-01-24 13:08 UTC (permalink / raw)
  To: Charlie Jenkins
  Cc: Alexandre Ghiti, Paul Walmsley, Palmer Dabbelt, Huacai Chen,
	WANG Xuerui, Thomas Gleixner, Peter Zijlstra, Andy Lutomirski,
	Alexandre Ghiti, linux-riscv, linux-kernel, loongarch

On Fri, Jan 24, 2025 at 2:53 AM Charlie Jenkins <charlie@rivosinc.com> wrote:
>
> On Fri, Jan 24, 2025 at 08:19:18AM +0100, Alexandre Ghiti wrote:
> > Hi Charlie,
> >
> > On 23/01/2025 20:14, Charlie Jenkins wrote:
> > > This function was unified into a single function in commit ab9164dae273
> > > ("riscv: entry: Consolidate ret_from_kernel_thread into ret_from_fork").
> > > However that imposed a performance degradation. Partially reverting this
> > > commit to have ret_from_fork() split again results in a 1% increase on
> > > the number of times fork is able to be called per second.
> > >
> > > Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
> > > ---
> > >   arch/riscv/include/asm/asm-prototypes.h |  3 ++-
> > >   arch/riscv/kernel/entry.S               | 13 ++++++++++---
> > >   arch/riscv/kernel/process.c             | 17 +++++++++++------
> > >   3 files changed, 23 insertions(+), 10 deletions(-)
> > >
> > > diff --git a/arch/riscv/include/asm/asm-prototypes.h b/arch/riscv/include/asm/asm-prototypes.h
> > > index 733ff609778797001006c33bba9e3cc5b1f15387..bfc8ea5f9319b19449ec59493b45b926df888832 100644
> > > --- a/arch/riscv/include/asm/asm-prototypes.h
> > > +++ b/arch/riscv/include/asm/asm-prototypes.h
> > > @@ -52,7 +52,8 @@ DECLARE_DO_ERROR_INFO(do_trap_ecall_s);
> > >   DECLARE_DO_ERROR_INFO(do_trap_ecall_m);
> > >   DECLARE_DO_ERROR_INFO(do_trap_break);
> > > -asmlinkage void ret_from_fork(void *fn_arg, int (*fn)(void *), struct pt_regs *regs);
> > > +asmlinkage void ret_from_fork_kernel(void *fn_arg, int (*fn)(void *), struct pt_regs *regs);
> > > +asmlinkage void ret_from_fork_user(struct pt_regs *regs);
> > >   asmlinkage void handle_bad_stack(struct pt_regs *regs);
> > >   asmlinkage void do_page_fault(struct pt_regs *regs);
> > >   asmlinkage void do_irq(struct pt_regs *regs);
> > > diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
> > > index 9225c322279aa90e737b1d7144db084319cf8103..9386ef7444267f0b9bf8a0550f4e31deaeb85881 100644
> > > --- a/arch/riscv/kernel/entry.S
> > > +++ b/arch/riscv/kernel/entry.S
> > > @@ -319,14 +319,21 @@ SYM_CODE_END(handle_kernel_stack_overflow)
> > >   ASM_NOKPROBE(handle_kernel_stack_overflow)
> > >   #endif
> > > -SYM_CODE_START(ret_from_fork_asm)
> > > +SYM_CODE_START(ret_from_fork_kernel_asm)
> > >     call schedule_tail
> > >     move a0, s1 /* fn */
> > >     move a1, s0 /* fn_arg */
> > >     move a2, sp /* pt_regs */
> > > -   call ret_from_fork
> > > +   call ret_from_fork_kernel
> > >     j ret_from_exception
> > > -SYM_CODE_END(ret_from_fork_asm)
> > > +SYM_CODE_END(ret_from_fork_kernel_asm)
> > > +
> > > +SYM_CODE_START(ret_from_fork_user_asm)
> > > +   call schedule_tail
> > > +   move a0, sp /* pt_regs */
> > > +   call ret_from_fork_user
> > > +   j ret_from_exception
> > > +SYM_CODE_END(ret_from_fork_user_asm)
> > >   #ifdef CONFIG_IRQ_STACKS
> > >   /*
> > > diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
> > > index 0d07e6d8f6b57beba438dbba5e8c74a014582bee..5f15236cb526bd9fe61636ed372b4b76c94df946 100644
> > > --- a/arch/riscv/kernel/process.c
> > > +++ b/arch/riscv/kernel/process.c
> > > @@ -38,7 +38,8 @@ unsigned long __stack_chk_guard __read_mostly;
> > >   EXPORT_SYMBOL(__stack_chk_guard);
> > >   #endif
> > > -extern asmlinkage void ret_from_fork_asm(void);
> > > +extern asmlinkage void ret_from_fork_kernel_asm(void);
> > > +extern asmlinkage void ret_from_fork_user_asm(void);
> > >   void noinstr arch_cpu_idle(void)
> > >   {
> > > @@ -208,14 +209,18 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
> > >     return 0;
> > >   }
> > > -asmlinkage void ret_from_fork(void *fn_arg, int (*fn)(void *), struct pt_regs *regs)
> > > +asmlinkage void ret_from_fork_kernel(void *fn_arg, int (*fn)(void *), struct pt_regs *regs)
> > >   {
> > > -   if (unlikely(fn))
> > > -           fn(fn_arg);
> > > +   fn(fn_arg);
> > >     syscall_exit_to_user_mode(regs);
> > >   }
> > > +asmlinkage void ret_from_fork_user(struct pt_regs *regs)
> > > +{
> > > +   syscall_exit_to_user_mode(regs);
> > > +}
> > > +
> > >   int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
> > >   {
> > >     unsigned long clone_flags = args->flags;
> > > @@ -238,6 +243,7 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
> > >             p->thread.s[0] = (unsigned long)args->fn;
> > >             p->thread.s[1] = (unsigned long)args->fn_arg;
> > > +           p->thread.ra = (unsigned long)ret_from_fork_kernel_asm;
> > >     } else {
> > >             *childregs = *(current_pt_regs());
> > >             /* Turn off status.VS */
> > > @@ -247,12 +253,11 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
> > >             if (clone_flags & CLONE_SETTLS)
> > >                     childregs->tp = tls;
> > >             childregs->a0 = 0; /* Return value of fork() */
> > > -           p->thread.s[0] = 0;
> > > +           p->thread.ra = (unsigned long)ret_from_fork_user_asm;
> > >     }
> > >     p->thread.riscv_v_flags = 0;
> > >     if (has_vector())
> > >             riscv_v_thread_alloc(p);
> > > -   p->thread.ra = (unsigned long)ret_from_fork_asm;
> > >     p->thread.sp = (unsigned long)childregs; /* kernel sp */
> > >     return 0;
> > >   }
> > >
> >
> > Can you benchmark this change on some HW? I'm not sure we would indeed gain
> > this 1%.
>
> It reduces the syscall path by 3 instructions, two for not needing to
> move the fn and fn_args from:
>
> move a0, s1 /* fn */
> move a1, s0 /* fn_arg */
>
> And one for not needing to do the conditional. This one is also saved on
> kernel threads.
>
> It's a very small improvement, but there is only something like 100
> instructions along the direct syscall path so it ends up being a large
> percentage. On hardware moving registers is very cheap and this branch
> will be almost always be correctly predicted so the cost is close to
> zero. I just figured that since I am making changes around here it would
> be nice if it was optimal instead of being close to optimal.

That may be the case on the child process side, but compared to the
cost of the fork on the parent process side (allocating and
initializing a new task struct), it's miniscule.



Brian Gerst

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/4] riscv: entry: Convert ret_from_fork() to C
  2025-01-23 19:14 ` [PATCH v2 1/4] riscv: entry: Convert ret_from_fork() to C Charlie Jenkins
@ 2025-01-24 13:14   ` Brian Gerst
  2025-01-24 18:23     ` Charlie Jenkins
  0 siblings, 1 reply; 14+ messages in thread
From: Brian Gerst @ 2025-01-24 13:14 UTC (permalink / raw)
  To: Charlie Jenkins
  Cc: Paul Walmsley, Palmer Dabbelt, Huacai Chen, WANG Xuerui,
	Thomas Gleixner, Peter Zijlstra, Andy Lutomirski, Alexandre Ghiti,
	linux-riscv, linux-kernel, loongarch

On Thu, Jan 23, 2025 at 2:15 PM Charlie Jenkins <charlie@rivosinc.com> wrote:
>
> Move the main section of ret_from_fork() to C to allow inlining of
> syscall_exit_to_user_mode().
>
> Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
> ---
>  arch/riscv/include/asm/asm-prototypes.h |  1 +
>  arch/riscv/kernel/entry.S               | 15 ++++++---------
>  arch/riscv/kernel/process.c             | 14 ++++++++++++--
>  3 files changed, 19 insertions(+), 11 deletions(-)
>
> diff --git a/arch/riscv/include/asm/asm-prototypes.h b/arch/riscv/include/asm/asm-prototypes.h
> index cd627ec289f163a630b73dd03dd52a6b28692997..733ff609778797001006c33bba9e3cc5b1f15387 100644
> --- a/arch/riscv/include/asm/asm-prototypes.h
> +++ b/arch/riscv/include/asm/asm-prototypes.h
> @@ -52,6 +52,7 @@ DECLARE_DO_ERROR_INFO(do_trap_ecall_s);
>  DECLARE_DO_ERROR_INFO(do_trap_ecall_m);
>  DECLARE_DO_ERROR_INFO(do_trap_break);
>
> +asmlinkage void ret_from_fork(void *fn_arg, int (*fn)(void *), struct pt_regs *regs);
>  asmlinkage void handle_bad_stack(struct pt_regs *regs);
>  asmlinkage void do_page_fault(struct pt_regs *regs);
>  asmlinkage void do_irq(struct pt_regs *regs);
> diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
> index 33a5a9f2a0d4e1eeccfb3621b9e518b88e1b0704..9225c322279aa90e737b1d7144db084319cf8103 100644
> --- a/arch/riscv/kernel/entry.S
> +++ b/arch/riscv/kernel/entry.S
> @@ -319,17 +319,14 @@ SYM_CODE_END(handle_kernel_stack_overflow)
>  ASM_NOKPROBE(handle_kernel_stack_overflow)
>  #endif
>
> -SYM_CODE_START(ret_from_fork)
> +SYM_CODE_START(ret_from_fork_asm)
>         call schedule_tail
> -       beqz s0, 1f     /* not from kernel thread */
> -       /* Call fn(arg) */
> -       move a0, s1
> -       jalr s0
> -1:
> -       move a0, sp /* pt_regs */
> -       call syscall_exit_to_user_mode
> +       move a0, s1 /* fn */
> +       move a1, s0 /* fn_arg */
> +       move a2, sp /* pt_regs */
> +       call ret_from_fork
>         j ret_from_exception
> -SYM_CODE_END(ret_from_fork)
> +SYM_CODE_END(ret_from_fork_asm)
>
>  #ifdef CONFIG_IRQ_STACKS
>  /*
> diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
> index 58b6482c2bf662bf5224ca50c8e21a68760a6b41..0d07e6d8f6b57beba438dbba5e8c74a014582bee 100644
> --- a/arch/riscv/kernel/process.c
> +++ b/arch/riscv/kernel/process.c
> @@ -17,7 +17,9 @@
>  #include <linux/ptrace.h>
>  #include <linux/uaccess.h>
>  #include <linux/personality.h>
> +#include <linux/entry-common.h>
>
> +#include <asm/asm-prototypes.h>
>  #include <asm/unistd.h>
>  #include <asm/processor.h>
>  #include <asm/csr.h>
> @@ -36,7 +38,7 @@ unsigned long __stack_chk_guard __read_mostly;
>  EXPORT_SYMBOL(__stack_chk_guard);
>  #endif
>
> -extern asmlinkage void ret_from_fork(void);
> +extern asmlinkage void ret_from_fork_asm(void);
>
>  void noinstr arch_cpu_idle(void)
>  {
> @@ -206,6 +208,14 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
>         return 0;
>  }
>
> +asmlinkage void ret_from_fork(void *fn_arg, int (*fn)(void *), struct pt_regs *regs)
> +{
> +       if (unlikely(fn))
> +               fn(fn_arg);
> +
> +       syscall_exit_to_user_mode(regs);
> +}
> +
>  int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
>  {
>         unsigned long clone_flags = args->flags;
> @@ -242,7 +252,7 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
>         p->thread.riscv_v_flags = 0;
>         if (has_vector())
>                 riscv_v_thread_alloc(p);
> -       p->thread.ra = (unsigned long)ret_from_fork;
> +       p->thread.ra = (unsigned long)ret_from_fork_asm;
>         p->thread.sp = (unsigned long)childregs; /* kernel sp */
>         return 0;
>  }
>
> --
> 2.43.0
>
>

Is there a specific reason you didn't move the call to schedule_tail()
to the C function, like on x86?


Brian Gerst

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/4] riscv: entry: Convert ret_from_fork() to C
  2025-01-24 13:14   ` Brian Gerst
@ 2025-01-24 18:23     ` Charlie Jenkins
  0 siblings, 0 replies; 14+ messages in thread
From: Charlie Jenkins @ 2025-01-24 18:23 UTC (permalink / raw)
  To: Brian Gerst
  Cc: Paul Walmsley, Palmer Dabbelt, Huacai Chen, WANG Xuerui,
	Thomas Gleixner, Peter Zijlstra, Andy Lutomirski, Alexandre Ghiti,
	linux-riscv, linux-kernel, loongarch

On Fri, Jan 24, 2025 at 08:14:08AM -0500, Brian Gerst wrote:
> On Thu, Jan 23, 2025 at 2:15 PM Charlie Jenkins <charlie@rivosinc.com> wrote:
> >
> > Move the main section of ret_from_fork() to C to allow inlining of
> > syscall_exit_to_user_mode().
> >
> > Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
> > ---
> >  arch/riscv/include/asm/asm-prototypes.h |  1 +
> >  arch/riscv/kernel/entry.S               | 15 ++++++---------
> >  arch/riscv/kernel/process.c             | 14 ++++++++++++--
> >  3 files changed, 19 insertions(+), 11 deletions(-)
> >
> > diff --git a/arch/riscv/include/asm/asm-prototypes.h b/arch/riscv/include/asm/asm-prototypes.h
> > index cd627ec289f163a630b73dd03dd52a6b28692997..733ff609778797001006c33bba9e3cc5b1f15387 100644
> > --- a/arch/riscv/include/asm/asm-prototypes.h
> > +++ b/arch/riscv/include/asm/asm-prototypes.h
> > @@ -52,6 +52,7 @@ DECLARE_DO_ERROR_INFO(do_trap_ecall_s);
> >  DECLARE_DO_ERROR_INFO(do_trap_ecall_m);
> >  DECLARE_DO_ERROR_INFO(do_trap_break);
> >
> > +asmlinkage void ret_from_fork(void *fn_arg, int (*fn)(void *), struct pt_regs *regs);
> >  asmlinkage void handle_bad_stack(struct pt_regs *regs);
> >  asmlinkage void do_page_fault(struct pt_regs *regs);
> >  asmlinkage void do_irq(struct pt_regs *regs);
> > diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
> > index 33a5a9f2a0d4e1eeccfb3621b9e518b88e1b0704..9225c322279aa90e737b1d7144db084319cf8103 100644
> > --- a/arch/riscv/kernel/entry.S
> > +++ b/arch/riscv/kernel/entry.S
> > @@ -319,17 +319,14 @@ SYM_CODE_END(handle_kernel_stack_overflow)
> >  ASM_NOKPROBE(handle_kernel_stack_overflow)
> >  #endif
> >
> > -SYM_CODE_START(ret_from_fork)
> > +SYM_CODE_START(ret_from_fork_asm)
> >         call schedule_tail
> > -       beqz s0, 1f     /* not from kernel thread */
> > -       /* Call fn(arg) */
> > -       move a0, s1
> > -       jalr s0
> > -1:
> > -       move a0, sp /* pt_regs */
> > -       call syscall_exit_to_user_mode
> > +       move a0, s1 /* fn */
> > +       move a1, s0 /* fn_arg */
> > +       move a2, sp /* pt_regs */
> > +       call ret_from_fork
> >         j ret_from_exception
> > -SYM_CODE_END(ret_from_fork)
> > +SYM_CODE_END(ret_from_fork_asm)
> >
> >  #ifdef CONFIG_IRQ_STACKS
> >  /*
> > diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
> > index 58b6482c2bf662bf5224ca50c8e21a68760a6b41..0d07e6d8f6b57beba438dbba5e8c74a014582bee 100644
> > --- a/arch/riscv/kernel/process.c
> > +++ b/arch/riscv/kernel/process.c
> > @@ -17,7 +17,9 @@
> >  #include <linux/ptrace.h>
> >  #include <linux/uaccess.h>
> >  #include <linux/personality.h>
> > +#include <linux/entry-common.h>
> >
> > +#include <asm/asm-prototypes.h>
> >  #include <asm/unistd.h>
> >  #include <asm/processor.h>
> >  #include <asm/csr.h>
> > @@ -36,7 +38,7 @@ unsigned long __stack_chk_guard __read_mostly;
> >  EXPORT_SYMBOL(__stack_chk_guard);
> >  #endif
> >
> > -extern asmlinkage void ret_from_fork(void);
> > +extern asmlinkage void ret_from_fork_asm(void);
> >
> >  void noinstr arch_cpu_idle(void)
> >  {
> > @@ -206,6 +208,14 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
> >         return 0;
> >  }
> >
> > +asmlinkage void ret_from_fork(void *fn_arg, int (*fn)(void *), struct pt_regs *regs)
> > +{
> > +       if (unlikely(fn))
> > +               fn(fn_arg);
> > +
> > +       syscall_exit_to_user_mode(regs);
> > +}
> > +
> >  int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
> >  {
> >         unsigned long clone_flags = args->flags;
> > @@ -242,7 +252,7 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
> >         p->thread.riscv_v_flags = 0;
> >         if (has_vector())
> >                 riscv_v_thread_alloc(p);
> > -       p->thread.ra = (unsigned long)ret_from_fork;
> > +       p->thread.ra = (unsigned long)ret_from_fork_asm;
> >         p->thread.sp = (unsigned long)childregs; /* kernel sp */
> >         return 0;
> >  }
> >
> > --
> > 2.43.0
> >
> >
> 
> Is there a specific reason you didn't move the call to schedule_tail()
> to the C function, like on x86?

Yes, the generated code ends up being dramatically worse if
schedule_tail() is moved into C. This is because the arg for
schedule_tail() is already in a0 so the extra stack manipulation
instructions end up taking up a lot of instructions.

With this change:
<ret_from_fork_asm>:
       ff65b097                auipc   ra,0xff65b
       1ee080e7                jalr    494(ra) # ffffffff8005038a <schedule_tail>
       8526                    mv      a0,s1
       85a2                    mv      a1,s0
       860a                    mv      a2,sp
       ff61b097                auipc   ra,0xff61b
       606080e7                jalr    1542(ra) # ffffffff800107b0 <ret_from_fork>
       b5f5                    j       ffffffff809f509e <ret_from_exception>


<ret_from_fork>:
       1101                    addi    sp,sp,-32
       e822                    sd      s0,16(sp)
       ec06                    sd      ra,24(sp)
       1000                    addi    s0,sp,32
       e991                    bnez    a1,ffffffff800107cc <ret_from_fork+0x1c>
       8532                    mv      a0,a2
       009db097                auipc   ra,0x9db
       a32080e7                jalr    -1486(ra) # ffffffff809eb1ee <syscall_exit_to_user_mode>
       60e2                    ld      ra,24(sp)
       6442                    ld      s0,16(sp)
       6105                    addi    sp,sp,32
       8082                    ret
       <following instructions used only in the kernel thread case>
       fec43423                sd      a2,-24(s0)
       9582                    jalr    a1
       fe843603                ld      a2,-24(s0)
       8532                    mv      a0,a2
       009db097                auipc   ra,0x9db
       a16080e7                jalr    -1514(ra) # ffffffff809eb1ee <syscall_exit_to_user_mode>
       60e2                    ld      ra,24(sp)
       6442                    ld      s0,16(sp)
       6105                    addi    sp,sp,32
       8082                    ret

Contrasted with what this looks like if schedule_tail() is called from
C.

<ret_from_fork_asm>:
       85a6                    mv      a1,s1
       8622                    mv      a2,s0
       868a                    mv      a3,sp
       ff61b097                auipc   ra,0xff61b
       60e080e7                jalr    1550(ra) # ffffffff800107b0 <ret_from_fork>
       bdd5                    j       ffffffff809f509e <ret_from_exception>

<ret_from_fork>:
       7179                    addi    sp,sp,-48
       f022                    sd      s0,32(sp)
       ec26                    sd      s1,24(sp)
       e84a                    sd      s2,16(sp)
       e44e                    sd      s3,8(sp)
       f406                    sd      ra,40(sp)
       1800                    addi    s0,sp,48
       84b2                    mv      s1,a2
       89ae                    mv      s3,a1
       8936                    mv      s2,a3
       3c73f0ef                jal     ffffffff8005038a <schedule_tail>
       ec89                    bnez    s1,ffffffff800107e2 <ret_from_fork+0x32>
       854a                    mv      a0,s2
       009db097                auipc   ra,0x9db
       a22080e7                jalr    -1502(ra) # ffffffff809eb1ee <syscall_exit_to_user_mode>
       70a2                    ld      ra,40(sp)
       7402                    ld      s0,32(sp)
       64e2                    ld      s1,24(sp)
       6942                    ld      s2,16(sp)
       69a2                    ld      s3,8(sp)
       6145                    addi    sp,sp,48
       8082                    ret
       854e                    mv      a0,s3
       9482                    jalr    s1
       b7d5                    j       ffffffff800107ca <ret_from_fork+0x1a>


ret_from_fork_asm ends up being 2 instructions more when calling from
asm, but the user fork ret_from_fork ends up being only 12 instructions
rather than 22 instructions when calling from C. If we were able to mix
asm and C code in a naked function we would be able to get rid of the
stack manipulation and still be able to inline C but we don't live in
that world...

- Charlie

> 
> 
> Brian Gerst

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 2/4] riscv: entry: Split ret_from_fork() into user and kernel
  2025-01-24 13:08       ` Brian Gerst
@ 2025-01-24 18:26         ` Charlie Jenkins
  0 siblings, 0 replies; 14+ messages in thread
From: Charlie Jenkins @ 2025-01-24 18:26 UTC (permalink / raw)
  To: Brian Gerst
  Cc: Alexandre Ghiti, Paul Walmsley, Palmer Dabbelt, Huacai Chen,
	WANG Xuerui, Thomas Gleixner, Peter Zijlstra, Andy Lutomirski,
	Alexandre Ghiti, linux-riscv, linux-kernel, loongarch

On Fri, Jan 24, 2025 at 08:08:44AM -0500, Brian Gerst wrote:
> On Fri, Jan 24, 2025 at 2:53 AM Charlie Jenkins <charlie@rivosinc.com> wrote:
> >
> > On Fri, Jan 24, 2025 at 08:19:18AM +0100, Alexandre Ghiti wrote:
> > > Hi Charlie,
> > >
> > > On 23/01/2025 20:14, Charlie Jenkins wrote:
> > > > This function was unified into a single function in commit ab9164dae273
> > > > ("riscv: entry: Consolidate ret_from_kernel_thread into ret_from_fork").
> > > > However that imposed a performance degradation. Partially reverting this
> > > > commit to have ret_from_fork() split again results in a 1% increase on
> > > > the number of times fork is able to be called per second.
> > > >
> > > > Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
> > > > ---
> > > >   arch/riscv/include/asm/asm-prototypes.h |  3 ++-
> > > >   arch/riscv/kernel/entry.S               | 13 ++++++++++---
> > > >   arch/riscv/kernel/process.c             | 17 +++++++++++------
> > > >   3 files changed, 23 insertions(+), 10 deletions(-)
> > > >
> > > > diff --git a/arch/riscv/include/asm/asm-prototypes.h b/arch/riscv/include/asm/asm-prototypes.h
> > > > index 733ff609778797001006c33bba9e3cc5b1f15387..bfc8ea5f9319b19449ec59493b45b926df888832 100644
> > > > --- a/arch/riscv/include/asm/asm-prototypes.h
> > > > +++ b/arch/riscv/include/asm/asm-prototypes.h
> > > > @@ -52,7 +52,8 @@ DECLARE_DO_ERROR_INFO(do_trap_ecall_s);
> > > >   DECLARE_DO_ERROR_INFO(do_trap_ecall_m);
> > > >   DECLARE_DO_ERROR_INFO(do_trap_break);
> > > > -asmlinkage void ret_from_fork(void *fn_arg, int (*fn)(void *), struct pt_regs *regs);
> > > > +asmlinkage void ret_from_fork_kernel(void *fn_arg, int (*fn)(void *), struct pt_regs *regs);
> > > > +asmlinkage void ret_from_fork_user(struct pt_regs *regs);
> > > >   asmlinkage void handle_bad_stack(struct pt_regs *regs);
> > > >   asmlinkage void do_page_fault(struct pt_regs *regs);
> > > >   asmlinkage void do_irq(struct pt_regs *regs);
> > > > diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
> > > > index 9225c322279aa90e737b1d7144db084319cf8103..9386ef7444267f0b9bf8a0550f4e31deaeb85881 100644
> > > > --- a/arch/riscv/kernel/entry.S
> > > > +++ b/arch/riscv/kernel/entry.S
> > > > @@ -319,14 +319,21 @@ SYM_CODE_END(handle_kernel_stack_overflow)
> > > >   ASM_NOKPROBE(handle_kernel_stack_overflow)
> > > >   #endif
> > > > -SYM_CODE_START(ret_from_fork_asm)
> > > > +SYM_CODE_START(ret_from_fork_kernel_asm)
> > > >     call schedule_tail
> > > >     move a0, s1 /* fn */
> > > >     move a1, s0 /* fn_arg */
> > > >     move a2, sp /* pt_regs */
> > > > -   call ret_from_fork
> > > > +   call ret_from_fork_kernel
> > > >     j ret_from_exception
> > > > -SYM_CODE_END(ret_from_fork_asm)
> > > > +SYM_CODE_END(ret_from_fork_kernel_asm)
> > > > +
> > > > +SYM_CODE_START(ret_from_fork_user_asm)
> > > > +   call schedule_tail
> > > > +   move a0, sp /* pt_regs */
> > > > +   call ret_from_fork_user
> > > > +   j ret_from_exception
> > > > +SYM_CODE_END(ret_from_fork_user_asm)
> > > >   #ifdef CONFIG_IRQ_STACKS
> > > >   /*
> > > > diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
> > > > index 0d07e6d8f6b57beba438dbba5e8c74a014582bee..5f15236cb526bd9fe61636ed372b4b76c94df946 100644
> > > > --- a/arch/riscv/kernel/process.c
> > > > +++ b/arch/riscv/kernel/process.c
> > > > @@ -38,7 +38,8 @@ unsigned long __stack_chk_guard __read_mostly;
> > > >   EXPORT_SYMBOL(__stack_chk_guard);
> > > >   #endif
> > > > -extern asmlinkage void ret_from_fork_asm(void);
> > > > +extern asmlinkage void ret_from_fork_kernel_asm(void);
> > > > +extern asmlinkage void ret_from_fork_user_asm(void);
> > > >   void noinstr arch_cpu_idle(void)
> > > >   {
> > > > @@ -208,14 +209,18 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
> > > >     return 0;
> > > >   }
> > > > -asmlinkage void ret_from_fork(void *fn_arg, int (*fn)(void *), struct pt_regs *regs)
> > > > +asmlinkage void ret_from_fork_kernel(void *fn_arg, int (*fn)(void *), struct pt_regs *regs)
> > > >   {
> > > > -   if (unlikely(fn))
> > > > -           fn(fn_arg);
> > > > +   fn(fn_arg);
> > > >     syscall_exit_to_user_mode(regs);
> > > >   }
> > > > +asmlinkage void ret_from_fork_user(struct pt_regs *regs)
> > > > +{
> > > > +   syscall_exit_to_user_mode(regs);
> > > > +}
> > > > +
> > > >   int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
> > > >   {
> > > >     unsigned long clone_flags = args->flags;
> > > > @@ -238,6 +243,7 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
> > > >             p->thread.s[0] = (unsigned long)args->fn;
> > > >             p->thread.s[1] = (unsigned long)args->fn_arg;
> > > > +           p->thread.ra = (unsigned long)ret_from_fork_kernel_asm;
> > > >     } else {
> > > >             *childregs = *(current_pt_regs());
> > > >             /* Turn off status.VS */
> > > > @@ -247,12 +253,11 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
> > > >             if (clone_flags & CLONE_SETTLS)
> > > >                     childregs->tp = tls;
> > > >             childregs->a0 = 0; /* Return value of fork() */
> > > > -           p->thread.s[0] = 0;
> > > > +           p->thread.ra = (unsigned long)ret_from_fork_user_asm;
> > > >     }
> > > >     p->thread.riscv_v_flags = 0;
> > > >     if (has_vector())
> > > >             riscv_v_thread_alloc(p);
> > > > -   p->thread.ra = (unsigned long)ret_from_fork_asm;
> > > >     p->thread.sp = (unsigned long)childregs; /* kernel sp */
> > > >     return 0;
> > > >   }
> > > >
> > >
> > > Can you benchmark this change on some HW? I'm not sure we would indeed gain
> > > this 1%.
> >
> > It reduces the syscall path by 3 instructions, two for not needing to
> > move the fn and fn_args from:
> >
> > move a0, s1 /* fn */
> > move a1, s0 /* fn_arg */
> >
> > And one for not needing to do the conditional. This one is also saved on
> > kernel threads.
> >
> > It's a very small improvement, but there is only something like 100
> > instructions along the direct syscall path so it ends up being a large
> > percentage. On hardware moving registers is very cheap and this branch
> > will be almost always be correctly predicted so the cost is close to
> > zero. I just figured that since I am making changes around here it would
> > be nice if it was optimal instead of being close to optimal.
> 
> That may be the case on the child process side, but compared to the
> cost of the fork on the parent process side (allocating and
> initializing a new task struct), it's miniscule.
> 

Yes that is a good point. The change will have a probably unnoticeable
effect, mostly just depends on how people want this code to look.

- Charlie

> 
> 
> Brian Gerst

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 3/4] loongarch: entry: Migrate ret_from_fork() to C
  2025-01-24  9:05   ` Huacai Chen
@ 2025-01-24 18:28     ` Charlie Jenkins
  2025-01-24 22:23       ` Charlie Jenkins
  0 siblings, 1 reply; 14+ messages in thread
From: Charlie Jenkins @ 2025-01-24 18:28 UTC (permalink / raw)
  To: Huacai Chen
  Cc: Paul Walmsley, Palmer Dabbelt, WANG Xuerui, Thomas Gleixner,
	Peter Zijlstra, Andy Lutomirski, Alexandre Ghiti, linux-riscv,
	linux-kernel, loongarch

On Fri, Jan 24, 2025 at 05:05:21PM +0800, Huacai Chen wrote:
> Hi, Charlie,
> 
> On Fri, Jan 24, 2025 at 3:15 AM Charlie Jenkins <charlie@rivosinc.com> wrote:
> >
> > Loongarch is the only architecture that calls
> We usually use "LoongArch" instead of "loongarch" or "Loongarch".
> 
> > syscall_exit_to_user_mode() from asm. Move the call into C so that this
> > function can be inlined across all architectures.
> >
> > Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
> > ---
> >  arch/loongarch/include/asm/asm-prototypes.h |  5 +++++
> >  arch/loongarch/include/asm/switch_to.h      |  8 +++++++
> >  arch/loongarch/kernel/entry.S               | 22 +++++++++----------
> >  arch/loongarch/kernel/process.c             | 34 ++++++++++++++++++++++++-----
> >  4 files changed, 51 insertions(+), 18 deletions(-)
> >
> > diff --git a/arch/loongarch/include/asm/asm-prototypes.h b/arch/loongarch/include/asm/asm-prototypes.h
> > index 51f224bcfc654228ae423e9a066b25b35102a5b9..0195d4309fd29f94664d5f34247198c769033b1b 100644
> > --- a/arch/loongarch/include/asm/asm-prototypes.h
> > +++ b/arch/loongarch/include/asm/asm-prototypes.h
> > @@ -12,3 +12,8 @@ __int128_t __ashlti3(__int128_t a, int b);
> >  __int128_t __ashrti3(__int128_t a, int b);
> >  __int128_t __lshrti3(__int128_t a, int b);
> >  #endif
> > +
> > +asmlinkage void noinstr __no_stack_protector ret_from_kernel_thread(struct task_struct *prev,
> > +                                                                   struct pt_regs *regs,
> > +                                                                   int (*fn)(void *),
> > +                                                                   void *fn_arg);
> It is a little strange that we only need to declare
> ret_from_kernel_thread() but not ret_from_fork().

Just an oversight by me, thank you for pointing that out.

> 
> > diff --git a/arch/loongarch/include/asm/switch_to.h b/arch/loongarch/include/asm/switch_to.h
> > index 5b225aff3ba21aa06d0713bc8e73e1b941389630..a1c5576f1fd145670e13038bec6dd390486099ab 100644
> > --- a/arch/loongarch/include/asm/switch_to.h
> > +++ b/arch/loongarch/include/asm/switch_to.h
> > @@ -26,6 +26,14 @@ extern asmlinkage struct task_struct *__switch_to(struct task_struct *prev,
> >                         struct task_struct *next, struct thread_info *next_ti,
> >                         void *sched_ra, void *sched_cfa);
> >
> > +void noinstr __no_stack_protector ret_from_kernel_thread(struct task_struct *prev,
> > +                                                        struct pt_regs *regs,
> > +                                                        int (*fn)(void *),
> > +                                                        void *fn_arg);
> > +
> > +void noinstr __no_stack_protector ret_from_fork(struct task_struct *prev,
> > +                                               struct pt_regs *regs);
> > +
> I prefer alpha-betical order, which means put ret_from_fork() before
> ret_from_kernel_thread().

Makes sense!

> 
> >  /*
> >   * For newly created kernel threads switch_to() will return to
> >   * ret_from_kernel_thread, newly created user threads to ret_from_fork.
> > diff --git a/arch/loongarch/kernel/entry.S b/arch/loongarch/kernel/entry.S
> > index 48e7e34e355e83eae8165957ba2eac05a8bf17df..2abc29e573810e000f2fef4646ddca0dbb80eabe 100644
> > --- a/arch/loongarch/kernel/entry.S
> > +++ b/arch/loongarch/kernel/entry.S
> > @@ -77,24 +77,22 @@ SYM_CODE_START(handle_syscall)
> >  SYM_CODE_END(handle_syscall)
> >  _ASM_NOKPROBE(handle_syscall)
> >
> > -SYM_CODE_START(ret_from_fork)
> > +SYM_CODE_START(ret_from_fork_asm)
> >         UNWIND_HINT_REGS
> > -       bl              schedule_tail           # a0 = struct task_struct *prev
> > -       move            a0, sp
> > -       bl              syscall_exit_to_user_mode
> > +       move            a1, sp
> > +       bl              ret_from_fork
> >         RESTORE_STATIC
> >         RESTORE_SOME
> >         RESTORE_SP_AND_RET
> > -SYM_CODE_END(ret_from_fork)
> > +SYM_CODE_END(ret_from_fork_asm)
> >
> > -SYM_CODE_START(ret_from_kernel_thread)
> > +SYM_CODE_START(ret_from_kernel_thread_asm)
> >         UNWIND_HINT_REGS
> > -       bl              schedule_tail           # a0 = struct task_struct *prev
> > -       move            a0, s1
> > -       jirl            ra, s0, 0
> > -       move            a0, sp
> > -       bl              syscall_exit_to_user_mode
> > +       move            a1, sp
> > +       move            a2, s0
> > +       move            a3, s1
> > +       bl              ret_from_kernel_thread
> >         RESTORE_STATIC
> >         RESTORE_SOME
> >         RESTORE_SP_AND_RET
> > -SYM_CODE_END(ret_from_kernel_thread)
> > +SYM_CODE_END(ret_from_kernel_thread_asm)
> > diff --git a/arch/loongarch/kernel/process.c b/arch/loongarch/kernel/process.c
> > index 6e58f65455c7ca3eae2e88ed852c8655a6701e5c..16cc949fe43443d70f1d865ce04595c2d8c1615b 100644
> > --- a/arch/loongarch/kernel/process.c
> > +++ b/arch/loongarch/kernel/process.c
> > @@ -14,6 +14,7 @@
> >  #include <linux/init.h>
> >  #include <linux/kernel.h>
> >  #include <linux/errno.h>
> > +#include <linux/entry-common.h>
> >  #include <linux/sched.h>
> >  #include <linux/sched/debug.h>
> >  #include <linux/sched/task.h>
> > @@ -33,6 +34,7 @@
> >  #include <linux/prctl.h>
> >  #include <linux/nmi.h>
> >
> > +#include <asm/asm-prototypes.h>
> >  #include <asm/asm.h>
> >  #include <asm/bootinfo.h>
> >  #include <asm/cpu.h>
> > @@ -47,6 +49,7 @@
> >  #include <asm/pgtable.h>
> >  #include <asm/processor.h>
> >  #include <asm/reg.h>
> > +#include <asm/switch_to.h>
> >  #include <asm/unwind.h>
> >  #include <asm/vdso.h>
> >
> > @@ -63,8 +66,9 @@ EXPORT_SYMBOL(__stack_chk_guard);
> >  unsigned long boot_option_idle_override = IDLE_NO_OVERRIDE;
> >  EXPORT_SYMBOL(boot_option_idle_override);
> >
> > -asmlinkage void ret_from_fork(void);
> > -asmlinkage void ret_from_kernel_thread(void);
> > +asmlinkage void restore_and_ret(void);
> > +asmlinkage void ret_from_fork_asm(void);
> > +asmlinkage void ret_from_kernel_thread_asm(void);
> >
> >  void start_thread(struct pt_regs *regs, unsigned long pc, unsigned long sp)
> >  {
> > @@ -138,6 +142,24 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
> >         return 0;
> >  }
> >
> > +asmlinkage void noinstr __no_stack_protector ret_from_kernel_thread(struct task_struct *prev,
> > +                                                                   struct pt_regs *regs,
> > +                                                                   int (*fn)(void *),
> > +                                                                   void *fn_arg)
> > +{
> > +       schedule_tail(prev);
> > +
> > +       fn(fn_arg);
> > +
> The two blank lines can be removed, and again, I prefer alpha-betical
> order, which means put ret_from_fork() before
> ret_from_kernel_thread().

Will update in the next version.

- Charlie

> 
> Huacai
> 
> > +       syscall_exit_to_user_mode(regs);
> > +}
> > +
> > +void noinstr __no_stack_protector ret_from_fork(struct task_struct *prev, struct pt_regs *regs)
> > +{
> > +       schedule_tail(prev);
> > +       syscall_exit_to_user_mode(regs);
> > +}
> > +
> >  /*
> >   * Copy architecture-specific thread state
> >   */
> > @@ -165,8 +187,8 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
> >                 p->thread.reg03 = childksp;
> >                 p->thread.reg23 = (unsigned long)args->fn;
> >                 p->thread.reg24 = (unsigned long)args->fn_arg;
> > -               p->thread.reg01 = (unsigned long)ret_from_kernel_thread;
> > -               p->thread.sched_ra = (unsigned long)ret_from_kernel_thread;
> > +               p->thread.reg01 = (unsigned long)ret_from_kernel_thread_asm;
> > +               p->thread.sched_ra = (unsigned long)ret_from_kernel_thread_asm;
> >                 memset(childregs, 0, sizeof(struct pt_regs));
> >                 childregs->csr_euen = p->thread.csr_euen;
> >                 childregs->csr_crmd = p->thread.csr_crmd;
> > @@ -182,8 +204,8 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
> >                 childregs->regs[3] = usp;
> >
> >         p->thread.reg03 = (unsigned long) childregs;
> > -       p->thread.reg01 = (unsigned long) ret_from_fork;
> > -       p->thread.sched_ra = (unsigned long) ret_from_fork;
> > +       p->thread.reg01 = (unsigned long) ret_from_fork_asm;
> > +       p->thread.sched_ra = (unsigned long) ret_from_fork_asm;
> >
> >         /*
> >          * New tasks lose permission to use the fpu. This accelerates context
> >
> > --
> > 2.43.0
> >

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 3/4] loongarch: entry: Migrate ret_from_fork() to C
  2025-01-24 18:28     ` Charlie Jenkins
@ 2025-01-24 22:23       ` Charlie Jenkins
  0 siblings, 0 replies; 14+ messages in thread
From: Charlie Jenkins @ 2025-01-24 22:23 UTC (permalink / raw)
  To: Huacai Chen
  Cc: Paul Walmsley, Palmer Dabbelt, WANG Xuerui, Thomas Gleixner,
	Peter Zijlstra, Andy Lutomirski, Alexandre Ghiti, linux-riscv,
	linux-kernel, loongarch

On Fri, Jan 24, 2025 at 10:28:50AM -0800, Charlie Jenkins wrote:
> On Fri, Jan 24, 2025 at 05:05:21PM +0800, Huacai Chen wrote:
> > Hi, Charlie,
> > 
> > On Fri, Jan 24, 2025 at 3:15 AM Charlie Jenkins <charlie@rivosinc.com> wrote:
> > >
> > > Loongarch is the only architecture that calls
> > We usually use "LoongArch" instead of "loongarch" or "Loongarch".
> > 
> > > syscall_exit_to_user_mode() from asm. Move the call into C so that this
> > > function can be inlined across all architectures.
> > >
> > > Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
> > > ---
> > >  arch/loongarch/include/asm/asm-prototypes.h |  5 +++++
> > >  arch/loongarch/include/asm/switch_to.h      |  8 +++++++
> > >  arch/loongarch/kernel/entry.S               | 22 +++++++++----------
> > >  arch/loongarch/kernel/process.c             | 34 ++++++++++++++++++++++++-----
> > >  4 files changed, 51 insertions(+), 18 deletions(-)
> > >
> > > diff --git a/arch/loongarch/include/asm/asm-prototypes.h b/arch/loongarch/include/asm/asm-prototypes.h
> > > index 51f224bcfc654228ae423e9a066b25b35102a5b9..0195d4309fd29f94664d5f34247198c769033b1b 100644
> > > --- a/arch/loongarch/include/asm/asm-prototypes.h
> > > +++ b/arch/loongarch/include/asm/asm-prototypes.h
> > > @@ -12,3 +12,8 @@ __int128_t __ashlti3(__int128_t a, int b);
> > >  __int128_t __ashrti3(__int128_t a, int b);
> > >  __int128_t __lshrti3(__int128_t a, int b);
> > >  #endif
> > > +
> > > +asmlinkage void noinstr __no_stack_protector ret_from_kernel_thread(struct task_struct *prev,
> > > +                                                                   struct pt_regs *regs,
> > > +                                                                   int (*fn)(void *),
> > > +                                                                   void *fn_arg);
> > It is a little strange that we only need to declare
> > ret_from_kernel_thread() but not ret_from_fork().
> 
> Just an oversight by me, thank you for pointing that out.

Oh I see what I did, I meant to put these functions in asm-prototypes
and not in switch_to but I ended up putting them in both.

- Charlie


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2025-01-24 22:23 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-01-23 19:14 [PATCH v2 0/4] entry: Move ret_from_fork() to C and inline syscall_exit_to_user_mode() Charlie Jenkins
2025-01-23 19:14 ` [PATCH v2 1/4] riscv: entry: Convert ret_from_fork() to C Charlie Jenkins
2025-01-24 13:14   ` Brian Gerst
2025-01-24 18:23     ` Charlie Jenkins
2025-01-23 19:14 ` [PATCH v2 2/4] riscv: entry: Split ret_from_fork() into user and kernel Charlie Jenkins
2025-01-24  7:19   ` Alexandre Ghiti
2025-01-24  7:53     ` Charlie Jenkins
2025-01-24 13:08       ` Brian Gerst
2025-01-24 18:26         ` Charlie Jenkins
2025-01-23 19:14 ` [PATCH v2 3/4] loongarch: entry: Migrate ret_from_fork() to C Charlie Jenkins
2025-01-24  9:05   ` Huacai Chen
2025-01-24 18:28     ` Charlie Jenkins
2025-01-24 22:23       ` Charlie Jenkins
2025-01-23 19:14 ` [PATCH v2 4/4] entry: Inline syscall_exit_to_user_mode() Charlie Jenkins

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).