linux-trace-kernel.vger.kernel.org archive mirror
* [PATCH v4 01/12] riscv: ftrace: support fastcc in Clang for WITH_ARGS
@ 2025-04-07 18:08 Andy Chiu
  2025-04-07 18:08 ` [PATCH v4 02/12] riscv: ftrace factor out code defined by !WITH_ARG Andy Chiu
                   ` (7 more replies)
  0 siblings, 8 replies; 17+ messages in thread
From: Andy Chiu @ 2025-04-07 18:08 UTC (permalink / raw)
  To: linux-riscv, alexghiti, palmer
  Cc: Andy Chiu, Evgenii Shatokhin, Nathan Chancellor,
	Björn Töpel, Palmer Dabbelt, Puranjay Mohan,
	linux-kernel, linux-trace-kernel, llvm, Mark Rutland,
	Alexandre Ghiti, Nick Desaulniers, Bill Wendling, Justin Stitt,
	puranjay12, paul.walmsley, greentime.hu, nick.hu, nylon.chen,
	eric.lin, vicent.chen, zong.li, yongxuan.wang, samuel.holland,
	olivia.chu, c2232430

From: Andy Chiu <andy.chiu@sifive.com>

When the kernel is compiled with Clang, the fastcc calling convention
may pass function arguments in caller-saved registers (t2-t6) that the
standard ABI does not use for arguments. As a result, we must save and
restore those registers in the ftrace trampoline to prevent ftrace from
clobbering them.

- [1]: https://reviews.llvm.org/D68559
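
As an illustration (a hypothetical codegen sketch, not taken from an
actual kernel build), fastcc may carry arguments beyond a0-a7 in the
extra caller-saved registers, so a call to a local function might look
like:

	// hypothetical: local_fn is a static function with >8 arguments
	li	a0, 1		// arguments 1..8 in a0-a7, as in the ABI
	// ...
	li	t2, 9		// fastcc: 9th argument carried in t2
	li	t3, 10		// fastcc: 10th argument carried in t3
	call	local_fn

If the ftrace trampoline clobbers t2-t6 without saving them, such
arguments are corrupted across a traced call.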

Reported-by: Evgenii Shatokhin <e.shatokhin@yadro.com>
Closes: https://lore.kernel.org/linux-riscv/7e7c7914-445d-426d-89a0-59a9199c45b1@yadro.com/
Fixes: 7caa9765465f ("ftrace: riscv: move from REGS to ARGS")
Acked-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>

---
Changelog v4:
 - Add a fix tag (Björn, Evgenii)
---
 arch/riscv/include/asm/ftrace.h |  7 +++++++
 arch/riscv/kernel/asm-offsets.c |  7 +++++++
 arch/riscv/kernel/mcount-dyn.S  | 16 ++++++++++++++--
 3 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/arch/riscv/include/asm/ftrace.h b/arch/riscv/include/asm/ftrace.h
index d627f63ee289..d8b2138bd9c6 100644
--- a/arch/riscv/include/asm/ftrace.h
+++ b/arch/riscv/include/asm/ftrace.h
@@ -146,6 +146,13 @@ struct __arch_ftrace_regs {
 			unsigned long a5;
 			unsigned long a6;
 			unsigned long a7;
+#ifdef CONFIG_CC_IS_CLANG
+			unsigned long t2;
+			unsigned long t3;
+			unsigned long t4;
+			unsigned long t5;
+			unsigned long t6;
+#endif
 		};
 	};
 };
diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
index 16490755304e..7c43c8e26ae7 100644
--- a/arch/riscv/kernel/asm-offsets.c
+++ b/arch/riscv/kernel/asm-offsets.c
@@ -501,6 +501,13 @@ void asm_offsets(void)
 	DEFINE(FREGS_SP,	    offsetof(struct __arch_ftrace_regs, sp));
 	DEFINE(FREGS_S0,	    offsetof(struct __arch_ftrace_regs, s0));
 	DEFINE(FREGS_T1,	    offsetof(struct __arch_ftrace_regs, t1));
+#ifdef CONFIG_CC_IS_CLANG
+	DEFINE(FREGS_T2,	    offsetof(struct __arch_ftrace_regs, t2));
+	DEFINE(FREGS_T3,	    offsetof(struct __arch_ftrace_regs, t3));
+	DEFINE(FREGS_T4,	    offsetof(struct __arch_ftrace_regs, t4));
+	DEFINE(FREGS_T5,	    offsetof(struct __arch_ftrace_regs, t5));
+	DEFINE(FREGS_T6,	    offsetof(struct __arch_ftrace_regs, t6));
+#endif
 	DEFINE(FREGS_A0,	    offsetof(struct __arch_ftrace_regs, a0));
 	DEFINE(FREGS_A1,	    offsetof(struct __arch_ftrace_regs, a1));
 	DEFINE(FREGS_A2,	    offsetof(struct __arch_ftrace_regs, a2));
diff --git a/arch/riscv/kernel/mcount-dyn.S b/arch/riscv/kernel/mcount-dyn.S
index 745dd4c4a69c..e988bd26b28b 100644
--- a/arch/riscv/kernel/mcount-dyn.S
+++ b/arch/riscv/kernel/mcount-dyn.S
@@ -96,7 +96,13 @@
 	REG_S	x8,  FREGS_S0(sp)
 #endif
 	REG_S	x6,  FREGS_T1(sp)
-
+#ifdef CONFIG_CC_IS_CLANG
+	REG_S	x7,  FREGS_T2(sp)
+	REG_S	x28, FREGS_T3(sp)
+	REG_S	x29, FREGS_T4(sp)
+	REG_S	x30, FREGS_T5(sp)
+	REG_S	x31, FREGS_T6(sp)
+#endif
 	// save the arguments
 	REG_S	x10, FREGS_A0(sp)
 	REG_S	x11, FREGS_A1(sp)
@@ -115,7 +121,13 @@
 	REG_L	x8, FREGS_S0(sp)
 #endif
 	REG_L	x6,  FREGS_T1(sp)
-
+#ifdef CONFIG_CC_IS_CLANG
+	REG_L	x7,  FREGS_T2(sp)
+	REG_L	x28, FREGS_T3(sp)
+	REG_L	x29, FREGS_T4(sp)
+	REG_L	x30, FREGS_T5(sp)
+	REG_L	x31, FREGS_T6(sp)
+#endif
 	// restore the arguments
 	REG_L	x10, FREGS_A0(sp)
 	REG_L	x11, FREGS_A1(sp)
-- 
2.39.3 (Apple Git-145)


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v4 02/12] riscv: ftrace factor out code defined by !WITH_ARG
  2025-04-07 18:08 [PATCH v4 01/12] riscv: ftrace: support fastcc in Clang for WITH_ARGS Andy Chiu
@ 2025-04-07 18:08 ` Andy Chiu
  2025-04-07 18:08 ` [PATCH v4 04/12] kernel: ftrace: export ftrace_sync_ipi Andy Chiu
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Andy Chiu @ 2025-04-07 18:08 UTC (permalink / raw)
  To: linux-riscv, alexghiti, palmer
  Cc: Andy Chiu, linux-kernel, linux-trace-kernel, Mark Rutland,
	Alexandre Ghiti, bjorn, puranjay12, paul.walmsley, greentime.hu,
	nick.hu, nylon.chen, eric.lin, vicent.chen, zong.li,
	yongxuan.wang, samuel.holland, olivia.chu, c2232430

On riscv, DYNAMIC_FTRACE selects DYNAMIC_FTRACE_WITH_ARGS, and
mcount-dyn.S is only built when DYNAMIC_FTRACE is enabled. We can
therefore remove the WITH_ARGS ifdefs from code that only exists when
DYNAMIC_FTRACE is known to be true.
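
For reference, the dependency chain looks roughly like this (paraphrased
Kconfig; the exact wording may differ between trees):

	# arch/riscv/Kconfig
	select HAVE_DYNAMIC_FTRACE_WITH_ARGS if HAVE_DYNAMIC_FTRACE

	# kernel/trace/Kconfig
	config DYNAMIC_FTRACE_WITH_ARGS
		def_bool y
		depends on DYNAMIC_FTRACE
		depends on HAVE_DYNAMIC_FTRACE_WITH_ARGS

so mcount-dyn.S, which is only built for DYNAMIC_FTRACE, can assume
WITH_ARGS on riscv.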

Signed-off-by: Andy Chiu <andybnac@gmail.com>
---
Changelog v4:
 - Add a new patch that makes ftrace and its asm easier to maintain
---
 arch/riscv/kernel/ftrace.c     | 15 ---------------
 arch/riscv/kernel/mcount-dyn.S | 34 ----------------------------------
 2 files changed, 49 deletions(-)

diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c
index 674dcdfae7a1..1fd10555c580 100644
--- a/arch/riscv/kernel/ftrace.c
+++ b/arch/riscv/kernel/ftrace.c
@@ -210,7 +210,6 @@ void prepare_ftrace_return(unsigned long *parent, unsigned long self_addr,
 }
 
 #ifdef CONFIG_DYNAMIC_FTRACE
-#ifdef CONFIG_DYNAMIC_FTRACE_WITH_ARGS
 void ftrace_graph_func(unsigned long ip, unsigned long parent_ip,
 		       struct ftrace_ops *op, struct ftrace_regs *fregs)
 {
@@ -231,19 +230,5 @@ void ftrace_graph_func(unsigned long ip, unsigned long parent_ip,
 	if (!function_graph_enter_regs(old, ip, frame_pointer, parent, fregs))
 		*parent = return_hooker;
 }
-#else /* CONFIG_DYNAMIC_FTRACE_WITH_ARGS */
-extern void ftrace_graph_call(void);
-int ftrace_enable_ftrace_graph_caller(void)
-{
-	return __ftrace_modify_call((unsigned long)&ftrace_graph_call,
-				    (unsigned long)&prepare_ftrace_return, true, true);
-}
-
-int ftrace_disable_ftrace_graph_caller(void)
-{
-	return __ftrace_modify_call((unsigned long)&ftrace_graph_call,
-				    (unsigned long)&prepare_ftrace_return, false, true);
-}
-#endif /* CONFIG_DYNAMIC_FTRACE_WITH_ARGS */
 #endif /* CONFIG_DYNAMIC_FTRACE */
 #endif /* CONFIG_FUNCTION_GRAPH_TRACER */
diff --git a/arch/riscv/kernel/mcount-dyn.S b/arch/riscv/kernel/mcount-dyn.S
index e988bd26b28b..3f06b40bb6c8 100644
--- a/arch/riscv/kernel/mcount-dyn.S
+++ b/arch/riscv/kernel/mcount-dyn.S
@@ -56,8 +56,6 @@
 	addi	sp, sp, ABI_SIZE_ON_STACK
 	.endm
 
-#ifdef CONFIG_DYNAMIC_FTRACE_WITH_ARGS
-
 /**
 * SAVE_ABI_REGS - save regs against the ftrace_regs struct
 *
@@ -149,36 +147,6 @@
 	mv	a3, sp
 	.endm
 
-#endif /* CONFIG_DYNAMIC_FTRACE_WITH_ARGS */
-
-#ifndef CONFIG_DYNAMIC_FTRACE_WITH_ARGS
-SYM_FUNC_START(ftrace_caller)
-	SAVE_ABI
-
-	addi	a0, t0, -FENTRY_RA_OFFSET
-	la	a1, function_trace_op
-	REG_L	a2, 0(a1)
-	mv	a1, ra
-	mv	a3, sp
-
-SYM_INNER_LABEL(ftrace_call, SYM_L_GLOBAL)
-	call	ftrace_stub
-
-#ifdef CONFIG_FUNCTION_GRAPH_TRACER
-	addi	a0, sp, ABI_RA
-	REG_L	a1, ABI_T0(sp)
-	addi	a1, a1, -FENTRY_RA_OFFSET
-#ifdef HAVE_FUNCTION_GRAPH_FP_TEST
-	mv	a2, s0
-#endif
-SYM_INNER_LABEL(ftrace_graph_call, SYM_L_GLOBAL)
-	call	ftrace_stub
-#endif
-	RESTORE_ABI
-	jr	t0
-SYM_FUNC_END(ftrace_caller)
-
-#else /* CONFIG_DYNAMIC_FTRACE_WITH_ARGS */
 SYM_FUNC_START(ftrace_caller)
 	mv	t1, zero
 	SAVE_ABI_REGS
@@ -194,8 +162,6 @@ SYM_INNER_LABEL(ftrace_call, SYM_L_GLOBAL)
 	jr	t1
 SYM_FUNC_END(ftrace_caller)
 
-#endif /* CONFIG_DYNAMIC_FTRACE_WITH_ARGS */
-
 #ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
 SYM_CODE_START(ftrace_stub_direct_tramp)
 	jr	t0
-- 
2.39.3 (Apple Git-145)


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v4 04/12] kernel: ftrace: export ftrace_sync_ipi
  2025-04-07 18:08 [PATCH v4 01/12] riscv: ftrace: support fastcc in Clang for WITH_ARGS Andy Chiu
  2025-04-07 18:08 ` [PATCH v4 02/12] riscv: ftrace factor out code defined by !WITH_ARG Andy Chiu
@ 2025-04-07 18:08 ` Andy Chiu
  2025-04-08 22:31   ` kernel test robot
  2025-04-07 18:08 ` [PATCH v4 05/12] riscv: ftrace: prepare ftrace for atomic code patching Andy Chiu
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 17+ messages in thread
From: Andy Chiu @ 2025-04-07 18:08 UTC (permalink / raw)
  To: linux-riscv, alexghiti, palmer
  Cc: Andy Chiu, linux-kernel, linux-trace-kernel, Mark Rutland,
	Mathieu Desnoyers, Alexandre Ghiti, bjorn, puranjay12,
	paul.walmsley, greentime.hu, nick.hu, nylon.chen, eric.lin,
	vicent.chen, zong.li, yongxuan.wang, samuel.holland, olivia.chu,
	c2232430

A subsequent riscv ftrace patch uses a data store to update the ftrace
function target. Therefore, a remote fence is required to order it
against function_trace_op updates. The mechanism is similar to the fence
between function_trace_op and update_ftrace_func in the generic ftrace
code, so we export and reuse the same ftrace_sync_ipi function.
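
For context, the riscv user added later in this series pairs the store
with the IPI roughly like this (sketch):

	WRITE_ONCE(ftrace_call_dest, func);
	/* order the store before the later function_trace_op update */
	smp_wmb();
	/* make sure no CPU keeps using the old target without the fence */
	smp_call_function(ftrace_sync_ipi, NULL, 1);

ftrace_sync_ipi() itself only executes an smp_rmb() on each CPU, which
is all the read side needs.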

Signed-off-by: Andy Chiu <andybnac@gmail.com>
---
 include/linux/ftrace.h | 1 +
 kernel/trace/ftrace.c  | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index fbabc3d848b3..0d4eec574707 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -807,6 +807,7 @@ extern void ftrace_call(void);
 extern void ftrace_regs_call(void);
 extern void mcount_call(void);
 
+void ftrace_sync_ipi(void *data);
 void ftrace_modify_all_code(int command);
 
 #ifndef FTRACE_ADDR
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index ee662f380b61..d06bd4a046de 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -188,7 +188,7 @@ static void ftrace_pid_func(unsigned long ip, unsigned long parent_ip,
 	op->saved_func(ip, parent_ip, op, fregs);
 }
 
-static void ftrace_sync_ipi(void *data)
+void ftrace_sync_ipi(void *data)
 {
 	/* Probably not needed, but do it anyway */
 	smp_rmb();
-- 
2.39.3 (Apple Git-145)


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v4 05/12] riscv: ftrace: prepare ftrace for atomic code patching
  2025-04-07 18:08 [PATCH v4 01/12] riscv: ftrace: support fastcc in Clang for WITH_ARGS Andy Chiu
  2025-04-07 18:08 ` [PATCH v4 02/12] riscv: ftrace factor out code defined by !WITH_ARG Andy Chiu
  2025-04-07 18:08 ` [PATCH v4 04/12] kernel: ftrace: export ftrace_sync_ipi Andy Chiu
@ 2025-04-07 18:08 ` Andy Chiu
  2025-04-11 13:15   ` Robbin Ehn
  2025-04-23  8:22   ` Alexandre Ghiti
  2025-04-07 18:08 ` [PATCH v4 06/12] riscv: ftrace: do not use stop_machine to update code Andy Chiu
                   ` (4 subsequent siblings)
  7 siblings, 2 replies; 17+ messages in thread
From: Andy Chiu @ 2025-04-07 18:08 UTC (permalink / raw)
  To: linux-riscv, alexghiti, palmer
  Cc: Andy Chiu, Björn Töpel, linux-kernel,
	linux-trace-kernel, Mark Rutland, Alexandre Ghiti, puranjay12,
	paul.walmsley, greentime.hu, nick.hu, nylon.chen, eric.lin,
	vicent.chen, zong.li, yongxuan.wang, samuel.holland, olivia.chu,
	c2232430

From: Andy Chiu <andy.chiu@sifive.com>

We use an AUIPC+JALR pair to jump into an ftrace trampoline. Since
instruction fetch can break this down into 4-byte accesses, it is
impossible to update the two instructions atomically without a race. To
mitigate it, we initialize each patchable entry to AUIPC + NOP4, so that
run-time code patching only has to flip NOP4 to JALR to enable/disable
ftrace for a function. This limits the reach of each ftrace entry to a
+-2KB displacement from ftrace_caller.

Starting from the trampoline, we add a level of indirection for it to
reach the ftrace call target: it loads the target address from a memory
location (ftrace_call_dest), then performs the jump. This enables the
kernel to update the target atomically.

The new don't-stop-the-world text patching changes only one RISC-V
instruction per transition:

  |  -8: &ftrace_ops of the associated tracer function.
  | <ftrace enable>:
  |   0: auipc  t0, hi(ftrace_caller)
  |   4: jalr   t0, lo(ftrace_caller)
  |
  |  -8: &ftrace_nop_ops
  | <ftrace disable>:
  |   0: auipc  t0, hi(ftrace_caller)
  |   4: nop

This means that f+0x0 is fixed and should not be claimed by ftrace;
e.g., kprobes should be able to put a probe at f+0x0. Thus, we adjust
the offset and MCOUNT_INSN_SIZE accordingly.
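
Concretely, rec->ip now points at the jalr rather than the function
entry, and the adjustment helpers added below become:

	unsigned long ftrace_call_adjust(unsigned long addr)
	{
		return addr + MCOUNT_AUIPC_SIZE;	/* skip the fixed auipc */
	}

	unsigned long arch_ftrace_get_symaddr(unsigned long fentry_ip)
	{
		return fentry_ip - MCOUNT_AUIPC_SIZE;	/* back to f+0x0 */
	}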

Co-developed-by: Björn Töpel <bjorn@rivosinc.com>
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
---
Changelog v4:
 - Include Björn's fix for kprobe
 - Refactor code for better reading (Robbin, Björn)
 - Remove make_call_ra and friends (Björn)
 - Update comments to match reality (Björn)
 - Drop code defined by !WITH_ARG
 - Add a synchronization point when updating ftrace_call_dest (Björn)
---
 arch/riscv/include/asm/ftrace.h |  49 ++++++------
 arch/riscv/kernel/ftrace.c      | 130 ++++++++++++++++----------------
 arch/riscv/kernel/mcount-dyn.S  |   9 +--
 3 files changed, 92 insertions(+), 96 deletions(-)

diff --git a/arch/riscv/include/asm/ftrace.h b/arch/riscv/include/asm/ftrace.h
index d8b2138bd9c6..6a5c0a7fb826 100644
--- a/arch/riscv/include/asm/ftrace.h
+++ b/arch/riscv/include/asm/ftrace.h
@@ -20,10 +20,9 @@ extern void *return_address(unsigned int level);
 #define ftrace_return_address(n) return_address(n)
 
 void _mcount(void);
-static inline unsigned long ftrace_call_adjust(unsigned long addr)
-{
-	return addr;
-}
+unsigned long ftrace_call_adjust(unsigned long addr);
+unsigned long arch_ftrace_get_symaddr(unsigned long fentry_ip);
+#define ftrace_get_symaddr(fentry_ip) arch_ftrace_get_symaddr(fentry_ip)
 
 /*
  * Let's do like x86/arm64 and ignore the compat syscalls.
@@ -57,12 +56,21 @@ struct dyn_arch_ftrace {
  * 2) jalr: setting low-12 offset to ra, jump to ra, and set ra to
  *          return address (original pc + 4)
  *
+ * The first 2 instructions of each traceable function are compiled as 2 nop
+ * instructions. Then, the kernel initializes the first instruction to auipc at
+ * boot time (<ftrace disable>). The second instruction is patched to jalr to
+ * start the trace.
+ *
+ *<Image>:
+ * 0: nop
+ * 4: nop
+ *
  *<ftrace enable>:
- * 0: auipc  t0/ra, 0x?
- * 4: jalr   t0/ra, ?(t0/ra)
+ * 0: auipc  t0, 0x?
+ * 4: jalr   t0, ?(t0)
  *
  *<ftrace disable>:
- * 0: nop
+ * 0: auipc  t0, 0x?
  * 4: nop
  *
  * Dynamic ftrace generates probes to call sites, so we must deal with
@@ -75,10 +83,9 @@ struct dyn_arch_ftrace {
 #define AUIPC_OFFSET_MASK	(0xfffff000)
 #define AUIPC_PAD		(0x00001000)
 #define JALR_SHIFT		20
-#define JALR_RA			(0x000080e7)
-#define AUIPC_RA		(0x00000097)
 #define JALR_T0			(0x000282e7)
 #define AUIPC_T0		(0x00000297)
+#define JALR_RANGE		(JALR_SIGN_MASK - 1)
 
 #define to_jalr_t0(offset)						\
 	(((offset & JALR_OFFSET_MASK) << JALR_SHIFT) | JALR_T0)
@@ -96,26 +103,14 @@ do {									\
 	call[1] = to_jalr_t0(offset);					\
 } while (0)
 
-#define to_jalr_ra(offset)						\
-	(((offset & JALR_OFFSET_MASK) << JALR_SHIFT) | JALR_RA)
-
-#define to_auipc_ra(offset)						\
-	((offset & JALR_SIGN_MASK) ?					\
-	(((offset & AUIPC_OFFSET_MASK) + AUIPC_PAD) | AUIPC_RA) :	\
-	((offset & AUIPC_OFFSET_MASK) | AUIPC_RA))
-
-#define make_call_ra(caller, callee, call)				\
-do {									\
-	unsigned int offset =						\
-		(unsigned long) (callee) - (unsigned long) (caller);	\
-	call[0] = to_auipc_ra(offset);					\
-	call[1] = to_jalr_ra(offset);					\
-} while (0)
-
 /*
- * Let auipc+jalr be the basic *mcount unit*, so we make it 8 bytes here.
+ * Only the jalr insn in the auipc+jalr is patched, so we make it 4
+ * bytes here.
  */
-#define MCOUNT_INSN_SIZE 8
+#define MCOUNT_INSN_SIZE	4
+#define MCOUNT_AUIPC_SIZE	4
+#define MCOUNT_JALR_SIZE	4
+#define MCOUNT_NOP4_SIZE	4
 
 #ifndef __ASSEMBLY__
 struct dyn_ftrace;
diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c
index 1fd10555c580..cf78eef073a0 100644
--- a/arch/riscv/kernel/ftrace.c
+++ b/arch/riscv/kernel/ftrace.c
@@ -8,10 +8,21 @@
 #include <linux/ftrace.h>
 #include <linux/uaccess.h>
 #include <linux/memory.h>
+#include <linux/irqflags.h>
 #include <linux/stop_machine.h>
 #include <asm/cacheflush.h>
 #include <asm/text-patching.h>
 
+unsigned long ftrace_call_adjust(unsigned long addr)
+{
+	return addr + MCOUNT_AUIPC_SIZE;
+}
+
+unsigned long arch_ftrace_get_symaddr(unsigned long fentry_ip)
+{
+	return fentry_ip - MCOUNT_AUIPC_SIZE;
+}
+
 #ifdef CONFIG_DYNAMIC_FTRACE
 void ftrace_arch_code_modify_prepare(void) __acquires(&text_mutex)
 {
@@ -32,51 +43,32 @@ void ftrace_arch_code_modify_post_process(void) __releases(&text_mutex)
 	mutex_unlock(&text_mutex);
 }
 
-static int ftrace_check_current_call(unsigned long hook_pos,
-				     unsigned int *expected)
+static int __ftrace_modify_call(unsigned long source, unsigned long target, bool validate)
 {
+	unsigned int call[2], offset;
 	unsigned int replaced[2];
-	unsigned int nops[2] = {RISCV_INSN_NOP4, RISCV_INSN_NOP4};
 
-	/* we expect nops at the hook position */
-	if (!expected)
-		expected = nops;
+	offset = target - source;
+	call[1] = to_jalr_t0(offset);
 
-	/*
-	 * Read the text we want to modify;
-	 * return must be -EFAULT on read error
-	 */
-	if (copy_from_kernel_nofault(replaced, (void *)hook_pos,
-			MCOUNT_INSN_SIZE))
-		return -EFAULT;
-
-	/*
-	 * Make sure it is what we expect it to be;
-	 * return must be -EINVAL on failed comparison
-	 */
-	if (memcmp(expected, replaced, sizeof(replaced))) {
-		pr_err("%p: expected (%08x %08x) but got (%08x %08x)\n",
-		       (void *)hook_pos, expected[0], expected[1], replaced[0],
-		       replaced[1]);
-		return -EINVAL;
+	if (validate) {
+		call[0] = to_auipc_t0(offset);
+		/*
+		 * Read the text we want to modify;
+		 * return must be -EFAULT on read error
+		 */
+		if (copy_from_kernel_nofault(replaced, (void *)source, 2 * MCOUNT_INSN_SIZE))
+			return -EFAULT;
+
+		if (replaced[0] != call[0]) {
+			pr_err("%p: expected (%08x) but got (%08x)\n",
+			       (void *)source, call[0], replaced[0]);
+			return -EINVAL;
+		}
 	}
 
-	return 0;
-}
-
-static int __ftrace_modify_call(unsigned long hook_pos, unsigned long target,
-				bool enable, bool ra)
-{
-	unsigned int call[2];
-	unsigned int nops[2] = {RISCV_INSN_NOP4, RISCV_INSN_NOP4};
-
-	if (ra)
-		make_call_ra(hook_pos, target, call);
-	else
-		make_call_t0(hook_pos, target, call);
-
-	/* Replace the auipc-jalr pair at once. Return -EPERM on write error. */
-	if (patch_insn_write((void *)hook_pos, enable ? call : nops, MCOUNT_INSN_SIZE))
+	/* Replace the jalr at once. Return -EPERM on write error. */
+	if (patch_insn_write((void *)(source + MCOUNT_AUIPC_SIZE), call + 1, MCOUNT_JALR_SIZE))
 		return -EPERM;
 
 	return 0;
@@ -84,22 +76,21 @@ static int __ftrace_modify_call(unsigned long hook_pos, unsigned long target,
 
 int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
 {
-	unsigned int call[2];
+	unsigned long distance, orig_addr, pc = rec->ip - MCOUNT_AUIPC_SIZE;
 
-	make_call_t0(rec->ip, addr, call);
-
-	if (patch_insn_write((void *)rec->ip, call, MCOUNT_INSN_SIZE))
-		return -EPERM;
+	orig_addr = (unsigned long)&ftrace_caller;
+	distance = addr > orig_addr ? addr - orig_addr : orig_addr - addr;
+	if (distance > JALR_RANGE)
+		return -EINVAL;
 
-	return 0;
+	return __ftrace_modify_call(pc, addr, false);
 }
 
-int ftrace_make_nop(struct module *mod, struct dyn_ftrace *rec,
-		    unsigned long addr)
+int ftrace_make_nop(struct module *mod, struct dyn_ftrace *rec, unsigned long addr)
 {
-	unsigned int nops[2] = {RISCV_INSN_NOP4, RISCV_INSN_NOP4};
+	u32 nop4 = RISCV_INSN_NOP4;
 
-	if (patch_insn_write((void *)rec->ip, nops, MCOUNT_INSN_SIZE))
+	if (patch_insn_write((void *)rec->ip, &nop4, MCOUNT_NOP4_SIZE))
 		return -EPERM;
 
 	return 0;
@@ -114,21 +105,38 @@ int ftrace_make_nop(struct module *mod, struct dyn_ftrace *rec,
  */
 int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec)
 {
-	int out;
+	unsigned long pc = rec->ip - MCOUNT_AUIPC_SIZE;
+	unsigned int nops[2], offset;
+	int ret;
+
+	offset = (unsigned long) &ftrace_caller - pc;
+	nops[0] = to_auipc_t0(offset);
+	nops[1] = RISCV_INSN_NOP4;
 
 	mutex_lock(&text_mutex);
-	out = ftrace_make_nop(mod, rec, MCOUNT_ADDR);
+	ret = patch_insn_write((void *)pc, nops, 2 * MCOUNT_INSN_SIZE);
 	mutex_unlock(&text_mutex);
 
-	return out;
+	return ret;
 }
 
+ftrace_func_t ftrace_call_dest = ftrace_stub;
 int ftrace_update_ftrace_func(ftrace_func_t func)
 {
-	int ret = __ftrace_modify_call((unsigned long)&ftrace_call,
-				       (unsigned long)func, true, true);
-
-	return ret;
+	WRITE_ONCE(ftrace_call_dest, func);
+	/*
+	 * The data fence ensures that the update to ftrace_call_dest happens
+	 * before the write to function_trace_op later in the generic ftrace.
+	 * If the ordering is not enforced, then an old ftrace_call_dest may
+	 * race with loading a new function_trace_op set in ftrace_modify_all_code.
+	 *
+	 * If we are in stop_machine, then we don't need the remote fence,
+	 * as there is no concurrent read-side of ftrace_call_dest.
+	 */
+	smp_wmb();
+	if (!irqs_disabled())
+		smp_call_function(ftrace_sync_ipi, NULL, 1);
+	return 0;
 }
 
 struct ftrace_modify_param {
@@ -172,17 +180,11 @@ void arch_ftrace_update_code(int command)
 int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr,
 		       unsigned long addr)
 {
+	unsigned long caller = rec->ip - MCOUNT_AUIPC_SIZE;
 	unsigned int call[2];
-	unsigned long caller = rec->ip;
-	int ret;
 
 	make_call_t0(caller, old_addr, call);
-	ret = ftrace_check_current_call(caller, call);
-
-	if (ret)
-		return ret;
-
-	return __ftrace_modify_call(caller, addr, true, false);
+	return __ftrace_modify_call(caller, addr, true);
 }
 #endif
 
diff --git a/arch/riscv/kernel/mcount-dyn.S b/arch/riscv/kernel/mcount-dyn.S
index 3f06b40bb6c8..8aa554d56096 100644
--- a/arch/riscv/kernel/mcount-dyn.S
+++ b/arch/riscv/kernel/mcount-dyn.S
@@ -13,7 +13,6 @@
 
 	.text
 
-#define FENTRY_RA_OFFSET	8
 #define ABI_SIZE_ON_STACK	80
 #define ABI_A0			0
 #define ABI_A1			8
@@ -62,8 +61,7 @@
 * After the stack is established,
 *
 * 0(sp) stores the PC of the traced function which can be accessed
-* by &(fregs)->epc in tracing function. Note that the real
-* function entry address should be computed with -FENTRY_RA_OFFSET.
+* by &(fregs)->epc in tracing function.
 *
 * 8(sp) stores the function return address (i.e. parent IP) that
 * can be accessed by &(fregs)->ra in tracing function.
@@ -140,7 +138,7 @@
 	.endm
 
 	.macro PREPARE_ARGS
-	addi	a0, t0, -FENTRY_RA_OFFSET
+	addi	a0, t0, -MCOUNT_JALR_SIZE	// ip (callsite's jalr insn)
 	la	a1, function_trace_op
 	REG_L	a2, 0(a1)
 	mv	a1, ra
@@ -153,7 +151,8 @@ SYM_FUNC_START(ftrace_caller)
 	PREPARE_ARGS
 
 SYM_INNER_LABEL(ftrace_call, SYM_L_GLOBAL)
-	call	ftrace_stub
+	REG_L	ra, ftrace_call_dest
+	jalr	ra, 0(ra)
 
 	RESTORE_ABI_REGS
 	bnez	t1, .Ldirect
-- 
2.39.3 (Apple Git-145)


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v4 06/12] riscv: ftrace: do not use stop_machine to update code
  2025-04-07 18:08 [PATCH v4 01/12] riscv: ftrace: support fastcc in Clang for WITH_ARGS Andy Chiu
                   ` (2 preceding siblings ...)
  2025-04-07 18:08 ` [PATCH v4 05/12] riscv: ftrace: prepare ftrace for atomic code patching Andy Chiu
@ 2025-04-07 18:08 ` Andy Chiu
  2025-04-07 18:08 ` [PATCH v4 10/12] riscv: Implement HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS Andy Chiu
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Andy Chiu @ 2025-04-07 18:08 UTC (permalink / raw)
  To: linux-riscv, alexghiti, palmer
  Cc: Andy Chiu, linux-kernel, linux-trace-kernel, Mark Rutland,
	Alexandre Ghiti, bjorn, puranjay12, paul.walmsley, greentime.hu,
	nick.hu, nylon.chen, eric.lin, vicent.chen, zong.li,
	yongxuan.wang, samuel.holland, olivia.chu, c2232430

From: Andy Chiu <andy.chiu@sifive.com>

Now that ftrace code patching is atomic, it is safe to remove the
dependency on stop_machine() for patching code in ftrace.
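
The update path collapses to taking text_mutex, patching through the
regular text-patching helpers, and flushing the icache; equivalently
(sketch of the code below):

	void arch_ftrace_update_code(int command)
	{
		mutex_lock(&text_mutex);
		ftrace_modify_all_code(command | FTRACE_MAY_SLEEP);
		mutex_unlock(&text_mutex);
		flush_icache_all();
	}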

Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
---
Changelog v4:
 - assume ftrace_update_ftrace_func is always called with irqs enabled
---
 arch/riscv/kernel/ftrace.c | 64 ++++++--------------------------------
 1 file changed, 10 insertions(+), 54 deletions(-)

diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c
index cf78eef073a0..aca1a322e0aa 100644
--- a/arch/riscv/kernel/ftrace.c
+++ b/arch/riscv/kernel/ftrace.c
@@ -24,23 +24,13 @@ unsigned long arch_ftrace_get_symaddr(unsigned long fentry_ip)
 }
 
 #ifdef CONFIG_DYNAMIC_FTRACE
-void ftrace_arch_code_modify_prepare(void) __acquires(&text_mutex)
+void arch_ftrace_update_code(int command)
 {
 	mutex_lock(&text_mutex);
-
-	/*
-	 * The code sequences we use for ftrace can't be patched while the
-	 * kernel is running, so we need to use stop_machine() to modify them
-	 * for now.  This doesn't play nice with text_mutex, we use this flag
-	 * to elide the check.
-	 */
-	riscv_patch_in_stop_machine = true;
-}
-
-void ftrace_arch_code_modify_post_process(void) __releases(&text_mutex)
-{
-	riscv_patch_in_stop_machine = false;
+	command |= FTRACE_MAY_SLEEP;
+	ftrace_modify_all_code(command);
 	mutex_unlock(&text_mutex);
+	flush_icache_all();
 }
 
 static int __ftrace_modify_call(unsigned long source, unsigned long target, bool validate)
@@ -129,51 +119,17 @@ int ftrace_update_ftrace_func(ftrace_func_t func)
 	 * before the write to function_trace_op later in the generic ftrace.
 	 * If the ordering is not enforced, then an old ftrace_call_dest may
 	 * race with loading a new function_trace_op set in ftrace_modify_all_code.
-	 *
-	 * If we are in stop_machine, then we don't need the remote fence,
-	 * as there is no concurrent read-side of ftrace_call_dest.
 	 */
 	smp_wmb();
-	if (!irqs_disabled())
-		smp_call_function(ftrace_sync_ipi, NULL, 1);
-	return 0;
-}
-
-struct ftrace_modify_param {
-	int command;
-	atomic_t cpu_count;
-};
-
-static int __ftrace_modify_code(void *data)
-{
-	struct ftrace_modify_param *param = data;
-
-	if (atomic_inc_return(&param->cpu_count) == num_online_cpus()) {
-		ftrace_modify_all_code(param->command);
-		/*
-		 * Make sure the patching store is effective *before* we
-		 * increment the counter which releases all waiting CPUs
-		 * by using the release variant of atomic increment. The
-		 * release pairs with the call to local_flush_icache_all()
-		 * on the waiting CPU.
-		 */
-		atomic_inc_return_release(&param->cpu_count);
-	} else {
-		while (atomic_read(&param->cpu_count) <= num_online_cpus())
-			cpu_relax();
-
-		local_flush_icache_all();
-	}
-
+	/*
+	 * Updating ftrace does not take the stop_machine() path, so irqs
+	 * should not be disabled.
+	 */
+	WARN_ON(irqs_disabled());
+	smp_call_function(ftrace_sync_ipi, NULL, 1);
 	return 0;
 }
 
-void arch_ftrace_update_code(int command)
-{
-	struct ftrace_modify_param param = { command, ATOMIC_INIT(0) };
-
-	stop_machine(__ftrace_modify_code, &param, cpu_online_mask);
-}
 #endif
 
 #ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
-- 
2.39.3 (Apple Git-145)


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v4 10/12] riscv: Implement HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS
  2025-04-07 18:08 [PATCH v4 01/12] riscv: ftrace: support fastcc in Clang for WITH_ARGS Andy Chiu
                   ` (3 preceding siblings ...)
  2025-04-07 18:08 ` [PATCH v4 06/12] riscv: ftrace: do not use stop_machine to update code Andy Chiu
@ 2025-04-07 18:08 ` Andy Chiu
  2025-04-07 18:08 ` [PATCH v4 11/12] riscv: ftrace: support direct call using call_ops Andy Chiu
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Andy Chiu @ 2025-04-07 18:08 UTC (permalink / raw)
  To: linux-riscv, alexghiti, palmer
  Cc: Puranjay Mohan, Andy Chiu, Björn Töpel, linux-kernel,
	linux-trace-kernel, Alexandre Ghiti, Mark Rutland, paul.walmsley,
	greentime.hu, nick.hu, nylon.chen, eric.lin, vicent.chen, zong.li,
	yongxuan.wang, samuel.holland, olivia.chu, c2232430

From: Puranjay Mohan <puranjay12@gmail.com>

This patch enables support for DYNAMIC_FTRACE_WITH_CALL_OPS on RISC-V.
This allows each ftrace callsite to provide an ftrace_ops to the common
ftrace trampoline, allowing each callsite to invoke distinct tracer
functions without the need to fall back to list processing or to
allocate custom trampolines for each callsite. This significantly speeds
up cases where multiple distinct trace functions are used and callsites
are mostly traced by a single tracer.

The idea and most of the implementation are taken from arm64's
implementation of the same feature. The idea is to place a pointer to
the ftrace_ops as a literal at a fixed offset from the function entry
point, where it can be recovered by the common ftrace trampoline.

We use -fpatchable-function-entry to reserve 8 bytes above the function
entry by emitting two 4-byte nops, or four 2-byte nops when
CONFIG_RISCV_ISA_C is enabled. These 8 bytes are patched at runtime with
a pointer to the associated ftrace_ops for that callsite. Functions are
aligned to 8 bytes to make sure that the accesses to this literal are
atomic.
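
The resulting per-function layout with CALL_OPS enabled is:

	-8: <pointer to the associated struct ftrace_ops>	// 8-byte literal
	 0: auipc	t0, hi(ftrace_caller)
	 4: jalr	t0, lo(ftrace_caller)	// nop while disabled

The common trampoline recovers the ops pointer by loading at a fixed
negative offset from the address left in t0 by the jalr.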

This approach allows for directly invoking ftrace_ops::func even for
ftrace_ops which are dynamically-allocated (or part of a module),
without going via ftrace_ops_list_func.

We've benchmarked this with the ftrace_ops sample module on Spacemit K1
Jupiter:

Without this patch:

baseline (Linux rivos 6.14.0-09584-g7d06015d936c #3 SMP Sat Mar 29
+-----------------------+-----------------+----------------------------+
|  Number of tracers    | Total time (ns) | Per-call average time      |
|-----------------------+-----------------+----------------------------|
| Relevant | Irrelevant |    100000 calls | Total (ns) | Overhead (ns) |
|----------+------------+-----------------+------------+---------------|
|        0 |          0 |         1357958 |         13 |             - |
|        0 |          1 |         1302375 |         13 |             - |
|        0 |          2 |         1302375 |         13 |             - |
|        0 |         10 |         1379084 |         13 |             - |
|        0 |        100 |         1302458 |         13 |             - |
|        0 |        200 |         1302333 |         13 |             - |
|----------+------------+-----------------+------------+---------------|
|        1 |          0 |        13677833 |        136 |           123 |
|        1 |          1 |        18500916 |        185 |           172 |
|        1 |          2 |        22856459 |        228 |           215 |
|        1 |         10 |        58824709 |        588 |           575 |
|        1 |        100 |       505141584 |       5051 |          5038 |
|        1 |        200 |      1580473126 |      15804 |         15791 |
|----------+------------+-----------------+------------+---------------|
|        1 |          0 |        13561000 |        135 |           122 |
|        2 |          0 |        19707292 |        197 |           184 |
|       10 |          0 |        67774750 |        677 |           664 |
|      100 |          0 |       714123125 |       7141 |          7128 |
|      200 |          0 |      1918065668 |      19180 |         19167 |
+----------+------------+-----------------+------------+---------------+

Note: per-call overhead is estimated relative to the baseline case with
0 relevant tracers and 0 irrelevant tracers.

With this patch:

v4-rc4 (Linux rivos 6.14.0-09598-gd75747611c93 #4 SMP Sat Mar 29
+-----------------------+-----------------+----------------------------+
|  Number of tracers    | Total time (ns) | Per-call average time      |
|-----------------------+-----------------+----------------------------|
| Relevant | Irrelevant |    100000 calls | Total (ns) | Overhead (ns) |
|----------+------------+-----------------+------------+---------------|
|        0 |          0 |         1459917 |         14 |             - |
|        0 |          1 |         1408000 |         14 |             - |
|        0 |          2 |         1383792 |         13 |             - |
|        0 |         10 |         1430709 |         14 |             - |
|        0 |        100 |         1383791 |         13 |             - |
|        0 |        200 |         1383750 |         13 |             - |
|----------+------------+-----------------+------------+---------------|
|        1 |          0 |         5238041 |         52 |            38 |
|        1 |          1 |         5228542 |         52 |            38 |
|        1 |          2 |         5325917 |         53 |            40 |
|        1 |         10 |         5299667 |         52 |            38 |
|        1 |        100 |         5245250 |         52 |            39 |
|        1 |        200 |         5238459 |         52 |            39 |
|----------+------------+-----------------+------------+---------------|
|        1 |          0 |         5239083 |         52 |            38 |
|        2 |          0 |        19449417 |        194 |           181 |
|       10 |          0 |        67718584 |        677 |           663 |
|      100 |          0 |       709840708 |       7098 |          7085 |
|      200 |          0 |      2203580626 |      22035 |         22022 |
+----------+------------+-----------------+------------+---------------+

Note: per-call overhead is estimated relative to the baseline case with
0 relevant tracers and 0 irrelevant tracers.

As can be seen from the above:

 a) Whenever there is a single relevant tracer function associated with a
    tracee, the overhead of invoking the tracer is constant, and does not
    scale with the number of tracers which are *not* associated with that
    tracee.

 b) The overhead for a single relevant tracer has dropped to ~1/3 of the
    overhead prior to this series (from 122ns to 38ns). This is largely
    due to permitting calls to dynamically-allocated ftrace_ops without
    going through ftrace_ops_list_func.
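
The trampoline-side dispatch then becomes (sketch, matching the
mcount-dyn.S change below):

	REG_L	a2, -16(t0)		// per-callsite ftrace_ops literal
	REG_L	ra, FTRACE_OPS_FUNC(a2)	// fetch ops->func
	jalr	ra			// invoke the tracer directly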

Signed-off-by: Puranjay Mohan <puranjay12@gmail.com>

[update kconfig, asm, refactor]

Signed-off-by: Andy Chiu <andybnac@gmail.com>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
---
Changelog v4:
 - include benchmark result from a real hardware, shout-out to Björn!
 - new patch copy from Puranjay's RFC implementation
 - Drop code related to !WITH_ARG && DYNAMIC_FTRACE
---
 arch/riscv/Kconfig              |  2 +
 arch/riscv/Makefile             |  4 +-
 arch/riscv/kernel/asm-offsets.c |  3 ++
 arch/riscv/kernel/ftrace.c      | 67 +++++++++++++++++++++++++++++++++
 arch/riscv/kernel/mcount-dyn.S  | 35 +++++++++++++++--
 5 files changed, 105 insertions(+), 6 deletions(-)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index dc0fc11b6e96..ec986c9120e3 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -99,6 +99,7 @@ config RISCV
 	select EDAC_SUPPORT
 	select FRAME_POINTER if PERF_EVENTS || (FUNCTION_TRACER && !DYNAMIC_FTRACE)
 	select FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY if DYNAMIC_FTRACE
+	select FUNCTION_ALIGNMENT_8B if DYNAMIC_FTRACE_WITH_CALL_OPS
 	select GENERIC_ARCH_TOPOLOGY
 	select GENERIC_ATOMIC64 if !64BIT
 	select GENERIC_CLOCKEVENTS_BROADCAST if SMP
@@ -152,6 +153,7 @@ config RISCV
 	select HAVE_DYNAMIC_FTRACE if !XIP_KERNEL && MMU && (CLANG_SUPPORTS_DYNAMIC_FTRACE || GCC_SUPPORTS_DYNAMIC_FTRACE)
 	select FUNCTION_ALIGNMENT_4B if HAVE_DYNAMIC_FTRACE && RISCV_ISA_C
 	select HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+	select HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS if (DYNAMIC_FTRACE_WITH_ARGS && !CFI_CLANG)
 	select HAVE_DYNAMIC_FTRACE_WITH_ARGS if HAVE_DYNAMIC_FTRACE
 	select HAVE_FTRACE_GRAPH_FUNC
 	select HAVE_FTRACE_MCOUNT_RECORD if !XIP_KERNEL
diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile
index 539d2aef5cab..df57654a615e 100644
--- a/arch/riscv/Makefile
+++ b/arch/riscv/Makefile
@@ -15,9 +15,9 @@ ifeq ($(CONFIG_DYNAMIC_FTRACE),y)
 	LDFLAGS_vmlinux += --no-relax
 	KBUILD_CPPFLAGS += -DCC_USING_PATCHABLE_FUNCTION_ENTRY
 ifeq ($(CONFIG_RISCV_ISA_C),y)
-	CC_FLAGS_FTRACE := -fpatchable-function-entry=4
+	CC_FLAGS_FTRACE := -fpatchable-function-entry=8,4
 else
-	CC_FLAGS_FTRACE := -fpatchable-function-entry=2
+	CC_FLAGS_FTRACE := -fpatchable-function-entry=4,2
 endif
 endif
 
diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
index 7c43c8e26ae7..2d96197a8abf 100644
--- a/arch/riscv/kernel/asm-offsets.c
+++ b/arch/riscv/kernel/asm-offsets.c
@@ -493,6 +493,9 @@ void asm_offsets(void)
 	DEFINE(STACKFRAME_SIZE_ON_STACK, ALIGN(sizeof(struct stackframe), STACK_ALIGN));
 	OFFSET(STACKFRAME_FP, stackframe, fp);
 	OFFSET(STACKFRAME_RA, stackframe, ra);
+#ifdef CONFIG_FUNCTION_TRACER
+	DEFINE(FTRACE_OPS_FUNC,		offsetof(struct ftrace_ops, func));
+#endif
 
 #ifdef CONFIG_DYNAMIC_FTRACE_WITH_ARGS
 	DEFINE(FREGS_SIZE_ON_STACK, ALIGN(sizeof(struct __arch_ftrace_regs), STACK_ALIGN));
diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c
index aca1a322e0aa..30bcf60135d8 100644
--- a/arch/riscv/kernel/ftrace.c
+++ b/arch/riscv/kernel/ftrace.c
@@ -15,6 +15,9 @@
 
 unsigned long ftrace_call_adjust(unsigned long addr)
 {
+	if (IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS))
+		return addr + 8;
+
 	return addr + MCOUNT_AUIPC_SIZE;
 }
 
@@ -64,9 +67,52 @@ static int __ftrace_modify_call(unsigned long source, unsigned long target, bool
 	return 0;
 }
 
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS
+static const struct ftrace_ops *riscv64_rec_get_ops(struct dyn_ftrace *rec)
+{
+	const struct ftrace_ops *ops = NULL;
+
+	if (rec->flags & FTRACE_FL_CALL_OPS_EN) {
+		ops = ftrace_find_unique_ops(rec);
+		WARN_ON_ONCE(!ops);
+	}
+
+	if (!ops)
+		ops = &ftrace_list_ops;
+
+	return ops;
+}
+
+static int ftrace_rec_set_ops(const struct dyn_ftrace *rec,
+			      const struct ftrace_ops *ops)
+{
+	unsigned long literal = rec->ip - 8;
+
+	return patch_text_nosync((void *)literal, &ops, sizeof(ops));
+}
+
+static int ftrace_rec_set_nop_ops(struct dyn_ftrace *rec)
+{
+	return ftrace_rec_set_ops(rec, &ftrace_nop_ops);
+}
+
+static int ftrace_rec_update_ops(struct dyn_ftrace *rec)
+{
+	return ftrace_rec_set_ops(rec, riscv64_rec_get_ops(rec));
+}
+#else
+static int ftrace_rec_set_nop_ops(struct dyn_ftrace *rec) { return 0; }
+static int ftrace_rec_update_ops(struct dyn_ftrace *rec) { return 0; }
+#endif
+
 int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
 {
 	unsigned long distance, orig_addr, pc = rec->ip - MCOUNT_AUIPC_SIZE;
+	int ret;
+
+	ret = ftrace_rec_update_ops(rec);
+	if (ret)
+		return ret;
 
 	orig_addr = (unsigned long)&ftrace_caller;
 	distance = addr > orig_addr ? addr - orig_addr : orig_addr - addr;
@@ -79,6 +125,11 @@ int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
 int ftrace_make_nop(struct module *mod, struct dyn_ftrace *rec, unsigned long addr)
 {
 	u32 nop4 = RISCV_INSN_NOP4;
+	int ret;
+
+	ret = ftrace_rec_set_nop_ops(rec);
+	if (ret)
+		return ret;
 
 	if (patch_insn_write((void *)rec->ip, &nop4, MCOUNT_NOP4_SIZE))
 		return -EPERM;
@@ -99,6 +150,10 @@ int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec)
 	unsigned int nops[2], offset;
 	int ret;
 
+	ret = ftrace_rec_set_nop_ops(rec);
+	if (ret)
+		return ret;
+
 	offset = (unsigned long) &ftrace_caller - pc;
 	nops[0] = to_auipc_t0(offset);
 	nops[1] = RISCV_INSN_NOP4;
@@ -113,6 +168,13 @@ int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec)
 ftrace_func_t ftrace_call_dest = ftrace_stub;
 int ftrace_update_ftrace_func(ftrace_func_t func)
 {
+	/*
+	 * When using CALL_OPS, the function to call is associated with the
+	 * call site, and we don't have a global function pointer to update.
+	 */
+	if (IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS))
+		return 0;
+
 	WRITE_ONCE(ftrace_call_dest, func);
 	/*
 	 * The data fence ensures that the update to ftrace_call_dest happens
@@ -138,8 +200,13 @@ int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr,
 {
 	unsigned long caller = rec->ip - MCOUNT_AUIPC_SIZE;
 	unsigned int call[2];
+	int ret;
 
 	make_call_t0(caller, old_addr, call);
+	ret = ftrace_rec_update_ops(rec);
+	if (ret)
+		return ret;
+
 	return __ftrace_modify_call(caller, addr, true);
 }
 #endif
diff --git a/arch/riscv/kernel/mcount-dyn.S b/arch/riscv/kernel/mcount-dyn.S
index 8aa554d56096..699684eea7f0 100644
--- a/arch/riscv/kernel/mcount-dyn.S
+++ b/arch/riscv/kernel/mcount-dyn.S
@@ -139,10 +139,34 @@
 
 	.macro PREPARE_ARGS
 	addi	a0, t0, -MCOUNT_JALR_SIZE	// ip (callsite's jalr insn)
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS
+	/*
+	 * When CALL_OPS is enabled, 8 bytes of nops (two 4-byte or four 2-byte
+	 * nops) are placed before the function entry; these are later
+	 * overwritten with the pointer to the associated struct ftrace_ops.
+	 *
+	 * -8: &ftrace_ops of the associated tracer function.
+	 *<ftrace enable>:
+	 *  0: auipc  t0/ra, 0x?
+	 *  4: jalr   t0/ra, ?(t0/ra)
+	 *
+	 * -8: &ftrace_nop_ops
+	 *<ftrace disable>:
+	 *  0: nop
+	 *  4: nop
+	 *
+	 * t0 is set to ip+8 after the jalr is executed at the callsite,
+	 * so we find the associated op at t0-16.
+	 */
+	mv 	a1, ra				// parent_ip
+	REG_L   a2, -16(t0)			// op
+	REG_L   ra, FTRACE_OPS_FUNC(a2)		// op->func
+#else
 	la	a1, function_trace_op
-	REG_L	a2, 0(a1)
-	mv	a1, ra
-	mv	a3, sp
+	REG_L	a2, 0(a1)			// op
+	mv	a1, ra				// parent_ip
+#endif
+	mv	a3, sp				// regs
 	.endm
 
 SYM_FUNC_START(ftrace_caller)
@@ -150,10 +174,13 @@ SYM_FUNC_START(ftrace_caller)
 	SAVE_ABI_REGS
 	PREPARE_ARGS
 
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS
+	jalr	ra
+#else
 SYM_INNER_LABEL(ftrace_call, SYM_L_GLOBAL)
 	REG_L	ra, ftrace_call_dest
 	jalr	ra, 0(ra)
-
+#endif
 	RESTORE_ABI_REGS
 	bnez	t1, .Ldirect
 	jr	t0
-- 
2.39.3 (Apple Git-145)


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v4 11/12] riscv: ftrace: support direct call using call_ops
  2025-04-07 18:08 [PATCH v4 01/12] riscv: ftrace: support fastcc in Clang for WITH_ARGS Andy Chiu
                   ` (4 preceding siblings ...)
  2025-04-07 18:08 ` [PATCH v4 10/12] riscv: Implement HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS Andy Chiu
@ 2025-04-07 18:08 ` Andy Chiu
  2025-04-10 20:05 ` [PATCH v4 01/12] riscv: ftrace: support fastcc in Clang for WITH_ARGS Björn Töpel
  2025-06-02 22:12 ` patchwork-bot+linux-riscv
  7 siblings, 0 replies; 17+ messages in thread
From: Andy Chiu @ 2025-04-07 18:08 UTC (permalink / raw)
  To: linux-riscv, alexghiti, palmer
  Cc: Andy Chiu, Björn Töpel, linux-kernel,
	linux-trace-kernel, Alexandre Ghiti, Mark Rutland, puranjay12,
	paul.walmsley, greentime.hu, nick.hu, nylon.chen, eric.lin,
	vicent.chen, zong.li, yongxuan.wang, samuel.holland, olivia.chu,
	c2232430

Jump to FTRACE_ADDR (the common ftrace_caller trampoline) when the call
target is out of JALR range, and dispatch direct calls through the
ftrace_ops::direct_call pointer recovered via CALL_OPS.
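
A sketch of the resulting fallback in ftrace_make_call() (the full
change is in the diff below):

	orig_addr = (unsigned long)&ftrace_caller;
	distance = addr > orig_addr ? addr - orig_addr : orig_addr - addr;
	if (distance > JALR_RANGE)
		addr = FTRACE_ADDR;	/* fall back to the common trampoline */

	return __ftrace_modify_call(pc, addr, false);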

Co-developed-by: Björn Töpel <bjorn@rivosinc.com>
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Signed-off-by: Andy Chiu <andybnac@gmail.com>
---
Changelog v4:
 - New patch since v4
 - Include Björn's fix for kprobe (adjusting ftrace address with
   MCOUNT_INSN_SIZE)
 - Clean out an unused variable
---
 arch/riscv/Kconfig              |  2 +-
 arch/riscv/include/asm/ftrace.h |  6 ++++
 arch/riscv/kernel/asm-offsets.c |  3 ++
 arch/riscv/kernel/ftrace.c      | 13 ++++-----
 arch/riscv/kernel/mcount-dyn.S  | 51 +++++++++++++++++++++------------
 5 files changed, 48 insertions(+), 27 deletions(-)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index ec986c9120e3..8fdca6345fa3 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -152,7 +152,7 @@ config RISCV
 	select HAVE_DMA_CONTIGUOUS if MMU
 	select HAVE_DYNAMIC_FTRACE if !XIP_KERNEL && MMU && (CLANG_SUPPORTS_DYNAMIC_FTRACE || GCC_SUPPORTS_DYNAMIC_FTRACE)
 	select FUNCTION_ALIGNMENT_4B if HAVE_DYNAMIC_FTRACE && RISCV_ISA_C
-	select HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+	select HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS if HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS
 	select HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS if (DYNAMIC_FTRACE_WITH_ARGS && !CFI_CLANG)
 	select HAVE_DYNAMIC_FTRACE_WITH_ARGS if HAVE_DYNAMIC_FTRACE
 	select HAVE_FTRACE_GRAPH_FUNC
diff --git a/arch/riscv/include/asm/ftrace.h b/arch/riscv/include/asm/ftrace.h
index 6a5c0a7fb826..22ebea3c2b26 100644
--- a/arch/riscv/include/asm/ftrace.h
+++ b/arch/riscv/include/asm/ftrace.h
@@ -130,6 +130,9 @@ struct __arch_ftrace_regs {
 	unsigned long sp;
 	unsigned long s0;
 	unsigned long t1;
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+	unsigned long direct_tramp;
+#endif
 	union {
 		unsigned long args[8];
 		struct {
@@ -223,10 +226,13 @@ void ftrace_graph_func(unsigned long ip, unsigned long parent_ip,
 		       struct ftrace_ops *op, struct ftrace_regs *fregs);
 #define ftrace_graph_func ftrace_graph_func
 
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
 static inline void arch_ftrace_set_direct_caller(struct ftrace_regs *fregs, unsigned long addr)
 {
 	arch_ftrace_regs(fregs)->t1 = addr;
 }
+#endif /* CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS */
+
 #endif /* CONFIG_DYNAMIC_FTRACE_WITH_ARGS */
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
index 2d96197a8abf..b26334075697 100644
--- a/arch/riscv/kernel/asm-offsets.c
+++ b/arch/riscv/kernel/asm-offsets.c
@@ -495,6 +495,9 @@ void asm_offsets(void)
 	OFFSET(STACKFRAME_RA, stackframe, ra);
 #ifdef CONFIG_FUNCTION_TRACER
 	DEFINE(FTRACE_OPS_FUNC,		offsetof(struct ftrace_ops, func));
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+	DEFINE(FTRACE_OPS_DIRECT_CALL,	offsetof(struct ftrace_ops, direct_call));
+#endif /* CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS */
 #endif
 
 #ifdef CONFIG_DYNAMIC_FTRACE_WITH_ARGS
diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c
index 30bcf60135d8..d65f06bfb457 100644
--- a/arch/riscv/kernel/ftrace.c
+++ b/arch/riscv/kernel/ftrace.c
@@ -16,7 +16,7 @@
 unsigned long ftrace_call_adjust(unsigned long addr)
 {
 	if (IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS))
-		return addr + 8;
+		return addr + 8 + MCOUNT_AUIPC_SIZE;
 
 	return addr + MCOUNT_AUIPC_SIZE;
 }
@@ -83,10 +83,9 @@ static const struct ftrace_ops *riscv64_rec_get_ops(struct dyn_ftrace *rec)
 	return ops;
 }
 
-static int ftrace_rec_set_ops(const struct dyn_ftrace *rec,
-			      const struct ftrace_ops *ops)
+static int ftrace_rec_set_ops(const struct dyn_ftrace *rec, const struct ftrace_ops *ops)
 {
-	unsigned long literal = rec->ip - 8;
+	unsigned long literal = ALIGN_DOWN(rec->ip - 12, 8);
 
 	return patch_text_nosync((void *)literal, &ops, sizeof(ops));
 }
@@ -117,7 +116,7 @@ int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
 	orig_addr = (unsigned long)&ftrace_caller;
 	distance = addr > orig_addr ? addr - orig_addr : orig_addr - addr;
 	if (distance > JALR_RANGE)
-		return -EINVAL;
+		addr = FTRACE_ADDR;
 
 	return __ftrace_modify_call(pc, addr, false);
 }
@@ -199,15 +198,13 @@ int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr,
 		       unsigned long addr)
 {
 	unsigned long caller = rec->ip - MCOUNT_AUIPC_SIZE;
-	unsigned int call[2];
 	int ret;
 
-	make_call_t0(caller, old_addr, call);
 	ret = ftrace_rec_update_ops(rec);
 	if (ret)
 		return ret;
 
-	return __ftrace_modify_call(caller, addr, true);
+	return __ftrace_modify_call(caller, FTRACE_ADDR, true);
 }
 #endif
 
diff --git a/arch/riscv/kernel/mcount-dyn.S b/arch/riscv/kernel/mcount-dyn.S
index 699684eea7f0..48f6c4f7dca0 100644
--- a/arch/riscv/kernel/mcount-dyn.S
+++ b/arch/riscv/kernel/mcount-dyn.S
@@ -82,12 +82,9 @@
 *                       +++++++++
 **/
 	.macro SAVE_ABI_REGS
-	mv	t4, sp			// Save original SP in T4
 	addi	sp, sp, -FREGS_SIZE_ON_STACK
-
 	REG_S	t0,  FREGS_EPC(sp)
 	REG_S	x1,  FREGS_RA(sp)
-	REG_S	t4,  FREGS_SP(sp)	// Put original SP on stack
 #ifdef HAVE_FUNCTION_GRAPH_FP_TEST
 	REG_S	x8,  FREGS_S0(sp)
 #endif
@@ -108,9 +105,12 @@
 	REG_S	x15, FREGS_A5(sp)
 	REG_S	x16, FREGS_A6(sp)
 	REG_S	x17, FREGS_A7(sp)
+	mv	a0, sp
+	addi	a0, a0, FREGS_SIZE_ON_STACK
+	REG_S	a0, FREGS_SP(sp)	// Put original SP on stack
 	.endm
 
-	.macro RESTORE_ABI_REGS, all=0
+	.macro RESTORE_ABI_REGS
 	REG_L	t0, FREGS_EPC(sp)
 	REG_L	x1, FREGS_RA(sp)
 #ifdef HAVE_FUNCTION_GRAPH_FP_TEST
@@ -139,6 +139,19 @@
 
 	.macro PREPARE_ARGS
 	addi	a0, t0, -MCOUNT_JALR_SIZE	// ip (callsite's jalr insn)
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS
+	mv 	a1, ra				// parent_ip
+	REG_L   a2, -16(t0)			// op
+	REG_L   ra, FTRACE_OPS_FUNC(a2)		// op->func
+#else
+	la	a1, function_trace_op
+	REG_L	a2, 0(a1)			// op
+	mv	a1, ra				// parent_ip
+#endif
+	mv	a3, sp				// regs
+	.endm
+
+SYM_FUNC_START(ftrace_caller)
 #ifdef CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS
 	/*
 	 * When CALL_OPS is enabled, 8 bytes of nops (two 4-byte or four 2-byte
@@ -158,19 +171,17 @@
 	 * t0 is set to ip+8 after the jalr is executed at the callsite,
 	 * so we find the associated op at t0-16.
 	 */
-	mv 	a1, ra				// parent_ip
-	REG_L   a2, -16(t0)			// op
-	REG_L   ra, FTRACE_OPS_FUNC(a2)		// op->func
-#else
-	la	a1, function_trace_op
-	REG_L	a2, 0(a1)			// op
-	mv	a1, ra				// parent_ip
-#endif
-	mv	a3, sp				// regs
-	.endm
+	REG_L	t1, -16(t0) // op Should be SZ_REG instead of 16
 
-SYM_FUNC_START(ftrace_caller)
-	mv	t1, zero
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+	/*
+	 * If the op has a direct call, handle it immediately without
+	 * saving/restoring registers.
+	 */
+	REG_L	t1, FTRACE_OPS_DIRECT_CALL(t1)
+	bnez	t1, ftrace_caller_direct
+#endif
+#endif
 	SAVE_ABI_REGS
 	PREPARE_ARGS
 
@@ -182,10 +193,14 @@ SYM_INNER_LABEL(ftrace_call, SYM_L_GLOBAL)
 	jalr	ra, 0(ra)
 #endif
 	RESTORE_ABI_REGS
-	bnez	t1, .Ldirect
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+	bnez	t1, ftrace_caller_direct
+#endif
 	jr	t0
-.Ldirect:
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+SYM_INNER_LABEL(ftrace_caller_direct, SYM_L_LOCAL)
 	jr	t1
+#endif
 SYM_FUNC_END(ftrace_caller)
 
 #ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
-- 
2.39.3 (Apple Git-145)


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 04/12] kernel: ftrace: export ftrace_sync_ipi
  2025-04-07 18:08 ` [PATCH v4 04/12] kernel: ftrace: export ftrace_sync_ipi Andy Chiu
@ 2025-04-08 22:31   ` kernel test robot
  2025-04-23  8:13     ` Alexandre Ghiti
  0 siblings, 1 reply; 17+ messages in thread
From: kernel test robot @ 2025-04-08 22:31 UTC (permalink / raw)
  To: Andy Chiu, linux-riscv, alexghiti, palmer
  Cc: oe-kbuild-all, Andy Chiu, linux-kernel, linux-trace-kernel,
	Mark Rutland, Mathieu Desnoyers, Alexandre Ghiti, bjorn,
	puranjay12, paul.walmsley, greentime.hu, nick.hu, nylon.chen,
	eric.lin, vicent.chen, zong.li, yongxuan.wang, samuel.holland,
	olivia.chu, c2232430

Hi Andy,

kernel test robot noticed the following build warnings:

[auto build test WARNING on linus/master]
[also build test WARNING on v6.15-rc1 next-20250408]
[cannot apply to trace/for-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Andy-Chiu/riscv-ftrace-factor-out-code-defined-by-WITH_ARG/20250408-025114
base:   linus/master
patch link:    https://lore.kernel.org/r/20250407180838.42877-4-andybnac%40gmail.com
patch subject: [PATCH v4 04/12] kernel: ftrace: export ftrace_sync_ipi
config: xtensa-randconfig-001-20250409 (https://download.01.org/0day-ci/archive/20250409/202504090657.5fiH4UIS-lkp@intel.com/config)
compiler: xtensa-linux-gcc (GCC) 11.5.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250409/202504090657.5fiH4UIS-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202504090657.5fiH4UIS-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> kernel/trace/ftrace.c:191:6: warning: no previous prototype for 'ftrace_sync_ipi' [-Wmissing-prototypes]
     191 | void ftrace_sync_ipi(void *data)
         |      ^~~~~~~~~~~~~~~


vim +/ftrace_sync_ipi +191 kernel/trace/ftrace.c

   190	
 > 191	void ftrace_sync_ipi(void *data)
   192	{
   193		/* Probably not needed, but do it anyway */
   194		smp_rmb();
   195	}
   196	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 01/12] riscv: ftrace: support fastcc in Clang for WITH_ARGS
  2025-04-07 18:08 [PATCH v4 01/12] riscv: ftrace: support fastcc in Clang for WITH_ARGS Andy Chiu
                   ` (5 preceding siblings ...)
  2025-04-07 18:08 ` [PATCH v4 11/12] riscv: ftrace: support direct call using call_ops Andy Chiu
@ 2025-04-10 20:05 ` Björn Töpel
  2025-05-07 13:58   ` Andy Chiu
  2025-06-02 22:12 ` patchwork-bot+linux-riscv
  7 siblings, 1 reply; 17+ messages in thread
From: Björn Töpel @ 2025-04-10 20:05 UTC (permalink / raw)
  To: Andy Chiu, linux-riscv, alexghiti, palmer, Steven Rostedt
  Cc: Andy Chiu, Evgenii Shatokhin, Nathan Chancellor,
	Björn Töpel, Palmer Dabbelt, Puranjay Mohan,
	linux-kernel, linux-trace-kernel, Alexandre Ghiti, paul.walmsley,
	samuel.holland

(Trimming the Cc:-list a bit)

Hi Andy!

Andy Chiu <andybnac@gmail.com> writes:

> From: Andy Chiu <andy.chiu@sifive.com>
...

(Oh, if you send another revision, please add a cover letter, so it's
easier to comment on the full series.)

Thanks a lot for doing the lion's share of this series, Andy! With this
in place, the stop_machine() way of text patching is *finally* behind
us, and we can move RISC-V out of the 20th century. ;-)

I applied your series, and Steven's series [1] to [2], and ran that on
QEMU (riscv64, ~RVA23), and on Milk-V Jupiter (Spacemit K1) with:
 * CONFIG_FTRACE_STARTUP_TEST
 * ftrace kselftest

No visible regressions, and now the ftrace kselftest can actually
complete! For the series:

Tested-by: Björn Töpel <bjorn@rivosinc.com>

[1] https://lore.kernel.org/all/20250409151549.788068911@goodmis.org/
[2] commit 3b07108ada81 ("Merge tag 'linux_kselftest-fixes-6.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest")


Björn

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 05/12] riscv: ftrace: prepare ftrace for atomic code patching
  2025-04-07 18:08 ` [PATCH v4 05/12] riscv: ftrace: prepare ftrace for atomic code patching Andy Chiu
@ 2025-04-11 13:15   ` Robbin Ehn
  2025-04-23  8:22   ` Alexandre Ghiti
  1 sibling, 0 replies; 17+ messages in thread
From: Robbin Ehn @ 2025-04-11 13:15 UTC (permalink / raw)
  To: Andy Chiu, linux-riscv, alexghiti, palmer
  Cc: Andy Chiu, Björn Töpel, linux-kernel,
	linux-trace-kernel, Mark Rutland, Alexandre Ghiti, puranjay12,
	paul.walmsley, greentime.hu, nick.hu, nylon.chen, eric.lin,
	vicent.chen, zong.li, yongxuan.wang, samuel.holland, olivia.chu,
	c2232430

Hi, thanks,

On Tue, 2025-04-08 at 02:08 +0800, Andy Chiu wrote:
> From: Andy Chiu <andy.chiu@sifive.com>
> 
> We use an AUIPC+JALR pair to jump into an ftrace trampoline. Since
> instruction fetch can break this down into 4-byte accesses, it is
> impossible to update the two instructions atomically without a race. To
> mitigate it, we initialize each patchable entry to AUIPC + NOP4, so that
> run-time code patching only has to flip NOP4 to JALR to enable/disable
> ftrace for a function. This limits the reach of each ftrace entry to a
> +-2KB displacement from ftrace_caller.
> 
> Starting from the trampoline, we add a level of indirection for it to
> reach the ftrace call target: it loads the target address from a memory
> location (ftrace_call_dest), then performs the jump. This enables the
> kernel to update the target atomically.
> 
> The new don't-stop-the-world text patching changes only one RISC-V
> instruction per transition:
> 
>   |  -8: &ftrace_ops of the associated tracer function.
>   | <ftrace enable>:
>   |   0: auipc  t0, hi(ftrace_caller)
>   |   4: jalr   t0, lo(ftrace_caller)
>   |
>   |  -8: &ftrace_nop_ops
>   | <ftrace disable>:
>   |   0: auipc  t0, hi(ftrace_caller)
>   |   4: nop
> 
> This means that f+0x0 is fixed and should not be claimed by ftrace;
> e.g., kprobes should be able to put a probe at f+0x0. Thus, we adjust
> the offset and MCOUNT_INSN_SIZE accordingly.
> 
> Co-developed-by: Björn Töpel <bjorn@rivosinc.com>
> Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
> Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
> ---
> Changelog v4:
>  - Include Björn's fix for kprobe
>  - Refactor code for better reading (Robbin, Björn)
>  - Remove make_call_ra and friends (Björn)
>  - Update comments to match reality (Björn)
>  - Drop code defined by !WITH_ARG
>  - Add a synchronization point when updating ftrace_call_dest (Björn)
> ---
>  arch/riscv/include/asm/ftrace.h |  49 ++++++------
>  arch/riscv/kernel/ftrace.c      | 130 ++++++++++++++++----------------
>  arch/riscv/kernel/mcount-dyn.S  |   9 +--
>  3 files changed, 92 insertions(+), 96 deletions(-)
> 
> diff --git a/arch/riscv/include/asm/ftrace.h b/arch/riscv/include/asm/ftrace.h
> index d8b2138bd9c6..6a5c0a7fb826 100644
> --- a/arch/riscv/include/asm/ftrace.h
> +++ b/arch/riscv/include/asm/ftrace.h
> @@ -20,10 +20,9 @@ extern void *return_address(unsigned int level);
>  #define ftrace_return_address(n) return_address(n)
>  
>  void _mcount(void);
> -static inline unsigned long ftrace_call_adjust(unsigned long addr)
> -{
> -	return addr;
> -}
> +unsigned long ftrace_call_adjust(unsigned long addr);
> +unsigned long arch_ftrace_get_symaddr(unsigned long fentry_ip);
> +#define ftrace_get_symaddr(fentry_ip) arch_ftrace_get_symaddr(fentry_ip)
>  
>  /*
>   * Let's do like x86/arm64 and ignore the compat syscalls.
> @@ -57,12 +56,21 @@ struct dyn_arch_ftrace {
>   * 2) jalr: setting low-12 offset to ra, jump to ra, and set ra to
>   *          return address (original pc + 4)
>   *
> + * The first 2 instructions for each traceable function are compiled to 2 nop
> + * instructions. Then, the kernel initializes the first instruction to auipc at
> + * boot time (<ftrace disable>). The second instruction is patched to jalr to
> + * start the trace.
> + *
> + *<Image>:
> + * 0: nop
> + * 4: nop
> + *
>   *<ftrace enable>:
> - * 0: auipc  t0/ra, 0x?
> - * 4: jalr   t0/ra, ?(t0/ra)
> + * 0: auipc  t0, 0x?
> + * 4: jalr   t0, ?(t0)
>   *
>   *<ftrace disable>:
> - * 0: nop
> + * 0: auipc  t0, 0x?
>   * 4: nop
>   *
>   * Dynamic ftrace generates probes to call sites, so we must deal with
> @@ -75,10 +83,9 @@ struct dyn_arch_ftrace {
>  #define AUIPC_OFFSET_MASK	(0xfffff000)
>  #define AUIPC_PAD		(0x00001000)
>  #define JALR_SHIFT		20
> -#define JALR_RA			(0x000080e7)
> -#define AUIPC_RA		(0x00000097)
>  #define JALR_T0			(0x000282e7)
>  #define AUIPC_T0		(0x00000297)
> +#define JALR_RANGE		(JALR_SIGN_MASK - 1)
>  
>  #define to_jalr_t0(offset)						\
>  	(((offset & JALR_OFFSET_MASK) << JALR_SHIFT) | JALR_T0)
> @@ -96,26 +103,14 @@ do {									\
>  	call[1] = to_jalr_t0(offset);					\
>  } while (0)
>  
> -#define to_jalr_ra(offset)						\
> -	(((offset & JALR_OFFSET_MASK) << JALR_SHIFT) | JALR_RA)
> -
> -#define to_auipc_ra(offset)						\
> -	((offset & JALR_SIGN_MASK) ?					\
> -	(((offset & AUIPC_OFFSET_MASK) + AUIPC_PAD) | AUIPC_RA) :	\
> -	((offset & AUIPC_OFFSET_MASK) | AUIPC_RA))
> -
> -#define make_call_ra(caller, callee, call)				\
> -do {									\
> -	unsigned int offset =						\
> -		(unsigned long) (callee) - (unsigned long) (caller);	\
> -	call[0] = to_auipc_ra(offset);					\
> -	call[1] = to_jalr_ra(offset);					\
> -} while (0)
> -
>  /*
> - * Let auipc+jalr be the basic *mcount unit*, so we make it 8 bytes here.
> + * Only the jalr insn in the auipc+jalr is patched, so we make it 4
> + * bytes here.
>   */
> -#define MCOUNT_INSN_SIZE 8
> +#define MCOUNT_INSN_SIZE	4
> +#define MCOUNT_AUIPC_SIZE	4
> +#define MCOUNT_JALR_SIZE	4
> +#define MCOUNT_NOP4_SIZE	4
>  
>  #ifndef __ASSEMBLY__
>  struct dyn_ftrace;
> diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c
> index 1fd10555c580..cf78eef073a0 100644
> --- a/arch/riscv/kernel/ftrace.c
> +++ b/arch/riscv/kernel/ftrace.c
> @@ -8,10 +8,21 @@
>  #include <linux/ftrace.h>
>  #include <linux/uaccess.h>
>  #include <linux/memory.h>
> +#include <linux/irqflags.h>
>  #include <linux/stop_machine.h>
>  #include <asm/cacheflush.h>
>  #include <asm/text-patching.h>
>  
> +unsigned long ftrace_call_adjust(unsigned long addr)
> +{
> +	return addr + MCOUNT_AUIPC_SIZE;
> +}
> +
> +unsigned long arch_ftrace_get_symaddr(unsigned long fentry_ip)
> +{
> +	return fentry_ip - MCOUNT_AUIPC_SIZE;
> +}
> +
>  #ifdef CONFIG_DYNAMIC_FTRACE
>  void ftrace_arch_code_modify_prepare(void) __acquires(&text_mutex)
>  {
> @@ -32,51 +43,32 @@ void ftrace_arch_code_modify_post_process(void) __releases(&text_mutex)
>  	mutex_unlock(&text_mutex);
>  }
>  
> -static int ftrace_check_current_call(unsigned long hook_pos,
> -				     unsigned int *expected)
> +static int __ftrace_modify_call(unsigned long source, unsigned long target, bool validate)
>  {
> +	unsigned int call[2], offset;

There is a mix of unsigned int and u32.
It doesn't seem to serve any purpose?
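
For instance (just an illustration, not a requirement), picking u32 for
all of them would make the intent obvious:

	u32 call[2], replaced[2], offset;
	u32 nop4 = RISCV_INSN_NOP4;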

>  	unsigned int replaced[2];
> -	unsigned int nops[2] = {RISCV_INSN_NOP4, RISCV_INSN_NOP4};
>  
> -	/* we expect nops at the hook position */
> -	if (!expected)
> -		expected = nops;
> +	offset = target - source;
> +	call[1] = to_jalr_t0(offset);
>  
> -	/*
> -	 * Read the text we want to modify;
> -	 * return must be -EFAULT on read error
> -	 */
> -	if (copy_from_kernel_nofault(replaced, (void *)hook_pos,
> -			MCOUNT_INSN_SIZE))
> -		return -EFAULT;
> -
> -	/*
> -	 * Make sure it is what we expect it to be;
> -	 * return must be -EINVAL on failed comparison
> -	 */
> -	if (memcmp(expected, replaced, sizeof(replaced))) {
> -		pr_err("%p: expected (%08x %08x) but got (%08x %08x)\n",
> -		       (void *)hook_pos, expected[0], expected[1], replaced[0],
> -		       replaced[1]);
> -		return -EINVAL;
> +	if (validate) {
> +		call[0] = to_auipc_t0(offset);
> +		/*
> +		 * Read the text we want to modify;
> +		 * return must be -EFAULT on read error
> +		 */
> +		if (copy_from_kernel_nofault(replaced, (void *)source, 2 * MCOUNT_INSN_SIZE))
> +			return -EFAULT;
> +
> +		if (replaced[0] != call[0]) {
> +			pr_err("%p: expected (%08x) but got (%08x)\n",
> +			       (void *)source, call[0], replaced[0]);
> +			return -EINVAL;
> +		}
>  	}
>  
> -	return 0;
> -}
> -
> -static int __ftrace_modify_call(unsigned long hook_pos, unsigned long target,
> -				bool enable, bool ra)
> -{
> -	unsigned int call[2];
> -	unsigned int nops[2] = {RISCV_INSN_NOP4, RISCV_INSN_NOP4};
> -
> -	if (ra)
> -		make_call_ra(hook_pos, target, call);
> -	else
> -		make_call_t0(hook_pos, target, call);
> -
> -	/* Replace the auipc-jalr pair at once. Return -EPERM on write error. */
> -	if (patch_insn_write((void *)hook_pos, enable ? call : nops, MCOUNT_INSN_SIZE))
> +	/* Replace the jalr at once. Return -EPERM on write error. */
> +	if (patch_insn_write((void *)(source + MCOUNT_AUIPC_SIZE), call + 1, MCOUNT_JALR_SIZE))
>  		return -EPERM;
>  
>  	return 0;
> @@ -84,22 +76,21 @@ static int __ftrace_modify_call(unsigned long hook_pos, unsigned long target,
>  
>  int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
>  {
> -	unsigned int call[2];
> +	unsigned long distance, orig_addr, pc = rec->ip - MCOUNT_AUIPC_SIZE;
>  
> -	make_call_t0(rec->ip, addr, call);
> -
> -	if (patch_insn_write((void *)rec->ip, call, MCOUNT_INSN_SIZE))
> -		return -EPERM;
> +	orig_addr = (unsigned long)&ftrace_caller;
> +	distance = addr > orig_addr ? addr - orig_addr : orig_addr - addr;
> +	if (distance > JALR_RANGE)
> +		return -EINVAL;
>  
> -	return 0;
> +	return __ftrace_modify_call(pc, addr, false);
>  }
>  
> -int ftrace_make_nop(struct module *mod, struct dyn_ftrace *rec,
> -		    unsigned long addr)
> +int ftrace_make_nop(struct module *mod, struct dyn_ftrace *rec, unsigned long addr)
>  {
> -	unsigned int nops[2] = {RISCV_INSN_NOP4, RISCV_INSN_NOP4};
> +	u32 nop4 = RISCV_INSN_NOP4;
>  
> -	if (patch_insn_write((void *)rec->ip, nops, MCOUNT_INSN_SIZE))
> +	if (patch_insn_write((void *)rec->ip, &nop4, MCOUNT_NOP4_SIZE))
>  		return -EPERM;
>  
>  	return 0;
> @@ -114,21 +105,38 @@ int ftrace_make_nop(struct module *mod, struct dyn_ftrace *rec,
>   */
>  int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec)
>  {
> -	int out;
> +	unsigned long pc = rec->ip - MCOUNT_AUIPC_SIZE;
> +	unsigned int nops[2], offset;
> +	int ret;
> +
> +	offset = (unsigned long) &ftrace_caller - pc;
> +	nops[0] = to_auipc_t0(offset);
> +	nops[1] = RISCV_INSN_NOP4;
>  
>  	mutex_lock(&text_mutex);
> -	out = ftrace_make_nop(mod, rec, MCOUNT_ADDR);
> +	ret = patch_insn_write((void *)pc, nops, 2 * MCOUNT_INSN_SIZE);
>  	mutex_unlock(&text_mutex);
>  
> -	return out;
> +	return ret;
>  }
>  
> +ftrace_func_t ftrace_call_dest = ftrace_stub;
>  int ftrace_update_ftrace_func(ftrace_func_t func)
>  {
> -	int ret = __ftrace_modify_call((unsigned long)&ftrace_call,
> -				       (unsigned long)func, true, true);
> -
> -	return ret;
> +	WRITE_ONCE(ftrace_call_dest, func);
> +	/*
> +	 * The data fence ensures that the update to ftrace_call_dest happens
> +	 * before the write to function_trace_op later in the generic ftrace.
> +	 * If the sequence is not enforced, then an old ftrace_call_dest may
> +	 * race loading a new function_trace_op set in ftrace_modify_all_code.
> +	 *
> +	 * If we are in stop_machine, then we don't need to call remote fence
> +	 * as there is no concurrent read-side of ftrace_call_dest.
> +	 */
> +	smp_wmb();
> +	if (!irqs_disabled())
> +		smp_call_function(ftrace_sync_ipi, NULL, 1);
> +	return 0;

Here we allow the read of cpu_online_mask to be reordered with the store to
ftrace_call_dest.

Can't the following happen then?

Hart X                 | Hart new
load cpu_online_mask   |
                       | store cpu_online_mask
                       | tlb shootdown/icache flush
store ftrace_call_dest |
ftrace_sync_ipi        | // not hit
                       | sees old value ?
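
For reference, the pairing this relies on looks roughly like the sketch
below (the reader side paraphrases the ftrace_caller trampoline in C):

	/* writer: ftrace_update_ftrace_func(); the function_trace_op
	 * store happens later, in generic ftrace_modify_all_code() */
	WRITE_ONCE(ftrace_call_dest, func);
	smp_wmb();	/* order this store before the function_trace_op store */
	if (!irqs_disabled())
		smp_call_function(ftrace_sync_ipi, NULL, 1);	/* online harts only */

	/* reader: ftrace_caller, paraphrased from mcount-dyn.S */
	op   = READ_ONCE(function_trace_op);	/* PREPARE_ARGS */
	dest = READ_ONCE(ftrace_call_dest);	/* at ftrace_call */
	dest(ip, parent_ip, op, fregs);

The IPI only reaches harts that were in cpu_online_mask when
smp_call_function() sampled it, which is exactly the window in the
diagram above.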

Thanks, Robbin

>  }
>  
>  struct ftrace_modify_param {
> @@ -172,17 +180,11 @@ void arch_ftrace_update_code(int command)
>  int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr,
>  		       unsigned long addr)
>  {
> +	unsigned long caller = rec->ip - MCOUNT_AUIPC_SIZE;
>  	unsigned int call[2];
> -	unsigned long caller = rec->ip;
> -	int ret;
>  
>  	make_call_t0(caller, old_addr, call);
> -	ret = ftrace_check_current_call(caller, call);
> -
> -	if (ret)
> -		return ret;
> -
> -	return __ftrace_modify_call(caller, addr, true, false);
> +	return __ftrace_modify_call(caller, addr, true);
>  }
>  #endif
>  
> diff --git a/arch/riscv/kernel/mcount-dyn.S b/arch/riscv/kernel/mcount-dyn.S
> index 3f06b40bb6c8..8aa554d56096 100644
> --- a/arch/riscv/kernel/mcount-dyn.S
> +++ b/arch/riscv/kernel/mcount-dyn.S
> @@ -13,7 +13,6 @@
>  
>  	.text
>  
> -#define FENTRY_RA_OFFSET	8
>  #define ABI_SIZE_ON_STACK	80
>  #define ABI_A0			0
>  #define ABI_A1			8
> @@ -62,8 +61,7 @@
>  * After the stack is established,
>  *
>  * 0(sp) stores the PC of the traced function which can be accessed
> -* by &(fregs)->epc in tracing function. Note that the real
> -* function entry address should be computed with -FENTRY_RA_OFFSET.
> +* by &(fregs)->epc in tracing function.
>  *
>  * 8(sp) stores the function return address (i.e. parent IP) that
>  * can be accessed by &(fregs)->ra in tracing function.
> @@ -140,7 +138,7 @@
>  	.endm
>  
>  	.macro PREPARE_ARGS
> -	addi	a0, t0, -FENTRY_RA_OFFSET
> +	addi	a0, t0, -MCOUNT_JALR_SIZE	// ip (callsite's jalr insn)
>  	la	a1, function_trace_op
>  	REG_L	a2, 0(a1)
>  	mv	a1, ra
> @@ -153,7 +151,8 @@ SYM_FUNC_START(ftrace_caller)
>  	PREPARE_ARGS
>  
>  SYM_INNER_LABEL(ftrace_call, SYM_L_GLOBAL)
> -	call	ftrace_stub
> +	REG_L	ra, ftrace_call_dest
> +	jalr	ra, 0(ra)
>  
>  	RESTORE_ABI_REGS
>  	bnez	t1, .Ldirect


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 04/12] kernel: ftrace: export ftrace_sync_ipi
  2025-04-08 22:31   ` kernel test robot
@ 2025-04-23  8:13     ` Alexandre Ghiti
  0 siblings, 0 replies; 17+ messages in thread
From: Alexandre Ghiti @ 2025-04-23  8:13 UTC (permalink / raw)
  To: kernel test robot, Andy Chiu, linux-riscv, alexghiti, palmer
  Cc: oe-kbuild-all, linux-kernel, linux-trace-kernel, Mark Rutland,
	Mathieu Desnoyers, bjorn, puranjay12, paul.walmsley, greentime.hu,
	nick.hu, nylon.chen, eric.lin, vicent.chen, zong.li,
	yongxuan.wang, samuel.holland, olivia.chu, c2232430

Hi Andy,

On 09/04/2025 00:31, kernel test robot wrote:
> Hi Andy,
>
> kernel test robot noticed the following build warnings:
>
> [auto build test WARNING on linus/master]
> [also build test WARNING on v6.15-rc1 next-20250408]
> [cannot apply to trace/for-next]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch#_base_tree_information]
>
> url:    https://github.com/intel-lab-lkp/linux/commits/Andy-Chiu/riscv-ftrace-factor-out-code-defined-by-WITH_ARG/20250408-025114
> base:   linus/master
> patch link:    https://lore.kernel.org/r/20250407180838.42877-4-andybnac%40gmail.com
> patch subject: [PATCH v4 04/12] kernel: ftrace: export ftrace_sync_ipi
> config: xtensa-randconfig-001-20250409 (https://download.01.org/0day-ci/archive/20250409/202504090657.5fiH4UIS-lkp@intel.com/config)
> compiler: xtensa-linux-gcc (GCC) 11.5.0
> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250409/202504090657.5fiH4UIS-lkp@intel.com/reproduce)
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202504090657.5fiH4UIS-lkp@intel.com/
>
> All warnings (new ones prefixed by >>):
>
>>> kernel/trace/ftrace.c:191:6: warning: no previous prototype for 'ftrace_sync_ipi' [-Wmissing-prototypes]
>       191 | void ftrace_sync_ipi(void *data)
>           |      ^~~~~~~~~~~~~~~
>
>
> vim +/ftrace_sync_ipi +191 kernel/trace/ftrace.c
>
>     190	
>   > 191	void ftrace_sync_ipi(void *data)
>     192	{
>     193		/* Probably not needed, but do it anyway */
>     194		smp_rmb();
>     195	}
>     196	
>

I fixed this with the following, which I squashed into your patch; no
need to resend a new series for this.

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 0d4eec5747074..30374478cb077 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -635,6 +635,8 @@ enum {
  #define ftrace_get_symaddr(fentry_ip) (0)
  #endif

+void ftrace_sync_ipi(void *data);
+
  #ifdef CONFIG_DYNAMIC_FTRACE

  void ftrace_arch_code_modify_prepare(void);
@@ -807,7 +809,6 @@ extern void ftrace_call(void);
  extern void ftrace_regs_call(void);
  extern void mcount_call(void);

-void ftrace_sync_ipi(void *data);
  void ftrace_modify_all_code(int command);

  #ifndef FTRACE_ADDR


Thanks,

Alex


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 05/12] riscv: ftrace: prepare ftrace for atomic code patching
  2025-04-07 18:08 ` [PATCH v4 05/12] riscv: ftrace: prepare ftrace for atomic code patching Andy Chiu
  2025-04-11 13:15   ` Robbin Ehn
@ 2025-04-23  8:22   ` Alexandre Ghiti
  2025-05-05 14:06     ` Alexandre Ghiti
  1 sibling, 1 reply; 17+ messages in thread
From: Alexandre Ghiti @ 2025-04-23  8:22 UTC (permalink / raw)
  To: Andy Chiu, linux-riscv, alexghiti, palmer
  Cc: Andy Chiu, Björn Töpel, linux-kernel,
	linux-trace-kernel, Mark Rutland, puranjay12, paul.walmsley,
	greentime.hu, nick.hu, nylon.chen, eric.lin, vicent.chen, zong.li,
	yongxuan.wang, samuel.holland, olivia.chu, c2232430

Hi Andy,

On 07/04/2025 20:08, Andy Chiu wrote:
> From: Andy Chiu <andy.chiu@sifive.com>
>
> We use an AUIPC+JALR pair to jump into a ftrace trampoline. Since
> instruction fetch can break down to 4 bytes at a time, it is impossible
> to update two instructions without a race. In order to mitigate it, we
> initialize the patchable entry to AUIPC + NOP4. Then, the run-time code
> patching can change NOP4 to JALR to enable/disable ftrace for a
> function. This limits the reach of each ftrace entry to a +-2KB
> displacement from ftrace_caller.
>
> Starting from the trampoline, we add a level of indirection for it to
> reach the ftrace call target. Now, it loads the target address from a
> memory location, then performs the jump. This enables the kernel to
> update the target atomically.
>
> The new don't-stop-the-world text patching changes only one RISC-V
> instruction:
>
>    |  -8: &ftrace_ops of the associated tracer function.
>    | <ftrace enable>:
>    |   0: auipc  t0, hi(ftrace_caller)
>    |   4: jalr   t0, lo(ftrace_caller)
>    |
>    |  -8: &ftrace_nop_ops
>    | <ftrace disable>:
>    |   0: auipc  t0, hi(ftrace_caller)
>    |   4: nop
>
> This means that f+0x0 is fixed, and should not be claimed by ftrace,
> e.g. kprobe should be able to put a probe in f+0x0. Thus, we adjust the
> offset and MCOUNT_INSN_SIZE accordingly.
>
> Co-developed-by: Björn Töpel <bjorn@rivosinc.com>
> Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
> Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
> ---
> Changelog v4:
>   - Include Björn's fix for kprobe
>   - Refactor code for better reading (Robbin, Björn)
>   - Remove make_call_ra and friends (Björn)
>   - Update comments to match reality (Björn)
>   - Drop code defined by !WITH_ARG
>   - Add a synchronization point when updating ftrace_call_dest (Björn)
> ---
>   arch/riscv/include/asm/ftrace.h |  49 ++++++------
>   arch/riscv/kernel/ftrace.c      | 130 ++++++++++++++++----------------
>   arch/riscv/kernel/mcount-dyn.S  |   9 +--
>   3 files changed, 92 insertions(+), 96 deletions(-)
>
> diff --git a/arch/riscv/include/asm/ftrace.h b/arch/riscv/include/asm/ftrace.h
> index d8b2138bd9c6..6a5c0a7fb826 100644
> --- a/arch/riscv/include/asm/ftrace.h
> +++ b/arch/riscv/include/asm/ftrace.h
> @@ -20,10 +20,9 @@ extern void *return_address(unsigned int level);
>   #define ftrace_return_address(n) return_address(n)
>   
>   void _mcount(void);
> -static inline unsigned long ftrace_call_adjust(unsigned long addr)
> -{
> -	return addr;
> -}
> +unsigned long ftrace_call_adjust(unsigned long addr);
> +unsigned long arch_ftrace_get_symaddr(unsigned long fentry_ip);
> +#define ftrace_get_symaddr(fentry_ip) arch_ftrace_get_symaddr(fentry_ip)
>   
>   /*
>    * Let's do like x86/arm64 and ignore the compat syscalls.
> @@ -57,12 +56,21 @@ struct dyn_arch_ftrace {
>    * 2) jalr: setting low-12 offset to ra, jump to ra, and set ra to
>    *          return address (original pc + 4)
>    *
> + * The first 2 instructions for each traceable function are compiled to 2 nop
> + * instructions. Then, the kernel initializes the first instruction to auipc at
> + * boot time (<ftrace disable>). The second instruction is patched to jalr to
> + * start the trace.
> + *
> + *<Image>:
> + * 0: nop
> + * 4: nop
> + *
>    *<ftrace enable>:
> - * 0: auipc  t0/ra, 0x?
> - * 4: jalr   t0/ra, ?(t0/ra)
> + * 0: auipc  t0, 0x?
> + * 4: jalr   t0, ?(t0)
>    *
>    *<ftrace disable>:
> - * 0: nop
> + * 0: auipc  t0, 0x?
>    * 4: nop
>    *
>    * Dynamic ftrace generates probes to call sites, so we must deal with
> @@ -75,10 +83,9 @@ struct dyn_arch_ftrace {
>   #define AUIPC_OFFSET_MASK	(0xfffff000)
>   #define AUIPC_PAD		(0x00001000)
>   #define JALR_SHIFT		20
> -#define JALR_RA			(0x000080e7)
> -#define AUIPC_RA		(0x00000097)
>   #define JALR_T0			(0x000282e7)
>   #define AUIPC_T0		(0x00000297)
> +#define JALR_RANGE		(JALR_SIGN_MASK - 1)
>   
>   #define to_jalr_t0(offset)						\
>   	(((offset & JALR_OFFSET_MASK) << JALR_SHIFT) | JALR_T0)
> @@ -96,26 +103,14 @@ do {									\
>   	call[1] = to_jalr_t0(offset);					\
>   } while (0)
>   
> -#define to_jalr_ra(offset)						\
> -	(((offset & JALR_OFFSET_MASK) << JALR_SHIFT) | JALR_RA)
> -
> -#define to_auipc_ra(offset)						\
> -	((offset & JALR_SIGN_MASK) ?					\
> -	(((offset & AUIPC_OFFSET_MASK) + AUIPC_PAD) | AUIPC_RA) :	\
> -	((offset & AUIPC_OFFSET_MASK) | AUIPC_RA))
> -
> -#define make_call_ra(caller, callee, call)				\
> -do {									\
> -	unsigned int offset =						\
> -		(unsigned long) (callee) - (unsigned long) (caller);	\
> -	call[0] = to_auipc_ra(offset);					\
> -	call[1] = to_jalr_ra(offset);					\
> -} while (0)
> -
>   /*
> - * Let auipc+jalr be the basic *mcount unit*, so we make it 8 bytes here.
> + * Only the jalr insn in the auipc+jalr is patched, so we make it 4
> + * bytes here.
>    */
> -#define MCOUNT_INSN_SIZE 8
> +#define MCOUNT_INSN_SIZE	4
> +#define MCOUNT_AUIPC_SIZE	4
> +#define MCOUNT_JALR_SIZE	4
> +#define MCOUNT_NOP4_SIZE	4
>   
>   #ifndef __ASSEMBLY__
>   struct dyn_ftrace;
> diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c
> index 1fd10555c580..cf78eef073a0 100644
> --- a/arch/riscv/kernel/ftrace.c
> +++ b/arch/riscv/kernel/ftrace.c
> @@ -8,10 +8,21 @@
>   #include <linux/ftrace.h>
>   #include <linux/uaccess.h>
>   #include <linux/memory.h>
> +#include <linux/irqflags.h>
>   #include <linux/stop_machine.h>
>   #include <asm/cacheflush.h>
>   #include <asm/text-patching.h>
>   
> +unsigned long ftrace_call_adjust(unsigned long addr)
> +{
> +	return addr + MCOUNT_AUIPC_SIZE;
> +}
> +
> +unsigned long arch_ftrace_get_symaddr(unsigned long fentry_ip)
> +{
> +	return fentry_ip - MCOUNT_AUIPC_SIZE;
> +}
> +


Those functions cause the following errors when building with 
!CONFIG_DYNAMIC_FTRACE, but I'm not sure how to fix this:

../arch/riscv/kernel/ftrace.c: In function 'ftrace_call_adjust':
../arch/riscv/kernel/ftrace.c:19:35: error: 'MCOUNT_AUIPC_SIZE' 
undeclared (first use in this function)
    19 |                 return addr + 8 + MCOUNT_AUIPC_SIZE;
       |                                   ^~~~~~~~~~~~~~~~~
../arch/riscv/kernel/ftrace.c:19:35: note: each undeclared identifier is 
reported only once for each function it appears in
   CC      fs/9p/vfs_dir.o
../arch/riscv/kernel/ftrace.c: In function 'arch_ftrace_get_symaddr':
../arch/riscv/kernel/ftrace.c:26:28: error: 'MCOUNT_AUIPC_SIZE' 
undeclared (first use in this function)
    26 |         return fentry_ip - MCOUNT_AUIPC_SIZE;
       |                            ^~~~~~~~~~~~~~~~~
   CC      drivers/pci/pcie/pme.o
../arch/riscv/kernel/ftrace.c: In function 'ftrace_call_adjust':
../arch/riscv/kernel/ftrace.c:22:1: error: control reaches end of 
non-void function [-Werror=return-type]
    22 | }
       | ^
../arch/riscv/kernel/ftrace.c: In function 'arch_ftrace_get_symaddr':
../arch/riscv/kernel/ftrace.c:27:1: error: control reaches end of 
non-void function [-Werror=return-type]
    27 | }
       | ^
cc1: some warnings being treated as errors
make[5]: *** [../scripts/Makefile.build:203: arch/riscv/kernel/ftrace.o] 
Error 1


>   #ifdef CONFIG_DYNAMIC_FTRACE
>   void ftrace_arch_code_modify_prepare(void) __acquires(&text_mutex)
>   {
> @@ -32,51 +43,32 @@ void ftrace_arch_code_modify_post_process(void) __releases(&text_mutex)
>   	mutex_unlock(&text_mutex);
>   }
>   
> -static int ftrace_check_current_call(unsigned long hook_pos,
> -				     unsigned int *expected)
> +static int __ftrace_modify_call(unsigned long source, unsigned long target, bool validate)
>   {
> +	unsigned int call[2], offset;
>   	unsigned int replaced[2];
> -	unsigned int nops[2] = {RISCV_INSN_NOP4, RISCV_INSN_NOP4};
>   
> -	/* we expect nops at the hook position */
> -	if (!expected)
> -		expected = nops;
> +	offset = target - source;
> +	call[1] = to_jalr_t0(offset);
>   
> -	/*
> -	 * Read the text we want to modify;
> -	 * return must be -EFAULT on read error
> -	 */
> -	if (copy_from_kernel_nofault(replaced, (void *)hook_pos,
> -			MCOUNT_INSN_SIZE))
> -		return -EFAULT;
> -
> -	/*
> -	 * Make sure it is what we expect it to be;
> -	 * return must be -EINVAL on failed comparison
> -	 */
> -	if (memcmp(expected, replaced, sizeof(replaced))) {
> -		pr_err("%p: expected (%08x %08x) but got (%08x %08x)\n",
> -		       (void *)hook_pos, expected[0], expected[1], replaced[0],
> -		       replaced[1]);
> -		return -EINVAL;
> +	if (validate) {
> +		call[0] = to_auipc_t0(offset);
> +		/*
> +		 * Read the text we want to modify;
> +		 * return must be -EFAULT on read error
> +		 */
> +		if (copy_from_kernel_nofault(replaced, (void *)source, 2 * MCOUNT_INSN_SIZE))
> +			return -EFAULT;
> +
> +		if (replaced[0] != call[0]) {
> +			pr_err("%p: expected (%08x) but got (%08x)\n",
> +			       (void *)source, call[0], replaced[0]);
> +			return -EINVAL;
> +		}
>   	}
>   
> -	return 0;
> -}
> -
> -static int __ftrace_modify_call(unsigned long hook_pos, unsigned long target,
> -				bool enable, bool ra)
> -{
> -	unsigned int call[2];
> -	unsigned int nops[2] = {RISCV_INSN_NOP4, RISCV_INSN_NOP4};
> -
> -	if (ra)
> -		make_call_ra(hook_pos, target, call);
> -	else
> -		make_call_t0(hook_pos, target, call);
> -
> -	/* Replace the auipc-jalr pair at once. Return -EPERM on write error. */
> -	if (patch_insn_write((void *)hook_pos, enable ? call : nops, MCOUNT_INSN_SIZE))
> +	/* Replace the jalr at once. Return -EPERM on write error. */
> +	if (patch_insn_write((void *)(source + MCOUNT_AUIPC_SIZE), call + 1, MCOUNT_JALR_SIZE))
>   		return -EPERM;
>   
>   	return 0;
> @@ -84,22 +76,21 @@ static int __ftrace_modify_call(unsigned long hook_pos, unsigned long target,
>   
>   int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
>   {
> -	unsigned int call[2];
> +	unsigned long distance, orig_addr, pc = rec->ip - MCOUNT_AUIPC_SIZE;
>   
> -	make_call_t0(rec->ip, addr, call);
> -
> -	if (patch_insn_write((void *)rec->ip, call, MCOUNT_INSN_SIZE))
> -		return -EPERM;
> +	orig_addr = (unsigned long)&ftrace_caller;
> +	distance = addr > orig_addr ? addr - orig_addr : orig_addr - addr;
> +	if (distance > JALR_RANGE)
> +		return -EINVAL;
>   
> -	return 0;
> +	return __ftrace_modify_call(pc, addr, false);
>   }
>   
> -int ftrace_make_nop(struct module *mod, struct dyn_ftrace *rec,
> -		    unsigned long addr)
> +int ftrace_make_nop(struct module *mod, struct dyn_ftrace *rec, unsigned long addr)
>   {
> -	unsigned int nops[2] = {RISCV_INSN_NOP4, RISCV_INSN_NOP4};
> +	u32 nop4 = RISCV_INSN_NOP4;
>   
> -	if (patch_insn_write((void *)rec->ip, nops, MCOUNT_INSN_SIZE))
> +	if (patch_insn_write((void *)rec->ip, &nop4, MCOUNT_NOP4_SIZE))
>   		return -EPERM;
>   
>   	return 0;
> @@ -114,21 +105,38 @@ int ftrace_make_nop(struct module *mod, struct dyn_ftrace *rec,
>    */
>   int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec)
>   {
> -	int out;
> +	unsigned long pc = rec->ip - MCOUNT_AUIPC_SIZE;
> +	unsigned int nops[2], offset;
> +	int ret;
> +
> +	offset = (unsigned long) &ftrace_caller - pc;
> +	nops[0] = to_auipc_t0(offset);
> +	nops[1] = RISCV_INSN_NOP4;
>   
>   	mutex_lock(&text_mutex);
> -	out = ftrace_make_nop(mod, rec, MCOUNT_ADDR);
> +	ret = patch_insn_write((void *)pc, nops, 2 * MCOUNT_INSN_SIZE);
>   	mutex_unlock(&text_mutex);
>   
> -	return out;
> +	return ret;
>   }
>   
> +ftrace_func_t ftrace_call_dest = ftrace_stub;
>   int ftrace_update_ftrace_func(ftrace_func_t func)
>   {
> -	int ret = __ftrace_modify_call((unsigned long)&ftrace_call,
> -				       (unsigned long)func, true, true);
> -
> -	return ret;
> +	WRITE_ONCE(ftrace_call_dest, func);
> +	/*
> +	 * The data fence ensures that the update to ftrace_call_dest happens
> +	 * before the write to function_trace_op later in the generic ftrace.
> +	 * If the sequence is not enforced, then an old ftrace_call_dest may
> +	 * race loading a new function_trace_op set in ftrace_modify_all_code.
> +	 *
> +	 * If we are in stop_machine, then we don't need to call remote fence
> +	 * as there is no concurrent read-side of ftrace_call_dest.
> +	 */
> +	smp_wmb();
> +	if (!irqs_disabled())
> +		smp_call_function(ftrace_sync_ipi, NULL, 1);
> +	return 0;
>   }
>   
>   struct ftrace_modify_param {
> @@ -172,17 +180,11 @@ void arch_ftrace_update_code(int command)
>   int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr,
>   		       unsigned long addr)
>   {
> +	unsigned long caller = rec->ip - MCOUNT_AUIPC_SIZE;
>   	unsigned int call[2];
> -	unsigned long caller = rec->ip;
> -	int ret;
>   
>   	make_call_t0(caller, old_addr, call);
> -	ret = ftrace_check_current_call(caller, call);
> -
> -	if (ret)
> -		return ret;
> -
> -	return __ftrace_modify_call(caller, addr, true, false);
> +	return __ftrace_modify_call(caller, addr, true);
>   }
>   #endif
>   
> diff --git a/arch/riscv/kernel/mcount-dyn.S b/arch/riscv/kernel/mcount-dyn.S
> index 3f06b40bb6c8..8aa554d56096 100644
> --- a/arch/riscv/kernel/mcount-dyn.S
> +++ b/arch/riscv/kernel/mcount-dyn.S
> @@ -13,7 +13,6 @@
>   
>   	.text
>   
> -#define FENTRY_RA_OFFSET	8
>   #define ABI_SIZE_ON_STACK	80
>   #define ABI_A0			0
>   #define ABI_A1			8
> @@ -62,8 +61,7 @@
>   * After the stack is established,
>   *
>   * 0(sp) stores the PC of the traced function which can be accessed
> -* by &(fregs)->epc in tracing function. Note that the real
> -* function entry address should be computed with -FENTRY_RA_OFFSET.
> +* by &(fregs)->epc in tracing function.
>   *
>   * 8(sp) stores the function return address (i.e. parent IP) that
>   * can be accessed by &(fregs)->ra in tracing function.
> @@ -140,7 +138,7 @@
>   	.endm
>   
>   	.macro PREPARE_ARGS
> -	addi	a0, t0, -FENTRY_RA_OFFSET
> +	addi	a0, t0, -MCOUNT_JALR_SIZE	// ip (callsite's jalr insn)
>   	la	a1, function_trace_op
>   	REG_L	a2, 0(a1)
>   	mv	a1, ra
> @@ -153,7 +151,8 @@ SYM_FUNC_START(ftrace_caller)
>   	PREPARE_ARGS
>   
>   SYM_INNER_LABEL(ftrace_call, SYM_L_GLOBAL)
> -	call	ftrace_stub
> +	REG_L	ra, ftrace_call_dest
> +	jalr	ra, 0(ra)
>   
>   	RESTORE_ABI_REGS
>   	bnez	t1, .Ldirect

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 05/12] riscv: ftrace: prepare ftrace for atomic code patching
  2025-04-23  8:22   ` Alexandre Ghiti
@ 2025-05-05 14:06     ` Alexandre Ghiti
  2025-05-07 14:18       ` Andy Chiu
  0 siblings, 1 reply; 17+ messages in thread
From: Alexandre Ghiti @ 2025-05-05 14:06 UTC (permalink / raw)
  To: Andy Chiu, linux-riscv, alexghiti, palmer
  Cc: Andy Chiu, Björn Töpel, linux-kernel,
	linux-trace-kernel, Mark Rutland, puranjay12, paul.walmsley,
	greentime.hu, nick.hu, nylon.chen, eric.lin, vicent.chen, zong.li,
	yongxuan.wang, samuel.holland, olivia.chu, c2232430

On 23/04/2025 10:22, Alexandre Ghiti wrote:
> Hi Andy,
>
> On 07/04/2025 20:08, Andy Chiu wrote:
>> From: Andy Chiu <andy.chiu@sifive.com>
>>
>> We use an AUIPC+JALR pair to jump into a ftrace trampoline. Since
>> instruction fetch can break down to 4 bytes at a time, it is impossible
>> to update two instructions without a race. In order to mitigate it, we
>> initialize the patchable entry to AUIPC + NOP4. Then, the run-time code
>> patching can change NOP4 to JALR to enable/disable ftrace for a
>> function. This limits the reach of each ftrace entry to a +-2KB
>> displacement from ftrace_caller.
>>
>> Starting from the trampoline, we add a level of indirection for it to
>> reach the ftrace call target. Now, it loads the target address from a
>> memory location, then performs the jump. This enables the kernel to
>> update the target atomically.
>>
>> The new don't-stop-the-world text patching changes only one RISC-V
>> instruction:
>>
>>    |  -8: &ftrace_ops of the associated tracer function.
>>    | <ftrace enable>:
>>    |   0: auipc  t0, hi(ftrace_caller)
>>    |   4: jalr   t0, lo(ftrace_caller)
>>    |
>>    |  -8: &ftrace_nop_ops
>>    | <ftrace disable>:
>>    |   0: auipc  t0, hi(ftrace_caller)
>>    |   4: nop
>>
>> This means that f+0x0 is fixed, and should not be claimed by ftrace,
>> e.g. kprobe should be able to put a probe in f+0x0. Thus, we adjust the
>> offset and MCOUNT_INSN_SIZE accordingly.
>>
>> Co-developed-by: Björn Töpel <bjorn@rivosinc.com>
>> Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
>> Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
>> ---
>> Changelog v4:
>>   - Include Björn's fix for kprobe
>>   - Refactor code for better reading (Robbin, Björn)
>>   - Remove make_call_ra and friends (Björn)
>>   - Update comments to match reality (Björn)
>>   - Drop code defined by !WITH_ARG
>>   - Add a synchronization point when updating ftrace_call_dest (Björn)
>> ---
>>   arch/riscv/include/asm/ftrace.h |  49 ++++++------
>>   arch/riscv/kernel/ftrace.c      | 130 ++++++++++++++++----------------
>>   arch/riscv/kernel/mcount-dyn.S  |   9 +--
>>   3 files changed, 92 insertions(+), 96 deletions(-)
>>
>> diff --git a/arch/riscv/include/asm/ftrace.h 
>> b/arch/riscv/include/asm/ftrace.h
>> index d8b2138bd9c6..6a5c0a7fb826 100644
>> --- a/arch/riscv/include/asm/ftrace.h
>> +++ b/arch/riscv/include/asm/ftrace.h
>> @@ -20,10 +20,9 @@ extern void *return_address(unsigned int level);
>>   #define ftrace_return_address(n) return_address(n)
>>     void _mcount(void);
>> -static inline unsigned long ftrace_call_adjust(unsigned long addr)
>> -{
>> -    return addr;
>> -}
>> +unsigned long ftrace_call_adjust(unsigned long addr);
>> +unsigned long arch_ftrace_get_symaddr(unsigned long fentry_ip);
>> +#define ftrace_get_symaddr(fentry_ip) 
>> arch_ftrace_get_symaddr(fentry_ip)
>>     /*
>>    * Let's do like x86/arm64 and ignore the compat syscalls.
>> @@ -57,12 +56,21 @@ struct dyn_arch_ftrace {
>>    * 2) jalr: setting low-12 offset to ra, jump to ra, and set ra to
>>    *          return address (original pc + 4)
>>    *
>> + * The first 2 instructions for each traceable function are compiled to 2 nop
>> + * instructions. Then, the kernel initializes the first instruction to auipc at
>> + * boot time (<ftrace disable>). The second instruction is patched to jalr to
>> + * start the trace.
>> + *
>> + *<Image>:
>> + * 0: nop
>> + * 4: nop
>> + *
>>    *<ftrace enable>:
>> - * 0: auipc  t0/ra, 0x?
>> - * 4: jalr   t0/ra, ?(t0/ra)
>> + * 0: auipc  t0, 0x?
>> + * 4: jalr   t0, ?(t0)
>>    *
>>    *<ftrace disable>:
>> - * 0: nop
>> + * 0: auipc  t0, 0x?
>>    * 4: nop
>>    *
>>    * Dynamic ftrace generates probes to call sites, so we must deal with
>> @@ -75,10 +83,9 @@ struct dyn_arch_ftrace {
>>   #define AUIPC_OFFSET_MASK    (0xfffff000)
>>   #define AUIPC_PAD        (0x00001000)
>>   #define JALR_SHIFT        20
>> -#define JALR_RA            (0x000080e7)
>> -#define AUIPC_RA        (0x00000097)
>>   #define JALR_T0            (0x000282e7)
>>   #define AUIPC_T0        (0x00000297)
>> +#define JALR_RANGE        (JALR_SIGN_MASK - 1)
>>     #define to_jalr_t0(offset)                        \
>>       (((offset & JALR_OFFSET_MASK) << JALR_SHIFT) | JALR_T0)
>> @@ -96,26 +103,14 @@ do {                                    \
>>       call[1] = to_jalr_t0(offset);                    \
>>   } while (0)
>>   -#define to_jalr_ra(offset)                        \
>> -    (((offset & JALR_OFFSET_MASK) << JALR_SHIFT) | JALR_RA)
>> -
>> -#define to_auipc_ra(offset)                        \
>> -    ((offset & JALR_SIGN_MASK) ?                    \
>> -    (((offset & AUIPC_OFFSET_MASK) + AUIPC_PAD) | AUIPC_RA) :    \
>> -    ((offset & AUIPC_OFFSET_MASK) | AUIPC_RA))
>> -
>> -#define make_call_ra(caller, callee, call)                \
>> -do {                                    \
>> -    unsigned int offset =                        \
>> -        (unsigned long) (callee) - (unsigned long) (caller); \
>> -    call[0] = to_auipc_ra(offset);                    \
>> -    call[1] = to_jalr_ra(offset);                    \
>> -} while (0)
>> -
>>   /*
>> - * Let auipc+jalr be the basic *mcount unit*, so we make it 8 bytes 
>> here.
>> + * Only the jalr insn in the auipc+jalr is patched, so we make it 4
>> + * bytes here.
>>    */
>> -#define MCOUNT_INSN_SIZE 8
>> +#define MCOUNT_INSN_SIZE    4
>> +#define MCOUNT_AUIPC_SIZE    4
>> +#define MCOUNT_JALR_SIZE    4
>> +#define MCOUNT_NOP4_SIZE    4
>>     #ifndef __ASSEMBLY__
>>   struct dyn_ftrace;
>> diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c
>> index 1fd10555c580..cf78eef073a0 100644
>> --- a/arch/riscv/kernel/ftrace.c
>> +++ b/arch/riscv/kernel/ftrace.c
>> @@ -8,10 +8,21 @@
>>   #include <linux/ftrace.h>
>>   #include <linux/uaccess.h>
>>   #include <linux/memory.h>
>> +#include <linux/irqflags.h>
>>   #include <linux/stop_machine.h>
>>   #include <asm/cacheflush.h>
>>   #include <asm/text-patching.h>
>>   +unsigned long ftrace_call_adjust(unsigned long addr)
>> +{
>> +    return addr + MCOUNT_AUIPC_SIZE;
>> +}
>> +
>> +unsigned long arch_ftrace_get_symaddr(unsigned long fentry_ip)
>> +{
>> +    return fentry_ip - MCOUNT_AUIPC_SIZE;
>> +}
>> +
>
>
> Those functions cause the following errors when building with 
> !CONFIG_DYNAMIC_FTRACE, but I'm not sure how to fix this:
>
> ../arch/riscv/kernel/ftrace.c: In function 'ftrace_call_adjust':
> ../arch/riscv/kernel/ftrace.c:19:35: error: 'MCOUNT_AUIPC_SIZE' 
> undeclared (first use in this function)
>    19 |                 return addr + 8 + MCOUNT_AUIPC_SIZE;
>       |                                   ^~~~~~~~~~~~~~~~~
> ../arch/riscv/kernel/ftrace.c:19:35: note: each undeclared identifier 
> is reported only once for each function it appears in
>   CC      fs/9p/vfs_dir.o
> ../arch/riscv/kernel/ftrace.c: In function 'arch_ftrace_get_symaddr':
> ../arch/riscv/kernel/ftrace.c:26:28: error: 'MCOUNT_AUIPC_SIZE' 
> undeclared (first use in this function)
>    26 |         return fentry_ip - MCOUNT_AUIPC_SIZE;
>       |                            ^~~~~~~~~~~~~~~~~
>   CC      drivers/pci/pcie/pme.o
> ../arch/riscv/kernel/ftrace.c: In function 'ftrace_call_adjust':
> ../arch/riscv/kernel/ftrace.c:22:1: error: control reaches end of 
> non-void function [-Werror=return-type]
>    22 | }
>       | ^
> ../arch/riscv/kernel/ftrace.c: In function 'arch_ftrace_get_symaddr':
> ../arch/riscv/kernel/ftrace.c:27:1: error: control reaches end of 
> non-void function [-Werror=return-type]
>    27 | }
>       | ^
> cc1: some warnings being treated as errors
> make[5]: *** [../scripts/Makefile.build:203: 
> arch/riscv/kernel/ftrace.o] Error 1
>

So I fixed those errors with the following; let me know if that's not
correct:

diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c
index d65f06bfb4573..4c6c24380cfd9 100644
--- a/arch/riscv/kernel/ftrace.c
+++ b/arch/riscv/kernel/ftrace.c
@@ -13,6 +13,7 @@
  #include <asm/cacheflush.h>
  #include <asm/text-patching.h>

+#ifdef CONFIG_DYNAMIC_FTRACE
  unsigned long ftrace_call_adjust(unsigned long addr)
  {
         if (IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS))
@@ -26,7 +27,6 @@ unsigned long arch_ftrace_get_symaddr(unsigned long 
fentry_ip)
         return fentry_ip - MCOUNT_AUIPC_SIZE;
  }

-#ifdef CONFIG_DYNAMIC_FTRACE
  void arch_ftrace_update_code(int command)
  {
         mutex_lock(&text_mutex);
@@ -191,7 +191,12 @@ int ftrace_update_ftrace_func(ftrace_func_t func)
         return 0;
  }

-#endif
+#else /* CONFIG_DYNAMIC_FTRACE */
+unsigned long ftrace_call_adjust(unsigned long addr)
+{
+       return addr;
+}
+#endif /* CONFIG_DYNAMIC_FTRACE */

  #ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
  int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr,


Thanks,

Alex


>
>>   #ifdef CONFIG_DYNAMIC_FTRACE
>>   void ftrace_arch_code_modify_prepare(void) __acquires(&text_mutex)
>>   {
>> @@ -32,51 +43,32 @@ void ftrace_arch_code_modify_post_process(void) 
>> __releases(&text_mutex)
>>       mutex_unlock(&text_mutex);
>>   }
>>   -static int ftrace_check_current_call(unsigned long hook_pos,
>> -                     unsigned int *expected)
>> +static int __ftrace_modify_call(unsigned long source, unsigned long 
>> target, bool validate)
>>   {
>> +    unsigned int call[2], offset;
>>       unsigned int replaced[2];
>> -    unsigned int nops[2] = {RISCV_INSN_NOP4, RISCV_INSN_NOP4};
>>   -    /* we expect nops at the hook position */
>> -    if (!expected)
>> -        expected = nops;
>> +    offset = target - source;
>> +    call[1] = to_jalr_t0(offset);
>>   -    /*
>> -     * Read the text we want to modify;
>> -     * return must be -EFAULT on read error
>> -     */
>> -    if (copy_from_kernel_nofault(replaced, (void *)hook_pos,
>> -            MCOUNT_INSN_SIZE))
>> -        return -EFAULT;
>> -
>> -    /*
>> -     * Make sure it is what we expect it to be;
>> -     * return must be -EINVAL on failed comparison
>> -     */
>> -    if (memcmp(expected, replaced, sizeof(replaced))) {
>> -        pr_err("%p: expected (%08x %08x) but got (%08x %08x)\n",
>> -               (void *)hook_pos, expected[0], expected[1], replaced[0],
>> -               replaced[1]);
>> -        return -EINVAL;
>> +    if (validate) {
>> +        call[0] = to_auipc_t0(offset);
>> +        /*
>> +         * Read the text we want to modify;
>> +         * return must be -EFAULT on read error
>> +         */
>> +        if (copy_from_kernel_nofault(replaced, (void *)source, 2 * 
>> MCOUNT_INSN_SIZE))
>> +            return -EFAULT;
>> +
>> +        if (replaced[0] != call[0]) {
>> +            pr_err("%p: expected (%08x) but got (%08x)\n",
>> +                   (void *)source, call[0], replaced[0]);
>> +            return -EINVAL;
>> +        }
>>       }
>>   -    return 0;
>> -}
>> -
>> -static int __ftrace_modify_call(unsigned long hook_pos, unsigned 
>> long target,
>> -                bool enable, bool ra)
>> -{
>> -    unsigned int call[2];
>> -    unsigned int nops[2] = {RISCV_INSN_NOP4, RISCV_INSN_NOP4};
>> -
>> -    if (ra)
>> -        make_call_ra(hook_pos, target, call);
>> -    else
>> -        make_call_t0(hook_pos, target, call);
>> -
>> -    /* Replace the auipc-jalr pair at once. Return -EPERM on write 
>> error. */
>> -    if (patch_insn_write((void *)hook_pos, enable ? call : nops, 
>> MCOUNT_INSN_SIZE))
>> +    /* Replace the jalr at once. Return -EPERM on write error. */
>> +    if (patch_insn_write((void *)(source + MCOUNT_AUIPC_SIZE), call 
>> + 1, MCOUNT_JALR_SIZE))
>>           return -EPERM;
>>         return 0;
>> @@ -84,22 +76,21 @@ static int __ftrace_modify_call(unsigned long 
>> hook_pos, unsigned long target,
>>     int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
>>   {
>> -    unsigned int call[2];
>> +    unsigned long distance, orig_addr, pc = rec->ip - 
>> MCOUNT_AUIPC_SIZE;
>>   -    make_call_t0(rec->ip, addr, call);
>> -
>> -    if (patch_insn_write((void *)rec->ip, call, MCOUNT_INSN_SIZE))
>> -        return -EPERM;
>> +    orig_addr = (unsigned long)&ftrace_caller;
>> +    distance = addr > orig_addr ? addr - orig_addr : orig_addr - addr;
>> +    if (distance > JALR_RANGE)
>> +        return -EINVAL;
>>   -    return 0;
>> +    return __ftrace_modify_call(pc, addr, false);
>>   }
>>   -int ftrace_make_nop(struct module *mod, struct dyn_ftrace *rec,
>> -            unsigned long addr)
>> +int ftrace_make_nop(struct module *mod, struct dyn_ftrace *rec, 
>> unsigned long addr)
>>   {
>> -    unsigned int nops[2] = {RISCV_INSN_NOP4, RISCV_INSN_NOP4};
>> +    u32 nop4 = RISCV_INSN_NOP4;
>>   -    if (patch_insn_write((void *)rec->ip, nops, MCOUNT_INSN_SIZE))
>> +    if (patch_insn_write((void *)rec->ip, &nop4, MCOUNT_NOP4_SIZE))
>>           return -EPERM;
>>         return 0;
>> @@ -114,21 +105,38 @@ int ftrace_make_nop(struct module *mod, struct 
>> dyn_ftrace *rec,
>>    */
>>   int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec)
>>   {
>> -    int out;
>> +    unsigned long pc = rec->ip - MCOUNT_AUIPC_SIZE;
>> +    unsigned int nops[2], offset;
>> +    int ret;
>> +
>> +    offset = (unsigned long) &ftrace_caller - pc;
>> +    nops[0] = to_auipc_t0(offset);
>> +    nops[1] = RISCV_INSN_NOP4;
>>         mutex_lock(&text_mutex);
>> -    out = ftrace_make_nop(mod, rec, MCOUNT_ADDR);
>> +    ret = patch_insn_write((void *)pc, nops, 2 * MCOUNT_INSN_SIZE);
>>       mutex_unlock(&text_mutex);
>>   -    return out;
>> +    return ret;
>>   }
>>   +ftrace_func_t ftrace_call_dest = ftrace_stub;
>>   int ftrace_update_ftrace_func(ftrace_func_t func)
>>   {
>> -    int ret = __ftrace_modify_call((unsigned long)&ftrace_call,
>> -                       (unsigned long)func, true, true);
>> -
>> -    return ret;
>> +    WRITE_ONCE(ftrace_call_dest, func);
>> +    /*
>> +     * The data fence ensures that the update to ftrace_call_dest happens
>> +     * before the write to function_trace_op later in the generic ftrace.
>> +     * If the sequence is not enforced, then an old ftrace_call_dest may
>> +     * race loading a new function_trace_op set in ftrace_modify_all_code.
>> +     *
>> +     * If we are in stop_machine, then we don't need to call remote fence
>> +     * as there is no concurrent read-side of ftrace_call_dest.
>> +     */
>> +    smp_wmb();
>> +    if (!irqs_disabled())
>> +        smp_call_function(ftrace_sync_ipi, NULL, 1);
>> +    return 0;
>>   }
>>     struct ftrace_modify_param {
>> @@ -172,17 +180,11 @@ void arch_ftrace_update_code(int command)
>>   int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr,
>>                  unsigned long addr)
>>   {
>> +    unsigned long caller = rec->ip - MCOUNT_AUIPC_SIZE;
>>       unsigned int call[2];
>> -    unsigned long caller = rec->ip;
>> -    int ret;
>>         make_call_t0(caller, old_addr, call);
>> -    ret = ftrace_check_current_call(caller, call);
>> -
>> -    if (ret)
>> -        return ret;
>> -
>> -    return __ftrace_modify_call(caller, addr, true, false);
>> +    return __ftrace_modify_call(caller, addr, true);
>>   }
>>   #endif
>>   diff --git a/arch/riscv/kernel/mcount-dyn.S 
>> b/arch/riscv/kernel/mcount-dyn.S
>> index 3f06b40bb6c8..8aa554d56096 100644
>> --- a/arch/riscv/kernel/mcount-dyn.S
>> +++ b/arch/riscv/kernel/mcount-dyn.S
>> @@ -13,7 +13,6 @@
>>         .text
>>   -#define FENTRY_RA_OFFSET    8
>>   #define ABI_SIZE_ON_STACK    80
>>   #define ABI_A0            0
>>   #define ABI_A1            8
>> @@ -62,8 +61,7 @@
>>   * After the stack is established,
>>   *
>>   * 0(sp) stores the PC of the traced function which can be accessed
>> -* by &(fregs)->epc in tracing function. Note that the real
>> -* function entry address should be computed with -FENTRY_RA_OFFSET.
>> +* by &(fregs)->epc in tracing function.
>>   *
>>   * 8(sp) stores the function return address (i.e. parent IP) that
>>   * can be accessed by &(fregs)->ra in tracing function.
>> @@ -140,7 +138,7 @@
>>       .endm
>>         .macro PREPARE_ARGS
>> -    addi    a0, t0, -FENTRY_RA_OFFSET
>> +    addi    a0, t0, -MCOUNT_JALR_SIZE    // ip (callsite's jalr insn)
>>       la    a1, function_trace_op
>>       REG_L    a2, 0(a1)
>>       mv    a1, ra
>> @@ -153,7 +151,8 @@ SYM_FUNC_START(ftrace_caller)
>>       PREPARE_ARGS
>>     SYM_INNER_LABEL(ftrace_call, SYM_L_GLOBAL)
>> -    call    ftrace_stub
>> +    REG_L    ra, ftrace_call_dest
>> +    jalr    ra, 0(ra)
>>         RESTORE_ABI_REGS
>>       bnez    t1, .Ldirect
>
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 01/12] riscv: ftrace: support fastcc in Clang for WITH_ARGS
  2025-04-10 20:05 ` [PATCH v4 01/12] riscv: ftrace: support fastcc in Clang for WITH_ARGS Björn Töpel
@ 2025-05-07 13:58   ` Andy Chiu
  0 siblings, 0 replies; 17+ messages in thread
From: Andy Chiu @ 2025-05-07 13:58 UTC (permalink / raw)
  To: Björn Töpel
  Cc: linux-riscv, alexghiti, palmer, Steven Rostedt, Andy Chiu,
	Evgenii Shatokhin, Nathan Chancellor, Björn Töpel,
	Palmer Dabbelt, Puranjay Mohan, linux-kernel, linux-trace-kernel,
	Alexandre Ghiti, paul.walmsley, samuel.holland

On Fri, Apr 11, 2025 at 4:05 AM Björn Töpel <bjorn@kernel.org> wrote:
>
> (Trimming the Cc:-list a bit)
>
> Hi Andy!
>
> Andy Chiu <andybnac@gmail.com> writes:
>
> > From: Andy Chiu <andy.chiu@sifive.com>
> ...
>
> (Oh, if you send another revision, please add a cover, so it's easier to
> comment around the full series.)

I am so sorry about this! I wrote a cover letter but apparently I
didn't send it. I am attaching the original cover letter below. Please
let me know if there is a better way to handle this.

>
> Thanks a lot for doing the lion's share of this series, Andy! With this
> in place, the stop_machine way of text patching is *finally* behind us,
> and we can move RISC-V out of the 20th century. ;-)
>
> I applied your series, and Steven's series [1] to [2], and ran that on
> QEMU (riscv64, ~RVA23), and on Milk-V Jupiter (Spacemit K1) with:
>  * CONFIG_FTRACE_STARTUP_TEST
>  * ftrace kselftest
>
> No visible regressions, and now the ftrace kselftest can actually
> complete! For the series:
>
> Tested-by: Björn Töpel <bjorn@rivosinc.com>
>
> [1] https://lore.kernel.org/all/20250409151549.788068911@goodmis.org/
> [2] commit 3b07108ada81 ("Merge tag 'linux_kselftest-fixes-6.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest")
>
>
> Björn

This series makes atomic code patching possible in ftrace and eliminates
the need for the stop_machine dance. The major difference in this version
is that we merge the CALL_OPS support from Puranjay [1] and make direct
calls available for practical uses such as BPF. Thanks for the time spent
reviewing the series and for the suggestions; we hope this version is a
step closer to landing upstream.

Please see the link to v3 below for a more introductory view of the
implementation.
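
As a quick recap of the mechanism (taken from patch 5's description; the
-8 slot is only used with CALL_OPS), every patchable call site becomes:

	-8: &ftrace_ops of the associated tracer (or &ftrace_nop_ops)
	 0: auipc  t0, hi(ftrace_caller)
	 4: jalr   t0, lo(ftrace_caller)	<- patched to/from nop4

Only the 4-byte word at offset 4 is rewritten at run time, which is what
makes the update atomic and lets us drop stop_machine.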

Added patches: 2, 4, 10, 11, 12
Modified patches: 5, 6
Unchanged patches: 1, 3, 7, 8, 9
(patches 1 and 8 have modified commit messages)

Special thanks to Björn for his efforts on testing and guiding the
series!

[1]: https://lore.kernel.org/lkml/20240306165904.108141-1-puranjay12@gmail.com/

Changes in v4:
- Rebase on top of v6.15-rc1
- Add a fix tag and R-b from Björn (1)
- Remove unused code enclosed by !WITH_ARG and unused macros (2, 5)
- Export ftrace_sync_ipi for its use in riscv (3, 4)
- Fix a kprobe bug, now that probing at the start of a symbol is allowed,
  by correcting ftrace_call_adjust (5, 11)
- Synchronize update of ftrace destination and the data passed to it
  (5)
- Include Puranjay's patch for CALL_OPS (10)
- Support direct calls based on CALL_OPS (11)
- Add documentation that briefly explains CMODX for dynamic ftrace (12)
- Link to v3: https://lore.kernel.org/r/20241127172908.17149-1-andybnac@gmail.com

Changes in v3:
- Add a fix tag for patch 1
- Add a data fence before sending out remote fence.i (6)
- Link to v2: https://lore.kernel.org/all/20240628-dev-andyc-dyn-ftrace-v4-v2-0-1e5f4cb1f049@sifive.com/

Changes in v2:
- Drop patch 1 as it is merged through fixes.
- Drop patch 2, which converts kernel_text_address into notrace. As
  users can prevent tracing it by configuring the tracefs.
- Use a more generic way in kconfig to align functions.
- Link to v1: https://lore.kernel.org/r/20240613-dev-andyc-dyn-ftrace-v4-v1-0-1a538e12c01e@sifive.com

Andy Chiu (11):
  riscv: ftrace: support fastcc in Clang for WITH_ARGS
  riscv: ftrace factor out code defined by !WITH_ARG
  riscv: ftrace: align patchable functions to 4 Byte boundary
  kernel: ftrace: export ftrace_sync_ipi
  riscv: ftrace: prepare ftrace for atomic code patching
  riscv: ftrace: do not use stop_machine to update code
  riscv: vector: Support calling schedule() for preemptible Vector
  riscv: add a data fence for CMODX in the kernel mode
  riscv: ftrace: support PREEMPT
  riscv: ftrace: support direct call using call_ops
  riscv: Documentation: add a description about dynamic ftrace

Puranjay Mohan (1):
  riscv: Implement HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS

 Documentation/arch/riscv/cmodx.rst |  46 +++++-
 arch/riscv/Kconfig                 |   8 +-
 arch/riscv/Makefile                |   4 +-
 arch/riscv/include/asm/ftrace.h    |  62 ++++----
 arch/riscv/include/asm/processor.h |   5 +
 arch/riscv/include/asm/vector.h    |  22 ++-
 arch/riscv/kernel/asm-offsets.c    |  13 ++
 arch/riscv/kernel/ftrace.c         | 241 +++++++++++++++--------------
 arch/riscv/kernel/mcount-dyn.S     | 117 ++++++++------
 arch/riscv/mm/cacheflush.c         |  15 +-
 include/linux/ftrace.h             |   1 +
 kernel/trace/ftrace.c              |   2 +-
 12 files changed, 327 insertions(+), 209 deletions(-)

--
2.39.3 (Apple Git-145)

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 05/12] riscv: ftrace: prepare ftrace for atomic code patching
  2025-05-05 14:06     ` Alexandre Ghiti
@ 2025-05-07 14:18       ` Andy Chiu
  2025-05-07 14:35         ` Alexandre Ghiti
  0 siblings, 1 reply; 17+ messages in thread
From: Andy Chiu @ 2025-05-07 14:18 UTC (permalink / raw)
  To: Alexandre Ghiti
  Cc: linux-riscv, alexghiti, palmer, Andy Chiu, Björn Töpel,
	linux-kernel, linux-trace-kernel, Mark Rutland, puranjay12,
	paul.walmsley, greentime.hu, nick.hu, nylon.chen, eric.lin,
	vicent.chen, zong.li, yongxuan.wang, samuel.holland, olivia.chu,
	c2232430

On Mon, May 5, 2025 at 10:06 PM Alexandre Ghiti <alex@ghiti.fr> wrote:
>
> On 23/04/2025 10:22, Alexandre Ghiti wrote:
> > Hi Andy,
> >
> > On 07/04/2025 20:08, Andy Chiu wrote:
> >> From: Andy Chiu <andy.chiu@sifive.com>
> >>
> >> We use an AUIPC+JALR pair to jump into a ftrace trampoline. Since
> >> instruction fetch can break down to 4 bytes at a time, it is impossible
> >> to update two instructions without a race. In order to mitigate it, we
> >> initialize the patchable entry to AUIPC + NOP4. Then, the run-time code
> >> patching can change NOP4 to JALR to enable/disable ftrace for a
> >> function. This limits the reach of each ftrace entry to a +-2KB
> >> displacement from ftrace_caller.
> >>
> >> Starting from the trampoline, we add a level of indirection for it to
> >> reach the ftrace call target. Now, it loads the target address from a
> >> memory location, then performs the jump. This enables the kernel to
> >> update the target atomically.
> >>
> >> The new don't-stop-the-world text patching changes only one RISC-V
> >> instruction:
> >>
> >>    |  -8: &ftrace_ops of the associated tracer function.
> >>    | <ftrace enable>:
> >>    |   0: auipc  t0, hi(ftrace_caller)
> >>    |   4: jalr   t0, lo(ftrace_caller)
> >>    |
> >>    |  -8: &ftrace_nop_ops
> >>    | <ftrace disable>:
> >>    |   0: auipc  t0, hi(ftrace_caller)
> >>    |   4: nop
> >>
> >> This means that f+0x0 is fixed, and should not be claimed by ftrace,
> >> e.g. kprobe should be able to put a probe in f+0x0. Thus, we adjust the
> >> offset and MCOUNT_INSN_SIZE accordingly.
> >>
> >> Co-developed-by: Björn Töpel <bjorn@rivosinc.com>
> >> Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
> >> Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
> >> ---
> >> Changelog v4:
> >>   - Include Björn's fix for kprobe
> >>   - Refactor code for better reading (Robbin, Björn)
> >>   - Remove make_call_ra and friends (Björn)
> >>   - Update comments to match reality (Björn)
> >>   - Drop code defined by !WITH_ARG
> >>   - Add a synchronization point when updating ftrace_call_dest (Björn)
> >> ---
> >>   arch/riscv/include/asm/ftrace.h |  49 ++++++------
> >>   arch/riscv/kernel/ftrace.c      | 130 ++++++++++++++++----------------
> >>   arch/riscv/kernel/mcount-dyn.S  |   9 +--
> >>   3 files changed, 92 insertions(+), 96 deletions(-)
> >>
> >> diff --git a/arch/riscv/include/asm/ftrace.h
> >> b/arch/riscv/include/asm/ftrace.h
> >> index d8b2138bd9c6..6a5c0a7fb826 100644
> >> --- a/arch/riscv/include/asm/ftrace.h
> >> +++ b/arch/riscv/include/asm/ftrace.h
> >> @@ -20,10 +20,9 @@ extern void *return_address(unsigned int level);
> >>   #define ftrace_return_address(n) return_address(n)
> >>     void _mcount(void);
> >> -static inline unsigned long ftrace_call_adjust(unsigned long addr)
> >> -{
> >> -    return addr;
> >> -}
> >> +unsigned long ftrace_call_adjust(unsigned long addr);
> >> +unsigned long arch_ftrace_get_symaddr(unsigned long fentry_ip);
> >> +#define ftrace_get_symaddr(fentry_ip)
> >> arch_ftrace_get_symaddr(fentry_ip)
> >>     /*
> >>    * Let's do like x86/arm64 and ignore the compat syscalls.
> >> @@ -57,12 +56,21 @@ struct dyn_arch_ftrace {
> >>    * 2) jalr: setting low-12 offset to ra, jump to ra, and set ra to
> >>    *          return address (original pc + 4)
> >>    *
> >> + * The first 2 instructions for each traceable function are compiled to 2 nop
> >> + * instructions. Then, the kernel initializes the first instruction to auipc at
> >> + * boot time (<ftrace disable>). The second instruction is patched to jalr to
> >> + * start the trace.
> >> + *
> >> + *<Image>:
> >> + * 0: nop
> >> + * 4: nop
> >> + *
> >>    *<ftrace enable>:
> >> - * 0: auipc  t0/ra, 0x?
> >> - * 4: jalr   t0/ra, ?(t0/ra)
> >> + * 0: auipc  t0, 0x?
> >> + * 4: jalr   t0, ?(t0)
> >>    *
> >>    *<ftrace disable>:
> >> - * 0: nop
> >> + * 0: auipc  t0, 0x?
> >>    * 4: nop
> >>    *
> >>    * Dynamic ftrace generates probes to call sites, so we must deal with
> >> @@ -75,10 +83,9 @@ struct dyn_arch_ftrace {
> >>   #define AUIPC_OFFSET_MASK    (0xfffff000)
> >>   #define AUIPC_PAD        (0x00001000)
> >>   #define JALR_SHIFT        20
> >> -#define JALR_RA            (0x000080e7)
> >> -#define AUIPC_RA        (0x00000097)
> >>   #define JALR_T0            (0x000282e7)
> >>   #define AUIPC_T0        (0x00000297)
> >> +#define JALR_RANGE        (JALR_SIGN_MASK - 1)
> >>     #define to_jalr_t0(offset)                        \
> >>       (((offset & JALR_OFFSET_MASK) << JALR_SHIFT) | JALR_T0)
> >> @@ -96,26 +103,14 @@ do {                                    \
> >>       call[1] = to_jalr_t0(offset);                    \
> >>   } while (0)
> >>   -#define to_jalr_ra(offset)                        \
> >> -    (((offset & JALR_OFFSET_MASK) << JALR_SHIFT) | JALR_RA)
> >> -
> >> -#define to_auipc_ra(offset)                        \
> >> -    ((offset & JALR_SIGN_MASK) ?                    \
> >> -    (((offset & AUIPC_OFFSET_MASK) + AUIPC_PAD) | AUIPC_RA) :    \
> >> -    ((offset & AUIPC_OFFSET_MASK) | AUIPC_RA))
> >> -
> >> -#define make_call_ra(caller, callee, call)                \
> >> -do {                                    \
> >> -    unsigned int offset =                        \
> >> -        (unsigned long) (callee) - (unsigned long) (caller); \
> >> -    call[0] = to_auipc_ra(offset);                    \
> >> -    call[1] = to_jalr_ra(offset);                    \
> >> -} while (0)
> >> -
> >>   /*
> >> - * Let auipc+jalr be the basic *mcount unit*, so we make it 8 bytes here.
> >> + * Only the jalr insn in the auipc+jalr is patched, so we make it 4
> >> + * bytes here.
> >>    */
> >> -#define MCOUNT_INSN_SIZE 8
> >> +#define MCOUNT_INSN_SIZE    4
> >> +#define MCOUNT_AUIPC_SIZE    4
> >> +#define MCOUNT_JALR_SIZE    4
> >> +#define MCOUNT_NOP4_SIZE    4
> >>     #ifndef __ASSEMBLY__
> >>   struct dyn_ftrace;
> >> diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c
> >> index 1fd10555c580..cf78eef073a0 100644
> >> --- a/arch/riscv/kernel/ftrace.c
> >> +++ b/arch/riscv/kernel/ftrace.c
> >> @@ -8,10 +8,21 @@
> >>   #include <linux/ftrace.h>
> >>   #include <linux/uaccess.h>
> >>   #include <linux/memory.h>
> >> +#include <linux/irqflags.h>
> >>   #include <linux/stop_machine.h>
> >>   #include <asm/cacheflush.h>
> >>   #include <asm/text-patching.h>
> >>   +unsigned long ftrace_call_adjust(unsigned long addr)
> >> +{
> >> +    return addr + MCOUNT_AUIPC_SIZE;
> >> +}
> >> +
> >> +unsigned long arch_ftrace_get_symaddr(unsigned long fentry_ip)
> >> +{
> >> +    return fentry_ip - MCOUNT_AUIPC_SIZE;
> >> +}
> >> +
> >
> >
> > Those functions cause the following errors when building with
> > !CONFIG_DYNAMIC_FTRACE, but I'm not sure how to fix this:
> >
> > ../arch/riscv/kernel/ftrace.c: In function 'ftrace_call_adjust':
> > ../arch/riscv/kernel/ftrace.c:19:35: error: 'MCOUNT_AUIPC_SIZE'
> > undeclared (first use in this function)
> >    19 |                 return addr + 8 + MCOUNT_AUIPC_SIZE;
> >       |                                   ^~~~~~~~~~~~~~~~~
> > ../arch/riscv/kernel/ftrace.c:19:35: note: each undeclared identifier
> > is reported only once for each function it appears in
> >   CC      fs/9p/vfs_dir.o
> > ../arch/riscv/kernel/ftrace.c: In function 'arch_ftrace_get_symaddr':
> > ../arch/riscv/kernel/ftrace.c:26:28: error: 'MCOUNT_AUIPC_SIZE'
> > undeclared (first use in this function)
> >    26 |         return fentry_ip - MCOUNT_AUIPC_SIZE;
> >       |                            ^~~~~~~~~~~~~~~~~
> >   CC      drivers/pci/pcie/pme.o
> > ../arch/riscv/kernel/ftrace.c: In function 'ftrace_call_adjust':
> > ../arch/riscv/kernel/ftrace.c:22:1: error: control reaches end of
> > non-void function [-Werror=return-type]
> >    22 | }
> >       | ^
> > ../arch/riscv/kernel/ftrace.c: In function 'arch_ftrace_get_symaddr':
> > ../arch/riscv/kernel/ftrace.c:27:1: error: control reaches end of
> > non-void function [-Werror=return-type]
> >    27 | }
> >       | ^
> > cc1: some warnings being treated as errors
> > make[5]: *** [../scripts/Makefile.build:203:
> > arch/riscv/kernel/ftrace.o] Error 1
> >
>
> So I fixed those errors with the following, let me know if that's not
> correct:
>
> diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c
> index d65f06bfb4573..4c6c24380cfd9 100644
> --- a/arch/riscv/kernel/ftrace.c
> +++ b/arch/riscv/kernel/ftrace.c
> @@ -13,6 +13,7 @@
>   #include <asm/cacheflush.h>
>   #include <asm/text-patching.h>
>
> +#ifdef CONFIG_DYNAMIC_FTRACE
>   unsigned long ftrace_call_adjust(unsigned long addr)
>   {
>          if (IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS))
> @@ -26,7 +27,6 @@ unsigned long arch_ftrace_get_symaddr(unsigned long fentry_ip)
>          return fentry_ip - MCOUNT_AUIPC_SIZE;
>   }
>
> -#ifdef CONFIG_DYNAMIC_FTRACE
>   void arch_ftrace_update_code(int command)
>   {
>          mutex_lock(&text_mutex);
> @@ -191,7 +191,12 @@ int ftrace_update_ftrace_func(ftrace_func_t func)
>          return 0;
>   }
>
> -#endif
> +#else /* CONFIG_DYNAMIC_FTRACE */
> +unsigned long ftrace_call_adjust(unsigned long addr)
> +{
> +       return addr;
> +}
> +#endif /* CONFIG_DYNAMIC_FTRACE */
>
>   #ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
>   int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr,
>

Hi Alex,

Yes, this is valid, thanks for noticing and fixing the error. I would
personally prefer leaving the #else /* CONFIG_DYNAMIC_FTRACE */ part
in ftrace.h, but that can also come later as a refactor. Or, I can
respin the series with fixes for this and the previous patch, along
with the typos and variable declarations that Robbin mentioned.

Thanks,
Andy
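For reference, a minimal sketch of the header-side alternative Andy
mentions, assuming the stub would sit next to the existing prototypes in
arch/riscv/include/asm/ftrace.h (illustration only, not what was merged):

  /* arch/riscv/include/asm/ftrace.h (hypothetical placement) */
  #ifdef CONFIG_DYNAMIC_FTRACE
  unsigned long ftrace_call_adjust(unsigned long addr);
  unsigned long arch_ftrace_get_symaddr(unsigned long fentry_ip);
  #define ftrace_get_symaddr(fentry_ip) arch_ftrace_get_symaddr(fentry_ip)
  #else
  /* no code patching, so there is no auipc word to skip over */
  static inline unsigned long ftrace_call_adjust(unsigned long addr)
  {
          return addr;
  }
  #endif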

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 05/12] riscv: ftrace: prepare ftrace for atomic code patching
  2025-05-07 14:18       ` Andy Chiu
@ 2025-05-07 14:35         ` Alexandre Ghiti
  0 siblings, 0 replies; 17+ messages in thread
From: Alexandre Ghiti @ 2025-05-07 14:35 UTC (permalink / raw)
  To: Andy Chiu
  Cc: linux-riscv, alexghiti, palmer, Andy Chiu, Björn Töpel,
	linux-kernel, linux-trace-kernel, Mark Rutland, puranjay12,
	paul.walmsley, greentime.hu, nick.hu, nylon.chen, eric.lin,
	vicent.chen, zong.li, yongxuan.wang, samuel.holland, olivia.chu,
	c2232430

Hi Andy,

On 07/05/2025 16:18, Andy Chiu wrote:
> On Mon, May 5, 2025 at 10:06 PM Alexandre Ghiti <alex@ghiti.fr> wrote:
>> On 23/04/2025 10:22, Alexandre Ghiti wrote:
>>> Hi Andy,
>>>
>>> On 07/04/2025 20:08, Andy Chiu wrote:
>>>> From: Andy Chiu <andy.chiu@sifive.com>
>>>>
> >>>> We use an AUIPC+JALR pair to jump into an ftrace trampoline. Since
> >>>> instruction fetch can break down to 4 bytes at a time, it is impossible
> >>>> to update two instructions without a race. In order to mitigate it, we
> >>>> initialize the patchable entry to AUIPC + NOP4. Then, the run-time code
> >>>> patching can change NOP4 to JALR to enable/disable ftrace for a
> >>>> function. This limits the reach of each ftrace entry to +-2KB of
> >>>> displacement from ftrace_caller.
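For illustration, here is a minimal sketch of the one-word enable/disable
patch described above. JALR_T0 and JALR_SHIFT come from the quoted header;
JALR_OFFSET_MASK and the canonical nop encoding (addi x0, x0, 0) are
assumed values, and patch_word() is a made-up helper, not a kernel API:

  #include <stdint.h>

  #define JALR_OFFSET_MASK 0x00000fffUL  /* assumed: low 12 bits of offset */
  #define JALR_SHIFT       20
  #define JALR_T0          0x000282e7UL  /* jalr t0, 0(t0) */
  #define RISCV_NOP4       0x00000013UL  /* assumed: addi x0, x0, 0 */

  /* place imm[11:0] into bits 31:20 of the jalr instruction word */
  static uint32_t to_jalr_t0(unsigned long offset)
  {
          return (uint32_t)(((offset & JALR_OFFSET_MASK) << JALR_SHIFT) |
                            JALR_T0);
  }

  /*
   * Only the word at f+0x4 ever changes; the auipc at f+0x0 is written
   * once at boot and never touched again. The offset must already fit
   * in JALR_RANGE (about +-2KB around the auipc target).
   */
  static uint32_t patch_word(int enable, unsigned long offset)
  {
          return enable ? to_jalr_t0(offset) : (uint32_t)RISCV_NOP4;
  }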
>>>>
> >>>> Starting from the trampoline, we add a level of indirection so it can
> >>>> reach the ftrace caller target. It now loads the target address from a
> >>>> memory location, then performs the jump. This enables the kernel to
> >>>> update the target atomically.
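A user-space sketch of that indirection, assuming the destination lives in
a single pointer-sized slot as ftrace_call_dest does in this series; the
names and the atomic-store model here are illustrative, since the kernel
performs the load and jump in assembly:

  #include <stdatomic.h>
  #include <stdio.h>

  static void tracer_a(void) { puts("tracer A"); }
  static void tracer_b(void) { puts("tracer B"); }

  /* stand-in for ftrace_call_dest: one pointer-sized slot that can be
   * rewritten atomically while other CPUs keep calling the trampoline */
  static _Atomic(void (*)(void)) call_dest = tracer_a;

  /* stand-in for the trampoline: load the destination, then jump */
  static void trampoline(void)
  {
          void (*dest)(void) =
                  atomic_load_explicit(&call_dest, memory_order_relaxed);
          dest();
  }

  int main(void)
  {
          trampoline(); /* calls tracer A */
          atomic_store_explicit(&call_dest, tracer_b, memory_order_relaxed);
          trampoline(); /* calls tracer B; the switch was one atomic store */
          return 0;
  }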
>>>>
> >>>> The new don't-stop-the-world text patching changes only one RISC-V
> >>>> instruction:
>>>>
>>>>     |  -8: &ftrace_ops of the associated tracer function.
>>>>     | <ftrace enable>:
>>>>     |   0: auipc  t0, hi(ftrace_caller)
>>>>     |   4: jalr   t0, lo(ftrace_caller)
>>>>     |
>>>>     |  -8: &ftrace_nop_ops
>>>>     | <ftrace disable>:
>>>>     |   0: auipc  t0, hi(ftrace_caller)
>>>>     |   4: nop
>>>>
> >>>> This means that f+0x0 is fixed and should not be claimed by ftrace;
> >>>> e.g., kprobes should be able to put a probe at f+0x0. Thus, we adjust
> >>>> the offset and MCOUNT_INSN_SIZE accordingly.
>>>>
>>>> Co-developed-by: Björn Töpel <bjorn@rivosinc.com>
>>>> Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
>>>> Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
>>>> ---
>>>> Changelog v4:
>>>>    - Include Björn's fix for kprobe
> >>>>    - Refactor code for better readability (Robbin, Björn)
> >>>>    - Remove make_call_ra and friends (Björn)
>>>>    - Update comments to match reality (Björn)
>>>>    - Drop code defined by !WITH_ARG
>>>>    - Add a synchronization point when updating ftrace_call_dest (Björn)
>>>> ---
>>>>    arch/riscv/include/asm/ftrace.h |  49 ++++++------
>>>>    arch/riscv/kernel/ftrace.c      | 130 ++++++++++++++++----------------
>>>>    arch/riscv/kernel/mcount-dyn.S  |   9 +--
>>>>    3 files changed, 92 insertions(+), 96 deletions(-)
>>>>
> >>>> diff --git a/arch/riscv/include/asm/ftrace.h b/arch/riscv/include/asm/ftrace.h
>>>> index d8b2138bd9c6..6a5c0a7fb826 100644
>>>> --- a/arch/riscv/include/asm/ftrace.h
>>>> +++ b/arch/riscv/include/asm/ftrace.h
>>>> @@ -20,10 +20,9 @@ extern void *return_address(unsigned int level);
>>>>    #define ftrace_return_address(n) return_address(n)
>>>>      void _mcount(void);
>>>> -static inline unsigned long ftrace_call_adjust(unsigned long addr)
>>>> -{
>>>> -    return addr;
>>>> -}
>>>> +unsigned long ftrace_call_adjust(unsigned long addr);
>>>> +unsigned long arch_ftrace_get_symaddr(unsigned long fentry_ip);
> >>>> +#define ftrace_get_symaddr(fentry_ip) arch_ftrace_get_symaddr(fentry_ip)
>>>>      /*
>>>>     * Let's do like x86/arm64 and ignore the compat syscalls.
>>>> @@ -57,12 +56,21 @@ struct dyn_arch_ftrace {
>>>>     * 2) jalr: setting low-12 offset to ra, jump to ra, and set ra to
>>>>     *          return address (original pc + 4)
>>>>     *
> >>>> + * The first 2 instructions of each traceable function are compiled
> >>>> + * to 2 nop instructions. Then, the kernel initializes the first
> >>>> + * instruction to auipc at boot time (<ftrace disable>). The second
> >>>> + * instruction is patched to jalr to start tracing.
>>>> + *
>>>> + *<Image>:
>>>> + * 0: nop
>>>> + * 4: nop
>>>> + *
>>>>     *<ftrace enable>:
>>>> - * 0: auipc  t0/ra, 0x?
>>>> - * 4: jalr   t0/ra, ?(t0/ra)
>>>> + * 0: auipc  t0, 0x?
>>>> + * 4: jalr   t0, ?(t0)
>>>>     *
>>>>     *<ftrace disable>:
>>>> - * 0: nop
>>>> + * 0: auipc  t0, 0x?
>>>>     * 4: nop
>>>>     *
>>>>     * Dynamic ftrace generates probes to call sites, so we must deal with
>>>> @@ -75,10 +83,9 @@ struct dyn_arch_ftrace {
>>>>    #define AUIPC_OFFSET_MASK    (0xfffff000)
>>>>    #define AUIPC_PAD        (0x00001000)
>>>>    #define JALR_SHIFT        20
>>>> -#define JALR_RA            (0x000080e7)
>>>> -#define AUIPC_RA        (0x00000097)
>>>>    #define JALR_T0            (0x000282e7)
>>>>    #define AUIPC_T0        (0x00000297)
>>>> +#define JALR_RANGE        (JALR_SIGN_MASK - 1)
>>>>      #define to_jalr_t0(offset)                        \
>>>>        (((offset & JALR_OFFSET_MASK) << JALR_SHIFT) | JALR_T0)
>>>> @@ -96,26 +103,14 @@ do {                                    \
>>>>        call[1] = to_jalr_t0(offset);                    \
>>>>    } while (0)
>>>>    -#define to_jalr_ra(offset)                        \
>>>> -    (((offset & JALR_OFFSET_MASK) << JALR_SHIFT) | JALR_RA)
>>>> -
>>>> -#define to_auipc_ra(offset)                        \
>>>> -    ((offset & JALR_SIGN_MASK) ?                    \
>>>> -    (((offset & AUIPC_OFFSET_MASK) + AUIPC_PAD) | AUIPC_RA) :    \
>>>> -    ((offset & AUIPC_OFFSET_MASK) | AUIPC_RA))
>>>> -
>>>> -#define make_call_ra(caller, callee, call)                \
>>>> -do {                                    \
>>>> -    unsigned int offset =                        \
>>>> -        (unsigned long) (callee) - (unsigned long) (caller); \
>>>> -    call[0] = to_auipc_ra(offset);                    \
>>>> -    call[1] = to_jalr_ra(offset);                    \
>>>> -} while (0)
>>>> -
>>>>    /*
> >>>> - * Let auipc+jalr be the basic *mcount unit*, so we make it 8 bytes here.
>>>> + * Only the jalr insn in the auipc+jalr is patched, so we make it 4
>>>> + * bytes here.
>>>>     */
>>>> -#define MCOUNT_INSN_SIZE 8
>>>> +#define MCOUNT_INSN_SIZE    4
>>>> +#define MCOUNT_AUIPC_SIZE    4
>>>> +#define MCOUNT_JALR_SIZE    4
>>>> +#define MCOUNT_NOP4_SIZE    4
>>>>      #ifndef __ASSEMBLY__
>>>>    struct dyn_ftrace;
>>>> diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c
>>>> index 1fd10555c580..cf78eef073a0 100644
>>>> --- a/arch/riscv/kernel/ftrace.c
>>>> +++ b/arch/riscv/kernel/ftrace.c
>>>> @@ -8,10 +8,21 @@
>>>>    #include <linux/ftrace.h>
>>>>    #include <linux/uaccess.h>
>>>>    #include <linux/memory.h>
>>>> +#include <linux/irqflags.h>
>>>>    #include <linux/stop_machine.h>
>>>>    #include <asm/cacheflush.h>
>>>>    #include <asm/text-patching.h>
>>>>    +unsigned long ftrace_call_adjust(unsigned long addr)
>>>> +{
>>>> +    return addr + MCOUNT_AUIPC_SIZE;
>>>> +}
>>>> +
>>>> +unsigned long arch_ftrace_get_symaddr(unsigned long fentry_ip)
>>>> +{
>>>> +    return fentry_ip - MCOUNT_AUIPC_SIZE;
>>>> +}
>>>> +
>>>
>>> Those functions cause the following errors when building with
>>> !CONFIG_DYNAMIC_FTRACE, but I'm not sure how to fix this:
>>>
>>> ../arch/riscv/kernel/ftrace.c: In function 'ftrace_call_adjust':
>>> ../arch/riscv/kernel/ftrace.c:19:35: error: 'MCOUNT_AUIPC_SIZE'
>>> undeclared (first use in this function)
>>>     19 |                 return addr + 8 + MCOUNT_AUIPC_SIZE;
>>>        |                                   ^~~~~~~~~~~~~~~~~
>>> ../arch/riscv/kernel/ftrace.c:19:35: note: each undeclared identifier
>>> is reported only once for each function it appears in
>>>    CC      fs/9p/vfs_dir.o
>>> ../arch/riscv/kernel/ftrace.c: In function 'arch_ftrace_get_symaddr':
>>> ../arch/riscv/kernel/ftrace.c:26:28: error: 'MCOUNT_AUIPC_SIZE'
>>> undeclared (first use in this function)
>>>     26 |         return fentry_ip - MCOUNT_AUIPC_SIZE;
>>>        |                            ^~~~~~~~~~~~~~~~~
>>>    CC      drivers/pci/pcie/pme.o
>>> ../arch/riscv/kernel/ftrace.c: In function 'ftrace_call_adjust':
>>> ../arch/riscv/kernel/ftrace.c:22:1: error: control reaches end of
>>> non-void function [-Werror=return-type]
>>>     22 | }
>>>        | ^
>>> ../arch/riscv/kernel/ftrace.c: In function 'arch_ftrace_get_symaddr':
>>> ../arch/riscv/kernel/ftrace.c:27:1: error: control reaches end of
>>> non-void function [-Werror=return-type]
>>>     27 | }
>>>        | ^
>>> cc1: some warnings being treated as errors
>>> make[5]: *** [../scripts/Makefile.build:203:
>>> arch/riscv/kernel/ftrace.o] Error 1
>>>
>> So I fixed those errors with the following, let me know if that's not
>> correct:
>>
>> diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c
>> index d65f06bfb4573..4c6c24380cfd9 100644
>> --- a/arch/riscv/kernel/ftrace.c
>> +++ b/arch/riscv/kernel/ftrace.c
>> @@ -13,6 +13,7 @@
>>    #include <asm/cacheflush.h>
>>    #include <asm/text-patching.h>
>>
>> +#ifdef CONFIG_DYNAMIC_FTRACE
>>    unsigned long ftrace_call_adjust(unsigned long addr)
>>    {
>>           if (IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS))
>> @@ -26,7 +27,6 @@ unsigned long arch_ftrace_get_symaddr(unsigned long fentry_ip)
>>           return fentry_ip - MCOUNT_AUIPC_SIZE;
>>    }
>>
>> -#ifdef CONFIG_DYNAMIC_FTRACE
>>    void arch_ftrace_update_code(int command)
>>    {
>>           mutex_lock(&text_mutex);
>> @@ -191,7 +191,12 @@ int ftrace_update_ftrace_func(ftrace_func_t func)
>>           return 0;
>>    }
>>
>> -#endif
>> +#else /* CONFIG_DYNAMIC_FTRACE */
>> +unsigned long ftrace_call_adjust(unsigned long addr)
>> +{
>> +       return addr;
>> +}
>> +#endif /* CONFIG_DYNAMIC_FTRACE */
>>
>>    #ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
>>    int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr,
>>
> Hi Alex,
>
> Yes, this is valid, thanks for noticing and fixing the error. I would
> personally prefer leaving the #else /* CONFIG_DYNAMIC_FTRACE */ part
> in ftrace.h, but that can also come later as a refactor. Or, I can
> respin the series with fixes for this and the previous patch, along
> with the typos and variable declarations that Robbin mentioned.


If you can respin a new patchset soon, that's nice; in the meantime I'll
keep this version + my fixes (without Robbin's remarks though) in my
for-next branch, as we definitely want this in 6.16 :)

So it's up to you!

Thanks,

Alex


>
> Thanks,
> Andy

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 01/12] riscv: ftrace: support fastcc in Clang for WITH_ARGS
  2025-04-07 18:08 [PATCH v4 01/12] riscv: ftrace: support fastcc in Clang for WITH_ARGS Andy Chiu
                   ` (6 preceding siblings ...)
  2025-04-10 20:05 ` [PATCH v4 01/12] riscv: ftrace: support fastcc in Clang for WITH_ARGS Björn Töpel
@ 2025-06-02 22:12 ` patchwork-bot+linux-riscv
  7 siblings, 0 replies; 17+ messages in thread
From: patchwork-bot+linux-riscv @ 2025-06-02 22:12 UTC (permalink / raw)
  To: Andy Chiu
  Cc: linux-riscv, alexghiti, palmer, andy.chiu, e.shatokhin, nathan,
	bjorn, palmer, puranjay, linux-kernel, linux-trace-kernel, llvm,
	mark.rutland, alex, nick.desaulniers+lkml, morbo, justinstitt,
	puranjay12, paul.walmsley, greentime.hu, nick.hu, nylon.chen,
	eric.lin, vicent.chen, zong.li, yongxuan.wang, samuel.holland,
	olivia.chu, c2232430

Hello:

This series was applied to riscv/linux.git (for-next)
by Alexandre Ghiti <alexghiti@rivosinc.com>:

On Tue,  8 Apr 2025 02:08:25 +0800 you wrote:
> From: Andy Chiu <andy.chiu@sifive.com>
> 
> Some caller-saved registers which are not defined as function arguments
> in the ABI can still be passed as arguments when the kernel is compiled
> with Clang. As a result, we must save and restore those registers to
> prevent ftrace from clobbering them.
> 
> [...]

Here is the summary with links:
  - [v4,01/12] riscv: ftrace: support fastcc in Clang for WITH_ARGS
    https://git.kernel.org/riscv/c/7cecf4f30c33
  - [v4,02/12] riscv: ftrace factor out code defined by !WITH_ARG
    https://git.kernel.org/riscv/c/2efa234f5e0c
  - [v4,03/12] riscv: ftrace: align patchable functions to 4 Byte boundary
    https://git.kernel.org/riscv/c/cced570c2c0c
  - [v4,04/12] kernel: ftrace: export ftrace_sync_ipi
    (no matching commit)
  - [v4,05/12] riscv: ftrace: prepare ftrace for atomic code patching
    (no matching commit)
  - [v4,06/12] riscv: ftrace: do not use stop_machine to update code
    (no matching commit)
  - [v4,07/12] riscv: vector: Support calling schedule() for preemptible Vector
    https://git.kernel.org/riscv/c/e2a8cbdbe932
  - [v4,08/12] riscv: add a data fence for CMODX in the kernel mode
    https://git.kernel.org/riscv/c/29b59e3bbb6e
  - [v4,09/12] riscv: ftrace: support PREEMPT
    https://git.kernel.org/riscv/c/f48ba55bb8a8
  - [v4,10/12] riscv: Implement HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS
    (no matching commit)
  - [v4,11/12] riscv: ftrace: support direct call using call_ops
    https://git.kernel.org/riscv/c/7ef9ae7457c0
  - [v4,12/12] riscv: Documentation: add a description about dynamic ftrace
    https://git.kernel.org/riscv/c/0e07200b2af6

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2025-06-02 22:12 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-04-07 18:08 [PATCH v4 01/12] riscv: ftrace: support fastcc in Clang for WITH_ARGS Andy Chiu
2025-04-07 18:08 ` [PATCH v4 02/12] riscv: ftrace factor out code defined by !WITH_ARG Andy Chiu
2025-04-07 18:08 ` [PATCH v4 04/12] kernel: ftrace: export ftrace_sync_ipi Andy Chiu
2025-04-08 22:31   ` kernel test robot
2025-04-23  8:13     ` Alexandre Ghiti
2025-04-07 18:08 ` [PATCH v4 05/12] riscv: ftrace: prepare ftrace for atomic code patching Andy Chiu
2025-04-11 13:15   ` Robbin Ehn
2025-04-23  8:22   ` Alexandre Ghiti
2025-05-05 14:06     ` Alexandre Ghiti
2025-05-07 14:18       ` Andy Chiu
2025-05-07 14:35         ` Alexandre Ghiti
2025-04-07 18:08 ` [PATCH v4 06/12] riscv: ftrace: do not use stop_machine to update code Andy Chiu
2025-04-07 18:08 ` [PATCH v4 10/12] riscv: Implement HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS Andy Chiu
2025-04-07 18:08 ` [PATCH v4 11/12] riscv: ftrace: support direct call using call_ops Andy Chiu
2025-04-10 20:05 ` [PATCH v4 01/12] riscv: ftrace: support fastcc in Clang for WITH_ARGS Björn Töpel
2025-05-07 13:58   ` Andy Chiu
2025-06-02 22:12 ` patchwork-bot+linux-riscv

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).