linux-trace-kernel.vger.kernel.org archive mirror
* [PATCH RFC bpf-next 0/7] bpf trampoline support "jmp" mode
@ 2025-11-14  9:24 Menglong Dong
  2025-11-14  9:24 ` [PATCH RFC bpf-next 1/7] ftrace: introduce FTRACE_OPS_FL_JMP Menglong Dong
                   ` (7 more replies)
  0 siblings, 8 replies; 24+ messages in thread
From: Menglong Dong @ 2025-11-14  9:24 UTC (permalink / raw)
  To: ast, rostedt
  Cc: daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, mhiramat,
	mark.rutland, mathieu.desnoyers, bpf, linux-kernel,
	linux-trace-kernel

Currently, the bpf trampoline is invoked with the "call" instruction.
However, this breaks the RSB (Return Stack Buffer) and introduces extra
overhead on the x86_64 arch.

For example, when we hook the function "foo" with fexit, the call and
return flow looks like this:
  call foo -> call trampoline -> call foo-body ->
  return foo-body -> return foo

As we can see above, there are 3 calls but only 2 returns, which breaks
the RSB balance. We could fake a "return" here, but that is not the best
choice, as it still causes one RSB miss:
  call foo -> call trampoline -> call foo-body ->
  return foo-body -> return dummy -> return foo

The "return dummy" doesn't pair the "call trampoline", which can also
cause the RSB miss.

Therefore, we introduce the "jmp" mode for the bpf trampoline, as
advised by Alexei in [1]. The flow then becomes:
  call foo -> jmp trampoline -> call foo-body ->
  return foo-body -> return foo

As we can see above, the RSB is now fully balanced. With this change,
the performance of fexit increases from 76M/s to 130M/s.
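
At the instruction level, only the opcode at the patched fentry site
changes. Roughly (a simplified sketch, ignoring ENDBR and nop padding):

  foo:
    call <bpf trampoline>      # current "call" mode
  foo:
    jmp  <bpf trampoline>      # "jmp" mode in this series

In the "jmp" mode, the trampoline's final "ret" returns straight to
foo's caller, so every "call" is paired with exactly one "ret".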

In this series, we introduce FTRACE_OPS_FL_JMP for ftrace, which makes
it use the "jmp" instruction instead of "call".

We also introduce bpf_arch_text_poke_type(), which can specify both the
current and the new opcode.

I'm not sure whether I should split the first 2 patches into a separate
series and send them to the ftrace tree.

Link: https://lore.kernel.org/bpf/CAADnVQLX54sVi1oaHrkSiLqjJaJdm3TQjoVrgU-LZimK6iDcSA@mail.gmail.com/ [1]
Menglong Dong (7):
  ftrace: introduce FTRACE_OPS_FL_JMP
  x86/ftrace: implement DYNAMIC_FTRACE_WITH_JMP
  bpf: fix the usage of BPF_TRAMP_F_SKIP_FRAME
  bpf,x86: adjust the "jmp" mode for bpf trampoline
  bpf: introduce bpf_arch_text_poke_type
  bpf,x86: implement bpf_arch_text_poke_type for x86_64
  bpf: implement "jmp" mode for trampoline

 arch/riscv/net/bpf_jit_comp64.c |  2 +-
 arch/x86/Kconfig                |  1 +
 arch/x86/kernel/ftrace.c        |  7 ++++-
 arch/x86/kernel/ftrace_64.S     | 12 +++++++-
 arch/x86/net/bpf_jit_comp.c     | 45 ++++++++++++++++++++--------
 include/linux/bpf.h             | 22 ++++++++++++++
 include/linux/ftrace.h          | 48 +++++++++++++++++++++++++++++
 kernel/bpf/core.c               | 10 +++++++
 kernel/bpf/trampoline.c         | 53 +++++++++++++++++++++++++++------
 kernel/trace/Kconfig            | 12 ++++++++
 kernel/trace/ftrace.c           |  9 +++++-
 11 files changed, 195 insertions(+), 26 deletions(-)

-- 
2.51.2


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH RFC bpf-next 1/7] ftrace: introduce FTRACE_OPS_FL_JMP
  2025-11-14  9:24 [PATCH RFC bpf-next 0/7] bpf trampoline support "jmp" mode Menglong Dong
@ 2025-11-14  9:24 ` Menglong Dong
  2025-11-14 10:20   ` bot+bpf-ci
  2025-11-14  9:24 ` [PATCH RFC bpf-next 2/7] x86/ftrace: implement DYNAMIC_FTRACE_WITH_JMP Menglong Dong
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 24+ messages in thread
From: Menglong Dong @ 2025-11-14  9:24 UTC (permalink / raw)
  To: ast, rostedt
  Cc: daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, mhiramat,
	mark.rutland, mathieu.desnoyers, bpf, linux-kernel,
	linux-trace-kernel

For now, the "nop" will be replaced with a "call" instruction when a
function is hooked by the ftrace. However, sometimes the "call" can break
the RSB and introduce extra overhead. Therefore, introduce the flag
FTRACE_OPS_FL_JMP, which indicate that the ftrace_ops should be called
with a "jmp" instead of "call". For now, it is only used by the direct
call case.

When a direct ftrace_ops is marked with FTRACE_OPS_FL_JMP, the last bit of
the ops->direct_call will be set to 1. Therefore, we can tell if we should
use "jmp" for the callback in ftrace_call_replace().

Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
---
 include/linux/ftrace.h | 33 +++++++++++++++++++++++++++++++++
 kernel/trace/Kconfig   | 12 ++++++++++++
 kernel/trace/ftrace.c  |  9 ++++++++-
 3 files changed, 53 insertions(+), 1 deletion(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 7ded7df6e9b5..14705dec1b08 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -351,6 +351,7 @@ enum {
 	FTRACE_OPS_FL_DIRECT			= BIT(17),
 	FTRACE_OPS_FL_SUBOP			= BIT(18),
 	FTRACE_OPS_FL_GRAPH			= BIT(19),
+	FTRACE_OPS_FL_JMP			= BIT(20),
 };
 
 #ifndef CONFIG_DYNAMIC_FTRACE_WITH_ARGS
@@ -569,6 +570,38 @@ static inline void arch_ftrace_set_direct_caller(struct ftrace_regs *fregs,
 						 unsigned long addr) { }
 #endif /* CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS */
 
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_JMP
+static inline bool ftrace_is_jmp(unsigned long addr)
+{
+	return addr & 1;
+}
+
+static inline unsigned long ftrace_jmp_set(unsigned long addr)
+{
+	return addr | 1UL;
+}
+
+static inline unsigned long ftrace_jmp_get(unsigned long addr)
+{
+	return addr & ~1UL;
+}
+#else
+static inline bool ftrace_is_jmp(unsigned long addr)
+{
+	return false;
+}
+
+static inline unsigned long ftrace_jmp_set(unsigned long addr)
+{
+	return addr;
+}
+
+static inline unsigned long ftrace_jmp_get(unsigned long addr)
+{
+	return addr;
+}
+#endif /* CONFIG_DYNAMIC_FTRACE_WITH_JMP */
+
 #ifdef CONFIG_STACK_TRACER
 
 int stack_trace_sysctl(const struct ctl_table *table, int write, void *buffer,
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index d2c79da81e4f..4661b9e606e0 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -80,6 +80,12 @@ config HAVE_DYNAMIC_FTRACE_NO_PATCHABLE
 	  If the architecture generates __patchable_function_entries sections
 	  but does not want them included in the ftrace locations.
 
+config HAVE_DYNAMIC_FTRACE_WITH_JMP
+	bool
+	help
+	  If the architecture supports replacing the __fentry__ call with
+	  a "jmp" instruction.
+
 config HAVE_SYSCALL_TRACEPOINTS
 	bool
 	help
@@ -330,6 +336,12 @@ config DYNAMIC_FTRACE_WITH_ARGS
 	depends on DYNAMIC_FTRACE
 	depends on HAVE_DYNAMIC_FTRACE_WITH_ARGS
 
+config DYNAMIC_FTRACE_WITH_JMP
+	def_bool y
+	depends on DYNAMIC_FTRACE
+	depends on DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+	depends on HAVE_DYNAMIC_FTRACE_WITH_JMP
+
 config FPROBE
 	bool "Kernel Function Probe (fprobe)"
 	depends on HAVE_FUNCTION_GRAPH_FREGS && HAVE_FTRACE_GRAPH_FUNC
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index efb5ce32298f..8d7b2a7f4b15 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -5938,7 +5938,8 @@ static void remove_direct_functions_hash(struct ftrace_hash *hash, unsigned long
 	for (i = 0; i < size; i++) {
 		hlist_for_each_entry(entry, &hash->buckets[i], hlist) {
 			del = __ftrace_lookup_ip(direct_functions, entry->ip);
-			if (del && del->direct == addr) {
+			if (del && ftrace_jmp_get(del->direct) ==
+				   ftrace_jmp_get(addr)) {
 				remove_hash_entry(direct_functions, del);
 				kfree(del);
 			}
@@ -5994,6 +5995,9 @@ int register_ftrace_direct(struct ftrace_ops *ops, unsigned long addr)
 
 	mutex_lock(&direct_mutex);
 
+	if (ops->flags & FTRACE_OPS_FL_JMP)
+		addr = ftrace_jmp_set(addr);
+
 	/* Make sure requested entries are not already registered.. */
 	size = 1 << hash->size_bits;
 	for (i = 0; i < size; i++) {
@@ -6117,6 +6121,9 @@ __modify_ftrace_direct(struct ftrace_ops *ops, unsigned long addr)
 
 	lockdep_assert_held_once(&direct_mutex);
 
+	if (ops->flags & FTRACE_OPS_FL_JMP)
+		addr = ftrace_jmp_set(addr);
+
 	/* Enable the tmp_ops to have the same functions as the direct ops */
 	ftrace_ops_init(&tmp_ops);
 	tmp_ops.func_hash = ops->func_hash;
-- 
2.51.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH RFC bpf-next 2/7] x86/ftrace: implement DYNAMIC_FTRACE_WITH_JMP
  2025-11-14  9:24 [PATCH RFC bpf-next 0/7] bpf trampoline support "jmp" mode Menglong Dong
  2025-11-14  9:24 ` [PATCH RFC bpf-next 1/7] ftrace: introduce FTRACE_OPS_FL_JMP Menglong Dong
@ 2025-11-14  9:24 ` Menglong Dong
  2025-11-14 16:39   ` Steven Rostedt
  2025-11-14  9:24 ` [PATCH RFC bpf-next 3/7] bpf: fix the usage of BPF_TRAMP_F_SKIP_FRAME Menglong Dong
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 24+ messages in thread
From: Menglong Dong @ 2025-11-14  9:24 UTC (permalink / raw)
  To: ast, rostedt
  Cc: daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, mhiramat,
	mark.rutland, mathieu.desnoyers, bpf, linux-kernel,
	linux-trace-kernel

Implement DYNAMIC_FTRACE_WITH_JMP for x86_64. In ftrace_call_replace(),
use JMP32_INSN_OPCODE instead of CALL_INSN_OPCODE if the address
indicates that a "jmp" should be used.

Meanwhile, adjust the direct call handling in ftrace_regs_caller.
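
What the new assembly path does can be sketched in C roughly as follows
(the direct address carries the "jmp" marker in its lowest bit):

	if (direct_addr & 1) {			/* set by ftrace_jmp_set() */
		direct_addr &= ~1UL;		/* real trampoline address */
		/* restore the saved registers and "ret" straight into the
		 * trampoline, instead of pretending that foo called it
		 */
	}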

Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
---
 arch/x86/Kconfig            |  1 +
 arch/x86/kernel/ftrace.c    |  7 ++++++-
 arch/x86/kernel/ftrace_64.S | 12 +++++++++++-
 3 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index fa3b616af03a..462250a20311 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -230,6 +230,7 @@ config X86
 	select HAVE_DYNAMIC_FTRACE_WITH_ARGS	if X86_64
 	select HAVE_FTRACE_REGS_HAVING_PT_REGS	if X86_64
 	select HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+	select HAVE_DYNAMIC_FTRACE_WITH_JMP	if X86_64
 	select HAVE_SAMPLE_FTRACE_DIRECT	if X86_64
 	select HAVE_SAMPLE_FTRACE_DIRECT_MULTI	if X86_64
 	select HAVE_EBPF_JIT
diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index 4450acec9390..0543b57f54ee 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -74,7 +74,12 @@ static const char *ftrace_call_replace(unsigned long ip, unsigned long addr)
 	 * No need to translate into a callthunk. The trampoline does
 	 * the depth accounting itself.
 	 */
-	return text_gen_insn(CALL_INSN_OPCODE, (void *)ip, (void *)addr);
+	if (ftrace_is_jmp(addr)) {
+		addr = ftrace_jmp_get(addr);
+		return text_gen_insn(JMP32_INSN_OPCODE, (void *)ip, (void *)addr);
+	} else {
+		return text_gen_insn(CALL_INSN_OPCODE, (void *)ip, (void *)addr);
+	}
 }
 
 static int ftrace_verify_code(unsigned long ip, const char *old_code)
diff --git a/arch/x86/kernel/ftrace_64.S b/arch/x86/kernel/ftrace_64.S
index 367da3638167..068242e9c857 100644
--- a/arch/x86/kernel/ftrace_64.S
+++ b/arch/x86/kernel/ftrace_64.S
@@ -285,8 +285,18 @@ SYM_INNER_LABEL(ftrace_regs_caller_end, SYM_L_GLOBAL)
 	ANNOTATE_NOENDBR
 	RET
 
+1:
+	testb	$1, %al
+	jz	2f
+	andq $0xfffffffffffffffe, %rax
+	movq %rax, MCOUNT_REG_SIZE+8(%rsp)
+	restore_mcount_regs
+	/* Restore flags */
+	popfq
+	RET
+
 	/* Swap the flags with orig_rax */
-1:	movq MCOUNT_REG_SIZE(%rsp), %rdi
+2:	movq MCOUNT_REG_SIZE(%rsp), %rdi
 	movq %rdi, MCOUNT_REG_SIZE-8(%rsp)
 	movq %rax, MCOUNT_REG_SIZE(%rsp)
 
-- 
2.51.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH RFC bpf-next 3/7] bpf: fix the usage of BPF_TRAMP_F_SKIP_FRAME
  2025-11-14  9:24 [PATCH RFC bpf-next 0/7] bpf trampoline support "jmp" mode Menglong Dong
  2025-11-14  9:24 ` [PATCH RFC bpf-next 1/7] ftrace: introduce FTRACE_OPS_FL_JMP Menglong Dong
  2025-11-14  9:24 ` [PATCH RFC bpf-next 2/7] x86/ftrace: implement DYNAMIC_FTRACE_WITH_JMP Menglong Dong
@ 2025-11-14  9:24 ` Menglong Dong
  2025-11-14 18:23   ` Alexei Starovoitov
  2025-11-14  9:24 ` [PATCH RFC bpf-next 4/7] bpf,x86: adjust the "jmp" mode for bpf trampoline Menglong Dong
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 24+ messages in thread
From: Menglong Dong @ 2025-11-14  9:24 UTC (permalink / raw)
  To: ast, rostedt
  Cc: daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, mhiramat,
	mark.rutland, mathieu.desnoyers, bpf, linux-kernel,
	linux-trace-kernel

Some places calculate origin_call by checking whether
BPF_TRAMP_F_SKIP_FRAME is set. However, BPF_TRAMP_F_ORIG_STACK should be
used for this purpose. Just fix them.

Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
---
 arch/riscv/net/bpf_jit_comp64.c | 2 +-
 arch/x86/net/bpf_jit_comp.c     | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/riscv/net/bpf_jit_comp64.c b/arch/riscv/net/bpf_jit_comp64.c
index 45cbc7c6fe49..21c70ae3296b 100644
--- a/arch/riscv/net/bpf_jit_comp64.c
+++ b/arch/riscv/net/bpf_jit_comp64.c
@@ -1131,7 +1131,7 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im,
 	store_args(nr_arg_slots, args_off, ctx);
 
 	/* skip to actual body of traced function */
-	if (flags & BPF_TRAMP_F_SKIP_FRAME)
+	if (flags & BPF_TRAMP_F_ORIG_STACK)
 		orig_call += RV_FENTRY_NINSNS * 4;
 
 	if (flags & BPF_TRAMP_F_CALL_ORIG) {
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index de5083cb1d37..2d300ab37cdd 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -3272,7 +3272,7 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
 
 	arg_stack_off = stack_size;
 
-	if (flags & BPF_TRAMP_F_SKIP_FRAME) {
+	if (flags & BPF_TRAMP_F_CALL_ORIG) {
 		/* skip patched call instruction and point orig_call to actual
 		 * body of the kernel function.
 		 */
-- 
2.51.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH RFC bpf-next 4/7] bpf,x86: adjust the "jmp" mode for bpf trampoline
  2025-11-14  9:24 [PATCH RFC bpf-next 0/7] bpf trampoline support "jmp" mode Menglong Dong
                   ` (2 preceding siblings ...)
  2025-11-14  9:24 ` [PATCH RFC bpf-next 3/7] bpf: fix the usage of BPF_TRAMP_F_SKIP_FRAME Menglong Dong
@ 2025-11-14  9:24 ` Menglong Dong
  2025-11-14 18:22   ` Alexei Starovoitov
  2025-11-14  9:24 ` [PATCH RFC bpf-next 5/7] bpf: introduce bpf_arch_text_poke_type Menglong Dong
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 24+ messages in thread
From: Menglong Dong @ 2025-11-14  9:24 UTC (permalink / raw)
  To: ast, rostedt
  Cc: daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, mhiramat,
	mark.rutland, mathieu.desnoyers, bpf, linux-kernel,
	linux-trace-kernel

In the origin call case, if BPF_TRAMP_F_SKIP_FRAME is not set, it means
that the trampoline is reached with a "jmp" rather than a "call".

Introduce the function bpf_trampoline_need_jmp() to check whether the
trampoline is in "jmp" mode.

Make some adjustments for the "jmp" mode on x86_64. The main adjustment
is for the case where arguments are passed on the stack: without the
"rip" on the stack in "jmp" mode, the stack alignment logic changes, and
so does the location of the stack-passed arguments.
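
For the stack-passed arguments, the offset from the trampoline's frame
pointer becomes roughly (a sketch of the layout assumed here):

	/* 16 = saved rbp + the return address to foo's caller; the extra
	 * 8 is the rip pushed by "call trampoline", which exists only in
	 * the "call" mode.
	 */
	off = nr_stack_slots * 8 + 16 + (jmp ? 0 : 8);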

Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
---
 arch/x86/net/bpf_jit_comp.c | 15 ++++++++++-----
 include/linux/bpf.h         | 12 ++++++++++++
 2 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 2d300ab37cdd..21ce2b8457ec 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -2830,7 +2830,7 @@ static int get_nr_used_regs(const struct btf_func_model *m)
 }
 
 static void save_args(const struct btf_func_model *m, u8 **prog,
-		      int stack_size, bool for_call_origin)
+		      int stack_size, bool for_call_origin, bool jmp)
 {
 	int arg_regs, first_off = 0, nr_regs = 0, nr_stack_slots = 0;
 	int i, j;
@@ -2873,7 +2873,7 @@ static void save_args(const struct btf_func_model *m, u8 **prog,
 			 */
 			for (j = 0; j < arg_regs; j++) {
 				emit_ldx(prog, BPF_DW, BPF_REG_0, BPF_REG_FP,
-					 nr_stack_slots * 8 + 0x18);
+					 nr_stack_slots * 8 + 16 + (!jmp) * 8);
 				emit_stx(prog, BPF_DW, BPF_REG_FP, BPF_REG_0,
 					 -stack_size);
 
@@ -3267,7 +3267,12 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
 		 * should be 16-byte aligned. Following code depend on
 		 * that stack_size is already 8-byte aligned.
 		 */
-		stack_size += (stack_size % 16) ? 0 : 8;
+		if (bpf_trampoline_need_jmp(flags)) {
+			/* no rip in the "jmp" case */
+			stack_size += (stack_size % 16) ? 8 : 0;
+		} else {
+			stack_size += (stack_size % 16) ? 0 : 8;
+		}
 	}
 
 	arg_stack_off = stack_size;
@@ -3327,7 +3332,7 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
 		emit_stx(&prog, BPF_DW, BPF_REG_FP, BPF_REG_0, -ip_off);
 	}
 
-	save_args(m, &prog, regs_off, false);
+	save_args(m, &prog, regs_off, false, bpf_trampoline_need_jmp(flags));
 
 	if (flags & BPF_TRAMP_F_CALL_ORIG) {
 		/* arg1: mov rdi, im */
@@ -3360,7 +3365,7 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
 
 	if (flags & BPF_TRAMP_F_CALL_ORIG) {
 		restore_regs(m, &prog, regs_off);
-		save_args(m, &prog, arg_stack_off, true);
+		save_args(m, &prog, arg_stack_off, true, bpf_trampoline_need_jmp(flags));
 
 		if (flags & BPF_TRAMP_F_TAIL_CALL_CTX) {
 			/* Before calling the original function, load the
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index a47d67db3be5..d65a71042aa3 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1263,6 +1263,18 @@ typedef void (*bpf_trampoline_exit_t)(struct bpf_prog *prog, u64 start,
 bpf_trampoline_enter_t bpf_trampoline_enter(const struct bpf_prog *prog);
 bpf_trampoline_exit_t bpf_trampoline_exit(const struct bpf_prog *prog);
 
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_JMP
+static inline bool bpf_trampoline_need_jmp(u64 flags)
+{
+	return flags & BPF_TRAMP_F_CALL_ORIG && !(flags & BPF_TRAMP_F_SKIP_FRAME);
+}
+#else
+static inline bool bpf_trampoline_need_jmp(u64 flags)
+{
+	return false;
+}
+#endif
+
 struct bpf_ksym {
 	unsigned long		 start;
 	unsigned long		 end;
-- 
2.51.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH RFC bpf-next 5/7] bpf: introduce bpf_arch_text_poke_type
  2025-11-14  9:24 [PATCH RFC bpf-next 0/7] bpf trampoline support "jmp" mode Menglong Dong
                   ` (3 preceding siblings ...)
  2025-11-14  9:24 ` [PATCH RFC bpf-next 4/7] bpf,x86: adjust the "jmp" mode for bpf trampoline Menglong Dong
@ 2025-11-14  9:24 ` Menglong Dong
  2025-11-14 10:20   ` bot+bpf-ci
  2025-11-14 18:41   ` Alexei Starovoitov
  2025-11-14  9:24 ` [PATCH RFC bpf-next 6/7] bpf,x86: implement bpf_arch_text_poke_type for x86_64 Menglong Dong
                   ` (2 subsequent siblings)
  7 siblings, 2 replies; 24+ messages in thread
From: Menglong Dong @ 2025-11-14  9:24 UTC (permalink / raw)
  To: ast, rostedt
  Cc: daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, mhiramat,
	mark.rutland, mathieu.desnoyers, bpf, linux-kernel,
	linux-trace-kernel

Introduce the function bpf_arch_text_poke_type(), which can specify
both the current and the new opcode. If the arch does not implement it,
bpf_arch_text_poke() is called directly when the current opcode type is
the same as the new one; otherwise, -EOPNOTSUPP is returned.
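
The intended usage looks like this (a sketch; the trampoline patch later
in the series uses it this way):

	/* the site currently holds a "call old_addr"; switch it to a
	 * "jmp new_addr"
	 */
	err = bpf_arch_text_poke_type(ip, BPF_MOD_CALL, BPF_MOD_JUMP,
				      old_addr, new_addr);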

Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
---
 include/linux/bpf.h |  4 ++++
 kernel/bpf/core.c   | 10 ++++++++++
 2 files changed, 14 insertions(+)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index d65a71042aa3..aec7c65539f5 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -3711,6 +3711,10 @@ enum bpf_text_poke_type {
 	BPF_MOD_JUMP,
 };
 
+int bpf_arch_text_poke_type(void *ip, enum bpf_text_poke_type old_t,
+			    enum bpf_text_poke_type new_t, void *addr1,
+			    void *addr2);
+
 int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type t,
 		       void *addr1, void *addr2);
 
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index d595fe512498..608c636e6cf0 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -3135,6 +3135,16 @@ int __weak bpf_arch_text_poke(void *ip, enum bpf_text_poke_type t,
 	return -ENOTSUPP;
 }
 
+int __weak bpf_arch_text_poke_type(void *ip, enum bpf_text_poke_type old_t,
+				   enum bpf_text_poke_type new_t, void *old_addr,
+				   void *new_addr)
+{
+	if (old_t == new_t)
+		return bpf_arch_text_poke(ip, old_t, old_addr, new_addr);
+
+	return -EOPNOTSUPP;
+}
+
 void * __weak bpf_arch_text_copy(void *dst, void *src, size_t len)
 {
 	return ERR_PTR(-ENOTSUPP);
-- 
2.51.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH RFC bpf-next 6/7] bpf,x86: implement bpf_arch_text_poke_type for x86_64
  2025-11-14  9:24 [PATCH RFC bpf-next 0/7] bpf trampoline support "jmp" mode Menglong Dong
                   ` (4 preceding siblings ...)
  2025-11-14  9:24 ` [PATCH RFC bpf-next 5/7] bpf: introduce bpf_arch_text_poke_type Menglong Dong
@ 2025-11-14  9:24 ` Menglong Dong
  2025-11-14  9:24 ` [PATCH RFC bpf-next 7/7] bpf: implement "jmp" mode for trampoline Menglong Dong
  2025-11-14 13:38 ` [PATCH RFC bpf-next 0/7] bpf trampoline support "jmp" mode Steven Rostedt
  7 siblings, 0 replies; 24+ messages in thread
From: Menglong Dong @ 2025-11-14  9:24 UTC (permalink / raw)
  To: ast, rostedt
  Cc: daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, mhiramat,
	mark.rutland, mathieu.desnoyers, bpf, linux-kernel,
	linux-trace-kernel

Implement bpf_arch_text_poke_type() for x86_64.

Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
---
 arch/x86/net/bpf_jit_comp.c | 28 +++++++++++++++++++++-------
 1 file changed, 21 insertions(+), 7 deletions(-)

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 21ce2b8457ec..c82bd282988f 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -597,8 +597,9 @@ static int emit_jump(u8 **pprog, void *func, void *ip)
 	return emit_patch(pprog, func, ip, 0xE9);
 }
 
-static int __bpf_arch_text_poke(void *ip, enum bpf_text_poke_type t,
-				void *old_addr, void *new_addr)
+static int ___bpf_arch_text_poke(void *ip, enum bpf_text_poke_type old_t,
+				 enum bpf_text_poke_type new_t,
+				 void *old_addr, void *new_addr)
 {
 	const u8 *nop_insn = x86_nops[5];
 	u8 old_insn[X86_PATCH_SIZE];
@@ -609,7 +610,7 @@ static int __bpf_arch_text_poke(void *ip, enum bpf_text_poke_type t,
 	memcpy(old_insn, nop_insn, X86_PATCH_SIZE);
 	if (old_addr) {
 		prog = old_insn;
-		ret = t == BPF_MOD_CALL ?
+		ret = old_t == BPF_MOD_CALL ?
 		      emit_call(&prog, old_addr, ip) :
 		      emit_jump(&prog, old_addr, ip);
 		if (ret)
@@ -619,7 +620,7 @@ static int __bpf_arch_text_poke(void *ip, enum bpf_text_poke_type t,
 	memcpy(new_insn, nop_insn, X86_PATCH_SIZE);
 	if (new_addr) {
 		prog = new_insn;
-		ret = t == BPF_MOD_CALL ?
+		ret = new_t == BPF_MOD_CALL ?
 		      emit_call(&prog, new_addr, ip) :
 		      emit_jump(&prog, new_addr, ip);
 		if (ret)
@@ -640,8 +641,15 @@ static int __bpf_arch_text_poke(void *ip, enum bpf_text_poke_type t,
 	return ret;
 }
 
-int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type t,
-		       void *old_addr, void *new_addr)
+static int __bpf_arch_text_poke(void *ip, enum bpf_text_poke_type t,
+				 void *old_addr, void *new_addr)
+{
+	return ___bpf_arch_text_poke(ip, t, t, old_addr, new_addr);
+}
+
+int bpf_arch_text_poke_type(void *ip, enum bpf_text_poke_type old_t,
+			    enum bpf_text_poke_type new_t, void *old_addr,
+			    void *new_addr)
 {
 	if (!is_kernel_text((long)ip) &&
 	    !is_bpf_text_address((long)ip))
@@ -655,7 +663,13 @@ int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type t,
 	if (is_endbr(ip))
 		ip += ENDBR_INSN_SIZE;
 
-	return __bpf_arch_text_poke(ip, t, old_addr, new_addr);
+	return ___bpf_arch_text_poke(ip, old_t, new_t, old_addr, new_addr);
+}
+
+int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type t,
+		       void *old_addr, void *new_addr)
+{
+	return bpf_arch_text_poke_type(ip, t, t, old_addr, new_addr);
 }
 
 #define EMIT_LFENCE()	EMIT3(0x0F, 0xAE, 0xE8)
-- 
2.51.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH RFC bpf-next 7/7] bpf: implement "jmp" mode for trampoline
  2025-11-14  9:24 [PATCH RFC bpf-next 0/7] bpf trampoline support "jmp" mode Menglong Dong
                   ` (5 preceding siblings ...)
  2025-11-14  9:24 ` [PATCH RFC bpf-next 6/7] bpf,x86: implement bpf_arch_text_poke_type for x86_64 Menglong Dong
@ 2025-11-14  9:24 ` Menglong Dong
  2025-11-14 18:50   ` Alexei Starovoitov
  2025-11-14 13:38 ` [PATCH RFC bpf-next 0/7] bpf trampoline support "jmp" mode Steven Rostedt
  7 siblings, 1 reply; 24+ messages in thread
From: Menglong Dong @ 2025-11-14  9:24 UTC (permalink / raw)
  To: ast, rostedt
  Cc: daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, mhiramat,
	mark.rutland, mathieu.desnoyers, bpf, linux-kernel,
	linux-trace-kernel

Implement the "jmp" mode for the bpf trampoline. For the ftrace_managed
case, we need only to set the FTRACE_OPS_FL_JMP on the tr->fops if "jmp"
is needed.

For the bpf poke case, the new flag BPF_TRAMP_F_JMPED is introduced to
store and check if the trampoline is in the "jmp" mode.
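
Put together, the selection roughly works like this (a sketch combining
the helpers introduced earlier in the series):

	bool use_jmp = bpf_trampoline_need_jmp(tr->flags);

	if (tr->func.ftrace_managed) {
		/* let ftrace generate the "jmp" at the fentry site */
		if (use_jmp)
			tr->fops->flags |= FTRACE_OPS_FL_JMP;
	} else {
		/* poke the "jmp" ourselves, and remember it with
		 * BPF_TRAMP_F_JMPED for later transitions
		 */
		ret = bpf_arch_text_poke_type(ip, old_t,
					      use_jmp ? BPF_MOD_JUMP : BPF_MOD_CALL,
					      old_addr, new_addr);
	}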

Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
---
 include/linux/bpf.h     |  6 +++++
 kernel/bpf/trampoline.c | 53 ++++++++++++++++++++++++++++++++++-------
 2 files changed, 50 insertions(+), 9 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index aec7c65539f5..3598785ac8d1 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1201,6 +1201,12 @@ struct btf_func_model {
  */
 #define BPF_TRAMP_F_INDIRECT		BIT(8)
 
+/*
+ * Indicate that the trampoline is using "jmp" instead of "call". This flag
+ * is only used in the !ftrace_managed case.
+ */
+#define BPF_TRAMP_F_JMPED		BIT(9)
+
 /* Each call __bpf_prog_enter + call bpf_func + call __bpf_prog_exit is ~50
  * bytes on x86.
  */
diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
index 5949095e51c3..02a9f33d8f6c 100644
--- a/kernel/bpf/trampoline.c
+++ b/kernel/bpf/trampoline.c
@@ -175,15 +175,37 @@ static struct bpf_trampoline *bpf_trampoline_lookup(u64 key)
 	return tr;
 }
 
-static int unregister_fentry(struct bpf_trampoline *tr, void *old_addr)
+static int bpf_text_poke(struct bpf_trampoline *tr, void *old_addr,
+			 void *new_addr)
 {
+	enum bpf_text_poke_type new_t = BPF_MOD_CALL, old_t = BPF_MOD_CALL;
 	void *ip = tr->func.addr;
 	int ret;
 
+	if (bpf_trampoline_need_jmp(tr->flags))
+		new_t = BPF_MOD_JUMP;
+	if (tr->flags & BPF_TRAMP_F_JMPED)
+		old_t = BPF_MOD_JUMP;
+
+	ret = bpf_arch_text_poke_type(ip, old_t, new_t, old_addr, new_addr);
+	if (!ret) {
+		if (new_t == BPF_MOD_JUMP)
+			tr->flags |= BPF_TRAMP_F_JMPED;
+		else
+			tr->flags &= ~BPF_TRAMP_F_JMPED;
+	}
+
+	return ret;
+}
+
+static int unregister_fentry(struct bpf_trampoline *tr, void *old_addr)
+{
+	int ret;
+
 	if (tr->func.ftrace_managed)
 		ret = unregister_ftrace_direct(tr->fops, (long)old_addr, false);
 	else
-		ret = bpf_arch_text_poke(ip, BPF_MOD_CALL, old_addr, NULL);
+		ret = bpf_text_poke(tr, old_addr, NULL);
 
 	return ret;
 }
@@ -191,7 +213,6 @@ static int unregister_fentry(struct bpf_trampoline *tr, void *old_addr)
 static int modify_fentry(struct bpf_trampoline *tr, void *old_addr, void *new_addr,
 			 bool lock_direct_mutex)
 {
-	void *ip = tr->func.addr;
 	int ret;
 
 	if (tr->func.ftrace_managed) {
@@ -200,7 +221,7 @@ static int modify_fentry(struct bpf_trampoline *tr, void *old_addr, void *new_ad
 		else
 			ret = modify_ftrace_direct_nolock(tr->fops, (long)new_addr);
 	} else {
-		ret = bpf_arch_text_poke(ip, BPF_MOD_CALL, old_addr, new_addr);
+		ret = bpf_text_poke(tr, old_addr, new_addr);
 	}
 	return ret;
 }
@@ -223,7 +244,7 @@ static int register_fentry(struct bpf_trampoline *tr, void *new_addr)
 		ftrace_set_filter_ip(tr->fops, (unsigned long)ip, 0, 1);
 		ret = register_ftrace_direct(tr->fops, (long)new_addr);
 	} else {
-		ret = bpf_arch_text_poke(ip, BPF_MOD_CALL, NULL, new_addr);
+		ret = bpf_text_poke(tr, NULL, new_addr);
 	}
 
 	return ret;
@@ -415,7 +436,8 @@ static int bpf_trampoline_update(struct bpf_trampoline *tr, bool lock_direct_mut
 	}
 
 	/* clear all bits except SHARE_IPMODIFY and TAIL_CALL_CTX */
-	tr->flags &= (BPF_TRAMP_F_SHARE_IPMODIFY | BPF_TRAMP_F_TAIL_CALL_CTX);
+	tr->flags &= (BPF_TRAMP_F_SHARE_IPMODIFY | BPF_TRAMP_F_TAIL_CALL_CTX |
+		      BPF_TRAMP_F_JMPED);
 
 	if (tlinks[BPF_TRAMP_FEXIT].nr_links ||
 	    tlinks[BPF_TRAMP_MODIFY_RETURN].nr_links) {
@@ -432,9 +454,17 @@ static int bpf_trampoline_update(struct bpf_trampoline *tr, bool lock_direct_mut
 
 #ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
 again:
-	if ((tr->flags & BPF_TRAMP_F_SHARE_IPMODIFY) &&
-	    (tr->flags & BPF_TRAMP_F_CALL_ORIG))
-		tr->flags |= BPF_TRAMP_F_ORIG_STACK;
+	if (tr->flags & BPF_TRAMP_F_CALL_ORIG) {
+		if (tr->flags & BPF_TRAMP_F_SHARE_IPMODIFY) {
+			tr->flags |= BPF_TRAMP_F_ORIG_STACK;
+		} else if (IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_JMP)) {
+			/* Use "jmp" instead of "call" for the trampoline
+			 * in the origin call case, and we don't need to
+			 * skip the frame.
+			 */
+			tr->flags &= ~BPF_TRAMP_F_SKIP_FRAME;
+		}
+	}
 #endif
 
 	size = arch_bpf_trampoline_size(&tr->func.model, tr->flags,
@@ -465,6 +495,11 @@ static int bpf_trampoline_update(struct bpf_trampoline *tr, bool lock_direct_mut
 	if (err)
 		goto out_free;
 
+	if (bpf_trampoline_need_jmp(tr->flags))
+		tr->fops->flags |= FTRACE_OPS_FL_JMP;
+	else
+		tr->fops->flags &= ~FTRACE_OPS_FL_JMP;
+
 	WARN_ON(tr->cur_image && total == 0);
 	if (tr->cur_image)
 		/* progs already running at this address */
-- 
2.51.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH RFC bpf-next 5/7] bpf: introduce bpf_arch_text_poke_type
  2025-11-14  9:24 ` [PATCH RFC bpf-next 5/7] bpf: introduce bpf_arch_text_poke_type Menglong Dong
@ 2025-11-14 10:20   ` bot+bpf-ci
  2025-11-14 18:41   ` Alexei Starovoitov
  1 sibling, 0 replies; 24+ messages in thread
From: bot+bpf-ci @ 2025-11-14 10:20 UTC (permalink / raw)
  To: menglong8.dong, ast, rostedt
  Cc: daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, mhiramat,
	mark.rutland, mathieu.desnoyers, bpf, linux-kernel,
	linux-trace-kernel, ast, andrii, daniel, martin.lau, eddyz87,
	yonghong.song, clm, ihor.solodrai

[-- Attachment #1: Type: text/plain, Size: 2072 bytes --]

> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 8fb8c9bbd..975930e10 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -3712,6 +3712,10 @@ enum bpf_text_poke_type {
>  	BPF_MOD_JUMP,
>  };
>
> +int bpf_arch_text_poke_type(void *ip, enum bpf_text_poke_type old_t,
> +			    enum bpf_text_poke_type new_t, void *addr1,
> +			    void *addr2);
                                                          ^^^^^^^^^^^^

The header declaration uses parameter names addr1 and addr2, but the
implementation in kernel/bpf/core.c uses old_addr and new_addr. Should
these match? When developers implement arch-specific versions, they'll
see addr1/addr2 in the header but old_addr/new_addr in the weak default
implementation. The existing bpf_arch_text_poke() uses addr1/addr2, but
the x86 implementation of that function semantically treats them as
old_addr and new_addr. Would it be clearer to use old_addr/new_addr
consistently, or addr1/addr2 to match the existing function?

> +
>  int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type t,
>  		       void *addr1, void *addr2);
>
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index ef4448f18..400eb70fd 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -3156,6 +3156,16 @@ int __weak bpf_arch_text_poke(void *ip, enum bpf_text_poke_type t,
>  	return -ENOTSUPP;
>  }
>
> +int __weak bpf_arch_text_poke_type(void *ip, enum bpf_text_poke_type old_t,
> +				   enum bpf_text_poke_type new_t, void *old_addr,
> +				   void *new_addr)
                                       ^^^^^^^^^^^^^^^^^^^^^^
> +{
> +	if (old_t == new_t)
> +		return bpf_arch_text_poke(ip, old_t, old_addr, new_addr);
> +
> +	return -EOPNOTSUPP;
> +}
> +
>  void * __weak bpf_arch_text_copy(void *dst, void *src, size_t len)
>  {
>  	return ERR_PTR(-ENOTSUPP);


---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/19360353328

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH RFC bpf-next 1/7] ftrace: introduce FTRACE_OPS_FL_JMP
  2025-11-14  9:24 ` [PATCH RFC bpf-next 1/7] ftrace: introduce FTRACE_OPS_FL_JMP Menglong Dong
@ 2025-11-14 10:20   ` bot+bpf-ci
  2025-11-14 10:57     ` Menglong Dong
  0 siblings, 1 reply; 24+ messages in thread
From: bot+bpf-ci @ 2025-11-14 10:20 UTC (permalink / raw)
  To: menglong8.dong, ast, rostedt
  Cc: daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, mhiramat,
	mark.rutland, mathieu.desnoyers, bpf, linux-kernel,
	linux-trace-kernel, ast, andrii, daniel, martin.lau, eddyz87,
	yonghong.song, clm, ihor.solodrai

[-- Attachment #1: Type: text/plain, Size: 3431 bytes --]

> diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
> index 7ded7df6e..14705dec1 100644
> --- a/include/linux/ftrace.h
> +++ b/include/linux/ftrace.h
> @@ -351,6 +351,7 @@ enum {
>  	FTRACE_OPS_FL_DIRECT			= BIT(17),
>  	FTRACE_OPS_FL_SUBOP			= BIT(18),
>  	FTRACE_OPS_FL_GRAPH			= BIT(19),
> +	FTRACE_OPS_FL_JMP			= BIT(20),
>  };
>
>  #ifndef CONFIG_DYNAMIC_FTRACE_WITH_ARGS

[ ... ]

> +#ifdef CONFIG_DYNAMIC_FTRACE_WITH_JMP
> +static inline bool ftrace_is_jmp(unsigned long addr)
> +{
> +	return addr & 1;
> +}
> +
> +static inline unsigned long ftrace_jmp_set(unsigned long addr)
> +{
> +	return addr | 1UL;
> +}
> +
> +static inline unsigned long ftrace_jmp_get(unsigned long addr)
> +{
> +	return addr & ~1UL;
> +}

[ ... ]

> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> index 42bd2ba68..b143a79e6 100644
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -5938,7 +5938,8 @@ static void remove_direct_functions_hash(struct ftrace_hash *hash, unsigned long
>  	for (i = 0; i < size; i++) {
>  		hlist_for_each_entry(entry, &hash->buckets[i], hlist) {
>  			del = __ftrace_lookup_ip(direct_functions, entry->ip);
> -			if (del && del->direct == addr) {
> +			if (del && ftrace_jmp_get(del->direct) ==
> +				   ftrace_jmp_get(addr)) {
>  				remove_hash_entry(direct_functions, del);
>  				kfree(del);
>  			}

[ ... ]

> @@ -5994,6 +5995,9 @@ int register_ftrace_direct(struct ftrace_ops *ops, unsigned long addr)
>
>  	mutex_lock(&direct_mutex);
>
> +	if (ops->flags & FTRACE_OPS_FL_JMP)
> +		addr = ftrace_jmp_set(addr);
> +

When FTRACE_OPS_FL_JMP is set, this encodes the flag into the last bit
of addr. The modified addr is then stored in entry->direct and
ops->direct_call. However, this flagged address later flows through
ftrace_find_rec_direct()->ftrace_get_addr_new()->ftrace_make_call()->
ftrace_call_replace() without being masked.

Can ftrace_call_replace() generate correct CALL instructions when addr
has the LSB set? On x86, ftrace_call_replace() calls text_gen_insn()
with the raw address to generate a relative CALL instruction. If addr
has bit 0 set to 1, this would generate a CALL to an odd address,
causing execution to skip the first instruction byte at the trampoline.

Similarly, at runtime call_direct_funcs() reads ops->direct_call and
passes it to arch_ftrace_set_direct_caller() without masking. The x86
assembly then does RET to this address, which would jump to addr+1 if
the LSB is set.

The commit message says "we can tell if we should use 'jmp' for the
callback in ftrace_call_replace()" but ftrace_call_replace() isn't
modified to check or mask the bit. Are there missing changes to handle
the encoded address in these paths?

>  	/* Make sure requested entries are not already registered.. */
>  	size = 1 << hash->size_bits;
>  	for (i = 0; i < size; i++) {

[ ... ]

> @@ -6117,6 +6121,9 @@ __modify_ftrace_direct(struct ftrace_ops *ops, unsigned long addr)
>
>  	lockdep_assert_held_once(&direct_mutex);
>
> +	if (ops->flags & FTRACE_OPS_FL_JMP)
> +		addr = ftrace_jmp_set(addr);
> +

Same issue here - the flagged addr is stored but downstream consumers
don't mask it before using as a jump target.


---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/19360353328

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH RFC bpf-next 1/7] ftrace: introduce FTRACE_OPS_FL_JMP
  2025-11-14 10:20   ` bot+bpf-ci
@ 2025-11-14 10:57     ` Menglong Dong
  0 siblings, 0 replies; 24+ messages in thread
From: Menglong Dong @ 2025-11-14 10:57 UTC (permalink / raw)
  To: menglong8.dong, ast, rostedt, bot+bpf-ci
  Cc: daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, mhiramat,
	mark.rutland, mathieu.desnoyers, bpf, linux-kernel,
	linux-trace-kernel, ast, andrii, daniel, martin.lau, eddyz87,
	yonghong.song, clm, ihor.solodrai

On 2025/11/14 18:20, bot+bpf-ci@kernel.org wrote:
> > diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
> > index 7ded7df6e..14705dec1 100644
> > --- a/include/linux/ftrace.h
> > +++ b/include/linux/ftrace.h
> > @@ -351,6 +351,7 @@ enum {
> >  	FTRACE_OPS_FL_DIRECT			= BIT(17),
> >  	FTRACE_OPS_FL_SUBOP			= BIT(18),
> >  	FTRACE_OPS_FL_GRAPH			= BIT(19),
> > +	FTRACE_OPS_FL_JMP			= BIT(20),
> >  };
> >
> >  #ifndef CONFIG_DYNAMIC_FTRACE_WITH_ARGS
> 
> [ ... ]
> 
> > +#ifdef CONFIG_DYNAMIC_FTRACE_WITH_JMP
> > +static inline bool ftrace_is_jmp(unsigned long addr)
> > +{
> > +	return addr & 1;
> > +}
> > +
> > +static inline unsigned long ftrace_jmp_set(unsigned long addr)
> > +{
> > +	return addr | 1UL;
> > +}
> > +
> > +static inline unsigned long ftrace_jmp_get(unsigned long addr)
> > +{
> > +	return addr & ~1UL;
> > +}
> 
> [ ... ]
> 
> > diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> > index 42bd2ba68..b143a79e6 100644
> > --- a/kernel/trace/ftrace.c
> > +++ b/kernel/trace/ftrace.c
> > @@ -5938,7 +5938,8 @@ static void remove_direct_functions_hash(struct ftrace_hash *hash, unsigned long
> >  	for (i = 0; i < size; i++) {
> >  		hlist_for_each_entry(entry, &hash->buckets[i], hlist) {
> >  			del = __ftrace_lookup_ip(direct_functions, entry->ip);
> > -			if (del && del->direct == addr) {
> > +			if (del && ftrace_jmp_get(del->direct) ==
> > +				   ftrace_jmp_get(addr)) {
> >  				remove_hash_entry(direct_functions, del);
> >  				kfree(del);
> >  			}
> 
> [ ... ]
> 
> > @@ -5994,6 +5995,9 @@ int register_ftrace_direct(struct ftrace_ops *ops, unsigned long addr)
> >
> >  	mutex_lock(&direct_mutex);
> >
> > +	if (ops->flags & FTRACE_OPS_FL_JMP)
> > +		addr = ftrace_jmp_set(addr);
> > +
> 
> When FTRACE_OPS_FL_JMP is set, this encodes the flag into the last bit
> of addr. The modified addr is then stored in entry->direct and
> ops->direct_call. However, this flagged address later flows through
> ftrace_find_rec_direct()->ftrace_get_addr_new()->ftrace_make_call()->
> ftrace_call_replace() without being masked.
> 
> Can ftrace_call_replace() generate correct CALL instructions when addr
> has the LSB set? On x86, ftrace_call_replace() calls text_gen_insn()
> with the raw address to generate a relative CALL instruction. If addr
> has bit 0 set to 1, this would generate a CALL to an odd address,
> causing execution to skip the first instruction byte at the trampoline.

If DYNAMIC_FTRACE_WITH_JMP is not enabled, ftrace_jmp_set() will
return the addr unchanged, so this case is fine. When
DYNAMIC_FTRACE_WITH_JMP is enabled, take x86_64 in the next patch for
example, ftrace_call_replace() recovers the real address with
ftrace_jmp_get().

The arch-specific parts are split out into the next patch.

> 
> Similarly, at runtime call_direct_funcs() reads ops->direct_call and
> passes it to arch_ftrace_set_direct_caller() without masking. The x86
> assembly then does RET to this address, which would jump to addr+1 if
> the LSB is set.
> 
> The commit message says "we can tell if we should use 'jmp' for the
> callback in ftrace_call_replace()" but ftrace_call_replace() isn't
> modified to check or mask the bit. Are there missing changes to handle
> the encoded address in these paths?

It is modified in the next patch. And in order to reduce the risk, we
can wrap the code here with CONFIG_DYNAMIC_FTRACE_WITH_JMP.

> 
> >  	/* Make sure requested entries are not already registered.. */
> >  	size = 1 << hash->size_bits;
> >  	for (i = 0; i < size; i++) {
> 
> [ ... ]
> 
> > @@ -6117,6 +6121,9 @@ __modify_ftrace_direct(struct ftrace_ops *ops, unsigned long addr)
> >
> >  	lockdep_assert_held_once(&direct_mutex);
> >
> > +	if (ops->flags & FTRACE_OPS_FL_JMP)
> > +		addr = ftrace_jmp_set(addr);
> > +
> 
> Same issue here - the flagged addr is stored but downstream consumers
> don't mask it before using as a jump target.
> 
> 
> ---
> AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
> See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md
> 
> CI run summary: https://github.com/kernel-patches/bpf/actions/runs/19360353328
> 





^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH RFC bpf-next 0/7] bpf trampoline support "jmp" mode
  2025-11-14  9:24 [PATCH RFC bpf-next 0/7] bpf trampoline support "jmp" mode Menglong Dong
                   ` (6 preceding siblings ...)
  2025-11-14  9:24 ` [PATCH RFC bpf-next 7/7] bpf: implement "jmp" mode for trampoline Menglong Dong
@ 2025-11-14 13:38 ` Steven Rostedt
  2025-11-14 13:58   ` Menglong Dong
  7 siblings, 1 reply; 24+ messages in thread
From: Steven Rostedt @ 2025-11-14 13:38 UTC (permalink / raw)
  To: Menglong Dong
  Cc: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, mhiramat,
	mark.rutland, mathieu.desnoyers, bpf, linux-kernel,
	linux-trace-kernel

On Fri, 14 Nov 2025 17:24:43 +0800
Menglong Dong <menglong8.dong@gmail.com> wrote:

> Therefore, we introduce the "jmp" mode for bpf trampoline, as advised by
> Alexei in [1]. And the logic will become this:
>   call foo -> jmp trampoline -> call foo-body ->
>   return foo-body -> return foo

This obviously only works when there's a single function used by that
trampoline. It also doesn't allow tracing of the return side (it's
basically just the function tracer for a single function).

Is there any mechanism to make sure that the trampoline being called is
only used by that one function? I haven't looked at the code yet, but
should there be a test that makes sure a trampoline isn't registered for
two or more different functions?

-- Steve

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH RFC bpf-next 0/7] bpf trampoline support "jmp" mode
  2025-11-14 13:38 ` [PATCH RFC bpf-next 0/7] bpf trampoline support "jmp" mode Steven Rostedt
@ 2025-11-14 13:58   ` Menglong Dong
  2025-11-14 16:28     ` Steven Rostedt
  0 siblings, 1 reply; 24+ messages in thread
From: Menglong Dong @ 2025-11-14 13:58 UTC (permalink / raw)
  To: Menglong Dong, Steven Rostedt
  Cc: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, mhiramat,
	mark.rutland, mathieu.desnoyers, bpf, linux-kernel,
	linux-trace-kernel

On 2025/11/14 21:38, Steven Rostedt wrote:
> On Fri, 14 Nov 2025 17:24:43 +0800
> Menglong Dong <menglong8.dong@gmail.com> wrote:
> 
> > Therefore, we introduce the "jmp" mode for bpf trampoline, as advised by
> > Alexei in [1]. And the logic will become this:
> >   call foo -> jmp trampoline -> call foo-body ->
> >   return foo-body -> return foo
> 
> This obviously only works when there's a single function used by that
> trampoline. It also doesn't allow tracing of the return side (it's
> basically just the function tracer for a single function).

Hi, Steven. I think there is a misunderstanding here. For fentry/fexit,
the whole process is:

call foo -> jmp trampoline -> call all the fentry bpf progs ->
call foo-body -> return foo-body -> call all the fexit bpf progs
-> return foo.

The "call foo-body" means "origin call", and it will store the
return value of the traced function to the stack, therefore the
fexit progs can get it.

So the return side can be traced with "fexit", and it's almost the same
as the original logic of the bpf trampoline:

call foo -> call trampoline -> call all the fentry bpf progs ->
call foo-body -> return foo-body -> call all the fexit bpf progs
-> skip the rip -> return foo.

What I did here is just replace the "call trampoline" with
"jmp trampoline".

> 
> Is there any mechanism to make sure that the trampoline being called is
> only used by that one function? I haven't looked at the code yet, but
> should there be a test that makes sure a trampoline isn't registered for
> two or more different functions?

For now, the bpf trampoline is per-function. Every trampoline
has a unique key, and we find the trampoline for the target function
by that key. So it can't be used by two or more different functions.

If the trampoline needs to get the ip of the origin call from the
stack, such as in the BPF_TRAMP_F_SHARE_IPMODIFY case, we fall back to
the "call" mode, as we can't get the rip from the stack in the "jmp"
mode. I think this is what you meant by "only works for a single
function"? Yeah, we fall back in that case.

Thanks!
Menglong Dong

> 
> -- Steve
> 
> 





^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH RFC bpf-next 0/7] bpf trampoline support "jmp" mode
  2025-11-14 13:58   ` Menglong Dong
@ 2025-11-14 16:28     ` Steven Rostedt
  0 siblings, 0 replies; 24+ messages in thread
From: Steven Rostedt @ 2025-11-14 16:28 UTC (permalink / raw)
  To: Menglong Dong
  Cc: Menglong Dong, ast, daniel, john.fastabend, andrii, martin.lau,
	eddyz87, song, yonghong.song, kpsingh, sdf, haoluo, jolsa,
	mhiramat, mark.rutland, mathieu.desnoyers, bpf, linux-kernel,
	linux-trace-kernel

On Fri, 14 Nov 2025 21:58:34 +0800
Menglong Dong <menglong.dong@linux.dev> wrote:

> On 2025/11/14 21:38, Steven Rostedt wrote:
> > On Fri, 14 Nov 2025 17:24:43 +0800
> > Menglong Dong <menglong8.dong@gmail.com> wrote:
> >   
> > > Therefore, we introduce the "jmp" mode for bpf trampoline, as advised by
> > > Alexei in [1]. And the logic will become this:
> > >   call foo -> jmp trampoline -> call foo-body ->
> > >   return foo-body -> return foo  
> > 
> > This obviously only works when there's a single function used by that
> > trampoline. It also doesn't allow tracing of the return side (it's
> > basically just the function tracer for a single function).  
> 
> Hi, Steven. I think you misunderstand something? For the fentry/fexit,
> the whole process is:

Yeah, I got a bit confused by the notation above.

> 
> call foo -> jmp trampoline -> call all the fentry bpf progs ->
> call foo-body -> return foo-body -> call all the fexit bpf progs
> -> return foo.  
> 
> The "call foo-body" means "origin call", and it will store the
> return value of the traced function to the stack, therefore the
> fexit progs can get it.
> 
> So it can trace the return side with the "fexit". And it's almost the
> same as the origin logic of the bpf trampoline:

OK, so this is just the way it always works.

> 
> call foo -> call trampoline -> call all the fentry bpf progs ->
> call foo-body -> return foo-body -> call all the fexit bpf progs
> -> skip the rip -> return foo.  
> 
> What I did here is just replace the "call trampoline" to
> "jmp trampoline".
> 
> > 
> > Is there any mechanism to make sure that the trampoline being called is
> > only used by that one function? I haven't looked at the code yet, but
> > should there be a test that makes sure a trampoline isn't registered for
> > two or more different functions?  
> 
> As for now, the bpf trampoline is per-function. Every trampoline
> has a unique key, and we find the trampoline for the target function
> by that key. So it can't be used by two or more different functions.
> 
> If the trampoline need to get the ip of the origin call from the stack,
> such as BPF_TRAMP_F_SHARE_IPMODIFY case, we will fallback to the
> "call" mode, as we can't get the rip from the stack in the "jmp" mode.
> And I think this is what you mean "only work for a single function"?
> Yeah, we fallback on such case.


OK, I got lost in the notation. It doesn't need a "call" because each
trampoline is only for a single function. Hence it doesn't need to know the
return address.

-- Steve


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH RFC bpf-next 2/7] x86/ftrace: implement DYNAMIC_FTRACE_WITH_JMP
  2025-11-14  9:24 ` [PATCH RFC bpf-next 2/7] x86/ftrace: implement DYNAMIC_FTRACE_WITH_JMP Menglong Dong
@ 2025-11-14 16:39   ` Steven Rostedt
  2025-11-15  2:12     ` Menglong Dong
  0 siblings, 1 reply; 24+ messages in thread
From: Steven Rostedt @ 2025-11-14 16:39 UTC (permalink / raw)
  To: Menglong Dong
  Cc: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, mhiramat,
	mark.rutland, mathieu.desnoyers, bpf, linux-kernel,
	linux-trace-kernel

On Fri, 14 Nov 2025 17:24:45 +0800
Menglong Dong <menglong8.dong@gmail.com> wrote:

> --- a/arch/x86/kernel/ftrace_64.S
> +++ b/arch/x86/kernel/ftrace_64.S
> @@ -285,8 +285,18 @@ SYM_INNER_LABEL(ftrace_regs_caller_end, SYM_L_GLOBAL)
>  	ANNOTATE_NOENDBR
>  	RET
>  
> +1:
> +	testb	$1, %al
> +	jz	2f
> +	andq $0xfffffffffffffffe, %rax
> +	movq %rax, MCOUNT_REG_SIZE+8(%rsp)
> +	restore_mcount_regs
> +	/* Restore flags */
> +	popfq
> +	RET
> +
>  	/* Swap the flags with orig_rax */
> -1:	movq MCOUNT_REG_SIZE(%rsp), %rdi
> +2:	movq MCOUNT_REG_SIZE(%rsp), %rdi
>  	movq %rdi, MCOUNT_REG_SIZE-8(%rsp)
>  	movq %rax, MCOUNT_REG_SIZE(%rsp)
>  

So in this case we have:

 original_caller:
 call foo -> foo:
             call fentry -> fentry:
                            [do ftrace callbacks ]
                            move tramp_addr to stack
                            RET -> tramp_addr
                                            tramp_addr:
                                            [..]
                                            call foo_body -> foo_body:
                                                             [..]
                                                             RET -> back to tramp_addr
                                            [..]
                                            RET -> back to original_caller

I guess that looks balanced.

-- Steve

                                                         

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH RFC bpf-next 4/7] bpf,x86: adjust the "jmp" mode for bpf trampoline
  2025-11-14  9:24 ` [PATCH RFC bpf-next 4/7] bpf,x86: adjust the "jmp" mode for bpf trampoline Menglong Dong
@ 2025-11-14 18:22   ` Alexei Starovoitov
  2025-11-15  2:14     ` Menglong Dong
  0 siblings, 1 reply; 24+ messages in thread
From: Alexei Starovoitov @ 2025-11-14 18:22 UTC (permalink / raw)
  To: Menglong Dong
  Cc: Alexei Starovoitov, Steven Rostedt, Daniel Borkmann,
	John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Eduard,
	Song Liu, Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo,
	Jiri Olsa, Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, bpf,
	LKML, linux-trace-kernel

On Fri, Nov 14, 2025 at 1:25 AM Menglong Dong <menglong8.dong@gmail.com> wrote:
>
> In the origin call case, if BPF_TRAMP_F_SKIP_FRAME is not set, it means
> that the trampoline is not called, but "jmp".
>
> Introduce the function bpf_trampoline_need_jmp() to check if the
> trampoline is in "jmp" mode.
>
> Do some adjustment on the "jmp" mode for the x86_64. The main adjustment
> that we make is for the stack parameter passing case, as the stack
> alignment logic changes in the "jmp" mode without the "rip". What's more,
> the location of the parameters on the stack also changes.
>
> Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
> ---
>  arch/x86/net/bpf_jit_comp.c | 15 ++++++++++-----
>  include/linux/bpf.h         | 12 ++++++++++++
>  2 files changed, 22 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> index 2d300ab37cdd..21ce2b8457ec 100644
> --- a/arch/x86/net/bpf_jit_comp.c
> +++ b/arch/x86/net/bpf_jit_comp.c
> @@ -2830,7 +2830,7 @@ static int get_nr_used_regs(const struct btf_func_model *m)
>  }
>
>  static void save_args(const struct btf_func_model *m, u8 **prog,
> -                     int stack_size, bool for_call_origin)
> +                     int stack_size, bool for_call_origin, bool jmp)

I have an allergy to bool args.

Please pass flags and do
bool jmp_based_tramp = bpf_trampoline_uses_jmp(flags);

I think bpf_trampoline_uses_jmp() is more descriptive than
bpf_trampoline_need_jmp().

The actual math lgtm.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH RFC bpf-next 3/7] bpf: fix the usage of BPF_TRAMP_F_SKIP_FRAME
  2025-11-14  9:24 ` [PATCH RFC bpf-next 3/7] bpf: fix the usage of BPF_TRAMP_F_SKIP_FRAME Menglong Dong
@ 2025-11-14 18:23   ` Alexei Starovoitov
  0 siblings, 0 replies; 24+ messages in thread
From: Alexei Starovoitov @ 2025-11-14 18:23 UTC (permalink / raw)
  To: Menglong Dong
  Cc: Alexei Starovoitov, Steven Rostedt, Daniel Borkmann,
	John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Eduard,
	Song Liu, Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo,
	Jiri Olsa, Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, bpf,
	LKML, linux-trace-kernel

On Fri, Nov 14, 2025 at 1:25 AM Menglong Dong <menglong8.dong@gmail.com> wrote:
>
> Some places calculate the origin_call by checking if
> BPF_TRAMP_F_SKIP_FRAME is set. However, it should use
> BPF_TRAMP_F_ORIG_STACK for this propose. Just fix them.
>
> Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
> ---
>  arch/riscv/net/bpf_jit_comp64.c | 2 +-
>  arch/x86/net/bpf_jit_comp.c     | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/riscv/net/bpf_jit_comp64.c b/arch/riscv/net/bpf_jit_comp64.c
> index 45cbc7c6fe49..21c70ae3296b 100644
> --- a/arch/riscv/net/bpf_jit_comp64.c
> +++ b/arch/riscv/net/bpf_jit_comp64.c
> @@ -1131,7 +1131,7 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im,
>         store_args(nr_arg_slots, args_off, ctx);
>
>         /* skip to actual body of traced function */
> -       if (flags & BPF_TRAMP_F_SKIP_FRAME)
> +       if (flags & BPF_TRAMP_F_ORIG_STACK)
>                 orig_call += RV_FENTRY_NINSNS * 4;
>
>         if (flags & BPF_TRAMP_F_CALL_ORIG) {
> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> index de5083cb1d37..2d300ab37cdd 100644
> --- a/arch/x86/net/bpf_jit_comp.c
> +++ b/arch/x86/net/bpf_jit_comp.c
> @@ -3272,7 +3272,7 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
>
>         arg_stack_off = stack_size;
>
> -       if (flags & BPF_TRAMP_F_SKIP_FRAME) {
> +       if (flags & BPF_TRAMP_F_CALL_ORIG) {

Good catch. Ack. Pls carry it in respin, so I don't
forget that I looked at it.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH RFC bpf-next 5/7] bpf: introduce bpf_arch_text_poke_type
  2025-11-14  9:24 ` [PATCH RFC bpf-next 5/7] bpf: introduce bpf_arch_text_poke_type Menglong Dong
  2025-11-14 10:20   ` bot+bpf-ci
@ 2025-11-14 18:41   ` Alexei Starovoitov
  2025-11-15  2:26     ` Menglong Dong
  1 sibling, 1 reply; 24+ messages in thread
From: Alexei Starovoitov @ 2025-11-14 18:41 UTC (permalink / raw)
  To: Menglong Dong
  Cc: Alexei Starovoitov, Steven Rostedt, Daniel Borkmann,
	John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Eduard,
	Song Liu, Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo,
	Jiri Olsa, Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, bpf,
	LKML, linux-trace-kernel

On Fri, Nov 14, 2025 at 1:25 AM Menglong Dong <menglong8.dong@gmail.com> wrote:
>
> Introduce the function bpf_arch_text_poke_type(), which is able to specify
> both the current and new opcode. If it is not implemented by the arch,
> bpf_arch_text_poke() will be called directly if the current opcode is the
> same as the new one. Otherwise, -EOPNOTSUPP will be returned.
>
> Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
> ---
>  include/linux/bpf.h |  4 ++++
>  kernel/bpf/core.c   | 10 ++++++++++
>  2 files changed, 14 insertions(+)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index d65a71042aa3..aec7c65539f5 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -3711,6 +3711,10 @@ enum bpf_text_poke_type {
>         BPF_MOD_JUMP,
>  };
>
> +int bpf_arch_text_poke_type(void *ip, enum bpf_text_poke_type old_t,
> +                           enum bpf_text_poke_type new_t, void *addr1,
> +                           void *addr2);
> +

Instead of adding a new helper, I think, it's cleaner to change
the existing bpf_arch_text_poke() across all archs in one patch,
and also do:

enum bpf_text_poke_type {
+       BPF_MOD_NOP,
        BPF_MOD_CALL,
        BPF_MOD_JUMP,
};

and use that instead of addr[12] = !NULL to indicate
the transition.

The callsites will be easier to read when they look like:
bpf_arch_text_poke(ip, BPF_MOD_CALL, BPF_MOD_CALL, old_addr, new_addr);

bpf_arch_text_poke(ip, BPF_MOD_NOP, BPF_MOD_CALL, NULL, new_addr);

bpf_arch_text_poke(ip, BPF_MOD_JUMP, BPF_MOD_CALL, old_addr, new_addr);
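
i.e. roughly this prototype (just a sketch of where the suggestion leads;
parameter names are placeholders):

int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type old_t,
                       enum bpf_text_poke_type new_t,
                       void *old_addr, void *new_addr);

with BPF_MOD_NOP on either side meaning "nothing to patch on that side",
so the corresponding old_addr/new_addr stays NULL.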

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH RFC bpf-next 7/7] bpf: implement "jmp" mode for trampoline
  2025-11-14  9:24 ` [PATCH RFC bpf-next 7/7] bpf: implement "jmp" mode for trampoline Menglong Dong
@ 2025-11-14 18:50   ` Alexei Starovoitov
  2025-11-15  2:39     ` Menglong Dong
  0 siblings, 1 reply; 24+ messages in thread
From: Alexei Starovoitov @ 2025-11-14 18:50 UTC (permalink / raw)
  To: Menglong Dong
  Cc: Alexei Starovoitov, Steven Rostedt, Daniel Borkmann,
	John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Eduard,
	Song Liu, Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo,
	Jiri Olsa, Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, bpf,
	LKML, linux-trace-kernel

On Fri, Nov 14, 2025 at 1:25 AM Menglong Dong <menglong8.dong@gmail.com> wrote:
>
> Implement the "jmp" mode for the bpf trampoline. For the ftrace_managed
> case, we need only to set the FTRACE_OPS_FL_JMP on the tr->fops if "jmp"
> is needed.
>
> For the bpf poke case, the new flag BPF_TRAMP_F_JMPED is introduced to
> store and check if the trampoline is in the "jmp" mode.
>
> Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
> ---
>  include/linux/bpf.h     |  6 +++++
>  kernel/bpf/trampoline.c | 53 ++++++++++++++++++++++++++++++++++-------
>  2 files changed, 50 insertions(+), 9 deletions(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index aec7c65539f5..3598785ac8d1 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -1201,6 +1201,12 @@ struct btf_func_model {
>   */
>  #define BPF_TRAMP_F_INDIRECT           BIT(8)
>
> +/*
> + * Indicate that the trampoline is using "jmp" instead of "call". This flag
> + * is only used in the !ftrace_managed case.
> + */
> +#define BPF_TRAMP_F_JMPED              BIT(9)
> +
>  /* Each call __bpf_prog_enter + call bpf_func + call __bpf_prog_exit is ~50
>   * bytes on x86.
>   */
> diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
> index 5949095e51c3..02a9f33d8f6c 100644
> --- a/kernel/bpf/trampoline.c
> +++ b/kernel/bpf/trampoline.c
> @@ -175,15 +175,37 @@ static struct bpf_trampoline *bpf_trampoline_lookup(u64 key)
>         return tr;
>  }
>
> -static int unregister_fentry(struct bpf_trampoline *tr, void *old_addr)
> +static int bpf_text_poke(struct bpf_trampoline *tr, void *old_addr,
> +                        void *new_addr)

The bpf_text_poke is a generic name. It really doesn't fit here.
Use bpf_trampoline_update_fentry() or something along those lines.

>  {
> +       enum bpf_text_poke_type new_t = BPF_MOD_CALL, old_t = BPF_MOD_CALL;
>         void *ip = tr->func.addr;
>         int ret;
>
> +       if (bpf_trampoline_need_jmp(tr->flags))
> +               new_t = BPF_MOD_JUMP;
> +       if (tr->flags & BPF_TRAMP_F_JMPED)
> +               old_t = BPF_MOD_JUMP;

Now I see why you picked _need_jmp()... to alternate with F_JMPED?
My _uses_jmp() suggestion isn't quite right.

How about bpf_trampoline_must_jmp() ?
and drop if (!ret) fallback and BPF_TRAMP_F_JMPED bit.
It doesn't look to be necessary.
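
For the ftrace_managed path the commit message above describes, the whole
change presumably boils down to something like this (a sketch; where exactly
it lands in register/modify_fentry and the final helper name are assumptions):

        if (bpf_trampoline_need_jmp(tr->flags))
                tr->fops->flags |= FTRACE_OPS_FL_JMP;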

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH RFC bpf-next 2/7] x86/ftrace: implement DYNAMIC_FTRACE_WITH_JMP
  2025-11-14 16:39   ` Steven Rostedt
@ 2025-11-15  2:12     ` Menglong Dong
  0 siblings, 0 replies; 24+ messages in thread
From: Menglong Dong @ 2025-11-15  2:12 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, mhiramat,
	mark.rutland, mathieu.desnoyers, bpf, linux-kernel,
	linux-trace-kernel

On Sat, Nov 15, 2025 at 12:39 AM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Fri, 14 Nov 2025 17:24:45 +0800
> Menglong Dong <menglong8.dong@gmail.com> wrote:
>
> > --- a/arch/x86/kernel/ftrace_64.S
> > +++ b/arch/x86/kernel/ftrace_64.S
> > @@ -285,8 +285,18 @@ SYM_INNER_LABEL(ftrace_regs_caller_end, SYM_L_GLOBAL)
> >       ANNOTATE_NOENDBR
> >       RET
> >
> > +1:
> > +     testb   $1, %al
> > +     jz      2f
> > +     andq $0xfffffffffffffffe, %rax
> > +     movq %rax, MCOUNT_REG_SIZE+8(%rsp)
> > +     restore_mcount_regs
> > +     /* Restore flags */
> > +     popfq
> > +     RET
> > +
> >       /* Swap the flags with orig_rax */
> > -1:   movq MCOUNT_REG_SIZE(%rsp), %rdi
> > +2:   movq MCOUNT_REG_SIZE(%rsp), %rdi
> >       movq %rdi, MCOUNT_REG_SIZE-8(%rsp)
> >       movq %rax, MCOUNT_REG_SIZE(%rsp)
> >
>
> So in this case we have:
>
>  original_caller:
>  call foo -> foo:
>              call fentry -> fentry:
>                             [do ftrace callbacks ]
>                             move tramp_addr to stack
>                             RET -> tramp_addr
>                                             tramp_addr:
>                                             [..]
>                                             call foo_body -> foo_body:
>                                                              [..]
>                                                              RET -> back to tramp_addr
>                                             [..]
>                                             RET -> back to original_caller

Nice flow chart, which I think we can put in the commit log.

>
> I guess that looks balanced.

Yes, it is balanced.

>
> -- Steve
>
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH RFC bpf-next 4/7] bpf,x86: adjust the "jmp" mode for bpf trampoline
  2025-11-14 18:22   ` Alexei Starovoitov
@ 2025-11-15  2:14     ` Menglong Dong
  0 siblings, 0 replies; 24+ messages in thread
From: Menglong Dong @ 2025-11-15  2:14 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Alexei Starovoitov, Steven Rostedt, Daniel Borkmann,
	John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Eduard,
	Song Liu, Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo,
	Jiri Olsa, Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, bpf,
	LKML, linux-trace-kernel

On Sat, Nov 15, 2025 at 2:22 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Fri, Nov 14, 2025 at 1:25 AM Menglong Dong <menglong8.dong@gmail.com> wrote:
> >
> > In the origin call case, if BPF_TRAMP_F_SKIP_FRAME is not set, it means
> > that the trampoline is entered with "jmp" rather than "call".
> >
> > Introduce the function bpf_trampoline_need_jmp() to check if the
> > trampoline is in "jmp" mode.
> >
> > Make some adjustments for the "jmp" mode on x86_64. The main adjustment
> > is for the case where parameters are passed on the stack, as the stack
> > alignment logic changes in "jmp" mode without the "rip" on the stack.
> > The location of the parameters on the stack also changes.
> > Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
> > ---
> >  arch/x86/net/bpf_jit_comp.c | 15 ++++++++++-----
> >  include/linux/bpf.h         | 12 ++++++++++++
> >  2 files changed, 22 insertions(+), 5 deletions(-)
> >
> > diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> > index 2d300ab37cdd..21ce2b8457ec 100644
> > --- a/arch/x86/net/bpf_jit_comp.c
> > +++ b/arch/x86/net/bpf_jit_comp.c
> > @@ -2830,7 +2830,7 @@ static int get_nr_used_regs(const struct btf_func_model *m)
> >  }
> >
> >  static void save_args(const struct btf_func_model *m, u8 **prog,
> > -                     int stack_size, bool for_call_origin)
> > +                     int stack_size, bool for_call_origin, bool jmp)
>
> I have an allergy to bool args.
>
> Please pass flags and do
> bool jmp_based_tramp = bpf_trampoline_uses_jmp(flags);

Ok, I'll do it this way in the next version.
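
Something along these lines, as a rough sketch (assuming the trampoline
flags get threaded down into save_args(); names are placeholders):

static void save_args(const struct btf_func_model *m, u8 **prog,
                      int stack_size, bool for_call_origin, u32 flags)
{
        bool jmp_based_tramp = bpf_trampoline_uses_jmp(flags);

        /* existing body unchanged, keyed off jmp_based_tramp */
}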

>
> I think bpf_trampoline_uses_jmp() is more descriptive than
> bpf_trampoline_need_jmp().

ACK.

>
> The actual math lgtm.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH RFC bpf-next 5/7] bpf: introduce bpf_arch_text_poke_type
  2025-11-14 18:41   ` Alexei Starovoitov
@ 2025-11-15  2:26     ` Menglong Dong
  0 siblings, 0 replies; 24+ messages in thread
From: Menglong Dong @ 2025-11-15  2:26 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Alexei Starovoitov, Steven Rostedt, Daniel Borkmann,
	John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Eduard,
	Song Liu, Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo,
	Jiri Olsa, Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, bpf,
	LKML, linux-trace-kernel

On Sat, Nov 15, 2025 at 2:42 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Fri, Nov 14, 2025 at 1:25 AM Menglong Dong <menglong8.dong@gmail.com> wrote:
> >
> > Introduce the function bpf_arch_text_poke_type(), which is able to specify
> > both the current and new opcode. If it is not implemented by the arch,
> > bpf_arch_text_poke() will be called directly if the current opcode is the
> > same as the new one. Otherwise, -EOPNOTSUPP will be returned.
> >
> > Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
> > ---
> >  include/linux/bpf.h |  4 ++++
> >  kernel/bpf/core.c   | 10 ++++++++++
> >  2 files changed, 14 insertions(+)
> >
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index d65a71042aa3..aec7c65539f5 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -3711,6 +3711,10 @@ enum bpf_text_poke_type {
> >         BPF_MOD_JUMP,
> >  };
> >
> > +int bpf_arch_text_poke_type(void *ip, enum bpf_text_poke_type old_t,
> > +                           enum bpf_text_poke_type new_t, void *addr1,
> > +                           void *addr2);
> > +
>
> Instead of adding a new helper, I think, it's cleaner to change
> the existing bpf_arch_text_poke() across all archs in one patch,
> and also do:
>
> enum bpf_text_poke_type {
> +       BPF_MOD_NOP,
>         BPF_MOD_CALL,
>         BPF_MOD_JUMP,
> };
>
> and use that instead of addr[12] = !NULL to indicate
> the transition.
>
> The callsites will be easier to read when they look like:
> bpf_arch_text_poke(ip, BPF_MOD_CALL, BPF_MOD_CALL, old_addr, new_addr);
>
> bpf_arch_text_poke(ip, BPF_MOD_NOP, BPF_MOD_CALL, NULL, new_addr);
>
> bpf_arch_text_poke(ip, BPF_MOD_JUMP, BPF_MOD_CALL, old_addr, new_addr);

Yeah, much clearer. The new helper also makes me feel a bit
dizzy.

Thanks!
Menglong Dong

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH RFC bpf-next 7/7] bpf: implement "jmp" mode for trampoline
  2025-11-14 18:50   ` Alexei Starovoitov
@ 2025-11-15  2:39     ` Menglong Dong
  2025-11-15  2:42       ` Alexei Starovoitov
  0 siblings, 1 reply; 24+ messages in thread
From: Menglong Dong @ 2025-11-15  2:39 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Alexei Starovoitov, Steven Rostedt, Daniel Borkmann,
	John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Eduard,
	Song Liu, Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo,
	Jiri Olsa, Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, bpf,
	LKML, linux-trace-kernel

On Sat, Nov 15, 2025 at 2:50 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Fri, Nov 14, 2025 at 1:25 AM Menglong Dong <menglong8.dong@gmail.com> wrote:
> >
> > Implement the "jmp" mode for the bpf trampoline. For the ftrace_managed
> > case, we need only to set the FTRACE_OPS_FL_JMP on the tr->fops if "jmp"
> > is needed.
> >
> > For the bpf poke case, the new flag BPF_TRAMP_F_JMPED is introduced to
> > store and check if the trampoline is in the "jmp" mode.
> >
> > Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
> > ---
> >  include/linux/bpf.h     |  6 +++++
> >  kernel/bpf/trampoline.c | 53 ++++++++++++++++++++++++++++++++++-------
> >  2 files changed, 50 insertions(+), 9 deletions(-)
> >
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index aec7c65539f5..3598785ac8d1 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -1201,6 +1201,12 @@ struct btf_func_model {
> >   */
> >  #define BPF_TRAMP_F_INDIRECT           BIT(8)
> >
> > +/*
> > + * Indicate that the trampoline is using "jmp" instead of "call". This flag
> > + * is only used in the !ftrace_managed case.
> > + */
> > +#define BPF_TRAMP_F_JMPED              BIT(9)
> > +
> >  /* Each call __bpf_prog_enter + call bpf_func + call __bpf_prog_exit is ~50
> >   * bytes on x86.
> >   */
> > diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
> > index 5949095e51c3..02a9f33d8f6c 100644
> > --- a/kernel/bpf/trampoline.c
> > +++ b/kernel/bpf/trampoline.c
> > @@ -175,15 +175,37 @@ static struct bpf_trampoline *bpf_trampoline_lookup(u64 key)
> >         return tr;
> >  }
> >
> > -static int unregister_fentry(struct bpf_trampoline *tr, void *old_addr)
> > +static int bpf_text_poke(struct bpf_trampoline *tr, void *old_addr,
> > +                        void *new_addr)
>
> The bpf_text_poke is a generic name. It really doesn't fit here.
> Use bpf_trampoline_update_fentry() or something along those lines.

ACK.

>
> >  {
> > +       enum bpf_text_poke_type new_t = BPF_MOD_CALL, old_t = BPF_MOD_CALL;
> >         void *ip = tr->func.addr;
> >         int ret;
> >
> > +       if (bpf_trampoline_need_jmp(tr->flags))
> > +               new_t = BPF_MOD_JUMP;
> > +       if (tr->flags & BPF_TRAMP_F_JMPED)
> > +               old_t = BPF_MOD_JUMP;
>
> Now I see why you picked _need_jmp()... to alternate with F_JMPED?
> My _uses_jmp() suggestion isn't quite right.

Ah, kind of. The flags BPF_TRAMP_F_CALL_ORIG and
BPF_TRAMP_F_SKIP_FRAME are reset during the trampoline update and are
not stored. So "_need_jmp" means that "jmp" should be used for the new
trampoline that we are about to install.

BPF_TRAMP_F_JMPED is used to record whether the current trampoline
is in "jmp" mode.

>
> How about bpf_trampoline_must_jmp() ?
> and drop if (!ret) fallback and BPF_TRAMP_F_JMPED bit.
> It doesn't look to be necessary.

I think you are right. We can check whether the current trampoline is in
"jmp" mode with the "orig_flags" instead and remove BPF_TRAMP_F_JMPED.
That means I need to pass the "orig_flags" down through
modify_fentry -> bpf_trampoline_update_fentry() (the renamed
bpf_text_poke).
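
Roughly like this, as a sketch (the exact signature and the final helper
name are assumptions):

static int bpf_trampoline_update_fentry(struct bpf_trampoline *tr, u32 orig_flags,
                                        void *old_addr, void *new_addr)
{
        enum bpf_text_poke_type old_t, new_t;

        old_t = bpf_trampoline_need_jmp(orig_flags) ? BPF_MOD_JUMP : BPF_MOD_CALL;
        new_t = bpf_trampoline_need_jmp(tr->flags) ? BPF_MOD_JUMP : BPF_MOD_CALL;

        /* single poke with both the old and new type, per the earlier suggestion */
        return bpf_arch_text_poke(tr->func.addr, old_t, new_t, old_addr, new_addr);
}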

Thanks!
Menglong Dong

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH RFC bpf-next 7/7] bpf: implement "jmp" mode for trampoline
  2025-11-15  2:39     ` Menglong Dong
@ 2025-11-15  2:42       ` Alexei Starovoitov
  0 siblings, 0 replies; 24+ messages in thread
From: Alexei Starovoitov @ 2025-11-15  2:42 UTC (permalink / raw)
  To: Menglong Dong
  Cc: Alexei Starovoitov, Steven Rostedt, Daniel Borkmann,
	John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Eduard,
	Song Liu, Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo,
	Jiri Olsa, Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, bpf,
	LKML, linux-trace-kernel

On Fri, Nov 14, 2025 at 6:39 PM Menglong Dong <menglong8.dong@gmail.com> wrote:
> >
> > How about bpf_trampoline_must_jmp() ?
> > and drop if (!ret) fallback and BPF_TRAMP_F_JMPED bit.
> > It doesn't look to be necessary.
>
> I think you are right. We can check if current trampoline is in "jmp"
> mode with the "orig_flags" instead, and remove the
> BPF_TRAMP_F_JMPED. That means that I need to pass the
> "orig_flags" to
> modify_fentry -> bpf_trampoline_update_fentry(bpf_text_poke).

Yep. Makes sense to me.

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2025-11-15  2:43 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-11-14  9:24 [PATCH RFC bpf-next 0/7] bpf trampoline support "jmp" mode Menglong Dong
2025-11-14  9:24 ` [PATCH RFC bpf-next 1/7] ftrace: introduce FTRACE_OPS_FL_JMP Menglong Dong
2025-11-14 10:20   ` bot+bpf-ci
2025-11-14 10:57     ` Menglong Dong
2025-11-14  9:24 ` [PATCH RFC bpf-next 2/7] x86/ftrace: implement DYNAMIC_FTRACE_WITH_JMP Menglong Dong
2025-11-14 16:39   ` Steven Rostedt
2025-11-15  2:12     ` Menglong Dong
2025-11-14  9:24 ` [PATCH RFC bpf-next 3/7] bpf: fix the usage of BPF_TRAMP_F_SKIP_FRAME Menglong Dong
2025-11-14 18:23   ` Alexei Starovoitov
2025-11-14  9:24 ` [PATCH RFC bpf-next 4/7] bpf,x86: adjust the "jmp" mode for bpf trampoline Menglong Dong
2025-11-14 18:22   ` Alexei Starovoitov
2025-11-15  2:14     ` Menglong Dong
2025-11-14  9:24 ` [PATCH RFC bpf-next 5/7] bpf: introduce bpf_arch_text_poke_type Menglong Dong
2025-11-14 10:20   ` bot+bpf-ci
2025-11-14 18:41   ` Alexei Starovoitov
2025-11-15  2:26     ` Menglong Dong
2025-11-14  9:24 ` [PATCH RFC bpf-next 6/7] bpf,x86: implement bpf_arch_text_poke_type for x86_64 Menglong Dong
2025-11-14  9:24 ` [PATCH RFC bpf-next 7/7] bpf: implement "jmp" mode for trampoline Menglong Dong
2025-11-14 18:50   ` Alexei Starovoitov
2025-11-15  2:39     ` Menglong Dong
2025-11-15  2:42       ` Alexei Starovoitov
2025-11-14 13:38 ` [PATCH RFC bpf-next 0/7] bpf trampoline support "jmp" mode Steven Rostedt
2025-11-14 13:58   ` Menglong Dong
2025-11-14 16:28     ` Steven Rostedt
